scraper

package
v0.1.2
Published: Oct 14, 2023 License: MIT Imports: 11 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Links

type Links struct {
	Href string
}

Links models a single link, identified by its Href.

type Page

type Page struct {
	URL       string
	Canonical string
	Links     []Links
	NoIndex   bool
	HTML      string
}

Page models a scraped page: its URL, canonical URL, outgoing links, noindex flag, and HTML.

type Scraper

type Scraper struct {
	// Original domain
	OldDomain string

	// New domain used when rewriting the downloaded HTML pages
	NewDomain string

	// Root domain
	Root string

	// Path where to save the downloads
	Path string

	// Whether to use query arguments on URLs
	UseQueries bool
}
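
The OldDomain and NewDomain fields suggest that downloaded HTML has one domain swapped for the other. The page does not show how the package does this, so the following is only a minimal sketch of the idea; `rewriteDomain` is a hypothetical helper and plain string replacement is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteDomain is a hypothetical helper, not part of the package.
// It replaces every occurrence of oldDomain in html with newDomain.
func rewriteDomain(html, oldDomain, newDomain string) string {
	return strings.ReplaceAll(html, oldDomain, newDomain)
}

func main() {
	page := `<a href="https://old.example/about">About</a>`
	fmt.Println(rewriteDomain(page, "old.example", "new.example"))
}
```

Plain replacement also touches occurrences of the domain in visible text, which may or may not be what a real rewrite wants.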

func (*Scraper) GetInsideAttachments

func (s *Scraper) GetInsideAttachments(url string) (attachments []string)

GetInsideAttachments gets the attachments referenced inside CSS files
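
One plausible way to find attachments inside a stylesheet is to collect the targets of its `url(...)` references. The sketch below works on CSS text that has already been fetched. `cssRefs` and the regular expression are assumptions for illustration, not the package's implementation.

```go
package main

import (
	"fmt"
	"regexp"
)

// cssURL matches url(...) references with optional single or double quotes.
var cssURL = regexp.MustCompile(`url\(\s*['"]?([^'")\s]+)['"]?\s*\)`)

// cssRefs is a hypothetical helper that returns every url(...) target in css.
func cssRefs(css string) []string {
	var refs []string
	for _, m := range cssURL.FindAllStringSubmatch(css, -1) {
		refs = append(refs, m[1]) // m[1] is the captured target
	}
	return refs
}

func main() {
	css := `body{background:url("img/bg.png")} @font-face{src:url(fonts/a.woff)}`
	fmt.Println(cssRefs(css))
}
```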

func (*Scraper) GetPath

func (s *Scraper) GetPath(link string) string

GetPath returns only the path, without domain, from the given link
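
Stripping the domain from a link can be done with the standard `net/url` package. This is a minimal sketch under that assumption. `pathOnly` is a hypothetical stand-in, and the real GetPath may treat queries or malformed links differently.

```go
package main

import (
	"fmt"
	"net/url"
)

// pathOnly is a hypothetical helper, not the package's GetPath.
// It parses link and returns only its path component.
func pathOnly(link string) string {
	u, err := url.Parse(link)
	if err != nil {
		return link // fall back to the input when it cannot be parsed
	}
	return u.Path
}

func main() {
	fmt.Println(pathOnly("https://example.com/blog/post?page=2")) // prints "/blog/post"
}
```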

func (*Scraper) IsLinkScanned

func (s *Scraper) IsLinkScanned(link string, scanned []string) (exists bool)

IsLinkScanned checks if a link has already been scanned

func (*Scraper) IsURLInSlice

func (s *Scraper) IsURLInSlice(search string, array []string) bool

IsURLInSlice checks if a URL is in a slice
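
Both IsLinkScanned and IsURLInSlice describe a membership check over a slice of strings. A linear scan is the simplest way to express that. `containsURL` below is a hypothetical stand-in, and exact string comparison is an assumption (the package may normalize URLs first).

```go
package main

import "fmt"

// containsURL is a hypothetical helper that reports whether search
// appears in urls, using exact string comparison.
func containsURL(search string, urls []string) bool {
	for _, u := range urls {
		if u == search {
			return true
		}
	}
	return false
}

func main() {
	scanned := []string{"https://example.com/", "https://example.com/about"}
	fmt.Println(containsURL("https://example.com/about", scanned))   // prints "true"
	fmt.Println(containsURL("https://example.com/contact", scanned)) // prints "false"
}
```

For large crawls, a `map[string]struct{}` gives constant-time lookups in place of the linear scan.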

func (*Scraper) IsValidExtension

func (s *Scraper) IsValidExtension(link string) bool

IsValidExtension checks if an extension is valid
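
The page does not list which extensions count as valid. The sketch below shows one way such a check could work, using an assumed allow-list; `allowedExt` and `hasAllowedExt` are hypothetical names.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strings"
)

// allowedExt is an assumed allow-list; the package's real list is not documented here.
var allowedExt = map[string]bool{".css": true, ".js": true, ".png": true, ".jpg": true}

// hasAllowedExt is a hypothetical helper that reports whether the
// path of link ends in one of the allowed extensions (case-insensitive).
func hasAllowedExt(link string) bool {
	u, err := url.Parse(link)
	if err != nil {
		return false
	}
	ext := strings.ToLower(path.Ext(u.Path)) // query string is ignored
	return allowedExt[ext]
}

func main() {
	fmt.Println(hasAllowedExt("https://example.com/static/site.CSS?v=3")) // prints "true"
	fmt.Println(hasAllowedExt("https://example.com/about"))               // prints "false"
}
```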

func (*Scraper) SaveAttachment

func (s *Scraper) SaveAttachment(url string) (err error)

SaveAttachment downloads and saves a single attachment from the given URL

func (*Scraper) SaveHTML

func (s *Scraper) SaveHTML(url string, html string) (err error)

SaveHTML saves the given HTML for a single URL

func (*Scraper) TakeLinks

func (s *Scraper) TakeLinks(
	toScan string,
	started chan int,
	finished chan int,
	scanning chan int,
	newLinks chan []Links,
	pages chan Page,
	attachments chan []string,
)

TakeLinks takes links from the given site
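
The channel parameters suggest that TakeLinks is meant to run in a goroutine and report progress and results back to a coordinator. The page does not document what each channel carries, so the sketch below is only one possible reading: `scanOne`, `collect`, the `link` type, and the meaning given to `started`, `finished`, and `newLinks` are all assumptions.

```go
package main

import "fmt"

// link mirrors the shape of the Links type for this sketch only.
type link struct{ Href string }

// scanOne is a hypothetical worker: it signals that it started,
// emits the links it "found", then signals that it finished.
func scanOne(toScan string, started, finished chan int, newLinks chan []link) {
	started <- 1
	newLinks <- []link{{Href: toScan + "/about"}} // placeholder result, no real fetch
	finished <- 1
}

// collect starts one worker per seed and gathers links until all have finished.
func collect(seeds []string) []string {
	started := make(chan int)
	finished := make(chan int)
	newLinks := make(chan []link)
	for _, s := range seeds {
		go scanOne(s, started, finished, newLinks)
	}
	var found []string
	done := 0
	for done < len(seeds) {
		select {
		case <-started:
			// a worker began scanning; nothing to record in this sketch
		case ls := <-newLinks:
			for _, l := range ls {
				found = append(found, l.Href)
			}
		case <-finished:
			done++
		}
	}
	return found
}

func main() {
	fmt.Println(collect([]string{"https://example.com"}))
}
```

Because each worker sends its links before signalling completion, the coordinator has received every result by the time the finished count reaches the number of seeds.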
