package crawler

Version: v0.0.0-...-24e6800 | Published: Jul 15, 2022 | License: AGPL-3.0 | Imports: 20 | Imported by: 0

Details

Valid go.mod file
Redistributable license

Repository

git.sr.ht/~sircmpwn/searchhut

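Documentation

Index

type Crawler
- func NewCrawler(ua string, db *sql.DB, domain string) *Crawler
- func (c *Crawler) Crawl()
- func (c *Crawler) Get(ctx context.Context, url *url.URL) (*http.Response, error)
- func (c *Crawler) Head(ctx context.Context, url *url.URL) (*http.Response, error)
- func (c *Crawler) Index(ctx context.Context, url *url.URL) error
- func (c *Crawler) Schedule(url *url.URL)
- func (c *Crawler) ScheduleLinks(from *url.URL, node *html.Node)
type Metadata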

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler struct {
	Client        *http.Client
	Domain        string
	DomainID      int
	Authoritative bool
	Exclude       []*regexp.Regexp
	Delay         time.Duration
	RetryAfter    time.Duration
	Robots        *robotstxt.Group
	UserAgent     string
	Start         time.Time
	// contains filtered or unexported fields
}

func NewCrawler

func NewCrawler(ua string, db *sql.DB, domain string) *Crawler

func (*Crawler) Crawl

func (c *Crawler) Crawl()

func (*Crawler) Get

func (c *Crawler) Get(ctx context.Context, url *url.URL) (*http.Response, error)

func (*Crawler) Head

func (c *Crawler) Head(ctx context.Context, url *url.URL) (*http.Response, error)

func (*Crawler) Index

func (c *Crawler) Index(ctx context.Context, url *url.URL) error

func (*Crawler) Schedule

func (c *Crawler) Schedule(url *url.URL)

func (*Crawler) ScheduleLinks

func (c *Crawler) ScheduleLinks(from *url.URL, node *html.Node)

type Metadata

type Metadata struct {
	Title       *string
	Robots      []string
	Author      *string
	Description *string
	Canonical   *url.URL
	JavaScript  bool
}
