crawler

Published: Mar 25, 2020 License: MIT Imports: 8 Imported by: 0

README

go-crawler

go-crawler crawls a website and scrapes all of its endpoints, paths, hashtags, and more.

Installation

Installation is done using go get.

go get -u github.com/ericz99/go-crawler

Example

package main

import (
	crawler "github.com/ericz99/go-crawler"
)

func main() {
	// create a crawler instance
	spider := crawler.Crawler{}
	// crawl the page and collect the scraped links
	result, domain := spider.Crawl("https://kith.com/")
	// write the results to disk
	spider.Download(result, domain)
}

Todo

  • Find all links based on a regex instead of relying on goquery

License

This project is licensed under the MIT License - see the LICENSE file for details.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func Get

func Get(startURL string) (*http.Response, error)

Get issues an HTTP GET request to startURL and returns the response.

func GetAllLink

func GetAllLink(tag string, doc *goquery.Document, c chan []ScrapeResult)

GetAllLink finds every link for the given tag in doc and sends the scraped results on c.

func GetDomain

func GetDomain(startURL string) string

GetDomain returns the domain of startURL.
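Extracting the domain from a URL is a one-liner with the standard library. This `getDomain` helper is a hypothetical stand-in for crawler.GetDomain, built on net/url only; the package's real implementation may differ.

```go
package main

import (
	"fmt"
	"net/url"
)

// getDomain returns the host portion of startURL (without any
// port), or "" if the URL cannot be parsed; a minimal stand-in
// for crawler.GetDomain.
func getDomain(startURL string) string {
	u, err := url.Parse(startURL)
	if err != nil {
		return ""
	}
	return u.Hostname()
}

func main() {
	fmt.Println(getDomain("https://kith.com/products/shoes")) // kith.com
}
```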

Types

type Crawler

type Crawler struct{}

Crawler is the main crawler type; its methods drive a crawl.

func (Crawler) Crawl

func (c Crawler) Crawl(startURL string) ([]ScrapeResult, string)

Crawl crawls startURL and returns the scraped results along with the site's domain.

func (Crawler) Download

func (c Crawler) Download(data []ScrapeResult, domain string)

Download writes the scraped data to a directory on the file system.

type ScrapeResult

type ScrapeResult struct {
	Link string `json:"link"`
}

ScrapeResult holds a single scraped link.

func CrawlPage

func CrawlPage(startURL string) []ScrapeResult

CrawlPage scrapes all URLs, paths, and endpoints on the page at startURL.

func ExtractLink

func ExtractLink(doc *goquery.Document) []ScrapeResult

ExtractLink returns all links on the current page, including paths and full URLs.
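The package currently extracts links via goquery, and its Todo mentions switching to a regex. A rough sketch of that regex approach is below; `extractLinks` and the `href` pattern are assumptions for illustration (a naive regex misses unquoted or single-quoted attributes), not the package's code.

```go
package main

import (
	"fmt"
	"regexp"
)

// ScrapeResult mirrors the package's result type.
type ScrapeResult struct {
	Link string `json:"link"`
}

// hrefRe captures double-quoted href attribute values; a rough
// regex-based alternative to goquery, per the project's Todo.
var hrefRe = regexp.MustCompile(`href="([^"]+)"`)

// extractLinks returns every href found in the raw HTML.
func extractLinks(html string) []ScrapeResult {
	var out []ScrapeResult
	for _, m := range hrefRe.FindAllStringSubmatch(html, -1) {
		out = append(out, ScrapeResult{Link: m[1]})
	}
	return out
}

func main() {
	page := `<a href="/pages/about">About</a> <a href="https://kith.com/#new">New</a>`
	for _, r := range extractLinks(page) {
		fmt.Println(r.Link)
	}
}
```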

