easycrawl

Published: Jan 2, 2019 License: Apache-2.0 Imports: 6 Imported by: 0

README

easy-crawl


easy-crawl is a library for crawling web pages smoothly and setting callback methods easily.

Installation

To install the easy-crawl package, you need to install Go and set up your Go workspace first.

  1. Download and install it:
$ go get -u github.com/kcwebapply/easy-crawl
  2. Import it in your code:
import "github.com/kcwebapply/easy-crawl"

Usage

crawling
package main

// the import alias below is an assumption, chosen to match the `easyCrawl.`
// qualifier used in this README; CallBackImpl is defined further below.
import easyCrawl "github.com/kcwebapply/easy-crawl"

func main() {

  // initialize EasyCrawler{} with the crawling depth.
  crawler := easyCrawl.EasyCrawler{Depth: 3}

  // implement CallBackInterface and set it via the SetCallBack() method.
  crawler.SetCallBack(CallBackImpl{})

  // you can monitor how crawling is proceeding by calling SetLogging(true).
  crawler.SetLogging(true)

  // crawling!
  crawler.Crawl("http://spring-boot-reference.jp/")
}


callback interface

You should implement CallBackInterface to set the callback method.

  • url is the content's base URL
  • urls is the list of URLs found in the HTML body (extracted from href attributes)
  • body is the HTML body of the content
type CallBackInterface interface {
	Callback(url string, urls []string, body string)
}

Here is an example implementation of CallBackInterface:

type CallBackImpl struct {
}

func (callbackImpl CallBackImpl) Callback(url string, urls []string, body string) {
   // implement as you like.
}
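
For instance, a callback that just reports what was crawled could look like the following (a minimal sketch; the struct name and the fmt output are illustrative, not part of the library):

// illustrative callback: prints the page URL, link count, and body size.
// (requires importing "fmt")
type PrintingCallBack struct {
}

func (p PrintingCallBack) Callback(url string, urls []string, body string) {
	fmt.Printf("crawled %s: %d links, %d bytes of body\n", url, len(urls), len(body))
}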

Demo

You can try crawling by running this Go file:

go run example/crawl.go

[sample-demo GIF]

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type CallBackInterface

type CallBackInterface interface {
	Callback(url string, urls []string, body string)
}

CallBackInterface is the interface for setting a callback method as the user likes.

type Content

type Content struct {
	URL  string
	Urls []string
	Body string
}

Content is the struct for HTML contents.
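
As a hedged illustration, the exported fields simply group a page's URL, its extracted links, and its body; a Content value could be built like this (the field values are placeholders):

// illustrative only: the field values are placeholders.
content := Content{
	URL:  "http://spring-boot-reference.jp/",
	Urls: []string{"http://spring-boot-reference.jp/docs"},
	Body: "<html><body></body></html>",
}
_ = content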

type EasyCrawler

type EasyCrawler struct {
	Depth int
	// contains filtered or unexported fields
}

EasyCrawler is the struct for crawling web pages.

func (*EasyCrawler) Crawl

func (crawler *EasyCrawler) Crawl(u string) error

Crawl starts crawling from the given URL u.
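
Since Crawl returns an error, callers may want to check it; a minimal sketch, assuming the same CallBackImpl shown in the README (the log.Fatalf handling is illustrative):

// requires importing "log"
crawler := EasyCrawler{Depth: 2}
crawler.SetCallBack(CallBackImpl{})
if err := crawler.Crawl("http://spring-boot-reference.jp/"); err != nil {
	log.Fatalf("crawl failed: %v", err)
}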

func (*EasyCrawler) SetCallBack

func (crawler *EasyCrawler) SetCallBack(callBackInterface CallBackInterface)

SetCallBack sets the callback method that will be called when contents are acquired.

func (*EasyCrawler) SetLogging

func (crawler *EasyCrawler) SetLogging(enabled bool)

SetLogging sets whether logs are printed during crawling.

type Logging

type Logging struct {
	// contains filtered or unexported fields
}

Logging struct implements logging methods.

