crawler

package module
v0.0.0-...-c18a360
Warning: This package is not in the latest version of its module.
Published: Jun 12, 2022 License: GPL-3.0 Imports: 8 Imported by: 0

README

crawler

A simple web crawler, implemented in Go

Description

Implementation of a simple concurrent web crawler in Go. Given a starting URL, the crawler visits each URL it finds on the same domain, printing every visited URL together with the list of links found on that page. The crawler is limited to a single subdomain: when it starts with https://www.github.com/, it crawls all pages within github.com but does not follow external links, for example to youtube.com or docs.github.com.
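The single-subdomain restriction described above amounts to comparing the host of each discovered link against the host of the starting URL. The package does this internally; the helper below is a hypothetical sketch (not part of the crawler API) showing how such a check can be done with net/url:

```go
package main

import (
	"fmt"
	"net/url"
)

// sameHost reports whether rawLink points at the same host as base.
// Illustrative only: the crawler package implements its own check.
func sameHost(base, rawLink string) bool {
	b, err := url.Parse(base)
	if err != nil {
		return false
	}
	l, err := url.Parse(rawLink)
	if err != nil {
		return false
	}
	// An exact host comparison rejects subdomains such as docs.github.com.
	return b.Host == l.Host
}

func main() {
	fmt.Println(sameHost("https://github.com/", "https://github.com/vyeve/crawler")) // true
	fmt.Println(sameHost("https://github.com/", "https://docs.github.com/"))         // false
}
```

Note that an exact `Host` comparison also distinguishes www.github.com from github.com, which matches the behavior described above.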

Example of usage:

package main

import (
	"os"
	"runtime"

	"github.com/vyeve/crawler"
)

func init() {
	// Since Go 1.5 this is already the default, but it is kept here
	// to make the concurrency setting explicit.
	runtime.GOMAXPROCS(runtime.NumCPU())
}

func main() {
	siteName := "https://github.com/"

	st, err := crawler.New(siteName)
	if err != nil {
		panic(err)
	}
	if err := st.Crawl().Print(os.Stdout); err != nil {
		panic(err)
	}
}

Test coverage

To see test coverage run:

go test -v -tags=unit -coverprofile=cover.out && go tool cover -func=cover.out >> test.out && go tool cover -html=cover.out

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Crawler

type Crawler interface {
	Crawl() Printer
}

Crawler scans links and returns a Printer.

func New

func New(initLink string) (Crawler, error)

New initializes a Crawler.

type Printer

type Printer interface {
	Print(io.Writer) error
}

Printer writes the scanned links to an io.Writer.

Directories

Path	Synopsis
mocks	Package mocks is a generated GoMock package.
