db

package
v0.0.0-...-92ec744
Warning

This package is not in the latest version of its module.
Published: Feb 9, 2015 License: MIT Imports: 4 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Connection

type Connection interface {
	// CreateJob creates the job in the database.
	CreateJob(string) error
	// Processing increments the counter of currently processing urls for a given job.
	Processing(string) error
	// Done increments the counter of done urls
	// and decrements the counter of processing urls for a given job.
	Done(string) error
	// Save adds an image source to the set of images for a given job.
	Save(string, string) error
	// Status returns the processing and done counters of a given job.
	Status(string) (*Info, error)
	// Results returns the processed images for a given job.
	Results(string) ([][]byte, error)
	// ViewPage decides whether a page needs to be crawled or not.
	// One url should only be crawled once by a given job,
	// but it depends on the guarantees that the storage provides.
	ViewPage(string, string) (bool, error)
}

Connection is an interface that defines how data is saved and retrieved from a storage.
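
The call order a crawler worker might follow against this interface can be sketched as below. This is only an illustration under stated assumptions: `Store` is a hypothetical subset of Connection, `memStore` is a toy stand-in (not the package's MapConn), `crawl` is a made-up helper, and the sketch assumes that ViewPage returns true when the url still needs to be crawled, which the documentation does not spell out.

```go
package main

import "fmt"

// Info is a local stand-in with the two exported fields documented for Info.
type Info struct {
	Processing int64
	Done       int64
}

// Store is a hypothetical subset of Connection, limited to what this sketch calls.
type Store interface {
	CreateJob(string) error
	Processing(string) error
	Done(string) error
	Save(string, string) error
	Status(string) (*Info, error)
	ViewPage(string, string) (bool, error)
}

// memStore is a toy, single-goroutine implementation used only for illustration.
type memStore struct {
	info   map[string]*Info
	seen   map[string]bool
	images map[string][]string
}

func newMemStore() *memStore {
	return &memStore{
		info:   map[string]*Info{},
		seen:   map[string]bool{},
		images: map[string][]string{},
	}
}

// job returns the counters for a job, creating them on first use.
func (m *memStore) job(id string) *Info {
	if m.info[id] == nil {
		m.info[id] = &Info{}
	}
	return m.info[id]
}

func (m *memStore) CreateJob(id string) error { m.job(id); return nil }

func (m *memStore) Processing(id string) error { m.job(id).Processing++; return nil }

func (m *memStore) Done(id string) error {
	j := m.job(id)
	j.Done++
	j.Processing--
	return nil
}

func (m *memStore) Save(id, src string) error {
	m.images[id] = append(m.images[id], src)
	return nil
}

func (m *memStore) Status(id string) (*Info, error) { return m.job(id), nil }

// ViewPage reports true the first time a job sees a url (assumed meaning of the bool).
func (m *memStore) ViewPage(id, url string) (bool, error) {
	key := id + "\x00" + url
	if m.seen[key] {
		return false, nil
	}
	m.seen[key] = true
	return true, nil
}

// crawl runs the assumed call order for one url: ViewPage, Processing, Save, Done.
func crawl(s Store, job, url string, images []string) error {
	visit, err := s.ViewPage(job, url)
	if err != nil || !visit {
		return err
	}
	if err := s.Processing(job); err != nil {
		return err
	}
	for _, src := range images {
		if err := s.Save(job, src); err != nil {
			return err
		}
	}
	return s.Done(job)
}

func main() {
	s := newMemStore()
	_ = s.CreateJob("job-1")
	_ = crawl(s, "job-1", "http://example.com/", []string{"a.png"})
	_ = crawl(s, "job-1", "http://example.com/", []string{"a.png"}) // skipped: already seen
	info, _ := s.Status("job-1")
	fmt.Println(info.Processing, info.Done) // prints "0 1"
}
```

Because `crawl` depends only on the interface, the same worker code could in principle run against either the map or the Riak backend.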

func NewMapConn

func NewMapConn() (Connection, error)

NewMapConn creates a new map connection.

func NewRiakConn

func NewRiakConn(conn *riak.Client) (Connection, error)

NewRiakConn creates a new instance of the database to talk with Riak. It assumes that the client has already been initialized by the application context.

type Info

type Info struct {
	Processing int64
	Done       int64
	// contains filtered or unexported fields
}

Info stores information about a specific job.

func (*Info) PageViews

func (i *Info) PageViews() Pages

PageViews returns the urls found by a specific job in descending order by the number of occurrences.

type MapConn

type MapConn struct {
	// contains filtered or unexported fields
}

MapConn implements the Connection interface using in-memory maps as backends. This implementation is only suitable for testing. It offers no guarantees about the elements saved in it and it is not thread safe.
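
If a test does need to call a non-thread-safe store from several goroutines, one option is to serialize access with a mutex. The following is a minimal sketch of that idea, not something the package provides: `Counter`, `unsafeCounter`, `locked` and `countConcurrently` are hypothetical names, and `unsafeCounter` merely stands in for a map-backed store with no locking.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter is a hypothetical subset of Connection.
type Counter interface {
	Processing(string) error
}

// unsafeCounter stands in for a map-backed store with no locking.
type unsafeCounter struct{ n map[string]int64 }

func (u *unsafeCounter) Processing(job string) error {
	u.n[job]++
	return nil
}

// locked serializes every call to the wrapped Counter through one mutex.
type locked struct {
	mu    sync.Mutex
	inner Counter
}

func (l *locked) Processing(job string) error {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.inner.Processing(job)
}

// countConcurrently calls Processing from n goroutines and returns the final count.
func countConcurrently(n int) int64 {
	raw := &unsafeCounter{n: map[string]int64{}}
	c := &locked{inner: raw}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.Processing("job-1")
		}()
	}
	wg.Wait() // all writes happen before this read
	return raw.n["job-1"]
}

func main() {
	fmt.Println(countConcurrently(100)) // prints "100"
}
```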

func (*MapConn) CreateJob

func (c *MapConn) CreateJob(jobUUID string) error

CreateJob is a no-op in the map storage.

func (*MapConn) Done

func (c *MapConn) Done(jobUUID string) error

Done increments the counter of urls processed and decrements the counter of urls currently processing.

func (*MapConn) Processing

func (c *MapConn) Processing(jobUUID string) error

Processing increments the counter of urls currently being processed.

func (*MapConn) Results

func (c *MapConn) Results(jobUUID string) ([][]byte, error)

Results returns the list of images crawled by a specific job.
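
Results hands back raw byte slices. Assuming each entry holds an image source string as passed to Save (the documentation does not state the encoding), a caller could convert them as in this small sketch; `toStrings` is a hypothetical helper, not part of the package.

```go
package main

import "fmt"

// toStrings copies each byte slice into a string.
// Assumption: every entry is a plain image source, as stored by Save.
func toStrings(raw [][]byte) []string {
	out := make([]string, 0, len(raw))
	for _, b := range raw {
		out = append(out, string(b))
	}
	return out
}

func main() {
	// Sample data standing in for the value returned by Results.
	results := [][]byte{
		[]byte("http://example.com/a.png"),
		[]byte("http://example.com/b.png"),
	}
	for _, src := range toStrings(results) {
		fmt.Println(src)
	}
}
```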

func (*MapConn) Save

func (c *MapConn) Save(jobUUID string, src string) error

Save stores new images found by a job in the database.

func (*MapConn) Status

func (c *MapConn) Status(jobUUID string) (*Info, error)

Status reports on the given job. It returns the number of urls currently being processed and the number already processed, along with the urls detected by the job.

func (*MapConn) ViewPage

func (c *MapConn) ViewPage(jobUUID string, url string) (bool, error)

ViewPage decides whether a url needs to be visited or not. It assumes that you don't want to crawl the same url more than once in the current job.

type Page

type Page struct {
	URL  string
	Hits int64
}

Page represents a visited url. It stores how many times a job has seen the page.

type Pages

type Pages []Page

Pages is a sortable collection of pages.

func (Pages) Len

func (p Pages) Len() int

func (Pages) Less

func (p Pages) Less(i, j int) bool

func (Pages) Swap

func (p Pages) Swap(i, j int)
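
Len, Less and Swap are the three methods of the standard library's sort.Interface, which is what makes Pages sortable. The method bodies are not shown on this page, so the sketch below is an assumption: it defines local copies of Page and Pages and a Less that places pages with more hits first, matching the descending order described for PageViews. `sortedByHits` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

// Page and Pages are local copies of the documented types.
type Page struct {
	URL  string
	Hits int64
}

type Pages []Page

func (p Pages) Len() int { return len(p) }

// Less is assumed to order by Hits descending, so the most seen url comes first.
func (p Pages) Less(i, j int) bool { return p[i].Hits > p[j].Hits }

func (p Pages) Swap(i, j int) { p[i], p[j] = p[j], p[i] }

// sortedByHits returns a sorted copy and leaves the input untouched.
func sortedByHits(p Pages) Pages {
	out := make(Pages, len(p))
	copy(out, p)
	sort.Sort(out)
	return out
}

func main() {
	pages := Pages{
		{URL: "http://example.com/a", Hits: 1},
		{URL: "http://example.com/b", Hits: 5},
		{URL: "http://example.com/c", Hits: 3},
	}
	for _, p := range sortedByHits(pages) {
		fmt.Println(p.Hits, p.URL)
	}
	// Prints the pages with 5, 3 and 1 hits, in that order.
}
```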

type RiakConn

type RiakConn struct {
	// contains filtered or unexported fields
}

RiakConn implements the Connection interface using Riak as a backend. This is the preferred implementation to use when running in a distributed environment.

func (RiakConn) CreateJob

func (d RiakConn) CreateJob(jobUUID string) error

CreateJob initializes the job map in the Riak cluster. This operation must be performed before any crawling starts to guarantee that the process stores the data properly.

func (RiakConn) Done

func (d RiakConn) Done(jobUUID string) error

Done increments by one the counter of done urls and decrements the counter of processing urls for a given job.

func (RiakConn) Processing

func (d RiakConn) Processing(jobUUID string) error

Processing increments the counter of currently processing urls for a given job.

func (RiakConn) Results

func (d RiakConn) Results(jobUUID string) ([][]byte, error)

Results returns the processed images for a given job.

func (RiakConn) Save

func (d RiakConn) Save(jobUUID, src string) error

Save adds an image source to the set of images for a given job.

func (RiakConn) Status

func (d RiakConn) Status(jobUUID string) (*Info, error)

Status returns the processing and done counters of a given job.

func (RiakConn) ViewPage

func (d RiakConn) ViewPage(jobUUID string, url string) (bool, error)

ViewPage decides whether a url needs to be visited or not. It assumes that you don't want to crawl the same url more than once in the current job.
