storage

package
v0.0.0-...-cd6da7f Latest
Warning

This package is not in the latest version of its module.

Published: May 25, 2021 License: MIT Imports: 5 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Client

type Client struct {
	// contains filtered or unexported fields
}

Client for communicating with the storage service. It provides a way to create jobs, update jobs, and manipulate URL entries.

func NewClient

func NewClient(cfg ClientConfig) (*Client, error)

Creates a new instance of the storage client and returns it for performing operations. The client is safe to use across multiple goroutines.

func (*Client) Close

func (c *Client) Close() error

Closes the storage connection when it is no longer in use. No further requests should be made through this client after the storage connection has been closed.

func (*Client) JobClient

func (c *Client) JobClient() *JobClient

Returns a JobClient which can be used to query and manipulate job data held in storage.

func (*Client) URLClient

func (c *Client) URLClient() *URLClient

Returns a URLClient which can be used to query and manipulate URL data held in storage.

type ClientConfig

type ClientConfig struct {
	// User name the storage will connect as
	User string `json:"user"`
	// Password for the user
	Pass string `json:"pass"`
	// Database name to connect to
	DBName string `json:"dbname"`
	// Host of the storage service
	Host string `json:"host"`
	// Port of the host for the storage service
	Port int `json:"port"`
	// Whether SSL mode is enabled
	SSLMode bool `json:"sslmode"`
}

Configuration holding the connection info for the storage service.
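Because every field carries a json tag, the configuration can be decoded straight from a JSON document. The following is a minimal sketch, not code from the package: ClientConfig is re-declared locally so the example compiles on its own, and parseConfig and the sample values are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ClientConfig mirrors the documented struct so this sketch is self-contained.
type ClientConfig struct {
	User    string `json:"user"`
	Pass    string `json:"pass"`
	DBName  string `json:"dbname"`
	Host    string `json:"host"`
	Port    int    `json:"port"`
	SSLMode bool   `json:"sslmode"`
}

// parseConfig is a hypothetical helper (not part of the package) that
// decodes a JSON document into a ClientConfig using the struct tags.
func parseConfig(data []byte) (ClientConfig, error) {
	var cfg ClientConfig
	if err := json.Unmarshal(data, &cfg); err != nil {
		return ClientConfig{}, fmt.Errorf("decode storage config: %w", err)
	}
	return cfg, nil
}

func main() {
	// Sample values are made up for illustration.
	raw := []byte(`{"user":"crawler","pass":"secret","dbname":"crawl","host":"localhost","port":5432,"sslmode":false}`)
	cfg, err := parseConfig(raw)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(cfg.Host, cfg.Port, cfg.DBName)
}
```

In real use the decoded value would then be passed to NewClient.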

func (ClientConfig) String

func (c ClientConfig) String() string

Converts the configuration into a connection string suitable for the dataSourceName parameter of sql.Open.
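The documentation does not show the format of the returned string. As a rough sketch only, and assuming a PostgreSQL driver that accepts keyword/value connection strings (the sslmode field hints at this, but it is an assumption), the conversion might look like the following. connString is a hypothetical stand-in for the real method.

```go
package main

import "fmt"

// ClientConfig mirrors the documented struct so this sketch is self-contained.
type ClientConfig struct {
	User    string
	Pass    string
	DBName  string
	Host    string
	Port    int
	SSLMode bool
}

// connString is a hypothetical stand-in for ClientConfig.String.
// Assumption: a PostgreSQL keyword/value connection string is wanted.
// Values containing spaces or quotes would need escaping, which this
// sketch does not attempt.
func connString(c ClientConfig) string {
	sslmode := "disable"
	if c.SSLMode {
		sslmode = "require"
	}
	return fmt.Sprintf("user=%s password=%s dbname=%s host=%s port=%d sslmode=%s",
		c.User, c.Pass, c.DBName, c.Host, c.Port, sslmode)
}

func main() {
	cfg := ClientConfig{User: "crawler", Pass: "secret", DBName: "crawl", Host: "localhost", Port: 5432}
	fmt.Println(connString(cfg))
}
```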

type Job

type Job struct {
	// ID (primary key) of the job
	Id common.JobId

	// List of URLs belonging to this job. Includes their
	// completion status.
	URLs []JobURL

	// The time stamp the Job was created on.
	CreatedOn time.Time
}

Job entry for the 'job' record. The Job also includes the URLs that were specified as tasks of the job.

func (*Job) Status

func (j *Job) Status() *common.JobStatus

Returns the status of the job. The status includes the number of completed versus pending URLs and the total elapsed time.
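The fields of common.JobStatus are not shown on this page. As an illustration of how such a status could be derived from a job's URLs, here is a minimal sketch under stated assumptions: jobStatus is a hypothetical stand-in for common.JobStatus, and elapsed time is measured from CreatedOn to a caller-supplied "now", which may differ from what the package does.

```go
package main

import (
	"fmt"
	"time"
)

// JobURL keeps only the documented field this sketch needs.
type JobURL struct {
	URL       string
	Completed bool
}

// jobStatus is a hypothetical stand-in for common.JobStatus.
type jobStatus struct {
	Completed int
	Pending   int
	Elapsed   time.Duration
}

// summarize counts completed and pending URLs and computes the time
// elapsed between createdOn and now.
func summarize(urls []JobURL, createdOn, now time.Time) jobStatus {
	s := jobStatus{Elapsed: now.Sub(createdOn)}
	for _, u := range urls {
		if u.Completed {
			s.Completed++
		} else {
			s.Pending++
		}
	}
	return s
}

// exampleStatus builds a fixed job so the result is deterministic.
func exampleStatus() jobStatus {
	createdOn := time.Date(2021, time.May, 25, 12, 0, 0, 0, time.UTC)
	urls := []JobURL{
		{URL: "http://example.com", Completed: true},
		{URL: "http://example.org", Completed: true},
		{URL: "http://example.net", Completed: false},
	}
	return summarize(urls, createdOn, createdOn.Add(90*time.Second))
}

func main() {
	s := exampleStatus()
	fmt.Printf("%d of %d URLs completed after %s\n", s.Completed, s.Completed+s.Pending, s.Elapsed)
}
```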

type JobClient

type JobClient struct {
	// contains filtered or unexported fields
}

Provides a namespaced collection of Job-based storage operations. JobClient holds no state that is unsafe for concurrent use, and is safe to share across multiple goroutines.

func (*JobClient) CreateJobFromURLs

func (j *JobClient) CreateJobFromURLs(urls []string) (*Job, error)

Creates a new job entry with its URLs, returning a pointer to the newly created Job.

func (*JobClient) GetJob

func (j *JobClient) GetJob(id common.JobId) (*Job, error)

Searches for a job and returns it along with its URLs if the job exists. Nil is returned if the job does not exist.

func (*JobClient) JobExists

func (j *JobClient) JobExists(id common.JobId) (bool, error)

Returns true if the job id matches an existing job.

func (*JobClient) Result

func (j *JobClient) Result(id common.JobId, mimeFilter string) (common.JobResults, error)

Queries the result URLs for a job by id and generates the JobResults object. Results are grouped in lists under the refer URL on which they were found. Duplicate results under the same refer URL are removed and not included in the returned JobResults.
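The grouping and de-duplication described above can be sketched as follows. This is an illustration only: the shape of common.JobResults is not shown on this page, so a map from refer URL to result URLs stands in for it, the link type and groupByRefer are hypothetical, and the mimeFilter parameter is not modeled.

```go
package main

import "fmt"

// link pairs a result URL with the refer URL it was found on.
// It is a hypothetical type used only for this sketch.
type link struct {
	Refer string
	URL   string
}

// groupByRefer groups result URLs under their refer URL. A result that
// appears more than once under the same refer is kept only once; the
// same result under a different refer is kept.
func groupByRefer(links []link) map[string][]string {
	grouped := make(map[string][]string)
	seen := make(map[link]bool)
	for _, l := range links {
		if seen[l] {
			continue // duplicate under the same refer URL
		}
		seen[l] = true
		grouped[l.Refer] = append(grouped[l.Refer], l.URL)
	}
	return grouped
}

func main() {
	links := []link{
		{"http://example.com", "http://example.com/a.png"},
		{"http://example.com", "http://example.com/a.png"},
		{"http://example.com", "http://example.com/b.png"},
		{"http://example.com/about", "http://example.com/a.png"},
	}
	for refer, urls := range groupByRefer(links) {
		fmt.Println(refer, urls)
	}
}
```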

type JobURL

type JobURL struct {
	// Job URL that was requested
	URLId common.URLId

	// URL for this job item
	URL string

	// If this Job URL has been completely crawled
	Completed bool

	// The time stamp the URL was finished crawling. Only valid if 'Completed'
	// is also set.
	CompletedOn time.Time

	// The JobId this URL belongs to.
	JobId common.JobId
}

Job URL entry for the 'job_url' table. The CompletedOn value is only valid if the 'Completed' flag is true.

type URL

type URL struct {
	// ID (primary key) of this entry
	Id common.URLId

	// This URL the record is for
	URL string

	// The URL which this URL record was found on
	Refer string

	// The Content type of the URL, e.g: text/html
	Mime string

	// If the URL has been crawled, the Crawled flag
	// will be true. This includes the case where it was
	// crawled but no descendants were found. The CrawledOn
	// field is only valid if this field is true.
	Crawled bool

	// The time stamp the URL entry was created.
	CrawledOn time.Time
}

Definition of a 'url' table record. A URL record is defined by a URL together with the refer URL on which it was found.

type URLClient

type URLClient struct {
	// contains filtered or unexported fields
}

Provides a namespaced collection of URL-based storage operations. URLClient holds no state that is unsafe for concurrent use, and is safe to share across multiple goroutines.

func (*URLClient) Add

func (u *URLClient) Add(url, mime string) (*URL, error)

Adds a new URL to the database, returning a URL object for it. If no mime is known, use common.DefaultMime in its place.

func (*URLClient) AddLink

func (u *URLClient) AddLink(urlId, referId common.URLId) error

Attempts to insert a link between a refer and URL into the storage. If the link already exists, the insert statement will be ignored.

func (*URLClient) AddPending

func (u *URLClient) AddPending(jobId common.JobId, urlId, originId common.URLId) error

Adds the URL as pending under an origin URL and job id. If the record already exists, the insert statement will be ignored.

func (*URLClient) AddResult

func (u *URLClient) AddResult(jobId common.JobId, referId, urlId common.URLId) error

Records a new crawled URL into the job results, for a specific jobId. If the result record already exists, the insert statement will be ignored.

func (*URLClient) AddURLsToResults

func (u *URLClient) AddURLsToResults(jobId common.JobId, referId common.URLId, urls []*URL) error

Adds a batch of URLs to the job results. Each URL provided is recorded under the given job id and refer id.

func (*URLClient) DeletePending

func (u *URLClient) DeletePending(jobId common.JobId, urlId, originId common.URLId) error

Deletes a pending record for a URL that no longer needs to be crawled. The pending record is a combination of job + url + origin, where origin is the origin URL the Job was created with.

func (*URLClient) GetAllURLsWithReferById

func (u *URLClient) GetAllURLsWithReferById(referId common.URLId) ([]*URL, error)

Returns a list of direct descendants of the passed in URL. The passed in URL will be the 'refer' value for each of the returned URLs, if there are any.

func (*URLClient) GetOrAddURLByURL

func (u *URLClient) GetOrAddURLByURL(urlStr, mime string) (*URL, error)

Attempts to get a URL if it already exists. If the URL does not exist, a new entry will be added and returned. The 'mime' value is used only if the URL needs to be added.
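The get-or-add behaviour can be illustrated with an in-memory stand-in for the storage. This is a sketch of the documented semantics, not the package's implementation: memStore and getOrAdd are hypothetical, and int64 is assumed in place of common.URLId.

```go
package main

import "fmt"

// URL keeps only the documented fields this sketch needs.
// Assumption: int64 stands in for common.URLId.
type URL struct {
	Id   int64
	URL  string
	Mime string
}

// memStore is a hypothetical in-memory stand-in for the storage service.
type memStore struct {
	byURL  map[string]*URL
	nextID int64
}

func newMemStore() *memStore {
	return &memStore{byURL: make(map[string]*URL)}
}

// getOrAdd returns the existing entry for urlStr, or adds a new one.
// The mime argument is only used when a new entry is created.
func (s *memStore) getOrAdd(urlStr, mime string) *URL {
	if u, ok := s.byURL[urlStr]; ok {
		return u
	}
	s.nextID++
	u := &URL{Id: s.nextID, URL: urlStr, Mime: mime}
	s.byURL[urlStr] = u
	return u
}

func main() {
	s := newMemStore()
	first := s.getOrAdd("http://example.com", "text/html")
	second := s.getOrAdd("http://example.com", "image/png") // mime ignored: entry exists
	fmt.Println(first == second, second.Mime)
}
```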

func (*URLClient) GetURLById

func (u *URLClient) GetURLById(id common.URLId) (*URL, error)

Requests a URL record by id. If no URL is found, nil will be returned for the URL.

func (*URLClient) GetURLByURL

func (u *URLClient) GetURLByURL(url string) (*URL, error)

Requests a URL record by its URL string value. If no URL is found, nil will be returned for the URL.

func (*URLClient) HasPending

func (u *URLClient) HasPending(jobId common.JobId, originId common.URLId) (bool, error)

Returns true if the origin Job URL still has pending entries in the pending table.

func (*URLClient) MarkCrawled

func (u *URLClient) MarkCrawled(urlId common.URLId, mime string) error

Updates the mime content-type of a preexisting URL.

func (*URLClient) MarkJobURLComplete

func (u *URLClient) MarkJobURLComplete(jobId common.JobId, urlId common.URLId) error

Marks a pre-existing job's URL as completed. This means that all descendants have been crawled up to the max level.

func (*URLClient) UpdateJobURLIfComplete

func (u *URLClient) UpdateJobURLIfComplete(jobId common.JobId, urlId common.URLId) (bool, error)

Checks if a URL has any pending entries in the job URL pending table. If there are no longer any entries, the URL associated with this job will be marked as completed.
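How AddPending, DeletePending, HasPending, and UpdateJobURLIfComplete fit together can be sketched with an in-memory model. Everything here is hypothetical and only mirrors the documented behaviour: the tracker type stands in for the pending and job URL tables, int64 is assumed in place of common.JobId and common.URLId, and the urlId passed to the completion check is assumed to be the job's origin URL.

```go
package main

import "fmt"

// pendingKey is the job + url + origin combination that identifies a
// pending record. Assumption: int64 stands in for the common id types.
type pendingKey struct {
	JobID, URLID, OriginID int64
}

// jobURLKey identifies an origin URL within a job.
type jobURLKey struct {
	JobID, OriginID int64
}

// tracker is a hypothetical in-memory stand-in for the storage tables.
type tracker struct {
	pending   map[pendingKey]bool
	completed map[jobURLKey]bool
}

func newTracker() *tracker {
	return &tracker{pending: make(map[pendingKey]bool), completed: make(map[jobURLKey]bool)}
}

// addPending records a pending URL; adding the same record twice is a no-op.
func (t *tracker) addPending(job, url, origin int64) {
	t.pending[pendingKey{job, url, origin}] = true
}

// deletePending removes a pending record once the URL needs no more crawling.
func (t *tracker) deletePending(job, url, origin int64) {
	delete(t.pending, pendingKey{job, url, origin})
}

// hasPending reports whether any record remains for the job and origin.
func (t *tracker) hasPending(job, origin int64) bool {
	for k := range t.pending {
		if k.JobID == job && k.OriginID == origin {
			return true
		}
	}
	return false
}

// updateIfComplete marks the job URL completed when nothing is pending,
// and reports whether it did so.
func (t *tracker) updateIfComplete(job, origin int64) bool {
	if t.hasPending(job, origin) {
		return false
	}
	t.completed[jobURLKey{job, origin}] = true
	return true
}

func main() {
	tr := newTracker()
	tr.addPending(1, 10, 5)
	tr.addPending(1, 11, 5)
	fmt.Println(tr.updateIfComplete(1, 5)) // still pending
	tr.deletePending(1, 10, 5)
	tr.deletePending(1, 11, 5)
	fmt.Println(tr.updateIfComplete(1, 5)) // nothing pending, marked completed
}
```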
