Documentation ¶
Index ¶
- type Client
- type ClientConfig
- type Job
- type JobClient
- type JobURL
- type URL
- type URLClient
- func (u *URLClient) Add(url, mime string) (*URL, error)
- func (u *URLClient) AddLink(urlId, referId common.URLId) error
- func (u *URLClient) AddPending(jobId common.JobId, urlId, originId common.URLId) error
- func (u *URLClient) AddResult(jobId common.JobId, referId, urlId common.URLId) error
- func (u *URLClient) AddURLsToResults(jobId common.JobId, referId common.URLId, urls []*URL) error
- func (u *URLClient) DeletePending(jobId common.JobId, urlId, originId common.URLId) error
- func (u *URLClient) GetAllURLsWithReferById(referId common.URLId) ([]*URL, error)
- func (u *URLClient) GetOrAddURLByURL(urlStr, mime string) (*URL, error)
- func (u *URLClient) GetURLById(id common.URLId) (*URL, error)
- func (u *URLClient) GetURLByURL(url string) (*URL, error)
- func (u *URLClient) HasPending(jobId common.JobId, originId common.URLId) (bool, error)
- func (u *URLClient) MarkCrawled(urlId common.URLId, mime string) error
- func (u *URLClient) MarkJobURLComplete(jobId common.JobId, urlId common.URLId) error
- func (u *URLClient) UpdateJobURLIfComplete(jobId common.JobId, urlId common.URLId) (bool, error)
Types ¶
type Client ¶
type Client struct {
// contains filtered or unexported fields
}
Client communicates with the storage service. It provides a way to create jobs, update jobs, and manipulate URL entries.
func NewClient ¶
func NewClient(cfg ClientConfig) (*Client, error)
Creates a new instance of the storage client and returns it for performing operations. The client is safe for use across multiple goroutines.
func (*Client) Close ¶
Closes the storage connection when it is no longer in use. No further requests should be made through this client after the connection has been closed.
type ClientConfig ¶
type ClientConfig struct {
	// User name the storage will connect as
	User string `json:"user"`
	// Password for the user
	Pass string `json:"pass"`
	// Database name to connect to
	DBName string `json:"dbname"`
	// Host of the storage service
	Host string `json:"host"`
	// Port of the host for the storage service
	Port int `json:"port"`
	// If SSL mode will be enabled/disabled
	SSLMode bool `json:"sslmode"`
}
Configuration for the storage connection info
func (ClientConfig) String ¶
func (c ClientConfig) String() string
Converts the configuration into a connection string suitable for sql.Open's dataSourceName parameter.
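As an illustration of what such a conversion might look like, the sketch below formats the documented fields into a key/value connection string. The `connString` helper, the local `ClientConfig` copy, and the PostgreSQL-style layout (including the `disable`/`require` values for `sslmode`) are assumptions made for this example; the package's actual output format is not shown in this documentation.

```go
package main

import "fmt"

// ClientConfig mirrors the documented fields. It is a local stand-in,
// not the package's own type.
type ClientConfig struct {
	User    string
	Pass    string
	DBName  string
	Host    string
	Port    int
	SSLMode bool
}

// connString is a hypothetical formatter. The key/value layout and the
// "disable"/"require" values are assumptions borrowed from
// PostgreSQL-style connection strings, not the package's confirmed output.
func connString(c ClientConfig) string {
	mode := "disable"
	if c.SSLMode {
		mode = "require"
	}
	return fmt.Sprintf("user=%s password=%s dbname=%s host=%s port=%d sslmode=%s",
		c.User, c.Pass, c.DBName, c.Host, c.Port, mode)
}

func main() {
	cfg := ClientConfig{User: "crawler", Pass: "secret", DBName: "crawl", Host: "localhost", Port: 5432}
	fmt.Println(connString(cfg))
}
```

A string built this way would be passed as the second argument to sql.Open, alongside the name of whichever driver the storage service registers.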
type Job ¶
type Job struct {
	// ID (primary key) of the job
	Id common.JobId
	// List of URLs belonging to this job. Includes their
	// completion status.
	URLs []JobURL
	// The time stamp the Job was created on.
	CreatedOn time.Time
}
Entry for a 'job' record. A Job also includes the URLs that were specified as its tasks.
type JobClient ¶
type JobClient struct {
// contains filtered or unexported fields
}
Provides a namespaced collection of Job-based storage operations. JobClient holds no state that is unsafe for concurrent use, and is safe to share across multiple goroutines.
func (*JobClient) CreateJobFromURLs ¶
Creates a new job entry with its URLs, returning a pointer to the newly created Job.
func (*JobClient) GetJob ¶
Searches for a job and returns it with its URLs if the job exists. Nil is returned if the job does not exist.
func (*JobClient) Result ¶
Queries the result URLs for a job by Id and generates the JobResult object. Results are grouped in a list under the refer URL on which they were found. Duplicate results under the same refer URL are removed and are not included in the returned JobResult.
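The grouping and de-duplication described here can be sketched in memory as follows. The `resultRow` type and `groupByRefer` function are hypothetical names introduced for illustration, and plain strings stand in for the real record types; the package itself may well do this work in the storage query rather than in Go.

```go
package main

import "fmt"

// resultRow is a hypothetical (refer, url) pair standing in for one row
// of a job's result set.
type resultRow struct {
	Refer string
	URL   string
}

// groupByRefer groups result URLs under the refer URL they were found on,
// dropping duplicates within each group while keeping first-seen order.
func groupByRefer(rows []resultRow) map[string][]string {
	grouped := make(map[string][]string)
	seen := make(map[resultRow]bool)
	for _, r := range rows {
		if seen[r] {
			continue // duplicate under the same refer URL
		}
		seen[r] = true
		grouped[r.Refer] = append(grouped[r.Refer], r.URL)
	}
	return grouped
}

func main() {
	rows := []resultRow{
		{"http://a.example", "http://a.example/1.png"},
		{"http://a.example", "http://a.example/1.png"}, // duplicate, dropped
		{"http://b.example", "http://a.example/1.png"}, // same URL, different refer: kept
	}
	fmt.Println(groupByRefer(rows))
}
```

Note that the same URL may appear under more than one refer URL; only repeats under the same refer are removed.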
type JobURL ¶
type JobURL struct {
	// Job URL that was requested
	URLId common.URLId
	// URL for this job item
	URL string
	// If this Job URL has been completely crawled
	Completed bool
	// The time stamp the URL was finished crawling. Only valid if 'Completed'
	// is also set.
	CompletedOn time.Time
	// The JobId this URL belongs to.
	JobId common.JobId
}
Job URL entry for the 'job_url' table. The CompletedOn value will only be valid if the 'Completed' flag is true.
type URL ¶
type URL struct {
	// ID (primary key) of this entry
	Id common.URLId
	// This URL the record is for
	URL string
	// The URL which this URL record was found on
	Refer string
	// The Content type of the URL, e.g: text/html
	Mime string
	// If the URL has been crawled the Crawled flag
	// will be true. This includes if it was crawled
	// but found no descendants. The CrawledOn field
	// is only valid if this field is true
	Crawled bool
	// The time stamp the URL entry was created.
	CrawledOn time.Time
}
Definition of a 'url' table record. A URL is defined as a URL + Refer URL that the URL was found on.
type URLClient ¶
type URLClient struct {
// contains filtered or unexported fields
}
Provides a namespaced collection of URL-based storage operations. URLClient holds no state that is unsafe for concurrent use, and is safe to share across multiple goroutines.
func (*URLClient) Add ¶
Adds a new URL to the database, returning a URL object for it. If no mime is known, use common.DefaultMime in its place.
func (*URLClient) AddLink ¶
Attempts to insert a link between a refer and URL into the storage. If the link already exists, the insert statement will be ignored.
func (*URLClient) AddPending ¶
Adds the URL as pending under an origin URL and job Id. If the record already exists, the insert statement will be ignored.
func (*URLClient) AddResult ¶
Records a new crawled URL into the job results, for a specific jobId. If the result record already exists, the insert statement will be ignored.
func (*URLClient) AddURLsToResults ¶
Adds a batch of URLs to the job results, updating the job result for each job Id provided.
func (*URLClient) DeletePending ¶
Deletes a pending record for a URL that no longer needs to be crawled. The pending record is a combination of job + url + origin, where origin is the origin URL the Job was created with.
func (*URLClient) GetAllURLsWithReferById ¶
Returns a list of direct descendants of the passed in URL. The passed in URL will be the 'refer' value for each of the returned URLs, if there are any.
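The relationship between AddLink and GetAllURLsWithReferById can be modeled with a small in-memory store. This is only a sketch: `URLId`, `link`, `linkStore`, `addLink`, and `descendants` are hypothetical names for this example, `URLId` is assumed to be an integer type, and the real client talks to the storage service rather than to maps.

```go
package main

import "fmt"

// URLId is a local stand-in for common.URLId; the real underlying type is
// not shown in this documentation.
type URLId int64

// link is one (url, refer) pair.
type link struct {
	url, refer URLId
}

// linkStore is a hypothetical in-memory model of the link records.
type linkStore struct {
	seen     map[link]bool
	children map[URLId][]URLId // refer id -> direct descendants
}

func newLinkStore() *linkStore {
	return &linkStore{seen: map[link]bool{}, children: map[URLId][]URLId{}}
}

// addLink records that urlId was found on referId. Adding the same pair
// again is a no-op, mirroring the "insert is ignored" behavior.
func (s *linkStore) addLink(urlId, referId URLId) {
	l := link{url: urlId, refer: referId}
	if s.seen[l] {
		return
	}
	s.seen[l] = true
	s.children[referId] = append(s.children[referId], urlId)
}

// descendants returns the direct descendants of referId (nil if none).
func (s *linkStore) descendants(referId URLId) []URLId {
	return s.children[referId]
}

func main() {
	s := newLinkStore()
	s.addLink(2, 1)
	s.addLink(2, 1) // ignored: link already exists
	s.addLink(3, 1)
	fmt.Println(s.descendants(1))
}
```

Only direct descendants are returned here; URLs found on a descendant are listed under that descendant's own Id.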
func (*URLClient) GetOrAddURLByURL ¶
Attempts to get a URL if it already exists. If the URL does not exist a new entry will be added, and that URL entry will be returned. The 'mime' value will only be used if the URL needs to be added.
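A minimal in-memory sketch of the get-or-add behavior is shown below. The `urlStore` type and its `get` and `getOrAdd` methods are hypothetical, and the `URL` struct is a reduced local copy with an assumed integer Id. The sketch is not safe for concurrent use; a storage-backed version would also need to handle two callers adding the same URL at once.

```go
package main

import "fmt"

// URL is a reduced local stand-in for the documented URL record.
type URL struct {
	Id   int64
	URL  string
	Mime string
}

// urlStore is a hypothetical in-memory model of the 'url' table.
type urlStore struct {
	byURL  map[string]*URL
	nextId int64
}

func newURLStore() *urlStore {
	return &urlStore{byURL: map[string]*URL{}}
}

// get returns the record for url, or nil when none exists.
func (s *urlStore) get(url string) *URL {
	return s.byURL[url]
}

// getOrAdd returns the existing record if there is one. Otherwise it adds
// a new record; mime is only used on this add path.
func (s *urlStore) getOrAdd(url, mime string) *URL {
	if u := s.get(url); u != nil {
		return u
	}
	s.nextId++
	u := &URL{Id: s.nextId, URL: url, Mime: mime}
	s.byURL[url] = u
	return u
}

func main() {
	s := newURLStore()
	first := s.getOrAdd("http://example.com", "text/html")
	again := s.getOrAdd("http://example.com", "image/png") // mime ignored: record exists
	fmt.Println(first == again, again.Mime)
}
```

Callers of the lookup-only methods should check for a nil URL before using the result, since a missing record is reported as nil rather than as an error.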
func (*URLClient) GetURLById ¶
Requests a URL record by Id. If no URL is found, nil will be returned for the URL.
func (*URLClient) GetURLByURL ¶
Requests a URL record by its URL string value. If no URL is found, nil will be returned for the URL.
func (*URLClient) HasPending ¶
Returns true if the origin job URL still has entries in the pending table.
func (*URLClient) MarkCrawled ¶
Updates the mime content-type of a preexisting URL.
func (*URLClient) MarkJobURLComplete ¶
Marks a pre-existing job's URL as completed. This means that all descendants have been crawled up to the max level.
func (*URLClient) UpdateJobURLIfComplete ¶
Checks if a URL has any pending entries in the job URL pending table. If there are no longer any entries, the URL associated with this job will be marked as completed.