Documentation ¶
Index ¶
- type S
- func (s *S) GetErrors() []error
- func (s *S) GetErrorsCount() int64
- func (s *S) GetRandomURLs(n int) []URL
- func (s *S) GetURLCount() int64
- func (s *S) GetURLs() []URL
- func (s *S) Parse(url string, urlContent *string) (*S, error)
- func (s *S) SetFetchTimeout(fetchTimeout uint8) *S
- func (s *S) SetUserAgent(userAgent string) *S
- type URL
- type URLSet
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
This section is empty.
Types ¶
type S ¶
type S struct {
// contains filtered or unexported fields
}
S is a structure that holds the data needed to process URLs:
- cfg (config): configuration settings.
- mainURL (string): the main URL being processed.
- mainURLContent (string): the content of the main URL.
- robotsTxtSitemapURLs ([]string): the URLs found in the robots.txt file's Sitemap directives.
- sitemapLocations ([]string): the locations of the sitemap files.
- urls ([]URL): the URLs to be processed.
- errs ([]error): any errors encountered during processing.
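Based on the field descriptions above, the unexported layout might look roughly like this. This is a sketch, not the package's source: the `config` fields and the `URL` type's fields are not documented, so the ones shown here are assumptions.

```go
package main

import "fmt"

// config is hypothetical; the docs only say S holds a cfg field of type config.
type config struct {
	fetchTimeout uint8
	userAgent    string
}

// URL is a minimal placeholder; the real type's fields are not documented.
type URL struct{ Loc string }

// S mirrors the fields described in the documentation.
type S struct {
	cfg                  config   // configuration settings
	mainURL              string   // the main URL being processed
	mainURLContent       string   // content of the main URL
	robotsTxtSitemapURLs []string // URLs from the robots.txt Sitemap directives
	sitemapLocations     []string // locations of the sitemap files
	urls                 []URL    // URLs collected during processing
	errs                 []error  // errors encountered during processing
}

func main() {
	s := S{mainURL: "https://example.com/robots.txt"}
	fmt.Println(s.mainURL)
}
```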
func New ¶
func New() *S
New creates a new instance of the S structure. It initializes the structure with default configuration values and returns a pointer to the created instance.
func (*S) GetErrorsCount ¶
func (s *S) GetErrorsCount() int64
GetErrorsCount returns the count of errors in the S struct.
func (*S) GetRandomURLs ¶
func (s *S) GetRandomURLs(n int) []URL
GetRandomURLs returns a slice of randomly selected URLs from the S object's URL list. The number of URLs to select is specified by the parameter n. If the S object is nil, an empty slice is returned. The function creates a copy of the original URLs list and randomly selects n URLs from it, removing them to avoid duplicates. The selected URLs are returned as a new slice.
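The selection technique described above (copy the list, pick, and remove each pick to avoid duplicates) can be sketched independently of the package. The `randomURLs` function and the placeholder `URL` type here are illustrative, not the package's actual code:

```go
package main

import (
	"fmt"
	"math/rand"
)

// URL is a placeholder for the package's URL type.
type URL struct{ Loc string }

// randomURLs returns up to n URLs chosen without repetition by
// copying the source slice and removing each element it picks.
func randomURLs(urls []URL, n int) []URL {
	copied := make([]URL, len(urls))
	copy(copied, urls)
	if n > len(copied) {
		n = len(copied)
	}
	picked := make([]URL, 0, n)
	for i := 0; i < n; i++ {
		j := rand.Intn(len(copied))
		picked = append(picked, copied[j])
		copied = append(copied[:j], copied[j+1:]...) // remove to avoid duplicates
	}
	return picked
}

func main() {
	urls := []URL{{"a"}, {"b"}, {"c"}, {"d"}}
	fmt.Println(len(randomURLs(urls, 2)))
}
```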
func (*S) GetURLCount ¶
func (s *S) GetURLCount() int64
GetURLCount returns the count of URLs in the S struct.
func (*S) Parse ¶
func (s *S) Parse(url string, urlContent *string) (*S, error)
Parse parses the given URL and its content. It sets the mainURL field to the given URL and the mainURLContent field to the given content, and returns an error if the content could not be set. If the URL ends with "/robots.txt", it parses the robots.txt file and fetches the sitemap files listed in its Sitemap directives; the fetches run concurrently in goroutines coordinated by a wait group, and any fetch error is appended to the errs field. Each fetched sitemap is inspected and unzipped if necessary, then its URLs are parsed and fetched in turn. If the URL does not end with "/robots.txt", mainURLContent itself is inspected, unzipped if necessary, then parsed and fetched. The method waits for all goroutines to complete before returning the S structure and a nil error on success.
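Two steps in the robots.txt branch can be illustrated with a standalone sketch: extracting Sitemap directives from robots.txt content, and detecting gzipped content by its magic bytes before unzipping. Both functions are placeholder logic under assumed behavior, not the package's implementation:

```go
package main

import (
	"bufio"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"strings"
)

// sitemapURLs extracts the values of Sitemap directives from robots.txt content.
func sitemapURLs(robotsTxt string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(robotsTxt))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if strings.HasPrefix(strings.ToLower(line), "sitemap:") {
			out = append(out, strings.TrimSpace(line[len("sitemap:"):]))
		}
	}
	return out
}

// maybeGunzip unzips the body only when it starts with the gzip magic bytes
// (0x1f 0x8b); plain content is passed through unchanged.
func maybeGunzip(body []byte) ([]byte, error) {
	if len(body) < 2 || body[0] != 0x1f || body[1] != 0x8b {
		return body, nil
	}
	r, err := gzip.NewReader(bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	robots := "User-agent: *\nSitemap: https://example.com/sitemap.xml\n"
	fmt.Println(sitemapURLs(robots))
}
```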
func (*S) SetFetchTimeout ¶
func (s *S) SetFetchTimeout(fetchTimeout uint8) *S
SetFetchTimeout sets the fetch timeout for the Sitemap Parser. The fetch timeout determines how long the parser waits for an HTTP request to complete. It is specified in seconds as a uint8 value. The function returns a pointer to the S structure to allow method chaining.
func (*S) SetUserAgent ¶
func (s *S) SetUserAgent(userAgent string) *S
SetUserAgent sets the user agent for the Sitemap Parser. The user agent is used for making HTTP requests when parsing and fetching URLs. It should be a string representing the user agent header value. The function returns a pointer to the S structure to allow method chaining.
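Because both setters return *S, configuration reads as a chain off New. A minimal sketch of the pattern follows; the default values and the simplified struct are assumptions, since New's actual defaults are not documented:

```go
package main

import "fmt"

// S is simplified here to just the two configurable fields.
type S struct {
	fetchTimeout uint8
	userAgent    string
}

// New returns an instance with illustrative (assumed) defaults.
func New() *S {
	return &S{fetchTimeout: 3, userAgent: "sitemap-parser"}
}

// Both setters return the receiver to allow method chaining.
func (s *S) SetFetchTimeout(t uint8) *S { s.fetchTimeout = t; return s }
func (s *S) SetUserAgent(ua string) *S  { s.userAgent = ua; return s }

func main() {
	s := New().SetFetchTimeout(10).SetUserAgent("my-crawler/1.0")
	fmt.Println(s.fetchTimeout, s.userAgent)
}
```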