Documentation ¶
Overview ¶
Package urls normalizes URLs and implements its helpers
Index ¶
Constants ¶
This section is empty.
Variables ¶
var ErrInvalidScheme = errors.New("invalid scheme")
Functions ¶
func FirstNormalizeURL ¶
FirstNormalizeURL returns a unique, normalized form of the input URL, which helps reduce the database footprint.
func R1NormalizeURL ¶ added in v1.6.0
R1NormalizeURL returns a unique, normalized form of the input URL, which helps reduce the database footprint.
func R2NormalizeURL ¶ added in v1.6.0
R2NormalizeURL returns a unique, normalized form of the input URL, which helps reduce the database footprint.
func SecondNormalizeURL ¶
SecondNormalizeURL first applies the same normalization as FirstNormalizeURL, then shrinks the URL as much as possible on a per-website basis.
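To illustrate why mapping equivalent URLs to one unique form reduces the database footprint, here is a minimal sketch built only on the standard net/url package. The canonicalKey helper is hypothetical, and the rules it applies (lowercasing the host, dropping the fragment, sorting query parameters) are assumptions chosen for illustration. They are not the rules this package actually implements.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// canonicalKey is a hypothetical helper, not part of package urls.
// It maps URLs that differ only in host case, fragment, or query
// parameter order to the same string.
func canonicalKey(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host)
	// Fragments are not sent to the server, so drop them from the key.
	u.Fragment = ""
	u.RawFragment = ""
	// Values.Encode sorts parameters by key, so their order no longer matters.
	u.RawQuery = u.Query().Encode()
	return u.String(), nil
}

func main() {
	a, errA := canonicalKey("HTTP://Example.COM/a?b=2&a=1#frag")
	b, errB := canonicalKey("http://example.com/a?a=1&b=2")
	if errA != nil || errB != nil {
		fmt.Println("parse error:", errA, errB)
		return
	}
	// Both inputs collapse to a single key, so only one record is stored.
	fmt.Println(a == b)
}
```

Storing records under such a key means that variants of the same page share one entry instead of producing several.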
Types ¶
type Normalizer ¶
type Normalizer struct {
// contains filtered or unexported fields
}
Normalizer normalizes a URL into a crawlable URL and a key for use in a key-value store (KVS).
func NewNormalizer ¶
func NewNormalizer(ul *url.URL) (*Normalizer, error)
NewNormalizer generates a new Normalizer when the input URL is supported.
func (*Normalizer) CrawlingURL ¶
func (n *Normalizer) CrawlingURL() string
CrawlingURL returns the preferred URL for crawling.
func (*Normalizer) FirstNormalizedURL ¶
func (n *Normalizer) FirstNormalizedURL() string
FirstNormalizedURL returns a unique, normalized form of the input URL, which helps reduce the database footprint.
func (*Normalizer) SecondNormalizedURL ¶
func (n *Normalizer) SecondNormalizedURL() string
SecondNormalizedURL first applies the same normalization as FirstNormalizedURL, then shrinks the URL as much as possible on a per-website basis.
type NormalizerR ¶ added in v1.6.0
type NormalizerR struct {
// contains filtered or unexported fields
}
NormalizerR normalizes a URL into a crawlable URL and a key for use in a key-value store (KVS). Most of its behavior is the same as `Normalizer`, except for `normalizeSPHost` in `FirstNormalizedURL()`.
func NewNormalizerR ¶ added in v1.6.0
func NewNormalizerR(ul *url.URL) (*NormalizerR, error)
NewNormalizerR generates a new NormalizerR when the input URL is supported. Unlike `NewNormalizer`, it does not modify the argument `ul`.
func (*NormalizerR) CrawlingURL ¶ added in v1.6.0
func (n *NormalizerR) CrawlingURL() string
CrawlingURL returns the preferred URL for crawling.
func (*NormalizerR) FirstNormalizedURL ¶ added in v1.6.0
func (n *NormalizerR) FirstNormalizedURL() string
FirstNormalizedURL returns a unique, normalized form of the input URL, which helps reduce the database footprint.
func (*NormalizerR) SecondNormalizedURL ¶ added in v1.6.0
func (n *NormalizerR) SecondNormalizedURL() string
SecondNormalizedURL first applies the same normalization as FirstNormalizedURL, then shrinks the URL as much as possible on a per-website basis.