urls

package
v1.11.0
Warning: This package is not in the latest version of its module.
Published: May 1, 2024 License: Apache-2.0 Imports: 11 Imported by: 2

Documentation

Overview

Package urls normalizes URLs and provides related helper functions.

Index

Constants

This section is empty.

Variables

var ErrInvalidScheme = errors.New("invalid scheme")

Functions

func CrawlingURL

func CrawlingURL(ul *url.URL) string

CrawlingURL converts the URL into a form suitable for crawling.

func FirstNormalizeURL

func FirstNormalizeURL(ul *url.URL) string

FirstNormalizeURL returns a unique, normalized form of the input URL, which helps reduce the database footprint.

func R1NormalizeURL added in v1.6.0

func R1NormalizeURL(ul *url.URL) string

R1NormalizeURL returns a unique, normalized form of the input URL, which helps reduce the database footprint.

func R2NormalizeURL added in v1.6.0

func R2NormalizeURL(ul *url.URL) string

R2NormalizeURL returns a unique, normalized form of the input URL, which helps reduce the database footprint.

func SecondNormalizeURL

func SecondNormalizeURL(ul *url.URL) string

SecondNormalizeURL first applies FirstNormalizeURL, then shortens the URL as much as possible using website-specific rules.

Types

type Normalizer

type Normalizer struct {
	// contains filtered or unexported fields
}

Normalizer normalizes a URL into a crawlable URL and a key for use in a key-value store (KVS).
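
The idea of pairing a crawlable URL with a store key can be sketched with a plain map standing in for the key-value store. Here keyFor is a hypothetical, deliberately tiny normalizer (lowercase host, no fragment) and dedupe is an invented helper; neither is part of this package.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// keyFor is a hypothetical stand-in for a normalizer that yields a store key.
func keyFor(ul *url.URL) string {
	u := *ul
	u.Host = strings.ToLower(u.Host)
	u.Fragment = ""
	return u.String()
}

// dedupe maps each normalized key to the first raw URL seen for that key.
// Unparseable inputs are skipped.
func dedupe(raw []string) map[string]string {
	store := make(map[string]string)
	for _, s := range raw {
		ul, err := url.Parse(s)
		if err != nil {
			continue
		}
		k := keyFor(ul)
		if _, seen := store[k]; !seen {
			store[k] = s
		}
	}
	return store
}

func main() {
	store := dedupe([]string{
		"https://Example.com/a#x",
		"https://example.com/a",
		"https://example.com/b",
	})
	fmt.Println(len(store), "distinct keys")
}
```

Because both spellings of the first page share one key, the store holds two entries instead of three.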

func NewNormalizer

func NewNormalizer(ul *url.URL) (*Normalizer, error)

NewNormalizer returns a new Normalizer when the input URL is supported.

func (*Normalizer) CrawlingURL

func (n *Normalizer) CrawlingURL() string

CrawlingURL returns the preferred URL for crawling.

func (*Normalizer) FirstNormalizedURL

func (n *Normalizer) FirstNormalizedURL() string

FirstNormalizedURL returns a unique, normalized form of the input URL, which helps reduce the database footprint.

func (*Normalizer) SecondNormalizedURL

func (n *Normalizer) SecondNormalizedURL() string

SecondNormalizedURL first applies the FirstNormalizedURL normalization, then shortens the URL as much as possible using website-specific rules.

type NormalizerR added in v1.6.0

type NormalizerR struct {
	// contains filtered or unexported fields
}

NormalizerR normalizes a URL into a crawlable URL and a key for use in a key-value store (KVS). Most of its behavior is the same as `Normalizer`, except for `normalizeSPHost` in `FirstNormalizedURL()`.

func NewNormalizerR added in v1.6.0

func NewNormalizerR(ul *url.URL) (*NormalizerR, error)

NewNormalizerR returns a new NormalizerR when the input URL is supported. Unlike `NewNormalizer`, it does not modify the argument `ul`.

func (*NormalizerR) CrawlingURL added in v1.6.0

func (n *NormalizerR) CrawlingURL() string

CrawlingURL returns the preferred URL for crawling.

func (*NormalizerR) FirstNormalizedURL added in v1.6.0

func (n *NormalizerR) FirstNormalizedURL() string

FirstNormalizedURL returns a unique, normalized form of the input URL, which helps reduce the database footprint.

func (*NormalizerR) SecondNormalizedURL added in v1.6.0

func (n *NormalizerR) SecondNormalizedURL() string

SecondNormalizedURL first applies the FirstNormalizedURL normalization, then shortens the URL as much as possible using website-specific rules.
