purell: github.com/PuerkitoBio/purell

package purell

import "github.com/PuerkitoBio/purell"

Package purell offers URL normalization as described on the Wikipedia page: http://en.wikipedia.org/wiki/URL_normalization

Package Files

purell.go

func MustNormalizeURLString

func MustNormalizeURLString(u string, f NormalizationFlags) string

MustNormalizeURLString returns the normalized string and panics if an error occurs. It takes a URL string as input, as well as the normalization flags.

Code:

normalized := MustNormalizeURLString("hTTpS://someWEBsite.com:443/Amazing%fa/url/",
    FlagsUnsafeGreedy)
fmt.Print(normalized)

Output:

http://somewebsite.com/Amazing%FA/url

func NormalizeURL

func NormalizeURL(u *url.URL, f NormalizationFlags) string

NormalizeURL returns the normalized string. It takes a parsed URL object as input, as well as the normalization flags.

Code:

if u, err := url.Parse("Http://SomeUrl.com:8080/a/b/.././c///g?c=3&a=1&b=9&c=0#target"); err != nil {
    panic(err)
} else {
    normalized := NormalizeURL(u, FlagsUsuallySafeGreedy|FlagRemoveDuplicateSlashes|FlagRemoveFragment)
    fmt.Print(normalized)
}

Output:

http://someurl.com:8080/a/c/g?c=3&a=1&b=9&c=0
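
The path in the output above is what remains once dot segments and duplicate slashes are removed and the fragment is dropped. The sketch below reproduces just that part using only the standard library. It is not purell's implementation: `cleanPathAndDropFragment` is a hypothetical helper, and `path.Clean` is a stand-in that also strips a trailing slash. The input is already lowercase because the sketch does not lowercase the host.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// cleanPathAndDropFragment is a hypothetical helper (not part of purell).
// It resolves "." and ".." segments, collapses repeated slashes, and
// removes the fragment. The query string is left untouched.
func cleanPathAndDropFragment(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	if u.Path != "" {
		// path.Clean would turn an empty path into ".", so skip it then.
		u.Path = path.Clean(u.Path)
		u.RawPath = "" // let the URL re-derive its escaped path
	}
	u.Fragment = ""
	u.RawFragment = ""
	return u.String(), nil
}

func main() {
	out, err := cleanPathAndDropFragment("http://someurl.com:8080/a/b/.././c///g?c=3#target")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // http://someurl.com:8080/a/c/g?c=3
}
```

The query keeps its original order here, just as in the example above, where FlagSortQuery is not among the flags passed.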

func NormalizeURLString

func NormalizeURLString(u string, f NormalizationFlags) (string, error)

NormalizeURLString returns the normalized string, or an error if the input can't be parsed into a URL object. It takes a URL string as input, as well as the normalization flags.

Code:

if normalized, err := NormalizeURLString("hTTp://someWEBsite.com:80/Amazing%3f/url/",
    FlagLowercaseScheme|FlagLowercaseHost|FlagUppercaseEscapes); err != nil {
    panic(err)
} else {
    fmt.Print(normalized)
}

Output:

http://somewebsite.com:80/Amazing%3F/url/

type NormalizationFlags

type NormalizationFlags uint

A set of normalization flags determines how a URL will be normalized.
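
Because NormalizationFlags is a uint whose constants are declared with `1 << iota`, each flag occupies one bit and sets of flags combine with bitwise operators. The sketch below shows the general pattern with a local stand-in type. The names `normFlags`, `has`, and `without` are hypothetical and exist only for this illustration; they are not part of purell.

```go
package main

import "fmt"

// normFlags is a local stand-in for a uint-based flag set. It only
// illustrates the bit-flag pattern and is not the purell type.
type normFlags uint

const (
	lowercaseScheme   normFlags = 1 << iota // bit 0
	lowercaseHost                           // bit 1
	uppercaseEscapes                        // bit 2
	removeDefaultPort                       // bit 3
)

// has reports whether every bit in want is set in f.
func has(f, want normFlags) bool {
	return f&want == want
}

// without returns f with the bits in drop cleared.
func without(f, drop normFlags) normFlags {
	return f &^ drop
}

func main() {
	// Combine individual flags into a set with bitwise OR.
	set := lowercaseScheme | lowercaseHost | removeDefaultPort

	fmt.Println(has(set, lowercaseHost))    // true
	fmt.Println(has(set, uppercaseEscapes)) // false

	// Start from a larger set and clear one flag.
	set = without(set, removeDefaultPort)
	fmt.Println(has(set, removeDefaultPort)) // false
}
```

The same operators should apply to the real type, for example starting from a convenience set and clearing a single flag with `&^`.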

const (
    // Safe normalizations
    FlagLowercaseScheme           NormalizationFlags = 1 << iota // HTTP://host -> http://host, applied by default in Go1.1
    FlagLowercaseHost                                            // http://HOST -> http://host
    FlagUppercaseEscapes                                         // http://host/t%ef -> http://host/t%EF
    FlagDecodeUnnecessaryEscapes                                 // http://host/t%41 -> http://host/tA
    FlagEncodeNecessaryEscapes                                   // http://host/!"#$ -> http://host/%21%22#$
    FlagRemoveDefaultPort                                        // http://host:80 -> http://host
    FlagRemoveEmptyQuerySeparator                                // http://host/path? -> http://host/path

    // Usually safe normalizations
    FlagRemoveTrailingSlash // http://host/path/ -> http://host/path
    FlagAddTrailingSlash    // http://host/path -> http://host/path/ (should choose only one of these add/remove trailing slash flags)
    FlagRemoveDotSegments   // http://host/path/./a/b/../c -> http://host/path/a/c

    // Unsafe normalizations
    FlagRemoveDirectoryIndex   // http://host/path/index.html -> http://host/path/
    FlagRemoveFragment         // http://host/path#fragment -> http://host/path
    FlagForceHTTP              // https://host -> http://host
    FlagRemoveDuplicateSlashes // http://host/path//a///b -> http://host/path/a/b
    FlagRemoveWWW              // http://www.host/ -> http://host/
    FlagAddWWW                 // http://host/ -> http://www.host/ (should choose only one of these add/remove WWW flags)
    FlagSortQuery              // http://host/path?c=3&b=2&a=1&b=1 -> http://host/path?a=1&b=1&b=2&c=3

    // Normalizations not in the Wikipedia article, required to cover test cases
    // submitted by jehiah
    FlagDecodeDWORDHost           // http://1113982867 -> http://66.102.7.147
    FlagDecodeOctalHost           // http://0102.0146.07.0223 -> http://66.102.7.147
    FlagDecodeHexHost             // http://0x42660793 -> http://66.102.7.147
    FlagRemoveUnnecessaryHostDots // http://.host../path -> http://host/path
    FlagRemoveEmptyPortSeparator  // http://host:/path -> http://host/path

    // Convenience set of safe normalizations
    FlagsSafe NormalizationFlags = FlagLowercaseHost | FlagLowercaseScheme | FlagUppercaseEscapes | FlagDecodeUnnecessaryEscapes | FlagEncodeNecessaryEscapes | FlagRemoveDefaultPort | FlagRemoveEmptyQuerySeparator

    // Convenience set of usually safe normalizations (includes FlagsSafe)
    FlagsUsuallySafeGreedy    NormalizationFlags = FlagsSafe | FlagRemoveTrailingSlash | FlagRemoveDotSegments
    FlagsUsuallySafeNonGreedy NormalizationFlags = FlagsSafe | FlagAddTrailingSlash | FlagRemoveDotSegments

    // Convenience set of unsafe normalizations (each includes the matching FlagsUsuallySafe set)
    FlagsUnsafeGreedy    NormalizationFlags = FlagsUsuallySafeGreedy | FlagRemoveDirectoryIndex | FlagRemoveFragment | FlagForceHTTP | FlagRemoveDuplicateSlashes | FlagRemoveWWW | FlagSortQuery
    FlagsUnsafeNonGreedy NormalizationFlags = FlagsUsuallySafeNonGreedy | FlagRemoveDirectoryIndex | FlagRemoveFragment | FlagForceHTTP | FlagRemoveDuplicateSlashes | FlagAddWWW | FlagSortQuery

    // Convenience set of all available flags
    FlagsAllGreedy    = FlagsUnsafeGreedy | FlagDecodeDWORDHost | FlagDecodeOctalHost | FlagDecodeHexHost | FlagRemoveUnnecessaryHostDots | FlagRemoveEmptyPortSeparator
    FlagsAllNonGreedy = FlagsUnsafeNonGreedy | FlagDecodeDWORDHost | FlagDecodeOctalHost | FlagDecodeHexHost | FlagRemoveUnnecessaryHostDots | FlagRemoveEmptyPortSeparator
)
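
The three host-decoding flags above all map an alternative spelling of an IPv4 address back to dotted-decimal form. The following sketch shows the arithmetic behind the examples in those comments using only the standard library. `decodeNumericHost` and `decodeOctalHost` are hypothetical helpers written for this illustration and are not purell's implementation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decodeNumericHost converts a host written as one 32-bit number, either
// decimal ("1113982867") or hex ("0x42660793"), to dotted-decimal form.
// Hypothetical helper. Base 0 lets ParseUint pick the base from the
// prefix, so a leading "0" would be read as octal.
func decodeNumericHost(host string) (string, error) {
	n, err := strconv.ParseUint(host, 0, 32)
	if err != nil {
		return "", err
	}
	// Each octet is one byte of the 32-bit value, most significant first.
	return fmt.Sprintf("%d.%d.%d.%d", byte(n>>24), byte(n>>16), byte(n>>8), byte(n)), nil
}

// decodeOctalHost converts four dot-separated octal octets
// ("0102.0146.07.0223") to dotted-decimal form. Hypothetical helper.
func decodeOctalHost(host string) (string, error) {
	parts := strings.Split(host, ".")
	if len(parts) != 4 {
		return "", fmt.Errorf("expected 4 octets, got %d", len(parts))
	}
	out := make([]string, len(parts))
	for i, p := range parts {
		n, err := strconv.ParseUint(p, 8, 8) // base 8, must fit in one byte
		if err != nil {
			return "", err
		}
		out[i] = strconv.FormatUint(n, 10)
	}
	return strings.Join(out, "."), nil
}

func main() {
	dword, _ := decodeNumericHost("1113982867")
	hex, _ := decodeNumericHost("0x42660793")
	octal, _ := decodeOctalHost("0102.0146.07.0223")
	fmt.Println(dword) // 66.102.7.147
	fmt.Println(hex)   // 66.102.7.147
	fmt.Println(octal) // 66.102.7.147
}
```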

Package purell imports 11 packages and is imported by 371 packages. Updated 2019-10-02.