xurls: mvdan.cc/xurls

package xurls

import "mvdan.cc/xurls"

Package xurls extracts URLs from plain text using regular expressions.

Code:

rx := xurls.Relaxed()
fmt.Println(rx.FindString("Do gophers live in http://golang.org?"))
fmt.Println(rx.FindAllString("foo.com is http://foo.com/.", -1))

Output:

http://golang.org
[foo.com http://foo.com/]

Package Files

schemes.go tlds.go tlds_pseudo.go unicode.go xurls.go

Variables

var AnyScheme = `([a-zA-Z][a-zA-Z.\-+]*://|` + anyOf(SchemesNoAuthority...) + `:)`

AnyScheme can be passed to StrictMatchingScheme to match any possibly valid scheme, and not just the known ones.
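
As a rough illustration of what the AnyScheme expression accepts, the sketch below rebuilds it with the standard library only. The anyOf helper is unexported in the package, so the version here is a hypothetical stand-in that is assumed to produce a plain alternation of literal strings; schemesNoAuthority copies the SchemesNoAuthority list from this page, and matchScheme is an illustrative helper, not part of xurls.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// schemesNoAuthority copies the SchemesNoAuthority list documented on this page.
var schemesNoAuthority = []string{
	"bitcoin", "cid", "file", "magnet", "mailto", "mid", "sms", "tel", "xmpp",
}

// anyOf is a hypothetical stand-in for the package's unexported helper.
// Assumption: it builds an alternation of literal (quoted) strings.
func anyOf(strs ...string) string {
	quoted := make([]string, len(strs))
	for i, s := range strs {
		quoted[i] = regexp.QuoteMeta(s)
	}
	return "(" + strings.Join(quoted, "|") + ")"
}

// anySchemePattern rebuilds the AnyScheme expression from its declaration.
func anySchemePattern() string {
	return `([a-zA-Z][a-zA-Z.\-+]*://|` + anyOf(schemesNoAuthority...) + `:)`
}

// matchScheme returns the scheme prefix found at the start of s, or "" if none.
func matchScheme(s string) string {
	rx := regexp.MustCompile(`^` + anySchemePattern())
	return rx.FindString(s)
}

func main() {
	fmt.Printf("%q\n", matchScheme("custom+scheme://host/path"))
	fmt.Printf("%q\n", matchScheme("mailto:someone@example.org"))
	fmt.Printf("%q\n", matchScheme("unknown:opaque"))
}
```

Under these assumptions, any letter-led scheme followed by "://" is accepted, while the ":" form is accepted only for the schemes listed in SchemesNoAuthority.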

var PseudoTLDs = []string{
    `bit`,
    `example`,
    `exit`,
    `gnu`,
    `i2p`,
    `invalid`,
    `local`,
    `localhost`,
    `test`,
    `zkey`,
}

PseudoTLDs is a sorted list of some widely used unofficial TLDs.

Sources:

* https://en.wikipedia.org/wiki/Pseudo-top-level_domain
* https://en.wikipedia.org/wiki/Category:Pseudo-top-level_domains
* https://tools.ietf.org/html/draft-grothoff-iesg-special-use-p2p-names-00
* https://www.iana.org/assignments/special-use-domain-names/special-use-domain-names.xhtml

var Schemes = []string{ /* 336 elements not displayed */ }

Schemes is a sorted list of all IANA assigned schemes.

Source:

https://www.iana.org/assignments/uri-schemes/uri-schemes-1.csv

var SchemesNoAuthority = []string{
    `bitcoin`,
    `cid`,
    `file`,
    `magnet`,
    `mailto`,
    `mid`,
    `sms`,
    `tel`,
    `xmpp`,
}

SchemesNoAuthority is a sorted list of some well-known URL schemes that are followed by ":" instead of "://". The list includes both officially registered and unofficial schemes.
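
The difference between the ":" and "://" forms can be seen with the standard library's net/url parser, independently of xurls. The sketch below is only an illustration of that distinction; hasAuthority is a hypothetical helper and the sample URLs are made up.

```go
package main

import (
	"fmt"
	"net/url"
)

// hasAuthority reports whether raw parses to a URL with a host component,
// that is, the "scheme://host" form rather than the "scheme:opaque" form.
func hasAuthority(raw string) (bool, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return false, err
	}
	return u.Host != "", nil
}

func main() {
	for _, raw := range []string{
		"http://golang.org/doc",      // "://" form, has a host
		"mailto:someone@example.org", // ":" form, opaque data only
		"tel:+1-555-0100",            // ":" form, opaque data only
	} {
		ok, err := hasAuthority(raw)
		fmt.Println(raw, ok, err)
	}
}
```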

var SchemesUnofficial = []string{
    `jdbc`,
    `postgres`,
    `postgresql`,
    `slack`,
    `zoommtg`,
    `zoomus`,
}

SchemesUnofficial is a sorted list of some well-known URL schemes that aren't officially registered yet. They tend to correspond to software.

Mostly collected from https://en.wikipedia.org/wiki/List_of_URI_schemes#Unofficial_but_common_URI_schemes.

var TLDs = []string{ /* 1516 elements not displayed */ }

TLDs is a sorted list of all public top-level domains.

Sources:

* https://data.iana.org/TLD/tlds-alpha-by-domain.txt
* https://publicsuffix.org/list/effective_tld_names.dat

func Relaxed

func Relaxed() *regexp.Regexp

Relaxed produces a regexp that matches any URL matched by Strict, plus any URL with no scheme.

func Strict

func Strict() *regexp.Regexp

Strict produces a regexp that matches any URL with a scheme in either the Schemes or SchemesNoAuthority lists.

func StrictMatchingScheme

func StrictMatchingScheme(exp string) (*regexp.Regexp, error)

StrictMatchingScheme produces a regexp similar to Strict, but requiring that the scheme match the given regular expression. See AnyScheme too.
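
The idea of constraining the scheme with a caller-supplied expression, and of reporting a bad expression through the error return, can be sketched with the standard library alone. This is not the package's implementation: strictMatchingSchemeSketch and findWithScheme are hypothetical names, and the URL body is reduced to a run of non-space characters, which is far cruder than what xurls does.

```go
package main

import (
	"fmt"
	"regexp"
)

// strictMatchingSchemeSketch is a hypothetical, simplified analogue of
// StrictMatchingScheme. Assumption: the scheme expression is placed in
// front of a naive URL body (one or more non-space characters).
func strictMatchingSchemeSketch(exp string) (*regexp.Regexp, error) {
	return regexp.Compile(`(?:` + exp + `)\S+`)
}

// findWithScheme returns every match in text for the given scheme expression.
func findWithScheme(exp, text string) ([]string, error) {
	rx, err := strictMatchingSchemeSketch(exp)
	if err != nil {
		return nil, err // an invalid scheme expression surfaces as an error
	}
	return rx.FindAllString(text, -1), nil
}

func main() {
	text := "see https://example.org and ftp://example.org/file"

	// Only URLs whose scheme matches `https?://` are reported.
	urls, err := findWithScheme(`https?://`, text)
	fmt.Println(urls, err)

	// A malformed expression cannot be compiled, so an error is returned.
	_, err = findWithScheme(`[`, text)
	fmt.Println(err != nil)
}
```

With the real function, the same pattern of checking the returned error before using the regexp applies, since the scheme expression comes from the caller and may not compile.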

Directories

Path         Synopsis
cmd/xurls

Package xurls imports 2 packages and is imported by 23 packages. Updated 2020-11-16.