package html

Version: v0.5.0 | Published: Jul 31, 2018 | License: BSD-3-Clause | Imports: 7 | Imported by: 0

Repository

github.com/martinplaner/felix

Documentation

Index

Variables

Constants

This section is empty.

Variables

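var LinkScanner = felix.ScanFunc(func(ctx context.Context, r io.Reader, e felix.Emitter) error {
	// Parse the whole input as an HTML document.
	doc, err := goquery.NewDocumentFromReader(r)

	if err != nil {
		return errors.Wrap(err, "could not read HTML document")
	}

	// foundURLs records every URL already emitted, so each URL is reported once.
	foundURLs := make(map[string]bool)

	// Pass 1: emit the href of every <a> element, using the element's text as
	// the title, or the URL itself when the text is blank.
	doc.Find("a").Each(func(index int, item *goquery.Selection) {
		if href, ok := item.Attr("href"); ok && !foundURLs[href] {
			title := item.Text()
			if strings.TrimSpace(title) == "" {
				title = href
			}
			foundURLs[href] = true
			e.EmitLink(felix.Link{
				Title: title,
				URL:   href,
			})
		}
	})

	// Pass 2: scan the serialized HTML for URL-like strings not already found
	// as hrefs. urlPattern is an unexported package-level regular expression
	// that is not shown on this page.
	if s, err := doc.Html(); err == nil {
		urls := urlPattern.FindAllString(s, -1)
		for _, u := range urls {
			if !foundURLs[u] {
				foundURLs[u] = true
				e.EmitLink(felix.Link{
					Title: u,
					URL:   u,
				})
			}
		}
	}

	return nil
})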

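LinkScanner parses r as an HTML document and extracts all links: first the href of every <a> element, then any URL-like strings found in the document's HTML. Links are uniquely identified by their URL. If the same URL (for example, an href value) appears more than once, it is reported only once, using the first instance found. When an <a> element's text is blank, its URL is used as the link title. If r cannot be parsed as HTML, the error is returned wrapped with the message "could not read HTML document".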
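The first-instance-wins rule described above can be sketched with the standard library alone. This is a hypothetical illustration, not the package's code: the `Link` and `anchor` types, `collectLinks`, and `hypotheticalURLPattern` are assumptions standing in for `felix.Link`, goquery's `<a>` selection, and the unexported `urlPattern`, whose actual expression is not shown here. The sketch also assumes the anchors have already been extracted from the document.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Link is a hypothetical stand-in for the two felix.Link fields that
// LinkScanner sets (Title and URL).
type Link struct {
	Title string
	URL   string
}

// anchor is a hypothetical pre-parsed <a> element: its href and its text.
type anchor struct {
	Href string
	Text string
}

// hypotheticalURLPattern is an assumed pattern; the package's own
// urlPattern is unexported and may differ.
var hypotheticalURLPattern = regexp.MustCompile(`https?://[^\s"'<>]+`)

// collectLinks mirrors LinkScanner's two passes with one shared "seen" set:
// anchors first, then URL-like strings found in raw.
func collectLinks(anchors []anchor, raw string) []Link {
	seen := make(map[string]bool)
	var links []Link

	for _, a := range anchors {
		if seen[a.Href] {
			continue // duplicate URL: keep only the first instance
		}
		title := a.Text
		if strings.TrimSpace(title) == "" {
			title = a.Href // blank link text falls back to the URL
		}
		seen[a.Href] = true
		links = append(links, Link{Title: title, URL: a.Href})
	}

	for _, u := range hypotheticalURLPattern.FindAllString(raw, -1) {
		if !seen[u] {
			seen[u] = true
			links = append(links, Link{Title: u, URL: u})
		}
	}
	return links
}

func main() {
	anchors := []anchor{
		{Href: "https://example.com/a", Text: "First A"},
		{Href: "https://example.com/a", Text: "Second A"},
		{Href: "https://example.com/b", Text: "   "},
	}
	raw := "see https://example.com/a and https://example.com/c"
	for _, l := range collectLinks(anchors, raw) {
		fmt.Printf("%s | %s\n", l.Title, l.URL)
	}
	// prints:
	// First A | https://example.com/a
	// https://example.com/b | https://example.com/b
	// https://example.com/c | https://example.com/c
}
```

Sharing one map across both passes is what makes the first instance win: an href found among the anchors suppresses the same URL in the text scan, so the anchor text is kept as its title.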

Functions

This section is empty.

Types

This section is empty.

Source Files

scanner.go