wikidump

package
v0.0.0-...-5fecf9c
Published: May 11, 2016 License: Apache-2.0 Imports: 16 Imported by: 0

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func Cleanup

func Cleanup(s string) string

Cleanup removes tables, template calls and quasi-XML from s, discarding their content.

It assumes that tables, templates and tags are properly nested. The one exception is spurious end-of-table, end-of-template and end-of-element tags, which are ignored.
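
The nesting assumption can be illustrated with a depth counter. The following is a minimal sketch, not the package's implementation: stripTemplates is a hypothetical helper that handles only {{...}} template calls (no tables or tags), drops their content at any nesting depth, and skips a closing "}}" that has no matching opener.

```go
package main

import (
	"fmt"
	"strings"
)

// stripTemplates is a hypothetical helper, not part of wikidump.
// It drops every {{...}} template call, including nested ones, and
// skips any "}}" that has no matching "{{".
func stripTemplates(s string) string {
	var b strings.Builder
	depth := 0 // number of currently open "{{"
	for i := 0; i < len(s); {
		switch {
		case strings.HasPrefix(s[i:], "{{"):
			depth++
			i += 2
		case strings.HasPrefix(s[i:], "}}"):
			if depth > 0 {
				depth--
			}
			// At depth 0 the closer is spurious; it is skipped either way.
			i += 2
		default:
			if depth == 0 {
				b.WriteByte(s[i]) // keep text that is outside all templates
			}
			i++
		}
	}
	return b.String()
}

func main() {
	fmt.Printf("%q\n", stripTemplates("Go {{lang|{{en}}}}is fun}}."))
}
```

Copying byte by byte is safe for UTF-8 input here because the delimiters are ASCII and all other bytes are emitted in their original order.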

func Download

func Download(wikiname, path string, logProgress bool) (string, error)

Download downloads the database dump for wikiname (e.g., "en", "sco", "nds_nl") from Wikimedia.

If path is not the empty string, the dump is written to path. Otherwise, Download derives an appropriate path from the URL and returns that.

Progress is logged to the standard logger if logProgress is true.
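
How a path might be derived from the URL can be sketched as follows. Both helpers are hypothetical and not part of wikidump: dumpURL assumes the "latest pages-articles" layout of dumps.wikimedia.org, and targetPath assumes that the derived path is simply the last element of the URL path. The package may do either differently.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
)

// dumpURL builds a dump URL for wikiname. Hypothetical helper: the URL
// layout is an assumption, not taken from the package documentation.
func dumpURL(wikiname string) string {
	return fmt.Sprintf(
		"https://dumps.wikimedia.org/%swiki/latest/%swiki-latest-pages-articles.xml.bz2",
		wikiname, wikiname)
}

// targetPath returns p if it is non-empty; otherwise it falls back to
// the final element of the URL's path. Hypothetical helper.
func targetPath(p, rawURL string) (string, error) {
	if p != "" {
		return p, nil
	}
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	return path.Base(u.Path), nil
}

func main() {
	p, err := targetPath("", dumpURL("sco"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p)
}
```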

func ExtractLinks

func ExtractLinks(s string) map[Link]int

ExtractLinks extracts all wikilinks from s. It returns a frequency table that maps each Link to the number of times it occurs.
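
The shape of such a frequency table can be shown with a small, self-contained sketch. countLinks and linkRE are hypothetical names, and the regular expression is a simplification that is not taken from the package: it recognizes only [[Target]] and [[Target|Anchor]], and treats a missing or empty anchor as equal to the target. The Link type is redeclared locally so that the example compiles on its own.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Link mirrors the documented wikidump.Link type.
type Link struct {
	Anchor, Target string
}

// linkRE matches [[Target]] and [[Target|Anchor]]. Simplified and
// hypothetical; real wikitext has more link forms than this.
var linkRE = regexp.MustCompile(`\[\[([^\[\]|]+)(?:\|([^\[\]|]*))?\]\]`)

// countLinks is a hypothetical stand-in for ExtractLinks.
func countLinks(s string) map[Link]int {
	freq := make(map[Link]int)
	for _, m := range linkRE.FindAllStringSubmatch(s, -1) {
		target := strings.TrimSpace(m[1])
		anchor := strings.TrimSpace(m[2])
		if anchor == "" {
			anchor = target // no explicit anchor text: reuse the target
		}
		freq[Link{Anchor: anchor, Target: target}]++
	}
	return freq
}

func main() {
	text := "[[Go (programming language)|Go]] beats [[Rust]]; [[Go (programming language)|Go]] again."
	for link, n := range countLinks(text) {
		fmt.Printf("%q -> %q: %d\n", link.Anchor, link.Target, n)
	}
}
```

Because Link is a struct of two strings, it is comparable and can be used directly as a map key, which is what makes map[Link]int workable as a return type.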

func GetPages

func GetPages(r io.Reader, pages chan<- *Page, redirs chan<- *Redirect)

GetPages reads pages and redirects from the wikidump r and sends them on the pages and redirs channels. Only pages in the main namespace are retrieved.

XXX needs cleaner error handling. Currently panics.

Types

type Link

type Link struct {
	Anchor, Target string
}

A link to the article Target with anchor text Anchor.

type Page

type Page struct {
	Title, Text string
}

A Wikipedia page.

type Redirect

type Redirect struct {
	Title, Target string
}

A Wikipedia redirect to Target.
