wikidump

Version: v0.0.0-...-ffea927
Published: Sep 24, 2019 License: MIT Imports: 20 Imported by: 6

README

wikidump

Description

Package wikidump is a Go package that provides utility functions for downloading and extracting Wikipedia dumps.

Installation

This package can be installed with the go get command:

go get github.com/negapedia/wikidump

Dependencies

This package depends on p7zip >= 16.02 to extract 7-Zip archives.

Documentation

API documentation can be found in the associated GoDoc reference.

Documentation

Overview

Package wikidump provides utility functions for downloading and extracting Wikipedia dumps.

Functions

func SQL2CSV

func SQL2CSV(r io.Reader) io.Reader

SQL2CSV transforms a SQL data dump from dumps.wikimedia.org into clean CSV on the fly.

Types

type Wikidump

type Wikidump struct {
	// contains filtered or unexported fields
}

Wikidump represents a hub from which to request particular Wikipedia dump files.

func From

func From(tmpDir, lang string, t time.Time) (w Wikidump, err error)

From creates a new wikidump from the specified date.

func Latest

func Latest(tmpDir, lang string, checkFor ...string) (w Wikidump, err error)

Latest creates a new wikidump from the latest valid Wikipedia dump.

func (Wikidump) CheckFor

func (w Wikidump) CheckFor(filenames ...string) error

CheckFor checks that the given files exist in the wikidump.

func (Wikidump) Date

func (w Wikidump) Date() time.Time

Date returns the date of the current dump.

func (Wikidump) Open

func (w Wikidump) Open(filename string) func(context.Context) (io.ReadCloser, error)

Open returns an iterator over the resources associated with filename; the download can be stopped through the context. Once the iterator is depleted, it returns io.EOF. Once the iterator returns an error, every subsequent call returns the same error. It is the caller's responsibility to call Close on each returned io.ReadCloser when done. Open takes care of verifying the SHA1 checksum, retrying failed downloads, and decompressing files.
