kapok

package module
v0.0.0-...-192e4b2 Latest
Published: Nov 20, 2013 License: MIT Imports: 4 Imported by: 0

README

Kapok

A Knowledge Graph of Wikipedia.

Description

Kapok aims to create a knowledge graph from Wikipedia. In this graph, each node is an article, and links between articles are the edges between nodes.
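The article-as-node, link-as-edge model can be illustrated with a small adjacency-list sketch. The Graph type and its methods below are hypothetical names chosen for this example; they are not Kapok's graph package, whose API is not shown here.

```go
package main

import "fmt"

// Graph is a hypothetical adjacency-list graph keyed by article title.
// It only illustrates the node/edge model; it is not Kapok's graph type.
type Graph struct {
	edges map[string]map[string]struct{}
}

// NewGraph returns an empty graph.
func NewGraph() *Graph {
	return &Graph{edges: make(map[string]map[string]struct{})}
}

// AddNode registers an article, even if it has no outgoing links.
func (g *Graph) AddNode(title string) {
	if _, ok := g.edges[title]; !ok {
		g.edges[title] = make(map[string]struct{})
	}
}

// AddLink records a directed edge from one article to another.
// Repeated links between the same pair are stored once.
func (g *Graph) AddLink(from, to string) {
	g.AddNode(from)
	g.AddNode(to)
	g.edges[from][to] = struct{}{}
}

// OutDegree returns how many distinct articles the given article links to.
func (g *Graph) OutDegree(title string) int {
	return len(g.edges[title])
}

// NodeCount returns the number of articles in the graph.
func (g *Graph) NodeCount() int {
	return len(g.edges)
}

func main() {
	g := NewGraph()
	g.AddLink("Kapok", "Tree")
	g.AddLink("Kapok", "Fibre")
	g.AddLink("Kapok", "Tree") // duplicate link, stored once
	fmt.Println(g.NodeCount(), g.OutDegree("Kapok")) // prints "3 2"
}
```

A map of sets keeps the sketch short; a graph meant for the full Wikipedia dataset would likely need a more compact representation.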

Structure

Kapok is split into three modular sections:

  • Parsing: extracting relevant data from a 45GB archive of Wikipedia
  • Graph: morphing the parsed data into a graph for analysis
  • Visualisation: creating interesting visualisations with the data
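As a rough illustration of the parsing step, the sketch below pulls link targets out of a snippet of wikitext. It assumes links use the usual [[Target]], [[Target|label]] and [[Target#Section]] forms; the function name extractLinks and the regular expression are illustrative assumptions, not Kapok's parser.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// linkPattern matches [[Target]], [[Target|label]] and [[Target#Section|label]].
// Only the target (capture group 1) is kept.
var linkPattern = regexp.MustCompile(`\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]`)

// extractLinks is a hypothetical helper that returns the link targets
// found in a piece of wikitext, in order of appearance.
func extractLinks(wikitext string) []string {
	var targets []string
	for _, m := range linkPattern.FindAllStringSubmatch(wikitext, -1) {
		if t := strings.TrimSpace(m[1]); t != "" {
			targets = append(targets, t)
		}
	}
	return targets
}

func main() {
	text := "The [[Tree]] yields [[Fibre|fibres]] and [[Seed#Dispersal|seeds]]."
	fmt.Println(extractLinks(text)) // prints "[Tree Fibre Seed]"
}
```

Each extracted target would become an edge from the current article to the target article. Real wikitext has many more cases (templates, file links, nested brackets) that this sketch ignores.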

The parsing section of Kapok could easily be extended to replace aging Wikimedia tools such as MWDumper. I'll probably do this soon.

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func GenerateAndStore

func GenerateAndStore(in io.Reader, outPath string, maxPages, saveInterval int) error

GenerateAndStore generates a graph from the given reader and continually exports it to disk.

It consumes at most maxPages pages; if maxPages is -1, there is no limit.

saveInterval is the number of pages processed between successive exports.
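To make the interaction between maxPages and saveInterval concrete, the sketch below computes the page counts at which an export would occur. exportPoints is a hypothetical helper written for this example, not part of Kapok. It assumes an export happens each time the page count reaches a multiple of saveInterval, and it does not model a final export at the end, since the documentation does not say whether one occurs.

```go
package main

import "fmt"

// exportPoints is a hypothetical helper: given the number of pages
// available, it returns the page counts at which an export would be
// triggered. maxPages of -1 means no limit.
func exportPoints(totalPages, maxPages, saveInterval int) []int {
	var points []int
	for page := 1; page <= totalPages; page++ {
		if maxPages != -1 && page > maxPages {
			break // page limit reached
		}
		// Assumption: export whenever the count is a multiple of saveInterval.
		if saveInterval > 0 && page%saveInterval == 0 {
			points = append(points, page)
		}
	}
	return points
}

func main() {
	fmt.Println(exportPoints(10, -1, 3)) // prints "[3 6 9]"
	fmt.Println(exportPoints(10, 5, 2))  // prints "[2 4]"
}
```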

func GenerateGraph

func GenerateGraph(in io.Reader, maxPages int) *graph.Graph

GenerateGraph reads the contents of the given reader and creates a graph from the first maxPages pages.

If maxPages is -1, GenerateGraph will read until the database channel closes.

Types

This section is empty.

Directories

Path Synopsis
A memory-efficient graph for large datasets.
An ad-hoc parser for Wikipedia's 45GB (and growing) XML database.
Stats provides utilities for analysing Wikipedia articles.