conllu

package module
v1.1.0
Published: Jun 4, 2020 License: MIT Imports: 6 Imported by: 0

README

go-conllu

A CoNLL-U parser written in Go. It converts CoNLL-U files to in-memory structs.

The CoNLL-U format (CoNLL stands for Computational Natural Language Learning) is used by the Universal Dependencies project to represent natural language annotations. go-conllu parses CoNLL-U files and exposes the data as in-memory Go structs.

⚙️ Installation

go get github.com/nuvi/go-conllu

🚀 Quick Start

package main

import (
	"fmt"
	"log"

	conllu "github.com/nuvi/go-conllu"
)

func main() {
	sentences, err := conllu.ParseFile("../../test_data/en_ewt-ud-train.small.conllu")
	if err != nil {
		log.Fatal(err)
	}

	for _, sentence := range sentences {
		for _, token := range sentence.Tokens {
			fmt.Println(token)
		}
		fmt.Println()
	}
}

Issues

All issues should be submitted via the issues tab on GitHub. Please include the code and data you used so that we can reproduce the issue.

💬 Contact

Feel free to reach out to the maintainers on Twitter with questions or comments.

Transitive Dependencies

None, and we plan to keep it that way.

👏 Contributing

We love help! Contribute by forking the repo and opening pull requests. Please ensure that your code passes the existing tests and linting processes, and write new tests to test your changes if applicable.

All pull requests should be submitted to the "master" branch.

go test
go fmt

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Dep

type Dep struct {
	Head   float64
	Deprel string
}

Dep represents a single head-deprel pair in the enhanced dependency graph.
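To make the shape of Dep concrete, here is a minimal sketch of how a DEPS column such as "2:nsubj|8.1:nmod:poss" could map onto a slice of Dep values. The parseDeps helper is hypothetical and is not part of go-conllu; the struct is redeclared locally so the sketch compiles on its own. It assumes the head is parsed as a float64 so that empty-node IDs like 8.1 fit, and that the pair is split on the first colon only because relation subtypes contain colons themselves.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// Dep mirrors the documented struct. It is redeclared here only so that the
// sketch compiles without importing the package.
type Dep struct {
	Head   float64
	Deprel string
}

// parseDeps is a hypothetical helper, not part of go-conllu. It turns a DEPS
// column such as "2:nsubj|4:nsubj" into a slice of Dep values.
func parseDeps(column string) ([]Dep, error) {
	if column == "_" || column == "" {
		return nil, nil // "_" marks an unspecified column in CoNLL-U
	}
	var deps []Dep
	for _, pair := range strings.Split(column, "|") {
		// Split on the first colon only: relation subtypes such as
		// "nmod:poss" contain a colon of their own.
		parts := strings.SplitN(pair, ":", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed head:deprel pair %q", pair)
		}
		head, err := strconv.ParseFloat(parts[0], 64)
		if err != nil {
			return nil, fmt.Errorf("invalid head %q: %w", parts[0], err)
		}
		deps = append(deps, Dep{Head: head, Deprel: parts[1]})
	}
	return deps, nil
}

func main() {
	deps, err := parseDeps("2:nsubj|8.1:nmod:poss")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, d := range deps {
		fmt.Printf("head=%v deprel=%s\n", d.Head, d.Deprel)
	}
}
```

Returning nil for "_" follows the Token documentation, which describes Deps as nil when there is no enhanced graph.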

type MorphologicalFeature

type MorphologicalFeature struct {
	Feature string
	Value   string
}

MorphologicalFeature is a feature from the universal feature inventory (https://universaldependencies.org/u/feat/index.html) or from a defined language-specific extension (https://universaldependencies.org/ext-feat-index.html).
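As an illustration of the Feature/Value split, the sketch below maps a FEATS column such as "Number=Sing|Person=3|Tense=Pres" onto a slice of MorphologicalFeature values. parseFeats is a hypothetical helper written for this example, not a function exported by go-conllu, and the struct is redeclared locally so the code stands alone.

```go
package main

import (
	"fmt"
	"strings"
)

// MorphologicalFeature mirrors the documented struct; it is redeclared here
// only so that the sketch compiles without importing the package.
type MorphologicalFeature struct {
	Feature string
	Value   string
}

// parseFeats is a hypothetical helper, not part of go-conllu. It splits a
// FEATS column into Feature=Value pairs.
func parseFeats(column string) ([]MorphologicalFeature, error) {
	if column == "_" || column == "" {
		return nil, nil // no features available
	}
	var feats []MorphologicalFeature
	for _, pair := range strings.Split(column, "|") {
		kv := strings.SplitN(pair, "=", 2)
		if len(kv) != 2 {
			return nil, fmt.Errorf("malformed feature %q", pair)
		}
		feats = append(feats, MorphologicalFeature{Feature: kv[0], Value: kv[1]})
	}
	return feats, nil
}

func main() {
	feats, err := parseFeats("Number=Sing|Person=3|Tense=Pres")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, f := range feats {
		fmt.Printf("%s -> %s\n", f.Feature, f.Value)
	}
}
```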

type Sentence

type Sentence struct {
	Tokens []Token
}

Sentence represents a sentence of parsed CoNLL-U tokens

func Parse

func Parse(r io.Reader) ([]Sentence, error)

Parse reads CoNLL-U data from an io.Reader and returns the sentences found, each holding its tokens. Parse does not close the reader when finished; the caller must close it.
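The reader-based design can be illustrated with a small stand-in. The sketch below is not the library's implementation: splitSentences and parseString are hypothetical names, the sample data is made up, and the function only groups raw token lines into sentences (blank lines separate sentences, lines starting with "#" are comments). It shows the same division of responsibility described above: the function consumes the reader and the caller decides when to close it.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// sample is made-up CoNLL-U text holding two short sentences.
const sample = "# text = Hi there\n" +
	"1\tHi\thi\tINTJ\tUH\t_\t0\troot\t_\t_\n" +
	"2\tthere\tthere\tADV\tRB\t_\t1\tadvmod\t_\t_\n" +
	"\n" +
	"1\tBye\tbye\tINTJ\tUH\t_\t0\troot\t_\t_\n"

// splitSentences is a hypothetical stand-in for a reader-based parser. It
// groups token lines into sentences and never closes r; a caller passing an
// *os.File would typically `defer f.Close()` itself.
func splitSentences(r io.Reader) ([][]string, error) {
	var sentences [][]string
	var current []string
	scanner := bufio.NewScanner(r) // default buffer caps lines at 64 KiB
	for scanner.Scan() {
		line := scanner.Text()
		switch {
		case strings.TrimSpace(line) == "":
			// A blank line ends the current sentence, if any.
			if len(current) > 0 {
				sentences = append(sentences, current)
				current = nil
			}
		case strings.HasPrefix(line, "#"):
			// Comment line: skipped in this sketch.
		default:
			current = append(current, line)
		}
	}
	if err := scanner.Err(); err != nil {
		return nil, err
	}
	if len(current) > 0 {
		sentences = append(sentences, current) // input without trailing blank line
	}
	return sentences, nil
}

// parseString is a convenience wrapper for in-memory text, in the same
// spirit as a file-based wrapper around a reader-based function.
func parseString(s string) ([][]string, error) {
	return splitSentences(strings.NewReader(s))
}

func main() {
	sentences, err := parseString(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, s := range sentences {
		fmt.Printf("sentence %d has %d token line(s)\n", i+1, len(s))
	}
}
```

Accepting an io.Reader keeps such a function usable with files, network streams, and in-memory strings alike, which is why closing is left to the caller.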

func ParseFile

func ParseFile(filepath string) ([]Sentence, error)

ParseFile opens, reads, and parses a file in CoNLL-U format and returns the sentences found. ParseFile is a convenience wrapper for the Parse() function when working with files on disk.

type Token

type Token struct {
	ID float64 // Word index, integer starting at 1 for each new sentence; may be a range for multiword tokens; may be a decimal number for empty nodes (decimal numbers can be lower than 1 but must be greater than 0)

	Form string // Word form or punctuation symbol

	Lemma string // Lemma or stem of word form

	UPOS string // Universal part-of-speech tag

	XPOS string // Language-specific part-of-speech tag; empty if not available

	// List of morphological features, which are described on the type; nil if not available
	Feats []MorphologicalFeature

	// Head of the current word, which is either the id of the head token for this word, or 0 if none
	// https://universaldependencies.org/format.html#syntactic-annotation
	Head float64

	// Universal dependency relation to the HEAD (root iff HEAD = 0) or a defined language-specific subtype of one
	Deprel string

	// Enhanced dependency graph in the form of a list of head-deprel pairs. See Dep type for more information; nil if none.
	// Dependencies that are shared between the basic and the enhanced dependency representations must be repeated in the Deps field
	Deps []Dep

	// Any other annotation, represented as a list separated by "|". Nil if none.
	// https://universaldependencies.org/format.html#miscellaneous
	Misc []string
}

Token represents a single token, e.g. "hello" or "goodbye", and holds all associated annotations. See https://universaldependencies.org/format.html#conll-u-format
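To show how one CoNLL-U line relates to these fields, the sketch below splits a line into its ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) and fills a reduced struct. The row type and parseRow helper are hypothetical and written only for this example. The sketch assumes "_" in HEAD maps to 0, leaves FEATS and DEPS unparsed, and rejects range IDs such as "1-2" because they do not fit a float64; how go-conllu itself treats those cases is not stated in this documentation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// row holds a subset of the documented Token fields. It is a hypothetical
// type for this sketch, not the library's Token.
type row struct {
	ID     float64
	Form   string
	Lemma  string
	UPOS   string
	XPOS   string
	Head   float64
	Deprel string
	Misc   []string
}

// blank maps the CoNLL-U placeholder "_" to an empty string.
func blank(s string) string {
	if s == "_" {
		return ""
	}
	return s
}

// parseRow is a hypothetical helper that maps one token line onto a row.
func parseRow(line string) (row, error) {
	cols := strings.Split(line, "\t")
	if len(cols) != 10 {
		return row{}, fmt.Errorf("expected 10 tab-separated columns, got %d", len(cols))
	}
	// IDs are integers for words and decimals (e.g. 8.1) for empty nodes.
	id, err := strconv.ParseFloat(cols[0], 64)
	if err != nil {
		return row{}, fmt.Errorf("invalid ID %q: %w", cols[0], err)
	}
	var head float64 // assumption: an unspecified HEAD is stored as 0
	if cols[6] != "_" {
		if head, err = strconv.ParseFloat(cols[6], 64); err != nil {
			return row{}, fmt.Errorf("invalid HEAD %q: %w", cols[6], err)
		}
	}
	var misc []string // stays nil when MISC is "_"
	if cols[9] != "_" {
		misc = strings.Split(cols[9], "|")
	}
	return row{
		ID:     id,
		Form:   cols[1], // FORM and LEMMA may be a literal underscore
		Lemma:  cols[2],
		UPOS:   blank(cols[3]),
		XPOS:   blank(cols[4]),
		Head:   head,
		Deprel: blank(cols[7]),
		Misc:   misc,
	}, nil
}

func main() {
	r, err := parseRow("1\tDogs\tdog\tNOUN\tNNS\tNumber=Plur\t2\tnsubj\t_\tSpaceAfter=No")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", r)
}
```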

Directories

Path Synopsis
examples
