gobert: github.com/buckhx/gobert/tokenize/vocab Index | Files

package vocab

import "github.com/buckhx/gobert/tokenize/vocab"


Package Files


type Dict Uses

type Dict struct {
    // contains filtered or unexported fields

Dict is a container for tokens NOTE: python uses an OrderedDict, unsure of implications

func FromFile Uses

func FromFile(path string) (Dict, error)

FromFile will read a newline delimited file into a Dict

func New Uses

func New(tokens []string) Dict

New iwll return a a covab dict from the given tokens, IDs will match index

func (Dict) Add Uses

func (v Dict) Add(token string)

Add will add an item to the vocabulary, is not thread-safe

func (Dict) GetID Uses

func (v Dict) GetID(token string) ID

GetID will return the ID of the token in the vocab. Will be negative if it doesn't exists

func (Dict) LongestSubstring Uses

func (v Dict) LongestSubstring(token string) string

LongestSubstring returns the longest token that is a substring of the token

func (Dict) Size Uses

func (v Dict) Size() int

Size returns the size of the vocabulary

type ID Uses

type ID int32

ID is used to identify vocab items

func (ID) Int32 Uses

func (id ID) Int32() int32

Int32 int32 representation of an ID

type Provider Uses

type Provider interface {
    Vocab() Dict

Provider is an interface for exposing a vocab

Package vocab imports 2 packages (graph) and is imported by 3 packages. Updated 2019-07-29. Refresh now. Tools for package owners.