go-index: github.com/daviddengcn/go-index

package index

import "github.com/daviddengcn/go-index"

Package index contains text-indexing (search engine) related functions, including indexing, tokenizer, word marking, and snippet selecting, etc.

Package Files

const_array.go index.go markdown.go marktext.go searcher.go ti.go token.go

Variables

var (
    // error of invalid doc-id (out of range)
    ErrInvalidDocID = errors.New("Invalid doc-ID")
)

func MarkText Uses

func MarkText(text []byte, runeType func(last, current rune) RuneType,
    needMark func([]byte) bool, output, mark func([]byte) error) error

MarkText separates text into separator parts and tokens, and marks a token if needMark returns true for it. The output and mark functions are called for unmarked and marked text, respectively.

func SingleFieldQuery Uses

func SingleFieldQuery(field string, tokens ...string) map[string]stringsp.Set

SingleFieldQuery returns a map[string]stringsp.Set (the same type as the query parameter of the Search method) with a single field.

func Tokenize Uses

func Tokenize(runeType RuneTypeFunc, in io.RuneReader,
    output func(token []byte) error) error

Tokenize separates a rune sequence into tokens, as defined by a RuneType function.

func TokenizeBySeparators Uses

func TokenizeBySeparators(seps string, in io.RuneReader, output func([]byte) error) error

TokenizeBySeparators uses the runes of seps as separators to tokenize in.
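The separator-driven splitting that Tokenize and TokenizeBySeparators describe can be sketched as follows (tokenizeBySeps is an independent sketch under the documented behavior, not the library's implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// tokenizeBySeps splits in on any rune found in seps, emitting each
// non-empty token through output, loosely following the documented
// behavior of TokenizeBySeparators.
func tokenizeBySeps(seps, in string, output func(token string) error) error {
	var cur strings.Builder
	flush := func() error {
		if cur.Len() == 0 {
			return nil
		}
		err := output(cur.String())
		cur.Reset()
		return err
	}
	for _, r := range in {
		if strings.ContainsRune(seps, r) { // separator: end current token
			if err := flush(); err != nil {
				return err
			}
			continue
		}
		cur.WriteRune(r) // token body
	}
	return flush()
}

func main() {
	_ = tokenizeBySeps(" ,", "Hello, my friend", func(tok string) error {
		fmt.Println(tok)
		return nil
	})
}
```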

type ConstArrayReader Uses

type ConstArrayReader struct {
    // contains filtered or unexported fields
}

func OpenConstArray Uses

func OpenConstArray(dir string) (*ConstArrayReader, error)

func (*ConstArrayReader) Close Uses

func (r *ConstArrayReader) Close() error

func (*ConstArrayReader) FetchBytes Uses

func (r *ConstArrayReader) FetchBytes(output func(int, []byte) error, indexes ...int) error

func (*ConstArrayReader) FetchGobs Uses

func (r *ConstArrayReader) FetchGobs(output func(int, interface{}) error, indexes ...int) error

func (*ConstArrayReader) ForEachBytes Uses

func (r *ConstArrayReader) ForEachBytes(output func(int, []byte) error) error

func (*ConstArrayReader) ForEachGob Uses

func (r *ConstArrayReader) ForEachGob(output func(int, interface{}) error) error

func (*ConstArrayReader) GetBytes Uses

func (r *ConstArrayReader) GetBytes(index int) ([]byte, error)

func (*ConstArrayReader) GetGob Uses

func (r *ConstArrayReader) GetGob(index int) (interface{}, error)

type ConstArrayWriter Uses

type ConstArrayWriter struct {
    // contains filtered or unexported fields
}

func CreateConstArray Uses

func CreateConstArray(dir string) (*ConstArrayWriter, error)

func (*ConstArrayWriter) AppendBytes Uses

func (sa *ConstArrayWriter) AppendBytes(bs []byte) (int, error)

func (*ConstArrayWriter) AppendGob Uses

func (sa *ConstArrayWriter) AppendGob(e interface{}) (int, error)

func (*ConstArrayWriter) Close Uses

func (sa *ConstArrayWriter) Close() error
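The ConstArrayWriter/ConstArrayReader pair suggests an append-once array of variable-length records addressed by index. The underlying idea can be sketched in memory as below (an assumption about the design; the real package persists records to files in a directory, and this constArray type is hypothetical):

```go
package main

import "fmt"

// constArray stores variable-length byte records back to back, with
// offsets marking record boundaries -- a common layout for write-once
// arrays such as the one ConstArrayWriter/Reader expose.
type constArray struct {
	data    []byte
	offsets []int // offsets[i] is the start of record i
}

// append adds a record and returns its index, like AppendBytes.
func (a *constArray) append(bs []byte) int {
	a.offsets = append(a.offsets, len(a.data))
	a.data = append(a.data, bs...)
	return len(a.offsets) - 1
}

// get returns record i, like GetBytes.
func (a *constArray) get(i int) []byte {
	end := len(a.data)
	if i+1 < len(a.offsets) {
		end = a.offsets[i+1]
	}
	return a.data[a.offsets[i]:end]
}

func main() {
	var a constArray
	a.append([]byte("hello"))
	i := a.append([]byte("world"))
	fmt.Println(string(a.get(i))) // world
}
```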

type Link Uses

type Link struct {
    URL    string
    Anchor string
    Title  string
}

Link is the data structure for a link.

type MarkdownData Uses

type MarkdownData struct {
    Text  []byte // plain text
    Links []Link // all links
}

Parsed data for a markdown text.

func ParseMarkdown Uses

func ParseMarkdown(src []byte) *MarkdownData

ParseMarkdown parses the markdown source and returns the plain text and link information.
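The link-gathering side of ParseMarkdown can be sketched with a minimal regular-expression pass that only handles inline links of the form `[anchor](url "title")` (extractLinks is an illustrative stand-in, not the library's markdown parser):

```go
package main

import (
	"fmt"
	"regexp"
)

// Link mirrors the package's Link struct.
type Link struct{ URL, Anchor, Title string }

// linkRE matches [anchor](url) and [anchor](url "title") inline links.
var linkRE = regexp.MustCompile(`\[([^\]]*)\]\(([^\s)]+)(?:\s+"([^"]*)")?\)`)

// extractLinks collects inline links from markdown source.
func extractLinks(src string) []Link {
	var links []Link
	for _, m := range linkRE.FindAllStringSubmatch(src, -1) {
		links = append(links, Link{URL: m[2], Anchor: m[1], Title: m[3]})
	}
	return links
}

func main() {
	for _, l := range extractLinks(`See [Go](https://golang.org "The Go site").`) {
		fmt.Printf("%s -> %s (%s)\n", l.Anchor, l.URL, l.Title)
	}
}
```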

type RuneType Uses

type RuneType int
         ,----> TokenBody
        ////
       Hello my friend
       |    \__\______> TokenSep (spaces)
       `-> TokenStart

const (
    TokenSep   RuneType = iota // token breaker, should be ignored
    TokenStart                 // start of a new token, end current token, if any
    TokenBody                  // body of a token. It's ok for the first rune to be a TokenBody
)

type RuneTypeFunc Uses

type RuneTypeFunc func(last, current rune) RuneType

RuneTypeFunc is the type of function that determines the RuneType of the current rune, given the last and current runes.

func SeparatorFRuneTypeFunc Uses

func SeparatorFRuneTypeFunc(IsSeparator func(r rune) bool) RuneTypeFunc

SeparatorFRuneTypeFunc returns a rune-type function (used in func Tokenize) which splits text by separators defined by func IsSeparator.

type TokenIndexer Uses

type TokenIndexer struct {
    // contains filtered or unexported fields
}

TokenIndexer is mainly used to compute outlinks from inlinks.
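The inversion TokenIndexer performs, from an id-to-tokens mapping to a token-to-ids mapping, can be sketched as a plain map inversion (invert is an independent sketch of what a series of Put calls followed by IdsOfToken exposes):

```go
package main

import (
	"fmt"
	"sort"
)

// invert turns an id -> tokens mapping into token -> sorted ids.
func invert(idTokens map[string][]string) map[string][]string {
	tokenIDs := map[string][]string{}
	for id, tokens := range idTokens {
		for _, tok := range tokens {
			tokenIDs[tok] = append(tokenIDs[tok], id)
		}
	}
	// IdsOfToken documents sorted output, so sort each id list.
	for tok := range tokenIDs {
		sort.Strings(tokenIDs[tok])
	}
	return tokenIDs
}

func main() {
	pages := map[string][]string{
		"a.html": {"go", "index"},
		"b.html": {"go"},
	}
	fmt.Println(invert(pages)["go"]) // [a.html b.html]
}
```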

func (*TokenIndexer) IdsOfToken Uses

func (ti *TokenIndexer) IdsOfToken(token string) []string

IdsOfToken returns a sorted slice of ids for a specified token.

NOTE Do NOT change the elements of the returned slice

func (*TokenIndexer) Load Uses

func (ti *TokenIndexer) Load(r io.Reader) error

Load restores the TokenIndexer data from a Reader with the gob decoder.

func (*TokenIndexer) Put Uses

func (ti *TokenIndexer) Put(id string, tokens villa.StrSet)

func (*TokenIndexer) PutTokens Uses

func (ti *TokenIndexer) PutTokens(id string, tokens stringsp.Set)

PutTokens sets the tokens for a specified ID. If the ID was put before, the tokens are updated.

func (*TokenIndexer) Save Uses

func (ti *TokenIndexer) Save(w io.Writer) error

Save serializes the TokenIndexer data to a Writer with the gob encoder.

func (*TokenIndexer) TokensOfId Uses

func (ti *TokenIndexer) TokensOfId(id string) []string

TokensOfId returns a sorted slice of tokens for a specified id.

NOTE Do NOT change the elements of the returned slice

type TokenSetSearcher Uses

type TokenSetSearcher struct {
    // contains filtered or unexported fields
}

TokenSetSearcher can index documents, each represented as a set of tokens. All data is stored in memory.

Indexed data can be saved, and loaded again.

If a customized type needs to be saved and loaded again, it must be registered by calling gob.Register.

func (*TokenSetSearcher) AddDoc Uses

func (s *TokenSetSearcher) AddDoc(fields map[string]stringsp.Set, data interface{}) int32

AddDoc indexes a document to the searcher. It returns a local doc ID.

func (*TokenSetSearcher) DocCount Uses

func (s *TokenSetSearcher) DocCount() int

DocCount returns the number of docs.

func (*TokenSetSearcher) DocInfo Uses

func (s *TokenSetSearcher) DocInfo(docID int32) interface{}

DocInfo returns the data associated with the specified doc.

func (*TokenSetSearcher) Load Uses

func (s *TokenSetSearcher) Load(r io.Reader) error

Load restores the searcher data from a Reader with the gob decoder.

func (*TokenSetSearcher) Save Uses

func (s *TokenSetSearcher) Save(w io.Writer) error

Save serializes the searcher data to a Writer with the gob encoder.

func (*TokenSetSearcher) Search Uses

func (s *TokenSetSearcher) Search(query map[string]stringsp.Set, output func(docID int32, data interface{}) error) error

Search outputs all documents (docID and associated data) in which every query token hits, in the same order as they were added. If output returns an error, the search stops and the error is returned. If the query contains no tokens, all documents are returned.
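Those all-tokens-must-hit semantics amount to intersecting per-token document lists. A simplified single-field sketch (not the library's posting-list implementation):

```go
package main

import "fmt"

// search returns docIDs (in insertion order) whose token set contains
// every query token; an empty query matches all docs, mirroring the
// documented behavior of TokenSetSearcher.Search.
func search(docs [][]string, query []string) []int {
	var hits []int
	for id, tokens := range docs {
		set := map[string]bool{}
		for _, t := range tokens {
			set[t] = true
		}
		ok := true
		for _, q := range query {
			if !set[q] {
				ok = false
				break
			}
		}
		if ok {
			hits = append(hits, id)
		}
	}
	return hits
}

func main() {
	docs := [][]string{
		{"go", "index", "search"}, // doc 0
		{"go", "markdown"},        // doc 1
		{"index", "search"},       // doc 2
	}
	fmt.Println(search(docs, []string{"go", "index"})) // [0]
}
```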

func (*TokenSetSearcher) TokenDocList Uses

func (s *TokenSetSearcher) TokenDocList(field, token string) []int32

TokenDocList returns the docIDs of a specified token.

type Tokenizer Uses

type Tokenizer interface {
    /*
    	Tokenize separates a rune sequence into some tokens. If output returns
    	a non-nil error, tokenizing stops and the error is returned.
    */
    Tokenize(in io.RuneReader, output func(token []byte) error) error
}

Tokenizer is an interface for types that can tokenize text into tokens.

Package index imports 15 packages and is imported by 25 packages. Updated 2018-01-31.