bleve: github.com/blevesearch/bleve/analysis

package analysis

import "github.com/blevesearch/bleve/analysis"

Index

Package Files

freq.go tokenmap.go type.go util.go

Variables

var ErrInvalidDateTime = fmt.Errorf("unable to parse datetime with any of the layouts")

func BuildTermFromRunes

func BuildTermFromRunes(runes []rune) []byte

func BuildTermFromRunesOptimistic

func BuildTermFromRunesOptimistic(buf []byte, runes []rune) []byte

BuildTermFromRunesOptimistic will build a term from the provided runes and optimistically attempt to encode it into the provided buffer. If at any point the buffer appears too small, a new buffer is allocated and used instead. This should be used in cases where the new term is frequently the same length as, or shorter than, the original term (in number of bytes).
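
A minimal sketch of the intended pattern (the helper name is hypothetical): transform a term rune by rune, then rebuild the encoded bytes, passing the original term's emptied backing array as the buffer so a new allocation only happens when the result grows.

package termutil

import (
    "bytes"
    "unicode"

    "github.com/blevesearch/bleve/analysis"
)

// lowerTerm lower-cases a term and re-encodes it, optimistically reusing
// term's backing array; BuildTermFromRunesOptimistic falls back to a fresh
// allocation if the lowered form needs more bytes than the original.
func lowerTerm(term []byte) []byte {
    runes := bytes.Runes(term)
    for i, r := range runes {
        runes[i] = unicode.ToLower(r)
    }
    return analysis.BuildTermFromRunesOptimistic(term[:0], runes)
}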

func DeleteRune

func DeleteRune(in []rune, pos int) []rune

func InsertRune

func InsertRune(in []rune, pos int, r rune) []rune

func RunesEndsWith

func RunesEndsWith(input []rune, suffix string) bool

func TruncateRunes

func TruncateRunes(input []byte, num int) []byte
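
A hedged sketch exercising the rune helpers above (the function and the suffix rule are hypothetical; helper semantics are assumed from their signatures): strip a trailing "es" from a term and re-encode it.

package termutil

import (
    "bytes"

    "github.com/blevesearch/bleve/analysis"
)

// stripES removes a trailing "es" from a term, e.g. "boxes" -> "box".
func stripES(term []byte) []byte {
    runes := bytes.Runes(term)
    if analysis.RunesEndsWith(runes, "es") {
        runes = analysis.DeleteRune(runes, len(runes)-1) // drop the trailing "s"
        runes = analysis.DeleteRune(runes, len(runes)-1) // drop the trailing "e"
    }
    return analysis.BuildTermFromRunes(runes)
}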

type Analyzer

type Analyzer struct {
    CharFilters  []CharFilter
    Tokenizer    Tokenizer
    TokenFilters []TokenFilter
}

func (*Analyzer) Analyze

func (a *Analyzer) Analyze(input []byte) TokenStream
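
A self-contained sketch of the analysis pipeline (assuming the struct-based Analyzer shown above): CharFilters run over the raw bytes, the Tokenizer produces a TokenStream, and TokenFilters post-process it. The whitespace tokenizer below is hypothetical glue, written only so the example needs no other bleve subpackage.

package main

import (
    "bytes"
    "fmt"

    "github.com/blevesearch/bleve/analysis"
)

// fieldsTokenizer is a toy Tokenizer that splits on whitespace; it exists
// only so the sketch runs without any other bleve subpackage.
type fieldsTokenizer struct{}

func (fieldsTokenizer) Tokenize(input []byte) analysis.TokenStream {
    var rv analysis.TokenStream
    pos, off := 1, 0
    for _, word := range bytes.Fields(input) {
        start := bytes.Index(input[off:], word) + off
        rv = append(rv, &analysis.Token{
            Term:     word,
            Start:    start,
            End:      start + len(word),
            Position: pos,
            Type:     analysis.AlphaNumeric,
        })
        off = start + len(word)
        pos++
    }
    return rv
}

func main() {
    a := &analysis.Analyzer{
        // CharFilters and TokenFilters are optional; leaving them nil passes
        // the tokenizer output through unchanged.
        Tokenizer: fieldsTokenizer{},
    }
    for _, tok := range a.Analyze([]byte("Hello Bleve analysis")) {
        fmt.Printf("%d-%d %q pos=%d\n", tok.Start, tok.End, tok.Term, tok.Position)
    }
}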

type ByteArrayConverter

type ByteArrayConverter interface {
    Convert([]byte) (interface{}, error)
}

type CharFilter

type CharFilter interface {
    Filter([]byte) []byte
}
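
A CharFilter rewrites the raw input bytes before tokenization. A hypothetical example that normalizes curly apostrophes to ASCII so later filters see a single form; it would be plugged in via Analyzer.CharFilters.

package charnorm

import "bytes"

// apostropheNormalizer satisfies analysis.CharFilter: it maps the Unicode
// right single quotation mark (U+2019) to a plain ASCII apostrophe.
type apostropheNormalizer struct{}

func (apostropheNormalizer) Filter(input []byte) []byte {
    return bytes.ReplaceAll(input, []byte("\u2019"), []byte("'"))
}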

type DateTimeParser

type DateTimeParser interface {
    ParseDateTime(string) (time.Time, error)
}
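
A DateTimeParser turns a field's string value into a time.Time. A hedged sketch that tries a fixed list of layouts and falls back to the package's ErrInvalidDateTime when none match (the type and field names are hypothetical; the real implementations live under datetime/...).

package dateparse

import (
    "time"

    "github.com/blevesearch/bleve/analysis"
)

// layoutsParser satisfies analysis.DateTimeParser by trying each layout in
// order and returning the first successful parse.
type layoutsParser struct {
    layouts []string
}

func (p layoutsParser) ParseDateTime(input string) (time.Time, error) {
    for _, layout := range p.layouts {
        if t, err := time.Parse(layout, input); err == nil {
            return t, nil
        }
    }
    return time.Time{}, analysis.ErrInvalidDateTime
}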

type Token

type Token struct {
    // Start specifies the byte offset of the beginning of the term in the
    // field.
    Start int `json:"start"`

    // End specifies the byte offset of the end of the term in the field.
    End  int    `json:"end"`
    Term []byte `json:"term"`

    // Position specifies the 1-based index of the token in the sequence of
    // occurrences of its term in the field.
    Position int       `json:"position"`
    Type     TokenType `json:"type"`
    KeyWord  bool      `json:"keyword"`
}

Token represents one occurrence of a term at a particular location in a field.

func (*Token) String

func (t *Token) String() string

type TokenFilter

type TokenFilter interface {
    Filter(TokenStream) TokenStream
}

A TokenFilter adds, transforms or removes tokens from a token stream.
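
A hypothetical TokenFilter that drops tokens whose terms are shorter than a minimum number of runes (similar in spirit to token/length). It filters in place and leaves Position values untouched.

package minlen

import (
    "unicode/utf8"

    "github.com/blevesearch/bleve/analysis"
)

// minLengthFilter satisfies analysis.TokenFilter.
type minLengthFilter struct {
    min int
}

func (f minLengthFilter) Filter(input analysis.TokenStream) analysis.TokenStream {
    rv := input[:0] // reuse the backing array of the incoming stream
    for _, tok := range input {
        if utf8.RuneCount(tok.Term) >= f.min {
            rv = append(rv, tok)
        }
    }
    return rv
}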

type TokenFreq

type TokenFreq struct {
    Term      []byte
    Locations []*TokenLocation
    // contains filtered or unexported fields
}

TokenFreq represents all the occurrences of a term in all fields of a document.

func (*TokenFreq) Frequency

func (tf *TokenFreq) Frequency() int

func (*TokenFreq) Size

func (tf *TokenFreq) Size() int

type TokenFrequencies

type TokenFrequencies map[string]*TokenFreq

TokenFrequencies maps document terms to their combined frequencies from all fields.

func TokenFrequency

func TokenFrequency(tokens TokenStream, arrayPositions []uint64, includeTermVectors bool) TokenFrequencies
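
A sketch of assumed usage: hand an analyzed TokenStream to TokenFrequency and read back per-term counts. arrayPositions is nil here (a plain, non-array field value is assumed) and includeTermVectors is set so per-occurrence TokenLocations are retained.

package main

import (
    "fmt"

    "github.com/blevesearch/bleve/analysis"
)

func main() {
    // A hand-built stream standing in for the output of an Analyzer.
    tokens := analysis.TokenStream{
        {Term: []byte("quick"), Start: 0, End: 5, Position: 1, Type: analysis.AlphaNumeric},
        {Term: []byte("fox"), Start: 6, End: 9, Position: 2, Type: analysis.AlphaNumeric},
        {Term: []byte("quick"), Start: 10, End: 15, Position: 3, Type: analysis.AlphaNumeric},
    }

    freqs := analysis.TokenFrequency(tokens, nil, true)
    for term, tf := range freqs {
        fmt.Printf("%s: %d occurrence(s), %d location(s)\n",
            term, tf.Frequency(), len(tf.Locations))
    }
}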

func (TokenFrequencies) MergeAll

func (tfs TokenFrequencies) MergeAll(remoteField string, other TokenFrequencies)

func (TokenFrequencies) Size

func (tfs TokenFrequencies) Size() int

type TokenLocation

type TokenLocation struct {
    Field          string
    ArrayPositions []uint64
    Start          int
    End            int
    Position       int
}

TokenLocation represents one occurrence of a term at a particular location in a field. Start, End and Position have the same meaning as in analysis.Token. Field and ArrayPositions identify the field value in the source document. See document.Field for details.

func (*TokenLocation) Size

func (tl *TokenLocation) Size() int

type TokenMap

type TokenMap map[string]bool

func NewTokenMap

func NewTokenMap() TokenMap

func (TokenMap) AddToken

func (t TokenMap) AddToken(token string)

func (TokenMap) LoadBytes

func (t TokenMap) LoadBytes(data []byte) error

LoadBytes reads in a list of tokens from memory, one per line. Comments are supported using `#` or `|`.

func (TokenMap) LoadFile

func (t TokenMap) LoadFile(filename string) error

LoadFile reads in a list of tokens from a text file, one per line. Comments are supported using `#` or `|`.

func (TokenMap) LoadLine

func (t TokenMap) LoadLine(line string)
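
A sketch of assumed usage: build a stop-word style TokenMap from an in-memory list (lines beginning with # are comments, per LoadBytes) and test membership; since TokenMap is just map[string]bool, a plain index works.

package main

import (
    "fmt"
    "log"

    "github.com/blevesearch/bleve/analysis"
)

func main() {
    stops := analysis.NewTokenMap()
    err := stops.LoadBytes([]byte("# a few English stop words\na\nan\nthe\n"))
    if err != nil {
        log.Fatal(err)
    }
    stops.AddToken("of") // individual tokens can be added directly

    fmt.Println(stops["the"]) // true
    fmt.Println(stops["fox"]) // false
}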

type TokenStream

type TokenStream []*Token

type TokenType

type TokenType int

const (
    AlphaNumeric TokenType = iota
    Ideographic
    Numeric
    DateTime
    Shingle
    Single
    Double
    Boolean
)
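
A small, hypothetical helper mapping the constants above to printable names, e.g. when dumping a TokenStream for debugging.

package tokdebug

import "github.com/blevesearch/bleve/analysis"

// typeName returns a printable label for a TokenType.
func typeName(t analysis.TokenType) string {
    switch t {
    case analysis.AlphaNumeric:
        return "alphanumeric"
    case analysis.Ideographic:
        return "ideographic"
    case analysis.Numeric:
        return "numeric"
    case analysis.DateTime:
        return "datetime"
    case analysis.Shingle:
        return "shingle"
    case analysis.Single:
        return "single"
    case analysis.Double:
        return "double"
    case analysis.Boolean:
        return "boolean"
    default:
        return "unknown"
    }
}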

type Tokenizer

type Tokenizer interface {
    Tokenize([]byte) TokenStream
}

A Tokenizer splits an input string into tokens, the usual behaviour being to map words to tokens.
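
A minimal, hypothetical Tokenizer in the spirit of tokenizer/single: it emits the entire input as one token instead of splitting it into words.

package singletok

import "github.com/blevesearch/bleve/analysis"

// wholeInputTokenizer satisfies analysis.Tokenizer by returning the entire
// input as a single AlphaNumeric token.
type wholeInputTokenizer struct{}

func (wholeInputTokenizer) Tokenize(input []byte) analysis.TokenStream {
    return analysis.TokenStream{
        &analysis.Token{
            Term:     input,
            Start:    0,
            End:      len(input),
            Position: 1,
            Type:     analysis.AlphaNumeric,
        },
    }
}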

Directories

Path    Synopsis
analyzer/custom
analyzer/keyword
analyzer/simple
analyzer/standard
analyzer/web
char/html
char/regexp
char/zerowidthnonjoiner
datetime/flexible
datetime/optional
lang/ar
lang/bg
lang/ca
lang/cjk
lang/ckb
lang/cs
lang/da
lang/de
lang/el
lang/en    Package en implements an analyzer with reasonable defaults for processing English text.
lang/es
lang/eu
lang/fa
lang/fi
lang/fr
lang/ga
lang/gl
lang/hi
lang/hu
lang/hy
lang/id
lang/in
lang/it
lang/nl
lang/no
lang/pt
lang/ro
lang/ru
lang/sv
lang/tr
token/apostrophe
token/camelcase
token/compound
token/edgengram
token/elision
tokenizer/character
tokenizer/exception    Package exception implements a Tokenizer which extracts pieces matched by a regular expression from the input data, delegates the rest to another tokenizer, then inserts the extracted parts back into the token stream.
tokenizer/letter
tokenizer/regexp
tokenizer/single
tokenizer/unicode
tokenizer/web
tokenizer/whitespace
token/keyword
token/length
token/lowercase    Package lowercase implements a TokenFilter which converts tokens to lower case according to unicode rules.
tokenmap    Package token_map implements a generic TokenMap, often used in conjunction with filters to remove or process specific tokens.
token/ngram
token/porter
token/shingle
token/snowball
token/stop    Package stop implements a TokenFilter removing tokens found in a TokenMap.
token/truncate
token/unicodenorm
token/unique

Package analysis imports 10 packages and is imported by 1112 packages. Updated 2018-10-17.