be_indexer

Version: v0.2.0
Published: Apr 6, 2023 License: MIT Imports: 11 Imported by: 0

README

Boolean Expression Index

ChangeLog

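20230325: support adding more than one logical expression on the same field within a single Conjunction

eg: {field in [1, 2, 3], not-in [2, 3, 4]} and .....
input field:4 ... => true
input field:3 ... => false  // i.e. not has the higher logical priority; stricter

The logical relationship between multiple fields in the same DNF has some edge cases and rests on assumptions about conflicting logic. This library takes the stricter interpretation of logical true, and the two implementations (roaringidx and be_indexer) behave consistently. For more detail, see the notes and examples in ./example/repeat_fields_test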

Boolean expression index

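The algorithm comes from the paper Boolean expression indexing; a copy (vldb09-indexing.pdf) is included with the code. For a diagram of the index data after it is built, see: boolean indexing arch. This library aims to solve the following kind of problem in a uniform, well-defined way: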

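# A library for this situation: a set of (boolean) rules describes the data; given an input, decide which rules are satisfied
# Ad/product retrieval is a typical example; it also fits some rule engines very well

dataset:
item1:  {city in [city1, city2] && isVIP is true}
item2:  {age > 18 && city not in [city1, city2] && isVIP is true}  #(eg: adult video)
.... millions of such entries; for even more data, shard it at the engineering level for faster retrieval

Given an input:
<=
city: beijing
age:  24
tag:  [xx-fans, xx, xx, xxx] # multi-valued feature
vip:  true|false

Retrieve all entries whose boolean conditions are satisfied by the input: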
=> [item1, itemn, .....]
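
To make the problem statement concrete, here is a minimal brute-force sketch of the matching semantics. It is not the indexed algorithm and not this library's API: the names expr, conj, doc and match are hypothetical, values are plain strings, and it assumes that a field missing from the input fails an "in" expression and passes a "not in" expression. The index exists to avoid this linear scan.

```go
package main

import "fmt"

// expr is one boolean expression: `field in values` or `field not in values`.
type expr struct {
	field   string
	include bool
	values  []string
}

// conj is a conjunction: every expression must hold.
type conj []expr

// doc is a DNF document: it matches when any of its conjunctions holds.
type doc struct {
	id    int
	conjs []conj
}

// hits reports whether any assigned value appears in the expression's values.
func hits(assigned, values []string) bool {
	for _, a := range assigned {
		for _, v := range values {
			if a == v {
				return true
			}
		}
	}
	return false
}

// satisfied evaluates one conjunction against the input assignments.
func satisfied(c conj, assigns map[string][]string) bool {
	for _, e := range c {
		// an include expression needs a hit, an exclude expression needs none
		if hits(assigns[e.field], e.values) != e.include {
			return false
		}
	}
	return true
}

// match scans every document and returns the ids of the satisfied ones.
func match(docs []doc, assigns map[string][]string) []int {
	var ids []int
	for _, d := range docs {
		for _, c := range d.conjs {
			if satisfied(c, assigns) {
				ids = append(ids, d.id)
				break // one satisfied conjunction is enough
			}
		}
	}
	return ids
}

func main() {
	docs := []doc{
		{id: 1, conjs: []conj{{
			{"city", true, []string{"city1", "city2"}},
			{"isVIP", true, []string{"true"}},
		}}},
		{id: 2, conjs: []conj{{
			{"city", false, []string{"city1", "city2"}},
			{"isVIP", true, []string{"true"}},
		}}},
	}
	assigns := map[string][]string{"city": {"beijing"}, "isVIP": {"true"}}
	fmt.Println(match(docs, assigns)) // prints [2]
}
```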

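Why write it:

  • The implementation by Wolfhead (a well-known expert) is not open source
  • The descriptions and implementations found online are vague and not fit for production use
  • Many modules in online advertising can be implemented very elegantly on top of it
  • The paper only describes the core algorithm; it gives no design advice for engineering practice and extensions such as multi-value queries

This library is a cleaned-up and improved version based on the logic of a C++ implementation. Because information is encoded and compressed, some limits apply; take care to stay within them:

  • Document IDs are limited to the range [-2^43, 2^43]
  • Custom Tokenizers are supported; see the definitions in parser
  • Container extensions are supported (eg: external kv storage); the default container uses the built-in map (hashmap)
  • Built-in pattern-matching container (based on an Aho-Corasick automaton; commonly used for contextual content retrieval and similar cases)
  • Built-in numeric container: supports the >, <, in/not, between operators, for boolean expressions over unbounded value ranges
    • In short, it supports boolean expressions such as: score > 20, x between l and r
    • Commonly used for numeric ranges that the business cannot easily turn into concrete enumerated values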

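Since the Aho-Corasick pattern-matching container was introduced, building the index can fail, so unrecoverable errors raise a panic. The integrating application has to recover from that panic itself and handle it in its business logic. For APIs that return an error, such as AddDocument, the caller decides whether to continue building the index. Currently, when a document contains one or more Conjunctions and the values of some Conjunction cannot be parsed by the Parser/Holder into the required data, the error is skipped and the corresponding document is not indexed. Use WithBadConjBehavior(Panic) to choose the behavior, one of ERR (default), Skip, Panic, so that such problems surface, or check the corresponding logs.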
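
The recover step described above can be sketched as follows. This is a generic Go pattern, not code from this library: safeBuild is a hypothetical helper, and its build argument stands in for whatever call builds the index.

```go
package main

import (
	"errors"
	"fmt"
)

// safeBuild runs build and converts a panic into an ordinary error, so the
// caller can decide how to react. build is a stand-in for the index build step.
func safeBuild(build func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("index build panicked: %v", r)
		}
	}()
	build()
	return nil
}

func main() {
	err := safeBuild(func() { panic(errors.New("bad pattern")) })
	fmt.Println("build error:", err)
}
```

Keeping the recover in one small wrapper lets the rest of the application treat a failed build like any other error.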

usage:

For detailed usage examples, see: be_indexer usage example

package main

import (
	"fmt"

	"github.com/echoface/be_indexer"
)

// buildTestDoc returns the documents to index;
// see document.go for how to construct a doc
func buildTestDoc() []*be_indexer.Document {
	return []*be_indexer.Document{}
}

func main() {
	builder := be_indexer.NewIndexerBuilder(
		be_indexer.WithBadConjBehavior(be_indexer.SkipBadConj),
	)
	// or use the compact version, which is about 12% faster than the default
	// builder := be_indexer.NewCompactIndexerBuilder()

	// optionally specify a holder/container for a field
	// a customized container can also be registered, see: entries_holder_factory.go
	builder.ConfigField("keyword", be_indexer.FieldOption{
		Container: be_indexer.HolderNameACMatcher,
	})

	for _, doc := range buildTestDoc() {
		// AddDocument returns an error; decide here whether to keep building
		if err := builder.AddDocument(doc); err != nil {
			fmt.Println("add document failed:", err)
		}
	}

	indexer := builder.BuildIndex()

	// retrieve the documents satisfied by these assignments
	assigns := map[be_indexer.BEField]be_indexer.Values{
		"age":  be_indexer.NewIntValues(1),
		"city": be_indexer.NewStrValues("sh", "bj"),
		"tag":  be_indexer.NewStrValues("tag1", "tagn"),
	}

	result, e := indexer.Retrieve(assigns,
		be_indexer.WithStepDetail(),
		be_indexer.WithDumpEntries())
	fmt.Println(e, result)
}

roaringidx roaring-bitmap based boolean expression indexing

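For design details, see: roaring boolean indexing design

A boolean index built on roaring bitmaps. Unlike the implementation from the Boolean expression indexing paper, it implements the DNF index logic by relying on the strength of bitmaps in set operations. It currently supports plain inverted lists and Aho-Corasick based string pattern matching. According to the benchmark results, it is somewhat slower than the Boolean expression index implementation when the number of fields is large, but roaringidx is easier to understand. Thanks to roaring bitmaps, it can also save a lot of memory when the number of documents is large and the number of features is small. The pattern-matching index data is likewise stored with an Aho-Corasick scheme based on a double-array trie.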

NOTE:

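  • Document ID range: [-2^56, 2^56]
  • The number of Conjunctions in a single document must be less than 256
  • Every field that appears in the business data must be configured before use

usage

For detailed usage examples, see: roaringidx usage example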


package main

import (
	"fmt"

	"github.com/echoface/be_indexer"
	"github.com/echoface/be_indexer/parser"
	"github.com/echoface/be_indexer/roaringidx"
	"github.com/echoface/be_indexer/util"
)

func main() {
	builder := roaringidx.NewIndexerBuilder()
	// every field used by a document ("age" as well) must be configured before indexing
	_ = builder.ConfigureField("package", roaringidx.FieldSetting{
		Container: roaringidx.ContainerNameDefault,
		Parser:    parser.NewStrHashParser(),
	})

	doc1 := be_indexer.NewDocument(1)
	doc1.AddConjunction(be_indexer.NewConjunction().
		Include("age", be_indexer.NewIntValues(10, 20, 100)).
		Exclude("package", be_indexer.NewStrValues("com.echoface.not")))

	builder.AddDocuments(doc1)

	indexer, err := builder.BuildIndexer()
	util.PanicIfErr(err, "should not err here")

	// a scanner holds the state of one retrieval; Reset it before reuse
	scanner := roaringidx.NewScanner(indexer)
	docs, err := scanner.Retrieve(map[be_indexer.BEField]be_indexer.Values{
		"age":     []int64{12, 20},
		"package": []interface{}{"com.echoface.be", "com.echoface.not"},
	})
	util.PanicIfErr(err, "retrieve fail, err:%v", err)
	fmt.Println("docs:", docs)
	fmt.Println("raw result:", roaringidx.FormatBitMapResult(scanner.GetRawResult().ToArray()))
	scanner.Reset()
}

Copyright (C) 2018, by gonghuan.dev.

Under the MIT License.

See the LICENSE file for details.

Documentation

Constants

const (
	HolderNameDefault     = "default"
	HolderNameACMatcher   = "ac_matcher"
	HolderNameExtendRange = "ext_range"
)
const (
	ErrorBadConj = 0
	SkipBadConj  = 1
	PanicBadConj = 2

	IndexerTypeDefault = IndexerType(0)
	IndexerTypeCompact = IndexerType(1)
)
const (
	DebugLevel = iota
	InfoLevel
	ErrorLevel
)
const (
	LinearSkipDistance = 8
)
const (
	WildcardFieldName = BEField("_Z_")
)

Variables

var (
	LogLevel int           = InfoLevel // control defaultLogger log level
	Logger   BEIndexLogger = &DefaultLogger{}
)
var (
	BetterToCacheMaxItemsCount = 512
)

Functions

func HasHolderBuilder

func HasHolderBuilder(name string) bool

func InitHolderDefaults added in v0.2.0

func InitHolderDefaults()

func LogDebug added in v0.2.0

func LogDebug(format string, v ...interface{})

func LogDebugIf

func LogDebugIf(condition bool, format string, v ...interface{})

func LogErr added in v0.2.0

func LogErr(format string, v ...interface{})

func LogErrIf

func LogErrIf(condition bool, format string, v ...interface{})

func LogIfErr

func LogIfErr(err error, format string, v ...interface{})

func LogInfo added in v0.2.0

func LogInfo(format string, v ...interface{})

func LogInfoIf

func LogInfoIf(condition bool, format string, v ...interface{})

func PrintIndexEntries added in v0.2.0

func PrintIndexEntries(index BEIndex)

func PrintIndexInfo added in v0.2.0

func PrintIndexInfo(index BEIndex)

func PutCollector

func PutCollector(c *DocIDCollector)

func RegisterEntriesHolder

func RegisterEntriesHolder(name string, builder HolderBuilder)

func ValidDocID added in v0.2.0

func ValidDocID(id DocID) bool

func ValidIdxOrSize added in v0.2.0

func ValidIdxOrSize(v int) bool

Types

type Assignments

type Assignments map[BEField]Values

func (Assignments) Size

func (ass Assignments) Size() (size int)

type BEField

type BEField string

type BEIndex

type BEIndex interface {

	// Retrieve scan index data and retrieve satisfied document
	Retrieve(queries Assignments, opt ...IndexOpt) (DocIDList, error)

	// RetrieveWithCollector scan index data and retrieve satisfied document
	RetrieveWithCollector(Assignments, ResultCollector, ...IndexOpt) error

	// DumpEntries debug api
	DumpEntries(sb *strings.Builder)

	DumpIndexInfo(sb *strings.Builder)
	// contains filtered or unexported methods
}

func NewKGroupsBEIndex added in v0.2.0

func NewKGroupsBEIndex() BEIndex

type BEIndexLogger

type BEIndexLogger interface {
	Debugf(format string, v ...interface{})
	Infof(format string, v ...interface{})
	Errorf(format string, v ...interface{})
}

type BadConjBehavior added in v0.2.0

type BadConjBehavior int

type BoolValues

type BoolValues struct {
	Incl     bool     `json:"inc"`                // include: true exclude: false
	Value    Values   `json:"value"`              // values that the parser can parse into ids
	Operator ValueOpt `json:"operator,omitempty"` // descriptor of the value space that Value covers, default: EQ
}

BoolValues expresses a boolean logic such as: (in) [15,16,17], (not in) [shanghai,yz]. Default opt: ValueOptEQ. The boolean description "include: [5, *)" is equivalent to "exclude: (-*, 5)".

func NewBoolValue added in v0.2.0

func NewBoolValue(op ValueOpt, value Values, incl bool) BoolValues

func NewGTBoolValue added in v0.2.0

func NewGTBoolValue(value int64) BoolValues

func NewLTBoolValue added in v0.2.0

func NewLTBoolValue(value int64) BoolValues

func (*BoolValues) JSONString

func (v *BoolValues) JSONString() string

func (*BoolValues) String

func (v *BoolValues) String() string

type BooleanExpr

type BooleanExpr struct {
	BoolValues
	Field BEField `json:"field"`
}

BooleanExpr expresses a boolean logic such as: age (in) [15,16,17], city (not in) [shanghai,yz]

func NewBoolExpr

func NewBoolExpr(field BEField, inc bool, v Values) *BooleanExpr

func NewBoolExpr2

func NewBoolExpr2(field BEField, expr BoolValues) *BooleanExpr

type BuilderOpt added in v0.2.0

type BuilderOpt func(builder *IndexerBuilder)

func WithBadConjBehavior added in v0.2.0

func WithBadConjBehavior(v BadConjBehavior) BuilderOpt

func WithCacheProvider added in v0.2.0

func WithCacheProvider(provider CacheProvider) BuilderOpt

func WithIndexerType added in v0.2.0

func WithIndexerType(t IndexerType) BuilderOpt

type BuilderOption added in v0.2.0

type BuilderOption struct {
	// contains filtered or unexported fields
}

type CacheProvider added in v0.2.0

type CacheProvider interface {
	// Reset expire all existing cache data
	Reset()

	Get(conjID ConjID) ([]byte, bool)

	Set(conjID ConjID, data []byte)
}

CacheProvider is an interface for a cache of encoded data keyed by ConjID

type CompactBEIndex added in v0.2.0

type CompactBEIndex struct {
	// contains filtered or unexported fields
}

func NewCompactedBEIndex

func NewCompactedBEIndex() *CompactBEIndex

func (*CompactBEIndex) DumpEntries added in v0.2.0

func (bi *CompactBEIndex) DumpEntries(sb *strings.Builder)

func (*CompactBEIndex) DumpIndexInfo added in v0.2.0

func (bi *CompactBEIndex) DumpIndexInfo(sb *strings.Builder)

DumpIndexInfo summary info about this indexer

+++++++ compact boolean indexing info +++++++++++
wildcard info: count: N
default holder: {name:%s value_count:%d, max_entries:%d avg_entries:%d}
field holder:
>field:%s {name: %s, value_count:%d max_entries:%d avg_entries:%d}
>field:%s {name: %s, value_count:%d max_entries:%d avg_entries:%d}

func (*CompactBEIndex) Retrieve added in v0.2.0

func (bi *CompactBEIndex) Retrieve(
	queries Assignments, opts ...IndexOpt) (result DocIDList, err error)

func (*CompactBEIndex) RetrieveWithCollector added in v0.2.0

func (bi *CompactBEIndex) RetrieveWithCollector(
	queries Assignments, collector ResultCollector, opts ...IndexOpt) (err error)

type ConjID

type ConjID uint64

ConjID supports a maximum length of 60 bits: |--[ reserved(4bit) | size(8bit) | index(8bit) | negSign(1bit) | docID(43bit)]

func NewConjID

func NewConjID(docID DocID, index, size int) ConjID

NewConjID |--[ reserved(4bit) | size(8bit) | index(8bit) | negSign(1bit) | docID(43bit)]
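
The bit layout above can be illustrated with a small packing sketch. It is not the library's code: packConjID and unpackConjID are hypothetical names, and the sketch assumes the fields sit in exactly the order shown, from high bits to low bits, with the sign kept in negSign and the absolute value in docID.

```go
package main

import "fmt"

const (
	docIDBits = 43
	docIDMask = uint64(1)<<docIDBits - 1 // low 43 bits
)

// packConjID packs the three parts as
// size(8bit) | index(8bit) | negSign(1bit) | docID(43bit); the top 4 bits stay zero.
// Assumption: the sign is stored separately and docID holds the absolute value.
func packConjID(docID int64, index, size int) uint64 {
	var neg uint64
	if docID < 0 {
		neg = 1
		docID = -docID
	}
	return uint64(size&0xFF)<<52 | uint64(index&0xFF)<<44 | neg<<43 | uint64(docID)&docIDMask
}

// unpackConjID reverses packConjID.
func unpackConjID(id uint64) (docID int64, index, size int) {
	docID = int64(id & docIDMask)
	if id>>43&1 == 1 {
		docID = -docID
	}
	return docID, int(id >> 44 & 0xFF), int(id >> 52 & 0xFF)
}

func main() {
	id := packConjID(-12345, 3, 7)
	fmt.Println(unpackConjID(id)) // prints -12345 3 7
}
```

With 8 bits each for size and index, such a layout leaves room for at most 256 distinct values in either field, and the 4 reserved bits leave space for the shift that EntryID applies.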

func (ConjID) DocID

func (id ConjID) DocID() DocID

func (ConjID) Index

func (id ConjID) Index() int

func (ConjID) Size

func (id ConjID) Size() int

func (ConjID) String added in v0.2.0

func (id ConjID) String() string

type Conjunction

type Conjunction struct {
	Expressions map[BEField][]*BoolValues `json:"exprs"` // duplicate Fields are not allowed within one Conj
}

func NewConjunction

func NewConjunction() *Conjunction

func (*Conjunction) AddBoolExprs

func (conj *Conjunction) AddBoolExprs(exprs ...*BooleanExpr) *Conjunction

AddBoolExprs appends boolean expressions; the same field must not be added twice in one conjunction

func (*Conjunction) AddExpression3

func (conj *Conjunction) AddExpression3(field string, include bool, values Values) *Conjunction

func (*Conjunction) Between added in v0.2.0

func (conj *Conjunction) Between(field BEField, l, h int64) *Conjunction

func (*Conjunction) CalcConjSize

func (conj *Conjunction) CalcConjSize() (size int)

func (*Conjunction) Exclude

func (conj *Conjunction) Exclude(field BEField, values Values) *Conjunction

func (*Conjunction) ExpressionCount added in v0.2.0

func (conj *Conjunction) ExpressionCount() (size int)

func (*Conjunction) GreatThan added in v0.2.0

func (conj *Conjunction) GreatThan(field BEField, value int64) *Conjunction

func (*Conjunction) In

func (conj *Conjunction) In(field BEField, values Values) *Conjunction

In: a match on any value in values makes the expression **true**

func (*Conjunction) Include

func (conj *Conjunction) Include(field BEField, values Values) *Conjunction

func (*Conjunction) JSONString

func (conj *Conjunction) JSONString() string

func (*Conjunction) LessThan added in v0.2.0

func (conj *Conjunction) LessThan(field BEField, value int64) *Conjunction

func (*Conjunction) NotIn

func (conj *Conjunction) NotIn(field BEField, values Values) *Conjunction

NotIn: a match on any value in values makes the expression **false**

func (*Conjunction) String

func (conj *Conjunction) String() string

type DefaultEntriesHolder

type DefaultEntriesHolder struct {
	Parser      parser.FieldValueParser
	FieldParser map[BEField]parser.FieldValueParser
	// contains filtered or unexported fields
}

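DefaultEntriesHolder is the EntriesHolder implementation based on a hash map: map<key, Entries>. It is the default container and currently supports at most 256 fields in expressions. Multiple fields can share the default container (see the Key encoding logic). To get past this limit, implement your own container.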

func NewDefaultEntriesHolder

func NewDefaultEntriesHolder() *DefaultEntriesHolder

func (*DefaultEntriesHolder) CommitIndexingBETx added in v0.2.0

func (h *DefaultEntriesHolder) CommitIndexingBETx(tx IndexingBETx) error

func (*DefaultEntriesHolder) CompileEntries

func (h *DefaultEntriesHolder) CompileEntries() error

func (*DefaultEntriesHolder) DecodeTxData added in v0.2.0

func (h *DefaultEntriesHolder) DecodeTxData(data []byte) (TxData, error)

DecodeTxData decodes data; used for the build-progress cache

func (*DefaultEntriesHolder) DumpEntries

func (h *DefaultEntriesHolder) DumpEntries(buffer *strings.Builder)

func (*DefaultEntriesHolder) DumpInfo added in v0.2.0

func (h *DefaultEntriesHolder) DumpInfo(buffer *strings.Builder)

DumpInfo {name: %s, value_count:%d max_entries:%d avg_entries:%d}

func (*DefaultEntriesHolder) EnableDebug

func (h *DefaultEntriesHolder) EnableDebug(debug bool)

func (*DefaultEntriesHolder) GetEntries

func (h *DefaultEntriesHolder) GetEntries(field *FieldDesc, assigns Values) (r EntriesCursors, e error)

func (*DefaultEntriesHolder) GetParser added in v0.2.0

func (*DefaultEntriesHolder) IndexingBETx added in v0.2.0

func (h *DefaultEntriesHolder) IndexingBETx(field *FieldDesc, bv *BoolValues) (TxData, error)

type DefaultLogger

type DefaultLogger struct {
}

DefaultLogger is a console logger that uses the fmt package

func (*DefaultLogger) Debugf

func (l *DefaultLogger) Debugf(format string, v ...interface{})

func (*DefaultLogger) Errorf

func (l *DefaultLogger) Errorf(format string, v ...interface{})

func (*DefaultLogger) Infof

func (l *DefaultLogger) Infof(format string, v ...interface{})

type DocID

type DocID int64

type DocIDCollector

type DocIDCollector struct {
	// contains filtered or unexported fields
}

DocIDCollector is the default collector; it removes duplicated docs
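
Duplicates can arise because one document may have several conjunctions satisfied by the same input. A minimal sketch of a deduplicating collector, assuming a plain map is acceptable; dedupCollector and its methods are hypothetical and do not reflect how DocIDCollector is implemented internally.

```go
package main

import "fmt"

// dedupCollector keeps each document id once, in first-seen order.
type dedupCollector struct {
	seen map[int64]struct{}
	ids  []int64
}

func newDedupCollector() *dedupCollector {
	return &dedupCollector{seen: map[int64]struct{}{}}
}

// add records docID; the id of the conjunction that matched is ignored.
func (c *dedupCollector) add(docID int64, _ uint64) {
	if _, ok := c.seen[docID]; ok {
		return
	}
	c.seen[docID] = struct{}{}
	c.ids = append(c.ids, docID)
}

func main() {
	c := newDedupCollector()
	c.add(1, 10) // doc 1 matched through one conjunction
	c.add(1, 11) // and again through another
	c.add(2, 20)
	fmt.Println(c.ids) // prints [1 2]
}
```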

func NewDocIDCollector

func NewDocIDCollector() *DocIDCollector

func PickCollector

func PickCollector() *DocIDCollector

func (*DocIDCollector) Add

func (c *DocIDCollector) Add(docID DocID, _ ConjID)

func (*DocIDCollector) DocCount

func (c *DocIDCollector) DocCount() int

func (*DocIDCollector) GetDocIDs

func (c *DocIDCollector) GetDocIDs() (ids DocIDList)

func (*DocIDCollector) GetDocIDsInto

func (c *DocIDCollector) GetDocIDsInto(ids *DocIDList)

func (*DocIDCollector) Reset

func (c *DocIDCollector) Reset()

type DocIDList

type DocIDList []DocID

func (DocIDList) Contain

func (s DocIDList) Contain(id DocID) bool

func (DocIDList) Len

func (s DocIDList) Len() int

Len sort API

func (DocIDList) Less

func (s DocIDList) Less(i, j int) bool

func (DocIDList) Sub

func (s DocIDList) Sub(other DocIDList) (r DocIDList)

func (DocIDList) Swap

func (s DocIDList) Swap(i, j int)

type Document

type Document struct {
	ID   DocID          `json:"id"`   // only up to max-int32 Docs are supported
	Cons []*Conjunction `json:"cons"` // conjunctions are OR-ed together; see the paper for the full description
}

func NewDocument

func NewDocument(id DocID) *Document

func (*Document) AddConjunction

func (doc *Document) AddConjunction(cons ...*Conjunction) *Document

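AddConjunction adds a complete group of expressions; each group must be one complete combination of conditions in the DNF boolean expression that describes the document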

func (*Document) AddConjunctions

func (doc *Document) AddConjunctions(conj *Conjunction, others ...*Conjunction) *Document

func (*Document) JSONString

func (doc *Document) JSONString() string

func (*Document) String

func (doc *Document) String() string

String returns a more compact string

type Entries

type Entries []EntryID

Entries is a type defined for sorting

func (Entries) DocString

func (s Entries) DocString() []string

func (Entries) Len

func (s Entries) Len() int

Len Entries sort API

func (Entries) Less

func (s Entries) Less(i, j int) bool

func (Entries) Swap

func (s Entries) Swap(i, j int)

type EntriesContainer

type EntriesContainer struct {
	// contains filtered or unexported fields
}

EntriesContainer: the default EntriesHolder can hold the entries of different fields, while an ACMatcher or other holder may only hold the entries of one field

func (*EntriesContainer) CreateHolder added in v0.2.0

func (c *EntriesContainer) CreateHolder(desc *FieldDesc) EntriesHolder

func (*EntriesContainer) DumpEntries added in v0.2.0

func (c *EntriesContainer) DumpEntries(buf *strings.Builder)

func (*EntriesContainer) DumpInfo added in v0.2.0

func (c *EntriesContainer) DumpInfo(buf *strings.Builder)

DumpInfo output format:

default holder: {name:%s value_count:%d, max_entries:%d avg_entries:%d}
field holder:
>field:%s {name: %s, value_count:%d max_entries:%d avg_entries:%d}
>field:%s {name: %s, value_count:%d max_entries:%d avg_entries:%d}

func (*EntriesContainer) GetHolder added in v0.2.0

func (c *EntriesContainer) GetHolder(desc *FieldDesc) EntriesHolder

type EntriesCursor

type EntriesCursor struct {
	// contains filtered or unexported fields
}

EntriesCursor represents a posting list for one Assign

(age, 15): [1, 2, 5, 19, 22]
cursor:     ^
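
A cursor over a sorted posting list is typically advanced with a skip operation. The sketch below shows one plausible meaning of SkipTo (move to the first entry that is greater than or equal to the target, never backwards), implemented with a binary search from the standard sort package. The cursor type, nullID and the exact semantics are assumptions for illustration, not the library's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// nullID marks "past the end"; it is larger than every real entry.
const nullID = ^uint64(0)

// cursor walks a posting list that is sorted in ascending order.
type cursor struct {
	entries []uint64
	pos     int
}

// current returns the entry under the cursor, or nullID at the end.
func (c *cursor) current() uint64 {
	if c.pos >= len(c.entries) {
		return nullID
	}
	return c.entries[c.pos]
}

// skipTo advances to the first entry >= id and returns it.
// It searches only the remaining part, so the cursor never moves backwards.
func (c *cursor) skipTo(id uint64) uint64 {
	rest := c.entries[c.pos:]
	c.pos += sort.Search(len(rest), func(i int) bool { return rest[i] >= id })
	return c.current()
}

func main() {
	c := cursor{entries: []uint64{1, 2, 5, 19, 22}}
	fmt.Println(c.skipTo(3), c.skipTo(20), c.skipTo(23) == nullID) // prints 5 22 true
}
```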

func NewEntriesCursor

func NewEntriesCursor(key QKey, entries Entries) EntriesCursor

func (*EntriesCursor) DumpEntries

func (ec *EntriesCursor) DumpEntries(sb *strings.Builder)

DumpEntries: a posting list normally holds thousands or millions of ids, so only the ids near the current cursor are dumped, eg: [age,12]^<2,false>:<1,true>,<2,false><nil,nil>

func (*EntriesCursor) GetCurEntryID

func (ec *EntriesCursor) GetCurEntryID() EntryID

func (*EntriesCursor) SkipTo

func (ec *EntriesCursor) SkipTo(id EntryID) EntryID

type EntriesCursors added in v0.2.0

type EntriesCursors []EntriesCursor

type EntriesHolder

type EntriesHolder interface {
	EnableDebug(debug bool)

	DumpInfo(buffer *strings.Builder)

	DumpEntries(buffer *strings.Builder)

	// GetEntries retrieves all satisfied PostingLists from the holder
	GetEntries(field *FieldDesc, assigns Values) (EntriesCursors, error)

	// IndexingBETx: the holder tokenizes/parses values into the data it needs,
	// then waits for IndexerBuilder to call CommitIndexingBETx to apply 'Data' into
	// the holder once all expressions in a conjunction are prepared successfully
	IndexingBETx(field *FieldDesc, bv *BoolValues) (TxData, error)

	// CommitIndexingBETx NOTE: the builder panics when an error is returned,
	// because a partially indexed conjunction would cause logic errors
	CommitIndexingBETx(tx IndexingBETx) error

	// DecodeTxData decodes data; used for the build-progress cache
	DecodeTxData(data []byte) (TxData, error)

	// CompileEntries finalizes the entries for querying (build or sort them);
	// according to the paper, entries must be sorted
	CompileEntries() error
}

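EntriesHolder stores the PostingList data of the index. Three typical scenarios at present:

 1. In-memory KV: stores the EntryID list (PostingList) for every Field value
 2. AC automaton: builds an Aho-Corasick automaton from all values and finds the matching PostingLists for an input sentence
 3. An extension of 1: backed by network/disk storage, with an internal LRU/LFU cache to relieve memory pressure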
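
The first scenario, an in-memory KV holder, can be sketched as a map from (field, value) to a sorted posting list. Everything here is hypothetical (memHolder, key, and the method names); the real EntriesHolder interface works with FieldDesc, Values and TxData rather than plain strings.

```go
package main

import (
	"fmt"
	"sort"
)

// key identifies one posting list: a field together with one of its values.
type key struct{ field, value string }

// memHolder keeps every posting list in a Go map.
type memHolder struct {
	lists map[key][]uint64
}

func newMemHolder() *memHolder {
	return &memHolder{lists: map[key][]uint64{}}
}

// add appends an entry id to the posting list of (field, value).
func (h *memHolder) add(field, value string, entry uint64) {
	k := key{field, value}
	h.lists[k] = append(h.lists[k], entry)
}

// compile sorts every posting list so it can be scanned with cursors.
func (h *memHolder) compile() {
	for _, l := range h.lists {
		sort.Slice(l, func(i, j int) bool { return l[i] < l[j] })
	}
}

// get returns one posting list per assigned value that has entries.
func (h *memHolder) get(field string, values ...string) [][]uint64 {
	var out [][]uint64
	for _, v := range values {
		k := key{field, v}
		if l, ok := h.lists[k]; ok {
			out = append(out, l)
		}
	}
	return out
}

func main() {
	h := newMemHolder()
	h.add("city", "bj", 9)
	h.add("city", "bj", 3)
	h.add("city", "sh", 5)
	h.compile()
	fmt.Println(h.get("city", "bj", "gz")) // prints [[3 9]]
}
```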

func NewEntriesHolder

func NewEntriesHolder(name string) EntriesHolder

type EntryID

type EntryID uint64

EntryID [--ConjID(60bit)--|--empty(3bit)--|--incl/excl(1bit)--]

const (
	MaxDocID = 0x7FFFFFFFFFF

	NULLENTRY EntryID = 0xFFFFFFFFFFFFFFFF
)

func NewEntryID

func NewEntryID(id ConjID, incl bool) EntryID

NewEntryID encodes an entry id

|-- ConjID(60bit) --|-- empty(3bit) --|--incl/excl(1bit) --|
|--[ size(8bit) | index(8bit) | negSign(1bit) | docID(43bit)]--|-- empty(3bit) --|--incl/excl(1bit) --|
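
A sketch of this encoding, under the assumption that the include/exclude flag is the lowest bit with 1 meaning include (the layout does not state which value means which). The names packEntryID, conjIDOf and isInclude are hypothetical.

```go
package main

import "fmt"

// nullEntry is all ones, so it sorts after every real entry.
const nullEntry = ^uint64(0)

// packEntryID shifts the 60-bit conjunction id left by 4 bits (3 empty bits
// plus the flag bit) and stores include/exclude in the lowest bit.
// Assumption: 1 = include, 0 = exclude.
func packEntryID(conjID uint64, incl bool) uint64 {
	e := conjID << 4
	if incl {
		e |= 1
	}
	return e
}

// conjIDOf recovers the conjunction id from an entry id.
func conjIDOf(e uint64) uint64 { return e >> 4 }

// isInclude reports whether the entry carries the include flag.
func isInclude(e uint64) bool { return e&1 == 1 }

func main() {
	e := packEntryID(42, true)
	fmt.Println(conjIDOf(e), isInclude(e)) // prints 42 true
}
```

Under this assumption, plain integer ordering groups the entries of one conjunction together and places its exclude entry before its include entry, which is convenient when posting lists must be sorted.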

func (EntryID) DocString

func (entry EntryID) DocString() string

func (EntryID) GetConjID

func (entry EntryID) GetConjID() ConjID

func (EntryID) IsExclude

func (entry EntryID) IsExclude() bool

func (EntryID) IsInclude

func (entry EntryID) IsInclude() bool

func (EntryID) IsNULLEntry

func (entry EntryID) IsNULLEntry() bool

type FieldCursor

type FieldCursor struct {
	// contains filtered or unexported fields
}

FieldCursor for a boolean expression: {"tag", "in", [1, 2, 3]}

tag_2: [ID5]
tag_1: [ID1, ID2, ID7]

func NewFieldCursor

func NewFieldCursor(cursors ...EntriesCursor) FieldCursor

func (*FieldCursor) DumpCursorEntryID added in v0.2.0

func (fc *FieldCursor) DumpCursorEntryID(sb *strings.Builder)

func (*FieldCursor) DumpEntries

func (fc *FieldCursor) DumpEntries(sb *strings.Builder)

func (*FieldCursor) GetCurEntryID

func (fc *FieldCursor) GetCurEntryID() EntryID

func (*FieldCursor) ReachEnd

func (fc *FieldCursor) ReachEnd() bool

func (*FieldCursor) SkipTo

func (fc *FieldCursor) SkipTo(id EntryID) (newMin EntryID)

type FieldCursors

type FieldCursors []FieldCursor

func (FieldCursors) Dump

func (s FieldCursors) Dump() string

func (FieldCursors) DumpJustCursors added in v0.2.0

func (s FieldCursors) DumpJustCursors() string

func (FieldCursors) Len

func (s FieldCursors) Len() int

Len FieldCursors sort API

func (FieldCursors) Less

func (s FieldCursors) Less(i, j int) bool

func (FieldCursors) Sort

func (s FieldCursors) Sort()

Sort: Go's built-in sort.Sort has noticeable performance overhead (runtime convTSlice), so a simple insertion sort is used here instead, because there are not many elements. A quicksort may replace it later.

func (FieldCursors) Swap

func (s FieldCursors) Swap(i, j int)

type FieldDesc added in v0.2.0

type FieldDesc struct {
	FieldOption

	ID    uint64
	Field BEField
}

type FieldOption

type FieldOption struct {
	Container string // specify Entries holder for all tokenized value Entries
}

type HolderBuilder

type HolderBuilder func() EntriesHolder

type IndexOpt

type IndexOpt func(ctx *retrieveContext)

func WithCollector

func WithCollector(fn ResultCollector) IndexOpt

WithCollector specify a user defined collector

func WithDumpEntries

func WithDumpEntries() IndexOpt

func WithStepDetail

func WithStepDetail() IndexOpt

type IndexerBuilder

type IndexerBuilder struct {
	BuilderOption
	// contains filtered or unexported fields
}

func NewCompactIndexerBuilder

func NewCompactIndexerBuilder(opts ...BuilderOpt) *IndexerBuilder

func NewIndexerBuilder

func NewIndexerBuilder(opts ...BuilderOpt) *IndexerBuilder

func (*IndexerBuilder) AddDocument

func (b *IndexerBuilder) AddDocument(docs ...*Document) error

func (*IndexerBuilder) BuildIndex

func (b *IndexerBuilder) BuildIndex() BEIndex

func (*IndexerBuilder) ConfigField

func (b *IndexerBuilder) ConfigField(field BEField, settings FieldOption)

func (*IndexerBuilder) Reset added in v0.2.0

func (b *IndexerBuilder) Reset()

type IndexerSettings

type IndexerSettings struct {
	FieldConfig map[BEField]FieldOption
}

type IndexerType added in v0.2.0

type IndexerType int

type IndexingBETx added in v0.2.0

type IndexingBETx struct {
	EID  EntryID
	Data TxData
	// contains filtered or unexported fields
}

type KGroupsBEIndex added in v0.2.0

type KGroupsBEIndex struct {
	// contains filtered or unexported fields
}

func (*KGroupsBEIndex) DumpEntries added in v0.2.0

func (bi *KGroupsBEIndex) DumpEntries(sb *strings.Builder)

func (*KGroupsBEIndex) DumpIndexInfo added in v0.2.0

func (bi *KGroupsBEIndex) DumpIndexInfo(sb *strings.Builder)

func (*KGroupsBEIndex) Retrieve added in v0.2.0

func (bi *KGroupsBEIndex) Retrieve(
	queries Assignments, opts ...IndexOpt) (result DocIDList, err error)

func (*KGroupsBEIndex) RetrieveWithCollector added in v0.2.0

func (bi *KGroupsBEIndex) RetrieveWithCollector(
	queries Assignments, collector ResultCollector, opts ...IndexOpt) (err error)

type QKey

type QKey struct {
	// contains filtered or unexported fields
}

func NewQKey added in v0.2.0

func NewQKey(field BEField, v interface{}) QKey

func (*QKey) String

func (key *QKey) String() string

type ResultCollector

type ResultCollector interface {
	Add(id DocID, conj ConjID)

	GetDocIDs() (ids DocIDList)

	GetDocIDsInto(ids *DocIDList)
}

type Term added in v0.2.0

type Term struct {
	FieldID uint64
	IDValue uint64
}

func NewTerm added in v0.2.0

func NewTerm(fid, idValue uint64) Term

func (Term) String added in v0.2.0

func (tm Term) String() string

type TxData added in v0.2.0

type TxData interface {
	// BetterToCache reports whether the txData is big enough to be worth caching;
	// the builder checks all expressions in a conjunction and decides whether to cache it
	BetterToCache() bool

	// Encode serialize TxData for caching
	Encode() ([]byte, error)
}

type Uint64TxData added in v0.2.0

type Uint64TxData cache.Uint64ListValues

func (*Uint64TxData) BetterToCache added in v0.2.0

func (txd *Uint64TxData) BetterToCache() bool

func (*Uint64TxData) Encode added in v0.2.0

func (txd *Uint64TxData) Encode() ([]byte, error)

type ValueOpt added in v0.2.0

type ValueOpt int

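ValueOpt is a descriptor for numeric values. Note that it is kept distinct from the final boolean logic operator. It describes a set of values: >5 stands for all values in [5, *) of the value space; only combined with the Incl/Excl of the boolean description does it form a boolean description. In short, it describes "which values" exist.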

const (
	// ValueOptEQ ... descriptors for numeric value ranges
	ValueOptEQ      ValueOpt = 0
	ValueOptGT      ValueOpt = 1
	ValueOptLT      ValueOpt = 2
	ValueOptBetween ValueOpt = 3
)
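
The split between "which values" and include/exclude can be sketched as below. The types and names (valueOp, valueDesc, covers, holds) are hypothetical, and the boundary rules are assumptions: GT and LT are treated as strict and Between as inclusive on both ends, which the documentation above does not pin down.

```go
package main

import "fmt"

type valueOp int

const (
	opEQ valueOp = iota
	opGT
	opLT
	opBetween
)

// valueDesc describes a set of values: an operator plus its operands.
type valueDesc struct {
	op   valueOp
	a, b int64 // b is used only by opBetween
}

// covers reports whether v belongs to the described set.
// Assumptions: GT/LT are strict, Between includes both ends.
func (d valueDesc) covers(v int64) bool {
	switch d.op {
	case opGT:
		return v > d.a
	case opLT:
		return v < d.a
	case opBetween:
		return d.a <= v && v <= d.b
	default:
		return v == d.a
	}
}

// holds combines the value description with include/exclude to get a boolean.
func holds(d valueDesc, incl bool, v int64) bool {
	return d.covers(v) == incl
}

func main() {
	gt4 := valueDesc{op: opGT, a: 4}
	lt5 := valueDesc{op: opLT, a: 5}
	// over integers, "include > 4" and "exclude < 5" accept the same inputs
	for _, v := range []int64{3, 4, 5, 6} {
		fmt.Println(v, holds(gt4, true, v), holds(lt5, false, v))
	}
}
```

The loop shows the same kind of equivalence as the "include: [5, *)" versus "exclude: (-*, 5)" remark on BoolValues: one value description with include can be restated as the complementary description with exclude.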

type Values

type Values interface{}

func NewInt32Values

func NewInt32Values(v int32, o ...int32) Values

func NewInt64Values

func NewInt64Values(v int64, o ...int64) Values

func NewIntValues

func NewIntValues(v int, o ...int) Values

func NewStrValues

func NewStrValues(v string, ss ...string) Values
