Juno(朱诺)
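The Juno Jupiter probe is the fastest spacecraft humans have built to date.
Here, Juno is a general-purpose, easy-to-use, high-performance in-memory ad retrieval engine.
Goals
- Generality: applicable to most ad retrieval scenarios
- Ease of use: a search engine can be built from scratch at very low cost
- High performance: search latency within 20 ms; more than 1-2K QPS on a single machine
- Pluggable and extensible: every retrieval module is exposed as an interface and can easily be customized as needed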
Examples
Building an index in code
func main() {
    // build index
    idx := index.NewIndex("default")
    _ = idx.Add(&document.DocInfo{
        Id: 1,
        Fields: []*document.Field{
            {Name: "field1", IndexType: document.InvertedIndexType, Value: int64(1), ValueType: document.IntFieldType},
            {Name: "field2", IndexType: document.InvertedIndexType, Value: "abc", ValueType: document.StringFieldType},
        },
    })
    // search
    s := search.NewSearcher()
    s.Search(idx, query.NewTermQuery(idx.GetInvertedIndex().Iterator("field1", "1")))
    fmt.Println(s.Docs)
}
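To make the idea behind the term query above more concrete, here is a minimal, self-contained sketch of an in-memory inverted index. It is an illustration only, not Juno's implementation: the `invertedIndex` type and its `add` and `lookup` methods are hypothetical names invented for this sketch.

```go
package main

import "fmt"

// invertedIndex maps field name -> term -> IDs of the documents that
// contain the term. Hypothetical illustration type, not part of Juno.
type invertedIndex map[string]map[string][]int64

// add records that document id has the given term in the given field.
// IDs are assumed to be added in ascending order, so each posting list
// stays sorted.
func (idx invertedIndex) add(field, term string, id int64) {
	if idx[field] == nil {
		idx[field] = make(map[string][]int64)
	}
	idx[field][term] = append(idx[field][term], id)
}

// lookup returns the posting list for field=term, or nil if nothing matches.
func (idx invertedIndex) lookup(field, term string) []int64 {
	return idx[field][term]
}

func main() {
	idx := invertedIndex{}
	idx.add("field1", "1", 1)
	idx.add("field2", "abc", 1)
	idx.add("field1", "1", 2)

	// A term query is then a single map lookup.
	fmt.Println(idx.lookup("field1", "1"))
	fmt.Println(idx.lookup("field2", "xyz"))
}
```

Keeping posting lists sorted by document ID is a common design choice because it lets compound queries be merged in linear time.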
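Building an index from MongoDB
Juno has built-in support for reading data from MongoDB to build an index, in two modes: full and incremental.
The user only needs to implement the MongoParser interface, which converts one MongoDB record into a DocInfo.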
// result of parsing one MongoDB record
type ParserResult struct {
    DataMod DataMod
    Value   *document.DocInfo
}
// MongoDB record parser
type MongoParser interface {
    Parse([]byte, interface{}) *ParserResult
}
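As a rough sketch of what a parser with this shape might look like, the example below decodes one raw record and maps it to a `ParserResult`. Several things here are assumptions made so that the sketch runs on its own: the types are local stand-ins for the ones named above, the `DataAddOrUpdate` and `DataDelete` values are hypothetical, the record layout (`campaignId`, `status`, `price`) is invented, and JSON is used in place of the raw MongoDB bytes a real parser would receive.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Local stand-ins (assumptions) for the types named in the text.
type DataMod int

const (
	DataAddOrUpdate DataMod = iota // hypothetical value
	DataDelete                     // hypothetical value
)

type Field struct {
	Name  string
	Value interface{}
}

type DocInfo struct {
	Id     int64
	Fields []*Field
}

type ParserResult struct {
	DataMod DataMod
	Value   *DocInfo
}

// campaign is a hypothetical record layout used only for this sketch.
type campaign struct {
	CampaignID int64   `json:"campaignId"`
	Status     int     `json:"status"`
	Price      float64 `json:"price"`
}

// CampaignParser has the same method shape as the MongoParser interface.
type CampaignParser struct{}

// Parse decodes one raw record. Records that cannot be decoded are skipped
// by returning nil; records whose status is not 1 are marked for deletion
// (an assumption about how inactive campaigns should be handled).
func (CampaignParser) Parse(raw []byte, userData interface{}) *ParserResult {
	var c campaign
	if err := json.Unmarshal(raw, &c); err != nil {
		return nil
	}
	mod := DataAddOrUpdate
	if c.Status != 1 {
		mod = DataDelete
	}
	return &ParserResult{
		DataMod: mod,
		Value: &DocInfo{
			Id:     c.CampaignID,
			Fields: []*Field{{Name: "price", Value: c.Price}},
		},
	}
}

func main() {
	p := CampaignParser{}
	r := p.Parse([]byte(`{"campaignId": 7, "status": 1, "price": 12.5}`), nil)
	fmt.Println(r.DataMod, r.Value.Id, r.Value.Fields[0].Value)
}
```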
// example2 build index with mongo
func main() {
    ctx, cancel := context.WithCancel(context.Background())
    defer cancel()
    // build index
    b, e := builder.NewMongoIndexBuilder(&builder.MongoIndexManagerOps{
        URI:            "mongodb://13.250.108.190:27017",
        IncInterval:    5,   // incremental build interval
        BaseInterval:   120, // full build interval
        IncParser:      &CampaignParser{},
        BaseParser:     &CampaignParser{},
        BaseQuery:      bson.M{"status": 1},
        IncQuery:       bson.M{"updated": bson.M{"$gte": time.Now().Unix() - 5, "$lte": time.Now().Unix()}},
        DB:             "new_adn",
        Collection:     "campaign",
        ConnectTimeout: 10000,
        ReadTimeout:    20000,
        UserData:       &UserData{},
        Logger:         logrus.New(),
        OnBeforeInc: func(userData interface{}) interface{} {
            ud, ok := userData.(*UserData)
            if !ok {
                return nil
            }
            incQuery := bson.M{"updated": bson.M{"$gte": ud.upTime - 5, "$lte": time.Now().Unix()}}
            return incQuery
        },
    })
    if e != nil {
        fmt.Println(e)
        return
    }
    if e := b.Build(ctx, "indexName"); e != nil {
        fmt.Println("build error", e.Error())
    }
    // get the built index
    tIndex := b.GetIndex()
    _ = tIndex // use the index for searching here
}
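The `IncQuery` value in the example is computed once, when the options struct is built, while the `OnBeforeInc` hook appears to recompute the time window before an incremental run. The sketch below isolates that window computation. The `incWindow` helper and its `overlap` parameter are hypothetical names, and a plain `map[string]interface{}` stands in for `bson.M` so the code runs without a MongoDB driver.

```go
package main

import (
	"fmt"
	"time"
)

// incWindow builds a filter matching records whose "updated" timestamp lies
// in [lastUpdate-overlap, now]. The overlap re-reads a few seconds of data so
// that records written right at the boundary are not missed.
func incWindow(lastUpdate, now, overlap int64) map[string]interface{} {
	return map[string]interface{}{
		"updated": map[string]interface{}{
			"$gte": lastUpdate - overlap,
			"$lte": now,
		},
	}
}

func main() {
	now := time.Now().Unix()
	// Assume the previous incremental run finished 5 seconds ago.
	fmt.Println(incWindow(now-5, now, 5))
}
```

Because the window overlaps the previous one, the same record can be read twice, so this approach assumes that re-adding a document is idempotent.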
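Query syntax
Juno currently supports two query syntaxes, SQL-like and Go struct. It also supports a debug mode that reports why a specific document was not recalled.
Reserved words: where, search, index
SQL-like query syntax examples
Basic queries: support operators such as =, !=, >, >=, <, <=, in, and has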
- campaignId = 1
- campaignId in [1, 2, 3]
- adid has [1, 2, 3]
- price > 10
- price < 100
- campaignId != 5
Compound queries: combinations of basic queries, supporting and, or, not
- campaignId = 1 && price > 10
- adid has [1, 2, 3] && (price > 10 || os = 1)
- adid has [1, 2, 3] || (not campaignId = 5)
- adid has not [1, 2, 3]
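One common way to evaluate compound queries like the ones above is to treat each basic query as a sorted list of matching document IDs and combine the lists with set operations: intersection for and, union for or, difference for not. The sketch below shows that technique under the assumption of sorted, duplicate-free lists. It is not taken from Juno's source, and the function names and sample posting lists are hypothetical.

```go
package main

import "fmt"

// intersect returns the IDs present in both sorted lists (and).
func intersect(a, b []int64) []int64 {
	var out []int64
	i, j := 0, 0
	for i < len(a) && j < len(b) {
		switch {
		case a[i] < b[j]:
			i++
		case a[i] > b[j]:
			j++
		default:
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// union returns the IDs present in either sorted list (or).
func union(a, b []int64) []int64 {
	var out []int64
	i, j := 0, 0
	for i < len(a) || j < len(b) {
		switch {
		case j >= len(b) || (i < len(a) && a[i] < b[j]):
			out = append(out, a[i])
			i++
		case i >= len(a) || b[j] < a[i]:
			out = append(out, b[j])
			j++
		default: // equal IDs: emit once
			out = append(out, a[i])
			i++
			j++
		}
	}
	return out
}

// difference returns the IDs in a that are not in b (a and not b).
func difference(a, b []int64) []int64 {
	var out []int64
	j := 0
	for _, id := range a {
		for j < len(b) && b[j] < id {
			j++
		}
		if j >= len(b) || b[j] != id {
			out = append(out, id)
		}
	}
	return out
}

func main() {
	// Hypothetical posting lists for three basic queries.
	adid := []int64{1, 2, 3, 5}  // adid has [1, 2, 3]
	price := []int64{2, 4, 5}    // price > 10
	os := []int64{1, 4}          // os = 1

	// adid has [1, 2, 3] && (price > 10 || os = 1)
	fmt.Println(intersect(adid, union(price, os)))
	// adid has [1, 2, 3] && not price > 10
	fmt.Println(difference(adid, price))
}
```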
Custom functions:
func(fieldName, query) bool
- func1(price, 100)
- campaignId = 1 && price > 10 && func(price, 100)
- regex_func(fieldName, "xxx")
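A custom function with the `func(fieldName, query) bool` shape could be wired in through a name-to-function registry, as in the sketch below. How Juno actually registers and dispatches custom functions is not described here, so everything in the sketch is an assumption: the `CustomFunc` type, the `registry` map, the `call` helper, the `price_below` function, and the interpretation of `regex_func` as a regular-expression match.

```go
package main

import (
	"fmt"
	"regexp"
)

// CustomFunc receives the stored field value and the query argument and
// reports whether the document passes. Hypothetical type for this sketch.
type CustomFunc func(fieldValue, arg interface{}) bool

// registry maps function names used in query strings to implementations.
var registry = map[string]CustomFunc{
	// price_below passes when the field value is strictly below the argument.
	"price_below": func(v, arg interface{}) bool {
		p, ok1 := v.(float64)
		limit, ok2 := arg.(float64)
		return ok1 && ok2 && p < limit
	},
	// regex_func passes when the string field matches the pattern.
	"regex_func": func(v, arg interface{}) bool {
		s, ok1 := v.(string)
		pat, ok2 := arg.(string)
		if !ok1 || !ok2 {
			return false
		}
		re, err := regexp.Compile(pat)
		return err == nil && re.MatchString(s)
	},
}

// call evaluates name(field, arg) against one document. Unknown function
// names and missing fields simply fail the check.
func call(name string, doc map[string]interface{}, field string, arg interface{}) bool {
	fn, ok := registry[name]
	if !ok {
		return false
	}
	return fn(doc[field], arg)
}

func main() {
	doc := map[string]interface{}{"price": 80.0, "title": "summer sale"}
	fmt.Println(call("price_below", doc, "price", 100.0))
	fmt.Println(call("regex_func", doc, "title", "^summer"))
	fmt.Println(call("unknown", doc, "price", 1.0))
}
```

Compiling the pattern on every call keeps the sketch short; a real implementation would more likely compile it once when the query is parsed.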
Document filtering reasons
Query: {adv = 1 && price > 10 | business1} && {adv = 1 && price > 10 && func(price, 100) | business2} docid in [1,2]
Result: 1: business1=true, business2=false; 2: business1=false, business2=false
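The debug output can be thought of as evaluating each labeled sub-query separately for every requested document and recording the outcome per label. The sketch below illustrates that idea only; it is not Juno's debug implementation. The `doc`, `labeled`, and `explain` names are hypothetical, the custom function is assumed to mean "price below 100", and the sample documents are chosen so that the output mirrors the sample result in the text.

```go
package main

import "fmt"

// doc is a hypothetical document with the two fields used by the query.
type doc struct {
	ID    int64
	Adv   int64
	Price float64
}

// labeled pairs a sub-query predicate with the label reported in debug output.
type labeled struct {
	Name  string
	Match func(d doc) bool
}

// explain evaluates every labeled sub-query for each requested document ID
// and returns docID -> label -> whether that sub-query matched.
func explain(docs map[int64]doc, ids []int64, subs []labeled) map[int64]map[string]bool {
	out := make(map[int64]map[string]bool)
	for _, id := range ids {
		d, ok := docs[id]
		if !ok {
			continue // unknown document: nothing to report
		}
		res := make(map[string]bool, len(subs))
		for _, s := range subs {
			res[s.Name] = s.Match(d)
		}
		out[id] = res
	}
	return out
}

func main() {
	docs := map[int64]doc{
		1: {ID: 1, Adv: 1, Price: 150},
		2: {ID: 2, Adv: 2, Price: 50},
	}
	subs := []labeled{
		{"business1", func(d doc) bool { return d.Adv == 1 && d.Price > 10 }},
		// The extra price check stands in for the custom function (assumption).
		{"business2", func(d doc) bool { return d.Adv == 1 && d.Price > 10 && d.Price < 100 }},
	}
	r := explain(docs, []int64{1, 2}, subs)
	fmt.Println(1, r[1])
	fmt.Println(2, r[2])
}
```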
Go struct queries
q := query.NewOrQuery([]query.Query{
    query.NewOrQuery([]query.Query{
        query.NewTermQuery(invertIdx.Iterator("Platform", "1")),
    }, nil),
    query.NewOrQuery([]query.Query{
        query.NewTermQuery(invertIdx.Iterator("AdvertiserId", "457")),
    }, nil),
    /* special example */
    query.NewOrQuery([]query.Query{
        query.NewTermQuery(storageIdx.Iterator("DeviceTypeV2")),
    }, []check.Checker{
        check.NewInChecker(storageIdx.Iterator("DeviceTypeV2"), devi, nil, false),
    }),
    query.NewAndQuery([]query.Query{
        query.NewAndQuery([]query.Query{
            query.NewTermQuery(storageIdx.Iterator("Price")),
        }, []check.Checker{
            check.NewInChecker(storageIdx.Iterator("Price"), pi, nil, false),
        }),
        query.NewAndQuery([]query.Query{
            query.NewTermQuery(storageIdx.Iterator("AdvertiserId")),
        }, []check.Checker{
            check.NewNotChecker(storageIdx.Iterator("AdvertiserId"), ai, nil, false),
        })}, nil)},
    nil,
)
r := search.Search(index, q)
Document filtering reasons
Performance
Future features
- Building indexes from multiple data sources
- Index persistence