hyb

package module
v0.0.0-...-09ad92d Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 11, 2016 License: MIT Imports: 10 Imported by: 0

README

hyb

Package hyb implements the HYB structure described in Type Less, Find More: Fast Autocompletion Search with a Succinct Index by Holger Bast and Ingmar Weber. It provides an index which gives the word completions of the last query word and returns the best hits for any of those completions.

Requirements

This package uses SIMD-BP128 integer encoding and decoding, so it requires an x86_64/AMD64 CPU that supports SSE2 instructions.

Installation

go get github.com/robskie/hyb

Usage


// Sample document structure
type Doc struct {
	ID       int
	Keywords []string
	Rank     int
}

// Create a builder and add documents to it
builder := hyb.NewBuilder()
for _, d := range docs {
  builder.Add(d.ID, d.Keywords, d.Rank)
}

// Build the index
index := builder.Build()

// Search the index
result := &hyb.Result{}
query := strings.Fields("abc")
index.Search(query, result)

// Get the top 10 hits
hits := result.TopHits(10)
for hits.Next() {
  id := hits.ID()
}

// Get the top 5 completions
comps := result.TopCompletions(5)
for comps.Next() {
  c := comps.Completion()
}

API Reference

Godoc documentation can be found here.

Benchmarks

The machine used in this benchmark is a Core i5 at 2.3GHz. The test index contains 1159741 postings taken from 357007 movie titles. The benchmark is done by choosing a random movie and taking all its prefixes and using those as query parameters. For example, given the movie, "Alien", the resulting queries would be "A", "Al", "Ali", "Alie", and "Alien" in that order.

You can run this benchmark on your machine by typing this command go test github.com/robskie/hyb -bench=.* in terminal.

BenchmarkIndexSearch-4	    5000	   386297 ns/op

Documentation

Overview

Package hyb implements the HYB structure described in the paper — Type Less, Find More: Fast Autocompletion Search with a Succinct Index by Holger Bast and Ingmar Weber. It provides an index which gives the word completions of the last query word and returns the best hits for any of those completions.

See https://people.mpi-inf.mpg.de/~bast/papers/autocompletion-sigir.pdf for more details.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Builder

type Builder struct {
	// contains filtered or unexported fields
}

Builder creates a searchable index from the added documents.

func NewBuilder

func NewBuilder() *Builder

NewBuilder creates an empty builder.

func (*Builder) Add

func (b *Builder) Add(id int, keywords []string, rank int)

Add adds a document given its ID, search keywords, and rank.

func (*Builder) Build

func (b *Builder) Build() *Index

Build creates an index.

func (*Builder) Delete

func (b *Builder) Delete(id int)

Delete removes a document given its ID.

type Completion

type Completion struct {
	Word string
	Hits int
}

Completion represents the completions of the last query word.

type Completions

type Completions struct {
	// contains filtered or unexported fields
}

Completions iterates over the completions of the last query word.

func (*Completions) Completion

func (c *Completions) Completion() Completion

Completion returns the next word completion.

func (*Completions) Next

func (c *Completions) Next() bool

Next increments the iterator to the next completion. It returns false if there are no more results to go through.

type Hits

type Hits struct {
	// contains filtered or unexported fields
}

Hits iterates over the result of a search.

func (*Hits) ID

func (h *Hits) ID() int

ID returns the ID of the next result.

func (*Hits) Next

func (h *Hits) Next() bool

Next increments the iterator to the next result. It returns false if there are no more results to go through.

type Index

type Index struct {
	// contains filtered or unexported fields
}

Index represents a group of searchable documents.

func NewIndex

func NewIndex() *Index

NewIndex returns an empty index. Call Index.Read to populate it.

func (*Index) GobDecode

func (idx *Index) GobDecode(data []byte) error

GobDecode decodes an index from gob streams.

func (*Index) GobEncode

func (idx *Index) GobEncode() ([]byte, error)

GobEncode transforms an index into gob streams.

func (*Index) Read

func (idx *Index) Read(r io.Reader) error

Read deserializes the index.

func (*Index) Search

func (idx *Index) Search(query []string, prev *Result)

Search performs a search on the index given a query. If this search is a continuation of a previous search, prev should point to the previous result. This speeds up the search because it only needs to consider the documents included in the previous result.

func (*Index) Size

func (idx *Index) Size() int

Size returns the size of the index in bytes.

func (*Index) Write

func (idx *Index) Write(w io.Writer) error

Write serializes the index.

type Result

type Result struct {
	// contains filtered or unexported fields
}

Result contains the search result.

func (*Result) Completions

func (r *Result) Completions() *Completions

Completions returns all word completions of the last query word sorted by decreasing number of hits.

func (*Result) Hits

func (r *Result) Hits() *Hits

Hits returns all the IDs that match a given query sorted by decreasing rank.

func (*Result) TopCompletions

func (r *Result) TopCompletions(k int) *Completions

TopCompletions returns the top k completions of the last query word sorted by decreasing number of hits.

func (*Result) TopHits

func (r *Result) TopHits(k int) *Hits

TopHits returns the top k document IDs that match the given query sorted by decreasing rank.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL