fts

package module
v1.0.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 25, 2023 License: Apache-2.0 Imports: 14 Imported by: 1

README

FTS

a SQLite-based full-text search engine


Concept

FTS is a Go library that facilitates fast full-text search with an index that can live in-memory or persisted to a file in the OS.

The index is backed by a (pure-Go-driver) SQLite database, leveraging its FTS5 feature to deliver the fastest, simplest, yet modular full-text search engine that can live completely in-memory.

This strategy makes FTS an ideal solution for small, concise key-value datasets that perform queries on the client side directly -- and even on bigger datasets where it promises an incredibly fast indexing speed. With it, the library tries to provide:

  • A simple and clean API for querying, inserting and deleting attributes to a full-text search index, as key-value pairs
  • Support for multiple data types, both for the keys and the values, leveraging Generics in Go and ensuring statically-typed attributes.
  • Support for complex search terms, as noted in the section 3. of the FTS5 documentation.
  • Blazing-fast indexing when comparing to other solutions -- all inserts are done in a single database transaction.
  • No sugar: plain SQL, leveraging a SQLite full-text search feature, no processing or manipulation.

Motivation

While exploring the Steam HTTP API, I've decided to create an alerting system (with webhooks) for when a certain product was on sale, with a desired discount (-% off).

In itself, that isn't much despite my complaints on handling the HTTP responses from Steam; however I was having an issue: since the Steam API only refers to products by their app_id, I'd still need to figure out what that was from a product's name. And it doesn't make much sense to just head over to their website and getting it from there. No. That was not the point of a Steam CLI app.

Fortunately, they expose their entire listing through a dedicated endpoint, so you can have all the app_id values alongside the product name. But this (as of today) consists of 177,590 entries. It's not too big, but it's not small either. Also, I wanted this app as something an end-user could run (as a client), and not have to communicate to a server to use full-text search. Elastic is crossed out, basically.

Dabbling with a couple of pure-Go solutions, I wasn't happy at all with the indexing times. I tried Bleve for example, since it was the Go go-to (hah!) library for full-text search and provided amazing features; but it was taking over 2 minutes just to index the app IDs.

I kept looking -- suddenly SQLite pops up as an option. FTS5 they called it. I didn't even know SQLite supported full-text search. But I love SQLite since day one and personally try to use it where I need extremely fast SQL in a project. Please never underestimate in-memory SQLite. I was sold. And I didn't regret it.

At the end, I can get my search results in under two seconds (wait that is a lot!) -- but under two seconds including:

  1. Reading a JSON document with the app ID and name entries, as previously downloaded separately (as the HTTP response body content).
  2. Creating an in-memory SQLite instance and initializing it with the FTS5 virtual table.
  3. Inserting all 177k items.
  4. Performing a query using a certain search term as provided by the caller
$ time go run github.com/zalgonoise/x/steam/cmd/steam search -name 'assetto corsa' 
# (...results)
go run github.com/zalgonoise/x/steam/cmd/steam search -name 'assetto corsa'  1.81s user 0.44s system 102% cpu 2.197 total

But you can also persist this SQLite database in the filesystem. What if you point to a local instance of an index?

$ time go run github.com/zalgonoise/x/steam/cmd/steam search -name 'assetto corsa' -db '/tmp/steam.db' 
# (...results)
go run github.com/zalgonoise/x/steam/cmd/steam search -name 'assetto corsa' -db '/tmp/steam.db' 0.57s user 0.40s system 142% cpu 0.681 total

That's about 600ms. It's still no ElasticSearch! Or Postgres if you configure it right! But it's something usable, and it's a tool to consider up to a certain size and complexity, and to consider having on the client instead of on the server! It was good enough for my use-case and I took it. Are there better ways of doing this? For sure. But I used SQLite. :)


Usage

FTS is served as a Go library. You need to import it in your project.

Getting FTS

You can get the library as a Go module, by importing it and running go mod tidy on the top-level directory of your module, or where go.mod is placed.

Then, you're able to initialize it with or without persistence, with or without a set of attributes used in an initial load. Also, the Indexer type also supports options to add observability to your full-text search engine, adding logging, metrics and tracing as optional configuration.

Creating an index

You can create a full-text search index from its concrete-type constructor fts.NewIndex(), or its interface constructor fts.New(); however, only the latter allows decorating the index with a logger, metrics and / or tracing in one-go. Regardless, when successful, both are an *fts.Index[K fts.SQLType, V fts.SQLType] type.

Options

If you choose to create an Indexer, you're free to add some configuration options, as described below:

Function Input type Description
fts.WithURI string Sets a path URI when connecting to the SQLite database, as a means to persist the database in the filesystem.
fts.WithLogger *slog.Logger Decorates the Indexer with the input slog.Logger.
fts.WithLogHandler slog.Handler Decorates the Indexer with a slog.Logger, using the input slog.Handler.
fts.WithMetrics fts.Metrics Decorates the Indexer with the input Metrics instance.
fts.WithTrace trace.Tracer Decorates the Indexer with the input trace.Tracer.

Below is an example where an in-memory index with a logger is created with some attributes, and is also searched on:

package main

import (
	"context"
	"log/slog"
	"os"

	"github.com/zalgonoise/fts"
)

func main() {
	ctx := context.Background()
	logger := slog.New(slog.NewTextHandler(os.Stderr, nil))
	
	// prepare some attributes to index
	attrs := []fts.Attribute[int64, string]{
		{Key: 1, Value: "some entry"},
		{Key: 2, Value: "another entry"},
		{Key: 3, Value: "gold entry"},
		{Key: 4, Value: "different entry"},
		// ...
	}

	// create an in-memory index with a logger, using the created attributes
	indexer, err := fts.New(attrs, fts.WithLogger(logger))
	if err != nil {
		logger.ErrorContext(ctx, "failed to create indexer", slog.String("error", err.Error()))
		
		os.Exit(1)
	}
	
	// search for gold
	results, err := indexer.Search(ctx, "gold")
	if err != nil {
		logger.ErrorContext(ctx, "couldn't get results", slog.String("error", err.Error()))

		os.Exit(1)
	}
	
	// find gold with key 3 and value "gold entry"
	logger.InfoContext(ctx, "found matches", slog.Int("num_results", len(results)))
	for i := range results {
		logger.InfoContext(ctx, "match entry", 
			slog.Int("index", i),
			slog.Int64("key", results[i].Key),
			slog.String("value", results[i].Value),
        )
    }
}

But similarly, you can even create an empty index and work with it within your application:

package index

import (
    "context"
    
    "github.com/zalgonoise/fts"
)

type MyIndex struct {
    index *fts.Index[int, string]
    // ...
}

func New(uri string) (*MyIndex, error) {
    index, err := fts.NewIndex[int, string](uri)
    if err != nil {
      return nil, err
    }
  		
    return &MyIndex{
      index: index,
    }, nil
}

func (i *MyIndex) Search(ctx context.Context, searchTerm string) ([]int, error) {
    // ...
}


func (i *MyIndex) Insert(ctx context.Context, attrs ...fts.Attribute[int, string]) error {
    // ...
}
Using the index

The Index type and Indexer interface offer a simple CRD set of operations and a graceful shutdown method:

type Indexer[K SQLType, V SQLType] interface {
	// Search will look for matches for the input value through the indexed terms, returning a collection of matching
	// Attribute, which will contain both key and (full) value for that match.
	//
	// This call returns an error if the underlying SQL query fails, if scanning for the results fails, or an
	// ErrNotFoundKeyword error if there are zero results from the query.
	Search(ctx context.Context, searchTerm V) (res []Attribute[K, V], err error)

	// Insert indexes new attributes in the Indexer, via the input Attribute's key and value content.
	//
	// A database transaction is performed in order to ensure that the query is executed as quickly as possible; in case
	// multiple items are provided as input. This is especially useful for the initial load sequence.
	Insert(ctx context.Context, attrs ...Attribute[K, V]) error

	// Delete removes attributes in the Indexer, which match input K-type keys.
	//
	// A database transaction is performed in order to ensure that the query is executed as quickly as possible; in case
	// multiple items are provided as input.
	Delete(ctx context.Context, keys ...K) error

	// Shutdown gracefully closes the Indexer.
	Shutdown(ctx context.Context) error
}

These methods can be called any time within the lifetime of an Index (before Shutdown is called or the application hypothetically crashes), meaning that callers are not limited to writing once and querying forever -- they can safely add new attributes to the index and remove attributes by their keys, too.

Performing complex queries

Complex queries with matcher expressions and globs are also supported, as noted in the SQLite FTS5 feature specification, which can be found in the section 3. of the FTS5 documentation.

This is by far the best resource to use reference for your complex search terms. The (value) data types that support these terms are character data types like string, []byte and []rune.

Disclaimer

This is not considered to be an enterprise-grade solution! While you're free to use this library as you please, it is highly recommended to explore other, more performant options. Many of them have drawbacks just like this one, many of them are usable via a server only, unlike this one. Please research and explore your options thoroughly!

Documentation

Index

Constants

View Source
const (
	ErrZero     = errs.Kind("zero")
	ErrNotFound = errs.Kind("not found")

	ErrAttributes = errs.Entity("attributes")
	ErrKeyword    = errs.Entity("keyword")
)

Variables

View Source
var (
	ErrZeroAttributes  = errs.WithDomain(errDomain, ErrZero, ErrAttributes)
	ErrNotFoundKeyword = errs.WithDomain(errDomain, ErrNotFound, ErrKeyword)
)

Functions

func WithLogHandler

func WithLogHandler(handler slog.Handler) cfg.Option[Config]

WithLogHandler decorates the Indexer with a slog.Logger, using the input slog.Handler.

func WithLogger

func WithLogger(logger *slog.Logger) cfg.Option[Config]

WithLogger decorates the Indexer with the input slog.Logger.

func WithMetrics

func WithMetrics(metrics Metrics) cfg.Option[Config]

WithMetrics decorates the Index with the input Metrics instance.

func WithTrace

func WithTrace(tracer trace.Tracer) cfg.Option[Config]

WithTrace decorates the Index with the input trace.Tracer.

func WithURI

func WithURI(uri string) cfg.Option[Config]

WithURI sets a path URI when connecting to the SQLite database, as a means to persist the database in the filesystem.

This option is not mandatory and thus set in the configuration if desired.

Types

type Attribute

type Attribute[K SQLType, V SQLType] struct {
	Key   K
	Value V
}

Attribute describes an entry to be added or returned from the Index, supporting types that are compatible with the SQLite FTS feature and implementation.

type Char

type Char interface {
	string | []byte | []rune
}

Char is a type constraint that comprises all types that represent character sets.

type Config

type Config struct {
	// contains filtered or unexported fields
}

Config defines optional settings in an Indexer

type Index

type Index[K SQLType, V SQLType] struct {
	// contains filtered or unexported fields
}

Index exposes fast full-text search by leveraging the SQLite FTS5 feature.

ref: https://www.sqlite.org/fts5.html

This implementation, using the modernc.org's pure-Go SQLite driver, allows having a very broadly-typed and yet lightweight full-text search implementation, with optional persistence.

Effectively, the Index uses a SQLite database as a cache, where it is storing (indexed) data as key-value pairs, allowing callers to find these key-value pairs by using keywords and search expressions against this data set.

The expressions, syntax and example phrases for these queries can be found in section 3. of the reference document above; providing means of performing more complex queries over indexed data.

func NewIndex

func NewIndex[K SQLType, V SQLType](uri string, attrs ...Attribute[K, V]) (*Index[K, V], error)

NewIndex creates an Index using the provided URI and set of Attribute.

If the provided URI is an empty string or ":memory:", the SQLite implementation will comply and run in-memory. Otherwise, the URI is treated as a database URI and validated as an OS path. The latter option allows persistence of the Index.

An error is returned if the database fails when being open, initialized, and loaded with the input Attribute.

func (*Index[K, V]) Delete

func (i *Index[K, V]) Delete(ctx context.Context, keys ...K) error

Delete removes attributes in the Index, which match input K-type keys.

A database transaction is performed in order to ensure that the query is executed as quickly as possible; in case multiple items are provided as input.

func (*Index[K, V]) Insert

func (i *Index[K, V]) Insert(ctx context.Context, attrs ...Attribute[K, V]) error

Insert indexes new attributes in the Index, via the input Attribute's key and value content.

A database transaction is performed in order to ensure that the query is executed as quickly as possible; in case multiple items are provided as input. This is especially useful for the initial load sequence.

func (*Index[K, V]) Search

func (i *Index[K, V]) Search(ctx context.Context, searchTerm V) (res []Attribute[K, V], err error)

Search will look for matches for the input value through the indexed terms, returning a collection of matching Attribute, which will contain both key and (full) value for that match.

This call returns an error if the underlying SQL query fails, if scanning for the results fails, or an ErrNotFoundKeyword error if there are zero results from the query.

func (*Index[K, V]) Shutdown

func (i *Index[K, V]) Shutdown(_ context.Context) error

Shutdown gracefully closes the Index SQLite database, by calling its Close method

type Indexer

type Indexer[K SQLType, V SQLType] interface {

	// Search will look for matches for the input value through the indexed terms, returning a collection of matching
	// Attribute, which will contain both key and (full) value for that match.
	//
	// This call returns an error if the underlying SQL query fails, if scanning for the results fails, or an
	// ErrNotFoundKeyword error if there are zero results from the query.
	Search(ctx context.Context, searchTerm V) (res []Attribute[K, V], err error)

	// Insert indexes new attributes in the Indexer, via the input Attribute's key and value content.
	//
	// A database transaction is performed in order to ensure that the query is executed as quickly as possible; in case
	// multiple items are provided as input. This is especially useful for the initial load sequence.
	Insert(ctx context.Context, attrs ...Attribute[K, V]) error

	// Delete removes attributes in the Indexer, which match input K-type keys.
	//
	// A database transaction is performed in order to ensure that the query is executed as quickly as possible; in case
	// multiple items are provided as input.
	Delete(ctx context.Context, keys ...K) error

	// Shutdown gracefully closes the Indexer.
	Shutdown(ctx context.Context) error
}

Indexer describes the actions that a full-text search index should expose. It is declared as an interface so that a no-op implementation and observability decorators can be used interchangeably through a single constructor.

An Indexer exposes full-text by registering (and tokenizing) key-value pairs of data, that can be looked-up for matches with keywords that would be found in the value part of the data. The queries return sets of matching key-value pairs.

The Indexer allows creating, reading and deleting entries from the index. This ensures that the index can perform an initial load on its own; be updated with more recent data; and also pruning certain keys if needed.

Finally, it also exposes a Shutdown method allowing a graceful shutdown of the search engine.

The underlying index in an Indexer created by this package is an Index type, which leverages the SQLite FTS5 feature allowing a fast full-text search engine out-of-the-box, either in-memory or persisted to a file.

func IndexerWithLogs

func IndexerWithLogs[K SQLType, V SQLType](indexer Indexer[K, V], handler slog.Handler) Indexer[K, V]

IndexerWithLogs decorates the input Indexer with a slog.Logger using the input slog.Handler.

If the Indexer is nil, a no-op Indexer is returned. If the input slog.Handler is nil, a default text handler is created as a safe default. If the input Indexer is already a logged Indexer; then its logger's handler is replaced with this handler (input or default one).

This Indexer will not add any new functionality besides decorating the Indexer with log events.

func IndexerWithMetrics

func IndexerWithMetrics[K SQLType, V SQLType](indexer Indexer[K, V], m Metrics) Indexer[K, V]

IndexerWithMetrics decorates the input Indexer with a Metrics interface.

If the Indexer is nil, a no-op Indexer is returned. If the input Metrics is nil, a default Prometheus metrics handler is created as a safe default, on port 8080. If the input Indexer is already an Indexer with Metrics; then its Metrics is replaced with this one (input or default one).

This Indexer will not add any new functionality besides decorating the Indexer with metrics registry.

func IndexerWithTrace

func IndexerWithTrace[K SQLType, V SQLType](indexer Indexer[K, V], tracer trace.Tracer) Indexer[K, V]

IndexerWithTrace decorates the input Indexer with a trace.Tracer interface.

If the Indexer is nil, a no-op Indexer is returned. If the input Metrics is nil, a default Prometheus metrics handler is created as a safe default. If the input Indexer is already an Indexer with Metrics; then its Metrics is replaced with this one (input or default one).

This Indexer will not add any new functionality besides decorating the Indexer with metrics registry.

func New

func New[K SQLType, V SQLType](attributes []Attribute[K, V], opts ...cfg.Option[Config]) (Indexer[K, V], error)

New creates an Indexer with the input Attribute and configuration options.

This function allows creating an Index that is intended to be decorated with a logger, metrics and / or tracing.

func NoOp

func NoOp[K SQLType, V SQLType]() Indexer[K, V]

NoOp returns a no-op Indexer for the given key-value types K and V.

type Metrics

type Metrics interface {
	IncSearchesTotal()
	IncSearchesFailed()
	ObserveSearchLatency(ctx context.Context, dur time.Duration)

	IncInsertsTotal()
	IncInsertsFailed()
	ObserveInsertLatency(ctx context.Context, dur time.Duration)

	IncDeletesTotal()
	IncDeletesFailed()
	ObserveDeleteLatency(ctx context.Context, dur time.Duration)
}

type Number

type Number interface {
	int | int8 | int16 | int32 | int64 |
		uint | uint8 | uint16 | uint32 | uint64 |
		float32 | float64
}

Number is a type constraint that comprises all types that are integer or real numbers.

type SQLNullable

type SQLNullable interface {
	sql.NullBool |
		sql.NullInt16 | sql.NullInt32 | sql.NullInt64 |
		sql.NullFloat64 |
		sql.NullString
}

SQLNullable is a type constraint that comprises all supported sql.Null* types, in a full-text search context.

type SQLType

type SQLType interface {
	Number | Char | SQLNullable
}

SQLType is a type constraint that joins the Number, Char and SQLNullable type constraints.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL