symbols

package module
v1.3.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Nov 10, 2021 License: MIT Imports: 6 Imported by: 0

README

Symbols

Test Go Report Card GoDoc codecov MIT License

GC-friendly string store

This library helps to store big number of strings in structure with small number of pointers to make it friendly to Go garbage collector.

Motivation

As described in article Avoiding high GC overhead with large heaps, a large number of pointers make Go Garbage Collector life harder. So what if you need to deal with strings, a lot of strings? Every string is a pointer. You need to hide this pointers from Go runtime.

It's helpful for persistent caches and text resources.

Philosophy

As small as possible, as fast as possible.

Usage

package main

import (
	"fmt"

	"github.com/vporoshok/symbols"
)

// User is a domain model
type User struct {
	ID     int
	Name   string
	Labels []string
}

// UserCache is a persistent cache of users
type UserCache struct {
	data map[int]cachedUser
	dict symbols.Dictionary // using Dictionary to deduplicate labels
}

// cachedUser is a special struct to glue symbols with domain model
type cachedUser struct {
	name, labels symbols.Symbol
}

// Add user to cache
func (cache *UserCache) Add(user User) {
	if _, ok := cache.data[user.ID]; !ok {
		if cache.data == nil {
			cache.data = make(map[int]cachedUser)
		}
		cache.data[user.ID] = cachedUser{
			name:   cache.dict.AddString(user.Name),
			labels: symbols.AddStrings(&cache.dict, user.Labels),
		}
	}
}

// Get user from cache by id
func (cache UserCache) Get(id int) (User, bool) {
	if user, ok := cache.data[id]; ok {
		return User{
			ID:     id,
			Name:   cache.dict.GetString(user.name),
			Labels: symbols.GetStrings(cache.dict, user.labels),
		}, true
	}
	return User{}, false
}

func main() {
	cache := new(UserCache)
	cache.Add(User{
		ID:     1,
		Name:   "John",
		Labels: []string{"developer", "golang", "bicycle"},
	})
	cache.Add(User{
		ID:     2,
		Name:   "Mary",
		Labels: []string{"developer", "golang", "running"},
	})
	cache.Add(User{
		ID:     3,
		Name:   "Albert",
		Labels: []string{"manager", "bicycle", "running"},
	})
	fmt.Println(cache.Get(1))
	fmt.Println(cache.Get(2))
	fmt.Println(cache.Get(3))
	// Dictionary now contain strings
	// John, Mary, Albert
	// developer, golang, bicycle, running, manager
	// and composition of John's, Mary's and Albert's labels as 3-byte strings
	fmt.Println(cache.dict.Len()) // 11
}

How it works

Store

Store has two parts:

  1. slice of pages for small strings;
  2. slice of long strings as is;

Short strings (less or equal 1<<12 - 1 bytes) stored in pages (1 MB byte arrays), for example:

[J, o, h, n, d, e, v, e, l, o, p, e, r, g, o, l, a, n, g, b, i, c, y, c, l, e, M, a, r, y, ...]

And Symbol for it represents next bits:

  • 1 bit with zero (see long strings store);
  • 31 bit with page number;
  • 12 bit with length of string;
  • 20 bit with index of string in the page;

If string to be add is longer then rest of current page, a new page is being created, so pages may be incomplete (less than 2KB per 1MB page, less than 0.2%).

Long strings stored as is in separate slice and Symbol for it represents index in this slice with 1 at first bit.

Dictionary

Dictionary is a Store Wrapper to deduplicate strings. It use map of xxhashes of strings as short index and long index map[string]Symbol on collision short index.

See also

Original Phil's Pearl projects stringbank and intern

  • store all strings independ of it length in pages that may cause a lot of unused space;
  • use slices as pages that add more pointers and unneeded length int per page;
  • use maphash to deduplicate, that is slower than xxhash;

Roadmap

  • Optimize long string store;

Documentation

Overview

Example
package main

import (
	"fmt"

	"github.com/vporoshok/symbols"
)

// User is a domain model
type User struct {
	ID     int
	Name   string
	Labels []string
}

// UserCache is a persistent cache of users
type UserCache struct {
	data map[int]cachedUser
	dict symbols.Dictionary // using Dictionary to deduplicate labels
}

// cachedUser is a special struct to glue symbols with domain model
type cachedUser struct {
	name, labels symbols.Symbol
}

// Add user to cache
func (cache *UserCache) Add(user User) {
	if _, ok := cache.data[user.ID]; !ok {
		if cache.data == nil {
			cache.data = make(map[int]cachedUser)
		}
		cache.data[user.ID] = cachedUser{
			name:   cache.dict.AddString(user.Name),
			labels: symbols.AddStrings(&cache.dict, user.Labels),
		}
	}
}

// Get user from cache by id
func (cache UserCache) Get(id int) (User, bool) {
	if user, ok := cache.data[id]; ok {
		return User{
			ID:     id,
			Name:   cache.dict.GetString(user.name),
			Labels: symbols.GetStrings(cache.dict, user.labels),
		}, true
	}
	return User{}, false
}

func main() {
	cache := new(UserCache)
	cache.Add(User{
		ID:     1,
		Name:   "John",
		Labels: []string{"developer", "golang", "bicycle"},
	})
	cache.Add(User{
		ID:     2,
		Name:   "Mary",
		Labels: []string{"developer", "golang", "running"},
	})
	cache.Add(User{
		ID:     3,
		Name:   "Albert",
		Labels: []string{"manager", "bicycle", "running"},
	})
	fmt.Println(cache.Get(1))
	fmt.Println(cache.Get(2))
	fmt.Println(cache.Get(3))
	// Dictionary now contain strings
	// John, Mary, Albert
	// developer, golang, bicycle, running, manager
	// and composition of John's, Mary's and Albert's labels as 3-byte strings
	fmt.Println(cache.dict.Len()) // 11
}
Output:

{1 John [developer golang bicycle]} true
{2 Mary [developer golang running]} true
{3 Albert [manager bicycle running]} true
11

Index

Examples

Constants

View Source
const (

	// PageSize 1MB
	PageSize = 1 << pageSizeLog
	// LongGuard length of string to store outside of pages and not to check on duplicates
	LongGuard = 1<<(32-pageSizeLog) - 1
)

Variables

This section is empty.

Functions

func GetStrings

func GetStrings(store interface{ GetString(Symbol) string }, sym Symbol) []string

GetStrings restore slice of string added in store with AddStrings

Types

type Dictionary

type Dictionary struct {
	Store
	// contains filtered or unexported fields
}

Dictionary deduplicate wrapper for Store

Use it on filling Store and use dict.Store in runtime to reduce memory utilization.

func (*Dictionary) AddString

func (dict *Dictionary) AddString(s string) Symbol

AddString check string to duplicate and return existed Symbol or create new

func (*Dictionary) DropIndex

func (dict *Dictionary) DropIndex()

DropIndex clean map to reduce memory utilization

func (Dictionary) State added in v1.3.0

func (dict Dictionary) State() DictionaryState

State of the dictionary

type DictionaryState added in v1.3.0

type DictionaryState struct {
	StoreState
	ShortIndex, LongIndex int
}

DictionaryState static information of the dictionary

type Store

type Store struct {
	// contains filtered or unexported fields
}

Store of strings

func Restore added in v1.1.0

func Restore(r io.Reader) (Store, error)

Restore data from compressed dump

func (*Store) AddString

func (store *Store) AddString(s string) Symbol

AddString to store

func (Store) Dump added in v1.1.0

func (store Store) Dump() io.ReadCloser

Dump data with comression

func (Store) GetString

func (store Store) GetString(sym Symbol) string

GetString from store by reference

func (Store) Len

func (store Store) Len() int

Len count of strings in store

func (Store) State added in v1.3.0

func (store Store) State() StoreState

State of the store

type StoreState added in v1.3.0

type StoreState struct {
	Symbols, Pages, LongStrings int
}

StoreState static information of the store

type Symbol

type Symbol uint64

Symbol reference to string stored in Symbols

func AddStrings

func AddStrings(store interface{ AddString(string) Symbol }, ss []string) Symbol

AddStrings add every string in given store, convert slice of getted symbols to string and add it to store

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL