diskhash

package module
v0.0.0-...-2897557 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 10, 2023 License: Apache-2.0 Imports: 9 Imported by: 2

README

diskhash

on-disk hash table index(mainly for WAL).

When will you need it?

If you are using WAL to store your data,

wal: https://github.com/rosedblabs/wal

you will get the positions to get the data from WAL, the common way to store the positions is to use an in-memory index(like rosedb).

But if you have a large amount of data, and it will take a lot of time to load the index into memory when you restart the system.

So, you can use diskhash to store the index on disk.

Can be used as a general hash table index(without wal)?

yes, you can use it as an on-disk hash table index, but the restriction is that the value must be fixed size. you can set the value size when you create the index, and once you set the value size, you can't change it.

But don't set the value size too large(1KB), the disk size maybe increase dramatically because of the write amplification. it is suitable for storing some metadata of your system.

Design Overview

The diskhash consists of two disk files: main and overflow. The file format is as follows:

File Format:
+---------------+---------------+---------------+---------------+-----+----------------+
|    (unused)   |    bucket0    |    bucket1    |    bucket2    | ... |     bucketN    |
+---------------+---------------+---------------+---------------+-----+----------------+

A file is divided into multiple buckets, if the table reaches the load factor, a new bucket will be appended to the end of the file. A bucket contains 31 slots, and an overflow offset which points to the overflow file buckets.

Bucket Format:
+-------------+-------------+-------------+-------------+-----+--------------+-----------------+
|   slot0     |   slot1     |   slot2     |   slot3     | ... |    slotN     | overflow_offset |
+-------------+-------------+-------------+-------------+-----+--------------+-----------------+

A slot contains a key hash value, and user-defined value.

Slot Format:
+-----------------------+--------------------------------+
|      key_hash(4B)     |          value(N Bytes)        |
+-----------------------+--------------------------------+

Getting Started

package main

import (
	"fmt"
	"github.com/rosedblabs/diskhash"
	"strings"
)

func main() {
	// open the table, specify the slot value length,
	// remember that you can't change it once you set it, and all values must be the same length.
	options := diskhash.DefaultOptions
	options.DirPath = "/tmp/diskhash-test"
	options.SlotValueLength = 10
	table, err := diskhash.Open(options)
	if err != nil {
		panic(err)
	}

	// don't forget to close the table!!!
	// some meta info will be saved when you close the table.
	defer func() {
		_ = table.Close()
	}()

	// put a key-value pair into the table.
	// the MatchKey function will be called when the key is matched.
	// When we store the data in the hash table, we only store the hash value of the key, and the raw value.
	// So when we get the data from hash table, even if the hash value of the key matches, that doesn't mean
	// the key matches because of hash collision.
	// So we need to provide a function to determine whether the key of the slot matches the stored key.
	err = table.Put([]byte("key1"), []byte(strings.Repeat("v", 10)), func(slot diskhash.Slot) (bool, error) {
		return true, nil
	})
	if err != nil {
		panic(err)
	}

	err = table.Get([]byte("key1"), func(slot diskhash.Slot) (bool, error) {
		fmt.Println("val =", string(slot.Value))
		return true, nil
	})
	if err != nil {
		panic(err)
	}

	err = table.Delete([]byte("key1"), func(slot diskhash.Slot) (bool, error) {
		return true, nil
	})
	if err != nil {
		panic(err)
	}
}

Documentation

Index

Constants

This section is empty.

Variables

View Source
var DefaultOptions = Options{
	DirPath:         os.TempDir(),
	SlotValueLength: 0,
	LoadFactor:      0.7,
}

DefaultOptions is the default options.

Functions

This section is empty.

Types

type MatchKeyFunc

type MatchKeyFunc func(Slot) (bool, error)

MatchKeyFunc is used to determine whether the key of the slot matches the stored key. And you must supply the function to the Put/Get/Delete methods.

Why we need this function?

When we store the data in the hash table, we only store the hash value of the key, and the raw value. So when we get the data from hash table, even if the hash value of the key matches, that doesn't mean the key matches because of hash collision. So we need to provide a function to determine whether the key of the slot matches the stored key.

type Options

type Options struct {
	// DirPath is the directory path to store the hash table files.
	DirPath string

	// SlotValueLength is the length of the value in each slot.
	// Your value lenght must be equal to the value length you set when creating the table.
	SlotValueLength uint32

	// LoadFactor is the load factor of the hash table.
	// The load factor is the ratio of the number of elements in the hash table to the table size.
	// If the ratio is greater than the load factor, the hash table will be expanded automatically.
	// The default value is 0.7.
	LoadFactor float64
}

Options is used to create a new diskhash table.

type Slot

type Slot struct {
	Hash  uint32 // the hash of the key
	Value []byte // raw value
}

Slot is the basic unit of a bucket. each slot contains a key hash and a value.

type Table

type Table struct {
	// contains filtered or unexported fields
}

Table is a hash table that stores data on disk. It consists of two files, the primary file and the overflow file. Each file is divided into multiple buckets, each bucket contains multiple slots.

The overview design of the hash table is as the Linear Hashing algorithm. See more: https://en.wikipedia.org/wiki/Linear_hashing https://dsf.berkeley.edu/jmh/cs186/f02/lecs/lec18_2up.pdf

func Open

func Open(options Options) (*Table, error)

Open opens a hash table. If the hash table does not exist, it will be created automatically. It will open the primary file, the overflow file and the meta file.

func (*Table) Close

func (t *Table) Close() error

Close closes the files of the hash table.

func (*Table) Delete

func (t *Table) Delete(key []byte, matchKey MatchKeyFunc) error

Delete deletes the key from the hash table. the parameter matchKey is described in the MatchKeyFunc.

func (*Table) Get

func (t *Table) Get(key []byte, matchKey MatchKeyFunc) error

Get gets the value of the key from the hash table. the parameter matchKey is described in the MatchKeyFunc.

func (*Table) Put

func (t *Table) Put(key, value []byte, matchKey MatchKeyFunc) error

Put puts a new ke/value pair to the hash table. the parameter matchKey is described in the MatchKeyFunc.

func (*Table) Size

func (t *Table) Size() uint32

Size returns the number of keys in the hash table.

func (*Table) Sync

func (t *Table) Sync() error

Sync flushes the data of the hash table to disk.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL