cachelog

v1.0.2
Published: Sep 6, 2021 License: BSD-2-Clause Imports: 19 Imported by: 0

README

cachelog

Cachelog provides a log-structured cache. Put()s are not actively fsynced to disk. The advantage is that they're much faster; the downside is that you might lose cache data. To ensure stale data is never served, first call Delete() to clear out that piece of the cache. Delete()s are fsynced to disk in separate log files, but those are tiny writes.

The intended use case is storing a partial (but correct) cache of file data. Get/Put/Delete all take a filename and an offset to operate on. The filename isn't treated specially, so it can be any []byte you want. The filename is stored a lot in the logs, so using 1MB filenames will use a lot of space.

Guarantees:

  • If Put() succeeds and your machine/disk doesn't crash, Get() will return the latest data. If your machine/disk does crash, some Put() calls might be lost and old data will be served once it starts up again. Assuming no CPU/memory corruption, no incorrect data will ever be stored. Bit flips on disk are detected and cause the affected data to be discarded (through the same mechanism that detects partial log entries: every entry is MD5-hashed).
  • If Delete() succeeds, that data will never be served again regardless of crashes.
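
The checksum mechanism behind the first guarantee can be sketched roughly as follows. The entry layout (4-byte length, payload, 16-byte MD5) and the names encodeEntry and decodeEntry are assumptions made for illustration; the source does not document cachelog's actual on-disk format.

```go
package main

import (
	"bytes"
	"crypto/md5"
	"encoding/binary"
	"errors"
	"fmt"
)

// errCorrupt is returned for truncated entries and checksum mismatches alike.
var errCorrupt = errors.New("corrupt or partial log entry")

// encodeEntry frames payload as: 4-byte big-endian length | payload | MD5.
// Hypothetical layout, not cachelog's real format.
func encodeEntry(payload []byte) []byte {
	out := make([]byte, 4, 4+len(payload)+md5.Size)
	binary.BigEndian.PutUint32(out, uint32(len(payload)))
	out = append(out, payload...)
	sum := md5.Sum(payload)
	return append(out, sum[:]...)
}

// decodeEntry returns the payload, or errCorrupt if the entry is truncated
// or its checksum does not match.
func decodeEntry(entry []byte) ([]byte, error) {
	if len(entry) < 4 {
		return nil, errCorrupt
	}
	n := binary.BigEndian.Uint32(entry)
	if uint64(len(entry)) < 4+uint64(n)+md5.Size {
		return nil, errCorrupt // truncated: the write never completed
	}
	end := 4 + int(n)
	payload := entry[4:end]
	sum := md5.Sum(payload)
	if !bytes.Equal(sum[:], entry[end:end+md5.Size]) {
		return nil, errCorrupt // checksum mismatch: bit flip or torn write
	}
	return payload, nil
}

func main() {
	entry := encodeEntry([]byte("block data"))
	_, err := decodeEntry(entry)
	fmt.Println("intact:", err)

	entry[6] ^= 0x01 // simulate a bit flip on disk
	_, err = decodeEntry(entry)
	fmt.Println("flipped:", err)

	_, err = decodeEntry(entry[:8]) // simulate a partial write
	fmt.Println("truncated:", err)
}
```

Treating both failure modes the same way means a reader can simply drop any entry that fails to decode, which matches the "discard, never serve incorrect data" behaviour described above.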

Put() writes to disk sequentially.

Get() does random 8-byte disk writes to indicate that the blocks have been recently accessed. If you don't need expiry, set Config.Expiry to the maximum and you'll have one write per block every 2.9 years.
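
For reference, the largest value a time.Duration can hold works out to roughly 292 years, as the small sketch below computes. The names maxExpiry and years are illustrative only. The 2.9-year figure is about 1% of that maximum, though the source does not state how the write interval is derived.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// maxExpiry is the largest value a time.Duration can hold.
const maxExpiry = time.Duration(math.MaxInt64)

// years converts a duration to 365-day years.
func years(d time.Duration) float64 {
	return d.Hours() / (24 * 365)
}

func main() {
	fmt.Printf("max Expiry is about %.1f years\n", years(maxExpiry))
}
```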

You can configure garbage collection through Config.MaxGarbageRatio. The default of 0.75 allows 75% of the data in the logs to be stale before garbage collection is started. Garbage collection involves copying the still relevant blocks to the latest log file (sorting and merging them while at it) and then dropping the old file. Don't set this too low, as you'll be spending lots of IOPS on copying still active data.
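
The trade-off can be made concrete with a little arithmetic. The sketch below is not cachelog's code: needsGC and copyCost are hypothetical helpers, the strict "greater than" comparison is an assumption, and copyCost further assumes garbage is spread evenly across log files.

```go
package main

import "fmt"

// needsGC reports whether the stale fraction of the logs exceeds the
// configured ratio. Whether the real check is > or >= is not documented.
func needsGC(staleBytes, totalBytes int64, maxGarbageRatio float64) bool {
	if totalBytes <= 0 {
		return false
	}
	return float64(staleBytes)/float64(totalBytes) > maxGarbageRatio
}

// copyCost estimates how many live bytes must be copied for every stale
// byte reclaimed when collection starts at the given ratio (0 < ratio <= 1).
func copyCost(ratio float64) float64 {
	return (1 - ratio) / ratio
}

func main() {
	fmt.Println(needsGC(80, 100, 0.75)) // 80% stale exceeds the 0.75 threshold
	for _, r := range []float64{0.25, 0.5, 0.75} {
		fmt.Printf("ratio %.2f: %.2f bytes copied per byte reclaimed\n", r, copyCost(r))
	}
}
```

Under these assumptions a ratio of 0.25 copies three live bytes for each byte reclaimed, while 0.75 copies about a third of a byte, which is why low ratios cost so many IOPS.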

Documentation

Overview

Package cachelog is a log structured cache. See the README for details.

Types

type Cache

type Cache struct {
	// contains filtered or unexported fields
}

func Open

func Open(dir string, cfg *Config) (*Cache, error)

Open opens a new cache, possibly reading in old data from disk. Passing a nil Config uses the default config.

func (*Cache) Delete

func (c *Cache) Delete(filename []byte, offset int64, size int) error

Delete data from the cache. The deletion marker is synced to a separate log file, which guarantees the old data is never served again - even in the face of machine crashes.

func (*Cache) GarbageCollection

func (c *Cache) GarbageCollection()

GarbageCollection runs garbage collection if needed. If the MaxGarbageRatio isn't violated, GarbageCollection does nothing. The garbage collection process copies the still relevant data from the oldest log files to the active log file, which then allows the old log file to be removed along with all the stale data in it. Multiple files might be collected in one run, but the active write log file is never considered, even if it contains dead data.

func (*Cache) Get

func (c *Cache) Get(filename []byte, offset int64, buf []byte) (missing []ranges.Entry, _ error)

Get fills buf with data from filename@offset. Bytes in buf that correspond to data not in the cache are left untouched, and those ranges are returned in missing.
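
A typical caller would fetch the missing ranges from the authoritative source and write them back with Put. The sketch below shows that read-through pattern against a toy in-memory cache so it runs without the package. Everything in it is hypothetical: span stands in for ranges.Entry (whose fields are not shown in this document), and the assumption that ranges are half-open and expressed as absolute file offsets may not match the real type.

```go
package main

import "fmt"

// span is a hypothetical stand-in for ranges.Entry: a half-open byte range
// [Start, End) in absolute file offsets. The real type's fields may differ.
type span struct{ Start, End int64 }

// memCache is a toy single-file cache keyed by byte offset. It only mimics
// the shape of Get and Put so the sketch runs without the real package.
type memCache map[int64]byte

// Get copies cached bytes into buf and reports the ranges it could not fill.
func (c memCache) Get(_ []byte, offset int64, buf []byte) ([]span, error) {
	var missing []span
	for i := range buf {
		pos := offset + int64(i)
		if b, ok := c[pos]; ok {
			buf[i] = b
			continue
		}
		if n := len(missing); n > 0 && missing[n-1].End == pos {
			missing[n-1].End++ // extend the current gap
		} else {
			missing = append(missing, span{pos, pos + 1})
		}
	}
	return missing, nil
}

// Put stores buf at offset.
func (c memCache) Put(_ []byte, offset int64, buf []byte) error {
	for i, b := range buf {
		c[offset+int64(i)] = b
	}
	return nil
}

// readThrough fills buf from the cache, fetches each missing range from the
// origin via fetch, and writes the fetched bytes back to the cache.
func readThrough(c memCache, fetch func(offset int64, buf []byte) error,
	filename []byte, offset int64, buf []byte) error {
	missing, err := c.Get(filename, offset, buf)
	if err != nil {
		return err
	}
	for _, m := range missing {
		part := buf[m.Start-offset : m.End-offset]
		if err := fetch(m.Start, part); err != nil {
			return err
		}
		if err := c.Put(filename, m.Start, part); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	origin := []byte("0123456789")
	fetch := func(offset int64, buf []byte) error {
		copy(buf, origin[offset:])
		return nil
	}
	c := memCache{}
	c.Put(nil, 2, []byte("23")) // pre-populate part of the range
	buf := make([]byte, 6)
	if err := readThrough(c, fetch, []byte("file"), 0, buf); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%s\n", buf)
}
```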

func (*Cache) Put

func (c *Cache) Put(filename []byte, offset int64, buf []byte) error

Put writes data to the log. We guarantee that future Get()s will only get this new data if there is no machine/disk crash. If the machine/disk does crash, old data might be served. To avoid this, see Delete().
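
The Delete-before-Put pattern can be wrapped in a small helper. This is a minimal sketch: overwrite and the store interface are hypothetical names, with the interface listing only the Delete and Put signatures documented here, and recorder is a fake used so the example runs on its own.

```go
package main

import "fmt"

// store lists the two methods the sketch needs, using the signatures
// documented for (*Cache).Delete and (*Cache).Put.
type store interface {
	Delete(filename []byte, offset int64, size int) error
	Put(filename []byte, offset int64, buf []byte) error
}

// overwrite (hypothetical helper) first deletes the range, which is synced
// to disk, and only then writes the new data. If the Put is later lost in a
// crash, the range is simply missing rather than stale.
func overwrite(s store, filename []byte, offset int64, data []byte) error {
	if err := s.Delete(filename, offset, len(data)); err != nil {
		return fmt.Errorf("delete before put: %w", err)
	}
	return s.Put(filename, offset, data)
}

// recorder is a fake store that records the order of calls.
type recorder struct{ calls []string }

func (r *recorder) Delete([]byte, int64, int) error {
	r.calls = append(r.calls, "Delete")
	return nil
}

func (r *recorder) Put([]byte, int64, []byte) error {
	r.calls = append(r.calls, "Put")
	return nil
}

func main() {
	r := &recorder{}
	if err := overwrite(r, []byte("file"), 0, []byte("new data")); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(r.calls)
}
```

The extra Delete costs one small synced write per update, so it is only worth paying where serving old data after a crash would be incorrect.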

type Config

type Config struct {
	// Expiry controls how long a block may go without being read or written before it is discarded during garbage collection. The expiry is a minimum: the block remains available until it is actually garbage collected.
	// The default Expiry is 1 week.
	Expiry time.Duration
	// LogFileSize controls how large to make each log file. The default is 10MB. Files might be slightly larger.
	LogFileSize int
	// MaxGarbageRatio is a number between 0 and 1 that indicates the threshold at which to start garbage collection.
	// Don't set this too low to avoid excessive IOPS.
	// The default is 0.75, which starts garbage collection once 75% of the data is dead.
	MaxGarbageRatio float64
}
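
For readers who prefer to see the documented defaults spelled out, here is a sketch using a local mirror of the struct so that it compiles without the package. The config type and documentedDefaults function are illustrative names only, and reading "10MB" as 10<<20 bytes is an assumption; the source does not say whether it means that or 10,000,000.

```go
package main

import (
	"fmt"
	"time"
)

// config mirrors the documented Config fields locally; real code would use
// the package's own Config type instead.
type config struct {
	Expiry          time.Duration
	LogFileSize     int
	MaxGarbageRatio float64
}

// documentedDefaults returns the defaults stated in the field comments.
// 10MB is assumed to mean 10<<20 bytes.
func documentedDefaults() config {
	return config{
		Expiry:          7 * 24 * time.Hour, // 1 week
		LogFileSize:     10 << 20,           // 10MB (assumed binary megabytes)
		MaxGarbageRatio: 0.75,               // collect once 75% of the data is dead
	}
}

func main() {
	fmt.Printf("%+v\n", documentedDefaults())
}
```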

Directories

Path Synopsis
Package ranges keeps track of ranges with arbitrary Data attached.