storage

Published: Nov 4, 2022 License: BSD-2-Clause Imports: 27 Imported by: 0

Documentation

Constants

const HashSize = 32

HashSize is the number of bytes in the hash values returned to represent chunks of data.

Variables

var (
	ErrHashNotFound       = errors.New("hash not found")
	ErrHashMismatch       = errors.New("hash value mismatch")
	ErrIndexMagicWrong    = errors.New("index entry has incorrect magic number")
	ErrBlobMagicWrong     = errors.New("blob has incorrect magic number")
	ErrPrematureEndOfData = errors.New("premature end of data")
)
var BlobMagic = [4]byte{'B', 'L', '0', 'B'}
var IdxMagic = [4]byte{'I', 'd', 'x', '2'}

Functions

func DecodeBlob

func DecodeBlob(blob []byte) ([]byte, error)

DecodeBlob takes a blob read from a pack file (at the offset and length given by a BlobLocation) and returns the chunk stored in that blob.

func DecodePackFile

func DecodePackFile(r io.Reader, f func(chunk []byte)) error

DecodePackFile reads a pack file from the given reader, decodes it into blobs, and calls the given callback function with each blob's chunk.
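
A minimal sketch of reading a pack file back (assuming the package is imported as "storage"; the file path is a placeholder):

func printChunkSizes(path string) error {
	f, err := os.Open(path) // a pack file written earlier
	if err != nil {
		return err
	}
	defer f.Close()
	// Decode every blob in the pack file and report each chunk's size.
	return storage.DecodePackFile(f, func(chunk []byte) {
		fmt.Printf("chunk of %d bytes\n", len(chunk))
	})
}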

func InitBandwidthLimit

func InitBandwidthLimit(uploadBytesPerSecond, downloadBytesPerSecond int)

func NewHashesReader

func NewHashesReader(hashes []Hash, sem chan bool, backend Backend) io.ReadCloser

NewHashesReader returns an io.ReadCloser that reads multiple hashes in parallel from the given storage backend. It supplies the bytes of the hashes' chunks concatenated together into a single stream. If non-nil, the sem parameter is used to limit the number of active readers; otherwise a fixed number of reader goroutines are launched.
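
A sketch of reading several chunks back as one stream (the hashes are values previously returned by Backend.Write; passing a nil sem lets the package use its default reader parallelism):

func readChunks(hashes []storage.Hash, backend storage.Backend) ([]byte, error) {
	r := storage.NewHashesReader(hashes, nil, backend)
	defer r.Close()
	return io.ReadAll(r) // the chunks' bytes, concatenated in order
}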

func NewLimitedDownloadReader

func NewLimitedDownloadReader(r io.Reader) io.Reader

func NewLimitedUploadReader

func NewLimitedUploadReader(r io.Reader) io.Reader
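
The limited readers work in concert with InitBandwidthLimit. A sketch, assuming (as with GCSOptions) that a zero rate means unlimited:

func throttledUpload(src io.Reader) io.Reader {
	// Cap uploads at 1 MiB/s; leave downloads unthrottled (assumed: zero means unlimited).
	storage.InitBandwidthLimit(1<<20, 0)
	return storage.NewLimitedUploadReader(src)
}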

func PackBlob

func PackBlob(h Hash, chunk []byte, packFileSize int64) (idx, pack []byte)

PackBlob takes a (hash, chunk) pair and the current size of the pack file and converts them to the representation to be stored in index and pack files, returning the bytes to append to the index and pack files to store the chunk.
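
A sketch of how PackBlob and DecodeBlob pair up, using in-memory buffers in place of real index and pack files:

func appendAndCheck(packBuf, idxBuf *bytes.Buffer, chunk []byte) error {
	h := storage.HashBytes(chunk)
	idx, pack := storage.PackBlob(h, chunk, int64(packBuf.Len()))
	idxBuf.Write(idx)   // bytes to append to the index file
	packBuf.Write(pack) // bytes to append to the pack file

	// Round trip: decode the blob just appended and compare it to the original.
	blob := packBuf.Bytes()[packBuf.Len()-len(pack):]
	got, err := storage.DecodeBlob(blob)
	if err != nil {
		return err
	}
	if !bytes.Equal(got, chunk) {
		return fmt.Errorf("decoded chunk doesn't match original")
	}
	return nil
}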

func SetLogger

func SetLogger(l *u.Logger)

Types

type Backend

type Backend interface {
	// String returns the name of the Backend in the form of a string.
	String() string

	// LogStats reports any statistics that the Backend may have gathered
	// during the course of its operation.
	LogStats()

	// Fsck checks the consistency of the data in the Backend and reports
	// any problems found via the logger specified by SetLogger.
	Fsck()

	// Write saves the provided chunk of data to storage, returning a Hash
	// that uniquely identifies it. Any write errors are fatal and
	// terminate the program.
	Write(chunk []byte) Hash

	// SyncWrites ensures that all chunks of data provided to Write have
	// in fact reached permanent storage. Calls to Read may not find
	// data stored by Write if SyncWrites hasn't been called after the
	// call to Write.
	SyncWrites()

	// Read returns an io.ReadCloser that provides the chunk for the given
	// hash. If the given hash doesn't exist in the backend, an error is
	// returned.
	Read(hash Hash) (io.ReadCloser, error)

	// HashExists reports whether a blob of data with the given hash exists
	// in the storage backend.
	HashExists(hash Hash) bool

	// Hashes returns a map that has all of the hashes stored by the
	// storage backend.
	Hashes() map[Hash]struct{}

	// WriteMetadata saves the given data in the storage backend,
	// associating it with the given name. It's mostly used for storing
	// data that we don't want to run through the dedupe process and want
	// to be able to easily access directly by name.
	WriteMetadata(name string, data []byte)

	// ReadMetadata returns the metadata for a given name that was stored
	// with WriteMetadata.
	ReadMetadata(name string) []byte

	// MetadataExists indicates whether the given named metadata is
	// present in the storage backend.
	MetadataExists(name string) bool

	// ListMetadata returns a map from all of the existing metadata
	// to the time each one was created.
	ListMetadata() map[string]time.Time
}

Backend describes a general interface for low-level data storage; users can provide chunks of data that a storage backend will store (on disk, in the cloud, etc.), and are returned a Hash that identifies each such chunk. Implementations should apply deduplication so that if the same chunk is supplied multiple times, it will only be stored once.

Note: it isn't safe in general for multiple threads to call Backend methods concurrently, though the Read() method may be called by multiple threads (as long as others aren't calling other Backend methods).
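
A minimal sketch of the write/read cycle, using the in-memory backend (a disk or GCS backend is used the same way):

func roundTrip(chunk []byte) ([]byte, error) {
	backend := storage.NewMemory()

	h := backend.Write(chunk) // write errors are fatal, so there is no error return
	backend.SyncWrites()      // ensure the chunk has reached permanent storage

	r, err := backend.Read(h)
	if err != nil {
		return nil, err
	}
	defer r.Close()
	return io.ReadAll(r)
}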

func NewCompressed

func NewCompressed(backend Backend) Backend

NewCompressed returns a new storage.Backend that applies gzip compression to the contents of chunks stored in the provided underlying backend. Note: the contents of metadata files are not compressed.

func NewDisk

func NewDisk(dir string) Backend

NewDisk returns a new storage.Backend that stores data to the given dir. This directory should be empty the first time NewDisk is called with it.

func NewEncrypted

func NewEncrypted(backend Backend, passphrase string) Backend

NewEncrypted returns a storage.Backend that applies AES encryption to the chunk data stored in the underlying storage.Backend. Note: metadata contents and the names of named hashes are not encrypted.
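
Since the compression and encryption backends wrap another Backend, they can be layered over a disk (or GCS) backend. The particular ordering below is only a sketch; this documentation doesn't prescribe one:

func newLayeredBackend(dir, passphrase string) storage.Backend {
	return storage.NewEncrypted(
		storage.NewCompressed(storage.NewDisk(dir)), passphrase)
}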

func NewGCS

func NewGCS(options GCSOptions) Backend

func NewMemory

func NewMemory() Backend

NewMemory returns a storage.Backend that stores all data (blobs, hashes, metadata, etc.) in RAM. It's really only useful for testing code built on top of storage.Backend, where we may want to avoid the trouble of writing a pile of data to disk.

type BlobLocation

type BlobLocation struct {
	PackName string
	Offset   int64
	Length   int64
}

BlobLocation is the external representation of the location of a blob in a pack file that's returned to callers.

type ChunkIndex

type ChunkIndex struct {
	// contains filtered or unexported fields
}

ChunkIndex maintains an index from hashes to the locations of their blobs in pack files.

func (*ChunkIndex) AddIndexFile

func (c *ChunkIndex) AddIndexFile(packName string, idx []byte) (int, error)

AddIndexFile takes the entire contents of an index file and associates its index entries with the given pack file name. It returns the number of entries added and an error, if any.

func (*ChunkIndex) AddSingle

func (c *ChunkIndex) AddSingle(hash Hash, packName string, offset, length int64)

func (*ChunkIndex) Hashes

func (c *ChunkIndex) Hashes() map[Hash]struct{}

func (*ChunkIndex) Lookup

func (c *ChunkIndex) Lookup(hash Hash) (BlobLocation, error)
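
A sketch of typical ChunkIndex use: record where a blob was written (or load a whole index file with AddIndexFile), then look a hash up later:

func whereIs(c *storage.ChunkIndex, h storage.Hash, packName string, offset, length int64) (storage.BlobLocation, error) {
	c.AddSingle(h, packName, offset, length)
	return c.Lookup(h) // returns an error if the hash isn't in the index
}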

type FileStorage

type FileStorage interface {
	// CreateFile returns a RobustWriteCloser for a file with the given name;
	// a fatal error occurs if a file with that name already exists.
	CreateFile(name string) RobustWriteCloser

	// ReadFile returns the contents of the given file. If length is zero, the
	// whole file contents are returned; otherwise the segment starting at offset
	// with given length is returned.
	//
	// TODO: it might be more idiomatic to return e.g. an io.ReadCloser,
	// but between the GCS backend needing to be able to retry reads and
	// the fact that callers usually want a []byte in the end anyway, this
	// seems more straightforward overall.
	ReadFile(name string, offset int64, length int64) ([]byte, error)

	// ForFiles calls the given callback function for all files with the
	// given directory prefix, providing the file path and its creation
	// time.
	ForFiles(prefix string, f func(path string, created time.Time))

	String() string

	// Fsck checks the validity of the stored data.  The returned Boolean
	// value indicates whether or not the caller should continue and
	// perform its own checks on the contents of the data as well.
	Fsck() bool
}

FileStorage is a simple abstraction for a storage system.
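
For example, listing stored files under a prefix via ForFiles (fs is any FileStorage implementation; the prefix is a placeholder):

func listFiles(fs storage.FileStorage, prefix string) {
	fs.ForFiles(prefix, func(path string, created time.Time) {
		fmt.Println(path, created.Format(time.RFC3339))
	})
}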

type GCSOptions

type GCSOptions struct {
	BucketName string
	ProjectId  string
	// Optional. Will use "us-central1" if not specified.
	Location string

	// zero -> unlimited
	MaxUploadBytesPerSecond   int
	MaxDownloadBytesPerSecond int
}
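
A sketch of constructing a GCS-backed Backend; the bucket and project names are placeholders:

func newGCSBackend() storage.Backend {
	return storage.NewGCS(storage.GCSOptions{
		BucketName: "my-backup-bucket", // placeholder
		ProjectId:  "my-project",       // placeholder
		// Location left empty: defaults to "us-central1".
		MaxUploadBytesPerSecond: 4 << 20, // zero means unlimited
	})
}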

type Hash

type Hash [HashSize]byte

Hash encodes a fixed-size secure hash of a collection of bytes.

func HashBytes

func HashBytes(b []byte) Hash

HashBytes computes the SHAKE256 hash of the given byte slice.

func NewHash

func NewHash(b []byte) (h Hash)

func (Hash) String

func (h Hash) String() string

String returns the given Hash as a hexadecimal-encoded string.
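
For example:

func printHash(chunk []byte) {
	h := storage.HashBytes(chunk) // HashSize (32) bytes of SHAKE256 output
	fmt.Println(h.String())       // hexadecimal form
}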

type HashSplitter

type HashSplitter struct {
	// contains filtered or unexported fields
}

HashSplitter splits a stream of bytes into chunks using a rolling checksum over recently seen bytes. The lowest bits of the checksum seem to be most useful for choosing split points; splitting based on, say, 4 bits in the middle is fiddly, especially when the range spans the 16th bit.

func NewHashSplitter

func NewHashSplitter(splitBits uint) *HashSplitter

func (*HashSplitter) AddByte

func (hs *HashSplitter) AddByte(b byte)

func (*HashSplitter) Reset

func (hs *HashSplitter) Reset()

func (*HashSplitter) SplitFromReader

func (hs *HashSplitter) SplitFromReader(reader io.ByteReader) (ret []byte)

func (*HashSplitter) SplitNow

func (hs *HashSplitter) SplitNow() bool
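
A sketch of content-defined chunking with a HashSplitter; it assumes SplitFromReader returns the next chunk on each call and an empty slice once the reader is exhausted:

func splitAll(r io.ByteReader, splitBits uint) [][]byte {
	hs := storage.NewHashSplitter(splitBits) // average chunk size of about 1<<splitBits bytes
	var chunks [][]byte
	for {
		chunk := hs.SplitFromReader(r)
		if len(chunk) == 0 {
			break
		}
		chunks = append(chunks, chunk)
	}
	return chunks
}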

type MerkleHash

type MerkleHash struct {
	Hash  Hash
	Level uint8
}

func DecodeMerkleHash

func DecodeMerkleHash(r io.Reader) (sh MerkleHash)

func MerkleFromSingle

func MerkleFromSingle(hash Hash) MerkleHash

func NewMerkleHash

func NewMerkleHash(b []byte) MerkleHash

func SplitAndStore

func SplitAndStore(r io.Reader, backend Backend, splitBits uint) MerkleHash

SplitAndStore splits the bytes of the given io.Reader into chunks of average size 1<<splitBits using a rolling checksum, stores them in the given storage backend, and returns the hash of the root of a Merkle tree that identifies the stored data.

func (*MerkleHash) Bytes

func (sh *MerkleHash) Bytes() []byte

func (*MerkleHash) Fsck

func (h *MerkleHash) Fsck(backend Backend)

func (*MerkleHash) NewReader

func (h *MerkleHash) NewReader(sem chan bool, backend Backend) io.ReadCloser
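
A sketch of the end-to-end flow: split and store a stream, then read it back via the Merkle root:

func storeAndRestore(src io.Reader, backend storage.Backend) ([]byte, error) {
	root := storage.SplitAndStore(src, backend, 16) // chunks of roughly 64 KiB on average
	backend.SyncWrites()

	r := root.NewReader(nil, backend) // nil sem: default reader parallelism
	defer r.Close()
	return io.ReadAll(r)
}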

type PackFileBackend

type PackFileBackend struct {
	// contains filtered or unexported fields
}

PackFileBackend implements the storage.Backend interface but depends on an implementation of the FileStorage interface to handle the mechanics of storing and retrieving files. In turn, this allows functionality that's common to the disk and GCS backends to be implemented in a single place.

func (*PackFileBackend) Fsck

func (pb *PackFileBackend) Fsck()

func (*PackFileBackend) HashExists

func (pb *PackFileBackend) HashExists(hash Hash) bool

func (*PackFileBackend) Hashes

func (pb *PackFileBackend) Hashes() map[Hash]struct{}

func (*PackFileBackend) ListMetadata

func (pb *PackFileBackend) ListMetadata() map[string]time.Time

func (*PackFileBackend) LogStats

func (pb *PackFileBackend) LogStats()

func (*PackFileBackend) MetadataExists

func (pb *PackFileBackend) MetadataExists(name string) bool

func (*PackFileBackend) Read

func (pb *PackFileBackend) Read(hash Hash) (io.ReadCloser, error)

func (*PackFileBackend) ReadMetadata

func (pb *PackFileBackend) ReadMetadata(name string) []byte

func (*PackFileBackend) String

func (pb *PackFileBackend) String() string

func (*PackFileBackend) SyncWrites

func (pb *PackFileBackend) SyncWrites()

func (*PackFileBackend) Write

func (pb *PackFileBackend) Write(chunk []byte) Hash

func (*PackFileBackend) WriteMetadata

func (pb *PackFileBackend) WriteMetadata(name string, contents []byte)

type RobustWriteCloser

type RobustWriteCloser interface {
	Write(b []byte)
	Close()
}

RobustWriteCloser is like an io.WriteCloser, except that it treats any errors as fatal and thus has no error return values. Write() always writes all bytes given to it, and after a call to Close() returns, the contents have been successfully committed to storage.
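
For example, writing a file through a FileStorage implementation:

func writeFile(fs storage.FileStorage, name string, data []byte) {
	w := fs.CreateFile(name) // fatal if a file with this name already exists
	w.Write(data)            // always writes all of data; failures are fatal
	w.Close()                // contents are committed to storage once Close returns
}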
