cas

package module

Published: Mar 18, 2024 License: Apache-2.0

README
Content Addressable Storage

This project implements a simple and pragmatic approach to Content Addressable Storage (CAS). It was heavily influenced by Perkeep (aka Camlistore) and Git.

For more details, see concepts and comparison with other systems.

Status

The project is stable, and further work is ongoing on designing CAS2, a more flexible and performant version. This project will receive bug fixes and maintenance work; new features will likely land in CAS2.

Check the Quick start guide for a list of basic commands.

Goals

  • Simplicity: the core specification should be trivial to implement.

  • Interop: CAS should play nicely with existing tools and technologies, either content-addressable or not.

  • Easy to use: CAS should be a single command away, similar to git init.

Use cases

  • Immutable and versioned archives: CAS supports files with multiple TBs of data and folders with millions of files, and can index and use remote data without storing it locally.

  • Data processing pipelines: CAS's caching capabilities allow it to be used for incremental data pipelines.

  • Git for large files: CAS assumes files may be multiple TBs in size and is optimized for this use case, while still supporting tags and branches, like Git.

Features and the roadmap

Implemented:

  • Fast file hashing
    • SHA-256 by default; others can be used
    • Stores results in file attributes (cache)
  • Support for large archives
    • Large contiguous files (> TB)
    • Large multipart files (> TB)
    • Large directories (> millions of files)
    • Zero-copy file fetch (BTRFS)
  • Integrations
    • Can index and sync web content
    • HTTP(S) caching (as a Go library)
  • Remote storage
    • Self-hosted HTTP CAS server (read-only)
    • Google Cloud Storage
  • Usability
    • Mutable objects (pins)
    • Local storage in Git fashion
  • Data pipelines
    • Extendable
    • Caches results
    • Incremental

Planned (for CAS2):

  • Support for large multipart files (> TB)
    • Support multilevel parts
    • Support blob splitters (rolling checksum, new line, etc)
  • Remote storage
    • AWS, etc
    • Self-hosted HTTP CAS server (read-write)
  • Integration with Git
    • Zero-copy fetch from Git (either remote or local)
    • LFS integration
  • Integration with Docker
    • Zero-copy fetch of an image from Docker
    • Unpack FS images to CAS
    • Use containers in pipelines
  • Integration with BitTorrent:
    • Store torrent files
    • Download torrent data directly to CAS
    • To consider: expose CAS as a peer
  • Integration with other CAS systems:
    • Perkeep
    • Upspin
    • IPFS
  • Windows and OSX support
  • Better support for pipelines

Documentation

Index

Constants

View Source
const (
	DefaultDir = ".cas"
	DefaultPin = "root"
)

Variables

This section is empty.

Functions

func Init

func Init(dir string, conf *config.Config) error

Init configures a CAS and stores the metadata in the specified directory. If the directory path is empty, the default path is used. Relative paths in local storage configs are interpreted relative to the config.

func NewWebContent

func NewWebContent(req *http.Request, resp *http.Response) *schema.WebContent

func SaveRef

func SaveRef(ctx context.Context, path string, fi os.FileInfo, ref types.Ref) error

SaveRef stores the ref in the file's metadata. It also records the file's size and mtime, so it can later be determined whether the ref is still valid.

func SaveRefFile

func SaveRefFile(ctx context.Context, f *os.File, fi os.FileInfo, ref types.Ref) error

SaveRefFile stores the ref in the file's metadata. It also records the file's size and mtime, so it can later be determined whether the ref is still valid.

Types

type Concat

type Concat struct {
	ElemType string     `json:"etype,omitempty"`
	Parts    []SizedRef `json:"parts"`
}

type FileDesc

type FileDesc interface {
	Name() string
	Open() (io.ReadCloser, SizedRef, error)
	SetRef(ref types.SizedRef)
}

func LocalFile

func LocalFile(path string) FileDesc

type OpenOptions

type OpenOptions struct {
	Dir     string
	Storage storage.Storage
}

type Ref

type Ref = types.Ref

type SchemaIterator

type SchemaIterator = storage.SchemaIterator

type SchemaRef

type SchemaRef = types.SchemaRef

type SizedRef

type SizedRef = types.SizedRef

func Hash

func Hash(ctx context.Context, path string) (SizedRef, error)

func HashWith

func HashWith(ctx context.Context, path string, info os.FileInfo, force bool) (SizedRef, error)

func Stat

func Stat(ctx context.Context, path string) (SizedRef, error)

Stat returns the size of the file, and the ref if one is written into the metadata and still considered valid.

func StatFile

func StatFile(ctx context.Context, f *os.File) (SizedRef, error)

StatFile returns the size of the file, and the ref if one is written into the metadata and still considered valid.

type SplitConfig

type SplitConfig struct {
	Splitter SplitFunc // use this split function instead of size-based
	Min, Max uint64    // in bytes
	PerLevel uint      // chunks on each schema level
}

type SplitFunc

type SplitFunc func(p []byte) int
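SplitConfig's Splitter lets a blob be split at content-defined boundaries instead of fixed sizes. The exact SplitFunc contract isn't documented in this index; assuming the function returns the offset of a proposed chunk boundary in p (and 0 for "no boundary found"), a newline-based splitter could look like this:

```go
package main

import (
	"bytes"
	"fmt"
)

// splitOnNewline is a candidate SplitFunc: it proposes a chunk boundary
// right after the first '\n' in p, so lines stay whole, and returns 0
// when no boundary is found. (This interpretation of the return value
// is an assumption for illustration.)
func splitOnNewline(p []byte) int {
	if i := bytes.IndexByte(p, '\n'); i >= 0 {
		return i + 1 // split after the newline
	}
	return 0
}

func main() {
	fmt.Println(splitOnNewline([]byte("line one\nline two"))) // 9
	fmt.Println(splitOnNewline([]byte("no newline here")))    // 0
}
```

Content-defined boundaries (newlines, rolling checksums) keep chunk refs stable under insertions, which is what makes deduplication effective for large, slowly changing files.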

type Stats

type Stats = schema.Stats

type Storage

type Storage struct {
	// contains filtered or unexported fields
}

func New

func New(st storage.Storage) (*Storage, error)

func Open

func Open(opt OpenOptions) (*Storage, error)

func (*Storage) BeginBlob

func (s *Storage) BeginBlob(ctx context.Context) (storage.BlobWriter, error)

func (*Storage) Checkout

func (s *Storage) Checkout(ctx context.Context, ref Ref, dst string) error

Checkout restores the content of ref into dst.

func (*Storage) Close

func (s *Storage) Close() error

func (*Storage) DecodeSchema

func (s *Storage) DecodeSchema(ctx context.Context, ref types.Ref) (schema.Object, error)

func (*Storage) DeletePin

func (s *Storage) DeletePin(ctx context.Context, name string) error

func (*Storage) FetchBlob

func (s *Storage) FetchBlob(ctx context.Context, ref Ref) (io.ReadCloser, uint64, error)

func (*Storage) FetchSchema

func (s *Storage) FetchSchema(ctx context.Context, ref types.Ref) (io.ReadCloser, uint64, error)

func (*Storage) GetPin

func (s *Storage) GetPin(ctx context.Context, name string) (types.Ref, error)

func (*Storage) GetPinOrRef

func (s *Storage) GetPinOrRef(ctx context.Context, name string) (types.Ref, error)

func (*Storage) IterateBlobs

func (s *Storage) IterateBlobs(ctx context.Context) storage.Iterator

func (*Storage) IterateDataBlobsIn

func (s *Storage) IterateDataBlobsIn(ctx context.Context, root Ref) storage.Iterator

IterateDataBlobsIn iterates over all non-schema blobs referenced by the provided schema blob.

func (*Storage) IteratePins

func (s *Storage) IteratePins(ctx context.Context) storage.PinIterator

func (*Storage) IterateSchema

func (s *Storage) IterateSchema(ctx context.Context, typs ...string) SchemaIterator

func (*Storage) ReindexSchema

func (s *Storage) ReindexSchema(ctx context.Context, force bool) error

func (*Storage) SetPin

func (s *Storage) SetPin(ctx context.Context, name string, ref types.Ref) error

func (*Storage) StatBlob

func (s *Storage) StatBlob(ctx context.Context, ref Ref) (uint64, error)

func (*Storage) StoreAddr

func (s *Storage) StoreAddr(ctx context.Context, addr string, conf *StoreConfig) (types.SizedRef, error)

StoreAddr interprets an address as either a local FS path or a URL and fetches the content. It creates schema objects automatically.
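The path-or-URL dispatch that StoreAddr describes can be approximated with net/url: addresses with an http(s) scheme are fetched over the network, anything else is treated as a filesystem path. A hedged stdlib sketch (`isRemoteAddr` is a hypothetical helper, not this package's actual logic):

```go
package main

import (
	"fmt"
	"net/url"
)

// isRemoteAddr reports whether addr looks like a fetchable URL rather
// than a local filesystem path.
func isRemoteAddr(addr string) bool {
	u, err := url.Parse(addr)
	if err != nil {
		return false // not a URL; treat as a local path
	}
	return u.Scheme == "http" || u.Scheme == "https"
}

func main() {
	fmt.Println(isRemoteAddr("https://example.com/data.bin")) // true
	fmt.Println(isRemoteAddr("./local/file.bin"))             // false
	fmt.Println(isRemoteAddr("/var/data/file.bin"))           // false
}
```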

func (*Storage) StoreAsFile

func (s *Storage) StoreAsFile(ctx context.Context, fd FileDesc, conf *StoreConfig) (SizedRef, error)

func (*Storage) StoreBlob

func (s *Storage) StoreBlob(ctx context.Context, r io.Reader, conf *StoreConfig) (SizedRef, error)

StoreBlob writes the data from r according to the config.

func (*Storage) StoreFilePath

func (s *Storage) StoreFilePath(ctx context.Context, path string, conf *StoreConfig) (SizedRef, error)

func (*Storage) StoreHTTPContent

func (s *Storage) StoreHTTPContent(ctx context.Context, req *http.Request, conf *StoreConfig) (SizedRef, error)

func (*Storage) StoreSchema

func (s *Storage) StoreSchema(ctx context.Context, o schema.Object) (SizedRef, error)

func (*Storage) StoreURLContent

func (s *Storage) StoreURLContent(ctx context.Context, url string, conf *StoreConfig) (SizedRef, error)

func (*Storage) SyncBlob

func (s *Storage) SyncBlob(ctx context.Context, ref Ref) (Ref, error)

type StoreConfig

type StoreConfig struct {
	Expect    types.SizedRef // expected size and ref; can be set separately
	IndexOnly bool           // write metadata only
	Split     *SplitConfig
}

Directories

Path Synopsis
cmd
cas
Package cashttp implements an HTTP cache based on CAS.
all
gcs
