aostor

Published: Feb 25, 2013 License: GPL-3.0 Imports: 33 Imported by: 0

README


Append-Only File Storage. Stores files in an append-only manner, indexed for fast retrieval, but uses only a few files to be easy on the filesystem (have you ever tried to list all files in a 3-million-file directory hierarchy? It takes ages!)

Problem

Lots of small files; the filesystem slows down above ~1000 files per directory. Archiving may happen after a long period of time, so a compact solution is needed for restore.

A possible solution

Store files in tars (say, 1 GB each), together with their metadata.

Tar layout

Each stored file gets a unique key (UUID). The data is stored under the name key + "#" (i.e. 213f34a8dc1d213f34a8dc1d213f34a8#), the metadata (info) under key + "!", and the possible symbolic link (for per-tar deduplication) under key + "@".

The info is in HTTP header format ("\n"-separated lines, ": "-separated key and value); each aostor-specific header (id, index position (ipos) and data position (dpos)) starts with X-Aostor-.

Compression, encryption methods are stored in the Content-Encoding header. Mime-type in Content-Type.

Indexing

The tar needs an index to be able to retrieve files in random order. For this, each tar gets a .cdb companion (D. J. Bernstein's Constant DataBase).

TODO: one needs to find out which tar the file is in!

A possible solution is to return the tar's UUID along with the key, so retrieval is easy: just use the given UUID! The implementation shall support partial tar UUIDs (i.e. only a prefix of the tar UUID is presented), to be able to store the file UUID + the tar UUID in some limited space.

For this, the UUIDs are encoded as URL-safe Base64 with the padding stripped: this yields 22 characters per UUID, so a file UUID + "," separator + full tar UUID consumes 22 + 1 + 22 = 45 characters; a 40-character field can store 17 characters of the tar UUID, a 32-character field only 9.

A UUID4 has 16 bytes minus (2 + 4) fixed bits of randomness, that is 16 * 8 - 6 = 122 bits. So even a 9-character prefix (9 of the 22 characters) narrows the search to the small fraction of tars matching that prefix.

Appending files

Files are written into a simple directory ("staging"), just as they would appear in the tar. When the count/size reaches a threshold, they are shoveled into a tar, accompanied by its .cdb.

Retrieving a file

First the staging directory is checked: if the ! (info) file is there, it is read, and the corresponding # data file (e.g. #bz2) is checked.

If the staging directory is empty, then we start searching the cdbs, first the newest (L0), then the next level (L1), then the next (L2), and so on.

Index "compaction"

When shovel is called, the files in the staging dir are shoveled into tars, each accompanied by a .cdb. The .cdb is symlinked into the L0 directory. Then the L0 directory is checked: if the number of cdbs is greater than the threshold (10), they are merged into a new cdb in the L1 directory, and the merged L0 cdbs are deleted. If a merge happened at level n, the L(n+1) dir is checked in turn: if its number of cdbs is greater than the threshold (10), they are merged into a new cdb in the L(n+2) directory, and the merged L(n+1) cdbs are deleted.

CDB has a size limit of 2 GB, so the compactor must take this into account, too!

API Docs: http://go.pkgdoc.org/github.com/tgulacsi/aostor

Documentation

Overview

Log is disabled by default. A specific logger can be passed to the library using an aostor.UseLogger(...) call. You can enable library log without importing Seelog with an aostor.SetLogWriter(writer) call.

Index

Constants

const (
	MIN_CDB_SIZE = 2048
	MAX_CDB_SIZE = (1 << 31) - 1
)
const (
	DefaultConfigFile     = "aostor.ini"
	DefaultTarThreshold   = 1000 * (1 << 20) // 1000Mb
	DefaultIndexThreshold = 10               // How many index cdb should be merged
	DefaultContentHash    = "sha1"
	DefaultCompressMethod = "gzip"
	DefaultHostport       = ":8341"
	DefaultLogConfFile    = "seelog.xml"
	TestConfig            = `` /* 247-byte string literal not displayed */

)
const (
	SuffInfo = "!" // suffix of info file
	SuffLink = "@" // suffix of link
	SuffData = "#" // suffix of data file (+ compression type)
	BS       = 512 // tar blocksize
)
const InfoPref = "X-Aostor-" // prefix of specific headers

Variables

var (
	ErrSymlink     = errors.New("aodb/tarhelper: symlink")
	NotRegularFile = errors.New("aodb/tarhelper: not a regular file")
	ErrBadTarEnd   = errors.New("aodb/tarhelper: bad tar end")
)
var AlreadyLocked = errors.New("AlreadyLocked")
var (
	ConfigFile = DefaultConfigFile
)
var DefaultLogConf = `` /* 261-byte string literal not displayed */
var MissingFilenameError = errors.New("Filename is missing!")
var (
	NotFound = errors.New("Not Found")
)
var StopIteration = errors.New("StopIteration")
var UUIDMaker = uuid.NewUUID4

Functions

func AppendFile

func AppendFile(tarfn string, info Info, fn string, compressMethod string) (pos uint64, err error)

appends file fn with info to tarfn, compressing with compressMethod

func AppendLink(tarfn string, info Info, src string, dst string) (err error)

appends a link pointing at a previously written item

func BaseName

func BaseName(fn string) string

basename for Windows and Unix (strips everything before / or \\)

func BytesToStr

func BytesToStr(buf []byte) string

converts []byte to string

func CalculateLink(basedir, destfn string) string

CalculateLink calculates the symbolic link for destfn relative to basedir

func CanonicalHeaderKey

func CanonicalHeaderKey(key []byte) []byte

func Compact

func Compact(realm string, onChange NotifyFunc) error

compacts staging dir: moves info and data files to tar; calls CompactIndices

func CompactIndices

func CompactIndices(realm string, level uint, onChange func(), alreadyLocked bool) error

CompactIndices compacts the index cdbs

func CreateTar

func CreateTar(tarfn string, dirname string, sizeLimit uint64, alreadyLocked bool) error

Copies files from the given directory into a given tar file

func DeDup

func DeDup(path string, hash string, alreadyLocked bool) int

deduplication: replaces data with a symlink to previous data with the same content-hash

func DisableLog

func DisableLog()

DisableLog disables all library log output

func FhandleTarHeader

func FhandleTarHeader(fh *os.File) (hdr *tar.Header, err error)

tar.Header for a file

func FileTarHeader

func FileTarHeader(fn string) (hdr *tar.Header, err error)

tar.Header for a filename

func FillCaches

func FillCaches(force bool) error

fills caches (reads tar files and cdb files, caches path)

func FillHeader

func FillHeader(hdr *tar.Header)

fills tar.Header missing information (uid/gid, username/groupname, times ...)

func FindLinkOrigin

func FindLinkOrigin(fn string, abs bool) string

func FindTarEnd

func FindTarEnd(r io.ReadSeeker, last_known uint64) (pos uint64, err error)
Move to the end of the archive, before the first empty block. The algorithm follows this Python sketch from an earlier, tarfile-based prototype (perf_mark, perf_print, usedmem, DEBUG_MEM and LOG are profiling/logging helpers from that codebase):

    def _init_find_end(self, end_offset, name):
        '''Move to the end of the archive, before the first empty block.'''
        if not end_offset:
            end_offset = self.END_OFFSET_CACHE.get(name, 0)

        self.firstmember = None
        perf_mark()
        if end_offset > self.offset:
            self.fileobj.seek(0, 2)  # seek to EOF
            p = self.fileobj.tell()
            self.offset = min(p - tarfile.BLOCKSIZE, end_offset)
            self.fileobj.seek(self.offset)
            perf_print('%s end_offset > self.offset', name)

        if DEBUG_MEM:
            LOG.debug('before while: next() mem=%dKb', usedmem())
        while True:
            if self.next() is None:  # no more members: step back one block
                if self.offset > 0:
                    self.fileobj.seek(-tarfile.BLOCKSIZE, 1)
                break
        perf_print('find_end %s', name)
        self.END_OFFSET_CACHE[name] = self.offset
        if DEBUG_MEM:
            LOG.debug('after while: next() mem=%dKb', usedmem())

func Finfo2Theader

func Finfo2Theader(fi os.FileInfo) (hdr *tar.Header, err error)

create tar.Header from os.FileInfo

func FlushLog

func FlushLog()

Call this before app shutdown

func GetLogger

func GetLogger() seelog.LoggerInterface

func LogIsDisabled

func LogIsDisabled() bool

func ReadItem

func ReadItem(tarfn string, pos int64) (ret io.Reader, err error)

Reads from tarfn starting at pos. Returns a SymlinkError with the symlink information if there is a symlink at the given position, so the caller can retry with the symlink.

func SameFile

func SameFile(fn1, fn2 string) (bool, error)

func SetLogWriter

func SetLogWriter(writer io.Writer) error

SetLogWriter uses a specified io.Writer to output library log. Use this func if you are not using Seelog logging system in your app.

func StrToBytes

func StrToBytes(str string) []byte

converts string to []byte

func UseLogger

func UseLogger(newLogger seelog.LoggerInterface)

UseLogger uses a specified seelog.LoggerInterface to output library log. Use this func if you are using Seelog logging system in your app.

func UseLoggerFromConfigFile

func UseLoggerFromConfigFile(filename string)

loads logger from config file

func Walk

func Walk(root string, walkFn filepath.WalkFunc) error

Walk walks the file tree rooted at root, calling walkFn for each file or directory in the tree, including root. All errors that arise visiting files and directories are filtered by walkFn. The files are walked in inode order, which makes the output indeterministic but means that even for very large directories Walk can be efficient.

func WriteTar

func WriteTar(tw *tar.Writer, hdr *tar.Header, r io.Reader) (err error)

Types

type Config

type Config struct {
	StagingDir, IndexDir, TarDir string
	IndexThreshold               uint
	TarThreshold                 uint64
	Hostport                     string
	Realms                       []string
	ContentHash                  string
	ContentHashFunc              func() hash.Hash
	LogConf                      string
	CompressMethod               string
}

configuration variables, parsed

func ReadConf

func ReadConf(fn string, realm string) (c Config, err error)

reads config file (or ConfigFile if empty), replaces every #(realm)s with the given realm, if given

type CountingWriter

type CountingWriter struct {
	Num uint64 // bytes written
}

A writer which counts bytes written into it

func NewCounter

func NewCounter() *CountingWriter

func (*CountingWriter) Write

func (c *CountingWriter) Write(p []byte) (n int, err error)

just count

type Info

type Info struct {
	Key        UUID
	Ipos, Dpos uint64
	// contains filtered or unexported fields
}

func Get

func Get(realm string, uuid UUID) (info Info, reader io.Reader, err error)

returns the associated info and data of a given uuid in a given realm

  1. checks staging area
  2. checks level zero (symlinked cdbs in ndx/L00)
  3. checks higher level (older, too) cdbs in ndx/L01, ndx/L02...

The difference between level zero and the higher levels is the following: at level zero there are the tar files' cdbs (symlinked), and these cdbs contain the info (with the position information), ready to serve. At higher levels, the cdbs contain only "/%d" markers (just a number naming a zero-level cdb), so at these levels an additional lookup is required.

func GetFromCdb

func GetFromCdb(uuid UUID, cdb_fn string) (info Info, reader io.Reader, err error)

func InfoFromBytes

func InfoFromBytes(b []byte) (info Info, err error)

func ReadInfo

func ReadInfo(r io.Reader) (info Info, err error)

parses into Info

func (*Info) Add

func (info *Info) Add(key string, val string)

adds a key

func (*Info) AddBytes

func (info *Info) AddBytes(key, val []byte)

adds a key (byte)

func (*Info) Bytes

func (info *Info) Bytes() []byte

returns Info in wire format

func (*Info) Copy

func (info *Info) Copy(header http.Header)

copies data from Info to http.Header

func (*Info) CopyFrom

func (info *Info) CopyFrom(header map[string][]string)

copies data from textproto.MIMEHeader

func (*Info) Get

func (info *Info) Get(key string) string

returns the value for key

func (*Info) NewReader

func (info *Info) NewReader() (io.Reader, int)

returns a new Reader and the length for wire format of Info

func (*Info) Prepare

func (info *Info) Prepare() error

prepares info for writing out

func (*Info) SetFilename

func (info *Info) SetFilename(fn string, mime string)

sets the filename and the mimetype, conditionally (if given)

type NotifyFunc

type NotifyFunc func()

type ReadWriteSeekCloser

type ReadWriteSeekCloser interface {
	io.ReadWriteSeeker
	io.Closer
}

Reader + Writer + Seeker + Closer

func OpenForAppend

func OpenForAppend(tarfn string) (
	tw *tar.Writer, fobj ReadWriteSeekCloser, pos uint64, err error)

Opens the tarfile for appending - seeks to the end

type SymlinkError

type SymlinkError struct {
	Linkname string
}

func (*SymlinkError) Error

func (e *SymlinkError) Error() string

type UUID

type UUID [uuid.Length]byte

func NewUUID

func NewUUID() (UUID, error)

returns a hexified uuid.Length-byte UUID1

func Put

func Put(realm string, info Info, data io.Reader) (key UUID, err error)

puts a file (info + data) into the given realm and returns the key; if a key is already present in info, that key is used

func UUIDFromBytes

func UUIDFromBytes(text []byte) (b UUID, err error)

func UUIDFromString

func UUIDFromString(text string) (b UUID, err error)

func (UUID) Bytes

func (b UUID) Bytes() []byte

func (UUID) IsEmpty

func (b UUID) IsEmpty() bool

func (UUID) String

func (b UUID) String() string

Directories

Path Synopsis
Append-Only Storage HTTP server
uuid.go
