aostor

Published: Feb 25, 2013 License: GPL-3.0 Imports: 33 Imported by: 0

README


Append-Only File Storage. Stores files in an append-only manner, indexed for fast retrieval, but uses only a few files to be easy on the filesystem (have you ever tried to list all files in a 3-million-file directory hierarchy? It takes ages!)

Problem

Lots of small files; the filesystem slows down above ~1000 files per directory. Archiving may happen after a long period of time, so a compact solution is needed for restore.

A possible solution

Store files in tars (say, 1 GB each), together with their metadata.

Tar layout

Each stored file gets a unique key (UUID). The data is stored under the name key + "#" (i.e. 213f34a8dc1d213f34a8dc1d213f34a8#), the metadata (info) under key + "!", and the possible symbolic link (for per-tar deduplication) under key + "@".

The info is in HTTP header format ("\n"-separated lines, ": "-separated key and value); each aostor-specific header (id, index position (ipos) and data position (dpos)) starts with X-Aostor-.

Compression, encryption methods are stored in the Content-Encoding header. Mime-type in Content-Type.

Indexing

The tar needs an index to be able to retrieve files in random order. For this, each tar gets a .cdb companion (D. J. Bernstein's Constant DataBase).

TODO: one needs to find out which tar the file is in!

A possible solution is to return the tar's UUID along with the key, so retrieval is easy: just use the given UUID! The implementation shall support partial tar UUIDs (i.e. only a prefix of the tar UUID is presented), to be able to store the file UUID + the tar UUID in some limited space.

For this, the UUIDs are encoded as URL-safe Base64 with the padding stripped: this yields 22 characters per UUID, so a file UUID + "," separator + full tar UUID consumes 22 + 1 + 22 = 45 characters; a 40-character field can store 17 characters of the tar UUID, a 32-character field only 9.

A UUID4 has 16 bytes minus (2 + 4) fixed bits of randomness, that is 16 * 8 - 6 = 122 bits. So even a 9-character prefix (9 of the 22 characters) narrows the search to the small fraction of tars matching that prefix.

Appending files

Files are written into a simple directory ("staging"), just as they would appear in the tar. When the count/size reaches a threshold, they are shoveled into a tar, accompanied by its .cdb.

Retrieving a file

First the staging directory is checked: if the ! (info) file is there, it is read, and the corresponding # data file (e.g. #bz2) is checked.

If the staging directory is empty, then we start searching the cdbs, first the newest (L0), then the next level (L1), then the next (L2), and so on.

Index "compaction"

When shovel is called, the files in the staging dir are shoveled into tars, each accompanied by a .cdb. The .cdb is symlinked into the L0 directory. Then the L0 directory is checked: if the number of cdbs is greater than the threshold (10), they are merged into a new cdb in the L1 directory, and the merged L0 cdbs are deleted. If a merge happened at level n, the L(n+1) dir is checked in turn: if its number of cdbs is greater than the threshold (10), they are merged into a new cdb in the L(n+2) directory, and the merged L(n+1) cdbs are deleted.

CDB has a size limit of 2 GB, so the compactor must take this into account, too!

API Docs: http://go.pkgdoc.org/github.com/tgulacsi/aostor

Documentation

Overview

Log is disabled by default. A specific logger can be passed to the library using an aostor.UseLogger(...) call. You can enable library log without importing Seelog with an aostor.SetLogWriter(writer) call.

Index

Constants

const (
	MIN_CDB_SIZE = 2048
	MAX_CDB_SIZE = (1 << 31) - 1
)
const (
	DefaultConfigFile     = "aostor.ini"
	DefaultTarThreshold   = 1000 * (1 << 20) // 1000Mb
	DefaultIndexThreshold = 10               // How many index cdb should be merged
	DefaultContentHash    = "sha1"
	DefaultCompressMethod = "gzip"
	DefaultHostport       = ":8341"
	DefaultLogConfFile    = "seelog.xml"
	TestConfig            = `` /* 247-byte string literal not displayed */

)
const (
	SuffInfo = "!" // suffix of info file
	SuffLink = "@" // suffix of link
	SuffData = "#" // suffix of data file (+ compression type)
	BS       = 512 // tar blocksize
)
const InfoPref = "X-Aostor-" // prefix of specific headers

Variables

var (
	ErrSymlink     = errors.New("aodb/tarhelper: symlink")
	NotRegularFile = errors.New("aodb/tarhelper: not a regular file")
	ErrBadTarEnd   = errors.New("aodb/tarhelper: bad tar end")
)
var AlreadyLocked = errors.New("AlreadyLocked")
var (
	ConfigFile = DefaultConfigFile
)
var DefaultLogConf = `` /* 261-byte string literal not displayed */
var MissingFilenameError = errors.New("Filename is missing!")
var (
	NotFound = errors.New("Not Found")
)
var StopIteration = errors.New("StopIteration")
var UUIDMaker = uuid.NewUUID4

Functions

func AppendFile

func AppendFile(tarfn string, info Info, fn string, compressMethod string) (pos uint64, err error)

appends file fn with info to tarfn, compressing with compressMethod

func AppendLink(tarfn string, info Info, src string, dst string) (err error)

appends a link pointing at a previously written item

func BaseName

func BaseName(fn string) string

basename for Windows and Unix (strips everything before / or \\)

func BytesToStr

func BytesToStr(buf []byte) string

converts []byte to string

func CalculateLink(basedir, destfn string) string

CalculateLink calculates the symbolic link for destfn relative to basedir

func CanonicalHeaderKey

func CanonicalHeaderKey(key []byte) []byte

func Compact

func Compact(realm string, onChange NotifyFunc) error

compacts staging dir: moves info and data files to tar; calls CompactIndices

func CompactIndices

func CompactIndices(realm string, level uint, onChange func(), alreadyLocked bool) error

CompactIndices compacts the index cdbs

func CreateTar

func CreateTar(tarfn string, dirname string, sizeLimit uint64, alreadyLocked bool) error

Copies files from the given directory into a given tar file

func DeDup

func DeDup(path string, hash string, alreadyLocked bool) int

deduplication: replaces data with a symlink to previous data with the same content-hash

func DisableLog

func DisableLog()

DisableLog disables all library log output

func FhandleTarHeader

func FhandleTarHeader(fh *os.File) (hdr *tar.Header, err error)

tar.Header for a file

func FileTarHeader

func FileTarHeader(fn string) (hdr *tar.Header, err error)

tar.Header for a filename

func FillCaches

func FillCaches(force bool) error

fills caches (reads tar files and cdb files, caches path)

func FillHeader

func FillHeader(hdr *tar.Header)

fills tar.Header missing information (uid/gid, username/groupname, times ...)

func FindLinkOrigin

func FindLinkOrigin(fn string, abs bool) string

func FindTarEnd

func FindTarEnd(r io.ReadSeeker, last_known uint64) (pos uint64, err error)
Move to the end of the archive, before the first empty block. The algorithm follows this Python sketch from an earlier, tarfile-based prototype (perf_mark, perf_print, usedmem, DEBUG_MEM and LOG are profiling/logging helpers from that codebase):

    def _init_find_end(self, end_offset, name):
        '''Move to the end of the archive, before the first empty block.'''
        if not end_offset:
            end_offset = self.END_OFFSET_CACHE.get(name, 0)

        self.firstmember = None
        perf_mark()
        if end_offset > self.offset:
            self.fileobj.seek(0, 2)  # seek to EOF
            p = self.fileobj.tell()
            self.offset = min(p - tarfile.BLOCKSIZE, end_offset)
            self.fileobj.seek(self.offset)
            perf_print('%s end_offset > self.offset', name)

        if DEBUG_MEM:
            LOG.debug('before while: next() mem=%dKb', usedmem())
        while True:
            if self.next() is None:  # no more members: step back one block
                if self.offset > 0:
                    self.fileobj.seek(-tarfile.BLOCKSIZE, 1)
                break
        perf_print('find_end %s', name)
        self.END_OFFSET_CACHE[name] = self.offset
        if DEBUG_MEM:
            LOG.debug('after while: next() mem=%dKb', usedmem())

func Finfo2Theader

func Finfo2Theader(fi os.FileInfo) (hdr *tar.Header, err error)

create tar.Header from os.FileInfo

func FlushLog

func FlushLog()

Call this before app shutdown

func GetLogger

func GetLogger() seelog.LoggerInterface

func LogIsDisabled

func LogIsDisabled() bool

func ReadItem

func ReadItem(tarfn string, pos int64) (ret io.Reader, err error)

Reads from tarfn starting at pos. Returns a SymlinkError with the symlink information if there is a symlink at the given position, so the caller can retry with the symlink.

func SameFile

func SameFile(fn1, fn2 string) (bool, error)

func SetLogWriter

func SetLogWriter(writer io.Writer) error

SetLogWriter uses a specified io.Writer to output library log. Use this func if you are not using Seelog logging system in your app.

func StrToBytes

func StrToBytes(str string) []byte

converts string to []byte

func UseLogger

func UseLogger(newLogger seelog.LoggerInterface)

UseLogger uses a specified seelog.LoggerInterface to output library log. Use this func if you are using Seelog logging system in your app.

func UseLoggerFromConfigFile

func UseLoggerFromConfigFile(filename string)

loads logger from config file

func Walk

func Walk(root string, walkFn filepath.WalkFunc) error

Walk walks the file tree rooted at root, calling walkFn for each file or directory in the tree, including root. All errors that arise visiting files and directories are filtered by walkFn. The files are walked in inode order, which makes the output indeterministic but means that even for very large directories Walk can be efficient.

func WriteTar

func WriteTar(tw *tar.Writer, hdr *tar.Header, r io.Reader) (err error)

Types

type Config

type Config struct {
	StagingDir, IndexDir, TarDir string
	IndexThreshold               uint
	TarThreshold                 uint64
	Hostport                     string
	Realms                       []string
	ContentHash                  string
	ContentHashFunc              func() hash.Hash
	LogConf                      string
	CompressMethod               string
}

configuration variables, parsed

func ReadConf

func ReadConf(fn string, realm string) (c Config, err error)

reads config file (or ConfigFile if empty), replaces every #(realm)s with the given realm, if given

type CountingWriter

type CountingWriter struct {
	Num uint64 // bytes written
}

A writer which counts bytes written into it

func NewCounter

func NewCounter() *CountingWriter

func (*CountingWriter) Write

func (c *CountingWriter) Write(p []byte) (n int, err error)

just count

type Info

type Info struct {
	Key        UUID
	Ipos, Dpos uint64
	// contains filtered or unexported fields
}

func Get

func Get(realm string, uuid UUID) (info Info, reader io.Reader, err error)

returns the associated info and data of a given uuid in a given realm

  1. checks staging area
  2. checks level zero (symlinked cdbs in ndx/L00)
  3. checks higher level (older, too) cdbs in ndx/L01, ndx/L02...

The difference between level zero and the higher levels is the following: at level zero there are the tar files' cdbs (symlinked), and these cdbs contain the info (with the position information), ready to serve. At higher levels, the cdbs contain only "/%d" markers (just a number naming a zero-level cdb), so at these levels an additional lookup is required.

func GetFromCdb

func GetFromCdb(uuid UUID, cdb_fn string) (info Info, reader io.Reader, err error)

func InfoFromBytes

func InfoFromBytes(b []byte) (info Info, err error)

func ReadInfo

func ReadInfo(r io.Reader) (info Info, err error)

parses into Info

func (*Info) Add

func (info *Info) Add(key string, val string)

adds a key

func (*Info) AddBytes

func (info *Info) AddBytes(key, val []byte)

adds a key (byte)

func (*Info) Bytes

func (info *Info) Bytes() []byte

returns Info in wire format

func (*Info) Copy

func (info *Info) Copy(header http.Header)

copies data from Info to http.Header

func (*Info) CopyFrom

func (info *Info) CopyFrom(header map[string][]string)

copies data from textproto.MIMEHeader

func (*Info) Get

func (info *Info) Get(key string) string

returns the value for key

func (*Info) NewReader

func (info *Info) NewReader() (io.Reader, int)

returns a new Reader and the length for wire format of Info

func (*Info) Prepare

func (info *Info) Prepare() error

prepares info for writing out

func (*Info) SetFilename

func (info *Info) SetFilename(fn string, mime string)

sets the filename and the mimetype, conditionally (if given)

type NotifyFunc

type NotifyFunc func()

type ReadWriteSeekCloser

type ReadWriteSeekCloser interface {
	io.ReadWriteSeeker
	io.Closer
}

Reader + Writer + Seeker + Closer

func OpenForAppend

func OpenForAppend(tarfn string) (
	tw *tar.Writer, fobj ReadWriteSeekCloser, pos uint64, err error)

Opens the tarfile for appending - seeks to the end

type SymlinkError

type SymlinkError struct {
	Linkname string
}

func (*SymlinkError) Error

func (e *SymlinkError) Error() string

type UUID

type UUID [uuid.Length]byte

func NewUUID

func NewUUID() (UUID, error)

returns a hexified uuid.Length-byte UUID1

func Put

func Put(realm string, info Info, data io.Reader) (key UUID, err error)

puts a file (info + data) into the given realm and returns the key; if a key is already present in info, that key is used

func UUIDFromBytes

func UUIDFromBytes(text []byte) (b UUID, err error)

func UUIDFromString

func UUIDFromString(text string) (b UUID, err error)

func (UUID) Bytes

func (b UUID) Bytes() []byte

func (UUID) IsEmpty

func (b UUID) IsEmpty() bool

func (UUID) String

func (b UUID) String() string

Directories

Path Synopsis
Append-Only Storage HTTP server
uuid.go
