dtdiff

package
v0.0.0-...-be7858c
Warning

This package is not in the latest version of its module.

Published: Aug 24, 2017 License: BSD-2-Clause Imports: 19 Imported by: 0

README

Distributed Tree Diff

This is an attempt to implement a system that tracks changes in different replicas of a tree structure, intended for file systems.

Many of the ideas here are based on this paper:

D. Malkhi and D. Terry. Concise version vectors in WinFS. In Distributed Computing, 2005.

The paper can be found on Google Scholar.

This library appears pretty stable now, but I cannot guarantee anything.

This library is licensed under the BSD 2-clause license. See LICENSE.txt for details.

File format

The file format is actually pretty simple. It's a fairly easy-to-parse text format.

Example file:

Magic: dtsync-status-file
Created-By: dtsync 0.1
Version: 1
Content-Type: text/tab-separated-values; charset=utf-8
Identity: first
Generation: 2
Knowledge: second:1,other:12

path	fingerprint	revision
dir	d/2016-02-15T19:17:51.35414374Z/0	0:1
dir/file.txt	f/2016-02-15T19:18:18.290458876Z/105	0:1
file2.html	f/2012-12-18T20:25:21.119862001Z/2046	0:2

The whole file is in MIME format, and must be parsed with an appropriate library.

Magic: a magic string that can be used to identify the file type. The first line of the file must be exactly like this.

Created-By: a more-or-less free format identifier for the software that wrote the file. Currently it is in a <name> <version> format.

Version: the current version of this file format. It may be increased in the future for backwards-incompatible changes, so that older versions of dtsync can recognize a file they cannot read correctly. Where possible, changes should be additions that older versions can still handle in most cases; the version only needs to change when old programs would otherwise interpret the file the wrong way.

Content-Type: always the same value. Can be changed in the future so other formats (binary?) can be used. The only supported charset is utf-8.

Identity: an opaque string (roughly limited to a-zA-Z0-9, although some other characters are allowed) that identifies this replica. It must be generated randomly by the creating replica and be long enough to be unique. A UUID may also be used.

Generation: the generation number of this replica. It is incremented by one on every scan in which a change is detected.

Knowledge: a comma-separated list of identity:generation pairs recording which changes of other replicas have been (fully) integrated into this replica.

Option-*: options (e.g. Exclude) are put in a header starting with Option- with the key after that. E.g.: Option-Exclude: /.sync. Options are kept even if the option cannot be interpreted by the parser. Other headers are removed if they are unknown.
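Because the header uses MIME header syntax, Go's standard net/textproto package can read it. The following is a minimal sketch, not dtdiff's own parser: the statusHeader and knowledgeEntry types and the parseHeader/parseHeaderString helpers are hypothetical, and it assumes Knowledge values are comma-separated identity:generation pairs as in the example above. Knowledge is kept as an ordered slice because the revision column refers to it by position.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"net/textproto"
	"strconv"
	"strings"
)

// knowledgeEntry is one identity:generation pair from the Knowledge header.
type knowledgeEntry struct {
	ID  string
	Gen uint64
}

// statusHeader is a hypothetical container for the parsed header fields.
type statusHeader struct {
	Identity   string
	Generation uint64
	Knowledge  []knowledgeEntry  // in header order
	Options    map[string]string // Option-* headers, keyed without the prefix
}

// parseHeader reads the header block up to the first blank line, leaving r
// positioned at the start of the TSV body.
func parseHeader(r *bufio.Reader) (*statusHeader, error) {
	hdr, err := textproto.NewReader(r).ReadMIMEHeader()
	if err != nil {
		return nil, err
	}
	// Only the value is checked here, not that Magic is the first line.
	if hdr.Get("Magic") != "dtsync-status-file" {
		return nil, errors.New("not a dtsync text status file")
	}
	gen, err := strconv.ParseUint(hdr.Get("Generation"), 10, 64)
	if err != nil {
		return nil, fmt.Errorf("bad Generation header: %w", err)
	}
	h := &statusHeader{Identity: hdr.Get("Identity"), Generation: gen, Options: map[string]string{}}
	if k := hdr.Get("Knowledge"); k != "" {
		for _, pair := range strings.Split(k, ",") {
			id, g, ok := strings.Cut(pair, ":")
			n, err := strconv.ParseUint(g, 10, 64)
			if !ok || err != nil {
				return nil, fmt.Errorf("bad Knowledge entry %q", pair)
			}
			h.Knowledge = append(h.Knowledge, knowledgeEntry{id, n})
		}
	}
	// textproto canonicalizes keys, so "Option-Exclude" arrives unchanged.
	for key, values := range hdr {
		if strings.HasPrefix(key, "Option-") && len(values) > 0 {
			h.Options[strings.TrimPrefix(key, "Option-")] = values[0]
		}
	}
	return h, nil
}

// parseHeaderString is a convenience wrapper for in-memory input.
func parseHeaderString(s string) (*statusHeader, error) {
	return parseHeader(bufio.NewReader(strings.NewReader(s)))
}

const exampleHeader = "Magic: dtsync-status-file\n" +
	"Created-By: dtsync 0.1\n" +
	"Version: 1\n" +
	"Identity: first\n" +
	"Generation: 2\n" +
	"Knowledge: second:1,other:12\n" +
	"Option-Exclude: /.sync\n" +
	"\n" +
	"path\tfingerprint\trevision\n"

func main() {
	h, err := parseHeaderString(exampleHeader)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", *h)
}
```

Unknown non-Option headers are simply ignored here, which matches the rule that they are dropped when the file is rewritten.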

Then follows the body, just like in HTTP. The format of the body is in TSV, with escape characters as described in UniTSV. In short, the escape characters are literal \\, \n and \t to encode \, newline, and tab.
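A sketch of these escaping rules, assuming they apply to every field in the body. escapeField, unescapeField and splitRow are illustrative names, not part of the package. strings.Replacer performs a single left-to-right pass, so an escaped backslash followed by n (`\\n`) decodes to a literal backslash and n rather than a newline.

```go
package main

import (
	"fmt"
	"strings"
)

var (
	// Backslash, newline and tab become \\, \n and \t.
	tsvEscaper = strings.NewReplacer(`\`, `\\`, "\n", `\n`, "\t", `\t`)
	// The inverse mapping, applied in one left-to-right pass.
	tsvUnescaper = strings.NewReplacer(`\\`, `\`, `\n`, "\n", `\t`, "\t")
)

// escapeField encodes one field for the TSV body.
func escapeField(s string) string { return tsvEscaper.Replace(s) }

// unescapeField decodes one field read from the TSV body.
func unescapeField(s string) string { return tsvUnescaper.Replace(s) }

// splitRow splits a body line into decoded fields. Tabs inside a field are
// always escaped, so every literal tab is a column separator.
func splitRow(line string) []string {
	fields := strings.Split(line, "\t")
	for i, f := range fields {
		fields[i] = unescapeField(f)
	}
	return fields
}

func main() {
	name := "odd\tname\\with\nbreaks.txt"
	row := escapeField(name) + "\tf/2016-02-15T19:18:18.290458876Z/105\t0:1"
	fmt.Println(row)
	fmt.Printf("%q\n", splitRow(row))
}
```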

The body has a few possible columns. More can be added in the future, but unknown columns should be removed by an implementation. The columns are path, fingerprint, inode, revision, hash, and options.

The path elements are separated by forward slashes, even on Windows, for consistency.

The fingerprint is a special microformat. It has the form type/modtime/size for regular files and type/modtime for other file types. A permissions field may be added in the future, making it type/modtime/permissions/size and type/modtime/permissions. The type field is f for regular files or d for directories; other types will be added as needed. modtime is encoded in RFC 3339 format with nanoseconds and no timezone offset (UTC, written with a trailing Z as in the example above).
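A minimal fingerprint parser, sketched under a few assumptions. The example directory fingerprint above carries a trailing /0, so this sketch accepts both type/modtime and type/modtime/size. The fingerprint type, errBadFingerprint and parseFingerprint are hypothetical (the package itself reports failures as ErrParsingFingerprint), and the future permissions field is not handled.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
	"time"
)

// fingerprint is an illustrative decoded form of the fingerprint column.
type fingerprint struct {
	Type    byte      // 'f' for regular files, 'd' for directories
	ModTime time.Time // parsed from RFC 3339 with nanoseconds
	Size    int64     // 0 when the size part is absent
}

var errBadFingerprint = errors.New("could not parse fingerprint")

// parseFingerprint accepts type/modtime and type/modtime/size. RFC 3339
// timestamps never contain '/', so splitting on it is safe.
func parseFingerprint(s string) (fingerprint, error) {
	parts := strings.Split(s, "/")
	if len(parts) < 2 || len(parts) > 3 || len(parts[0]) != 1 {
		return fingerprint{}, errBadFingerprint
	}
	modTime, err := time.Parse(time.RFC3339Nano, parts[1])
	if err != nil {
		return fingerprint{}, errBadFingerprint
	}
	fp := fingerprint{Type: parts[0][0], ModTime: modTime}
	if len(parts) == 3 {
		size, err := strconv.ParseInt(parts[2], 10, 64)
		if err != nil || size < 0 {
			return fingerprint{}, errBadFingerprint
		}
		fp.Size = size
	}
	return fp, nil
}

func main() {
	for _, s := range []string{
		"d/2016-02-15T19:17:51.35414374Z/0",
		"f/2016-02-15T19:18:18.290458876Z/105",
	} {
		fp, err := parseFingerprint(s)
		fmt.Printf("%c %s %d %v\n", fp.Type, fp.ModTime.Format(time.RFC3339Nano), fp.Size, err)
	}
}
```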

The inode column contains the inode for this file, if the underlying filesystem supports it. When no inode is available, it is an empty string (or zero).

The revision column holds two values separated by a colon (e.g. 0:1): the replica index and the generation number. The replica index refers to the Knowledge header, 1-indexed, with 0 meaning this replica. Whenever an entry changes, its revision is set to this replica (index 0) and its current generation number.
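The index lookup can be sketched as follows. parseRevision and the revision type are illustrative, not the package's API; the sketch assumes the Knowledge identities have already been extracted in header order.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// revision is the decoded revision column.
type revision struct {
	Replica    string // identity of the replica that made the change
	Generation uint64
}

// parseRevision decodes "index:generation". Index 0 is this replica (self);
// index i >= 1 is the i-th identity listed in the Knowledge header.
func parseRevision(s, self string, knowledge []string) (revision, error) {
	idx, gen, ok := strings.Cut(s, ":")
	if !ok {
		return revision{}, errors.New("revision: missing ':'")
	}
	i, err := strconv.Atoi(idx)
	if err != nil || i < 0 || i > len(knowledge) {
		return revision{}, fmt.Errorf("revision: bad replica index %q", idx)
	}
	g, err := strconv.ParseUint(gen, 10, 64)
	if err != nil {
		return revision{}, fmt.Errorf("revision: bad generation %q", gen)
	}
	id := self
	if i > 0 {
		id = knowledge[i-1]
	}
	return revision{Replica: id, Generation: g}, nil
}

func main() {
	// Identities from "Identity: first" and "Knowledge: second:1,other:12".
	knowledge := []string{"second", "other"}
	for _, s := range []string{"0:1", "0:2", "2:12"} {
		r, err := parseRevision(s, "first", knowledge)
		fmt.Println(s, r, err)
	}
}
```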

The hash column contains the blake2b hash of the file. The hash is optional and is omitted from the example above.

The options column (also omitted from the example) contains key-value pairs in the format key=value,key2=value2. One example is the removed option, whose value is the RFC 3339 timestamp at which the file disappeared: it marks the file as removed while keeping its status around in case it appears again (e.g. when a volume is re-mounted or a file is moved back).
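A sketch of decoding the options column, assuming keys and values never contain `,` or `=` (true for RFC 3339 timestamps) and that no extra escaping is applied inside values. parseOptions and removedAt are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// parseOptions decodes "key=value,key2=value2" into a map.
func parseOptions(s string) map[string]string {
	opts := map[string]string{}
	if s == "" {
		return opts
	}
	for _, pair := range strings.Split(s, ",") {
		k, v, _ := strings.Cut(pair, "=") // a bare key gets an empty value
		opts[k] = v
	}
	return opts
}

// removedAt returns the time recorded by the removed option, if present.
func removedAt(opts map[string]string) (time.Time, bool) {
	v, ok := opts["removed"]
	if !ok {
		return time.Time{}, false
	}
	ts, err := time.Parse(time.RFC3339Nano, v)
	if err != nil {
		return time.Time{}, false
	}
	return ts, true
}

func main() {
	opts := parseOptions("removed=2017-08-01T09:30:00Z")
	fmt.Println(removedAt(opts))
}
```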

Documentation

Overview

Package dtdiff is a generated protocol buffer package.

It is generated from these files:

messages.proto

It has these top-level messages:

ProtoReplica
ProtoEntry

Index

Constants

const (
	MAGIC_TEXT  = "dtsync-status-file"
	MAGIC_PROTO = "dtsync-status-file-proto"
)

These string constants are used in the header to determine serialization format and can be used to identify the filetype (e.g. by the file(1) command).

const (
	FORMAT_NONE = iota
	FORMAT_TEXT
	FORMAT_PROTO
)

Format constants for the format argument of Replica.SerializeStream.
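A sketch of mapping a magic string to one of these format constants. The constants are redeclared locally so the example compiles on its own, and formatForMagic is hypothetical; how the magic string is located in a protobuf-encoded file is not described here, so only the mapping is shown.

```go
package main

import "fmt"

// Local copies of the documented constants.
const (
	MAGIC_TEXT  = "dtsync-status-file"
	MAGIC_PROTO = "dtsync-status-file-proto"
)

const (
	FORMAT_NONE = iota
	FORMAT_TEXT
	FORMAT_PROTO
)

// formatForMagic maps a magic string to a FORMAT_* constant,
// returning FORMAT_NONE for anything unrecognized.
func formatForMagic(magic string) int {
	switch magic {
	case MAGIC_TEXT:
		return FORMAT_TEXT
	case MAGIC_PROTO:
		return FORMAT_PROTO
	default:
		return FORMAT_NONE
	}
}

func main() {
	fmt.Println(formatForMagic("dtsync-status-file"), formatForMagic("dtsync-status-file-proto"))
}
```

The comparison is exact on purpose: MAGIC_TEXT is a prefix of MAGIC_PROTO, so a prefix check would misclassify protobuf files as text.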

const HASH_ID = "blake2b-256"

HASH_ID is the identifier for the hash function in use. The last part is the number of bits for this version (256 bits or 32 bytes).

const PERMS_DEFAULT = 0777

PERMS_DEFAULT has the default permission bits that are compared. Fewer bits can be compared by setting the 'perms' option.
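Comparing permissions under a mask amounts to XOR-ing the two modes and keeping only the masked bits. This sketch uses os.FileMode in place of tree.Mode, and equalPerms is a hypothetical helper; how the perms option value is written in the status file is not covered.

```go
package main

import (
	"fmt"
	"os"
)

// permsDefault mirrors PERMS_DEFAULT.
const permsDefault os.FileMode = 0777

// equalPerms reports whether a and b agree on every permission bit in mask.
func equalPerms(a, b, mask os.FileMode) bool {
	return (a^b)&mask == 0
}

func main() {
	fmt.Println(equalPerms(0644, 0600, permsDefault)) // group/other read bits differ
	fmt.Println(equalPerms(0644, 0600, 0700))         // only owner bits compared
}
```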

const STATUS_FILE = ".dtsync"

STATUS_FILE is the name of the file in which the current status of the tree is stored.

const STATUS_RETAIN = time.Hour * 24 * 30

STATUS_RETAIN is how long to retain status entries that have disappeared, in case they reappear (e.g. when a disk is mounted again).
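A sketch of the retention check, assuming an entry is dropped only once strictly more than STATUS_RETAIN has passed since its removed timestamp. statusRetain mirrors the constant; expired and mustParse are hypothetical helpers.

```go
package main

import (
	"fmt"
	"time"
)

// statusRetain mirrors STATUS_RETAIN (30 days).
const statusRetain = time.Hour * 24 * 30

// expired reports whether an entry removed at removed may be forgotten at now.
func expired(removed, now time.Time) bool {
	return now.Sub(removed) > statusRetain
}

// mustParse parses an RFC 3339 timestamp such as the one stored in the
// removed option, panicking on malformed input (acceptable in an example).
func mustParse(s string) time.Time {
	ts, err := time.Parse(time.RFC3339Nano, s)
	if err != nil {
		panic(err)
	}
	return ts
}

func main() {
	removed := mustParse("2017-07-01T12:00:00Z")
	fmt.Println(expired(removed, mustParse("2017-07-20T12:00:00Z"))) // 19 days later
	fmt.Println(expired(removed, mustParse("2017-08-24T12:00:00Z"))) // 54 days later
}
```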

Variables

var (
	ErrExists             = errors.New("dtdiff: already exists")
	ErrSameRoot           = errors.New("dtdiff: trying to synchronize the same directory")
	ErrParsingFingerprint = errors.New("dtdiff: could not parse fingerprint")
)

Functions

func ExtraOptions

func ExtraOptions(extraOptions *tree.ScanOptions) func(*ReplicaSet)

func IterateEntries

func IterateEntries(list []*Entry) chan *Entry

IterateEntries returns a channel from which each entry in the slice can be read in turn.
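A plausible way to implement such a function, not necessarily dtdiff's: a producer goroutine sends each element and then closes the channel, so a range loop over the result terminates. The entry type here is a hypothetical stand-in for *Entry.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for *Entry.
type entry struct{ name string }

// iterateEntries returns a channel that yields every element of list in
// order and is closed afterwards, so a range loop over it terminates.
func iterateEntries(list []*entry) chan *entry {
	c := make(chan *entry)
	go func() {
		defer close(c)
		for _, e := range list {
			c <- e
		}
	}()
	return c
}

func main() {
	list := []*entry{{"dir"}, {"dir/file.txt"}, {"file2.html"}}
	for e := range iterateEntries(list) {
		fmt.Println(e.name)
	}
}
```

Because the channel is unbuffered, a consumer that stops reading early leaves the goroutine blocked forever, so callers should drain the channel.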

func LeastName

func LeastName(names ...string) string

LeastName returns the alphabetically first name in the list that is not the empty string.
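A local reimplementation for illustration, assuming "alphabetically" means Go's byte-wise string ordering. leastName is hypothetical, as is the merge-walk use mentioned in the comment.

```go
package main

import "fmt"

// leastName returns the smallest non-empty name in byte-wise order,
// or "" if every name is empty.
func leastName(names ...string) string {
	least := ""
	for _, n := range names {
		if n != "" && (least == "" || n < least) {
			least = n
		}
	}
	return least
}

func main() {
	// Hypothetical use: picking the next child name when walking the sorted
	// children of two replicas side by side, where "" marks an exhausted list.
	fmt.Println(leastName("file2.html", "", "dir"))
}
```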

func Progress

func Progress(progress chan<- ScanProgress) func(*ReplicaSet)

Types

type Entry

type Entry struct {
	// contains filtered or unexported fields
}

An Entry is one object (row) in a Replica. It belongs to one Replica.

func (*Entry) Add

func (e *Entry) Add(info tree.FileInfo, source *Entry) (*Entry, error)

Add a new status entry.

func (*Entry) After

func (e *Entry) After(e2 *Entry) bool

After returns true if this entry was modified after the other.

func (*Entry) Before

func (e *Entry) Before(e2 *Entry) bool

Before returns true if this entry was modified before the other.

func (*Entry) Conflict

func (e *Entry) Conflict(e2 *Entry) bool

Conflict returns true if both entries are modified.
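These comparisons follow the version-vector idea from the Malkhi and Terry paper cited above. The sketch below is a simplified model, not dtdiff's implementation: each replica is reduced to its Identity, Generation and Knowledge, an entry to its revision, and one side counts as having seen the other's revision if its replica's knowledge covers it. The rev and replicaState types and compare are hypothetical.

```go
package main

import "fmt"

// rev is an entry's revision: the replica that last changed it and the
// generation of that replica at the time.
type rev struct {
	replica    string
	generation uint64
}

// replicaState models the Identity, Generation and Knowledge headers.
type replicaState struct {
	identity   string
	generation uint64
	knowledge  map[string]uint64 // other identity -> highest generation integrated
}

// hasRevision reports whether this replica has already seen revision v.
func (r replicaState) hasRevision(v rev) bool {
	if v.replica == r.identity {
		return v.generation <= r.generation
	}
	return v.generation <= r.knowledge[v.replica]
}

// compare classifies entry a (stored in replica ra) against entry b
// (stored in replica rb).
func compare(a rev, ra replicaState, b rev, rb replicaState) string {
	aSeesB := ra.hasRevision(b)
	bSeesA := rb.hasRevision(a)
	switch {
	case aSeesB && bSeesA:
		return "equal" // each side already has the other's revision
	case aSeesB:
		return "after" // a was modified after b
	case bSeesA:
		return "before"
	default:
		return "conflict" // modified independently on both sides
	}
}

func main() {
	first := replicaState{"first", 3, map[string]uint64{"second": 1}}
	second := replicaState{"second", 2, map[string]uint64{"first": 1}}
	// A file changed in first's generation 3; second still has the copy
	// from first's generation 1.
	fmt.Println(compare(rev{"first", 3}, first, rev{"first", 1}, second))
}
```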

func (*Entry) Count

func (e *Entry) Count() (int, int64)

Count returns the number of entries (at least 1) and the number of bytes in the file or directory tree, counting only the contents of regular files, not directory entries. It can be used for progress indication.
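A recursive sketch of this count over a hypothetical, simplified node type (not the package's Entry): every node, directories included, counts as one entry, while only regular files contribute bytes.

```go
package main

import "fmt"

// node is a hypothetical, simplified stand-in for an Entry.
type node struct {
	isFile   bool
	size     int64 // only meaningful for regular files
	children []*node
}

// count returns the number of nodes in the subtree (at least 1) and the
// total size of the regular files in it.
func count(n *node) (int, int64) {
	entries, bytes := 1, int64(0)
	if n.isFile {
		bytes = n.size
	}
	for _, c := range n.children {
		ce, cb := count(c)
		entries += ce
		bytes += cb
	}
	return entries, bytes
}

func main() {
	// Mirrors the example status file: dir, dir/file.txt and file2.html
	// under an implicit root.
	root := &node{children: []*node{
		{children: []*node{{isFile: true, size: 105}}},
		{isFile: true, size: 2046},
	}}
	fmt.Println(count(root))
}
```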

func (*Entry) Equal

func (e *Entry) Equal(e2 *Entry) bool

Equal returns true if both entries are of the same revision (replica and generation). Not recursive.

func (*Entry) EqualContents

func (e *Entry) EqualContents(e2 *Entry) bool

EqualContents returns true if the contents (fingerprint/hash) of these entries are the same.

func (*Entry) EqualMode

func (e *Entry) EqualMode(e2 *Entry) bool

EqualMode compares the mode bits, taking the HasMode of both entries into account.

func (*Entry) Filesystem

func (e *Entry) Filesystem() tree.Filesystem

Filesystem returns the filesystem ID for this replica (filesystem IDs are unique per replica).

func (*Entry) Get

func (e *Entry) Get(name string) *Entry

Get returns the named child, or nil if it doesn't exist.

func (*Entry) HasMode

func (e *Entry) HasMode() tree.Mode

HasMode returns the permission bits this entry supports.

func (*Entry) HasRevision

func (e *Entry) HasRevision(other *Entry) bool

HasRevision returns true if this file (actually, this replica) includes the revision the other entry is at.

func (*Entry) Hash

func (e *Entry) Hash() tree.Hash

Hash returns the hash of the status entry, if one is known.

func (*Entry) Includes

func (e *Entry) Includes(e2 *Entry) bool

Includes returns true if this entry includes all revisions from the other entry (recursively: children are also compared).

FIXME: It does not always work when one file is removed. It does, however, work for checking equality.

func (*Entry) Inode

func (e *Entry) Inode() uint64

Inode returns the inode number of this file. Together with the filesystem ID, it uniquely identifies the file within this replica.

It returns 0 when no inode number is available.

func (*Entry) List

func (e *Entry) List() []*Entry

func (*Entry) ModTime

func (e *Entry) ModTime() time.Time

ModTime returns the last modification time.

func (*Entry) Mode

func (e *Entry) Mode() tree.Mode

Mode returns the permission bits for this entry.

func (*Entry) Name

func (e *Entry) Name() string

Name returns the name of this entry.

func (*Entry) RelativePath

func (e *Entry) RelativePath() []string

RelativePath returns the path relative to the root.

func (*Entry) Remove

func (e *Entry) Remove()

Remove marks this entry as removed; the entry itself is deleted once it has been removed for longer than STATUS_RETAIN.

func (*Entry) Size

func (e *Entry) Size() int64

Size returns the filesize for regular files, or 0.

func (*Entry) String

func (e *Entry) String() string

String returns a string representation of the entry, for debugging purposes.

func (*Entry) Type

func (e *Entry) Type() tree.Type

Type returns the tree.Type filetype.

func (*Entry) Update

func (e *Entry) Update(info tree.FileInfo, fs interface{}, hash tree.Hash, source *Entry)

Update updates the revision if the file was changed. The file is not considered changed if its fingerprint changed but its hash did not. Note: the fs argument must be either a *tree.LocalFilesystem or a tree.Filesystem.
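A sketch of the change rule described above, under the assumption that a missing hash (nil) means the contents cannot be compared, so any fingerprint change counts. needsNewRevision is hypothetical and ignores mode changes, which the real Update may also track.

```go
package main

import (
	"bytes"
	"fmt"
)

// needsNewRevision reports whether a rescanned file should get a new revision.
// A nil hash means "unknown".
func needsNewRevision(oldFP, newFP string, oldHash, newHash []byte) bool {
	if oldFP == newFP {
		return false // assumed: unchanged fingerprint means unchanged file
	}
	if oldHash != nil && newHash != nil {
		// The fingerprint changed (e.g. only the mtime was touched):
		// the file changed only if its contents did.
		return !bytes.Equal(oldHash, newHash)
	}
	return true // contents cannot be compared, so treat it as changed
}

func main() {
	h := []byte{0x01, 0x02}
	fmt.Println(needsNewRevision(
		"f/2016-02-15T19:18:18.290458876Z/105",
		"f/2016-03-01T10:00:00Z/105",
		h, h))
}
```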

func (*Entry) UpdateHash

func (e *Entry) UpdateHash(hash tree.Hash, source *Entry)

UpdateHash sets the new hash from the parameter, marking this file as changed if it is different from the existing one.

type ErrSameIdentity

type ErrSameIdentity struct {
	Identity string
}

func (*ErrSameIdentity) Error

func (e *ErrSameIdentity) Error() string

type ParseError

type ParseError struct {
	Message string
	Row     int
	Err     error
}

func (*ParseError) Error

func (e *ParseError) Error() string

type ProtoEntry

type ProtoEntry struct {
	Name             *string `protobuf:"bytes,1,opt,name=name" json:"name,omitempty"`
	Type             *uint32 `protobuf:"varint,2,opt,name=type" json:"type,omitempty"`
	ModTime          *int64  `protobuf:"zigzag64,3,opt,name=modTime" json:"modTime,omitempty"`
	Size             *uint64 `protobuf:"varint,4,opt,name=size" json:"size,omitempty"`
	Identity         *uint32 `protobuf:"varint,5,opt,name=identity" json:"identity,omitempty"`
	Generation       *uint64 `protobuf:"varint,6,opt,name=generation" json:"generation,omitempty"`
	Mode             *uint32 `protobuf:"varint,7,opt,name=mode" json:"mode,omitempty"`
	HasMode          *uint32 `protobuf:"varint,8,opt,name=hasMode" json:"hasMode,omitempty"`
	HashType         *uint32 `protobuf:"varint,9,opt,name=hashType" json:"hashType,omitempty"`
	HashData         []byte  `protobuf:"bytes,10,opt,name=hashData" json:"hashData,omitempty"`
	Removed          *int64  `protobuf:"zigzag64,11,opt,name=removed" json:"removed,omitempty"`
	Inode            *uint64 `protobuf:"varint,12,opt,name=inode" json:"inode,omitempty"`
	Fs               *uint64 `protobuf:"varint,16,opt,name=fs" json:"fs,omitempty"`
	XXX_unrecognized []byte  `json:"-"`
}

func (*ProtoEntry) Descriptor

func (*ProtoEntry) Descriptor() ([]byte, []int)

func (*ProtoEntry) GetFs

func (m *ProtoEntry) GetFs() uint64

func (*ProtoEntry) GetGeneration

func (m *ProtoEntry) GetGeneration() uint64

func (*ProtoEntry) GetHasMode

func (m *ProtoEntry) GetHasMode() uint32

func (*ProtoEntry) GetHashData

func (m *ProtoEntry) GetHashData() []byte

func (*ProtoEntry) GetHashType

func (m *ProtoEntry) GetHashType() uint32

func (*ProtoEntry) GetIdentity

func (m *ProtoEntry) GetIdentity() uint32

func (*ProtoEntry) GetInode

func (m *ProtoEntry) GetInode() uint64

func (*ProtoEntry) GetModTime

func (m *ProtoEntry) GetModTime() int64

func (*ProtoEntry) GetMode

func (m *ProtoEntry) GetMode() uint32

func (*ProtoEntry) GetName

func (m *ProtoEntry) GetName() string

func (*ProtoEntry) GetRemoved

func (m *ProtoEntry) GetRemoved() int64

func (*ProtoEntry) GetSize

func (m *ProtoEntry) GetSize() uint64

func (*ProtoEntry) GetType

func (m *ProtoEntry) GetType() uint32

func (*ProtoEntry) ProtoMessage

func (*ProtoEntry) ProtoMessage()

func (*ProtoEntry) Reset

func (m *ProtoEntry) Reset()

func (*ProtoEntry) String

func (m *ProtoEntry) String() string

type ProtoReplica

type ProtoReplica struct {
	Version          *uint32  `protobuf:"varint,1,opt,name=version" json:"version,omitempty"`
	CreatedBy        *string  `protobuf:"bytes,2,opt,name=createdBy" json:"createdBy,omitempty"`
	Identity         *string  `protobuf:"bytes,3,opt,name=identity" json:"identity,omitempty"`
	Generation       *uint64  `protobuf:"varint,4,opt,name=generation" json:"generation,omitempty"`
	KnowledgeKeys    []string `protobuf:"bytes,5,rep,name=knowledgeKeys" json:"knowledgeKeys,omitempty"`
	KnowledgeValues  []uint64 `protobuf:"varint,6,rep,name=knowledgeValues" json:"knowledgeValues,omitempty"`
	OptionKeys       []string `protobuf:"bytes,7,rep,name=optionKeys" json:"optionKeys,omitempty"`
	OptionValues     []string `protobuf:"bytes,8,rep,name=optionValues" json:"optionValues,omitempty"`
	Hash             *string  `protobuf:"bytes,9,opt,name=hash" json:"hash,omitempty"`
	XXX_unrecognized []byte   `json:"-"`
}

func (*ProtoReplica) Descriptor

func (*ProtoReplica) Descriptor() ([]byte, []int)

func (*ProtoReplica) GetCreatedBy

func (m *ProtoReplica) GetCreatedBy() string

func (*ProtoReplica) GetGeneration

func (m *ProtoReplica) GetGeneration() uint64

func (*ProtoReplica) GetHash

func (m *ProtoReplica) GetHash() string

func (*ProtoReplica) GetIdentity

func (m *ProtoReplica) GetIdentity() string

func (*ProtoReplica) GetKnowledgeKeys

func (m *ProtoReplica) GetKnowledgeKeys() []string

func (*ProtoReplica) GetKnowledgeValues

func (m *ProtoReplica) GetKnowledgeValues() []uint64

func (*ProtoReplica) GetOptionKeys

func (m *ProtoReplica) GetOptionKeys() []string

func (*ProtoReplica) GetOptionValues

func (m *ProtoReplica) GetOptionValues() []string

func (*ProtoReplica) GetVersion

func (m *ProtoReplica) GetVersion() uint32

func (*ProtoReplica) ProtoMessage

func (*ProtoReplica) ProtoMessage()

func (*ProtoReplica) Reset

func (m *ProtoReplica) Reset()

func (*ProtoReplica) String

func (m *ProtoReplica) String() string

type Replica

type Replica struct {
	// contains filtered or unexported fields
}

func LoadReplica

func LoadReplica(file io.Reader) (*Replica, error)

func ScanTree

func ScanTree(fs tree.LocalFileTree, extraOptions *tree.ScanOptions, recvOptionsChan, sendOptionsChan chan *tree.ScanOptions, progress chan<- *tree.ScanProgress, cancel chan struct{}) (*Replica, error)

func (*Replica) Changed

func (r *Replica) Changed() bool

Changed returns true if this replica got a change in its own files (data or metadata).

func (*Replica) ChangedAny

func (r *Replica) ChangedAny() bool

ChangedAny returns true if this replica got any update (presumably during the last scan).

func (*Replica) Perms

func (r *Replica) Perms() tree.Mode

Perms returns the permission mask used for this replica (which permission bits are synchronized). It is 0777 by default, but can be changed with the "perms" option.

func (*Replica) Root

func (r *Replica) Root() *Entry

Root returns the root entry.

func (*Replica) Serialize

func (r *Replica) Serialize(fs tree.Tree) error

func (*Replica) SerializeStream

func (r *Replica) SerializeStream(out io.Writer, format int) error

func (*Replica) String

func (r *Replica) String() string

type ReplicaSet

type ReplicaSet struct {
	// contains filtered or unexported fields
}

ReplicaSet is a combination of two replicas.

func Scan

func Scan(fs1, fs2 tree.Tree, options ...func(*ReplicaSet)) (*ReplicaSet, error)

func (*ReplicaSet) Get

func (rs *ReplicaSet) Get(index int) *Replica

Get returns the replica by index.

func (*ReplicaSet) MarkSynced

func (rs *ReplicaSet) MarkSynced()

MarkSynced marks both replicas as including each other's current generation. This is done after they have been cleanly synchronized.
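A minimal sketch of this bookkeeping on a simplified replica type (hypothetical, not the package's Replica): after a clean sync, each replica records the other's current generation in its knowledge, never lowering a value it already had.

```go
package main

import "fmt"

// replica is a hypothetical, reduced model of a replica's header state.
type replica struct {
	identity   string
	generation uint64
	knowledge  map[string]uint64 // other identity -> generation integrated
}

// learn records that dst has integrated src up to src's current generation.
func learn(dst, src *replica) {
	if dst.knowledge[src.identity] < src.generation {
		dst.knowledge[src.identity] = src.generation
	}
}

// markSynced marks both replicas as including each other.
func markSynced(a, b *replica) {
	learn(a, b)
	learn(b, a)
}

func main() {
	a := &replica{"first", 2, map[string]uint64{"second": 1, "other": 12}}
	b := &replica{"second", 4, map[string]uint64{}}
	markSynced(a, b)
	fmt.Println(a.knowledge, b.knowledge)
}
```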

type ScanProgress

type ScanProgress [2]*tree.ScanProgress

func (ScanProgress) Ahead

func (p ScanProgress) Ahead() *tree.ScanProgress

func (ScanProgress) Behind

func (p ScanProgress) Behind() *tree.ScanProgress

func (ScanProgress) Percent

func (p ScanProgress) Percent() float64
