xsum

package module
v0.1.0
Warning

This package is not in the latest version of its module.

Published: Mar 11, 2022 License: Apache-2.0 Imports: 19 Imported by: 0

README

xsum

xsum is a utility for calculating checksums that supports:

  • 18 cryptographic hash functions
  • 12 non-cryptographic hash functions

The xsum CLI can be used in place of shasum, md5sum, or similar utilities.

xsum differs from existing tools that calculate checksums in that it can:

  • Calculate a single checksum for an entire directory structure using Merkle trees.
    • Merkle trees allow for concurrency when calculating checksums of directories. (See Performance.)
    • Merkle trees are the same data structure used to reference layers in Docker images.
  • Calculate checksums that include file attributes such as type, UID, GID, permissions, etc. (See Format.)
  • Execute plugins, including:
    • xsum-pcm: calculate checksums of raw PCM inside audio files (e.g., AAC, MP3, FLAC, ALAC)
      • Checksums remain constant when audio file metadata/tags change, but still protect the audio stream.
      • Install xsum-pcm to $PATH and use xsum -a pcm to invoke.
      • Requires ffmpeg.

Performance

xsum aims to:

  • Minimize execution time using concurrency
  • Avoid opening more files than available CPUs
  • Provide entirely deterministic output
  • Avoid buffering or delaying output

This makes xsum ideal for calculating checksums of large directory structures (e.g., for archival purposes).

With shasum -a 256, ~25 seconds of wall-clock time:

laptop:Library stephen$ time find "The Beatles/" -type f -print0|xargs -0 shasum -a 256
...

real    0m24.775s
user    0m21.250s
sys     0m2.209s

With xsum, defaulting to sha256, ~3 seconds:

laptop:Library stephen$ time find "The Beatles/" -type f -print0|xargs -0 xsum
...

real    0m2.882s
user    0m19.297s
sys     0m0.971s

Checksum of entire directory structure (including UID/GID/perms), using ASN.1 Merkle tree, ~3 seconds:

laptop:Library stephen$ time xsum -f "The Beatles/"
sha256:c1ee0a0a43b56ad834d12aa7187fdb367c9efd5b45dbd96163a9ce27830b5651:7777+ug  The Beatles

real    0m2.832s
user    0m19.328s
sys     0m0.937s

Usage

$ xsum -h
Usage:
  xsum [OPTIONS] [paths...]

General Options:
  -a, --algorithm=  Use specified hash function (default: sha256)
  -w, --write=      Write a separate, adjacent file for each checksum
                    By default, filename will be [orig-name].[alg]
                    Use -w=ext or -wext to override extension (no space!)
  -c, --check       Validate checksums
  -s, --status      With --check, suppress all output
  -q, --quiet       With --check, suppress passing checksums
  -v, --version     Show version

Mask Options:
  -m, --mask=       Apply attribute mask as [777]7[+ugx...]:
                    +u	Include UID
                    +g	Include GID
                    +s	Include special file modes
                    +t	Include modified time
                    +c	Include created time
                    +x	Include extended attrs
                    +i	Include top-level metadata
                    +n	Exclude file names
                    +e	Exclude data
                    +l	Always follow symlinks
  -d, --dirs        Directory mode (implies: -m 0000)
  -p, --portable    Portable mode, exclude names (implies: -m 0000+p)
  -g, --git         Git mode (implies: -m 0100)
  -f, --full        Full mode (implies: -m 7777+ug)
  -x, --extended    Extended mode (implies: -m 7777+ugxs)
  -e, --everything  Everything mode (implies: -m 7777+ugxsct)
  -i, --inclusive   Include top-level metadata (enables mask, adds +i)
  -l, --follow      Follow symlinks (enables mask, adds +l)
  -o, --opaque      Encode attribute mask to opaque, fixed-length hex (enables mask)

Help Options:
  -h, --help        Show this help message

Format

When extended flags are used (e.g., xsum -d [paths...]), xsum checksums follow a three-part format:

[checksum type]:[checksum](:[attribute mask])  [file name]

For example:

sha256:c1ee0a0a43b56ad834d12aa7187fdb367c9efd5b45dbd96163a9ce27830b5651:7777+ug  The Beatles
sha256:d0ed3ba499d2f79b4b4af9b5a9301918515c35fc99b0e57d88974f1ee74f7820  The Beatles.tar

This allows xsum to:

  1. Encode which attributes (e.g., UNIX permission bits) are included in the hash (if applicable).
  2. Specify which hashing algorithm should be used to validate each hash.
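
As an illustration of the line format shown above, the following sketch splits one line of output into its parts. ParseLine and the Line type are hypothetical names used only for this example and are not part of the xsum API. The sketch assumes the two-space separator and colon-delimited fields seen in the examples; FORMAT.md remains the authoritative description, and xsum -c should be used for real validation.

```go
package main

import (
	"fmt"
	"strings"
)

// Line holds the parts of one line of checksum output.
type Line struct {
	Alg  string // checksum type; empty for standard (non-extended) output
	Sum  string // hex-encoded checksum
	Mask string // attribute mask; empty when none is present
	Name string // file name
}

// ParseLine splits "[checksum type]:[checksum](:[attribute mask])  [file name]".
// It also accepts the standard "[checksum]  [file name]" form.
func ParseLine(s string) (Line, error) {
	// The fields before the separator contain no spaces, so the first
	// two-space run marks the start of the file name.
	halves := strings.SplitN(s, "  ", 2)
	if len(halves) != 2 {
		return Line{}, fmt.Errorf("missing two-space separator in %q", s)
	}
	head, name := halves[0], halves[1]
	parts := strings.Split(head, ":")
	switch len(parts) {
	case 1:
		return Line{Sum: parts[0], Name: name}, nil
	case 2:
		return Line{Alg: parts[0], Sum: parts[1], Name: name}, nil
	case 3:
		return Line{Alg: parts[0], Sum: parts[1], Mask: parts[2], Name: name}, nil
	}
	return Line{}, fmt.Errorf("unexpected number of fields in %q", head)
}

func main() {
	l, err := ParseLine("sha256:c1ee0a0a43b56ad834d12aa7187fdb367c9efd5b45dbd96163a9ce27830b5651:7777+ug  The Beatles")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("alg=%s mask=%s name=%q\n", l.Alg, l.Mask, l.Name)
}
```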

The data format used for extended checksums is specified in FORMAT.md and may be considered stable.

Extended checksums are portable across operating systems, as long as all requested attributes are supported.

Top-level Attributes

By default, xsum only calculates checksums for file/directory contents, including when extended mode flags are used. This means that by default, extended checksums only include attributes (e.g., UNIX permissions) for files/directories that are inside a specified directory.

Use -i to include top-level attributes:

$ xsum -fi "The Beatles.tar" "The Beatles/"
sha256:60f6435e916aae9c4b1a7d4d66011963d80c29744a42c2f0b2171e4c50e90113:7777+ugi  The Beatles.tar
sha256:7a90cbb0973419f0d3b10a82e53281aa3f0f317ab4ecce10570f26a7404975a1:7777+ugi  The Beatles

Without -i, xsum will not append an attribute mask for non-directories, for example:

$ xsum -f "The Beatles.tar" "The Beatles/"
sha256:d0ed3ba499d2f79b4b4af9b5a9301918515c35fc99b0e57d88974f1ee74f7820  The Beatles.tar  # contents only!
sha256:c1ee0a0a43b56ad834d12aa7187fdb367c9efd5b45dbd96163a9ce27830b5651:7777+ug  The Beatles

Additionally, without any extended flags, xsum checksums and errors follow the standard output format used by other checksum tools:

$ xsum "The Beatles.tar" "The Beatles/"
d0ed3ba499d2f79b4b4af9b5a9301918515c35fc99b0e57d88974f1ee74f7820  The Beatles.tar
xsum: The Beatles: is a directory

Installation

Binaries for macOS, Linux, and Windows are attached to each release.

To install xsum-pcm, copy the binary to $PATH. Invoke it with xsum -a pcm.

xsum is also available as a Docker image (includes xsum-pcm in :full).

Go Package

xsum may be imported as a Go package. See godoc for details.

NOTE: The current Go API should not be considered stable.

Security Considerations

  • xsum only uses hashing algorithms present in Go's standard library and golang.org/x/crypto packages.
  • xsum uses a subset of DER-encoded ASN.1 for deterministic and canonical encoding of all metadata and Merkle Trees.
  • Extended checksums (which include a checksum type and attribute mask) should only be validated with xsum to avoid collision with files that contain xsum's data format directly.
  • Certain (generally non-cryptographic) hash functions supported by xsum may have high collision rates with specific patterns of data. These hash functions may not be appropriate when used to generate checksums of directories. Unless you know what you are doing, choose a strong cryptographic hashing function (like sha256) when calculating checksums of directories.

Documentation

Constants

const (
	HashNone       = ""
	HashMD4        = "md4"
	HashMD5        = "md5"
	HashSHA1       = "sha1"
	HashSHA256     = "sha256"
	HashSHA224     = "sha224"
	HashSHA512     = "sha512"
	HashSHA384     = "sha384"
	HashSHA512_224 = "sha512-224"
	HashSHA512_256 = "sha512-256"
	HashSHA3_224   = "sha3-224"
	HashSHA3_256   = "sha3-256"
	HashSHA3_384   = "sha3-384"
	HashSHA3_512   = "sha3-512"
	HashBlake2s256 = "blake2s256"
	HashBlake2b256 = "blake2b256"
	HashBlake2b384 = "blake2b384"
	HashBlake2b512 = "blake2b512"
	HashRMD160     = "rmd160"
	HashCRC32      = "crc32"
	HashCRC32c     = "crc32c"
	HashCRC32k     = "crc32k"
	HashCRC64ISO   = "crc64iso"
	HashCRC64ECMA  = "crc64ecma"
	HashAdler32    = "adler32"
	HashFNV32      = "fnv32"
	HashFNV32a     = "fnv32a"
	HashFNV64      = "fnv64"
	HashFNV64a     = "fnv64a"
	HashFNV128     = "fnv128"
	HashFNV128a    = "fnv128a"
)

Variables

var (
	ErrDirectory = errors.New("is a directory")
	ErrNoStat    = errors.New("stat data unavailable")
	ErrNoXattr   = errors.New("xattr data unavailable")

	DefaultSemaphore = semaphore.NewWeighted(int64(runtime.NumCPU()))
	DefaultSum       = &Sum{Semaphore: DefaultSemaphore}
)

Functions

This section is empty.

Types

type Attr

type Attr uint16
const (
	AttrUID Attr = 1 << iota
	AttrGID
	AttrAtime
	AttrMtime
	AttrCtime
	AttrBtime
	AttrSpecial
	AttrX

	AttrInclusive
	AttrNoName
	AttrNoData
	AttrFollow

	AttrEmpty Attr = 0
)
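
The Attr constants above are single-bit flags, so they can be combined with bitwise OR and tested with bitwise AND. The sketch below copies the declaration so that it runs on its own; the Has method is a hypothetical helper for illustration, not part of the package, and treating AttrUID|AttrGID as the counterpart of the +ug mask letters is an assumption based on the names.

```go
package main

import "fmt"

// Attr mirrors the declaration above so the example is self-contained.
type Attr uint16

const (
	AttrUID Attr = 1 << iota
	AttrGID
	AttrAtime
	AttrMtime
	AttrCtime
	AttrBtime
	AttrSpecial
	AttrX

	AttrInclusive
	AttrNoName
	AttrNoData
	AttrFollow

	AttrEmpty Attr = 0
)

// Has reports whether every flag in f is also set in a.
// Hypothetical helper; the real package does not document such a method.
func (a Attr) Has(f Attr) bool { return a&f == f }

func main() {
	ug := AttrUID | AttrGID // assumed counterpart of the "+ug" mask letters
	fmt.Println(ug.Has(AttrUID), ug.Has(AttrMtime))
	fmt.Printf("%012b\n", uint16(ug|AttrFollow))
}
```

Blank lines inside a const block do not reset iota, so AttrInclusive continues the sequence at the ninth bit rather than starting over.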

func NewAttrHex

func NewAttrHex(s string) (Attr, error)

func NewAttrString

func NewAttrString(s string) (Attr, error)

func (Attr) Hex

func (a Attr) Hex() string

func (Attr) String

func (a Attr) String() string

type File

type File struct {
	Hash  Hash
	Path  string
	Mask  Mask
	Stdin bool
}

type FileError

type FileError struct {
	Action string // failed action
	Path   string
	Subdir bool // error applies to a file/dir in a subdir of the specified path
	Err    error
}

FileError is similar to os.PathError, but contains extra information such as Subdir.

func (*FileError) Error

func (e *FileError) Error() string

Error returns the error message.

func (*FileError) Unwrap

func (e *FileError) Unwrap() error

Unwrap returns the underlying error.

type Hash

type Hash interface {
	String() string // returns function name
	Metadata(b []byte) ([]byte, error)
	Data(r io.Reader) ([]byte, error)
	File(path string) ([]byte, error)
}

Hash provides a named hash function applicable to several types of data.

func NewHashFunc

func NewHashFunc(name string, fn func() hash.Hash) Hash

NewHashFunc returns a Hash defined by func fn. The same func fn is used for all types of data.

func NewHashPlugin

func NewHashPlugin(name, path string) Hash

NewHashPlugin returns a Hash backed by the xsum plugin at the specified path. File()/Data() and Metadata() may use different underlying hash functions. See PLUGIN.md for details.

type Mask

type Mask struct {
	Mode Mode
	Attr Attr
}

func NewMask

func NewMask(mode os.FileMode, attr Attr) Mask

func NewMaskHex

func NewMaskHex(s string) (Mask, error)

func NewMaskString

func NewMaskString(s string) (Mask, error)

func (Mask) Hex

func (m Mask) Hex() string

func (Mask) String

func (m Mask) String() string

type Mode

type Mode uint16

func NewModeHex

func NewModeHex(s string) (Mode, error)

func NewModeString

func NewModeString(s string) (Mode, error)

func (Mode) Hex

func (m Mode) Hex() string

func (Mode) String

func (m Mode) String() string

type Node

type Node struct {
	File
	Sum   []byte
	Mode  os.FileMode
	Sys   *Sys
	Xattr *Xattr
	Err   error
}

func (*Node) Hex

func (n *Node) Hex() string

func (*Node) String

func (n *Node) String() string

func (*Node) SumString

func (n *Node) SumString() string

type Sum

type Sum struct {
	Semaphore *semaphore.Weighted
	NoDirs    bool
}

Sum may be used to calculate checksums of files and directories. Directory checksums use Merkle trees to hash their contents. If NoDirs is true, Files that refer to directories will return ErrDirectory. If Semaphore is not provided, DefaultSemaphore is used.

func (*Sum) Each

func (s *Sum) Each(files <-chan File, fn func(*Node) error) error

Each takes a channel of Files and invokes fn for each resulting *Node. Each *Node contains either a checksum or an error. Each returns immediately if fn returns an error.

func (*Sum) EachList

func (s *Sum) EachList(files []File, fn func(*Node) error) error

EachList takes a slice of Files and invokes fn for each resulting *Node. Each *Node contains either a checksum or an error. EachList returns immediately if fn returns an error.

func (*Sum) Find

func (s *Sum) Find(files []File) ([]*Node, error)

Find takes a slice of Files and returns a slice of *Nodes. Unlike Each and EachList, which pass errors to fn on each *Node, Find returns immediately on the first error encountered. Returned *Nodes are therefore guaranteed to have Node.Err set to nil.

type Sys

type Sys struct {
	UID, GID     *uint32
	Mtime, Ctime *encoding.Timespec
	Rdev         *uint64
}

type Xattr

type Xattr struct {
	HashType encoding.HashType
	Hashes   []encoding.NamedHash
}

Directories

Path Synopsis
cmd
