concurrenthash

Version: v1.6.3
Published: Apr 4, 2024 License: MIT Imports: 20 Imported by: 0

README

concurrent hash

A simple two-level Merkle tree for hashing large files to ensure integrity. Fast hashing schemes for large files exist, but they don't consider the entire file and thus cannot guarantee integrity. Concurrently hashing blocks of a file and then hashing the hashes is not a new idea: both ZFS and Btrfs hash inodes all the way up the directory tree to ensure the filesystem is not corrupted.
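The two-level idea can be sketched in a few lines. This is a sequential illustration only, not the package's code: the function name twoLevelSHA256 is hypothetical, SHA-256 is an arbitrary choice, and the root is assumed to be the hash of the concatenated block digests.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// twoLevelSHA256 (hypothetical name) hashes data in blockSize chunks, the
// leaves, and then hashes the concatenated leaf digests to form the root.
// blockSize must be greater than zero.
func twoLevelSHA256(data []byte, blockSize int) string {
	root := sha256.New()
	for start := 0; start < len(data); start += blockSize {
		end := start + blockSize
		if end > len(data) {
			end = len(data) // the last block may be short
		}
		leaf := sha256.Sum256(data[start:end])
		root.Write(leaf[:]) // hash.Hash.Write never returns an error
	}
	return hex.EncodeToString(root.Sum(nil))
}

func main() {
	data := []byte("the quick brown fox jumps over the lazy dog")
	original := twoLevelSHA256(data, 8)

	data[len(data)-1] ^= 1 // flip one bit in the last block
	corrupted := twoLevelSHA256(data, 8)

	// Every byte feeds a leaf, so a change in any block changes the root.
	fmt.Println(original != corrupted)
}
```

Because each leaf depends only on its own block, the leaves can be computed independently, which is what makes the concurrent version possible.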

Usage

Library
ch := concurrenthash.NewConcurrentHash(2, 2, sha512.New)
hash, err := ch.HashFile(context.Background(), file)
if err != nil {
	log.Fatal(err)
}
fmt.Println(hash)

Some hash algorithms do not have constructors that return hash.Hash so there are convenience wrappers you can use.
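As an illustration of why such wrappers are needed: crc32.NewIEEE in the standard library returns hash.Hash32, so it cannot be passed directly where a func() hash.Hash is expected. The sketch below shows what a wrapper could look like; wrapCRC32IEEE and checksumHex are hypothetical names, and the package's own WrapCrc32IEEE may be implemented differently.

```go
package main

import (
	"fmt"
	"hash"
	"hash/crc32"
)

// wrapCRC32IEEE is a hypothetical wrapper. crc32.NewIEEE has the type
// func() hash.Hash32, which is not assignable to func() hash.Hash, so the
// wrapper adapts the return type.
func wrapCRC32IEEE() hash.Hash {
	return crc32.NewIEEE()
}

// checksumHex (hypothetical helper) accepts any hash.Hash constructor and
// returns the hex digest of data.
func checksumHex(newHash func() hash.Hash, data string) string {
	h := newHash()
	h.Write([]byte(data)) // hash.Hash.Write never returns an error
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	fmt.Println(checksumHex(wrapCRC32IEEE, "hello"))
}
```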

CLI

./concurrenthash -file /path/to/large/file -threads 4 -block-size 1

Options:
 -algos                 print available hash algos
 -file                  input file
 -hash-func             hash algo to use, default: sha256
 -threads               amount of concurrency, default: 1
 -block-size            size of the leaf nodes to be hashed, default: 1MB
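For readers who want to reproduce this interface, the options above map naturally onto the standard flag package. The sketch below is an assumption about how they could be declared, not the tool's actual source: the options struct and parseOptions are hypothetical, and -block-size is assumed to be a whole number of megabytes because the default is listed as 1MB and the example passes 1.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// options mirrors the documented flags; struct and field names are hypothetical.
type options struct {
	algos     bool
	file      string
	hashFunc  string
	threads   int
	blockSize int64 // assumed to be in MB
}

// parseOptions parses args (without the program name) using the documented
// flag names and defaults.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("concurrenthash", flag.ContinueOnError)
	fs.BoolVar(&o.algos, "algos", false, "print available hash algos")
	fs.StringVar(&o.file, "file", "", "input file")
	fs.StringVar(&o.hashFunc, "hash-func", "sha256", "hash algo to use")
	fs.IntVar(&o.threads, "threads", 1, "amount of concurrency")
	fs.Int64Var(&o.blockSize, "block-size", 1, "size of the leaf nodes to be hashed, in MB")
	err := fs.Parse(args)
	return o, err
}

func main() {
	o, err := parseOptions(os.Args[1:])
	if err != nil {
		os.Exit(2)
	}
	fmt.Printf("%+v\n", o)
}
```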

Benchmarks

Time to hash a 10GB file of /dev/urandom data

algo               time (s)
adler32            8.02
crc32Castagnoli    6.60
crc32IEEE          4.99
crc32Koopman       6.18
crc64ECMA          5.54
crc64ISO           4.20
fnv32              4.15
fnv32a             4.14
fnv64              4.58
fnv64a             4.33
md5                19.01
murmur32           5.05
murmur64           5.36
sha1               13.07
sha256             18.14
sha512             18.10

Documentation

Constants

This section is empty.

Variables

var HashNamesToHashFuncs = map[string]func() hash.Hash{
	"adler32":         WrapAdler32,
	"crc32IEEE":       WrapCrc32IEEE,
	"crc32Castagnoli": WrapCrc32Castagnoli,
	"crc32Koopman":    WrapCrc32Koopman,
	"crc64ISO":        WrapCrc64ISO,
	"crc64ECMA":       WrapCrc64ECMA,
	"fnv32":           WrapFnv32,
	"fnv32a":          WrapFnv32a,
	"fnv64":           WrapFnv64,
	"fnv64a":          WrapFnv64a,
	"sha256":          sha256.New,
	"md5":             md5.New,
	"sha1":            sha1.New,
	"sha512":          sha512.New,
	"murmur32":        WrapMurmur32,
	"murmur64":        WrapMurmur64,
}
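A map of this shape is typically used to resolve a user-supplied algorithm name to a constructor. The sketch below uses a small local stand-in map (hashFuncs) with two standard-library entries so that it runs on its own. The lookup helper is hypothetical and not part of the package.

```go
package main

import (
	"crypto/md5"
	"crypto/sha256"
	"fmt"
	"hash"
	"sort"
)

// hashFuncs is a local stand-in with the same type as HashNamesToHashFuncs.
var hashFuncs = map[string]func() hash.Hash{
	"md5":    md5.New,
	"sha256": sha256.New,
}

// lookup (hypothetical helper) returns the constructor registered under name,
// or an error that lists the available names in sorted order.
func lookup(name string) (func() hash.Hash, error) {
	if newHash, ok := hashFuncs[name]; ok {
		return newHash, nil
	}
	names := make([]string, 0, len(hashFuncs))
	for n := range hashFuncs {
		names = append(names, n)
	}
	sort.Strings(names)
	return nil, fmt.Errorf("unknown hash algo %q, available: %v", name, names)
}

func main() {
	newHash, err := lookup("sha256")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(newHash().Size()) // digest size in bytes

	_, err = lookup("no-such-algo")
	fmt.Println(err)
}
```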

Functions

func WrapAdler32

func WrapAdler32() hash.Hash

func WrapCrc32Castagnoli

func WrapCrc32Castagnoli() hash.Hash

func WrapCrc32IEEE

func WrapCrc32IEEE() hash.Hash

func WrapCrc32Koopman

func WrapCrc32Koopman() hash.Hash

func WrapCrc64ECMA

func WrapCrc64ECMA() hash.Hash

func WrapCrc64ISO

func WrapCrc64ISO() hash.Hash

func WrapFnv32

func WrapFnv32() hash.Hash

func WrapFnv32a

func WrapFnv32a() hash.Hash

func WrapFnv64

func WrapFnv64() hash.Hash

func WrapFnv64a

func WrapFnv64a() hash.Hash

func WrapMurmur32

func WrapMurmur32() hash.Hash

func WrapMurmur64

func WrapMurmur64() hash.Hash

func WrapSha224 added in v1.4.0

func WrapSha224() hash.Hash

func WrapSha256 added in v1.4.0

func WrapSha256() hash.Hash

func WrapSha384 added in v1.3.0

func WrapSha384() hash.Hash

func WrapSha512 added in v1.3.0

func WrapSha512() hash.Hash

func WrapSha512224 added in v1.3.0

func WrapSha512224() hash.Hash

func WrapSha512256 added in v1.3.0

func WrapSha512256() hash.Hash

Types

type ConcurrentHash

type ConcurrentHash struct {
	Concurrency     int
	BlockSize       int64
	HashConstructor func() hash.Hash

	// internal
	Hashes     [][]byte
	HashesLock sync.RWMutex
}

ConcurrentHash is basically a Merkle tree (https://en.wikipedia.org/wiki/Merkle_tree).

func NewConcurrentHash

func NewConcurrentHash(concurrency int, blockSize int64, hashFunc func() hash.Hash) ConcurrentHash

NewConcurrentHash is the constructor and entry point.

func (*ConcurrentHash) HashFile

func (c *ConcurrentHash) HashFile(ctx context.Context, file string) (string, error)

HashFile is a coordination func: it fans blocks of the file out to hash workers, collects their output, and hashes the resulting array of block hashes.
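The fan-out and fan-in pattern described here can be sketched as follows. This is not the package's implementation: hashBlocks and hashBytes are hypothetical names, SHA-256 is hard-coded, an io.ReaderAt stands in for the file, and context cancellation is omitted for brevity.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"sync"
)

// hashBlocks (hypothetical) hashes each blockSize chunk of r in a pool of
// workers, then hashes the ordered block digests. It assumes size >= 0,
// blockSize > 0 and workers > 0.
func hashBlocks(r io.ReaderAt, size, blockSize int64, workers int) (string, error) {
	numBlocks := int((size + blockSize - 1) / blockSize)
	hashes := make([][]byte, numBlocks) // one slot per block keeps the order stable
	errs := make([]error, numBlocks)
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each worker writes only to index i, so no lock is needed.
				h := sha256.New()
				block := io.NewSectionReader(r, int64(i)*blockSize, blockSize)
				_, errs[i] = io.Copy(h, block)
				hashes[i] = h.Sum(nil)
			}
		}()
	}
	for i := 0; i < numBlocks; i++ {
		jobs <- i // fan out block indexes to the workers
	}
	close(jobs)
	wg.Wait()

	root := sha256.New()
	for i, sum := range hashes {
		if errs[i] != nil {
			return "", errs[i]
		}
		root.Write(sum) // fan in: hash the array of block hashes
	}
	return hex.EncodeToString(root.Sum(nil)), nil
}

// hashBytes is a convenience wrapper for in-memory data.
func hashBytes(data []byte, blockSize int64, workers int) (string, error) {
	return hashBlocks(bytes.NewReader(data), int64(len(data)), blockSize, workers)
}

func main() {
	data := bytes.Repeat([]byte("0123456789"), 1000)
	one, _ := hashBytes(data, 1024, 1)
	four, _ := hashBytes(data, 1024, 4)
	fmt.Println(one == four) // the result should not depend on the worker count
}
```

Storing each digest at its block index, rather than appending in completion order, keeps the final hash deterministic regardless of how the goroutines are scheduled.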
