hashlink

package module

v0.0.0-...-8da0c3e Published: May 2, 2020 License: Apache-2.0 Imports: 9 Imported by: 0

Repository

github.com/ollien/hashlink

README ¶

hashlink

Hashlink is a utility designed to perform migrations of duplicated data in a set of drives. Specifically, it is designed to free up space when one file is duplicated between drives, even when their filenames differ. Hashlink will take all matching files and hardlink them to the given destination location. Hashlink makes heavy use of concurrency to split up the workload of hashing these files.

Building

Run make build in the project root to produce the hashlink binary.

Usage

Usage: ./hashlink [-j n] [-n] [-c] src_dir reference_dir out_dir
  -c	copy the files that are missing from src_dir
  -j int
    	specify a number of workers (default 1)
  -n	do not link any files, but print out what files would have been linked

Hashlink has three directories it references.

src_dir is the directory from which files will be hardlinked.
reference_dir is where the potentially duplicated data is stored. In the workflow that hashlink is designed to handle, this is located on a different filesystem from src_dir and out_dir. If -c is specified, any files that are located within reference_dir but not src_dir will be copied from reference_dir.
out_dir is where any hardlinks or copies will be placed. Due to the nature of how hardlinks work, this must be on the same filesystem as src_dir. In addition, this directory must be empty before running the utility.
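
The same-filesystem requirement comes from how hard links behave: a hard link is a second name for the same underlying file, so it cannot span filesystems. The following is a minimal sketch of that behaviour using only the Go standard library; linkInto and demo are hypothetical helpers written for illustration and are not part of hashlink.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkInto (hypothetical helper) hard-links src into outDir under the same
// base name and reports whether both names refer to the same file.
// os.Link fails if src and outDir are on different filesystems.
func linkInto(src, outDir string) (bool, error) {
	dst := filepath.Join(outDir, filepath.Base(src))
	if err := os.Link(src, dst); err != nil {
		return false, err
	}
	srcInfo, err := os.Stat(src)
	if err != nil {
		return false, err
	}
	dstInfo, err := os.Stat(dst)
	if err != nil {
		return false, err
	}
	return os.SameFile(srcInfo, dstInfo), nil
}

// demo builds a throwaway source/output directory pair in the system
// temporary directory and links a single file between them.
func demo() (bool, error) {
	root, err := os.MkdirTemp("", "hashlink-sketch")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	srcDir := filepath.Join(root, "src")
	outDir := filepath.Join(root, "out")
	for _, dir := range []string{srcDir, outDir} {
		if err := os.Mkdir(dir, 0o755); err != nil {
			return false, err
		}
	}
	src := filepath.Join(srcDir, "a")
	if err := os.WriteFile(src, []byte("data"), 0o644); err != nil {
		return false, err
	}
	return linkInto(src, outDir)
}

func main() {
	same, err := demo()
	fmt.Println("same file:", same, "error:", err)
}
```

Because both names point at the same data, the link itself takes up no additional space for the file contents.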

Example Use-Case

Consider the following setup:

$ ls -l /mnt/drive1/foo
total 0
-rw-r--r-- 1 nick nick 0 Aug 20 00:21 a
-rw-r--r-- 1 nick nick 0 Aug 20 00:21 b
-rw-r--r-- 1 nick nick 0 Aug 20 00:21 c

$ ls -l /mnt/drive2/foo
total 0
-rw-r--r-- 1 nick nick 0 Aug 20 00:21 the-same-as-a-but-different-name
-rw-r--r-- 1 nick nick 0 Aug 20 00:21 b
-rw-r--r-- 1 nick nick 0 Aug 20 00:21 someotherfile

If a user desires to create /mnt/drive1/bar, which contains the contents shared between /mnt/drive1/foo and /mnt/drive2/foo, running hashlink /mnt/drive1/foo /mnt/drive2/foo /mnt/drive1/bar will perform the desired migration.

Documentation ¶

Index ¶

func GetUnmappedFiles(hashes PathHashes, files FileMap) []string
func ParallelWalkHasherProgressReporter(reporter ProgressReporter) func(*ParallelWalkHasher)
func SerialWalkHasherProgressReporter(reporter ProgressReporter) func(*SerialWalkHasher)
type FileMap
- func FindIdenticalFiles(hashes PathHashes, other PathHashes) FileMap
- func MakeFlippedFileMap(files FileMap) FileMap
type ParallelWalkHasher
- func NewParallelWalkHasher(numWorkers int, constructor func() hash.Hash, ...) *ParallelWalkHasher
- func (hasher *ParallelWalkHasher) WalkAndHash(root string) (PathHashes, error)
type PathHashes
type Progress
type ProgressReporter
type SerialWalkHasher
- func NewSerialWalkHasher(constructor func() hash.Hash, options ...func(*SerialWalkHasher)) *SerialWalkHasher
- func (hasher SerialWalkHasher) WalkAndHash(root string) (PathHashes, error)
type WalkHasher

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func GetUnmappedFiles ¶

func GetUnmappedFiles(hashes PathHashes, files FileMap) []string

GetUnmappedFiles returns all files that are in hashes but not in files.
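
One plausible reading of this description can be sketched as follows. The sketch assumes that "in files" means "appears as a key of the FileMap"; the package's actual rule may differ. The types are local copies of the documented definitions, and unmapped is a hypothetical stand-in rather than the package's implementation.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"sort"
)

// Local copies of the documented type definitions.
type PathHashes map[string]hash.Hash
type FileMap map[string][]string

// unmapped (hypothetical) returns every path in hashes that is not a key of
// files. Treating "in files" as "is a key" is an assumption of this sketch.
func unmapped(hashes PathHashes, files FileMap) []string {
	missing := make([]string, 0)
	for path := range hashes {
		if _, ok := files[path]; !ok {
			missing = append(missing, path)
		}
	}
	// Map iteration order is random, so sort for a stable result.
	sort.Strings(missing)
	return missing
}

func main() {
	hashes := PathHashes{
		"b":             sha256.New(),
		"someotherfile": sha256.New(),
	}
	files := FileMap{"b": {"b"}}
	fmt.Println(unmapped(hashes, files))
}
```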

func ParallelWalkHasherProgressReporter ¶

func ParallelWalkHasherProgressReporter(reporter ProgressReporter) func(*ParallelWalkHasher)

ParallelWalkHasherProgressReporter will provide a ProgressReporter for a ParallelWalkHasher. Intended to be passed to NewParallelWalkHasher as an option.

func SerialWalkHasherProgressReporter ¶

func SerialWalkHasherProgressReporter(reporter ProgressReporter) func(*SerialWalkHasher)

SerialWalkHasherProgressReporter will provide a ProgressReporter for a SerialWalkHasher. Intended to be passed to NewSerialWalkHasher as an option.
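
Both reporter functions return a func(*T) value that a constructor accepts as a variadic option, a shape commonly called the functional options pattern. The sketch below illustrates the pattern with stand-in names (sketchHasher, withProgressReporter, newSketchHasher, and printReporter are all hypothetical); it does not reproduce the package's internals.

```go
package main

import "fmt"

// Local copies of the documented Progress and ProgressReporter definitions.
type Progress int

type ProgressReporter interface {
	ReportProgress(progress Progress)
}

// sketchHasher is a hypothetical stand-in for a hasher with an optional reporter.
type sketchHasher struct {
	reporter ProgressReporter
}

// withProgressReporter returns an option that sets the reporter on a hasher.
func withProgressReporter(reporter ProgressReporter) func(*sketchHasher) {
	return func(hasher *sketchHasher) {
		hasher.reporter = reporter
	}
}

// newSketchHasher applies each option, in order, to a freshly built hasher.
func newSketchHasher(options ...func(*sketchHasher)) *sketchHasher {
	hasher := &sketchHasher{}
	for _, option := range options {
		option(hasher)
	}
	return hasher
}

// printReporter is a trivial reporter that writes to standard output.
type printReporter struct{}

func (printReporter) ReportProgress(progress Progress) {
	fmt.Printf("progress: %d\n", progress)
}

func main() {
	plain := newSketchHasher()
	reporting := newSketchHasher(withProgressReporter(printReporter{}))
	fmt.Println(plain.reporter == nil, reporting.reporter != nil)
}
```

The appeal of this design is that new optional settings can be added later without changing the constructor's signature.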

Types ¶

type FileMap ¶

type FileMap map[string][]string

FileMap represents a mapping between one file path and any related file paths.

func FindIdenticalFiles ¶

func FindIdenticalFiles(hashes PathHashes, other PathHashes) FileMap

FindIdenticalFiles generates a FileMap that describes the identical files in hashes, mapped to the identical files in other.
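
A minimal sketch of how such a mapping could be computed, assuming two files count as identical when their hash digests are equal. The types are local copies of the documented definitions; findIdentical, hashOf, and digest are hypothetical helpers, and the package's actual implementation may differ.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"sort"
)

// Local copies of the documented type definitions.
type PathHashes map[string]hash.Hash
type FileMap map[string][]string

// hashOf (hypothetical) returns a SHA-256 hash that has consumed content.
func hashOf(content string) hash.Hash {
	h := sha256.New()
	h.Write([]byte(content))
	return h
}

// digest renders the current sum of h as hex. Sum does not alter h's state.
func digest(h hash.Hash) string {
	return hex.EncodeToString(h.Sum(nil))
}

// findIdentical (hypothetical) maps each path in hashes to the paths in
// other that carry the same digest.
func findIdentical(hashes, other PathHashes) FileMap {
	// Index other by digest so each lookup is a single map access.
	byDigest := make(map[string][]string)
	for path, h := range other {
		d := digest(h)
		byDigest[d] = append(byDigest[d], path)
	}
	result := make(FileMap)
	for path, h := range hashes {
		if matches, ok := byDigest[digest(h)]; ok {
			sorted := append([]string(nil), matches...)
			sort.Strings(sorted) // stable order despite random map iteration
			result[path] = sorted
		}
	}
	return result
}

func main() {
	drive1 := PathHashes{"a": hashOf("A"), "b": hashOf("B"), "c": hashOf("C")}
	drive2 := PathHashes{
		"the-same-as-a-but-different-name": hashOf("A"),
		"b":                                hashOf("B"),
		"someotherfile":                    hashOf("X"),
	}
	fmt.Println(findIdentical(drive1, drive2))
}
```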

func MakeFlippedFileMap ¶

func MakeFlippedFileMap(files FileMap) FileMap

MakeFlippedFileMap takes an existing map and moves all of the files in the value portion to the keys portion, and vice-versa.
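
The inversion described above might look like the following. This is a sketch with a local copy of the FileMap type; flip is a hypothetical stand-in, and details such as ordering within each value slice are assumptions.

```go
package main

import (
	"fmt"
	"sort"
)

// Local copy of the documented FileMap definition.
type FileMap map[string][]string

// flip (hypothetical) returns a new FileMap in which every value of files
// becomes a key, mapped to all of the keys that previously referred to it.
func flip(files FileMap) FileMap {
	flipped := make(FileMap)
	for key, values := range files {
		for _, value := range values {
			flipped[value] = append(flipped[value], key)
		}
	}
	// Sort each slice so the result does not depend on map iteration order.
	for _, keys := range flipped {
		sort.Strings(keys)
	}
	return flipped
}

func main() {
	files := FileMap{
		"a": {"the-same-as-a-but-different-name"},
		"b": {"b"},
	}
	fmt.Println(flip(files))
}
```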

type ParallelWalkHasher ¶

type ParallelWalkHasher struct {
	// contains filtered or unexported fields
}

ParallelWalkHasher will hash all files concurrently, up to the number of specified workers.

func NewParallelWalkHasher ¶

func NewParallelWalkHasher(numWorkers int, constructor func() hash.Hash, options ...func(*ParallelWalkHasher)) *ParallelWalkHasher

NewParallelWalkHasher makes a new ParallelWalkHasher with a constructor for a hash algorithm and a number of workers.

func (*ParallelWalkHasher) WalkAndHash ¶

func (hasher *ParallelWalkHasher) WalkAndHash(root string) (PathHashes, error)

WalkAndHash walks the given path across all workers and returns hashes for all the files in the path.
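
Splitting hashing work across a fixed number of workers is commonly done with a worker pool fed by a channel. The sketch below shows that general technique over an explicit list of paths; hashAll, hashFile, result, and demo are hypothetical names, and nothing here is taken from the package's source.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"io"
	"os"
	"path/filepath"
	"sync"
)

// result carries one worker's outcome back to the collector.
type result struct {
	path string
	sum  hash.Hash
	err  error
}

// hashFile streams one file through a freshly constructed hash.
func hashFile(path string, constructor func() hash.Hash) (hash.Hash, error) {
	file, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer file.Close()
	h := constructor()
	if _, err := io.Copy(h, file); err != nil {
		return nil, err
	}
	return h, nil
}

// hashAll (hypothetical) hashes paths using up to numWorkers goroutines.
func hashAll(paths []string, numWorkers int, constructor func() hash.Hash) (map[string]hash.Hash, error) {
	if numWorkers < 1 {
		numWorkers = 1 // without a worker, sends on jobs would block forever
	}
	jobs := make(chan string)
	results := make(chan result, len(paths)) // buffered so workers never block
	var wg sync.WaitGroup
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for path := range jobs {
				h, err := hashFile(path, constructor)
				results <- result{path: path, sum: h, err: err}
			}
		}()
	}
	for _, path := range paths {
		jobs <- path
	}
	close(jobs)
	wg.Wait()
	close(results)

	hashes := make(map[string]hash.Hash, len(paths))
	for r := range results {
		if r.err != nil {
			return nil, r.err
		}
		hashes[r.path] = r.sum
	}
	return hashes, nil
}

// demo writes three temporary files (two identical) and returns the number
// of distinct digests found.
func demo() (int, error) {
	dir, err := os.MkdirTemp("", "hash-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(dir)
	var paths []string
	for name, data := range map[string]string{"a": "same", "b": "same", "c": "different"} {
		path := filepath.Join(dir, name)
		if err := os.WriteFile(path, []byte(data), 0o644); err != nil {
			return 0, err
		}
		paths = append(paths, path)
	}
	hashes, err := hashAll(paths, 2, sha256.New)
	if err != nil {
		return 0, err
	}
	distinct := make(map[string]bool)
	for _, h := range hashes {
		distinct[string(h.Sum(nil))] = true
	}
	return len(distinct), nil
}

func main() {
	fmt.Println(demo())
}
```

Passing a constructor rather than a single hash.Hash matters here: a hash.Hash carries state, so each file needs its own instance.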

type PathHashes ¶

type PathHashes map[string]hash.Hash

PathHashes represent the hashes for all paths walked by a WalkHasher, with the path as the key, and the hash as the value.

type Progress ¶

type Progress int

Progress represents the progress of something, on a scale of 0-100.

type ProgressReporter ¶

type ProgressReporter interface {
	// ReportProgress will report the progress of the process.
	ReportProgress(progress Progress)
}

ProgressReporter will report the progress of a process.
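
Any type with a ReportProgress method satisfies this interface. The sketch below uses local copies of the documented Progress and ProgressReporter definitions; recordingReporter and percentDone are hypothetical, and the integer rounding used to reach the 0-100 scale is an assumption.

```go
package main

import "fmt"

// Local copies of the documented definitions.
type Progress int

type ProgressReporter interface {
	ReportProgress(progress Progress)
}

// recordingReporter (hypothetical) prints each update and keeps a history.
type recordingReporter struct {
	seen []Progress
}

func (r *recordingReporter) ReportProgress(progress Progress) {
	r.seen = append(r.seen, progress)
	fmt.Printf("%d%% done\n", progress)
}

// percentDone (hypothetical) converts a done/total count to the 0-100 scale,
// rounding down. A non-positive total is reported as 0.
func percentDone(done, total int) Progress {
	if total <= 0 {
		return 0
	}
	return Progress(done * 100 / total)
}

func main() {
	var reporter ProgressReporter = &recordingReporter{}
	for done := 1; done <= 4; done++ {
		reporter.ReportProgress(percentDone(done, 4))
	}
}
```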

type SerialWalkHasher ¶

type SerialWalkHasher struct {
	// contains filtered or unexported fields
}

SerialWalkHasher will hash all files one after the other. Implements WalkHasher.

func NewSerialWalkHasher ¶

func NewSerialWalkHasher(constructor func() hash.Hash, options ...func(*SerialWalkHasher)) *SerialWalkHasher

NewSerialWalkHasher makes a new SerialWalkHasher with a constructor for a hash algorithm.

func (SerialWalkHasher) WalkAndHash ¶

func (hasher SerialWalkHasher) WalkAndHash(root string) (PathHashes, error)

WalkAndHash walks the given path and returns hashes for all the files in the path.

type WalkHasher ¶

type WalkHasher interface {
	// WalkAndHash takes a root path and returns a path of each file, along with its hash.
	WalkAndHash(root string) (PathHashes, error)
}

WalkHasher represents something that can walk a tree and generate hashes.
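
To illustrate what satisfying this interface involves, here is a sketch of a one-file-at-a-time implementation built on filepath.WalkDir. The interface and PathHashes are local copies of the documented definitions; sketchWalker and demo are hypothetical, and skipping everything that is not a regular file is an assumption of the sketch.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// Local copies of the documented definitions.
type PathHashes map[string]hash.Hash

type WalkHasher interface {
	WalkAndHash(root string) (PathHashes, error)
}

// sketchWalker (hypothetical) hashes files one at a time.
type sketchWalker struct {
	constructor func() hash.Hash
}

// Compile-time check that sketchWalker satisfies WalkHasher.
var _ WalkHasher = sketchWalker{}

func (w sketchWalker) WalkAndHash(root string) (PathHashes, error) {
	hashes := make(PathHashes)
	err := filepath.WalkDir(root, func(path string, entry fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !entry.Type().IsRegular() {
			return nil // skip directories, symlinks, and other special files
		}
		file, err := os.Open(path)
		if err != nil {
			return err
		}
		defer file.Close()
		h := w.constructor()
		if _, err := io.Copy(h, file); err != nil {
			return err
		}
		hashes[path] = h
		return nil
	})
	if err != nil {
		return nil, err
	}
	return hashes, nil
}

// demo walks a small temporary tree and returns how many files were hashed.
func demo() (int, error) {
	root, err := os.MkdirTemp("", "walk-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(root)
	if err := os.Mkdir(filepath.Join(root, "sub"), 0o755); err != nil {
		return 0, err
	}
	for _, name := range []string{"a", filepath.Join("sub", "b")} {
		if err := os.WriteFile(filepath.Join(root, name), []byte(name), 0o644); err != nil {
			return 0, err
		}
	}
	hashes, err := sketchWalker{constructor: sha256.New}.WalkAndHash(root)
	return len(hashes), err
}

func main() {
	fmt.Println(demo())
}
```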

Directories ¶

Path	Synopsis
cmd
multierror
