hashlink

v0.0.0-...-8da0c3e
Published: May 2, 2020 License: Apache-2.0 Imports: 9 Imported by: 0

README

Hashlink is a utility designed to perform migrations of duplicated data in a set of drives. Specifically, it is designed to free up space when one file is duplicated between drives, even when their filenames differ. Hashlink will take all matching files and hardlink them to the given destination location. Hashlink makes heavy use of concurrency to split up the workload of hashing these files.

Building

Run make build in the project root to produce the hashlink binary.

Usage

Usage: ./hashlink [-j n] [-n] [-c] src_dir reference_dir out_dir
  -c	copy the files that are missing from src_dir
  -j int
    	specify a number of workers (default 1)
  -n	do not link any files, but print out what files would have been linked

Hashlink has three directories it references.

  • src_dir is the directory from which files will be hardlinked.
  • reference_dir is where the potentially duplicated data is stored. In the workflow that hashlink is designed to handle, this is located on a different filesystem from src_dir and out_dir. If -c is specified, any files that are located within reference_dir but not src_dir will be copied from reference_dir.
  • out_dir is where any hardlinks or copies will be placed. Because a hardlink cannot cross filesystems, this must be on the same filesystem as src_dir. In addition, this directory must be empty before running the utility.

Example Use-Case

Consider the following setup:

$ ls -l /mnt/drive1/foo
total 0
-rw-r--r-- 1 nick nick 0 Aug 20 00:21 a
-rw-r--r-- 1 nick nick 0 Aug 20 00:21 b
-rw-r--r-- 1 nick nick 0 Aug 20 00:21 c

$ ls -l /mnt/drive2/foo
total 0
-rw-r--r-- 1 nick nick 0 Aug 20 00:21 the-same-as-a-but-different-name
-rw-r--r-- 1 nick nick 0 Aug 20 00:21 b
-rw-r--r-- 1 nick nick 0 Aug 20 00:21 someotherfile

If a user desires to create /mnt/drive1/bar, which contains the contents shared between /mnt/drive1/foo and /mnt/drive2/foo, running hashlink /mnt/drive1/foo /mnt/drive2/foo /mnt/drive1/bar will perform the desired migration.
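
The workflow above can be illustrated with a minimal, self-contained sketch. It is not hashlink's implementation: the helper names (digestTree, linkMatches, runLinkDemo), the choice of SHA-256, and the decision to name each link after the matching file's path in reference_dir are all assumptions made for illustration. The demo recreates the example layout in a temporary directory, but gives the files distinct contents so that only a and b have matches.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
)

// digestTree walks root and maps each regular file's path (relative to root)
// to the hex-encoded SHA-256 digest of its content.
func digestTree(root string) (map[string]string, error) {
	digests := make(map[string]string)
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if !d.Type().IsRegular() {
			return nil
		}
		f, err := os.Open(path)
		if err != nil {
			return err
		}
		defer f.Close()
		h := sha256.New()
		if _, err := io.Copy(h, f); err != nil {
			return err
		}
		rel, err := filepath.Rel(root, path)
		if err != nil {
			return err
		}
		digests[rel] = hex.EncodeToString(h.Sum(nil))
		return nil
	})
	return digests, err
}

// linkMatches hardlinks a srcDir file into outDir for every refDir file with
// the same content. Naming the link after the refDir path is an assumption.
// It returns the number of links created.
func linkMatches(srcDir, refDir, outDir string) (int, error) {
	src, err := digestTree(srcDir)
	if err != nil {
		return 0, err
	}
	ref, err := digestTree(refDir)
	if err != nil {
		return 0, err
	}
	byDigest := make(map[string]string, len(src)) // digest -> one src path
	for rel, sum := range src {
		byDigest[sum] = rel
	}
	linked := 0
	for rel, sum := range ref {
		srcRel, ok := byDigest[sum]
		if !ok {
			continue // content exists only in refDir
		}
		target := filepath.Join(outDir, rel)
		if err := os.MkdirAll(filepath.Dir(target), 0755); err != nil {
			return linked, err
		}
		if err := os.Link(filepath.Join(srcDir, srcRel), target); err != nil {
			return linked, err
		}
		linked++
	}
	return linked, nil
}

// runLinkDemo builds the example layout under a temporary directory.
func runLinkDemo() (int, error) {
	tmp, err := os.MkdirTemp("", "hashlink-sketch")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(tmp)
	files := map[string]string{
		"drive1/foo/a": "alpha", "drive1/foo/b": "beta", "drive1/foo/c": "gamma",
		"drive2/foo/the-same-as-a-but-different-name": "alpha",
		"drive2/foo/b": "beta", "drive2/foo/someotherfile": "delta",
	}
	for rel, content := range files {
		path := filepath.Join(tmp, rel)
		if err := os.MkdirAll(filepath.Dir(path), 0755); err != nil {
			return 0, err
		}
		if err := os.WriteFile(path, []byte(content), 0644); err != nil {
			return 0, err
		}
	}
	return linkMatches(filepath.Join(tmp, "drive1", "foo"),
		filepath.Join(tmp, "drive2", "foo"), filepath.Join(tmp, "drive1", "bar"))
}

func main() {
	linked, err := runLinkDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("hardlinks created:", linked)
}
```

In this sketch the whole tree lives on one temporary filesystem, so os.Link succeeds. With real drives, the call would fail if out_dir and src_dir were on different filesystems, which is the constraint described under Usage.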

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetUnmappedFiles

func GetUnmappedFiles(hashes PathHashes, files FileMap) []string

GetUnmappedFiles returns all files that are in hashes but not in files.
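
As an illustration of the behavior described, here is a hedged sketch using local stand-in types that mirror the documented declarations. The function unmappedFiles and the helper hashOf are hypothetical, and the sketch assumes that "not in files" means "not a key of files"; the package may define membership differently.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"sort"
)

// Local stand-ins that mirror the type declarations documented for this package.
type PathHashes map[string]hash.Hash
type FileMap map[string][]string

// unmappedFiles is a hypothetical re-implementation, not the package's code.
// It returns every path in hashes that is not a key of files, sorted so the
// result is deterministic.
func unmappedFiles(hashes PathHashes, files FileMap) []string {
	var unmapped []string
	for path := range hashes {
		if _, ok := files[path]; !ok {
			unmapped = append(unmapped, path)
		}
	}
	sort.Strings(unmapped)
	return unmapped
}

// hashOf returns a SHA-256 hash.Hash that has consumed content.
func hashOf(content string) hash.Hash {
	h := sha256.New()
	h.Write([]byte(content))
	return h
}

func main() {
	hashes := PathHashes{"b": hashOf("beta"), "someotherfile": hashOf("delta")}
	files := FileMap{"b": {"b"}}
	fmt.Println(unmappedFiles(hashes, files)) // prints [someotherfile]
}
```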

func ParallelWalkHasherProgressReporter

func ParallelWalkHasherProgressReporter(reporter ProgressReporter) func(*ParallelWalkHasher)

ParallelWalkHasherProgressReporter will provide a ProgressReporter for a ParallelWalkHasher. Intended to be passed to NewParallelWalkHasher as an option.

func SerialWalkHasherProgressReporter

func SerialWalkHasherProgressReporter(reporter ProgressReporter) func(*SerialWalkHasher)

SerialWalkHasherProgressReporter will provide a ProgressReporter for a SerialWalkHasher. Intended to be passed to NewSerialWalkHasher as an option.

Types

type FileMap

type FileMap map[string][]string

FileMap represents a mapping between one file path and any related file paths.

func FindIdenticalFiles

func FindIdenticalFiles(hashes PathHashes, other PathHashes) FileMap

FindIdenticalFiles generates a FileMap that describes the identical files in hashes, mapped to the identical files in other.
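
One plausible way to build such a mapping is to index one set of hashes by digest and look up each path from the other set. The sketch below uses local stand-in types; findIdentical and digest are hypothetical names, and comparing files by the bytes returned from Sum(nil) is an assumption about how equality is decided.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"sort"
)

// Local stand-ins that mirror the documented type declarations.
type PathHashes map[string]hash.Hash
type FileMap map[string][]string

// findIdentical is a hypothetical sketch. For each path in hashes it lists
// the paths in other whose digest is byte-for-byte equal.
func findIdentical(hashes, other PathHashes) FileMap {
	// Index the paths in other by their digest.
	byDigest := make(map[string][]string)
	for path, h := range other {
		key := string(h.Sum(nil))
		byDigest[key] = append(byDigest[key], path)
	}
	result := make(FileMap)
	for path, h := range hashes {
		matches := byDigest[string(h.Sum(nil))]
		if len(matches) == 0 {
			continue
		}
		// Copy before sorting so entries never share a backing array.
		sorted := append([]string(nil), matches...)
		sort.Strings(sorted)
		result[path] = sorted
	}
	return result
}

// digest returns a SHA-256 hash.Hash that has consumed content.
func digest(content string) hash.Hash {
	h := sha256.New()
	h.Write([]byte(content))
	return h
}

func main() {
	src := PathHashes{"a": digest("alpha"), "b": digest("beta"), "c": digest("gamma")}
	ref := PathHashes{
		"the-same-as-a-but-different-name": digest("alpha"),
		"b":                                digest("beta"),
		"someotherfile":                    digest("delta"),
	}
	fmt.Println(findIdentical(src, ref))
	// prints map[a:[the-same-as-a-but-different-name] b:[b]]
}
```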

func MakeFlippedFileMap

func MakeFlippedFileMap(files FileMap) FileMap

MakeFlippedFileMap takes an existing map and moves all of the files in the value portion to the keys portion, and vice-versa.
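
The flip described above can be sketched as follows. The function name flip is hypothetical, FileMap is a local stand-in, and the sorting step is only there to make the demo output stable; whether the package orders its results is not documented.

```go
package main

import (
	"fmt"
	"sort"
)

// FileMap is a local stand-in mirroring the documented declaration.
type FileMap map[string][]string

// flip is a hypothetical sketch: every path that appears as a value becomes
// a key, and it maps to all the keys it was originally listed under.
func flip(files FileMap) FileMap {
	flipped := make(FileMap)
	for key, values := range files {
		for _, value := range values {
			flipped[value] = append(flipped[value], key)
		}
	}
	// Map iteration order is random, so sort each list for stable output.
	for _, keys := range flipped {
		sort.Strings(keys)
	}
	return flipped
}

func main() {
	files := FileMap{"a": {"x", "y"}, "b": {"x"}}
	fmt.Println(flip(files)) // prints map[x:[a b] y:[a]]
}
```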

type ParallelWalkHasher

type ParallelWalkHasher struct {
	// contains filtered or unexported fields
}

ParallelWalkHasher will hash all files concurrently, up to the number of specified workers.

func NewParallelWalkHasher

func NewParallelWalkHasher(numWorkers int, constructor func() hash.Hash, options ...func(*ParallelWalkHasher)) *ParallelWalkHasher

NewParallelWalkHasher makes a new ParallelWalkHasher with a constructor for a hash algorithm and a number of workers.

func (*ParallelWalkHasher) WalkAndHash

func (hasher *ParallelWalkHasher) WalkAndHash(root string) (PathHashes, error)

WalkAndHash walks the given path across all workers and returns hashes for all the files in the path.
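
To show what hashing "across all workers" could look like, here is a minimal worker-pool sketch. It is not the package's implementation: hashFiles, hashFile, and runPoolDemo are hypothetical names, the sketch takes a ready-made list of paths instead of walking a tree, and the channel-plus-mutex design is just one common way to bound concurrency in Go.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
	"hash"
	"io"
	"os"
	"path/filepath"
	"sync"
)

// hashFile feeds the content of path into a fresh hash from constructor.
func hashFile(path string, constructor func() hash.Hash) (hash.Hash, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	h := constructor()
	if _, err := io.Copy(h, f); err != nil {
		return nil, err
	}
	return h, nil
}

// hashFiles (hypothetical) hashes paths using at most numWorkers goroutines.
// It returns the hashes that succeeded and the first error seen, if any.
func hashFiles(paths []string, numWorkers int, constructor func() hash.Hash) (map[string]hash.Hash, error) {
	if numWorkers < 1 {
		numWorkers = 1
	}
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		firstErr error
	)
	hashes := make(map[string]hash.Hash, len(paths))
	jobs := make(chan string)
	for i := 0; i < numWorkers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for path := range jobs {
				h, err := hashFile(path, constructor)
				mu.Lock() // the map and firstErr are shared between workers
				if err != nil && firstErr == nil {
					firstErr = err
				} else if err == nil {
					hashes[path] = h
				}
				mu.Unlock()
			}
		}()
	}
	for _, path := range paths {
		jobs <- path
	}
	close(jobs)
	wg.Wait()
	return hashes, firstErr
}

// runPoolDemo hashes three temporary files with two workers. It reports how
// many files were hashed and whether the two identical files share a digest.
func runPoolDemo() (int, bool, error) {
	dir, err := os.MkdirTemp("", "hash-pool-sketch")
	if err != nil {
		return 0, false, err
	}
	defer os.RemoveAll(dir)
	contents := map[string]string{"a": "alpha", "copy-of-a": "alpha", "b": "beta"}
	var paths []string
	for name, content := range contents {
		path := filepath.Join(dir, name)
		if err := os.WriteFile(path, []byte(content), 0644); err != nil {
			return 0, false, err
		}
		paths = append(paths, path)
	}
	hashes, err := hashFiles(paths, 2, sha256.New)
	if err != nil {
		return 0, false, err
	}
	a := hashes[filepath.Join(dir, "a")].Sum(nil)
	c := hashes[filepath.Join(dir, "copy-of-a")].Sum(nil)
	return len(hashes), bytes.Equal(a, c), nil
}

func main() {
	n, same, err := runPoolDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("files hashed:", n, "duplicate detected:", same)
}
```

Each worker calls the constructor once per file, so no hash.Hash value is ever shared between goroutines; only the result map needs locking.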

type PathHashes

type PathHashes map[string]hash.Hash

PathHashes represents the hashes for all paths walked by a WalkHasher, with the path as the key, and the hash as the value.

type Progress

type Progress int

Progress represents the progress of something, on a scale of 0-100.

type ProgressReporter

type ProgressReporter interface {
	// ReportProgress will report the progress of the process.
	ReportProgress(progress Progress)
}

ProgressReporter will report the progress of a process.
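
The sketch below shows one way a caller might implement this interface and hand it over through a functional option, in the style of the ...ProgressReporter option functions documented above. Progress and ProgressReporter are redeclared locally to keep the example self-contained; recordingReporter, worker, withReporter, and newWorker are hypothetical stand-ins, and the percentage arithmetic is an assumption, since the package does not document how it computes progress.

```go
package main

import "fmt"

// Local copies of the documented declarations.
type Progress int

type ProgressReporter interface {
	ReportProgress(progress Progress)
}

// recordingReporter (hypothetical) prints and remembers each reported value.
type recordingReporter struct{ seen []Progress }

func (r *recordingReporter) ReportProgress(progress Progress) {
	r.seen = append(r.seen, progress)
	fmt.Printf("progress: %d%%\n", progress)
}

// worker is a hypothetical stand-in for a hasher that accepts options.
type worker struct{ reporter ProgressReporter }

// withReporter returns an option that attaches reporter to a worker.
func withReporter(reporter ProgressReporter) func(*worker) {
	return func(w *worker) { w.reporter = reporter }
}

// newWorker applies each option in order to a fresh worker.
func newWorker(options ...func(*worker)) *worker {
	w := &worker{}
	for _, option := range options {
		option(w)
	}
	return w
}

// process pretends to handle total items and reports the completed
// percentage after each one, if a reporter was configured.
func (w *worker) process(total int) {
	for done := 1; done <= total; done++ {
		if w.reporter != nil {
			w.reporter.ReportProgress(Progress(done * 100 / total))
		}
	}
}

func main() {
	w := newWorker(withReporter(&recordingReporter{}))
	w.process(4) // reports 25, 50, 75, then 100
}
```

Passing the reporter as an option keeps the constructor signature stable: callers that do not care about progress simply omit it.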

type SerialWalkHasher

type SerialWalkHasher struct {
	// contains filtered or unexported fields
}

SerialWalkHasher will hash all files one after the other. Implements WalkHasher.

func NewSerialWalkHasher

func NewSerialWalkHasher(constructor func() hash.Hash, options ...func(*SerialWalkHasher)) *SerialWalkHasher

NewSerialWalkHasher makes a new SerialWalkHasher with a constructor for a hash algorithm.

func (SerialWalkHasher) WalkAndHash

func (hasher SerialWalkHasher) WalkAndHash(root string) (PathHashes, error)

WalkAndHash walks the given path and returns hashes for all the files in the path.

type WalkHasher

type WalkHasher interface {
	// WalkAndHash takes a root path and returns the path of each file, along with its hash.
	WalkAndHash(root string) (PathHashes, error)
}

WalkHasher represents something that can walk a tree and generate hashes.
