uniq

v1.0.2
Published: Aug 25, 2020 License: BSD-3-Clause Imports: 12 Imported by: 0

README

uniq

uniq is a command-line utility for detecting duplicate files.

Usage

NAME
  uniq - detect duplicate files

SYNOPSIS
  uniq -u [-b] [-e] [-L] [-R] [<dir>]
  uniq -d [-b] [-e] [-L] [-R] [<dir>]
  uniq -D [-e] [-L] [-R] [<dir>]

DESCRIPTION
  uniq reads file paths from stdin and looks for duplicates by computing the 
SHA1 checksum of each file. If <dir> is specified, uniq evaluates files in 
<dir> (recursively if -R is specified) instead.
  By default, nothing is printed to stdout. To print paths of files with 
previously-unseen checksums to stdout, specify -u. To print paths of files 
with previously-seen checksums to stdout instead, specify -d. Or, to print a 
summary of all duplicate files and their checksums to stdout once all files 
have been evaluated, specify -D. Note that only one of -u, -d, and -D may 
be specified.
  After evaluating all files, uniq will exit with non-zero status if any 
duplicates were found or if any errors occurred, and zero status otherwise. 
By default, if an error occurs, such as failure to open a file for reading, 
the error is printed to stderr and uniq continues. This behavior may be 
changed by specifying -e, which causes uniq to exit immediately if an error 
occurs. Similarly, specifying -b causes uniq to exit immediately if a file 
with a previously-seen checksum is encountered.

OPTIONS
  -D	Print summary of duplicate files and their checksums to stdout in 
    	the following format after all files have been evaluated:

    		da39a3ee5e6b4b0d3255bfef95601890afd80709:
    		- "/path/to/file1"
    		- "/path/to/file2"
    		...

  -L	Follow symbolic links.
  -R	Read files from <dir> recursively. Has no effect when reading from 
    	stdin.
  -b	Stop processing and exit with non-zero status if a file with a 
    	previously-seen checksum is found.
  -d	Print each file with a previously-seen checksum to stdout.
  -e	If an error occurs, print it to stderr and exit with non-zero status. 
    	The default behavior is to print the error to stderr and continue.
  -u	Print each file with a previously-unseen checksum to stdout.

EXAMPLES
  Print paths of unique images found in <dir> to stdout and discard error 
messages:

    	$ find <dir> -type f -regextype sed \
    		-iregex '.*\.\(gif\|jpe\?g\|png\)' | uniq -u 2>/dev/null

  Write summary of files with duplicate checksums found in <dir> (following 
any symbolic links encountered) to <file> as YAML:

    	$ uniq -R -L -D <dir> > <file>

  Remove files with previously-seen checksums from <dir>:

    	$ uniq -R -d <dir> | xargs rm --

Documentation

Overview

Package uniq exposes primitives for detecting files with duplicate checksums from a list of file paths.

Types

type Errors

type Errors []error

Errors implements the error interface for a slice of errors.

func (Errors) Error

func (el Errors) Error() string
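
As an illustration of how a slice of errors can satisfy the error interface, here is a minimal sketch. The Errors type below is a local stand-in rather than the package's own code, the "; " separator is an assumption (the format of the real Error method is not documented here), and collect and sample are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Errors is a local stand-in for the package's Errors type.
type Errors []error

// Error joins the individual messages. The "; " separator is an assumption.
func (el Errors) Error() string {
	msgs := make([]string, len(el))
	for i, err := range el {
		msgs[i] = err.Error()
	}
	return strings.Join(msgs, "; ")
}

// collect (hypothetical) returns a nil error when nothing failed, so callers
// can keep using the usual `if err != nil` check.
func collect(errs []error) error {
	if len(errs) == 0 {
		return nil
	}
	return Errors(errs)
}

// sample (hypothetical) builds an error value holding two failures.
func sample() error {
	return collect([]error{
		errors.New("open a: permission denied"),
		errors.New("open b: no such file"),
	})
}

func main() {
	// A type assertion recovers the individual errors.
	if el, ok := sample().(Errors); ok {
		for _, e := range el {
			fmt.Println(e)
		}
	}
}
```

Returning nil instead of an empty Errors value matters: an empty slice stored in an error interface would still compare as non-nil.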

type File

type File struct {
	Path string
	Info os.FileInfo
}

File pairs a path with the os.FileInfo for the file located at that path.

type Options

type Options struct {
	FollowSymlinks bool            // Follow symbolic links.
	Recursive      bool            // Recurse if reading from a directory.
	ExitOnError    bool            // Stop if an error occurs.
	ExitOnDup      bool            // Stop if a file with a previously-seen checksum is found.
	Cancel         <-chan struct{} // Close to signal cancellation.
	UniqWriter     io.Writer       // Write paths of files with previously-unseen checksums.
	DupWriter      io.Writer       // Write paths of files with previously-seen checksums.
	ErrWriter      io.Writer       // Write errors.
	// contains filtered or unexported fields
}

Options groups configuration options for Filter and FilterDir.
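
The Cancel field follows the common Go pattern of signalling cancellation by closing a channel. The sketch below shows one way a worker loop might honor such a channel; how the package itself polls Cancel is not documented here, and the process function is hypothetical.

```go
package main

import "fmt"

// process (hypothetical) handles items until it runs out or cancel is closed,
// and returns the number of items handled.
func process(items []string, cancel <-chan struct{}) int {
	n := 0
	for range items {
		select {
		case <-cancel:
			// A closed channel is always ready to receive, so every
			// worker sharing the channel sees the signal.
			return n
		default:
			// Not cancelled (or cancel is nil): keep going.
		}
		n++
	}
	return n
}

func main() {
	cancel := make(chan struct{})
	fmt.Println(process([]string{"a", "b", "c"}, cancel)) // 3

	close(cancel)
	fmt.Println(process([]string{"a", "b", "c"}, cancel)) // 0
}
```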

type Stats

type Stats struct {
	NumFiles    uint64
	NumBytes    uint64
	NumDupFiles uint64
	NumDupBytes uint64
}

Stats contains a summary of files and bytes examined by Sums.

func (Stats) String

func (s Stats) String() string

type Sum

type Sum [sha1.Size]byte

Sum is a defined type whose underlying type is [sha1.Size]byte; it holds the SHA-1 checksum of a file.
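
A short sketch of working with a value of this shape, using only the standard library. The Sum declaration is copied locally so the example is self-contained, and hexSum is a hypothetical helper. The checksum of empty input is the same value used in the output examples in this document.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
)

// Sum mirrors the declaration shown above.
type Sum [sha1.Size]byte

// hexSum (hypothetical) returns the lowercase hex form of data's SHA-1 checksum.
func hexSum(data []byte) string {
	sum := Sum(sha1.Sum(data))
	return hex.EncodeToString(sum[:])
}

func main() {
	// The SHA-1 checksum of empty input.
	fmt.Println(hexSum(nil)) // da39a3ee5e6b4b0d3255bfef95601890afd80709

	// Arrays are comparable, so a Sum can be used directly as a map key.
	seen := map[Sum]bool{Sum(sha1.Sum(nil)): true}
	fmt.Println(seen[Sum(sha1.Sum([]byte{}))]) // true
}
```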

type Sums

type Sums struct {
	// contains filtered or unexported fields
}

Sums is a map of checksums to files that is safe for concurrent access from multiple goroutines.

func Filter

func Filter(r io.Reader, opts *Options) (*Sums, error)

Filter reads newline-delimited file paths from r, checksums each file to find duplicates, and returns a *Sums along with any errors that occurred during evaluation. A non-nil error has type Errors.
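
The behavior described above can be approximated with the standard library alone. The following is a sketch of the technique, not the package's implementation: sumFile, classify, and demo are hypothetical names, the import path of the real package is not given here so it is not used, and errors are simply printed and skipped (the default behavior described in the README).

```go
package main

import (
	"bufio"
	"crypto/sha1"
	"fmt"
	"io"
	"os"
	"path/filepath"
	"strings"
)

// sumFile streams the file at path through SHA-1 and returns its checksum.
func sumFile(path string) ([sha1.Size]byte, error) {
	var sum [sha1.Size]byte
	f, err := os.Open(path)
	if err != nil {
		return sum, err
	}
	defer f.Close()
	h := sha1.New()
	if _, err := io.Copy(h, f); err != nil {
		return sum, err
	}
	copy(sum[:], h.Sum(nil))
	return sum, nil
}

// classify reads one path per line from r. Paths whose checksum is new go to
// uniq, repeats go to dup, and per-file errors go to errw before continuing.
// It returns the number of duplicates and any error from reading r.
func classify(r io.Reader, uniq, dup, errw io.Writer) (int, error) {
	seen := make(map[[sha1.Size]byte]bool)
	dups := 0
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		path := sc.Text()
		if path == "" {
			continue
		}
		sum, err := sumFile(path)
		if err != nil {
			fmt.Fprintln(errw, err)
			continue
		}
		if seen[sum] {
			dups++
			fmt.Fprintln(dup, path)
			continue
		}
		seen[sum] = true
		fmt.Fprintln(uniq, path)
	}
	return dups, sc.Err()
}

// demo writes three small files (two with identical contents) to a temporary
// directory, classifies them, and returns the file names sent to each writer.
func demo() (uniq, dup string, err error) {
	dir, err := os.MkdirTemp("", "uniq-sketch-")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	files := []struct{ name, body string }{
		{"a.txt", "hello"}, {"b.txt", "hello"}, {"c.txt", "world"},
	}
	var paths strings.Builder
	for _, f := range files {
		p := filepath.Join(dir, f.name)
		if err := os.WriteFile(p, []byte(f.body), 0o600); err != nil {
			return "", "", err
		}
		fmt.Fprintln(&paths, p)
	}

	var u, d strings.Builder
	if _, err := classify(strings.NewReader(paths.String()), &u, &d, io.Discard); err != nil {
		return "", "", err
	}
	prefix := dir + string(filepath.Separator)
	return strings.ReplaceAll(u.String(), prefix, ""), strings.ReplaceAll(d.String(), prefix, ""), nil
}

func main() {
	uniq, dup, err := demo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("unique:\n%sduplicates:\n%s", uniq, dup)
}
```

Streaming each file through io.Copy keeps memory use flat regardless of file size, which is why the sketch avoids reading whole files into memory.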

func FilterDir

func FilterDir(path string, opts *Options) (*Sums, error)

FilterDir is like Filter except it reads file paths from the directory located at path.

func NewSums

func NewSums() *Sums

NewSums initializes a Sums and returns a pointer to it.

func (*Sums) Append

func (s *Sums) Append(sum Sum, file *File) (dup bool)

Append stores file in the set of files under checksum sum. Append does not attempt to verify whether sum is a valid checksum for file. Append returns false if file is the first encountered for sum, true otherwise.

func (*Sums) Get

func (s *Sums) Get(sum Sum) (files []*File, ok bool)

Get returns the list of files for sum. ok will be false if s does not contain any files for sum, true otherwise.
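
The Append and Get semantics described above can be sketched with a mutex-protected map. The internals of Sums are unexported, so everything below is a hypothetical stand-in: the sums type, its add and get methods, and the use of plain path strings in place of *File are all assumptions made for illustration.

```go
package main

import (
	"crypto/sha1"
	"fmt"
	"sync"
)

// Sum mirrors the package's checksum type.
type Sum [sha1.Size]byte

// sums is a hypothetical stand-in for Sums.
type sums struct {
	mu    sync.RWMutex
	files map[Sum][]string
}

func newSums() *sums {
	return &sums{files: make(map[Sum][]string)}
}

// add records path under sum and reports whether sum had been seen before,
// matching the documented return value of Append.
func (s *sums) add(sum Sum, path string) (dup bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, dup = s.files[sum]
	s.files[sum] = append(s.files[sum], path)
	return dup
}

// get returns a copy of the paths recorded for sum; ok is false if none exist.
func (s *sums) get(sum Sum) (paths []string, ok bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	stored, ok := s.files[sum]
	return append([]string(nil), stored...), ok
}

func main() {
	s := newSums()
	sum := Sum(sha1.Sum([]byte("same content")))

	// Several goroutines may record files under the same checksum.
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			s.add(sum, fmt.Sprintf("/tmp/file%d", i))
		}(i)
	}
	wg.Wait()

	paths, ok := s.get(sum)
	fmt.Println(len(paths), ok) // 4 true
}
```

Checking for the key and appending under the same lock is what makes the dup result reliable: exactly one caller per checksum observes false.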

func (*Sums) Range

func (s *Sums) Range(f func(sum Sum, files []*File) bool)

Range calls f sequentially for each sum and set of files present in s. If f returns false, Range stops the iteration. If s is modified concurrently, the files passed to f for a given sum may reflect any state of that mapping during the Range call.

func (*Sums) Stats

func (s *Sums) Stats() Stats

Stats reports the number of files, bytes, duplicate files, and duplicate bytes examined.

func (*Sums) WriteAllDup

func (s *Sums) WriteAllDup(w io.Writer) (err error)

WriteAllDup writes a summary of duplicate files and their checksums to w in the following format:

da39a3ee5e6b4b0d3255bfef95601890afd80709:
- "/path/to/file1"
- "/path/to/file2"
...
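
A sketch of a writer that produces output in this shape. It is not the package's implementation: writeDups and render are hypothetical, a plain map stands in for Sums, quoting paths with %q is an assumption that happens to match the sample for simple paths, and sorting by checksum is added only to make the output deterministic.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"io"
	"sort"
	"strings"
)

// writeDups (hypothetical) writes every checksum that maps to more than one
// path, followed by one quoted path per line.
func writeDups(w io.Writer, sums map[[sha1.Size]byte][]string) error {
	keys := make([][sha1.Size]byte, 0, len(sums))
	for sum, paths := range sums {
		if len(paths) > 1 {
			keys = append(keys, sum)
		}
	}
	// Map iteration order is random; sort for stable output (an assumption,
	// the real ordering is not documented).
	sort.Slice(keys, func(i, j int) bool {
		return bytes.Compare(keys[i][:], keys[j][:]) < 0
	})
	for _, sum := range keys {
		if _, err := fmt.Fprintf(w, "%s:\n", hex.EncodeToString(sum[:])); err != nil {
			return err
		}
		for _, p := range sums[sum] {
			if _, err := fmt.Fprintf(w, "- %q\n", p); err != nil {
				return err
			}
		}
	}
	return nil
}

// render (hypothetical) returns the summary as a string.
func render(sums map[[sha1.Size]byte][]string) string {
	var b strings.Builder
	_ = writeDups(&b, sums) // writes to a strings.Builder do not fail
	return b.String()
}

func main() {
	empty := sha1.Sum(nil) // checksum shared by all empty files
	fmt.Print(render(map[[sha1.Size]byte][]string{
		empty: {"/path/to/file1", "/path/to/file2"},
	}))
}
```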

Directories

Path Synopsis
cmd
Package filesys provides an abstraction for working with file systems, mainly to facilitate testing.