dups

Version: v0.2.1
Published: Aug 12, 2020 License: Apache-2.0 Imports: 11 Imported by: 0

README

dups is a CLI tool to find and remove duplicate files using different hash algorithms (MD5, SHA256 and XXHash) with multi-core support.

Install

Download binaries:

Release Page

To use in a go project:

go get github.com/Navid2zp/dups
Usage
CLI

Available Commands:

Command   Description
clean     Finds duplicate files in a given path and deletes them.
scan      Finds duplicate files in a given path but doesn't delete them.
help      Help about any command

Flags:

Flag                 Description
--algorithm string   algorithm to use (md5/sha256/xxhash) (default "md5")
-f, --flat           flat output, no extra info (only prints duplicate files)
-r, --full           full search (search in sub-directories too)
--min-size int       minimum file size to scan in bytes (default 10)
-s, --single-core    use single cpu core

Examples:

Remove duplicates bigger than 1KB using multiple cpu cores:

dups clean path/to/directory --min-size 1024

Find duplicates and append them to file.txt:

dups scan path/to/directory -f >> file.txt

Find and list duplicates using single cpu core and XXHash algorithm:

dups scan path/to/directory -s --algorithm xxhash
Go code:
package main

import (
	"fmt"
	"github.com/Navid2zp/dups"
)

func main() {
	// List all files, including files in any sub-directory (full=true).
	files, err := dups.GetFiles("path/to/directory", true)
	if err != nil {
		panic(err)
	}
	// Collect hashes for all files of at least 1024 bytes using XXHash.
	// singleThread: when true, use a single thread (false here)
	// flat: when true, don't print the progress bar or any other information (false here)
	hashes := dups.CollectHashes(files, false, 1024, dups.XXHash, false)
	duplicates, filesCount, duplicatesCount := dups.GetDuplicates(hashes)
	fmt.Println("total of files with duplicates:", filesCount)
	fmt.Println("total of duplicate files:", duplicatesCount)

	// Keep the first file of every duplicate set and delete the rest.
	freedSize, deletedCount, err := dups.RemoveDuplicates(duplicates)
	if err != nil {
		panic(err)
	}
	fmt.Println("removed", deletedCount, "files")
	fmt.Println("freed a total of", freedSize, "bytes")
}
Notes:
  • Use the single-core option (-s) if the files are big (depending on your disk type).
  • Use the XXHash algorithm for fast scanning, and MD5/SHA256 for the safest scanning or if the number of files is huge.
Build from source:

Run go build -tags multicore if you are building with Go < 1.5, or edit the runtime.GOMAXPROCS() call manually to support multi-core.
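
For context, since Go 1.5 the runtime already defaults GOMAXPROCS to the number of available CPUs, so the manual step only matters on older toolchains. A minimal sketch of setting it by hand follows; useAllCores is a hypothetical helper written for this illustration and is not part of dups.

```go
package main

import (
	"fmt"
	"runtime"
)

// useAllCores is a hypothetical helper (not part of dups). It sets
// GOMAXPROCS to the number of logical CPUs and returns the resulting
// setting together with the CPU count.
func useAllCores() (procs, cpus int) {
	cpus = runtime.NumCPU()
	runtime.GOMAXPROCS(cpus)
	// Calling GOMAXPROCS with 0 only reports the current value.
	return runtime.GOMAXPROCS(0), cpus
}

func main() {
	procs, cpus := useAllCores()
	fmt.Println("GOMAXPROCS:", procs, "CPUs:", cpus)
}
```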

License

Apache

Documentation

Constants

const (
	XXHash = "xxhash"
	MD5    = "md5"
	SHA256 = "sha256"
)

Variables

This section is empty.

Functions

func CleanPath

func CleanPath(path string) string

CleanPath replaces \ with / in a path
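
The behavior described above can be sketched with a plain string replacement. cleanPath below is a hypothetical stand-in written for illustration, not the package's source.

```go
package main

import (
	"fmt"
	"strings"
)

// cleanPath is a hypothetical stand-in for dups.CleanPath:
// it replaces every backslash in the path with a forward slash.
func cleanPath(path string) string {
	return strings.ReplaceAll(path, `\`, "/")
}

func main() {
	fmt.Println(cleanPath(`C:\Users\me\photos`))
}
```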

func CollectHashes

func CollectHashes(files []FileInfo, singleThread bool, minSize int, algorithm string, flat bool) map[string][]FileInfo

CollectHashes returns hashes for the given files. Each hash is a key, and its value is the list of FileInfo for the files that share that hash.
  • singleThread: "singleThread=true" forces the function to use one thread only.
  • minSize: the minimum file size to scan.
  • algorithm: the algorithm to calculate the hash with.
  • flat: "flat=true" tells the function not to print any data other than the paths to duplicate files.
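
To illustrate the general idea of hashing on several cores and grouping by hash, here is a rough sketch of a worker pool. It is not the package's implementation: the blob type, the collect function, the use of in-memory data instead of files, SHA-256 from the standard library, and the inclusive minSize check are all assumptions made for this example.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sync"
)

// blob is a hypothetical in-memory stand-in for a file.
type blob struct {
	Name string
	Data []byte
}

// collect hashes every blob of at least minSize bytes using the given
// number of workers and groups the blob names by hex-encoded SHA-256.
func collect(blobs []blob, workers, minSize int) map[string][]string {
	if workers < 1 {
		workers = 1
	}
	jobs := make(chan blob)
	out := make(map[string][]string)
	var mu sync.Mutex
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for b := range jobs {
				sum := sha256.Sum256(b.Data)
				key := hex.EncodeToString(sum[:])
				mu.Lock() // the map is shared between workers
				out[key] = append(out[key], b.Name)
				mu.Unlock()
			}
		}()
	}
	for _, b := range blobs {
		if len(b.Data) >= minSize {
			jobs <- b
		}
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	blobs := []blob{
		{"a.txt", []byte("hello")},
		{"b.txt", []byte("hello")},
		{"c.txt", []byte("world!")},
	}
	for hash, names := range collect(blobs, 4, 2) {
		fmt.Println(hash[:8], len(names))
	}
}
```

Passing workers=1 gives the single-threaded behavior; the order of names inside each group is not deterministic when several workers run.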

func GetAlgorithm

func GetAlgorithm(al string) string

GetAlgorithm matches the given string to one of the supported algorithms. It returns md5 if no match is found.
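
A fallback of this kind can be sketched as a switch with a default case. pickAlgorithm and the lowercase constants below are hypothetical names for this example; the exact-match comparison is an assumption, since the matching rules of the real function are not documented here.

```go
package main

import "fmt"

// Local copies of the algorithm names, mirroring the package constants.
const (
	xxHashAlg = "xxhash"
	md5Alg    = "md5"
	sha256Alg = "sha256"
)

// pickAlgorithm is a hypothetical stand-in for dups.GetAlgorithm.
// It returns the name unchanged when it is a supported algorithm
// and falls back to md5 otherwise.
func pickAlgorithm(name string) string {
	switch name {
	case xxHashAlg, md5Alg, sha256Alg:
		return name
	default:
		return md5Alg
	}
}

func main() {
	fmt.Println(pickAlgorithm("sha256"))
	fmt.Println(pickAlgorithm("crc32"))
}
```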

func GetDuplicates

func GetDuplicates(hashes map[string][]FileInfo) ([][]FileInfo, int, int)

GetDuplicates scans the given map of hashes and finds the ones with duplicates. It returns a slice of slices, each holding the FileInfo entries for one set of duplicate files. It also returns the total number of duplicate files and the total number of files that have duplicates.
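
The filtering step can be sketched as follows. duplicateSets is a hypothetical function that works on plain paths rather than FileInfo, and the two counts it returns (number of sets, number of redundant copies) are one possible reading of the description above, not a statement about what the package returns.

```go
package main

import "fmt"

// duplicateSets is a hypothetical sketch, not the package's code.
// It keeps only the groups that contain more than one path and returns:
//   - the groups themselves,
//   - the number of groups (files that have duplicates),
//   - the number of redundant copies (every path after the first in a group).
func duplicateSets(hashes map[string][]string) ([][]string, int, int) {
	var sets [][]string
	groups, extras := 0, 0
	for _, paths := range hashes {
		if len(paths) > 1 {
			sets = append(sets, paths)
			groups++
			extras += len(paths) - 1
		}
	}
	return sets, groups, extras
}

func main() {
	hashes := map[string][]string{
		"h1": {"a.txt", "copy/a.txt", "old/a.txt"},
		"h2": {"unique.txt"},
	}
	sets, groups, extras := duplicateSets(hashes)
	fmt.Println(len(sets), groups, extras)
}
```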

func GetFileHash

func GetFileHash(path, algorithm string) (string, error)

GetFileHash returns the hash of the given file using the provided algorithm. Default: md5.
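
A rough sketch of hashing a file with the standard library is shown below. fileHash and writeTemp are hypothetical helpers; the hex encoding of the result is an assumption, and XXHash is left out because it is not in the standard library.

```go
package main

import (
	"crypto/md5"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash"
	"io"
	"os"
)

// fileHash is a hypothetical helper, not the package's implementation.
// It streams the file through the chosen hash so the whole file is never
// held in memory. Anything other than "sha256" falls back to MD5.
func fileHash(path, algorithm string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()

	var h hash.Hash
	if algorithm == "sha256" {
		h = sha256.New()
	} else {
		h = md5.New()
	}
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// writeTemp creates a temporary file with the given content and
// returns its path. It exists only to make the example runnable.
func writeTemp(content string) (string, error) {
	f, err := os.CreateTemp("", "dups-sketch-*")
	if err != nil {
		return "", err
	}
	defer f.Close()
	if _, err := f.WriteString(content); err != nil {
		return "", err
	}
	return f.Name(), nil
}

func main() {
	path, err := writeTemp("hello")
	if err != nil {
		panic(err)
	}
	defer os.Remove(path)

	sum, err := fileHash(path, "md5")
	if err != nil {
		panic(err)
	}
	fmt.Println(sum)
}
```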

func RemoveDuplicates

func RemoveDuplicates(fileSets [][]FileInfo) (int, int, error)

RemoveDuplicates removes duplicates. It keeps the first file in each duplicate set and removes the other files in the set. It returns the sum of the deleted file sizes and the total number of deleted files.
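
The keep-the-first rule can be sketched like this. removeExtras, makeCopies, and exists are hypothetical helpers that operate on plain paths; this is an illustration under those assumptions rather than the package's code.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// removeExtras is a hypothetical sketch. For every set it keeps the first
// path and deletes the rest, returning the freed bytes and deleted count.
func removeExtras(sets [][]string) (int64, int, error) {
	var freed int64
	deleted := 0
	for _, set := range sets {
		if len(set) < 2 {
			continue // nothing to delete
		}
		for _, path := range set[1:] {
			info, err := os.Stat(path)
			if err != nil {
				return freed, deleted, err
			}
			if err := os.Remove(path); err != nil {
				return freed, deleted, err
			}
			freed += info.Size()
			deleted++
		}
	}
	return freed, deleted, nil
}

// makeCopies writes n identical files into a fresh temporary directory.
func makeCopies(n int, content string) ([]string, error) {
	dir, err := os.MkdirTemp("", "dups-sketch-*")
	if err != nil {
		return nil, err
	}
	paths := make([]string, 0, n)
	for i := 0; i < n; i++ {
		p := filepath.Join(dir, fmt.Sprintf("copy-%d.txt", i))
		if err := os.WriteFile(p, []byte(content), 0o644); err != nil {
			return nil, err
		}
		paths = append(paths, p)
	}
	return paths, nil
}

// exists reports whether the path can be stat'ed.
func exists(path string) bool {
	_, err := os.Stat(path)
	return err == nil
}

func main() {
	paths, err := makeCopies(3, "same bytes")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(filepath.Dir(paths[0]))

	freed, deleted, err := removeExtras([][]string{paths})
	if err != nil {
		panic(err)
	}
	fmt.Println("deleted", deleted, "files, freed", freed, "bytes")
}
```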

Types

type FileInfo

type FileInfo struct {
	Path string
	Info os.FileInfo
}

func GetFiles

func GetFiles(root string, full bool) ([]FileInfo, error)

GetFiles finds and returns all the files in the given path. It also returns files in sub-directories if "full=true".
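
One way to get this behavior with the standard library is filepath.WalkDir, skipping sub-directories when a full search is not requested. listFiles and makeTree are hypothetical helpers for this sketch, and returning plain paths instead of FileInfo is a simplification.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// listFiles is a hypothetical sketch, not the package's code. It returns
// the files directly under root and, when full is true, the files in all
// sub-directories as well.
func listFiles(root string, full bool) ([]string, error) {
	var files []string
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		if d.IsDir() {
			if !full && path != root {
				return filepath.SkipDir // do not descend into sub-directories
			}
			return nil
		}
		files = append(files, path)
		return nil
	})
	return files, err
}

// makeTree builds a small temporary tree: root/a.txt and root/sub/b.txt.
func makeTree() (string, error) {
	root, err := os.MkdirTemp("", "dups-sketch-*")
	if err != nil {
		return "", err
	}
	if err := os.Mkdir(filepath.Join(root, "sub"), 0o755); err != nil {
		return "", err
	}
	for _, name := range []string{"a.txt", filepath.Join("sub", "b.txt")} {
		if err := os.WriteFile(filepath.Join(root, name), []byte("data"), 0o644); err != nil {
			return "", err
		}
	}
	return root, nil
}

func main() {
	root, err := makeTree()
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(root)

	top, err := listFiles(root, false)
	if err != nil {
		panic(err)
	}
	all, err := listFiles(root, true)
	if err != nil {
		panic(err)
	}
	fmt.Println("top-level files:", len(top), "all files:", len(all))
}
```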
