duphard

command module
v0.0.0-...-1c0041c
Warning: this package is not in the latest version of its module.
Published: Jun 6, 2020 License: GPL-3.0 Imports: 7 Imported by: 0

README

Usage

duphard is a small utility that detects duplicate files and replaces them with hard links. It is pronounced dup-hard. It has been tested on GNU/Linux with the ext4 filesystem.

Duphard is a bit naive: it expects all the files and directories you give it to reside on the same filesystem, because hard links only work within a single filesystem. If they do not, and you run duphard in non-dry-run mode, it will delete one copy of the first duplicate file it processes and then exit with an error code.
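Since duphard does not check this itself, a pre-flight check can help. The Go sketch below is Linux/Unix only, and the function name `sameFilesystem` is hypothetical, not part of duphard. It compares the device IDs that `os.Stat` exposes through `syscall.Stat_t`; two paths with the same device ID are on the same filesystem, where `os.Link` can succeed.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// sameFilesystem reports whether paths a and b live on the same device,
// a precondition for hard-linking between them.
// Unix-only sketch: it relies on *syscall.Stat_t from os.FileInfo.Sys().
func sameFilesystem(a, b string) (bool, error) {
	fa, err := os.Stat(a)
	if err != nil {
		return false, err
	}
	fb, err := os.Stat(b)
	if err != nil {
		return false, err
	}
	sa, okA := fa.Sys().(*syscall.Stat_t)
	sb, okB := fb.Sys().(*syscall.Stat_t)
	if !okA || !okB {
		return false, fmt.Errorf("device information not available on this platform")
	}
	return sa.Dev == sb.Dev, nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Fprintln(os.Stderr, "usage: samefs <PATH> <PATH>")
		os.Exit(2)
	}
	same, err := sameFilesystem(os.Args[1], os.Args[2])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(same)
}
```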

Running it is easy. By default it runs in dry-run mode, which makes no changes:

duphard <DIR> [<FILE> <DIR>...]

In this mode it searches all the given paths recursively for regular files and reports how many duplicates it found, which files they are, and an estimate of the space you could save by converting them to hard links.

If you are satisfied with the results, you can run it in non-dry-run mode, which deletes the duplicate files and replaces them with hard links:

duphard -d=0 <DIR> [<FILE> <DIR>...]
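The `-d=0` form suggests a boolean flag parsed with Go's standard `flag` package, whose boolean flags accept `0`, `false` and similar values after `=`. This is an assumption, not something the README states. The sketch below shows how such a flag could default to dry-run mode; `parseArgs` is a hypothetical name.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseArgs is a hypothetical model of duphard's command line: a boolean -d
// flag (dry run) that defaults to true, followed by one or more paths.
func parseArgs(args []string) (dryRun bool, paths []string, err error) {
	set := flag.NewFlagSet("duphard", flag.ContinueOnError)
	d := set.Bool("d", true, "dry run: only report duplicates, change nothing")
	if err = set.Parse(args); err != nil {
		return false, nil, err
	}
	if set.NArg() == 0 {
		return false, nil, fmt.Errorf("usage: duphard [-d=0] <DIR> [<FILE> <DIR>...]")
	}
	return *d, set.Args(), nil
}

func main() {
	dryRun, paths, err := parseArgs(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	// "-d=0" goes through strconv.ParseBool, so it yields dryRun == false.
	fmt.Printf("dry run: %v, paths: %v\n", dryRun, paths)
}
```

With a boolean flag of this kind, `-d 0` (with a space) would not disable dry-run mode: the `0` would be taken as a path, so the `=` is required.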

Please remember that hard links are still links: they point to the same inode (data) in the filesystem, so changes to one file are reflected in every other file sharing that data. Use hard links only on files that are, for all practical purposes, immutable (e.g. your media collection) or that you genuinely expect to have identical content (e.g. a banner shared across different website directories).

Behind the scenes

Duphard starts by grouping your files by size: files of the same size go into the same group. Only a group with more than one file can contain duplicates, so those groups are the duplicate candidates.

Next, because it is a quick and easy check, duphard looks for files in each group that are already hard links to one another and removes them from the list.

Then, for each group that still has more than one member, it calculates an MD5 checksum of every file and builds a new map in which files are grouped by checksum.

Files that share a checksum are duplicates, and this list is what gets reported to you. In non-dry-run mode, each duplicate file is first deleted and then replaced with a hard link. If any error occurs while deleting or hard linking, duphard stops immediately to prevent further errors.

Documentation

Overview

Command godedup finds duplicate files and turns them into hard links where possible.

*Be careful*: godedup is dumb. It won't check whether the files reside on the same filesystem; you are supposed to run it only on folders and files within a single filesystem.

Usage, test run (only report what is found):

godedup <DIR> [<FILE> <DIR>...]

Usage, full run, replace duplicates with hard links:

godedup -d=0 <DIR> [<FILE> <DIR>...]
