This package provides a utility that scans a directory tree and detects duplicate files.
A minimal logger interface with three severities.
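A three-severity interface like the one described might look as follows; this is a sketch, and the method names (`Info`, `Warn`, `Error`) and signatures are assumptions, shown with a standard-library-backed implementation:

```go
package main

import "log"

// Logger sketches the minimal three-severity interface described
// above; the exact method names and signatures are assumptions.
type Logger interface {
	Info(format string, args ...interface{})
	Warn(format string, args ...interface{})
	Error(format string, args ...interface{})
}

// stdLogger adapts the standard library logger to the interface.
type stdLogger struct{}

func (stdLogger) Info(f string, a ...interface{})  { log.Printf("INFO  "+f, a...) }
func (stdLogger) Warn(f string, a ...interface{})  { log.Printf("WARN  "+f, a...) }
func (stdLogger) Error(f string, a ...interface{}) { log.Printf("ERROR "+f, a...) }

func main() {
	var l Logger = stdLogger{}
	l.Info("scanned %d files", 42)
}
```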
An abstraction of the deduplication logic, exposed through a minimal interface.
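A minimal interface over the deduplication steps could be sketched as below. `Filtered` and `Delete` are named later in this documentation; the other method names, all signatures, and the no-op stand-in are assumptions:

```go
package main

// Deduper sketches a minimal interface over the deduplication logic.
// Filtered and Delete appear in the documentation; the remaining
// method names and all signatures are assumptions.
type Deduper interface {
	Init(path string, excludes []string) error
	Walk() error
	Compare() error
	Filtered() []string
	Delete() error
}

// noopDeduper is a trivial stand-in showing the interface is
// implementable; real scanning logic would live behind these methods.
type noopDeduper struct{ dupes []string }

func (n *noopDeduper) Init(path string, excludes []string) error { return nil }
func (n *noopDeduper) Walk() error                               { return nil }
func (n *noopDeduper) Compare() error                            { return nil }
func (n *noopDeduper) Filtered() []string                        { return n.dupes }
func (n *noopDeduper) Delete() error                             { return nil }

func main() {
	var d Deduper = &noopDeduper{}
	_ = d.Init("/tmp/data", nil)
}
```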
Iterates the filtered files, deleting each one, then attempts to remove any parent directories left empty, recursively.
Returns the duplicates marked for deletion.
Initializes the metrics system, setting the start time and clearing any previous data.
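The metrics reset might look like the sketch below; the struct shape and field names are assumptions, only the reset-on-init behavior comes from the description:

```go
package main

import (
	"fmt"
	"time"
)

// metrics is a hypothetical sketch of the metrics state: Init records
// the start time and resets the counters (field names are assumptions).
type metrics struct {
	start         time.Time
	filesScanned  int
	duplicateSets int
}

// Init clears all counters and stamps a fresh start time, so a
// repeated run never carries data over from the previous one.
func (m *metrics) Init() {
	*m = metrics{start: time.Now()}
}

// Elapsed reports how long the current run has been going.
func (m *metrics) Elapsed() time.Duration { return time.Since(m.start) }

func main() {
	var m metrics
	m.filesScanned = 10 // stale data from a previous run
	m.Init()
	fmt.Println(m.filesScanned, m.Elapsed() >= 0)
}
```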
Ensures the input path is both absolute and clean, parses the supplied excludes, and initializes the private maps and slices, clearing any former data.
Uses filepath.Walk with a path/filepath.WalkFunc to iterate all files under the input path, discarding zero-size files, symbolic links, and any files matching the case-sensitive exclude list. The remaining files are grouped by size.
Any errors encountered while walking the file system are logged and then discarded, so the walk can continue.
Iterates each set of files grouped by size; within a set, files are compared two at a time, first with os.SameFile to discard hard links, then with a buffered byte-by-byte comparison.
The buffered comparison terminates at the first differing byte, which makes it faster than hashing every file in full. The code also handles multiple distinct duplicate groups that happen to share the same size.
Files with matching data are put into an unnamed group and appended to the slice of duplicates.
Finally, it sorts each group of duplicates using a weighted score based on path depth and then on the recurrence of the parent path. The file with the lowest score in each group is kept; the rest are appended to a flat, one-dimensional slice, which can be requested via Filtered and is consumed by Delete.
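One way to combine the two criteria is sketched below; the function names, the depth weight, and the sign of the parent-recurrence term are all assumptions, since the source only states that depth is weighted first and parent-path recurrence second:

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// score is a hypothetical weighting: deeper paths score higher, and a
// parent directory that recurs more often across all duplicates
// lowers the score of its files (the exact weights are assumptions).
func score(path string, parentCount map[string]int) int {
	depth := strings.Count(filepath.ToSlash(path), "/")
	return depth*10 - parentCount[filepath.Dir(path)]
}

// keepAndDrop sorts one duplicate group by score and returns the
// lowest-scoring file (kept) plus the rest (marked for deletion).
func keepAndDrop(group []string, parentCount map[string]int) (string, []string) {
	sort.Slice(group, func(i, j int) bool {
		return score(group[i], parentCount) < score(group[j], parentCount)
	})
	return group[0], group[1:]
}

func main() {
	counts := map[string]int{"/a": 2, "/a/b": 1}
	keep, drop := keepAndDrop([]string{"/a/b/x.txt", "/a/x.txt"}, counts)
	fmt.Println(keep, drop)
}
```

With this weighting, the shallower copy under the more frequently recurring parent wins, and the flattened remainder is what a Filtered accessor would hand to Delete.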