level

package module

v0.0.0-...-7d6cd58 Latest Latest Go to latest Published: Jun 26, 2017 License: Zlib Imports: 7 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/cdelorme/level

Links

Open Source Insights

README ¶

level

A cross platform FOSS library to scan for duplicate files written in go.

Title inspired by the Japanese anime Toaru Kagaku no Railgun.

Documentation

design

All processing is done synchronously because the bottleneck will always be the persistent storage.

The primary operation (LastOrder) will scan a folder for files, discard all files with no size or which contain excluded segments, group the rest by size, then iterate the groups checking all but the first to ignore hard links using os.SameFile, read the remaining files two at a time in 4K chunks to compare them byte-by-byte.

If run in test mode no further actions will take place, and it is expected that the caller will print the metrics collected and the groups of duplicates so the user may act upon them.

Otherwise, it will perform a weighted sort of each group favoring depth then frequency of directory discarding the first record with the lowest score so the rest may be deleted.

If the file system uses a larger block size than the 4K buffer used by the software it may negatively affect the performance of the software.

usage

Import the library:

import "github.com/cdelorme/level"

Please use the godocs for further instructions.

Installation process:

go get github.com/cdelorme/level/...

tests

Tests can be run via:

go test -v -cover -race

future

add intelligent buffer size to detect disk block sizes and use the lowest common denominator

Documentation ¶

Overview ¶

This package provides a utility that scans files and checks for duplicates.

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Logger ¶

type Logger interface {
	Error(string, ...interface{})
	Info(string, ...interface{})
	Debug(string, ...interface{})
}

A minimum logger interface with three severities.

type Six ¶

type Six struct {
	Input    string `json:"input,omitempty"`
	Excludes string `json:"excludes,omitempty"`
	Test     bool   `json:"test,omitempty"`
	L        Logger `json:"-"`
	S        Stats  `json:"-"`
	// contains filtered or unexported fields
}

An abstraction to deduplication logic, with a minimal interface.

func (*Six) Delete ¶

func (s *Six) Delete()

Iterate filtered files to delete each, and attempt to clear any empty parent folders recursively.

func (*Six) Filtered ¶

func (s *Six) Filtered() []string

Returns the duplicates marked for deletion.

func (*Six) LastOrder ¶

func (s *Six) LastOrder()

Initializes the metrics system, which sets the start time and clears data.

Ensures the input path is both absolute and clean, parses the supplied excludes, and initializes private maps and slices clearing any former data.

Uses path/filepath.WalkFunc to iterate all files in the input path, and discards any zero-size files, symbolic links, or files matching the list of case-sensitive excludes. It groups the remaining files by size.

Any errors encountered while walking the file system will be logged and then discarded so the program may continue.

Iterates each set of files grouped by size, and two at a time will be checked using os.SameFile to discard hard-links, and then buffered byte-by-byte comparison.

The buffered comparison offers early termination, making it a faster solution than hash checks. Additionally, the code is written to work with the possibility of multiple duplicate groups of the same size.

Files with matching data are put into an unnamed group and appended to the slice of duplicates.

Finally it sorts the groups of duplicates, using a weighted score by depth and then by recurrence of parent path. The file with the lowest score in the group will be kept, and the rest are appended to a single dimensional slice, which can be requested by Filtered and is used by Delete.

type Stats ¶

type Stats interface {
	Add(string, int) int
}

Source Files ¶

View all Source files

level.go

Directories ¶

Path	Synopsis
cmd
accelerator

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL