level

Version: v0.0.0-...-7d6cd58

Note: this package is not in the latest version of its module.

Published: Jun 26, 2017 | License: Zlib | Imports: 7 | Imported by: 0

README

level

A cross-platform FOSS library, written in Go, that scans for duplicate files.

Title inspired by the Japanese anime Toaru Kagaku no Railgun.

Documentation

design

All processing is done synchronously, because persistent storage I/O is the expected bottleneck.

The primary operation, LastOrder, scans a folder for files, discards any file that is empty or whose path contains an excluded segment, and groups the rest by size. It then iterates the groups, uses os.SameFile on all but the first file of each group to ignore hard links, and reads the remaining files two at a time in 4K chunks to compare them byte by byte.

In test mode no further action is taken; the caller is expected to print the collected metrics and the groups of duplicates so the user can act on them.

Otherwise, it performs a weighted sort of each group, favoring depth and then directory frequency, and drops the first record (the one with the lowest score) from the group so that only the rest are deleted.

If the file system uses a block size larger than the 4K buffer, performance may suffer.

usage

Import the library:

import "github.com/cdelorme/level"

See the godoc documentation for further instructions.

Installation process:

go get github.com/cdelorme/level/...

tests

Tests can be run via:

go test -v -cover -race

future

  • add intelligent buffer sizing that detects disk block sizes and uses the lowest common denominator
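Block-size detection is not implemented in this package. A rough starting point is the preferred I/O block size that stat(2) reports. The sketch below uses a hypothetical helper, blockSize, that reads it through syscall.Stat_t. That type exists only on Unix-like systems, so this sketch will not compile on Windows.

```go
package main

import (
	"fmt"
	"os"
	"syscall"
)

// blockSize returns the preferred I/O block size that stat(2) reports for
// path. Hypothetical helper: syscall.Stat_t only exists on Unix-like
// systems, so this is not portable to Windows.
func blockSize(path string) (int64, error) {
	info, err := os.Stat(path)
	if err != nil {
		return 0, err
	}
	st, ok := info.Sys().(*syscall.Stat_t)
	if !ok {
		return 0, fmt.Errorf("no stat data available for %s", path)
	}
	// Blksize is int32 on some platforms and int64 on others.
	return int64(st.Blksize), nil
}
```

A buffer could then be sized from this value instead of a fixed 4K. The README does not specify how to reconcile differing values across disks, so this sketch leaves that open.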

Documentation

Overview

This package provides a utility that scans files and checks for duplicates.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Logger

type Logger interface {
	Error(string, ...interface{})
	Info(string, ...interface{})
	Debug(string, ...interface{})
}

A minimal logger interface with three severity levels.
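Any type with these three methods can be assigned to Six.L. The following is a sketch of one such adapter, lineLogger, which is hypothetical and not part of the package. It assumes the string argument is a fmt-style format string. It prefixes each line with its severity and drops Debug output unless enabled.

```go
package main

import (
	"fmt"
	"log"
)

// Logger mirrors the interface documented above.
type Logger interface {
	Error(string, ...interface{})
	Info(string, ...interface{})
	Debug(string, ...interface{})
}

// lineLogger is a hypothetical Logger. It assumes the string argument is a
// fmt-style format, prefixes the severity, and passes the line to emit.
type lineLogger struct {
	emit  func(string)
	debug bool // Debug messages are dropped unless this is true
}

func (l lineLogger) Error(format string, args ...interface{}) {
	l.emit("ERROR " + fmt.Sprintf(format, args...))
}

func (l lineLogger) Info(format string, args ...interface{}) {
	l.emit("INFO " + fmt.Sprintf(format, args...))
}

func (l lineLogger) Debug(format string, args ...interface{}) {
	if l.debug {
		l.emit("DEBUG " + fmt.Sprintf(format, args...))
	}
}

// newStdLogger sends every line to the standard library's default logger.
func newStdLogger(debug bool) Logger {
	return lineLogger{emit: func(s string) { log.Println(s) }, debug: debug}
}
```

Taking an emit function, rather than writing to the standard logger directly, keeps the adapter easy to redirect, for example into a slice when testing.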

type Six

type Six struct {
	Input    string `json:"input,omitempty"`
	Excludes string `json:"excludes,omitempty"`
	Test     bool   `json:"test,omitempty"`
	L        Logger `json:"-"`
	S        Stats  `json:"-"`
	// contains filtered or unexported fields
}

An abstraction over the deduplication logic, with a minimal interface.

func (*Six) Delete

func (s *Six) Delete()

Iterates the filtered files, deleting each one, and attempts to remove any empty parent folders recursively.

func (*Six) Filtered

func (s *Six) Filtered() []string

Returns the duplicates marked for deletion.

func (*Six) LastOrder

func (s *Six) LastOrder()

Initializes the metrics system, which sets the start time and clears data.

Ensures the input path is both absolute and clean, parses the supplied excludes, and initializes the private maps and slices, clearing any former data.

Uses path/filepath.Walk, with a filepath.WalkFunc callback, to iterate all files in the input path, and discards any zero-size files, symbolic links, or files matching the list of case-sensitive excludes. It groups the remaining files by size.

Any errors encountered while walking the file system will be logged and then discarded so the program may continue.

Iterates each set of files grouped by size, checking files two at a time: first with os.SameFile to discard hard links, then with a buffered byte-by-byte comparison.

The buffered comparison can stop at the first differing chunk, which makes it faster than hash checks, since a hash must read each file in full before anything can be compared. Additionally, the code handles multiple distinct groups of duplicates that share the same size.

Files with matching data are put into an unnamed group and appended to the slice of duplicates.
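A single size group can hold several unrelated sets of duplicates. One way to separate them is to compare each file against the first member of every group formed so far. Content equality is transitive, so checking that representative is enough. The hypothetical partition function below takes the comparison as a parameter, for example a byte-by-byte check. It drops groups with only one member.

```go
package main

// partition splits files that share a size into groups of identical
// content, using equal to compare two files. Each file joins the first
// group whose representative (first member) it matches, or starts a new
// group. Single-member groups are not duplicates and are dropped.
func partition(files []string, equal func(a, b string) bool) [][]string {
	var groups [][]string
	for _, f := range files {
		placed := false
		for i := range groups {
			if equal(groups[i][0], f) {
				groups[i] = append(groups[i], f)
				placed = true
				break
			}
		}
		if !placed {
			groups = append(groups, []string{f})
		}
	}
	// Filter in place, keeping only real duplicate groups.
	dups := groups[:0]
	for _, g := range groups {
		if len(g) > 1 {
			dups = append(dups, g)
		}
	}
	return dups
}
```

For five files whose contents are x, y, x, y and z, this yields two groups: the two x files and the two y files.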

Finally, it sorts each group of duplicates using a score weighted first by depth and then by the recurrence of the parent path. The file with the lowest score in each group is kept, and the rest are appended to a one-dimensional slice, which can be requested via Filtered and is used by Delete.
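The exact weights are not documented. The hypothetical pickDeletions below approximates "depth, then recurrence of parent path" with a lexicographic comparison: shallower paths sort first, and ties go to the file whose parent directory appears most often among all duplicates. Whether the package favors frequent or infrequent directories is an assumption here. The first file of each group is kept, and the rest form the flat deletion list.

```go
package main

import (
	"path/filepath"
	"sort"
	"strings"
)

// depth counts path separators, so deeper paths score higher.
func depth(p string) int { return strings.Count(p, string(filepath.Separator)) }

// pickDeletions sorts each group in place (shallower first, then by how
// often the parent directory occurs across all groups, more frequent
// first), keeps the first file of each group, and returns the rest.
func pickDeletions(groups [][]string) []string {
	freq := make(map[string]int)
	for _, g := range groups {
		for _, f := range g {
			freq[filepath.Dir(f)]++
		}
	}
	var filtered []string
	for _, g := range groups {
		sort.SliceStable(g, func(i, j int) bool {
			if di, dj := depth(g[i]), depth(g[j]); di != dj {
				return di < dj
			}
			return freq[filepath.Dir(g[i])] > freq[filepath.Dir(g[j])]
		})
		filtered = append(filtered, g[1:]...)
	}
	return filtered
}
```

A lexicographic comparison is equivalent to a weighted score in which the depth weight is large enough that frequency only ever breaks ties.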

type Stats

type Stats interface {
	Add(string, int) int
}
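The documentation does not define what Add's arguments and return value mean. The hypothetical counters type below assumes Add adds the integer to a named counter and returns the updated total. The mutex is not required by the package's synchronous design, but it keeps the type safe to share.

```go
package main

import "sync"

// Stats mirrors the interface documented above.
type Stats interface {
	Add(string, int) int
}

// counters is a hypothetical Stats implementation that assumes Add adds n
// to the named counter and returns the new total.
type counters struct {
	mu sync.Mutex
	m  map[string]int
}

func (c *counters) Add(name string, n int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if c.m == nil {
		c.m = make(map[string]int) // lazily create the map so the zero value works
	}
	c.m[name] += n
	return c.m[name]
}
```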

Directories

Path Synopsis
cmd
