filetypestats

v0.8.0
Published: Jul 22, 2023 License: EUPL-1.2 Imports: 14 Imported by: 0

README

filetypestats

About

This internal gitlab repo is publicly mirrored on github.com/Rainc1oud.

filetypestats scans directories to index all files into an sqlite database, which can then be queried using globs to get summary statistics about type and size.

Background

TODO update

Combines github.com/karrick/godirwalk with (a modified fork of) github.com/h2non/filetype to produce a dictionary with file classes ("video", "audio", ...) as keys and filecount and total size as values.

A slice of root folders to scan can be given as input (this list will be sanitized to remove overlap), and the statistics are returned as aggregated output per file class.
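The overlap removal mentioned above can be illustrated with a minimal sketch. The helper name sanitizeRoots and its exact behaviour are assumptions made for illustration, not the package's actual implementation: it cleans each path and drops any directory that is already covered by another listed directory.

```go
package main

import (
	"fmt"
	"path/filepath"
	"sort"
	"strings"
)

// sanitizeRoots is a hypothetical helper (not part of filetypestats).
// It cleans every path and drops duplicates and dirs nested in another listed dir.
func sanitizeRoots(dirs []string) []string {
	cleaned := make([]string, 0, len(dirs))
	for _, d := range dirs {
		cleaned = append(cleaned, filepath.Clean(d))
	}
	// After sorting, a parent dir always comes before its children.
	sort.Strings(cleaned)

	sep := string(filepath.Separator)
	var roots []string
	for _, d := range cleaned {
		covered := false
		for _, r := range roots {
			prefix := r
			if !strings.HasSuffix(prefix, sep) {
				prefix += sep // compare on whole path elements only
			}
			if d == r || strings.HasPrefix(d, prefix) {
				covered = true
				break
			}
		}
		if !covered {
			roots = append(roots, d)
		}
	}
	return roots
}

func main() {
	in := []string{"/data/media/video", "/data/media/", "/home", "/data/media"}
	fmt.Println(sanitizeRoots(in))
}
```

Comparing against the parent path plus a separator keeps siblings such as /data/media-old from being treated as children of /data/media.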

For performance reasons, scanning has been modified to store its results in an sqlite database; a normal query is answered from the DB instead of performing a new scan. To keep the DB up to date without frequent rescans, recursive inotify is used.

There are several Go libs that wrap inotify.

For cross-platform support, we first try notify; if it turns out to be too resource-hungry, we may have to switch, since for now the main use case is NAS.

Changelog (anecdotal)

v0.4.0

Refactor to get rid of redundant keeping of dir status, which was bad for robustness and maintainability.

The single source of truth regarding watched dirs is TDirMonitors, a map[string]*TDirMonitor, where TDirMonitor is a simple composition of NotifyWatcher with some per-watcher state that NotifyWatcher itself does not support.

TDirMonitors is responsible for managing notify watcher processes, but the event notification handler is provided by TreeStatsWatcher, as well as any other functions that need to access the DB or coordinate TDirMonitors.
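As a rough illustration of the "map as single source of truth" idea, the sketch below keeps per-dir scan state in one map. The types dirMonitor and dirMonitors are simplified, hypothetical stand-ins: the real TDirMonitor also embeds a NotifyWatcher, which is left out here, and the running flag is an assumption about how scan state might be tracked.

```go
package main

import (
	"fmt"
	"time"
)

// dirMonitor is a hypothetical stand-in for TDirMonitor (watcher omitted).
type dirMonitor struct {
	scanStarted  time.Time
	scanFinished time.Time
	running      bool
}

// dirMonitors is a hypothetical stand-in for TDirMonitors: one entry per watched dir.
type dirMonitors map[string]*dirMonitor

// Add registers dir if it is not yet known.
func (dm dirMonitors) Add(dir string) {
	if _, ok := dm[dir]; !ok {
		dm[dir] = &dirMonitor{}
	}
}

// ScanStart records the start time for a registered dir.
func (dm dirMonitors) ScanStart(dir string) {
	if m, ok := dm[dir]; ok {
		m.scanStarted = time.Now()
		m.running = true
	}
}

// ScanFinish records the finish time for a registered dir.
func (dm dirMonitors) ScanFinish(dir string) {
	if m, ok := dm[dir]; ok {
		m.scanFinished = time.Now()
		m.running = false
	}
}

// ScanRunning reports whether a scan on dir is in progress; unknown dirs report false.
func (dm dirMonitors) ScanRunning(dir string) bool {
	m, ok := dm[dir]
	return ok && m.running
}

func main() {
	dm := dirMonitors{}
	dm.Add("/srv/media")
	dm.ScanStart("/srv/media")
	fmt.Println("running:", dm.ScanRunning("/srv/media"))
	dm.ScanFinish("/srv/media")
	fmt.Println("running:", dm.ScanRunning("/srv/media"))
}
```

Because all state lives in the map entries, removing a dir from the map removes every trace of it, which is the robustness benefit the refactor describes.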

v0.3.4

Return struct changed from a map of dirs (which was not actually used as such) to FileTypeStats, which looks like this:

type FTypeStat struct {
	Path      string
	FType     string
	NumBytes  uint64
	FileCount uint
}

// FileTypeStats is a map from type (same as FTypeStat.FType) to FTypeStat
type FileTypeStats map[string]*FTypeStat

The FType field contains one of the <filetype> values from h2non/filetype/kind.go (in lowercase) or one of the special keys 'dir' and 'total'; all of these are keys of FileTypeStats.

For 'dir' NumBytes is always 0.
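The following sketch shows how such a map could be filled. The add helper is hypothetical, and two points are assumptions rather than documented behaviour: that 'total' counts every entry including dirs, and that dir sizes are ignored everywhere. Path is left unset here.

```go
package main

import "fmt"

// FTypeStat and FileTypeStats mirror the definitions shown above.
type FTypeStat struct {
	Path      string
	FType     string
	NumBytes  uint64
	FileCount uint
}

type FileTypeStats map[string]*FTypeStat

// add is a hypothetical helper: it books one entry of the given type under
// its own key and under the special key "total".
func (fts FileTypeStats) add(ftype string, size uint64) {
	for _, key := range []string{ftype, "total"} {
		st, ok := fts[key]
		if !ok {
			st = &FTypeStat{FType: key}
			fts[key] = st
		}
		st.FileCount++
		if ftype != "dir" { // dirs contribute no bytes, so 'dir' stays at 0
			st.NumBytes += size
		}
	}
}

func main() {
	fts := FileTypeStats{}
	fts.add("video", 100)
	fts.add("video", 50)
	fts.add("audio", 10)
	fts.add("dir", 4096)
	for _, key := range []string{"video", "audio", "dir", "total"} {
		fmt.Printf("%-5s files=%d bytes=%d\n", key, fts[key].FileCount, fts[key].NumBytes)
	}
}
```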

Path has the following values:

  • absolute path of a file for keys (kind or "category") where FileCount == 1 and the query contains only one directory
  • <path>/* if query contains only one directory
  • otherwise *
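
One possible reading of the three Path rules, written out as code. The function statPath and its parameters are hypothetical, and the precedence (single-file case checked before the <path>/* case) is an assumption based on the order of the list above.

```go
package main

import (
	"fmt"
	"strings"
)

// statPath is a hypothetical helper that applies the Path rules:
// onlyFile is the absolute path of the single matching file, if there is exactly one.
func statPath(queryDirs []string, fileCount uint, onlyFile string) string {
	if len(queryDirs) != 1 {
		return "*" // several dirs queried: no common path to report
	}
	if fileCount == 1 {
		return onlyFile // exactly one file in a single-dir query
	}
	return strings.TrimSuffix(queryDirs[0], "/") + "/*"
}

func main() {
	fmt.Println(statPath([]string{"/srv/media"}, 1, "/srv/media/a.mkv"))
	fmt.Println(statPath([]string{"/srv/media"}, 3, ""))
	fmt.Println(statPath([]string{"/srv/media", "/home"}, 3, ""))
}
```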

Breaking Changes v0.3.0

Version 0.3.0 probably has breaking changes

Documentation

Index

Constants

const Version = "v0.8.0"

Version exposes the current package version.

Variables

This section is empty.

Functions

func WalkFileTypeStatsDB

func WalkFileTypeStatsDB(scanDirs []string, dbfile string) (types.FileTypeStats, error)

Types

type TDirMonitor

type TDirMonitor struct {
	notifywatch.NotifyWatcher // embed NotifyWatcher, because TDirMonitor is just a Watcher with added state and access/info methods
	// contains filtered or unexported fields
}

type TDirMonitors

type TDirMonitors map[string]*TDirMonitor

func NewDirMonitors

func NewDirMonitors() *TDirMonitors

NewDirMonitors constructor

func (*TDirMonitors) AddDir

func (dm *TDirMonitors) AddDir(dir string, recursive bool, handler notifywatch.NotifyHandlerFun, events ...notify.Event) *TDirMonitor

AddDir adds dir to the DirMonitors collection with a new DirMonitor instance, while removing all overlapping dirs

func (*TDirMonitors) Contains

func (dm *TDirMonitors) Contains(dir string) bool

Contains returns whether dir is contained in the registered dirs

func (*TDirMonitors) Dirs

func (dm *TDirMonitors) Dirs() []string

Dirs returns a slice of all registered dirs

func (*TDirMonitors) IsDirty

func (dm *TDirMonitors) IsDirty(dir string) bool

IsDirty reports the dirty status of dir, i.e. whether the DB for dir is up to date or is still being updated

func (*TDirMonitors) RemoveDir

func (dm *TDirMonitors) RemoveDir(dir string) error

RemoveDir removes dir from the container

func (*TDirMonitors) RemoveDirs

func (dm *TDirMonitors) RemoveDirs(dirs ...string) error

RemoveDirs removes dirs from the container

func (*TDirMonitors) ScanFinish

func (dm *TDirMonitors) ScanFinish(dir string)

ScanFinish updates finished time for dir

func (*TDirMonitors) ScanFinished

func (dm *TDirMonitors) ScanFinished(dir string) time.Time

ScanFinished returns the time the last scan was finished

func (*TDirMonitors) ScanRunning

func (dm *TDirMonitors) ScanRunning(dir string) bool

ScanRunning reports whether a scan on dir is currently running

func (*TDirMonitors) ScanStart

func (dm *TDirMonitors) ScanStart(dir string)

ScanStart updates start time for dir

func (*TDirMonitors) ScanStarted

func (dm *TDirMonitors) ScanStarted(dir string) time.Time

ScanStarted returns the time the last scan was started

func (*TDirMonitors) Status

func (dm *TDirMonitors) Status() *TDirMonitorsStatus

type TDirMonitorsStatus

type TDirMonitorsStatus struct {
	Dirty            bool
	ScanStartedLast  time.Time
	ScanFinishedLast time.Time
	ScanLongestLast  time.Duration // the longest duration of all last dir scans
}
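
How per-dir state might be folded into such a status struct can be sketched as follows. Everything here is an assumption for illustration: the dirState type, the aggregate function, and the reading that Dirty is true when any dir is dirty, that the "Last" times are the most recent ones, and that ScanLongestLast is the longest finished-minus-started span.

```go
package main

import (
	"fmt"
	"time"
)

// dirState is a hypothetical per-dir record used only in this sketch.
type dirState struct {
	started, finished time.Time
	dirty             bool
}

// monitorsStatus mirrors the fields of TDirMonitorsStatus.
type monitorsStatus struct {
	Dirty            bool
	ScanStartedLast  time.Time
	ScanFinishedLast time.Time
	ScanLongestLast  time.Duration
}

// aggregate folds all per-dir states into one summary.
func aggregate(states map[string]dirState) monitorsStatus {
	var st monitorsStatus
	for _, s := range states {
		st.Dirty = st.Dirty || s.dirty
		if s.started.After(st.ScanStartedLast) {
			st.ScanStartedLast = s.started
		}
		if s.finished.After(st.ScanFinishedLast) {
			st.ScanFinishedLast = s.finished
		}
		// Unfinished scans yield a negative span and are skipped.
		if d := s.finished.Sub(s.started); d > st.ScanLongestLast {
			st.ScanLongestLast = d
		}
	}
	return st
}

// exampleStatus builds a small fixed data set for demonstration.
func exampleStatus() monitorsStatus {
	base := time.Date(2023, 7, 22, 10, 0, 0, 0, time.UTC)
	return aggregate(map[string]dirState{
		"/srv/media":  {started: base, finished: base.Add(5 * time.Minute)},
		"/srv/backup": {started: base.Add(time.Hour), finished: base.Add(time.Hour + 2*time.Minute), dirty: true},
	})
}

func main() {
	st := exampleStatus()
	fmt.Println(st.Dirty, st.ScanStartedLast, st.ScanFinishedLast, st.ScanLongestLast)
}
```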

type TreeStatsWatcher

type TreeStatsWatcher struct {
	TDirMonitors // embed this map, because a TreeStatsWatcher is just TDirMonitors with added state
	// contains filtered or unexported fields
}

func NewTreeStatsWatcher

func NewTreeStatsWatcher(dirs []string, dbconn *ftsdb.FileTypeStatsDB) (*TreeStatsWatcher, error)

NewTreeStatsWatcher is the top level constructor featuring:

  • a recursive watcher and scanner for all files in the given param dirs
  • a sqlite DB session (param dbconn)

An instance is always returned, even if an error occurred. dirs will be trimmed of trailing suffixes and evaluated recursively. If dirs is empty, you can add watches later with AddWatch() or AddDir().

func (*TreeStatsWatcher) AddWatch

func (tsw *TreeStatsWatcher) AddWatch(dirs ...string) error

AddWatch adds a (default) watch for the given dirs. Default means: recursive and for events notify.InCreate, notify.InModify, notify.InMovedFrom, notify.InMovedTo, notify.Remove. For a customised watch, use AddDir().

func (*TreeStatsWatcher) ScanAllSync

func (tsw *TreeStatsWatcher) ScanAllSync() error

ScanAllSync does a full scan over all registered dirs synchronously and updates the database. This can take a long time (minutes to hours) to complete.

func (*TreeStatsWatcher) ScanDir

func (tsw *TreeStatsWatcher) ScanDir(dir string) error

ScanDir scans the given dir recursively and updates the database. This can take a long time (minutes to hours) to complete.

func (*TreeStatsWatcher) ScanDirAsync

func (tsw *TreeStatsWatcher) ScanDirAsync(dir string) error

ScanDirAsync scans dir asynchronously. TODO: add channel to make interruption possible?
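
One way an interruptible asynchronous scan could look is sketched below, using a context for cancellation and only the standard library. This is not the package's implementation: scanResult, scanDirAsync (with this signature) and demo are hypothetical names, and the sketch only counts files instead of updating a database.

```go
package main

import (
	"context"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// scanResult is a hypothetical result type for this sketch.
type scanResult struct {
	Files int
	Err   error
}

// scanDirAsync walks dir in a goroutine and delivers exactly one result.
// Cancelling ctx aborts the walk, which then reports ctx.Err().
func scanDirAsync(ctx context.Context, dir string) <-chan scanResult {
	out := make(chan scanResult, 1) // buffered so the goroutine never blocks on send
	go func() {
		var res scanResult
		res.Err = filepath.WalkDir(dir, func(path string, d fs.DirEntry, err error) error {
			if err != nil {
				return err
			}
			if cerr := ctx.Err(); cerr != nil {
				return cerr // stop at the next entry after cancellation
			}
			if !d.IsDir() {
				res.Files++
			}
			return nil
		})
		out <- res
	}()
	return out
}

// demo scans a temp dir holding two files, optionally cancelling beforehand.
func demo(cancelFirst bool) scanResult {
	dir, err := os.MkdirTemp("", "scan-sketch")
	if err != nil {
		return scanResult{Err: err}
	}
	defer os.RemoveAll(dir)
	for _, name := range []string{"a.txt", "b.txt"} {
		if err := os.WriteFile(filepath.Join(dir, name), []byte("x"), 0o644); err != nil {
			return scanResult{Err: err}
		}
	}
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	if cancelFirst {
		cancel()
	}
	return <-scanDirAsync(ctx, dir)
}

func main() {
	fmt.Println(demo(false))
	fmt.Println(demo(true))
}
```

A context is used here instead of a bare channel because it composes with timeouts and with callers that already carry one; a plain stop channel checked in the walk callback would work similarly.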

func (*TreeStatsWatcher) ScanDurationLast

func (tsw *TreeStatsWatcher) ScanDurationLast() time.Duration

func (*TreeStatsWatcher) StartWatcher

func (tsw *TreeStatsWatcher) StartWatcher(dir string) error

StartWatcher starts the dir watcher in the background (or returns an error if not available)

func (*TreeStatsWatcher) StopWatchAll

func (tsw *TreeStatsWatcher) StopWatchAll() error

StopWatchAll stops all registered dirs with the notify watcher

func (*TreeStatsWatcher) StopWatcher

func (tsw *TreeStatsWatcher) StopWatcher(dir string) error

StopWatcher stops and removes the watcher for dir. (The DirMonitor is removed entirely, because we have no way to restart a stopped watcher, so its existence becomes meaningless after stopping.)

func (*TreeStatsWatcher) WatchAll

func (tsw *TreeStatsWatcher) WatchAll() error

WatchAll starts all registered dirs with the notify watcher (ignoring already started ones)

Directories

Path Synopsis
internal
