sys-file-indexer

command module
v0.0.0-...-9211eec
Published: Aug 12, 2016 License: BSD-3-Clause Imports: 31 Imported by: 0

A custom parallel file indexer and hasher

ABOUT

sys-file-indexer indexes the directory given as the last argument, or the current directory by default.

sys-file-indexer always outputs the result to stdout.
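
As an illustration of what a parallel indexer and hasher does, here is a minimal sketch, not the actual implementation of sys-file-indexer. It walks a directory, hashes every regular file with a fixed pool of goroutines and prints one path,digest line per file to stdout. The names result, hashFile and indexDir, the choice of SHA-1, the pool size of 4 and the two-column output are all assumptions made for this example.

```go
package main

import (
	"crypto/sha1"
	"encoding/hex"
	"fmt"
	"io"
	"io/fs"
	"os"
	"path/filepath"
	"sort"
	"sync"
)

// result pairs a file path with its hex digest (hypothetical type).
type result struct {
	Path string
	Sum  string
	Err  error
}

// hashFile streams one file through SHA-1. The hash function is an
// assumption; the README does not say which one the tool uses.
func hashFile(path string) (string, error) {
	f, err := os.Open(path)
	if err != nil {
		return "", err
	}
	defer f.Close()
	h := sha1.New()
	if _, err := io.Copy(h, f); err != nil {
		return "", err
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// indexDir walks root and hashes regular files using `workers` goroutines.
// Results are sorted by path so the output is deterministic.
func indexDir(root string, workers int) ([]result, error) {
	paths := make(chan string)
	results := make(chan result)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for p := range paths {
				sum, err := hashFile(p)
				results <- result{Path: p, Sum: sum, Err: err}
			}
		}()
	}

	var walkErr error
	go func() {
		walkErr = filepath.WalkDir(root, func(p string, d fs.DirEntry, err error) error {
			if err != nil {
				return err
			}
			if d.Type().IsRegular() {
				paths <- p
			}
			return nil
		})
		close(paths)   // no more work: lets the workers finish
		wg.Wait()      // wait until every worker has sent its last result
		close(results) // ends the collecting loop below
	}()

	var out []result
	for r := range results {
		out = append(out, r)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].Path < out[j].Path })
	return out, walkErr
}

func main() {
	root := "." // current directory by default
	if len(os.Args) > 1 {
		root = os.Args[len(os.Args)-1] // directory given as last argument
	}
	res, err := indexDir(root, 4)
	if err != nil {
		fmt.Fprintln(os.Stderr, "walk:", err)
		os.Exit(1)
	}
	for _, r := range res {
		if r.Err != nil {
			fmt.Fprintln(os.Stderr, "hash:", r.Path, r.Err)
			continue
		}
		fmt.Printf("%s,%s\n", r.Path, r.Sum)
	}
}
```

With a worker pool like this, the directory walk only produces paths, so slow reads on one file do not stall the hashing of the others.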

MODES OF OPERATION

sys-file-indexer has the following modes of operation:

  1. Normal mode: outputs a combined CSV file that holds both datasets (sys_file and sys_file_metadata) and contains no unique IDs. It must be processed further by split mode to be useful. No options are necessary.

    Normal mode can benefit from a previous run if data is supplied with the -delta option. In this case, sys-file-indexer uses the data generated by a previous run whenever the modification time of a file has not changed.

  2. Split mode: takes the CSV produced by normal mode as input and generates the CSV for either the sys_file dataset or the sys_file_metadata dataset. See options -ofile and -ometa.

  3. SQL mode: outputs readily usable SQL INSERT statements that can be piped directly to the database.

  4. SQL transform mode: reads a normal mode CSV and outputs SQL statements. Useful for combining SQL output with partitioning, in two steps.

  5. Single mode: outputs one single CSV dataset. Useful for testing only.
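
The reuse rule behind the -delta option in normal mode (keep the data from the previous run whenever a file's modification time has not changed) can be sketched as follows. This is an illustration under assumptions, not the tool's code: the entry type, its fields, the use of Unix seconds for the modification time and the reuseOrHash helper are all hypothetical.

```go
package main

import "fmt"

// entry is a hypothetical record for one indexed file.
type entry struct {
	Path    string
	ModTime int64 // modification time in Unix seconds (assumed representation)
	Sum     string
}

// reuseOrHash returns the entry from the previous run when the modification
// time is unchanged. Otherwise it calls hash to compute a fresh digest.
// The boolean result reports whether the previous entry was reused.
func reuseOrHash(prev map[string]entry, path string, modTime int64,
	hash func(string) (string, error)) (entry, bool, error) {
	if old, ok := prev[path]; ok && old.ModTime == modTime {
		return old, true, nil
	}
	sum, err := hash(path)
	if err != nil {
		return entry{}, false, err
	}
	return entry{Path: path, ModTime: modTime, Sum: sum}, false, nil
}

func main() {
	// Data that would have been loaded from the previous run's CSV.
	prev := map[string]entry{
		"a.txt": {Path: "a.txt", ModTime: 100, Sum: "old-digest"},
	}
	// Stand-in for the expensive hashing step.
	hash := func(string) (string, error) { return "new-digest", nil }

	unchanged, reused, _ := reuseOrHash(prev, "a.txt", 100, hash)
	fmt.Println(unchanged.Sum, reused) // old-digest true

	changed, reused, _ := reuseOrHash(prev, "a.txt", 200, hash)
	fmt.Println(changed.Sum, reused) // new-digest false
}
```

The point of the rule is that an unchanged file is never read again, so a delta run costs roughly one stat call per unchanged file instead of a full hash.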

EXAMPLE

Generate the normal mode CSV output:

$ sys-file-indexer . >../normal.csv

Update a previously generated normal mode CSV:

$ sys-file-indexer -delta=../normal.csv >../new-normal.csv

Split normal mode CSV to generate two datasets:

$ sys-file-indexer -ofile=normal.csv >sys_file.csv
$ sys-file-indexer -ometa=normal.csv >sys_file_metadata.csv

Generate metadata directly into the database (cannot use -delta):

$ sys-file-indexer -sql | mysql ...

Transform a normal-mode CSV into SQL:

$ sys-file-indexer -osql normal.csv | mysql ...

Delta mode with SQL output (use tee to update the normal mode file in one go):

$ sys-file-indexer -delta normal.csv | sys-file-indexer -osql - | mysql ...

PARTITIONING

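sys-file-indexer can be run on several machines at once when that increases I/O throughput. In the example below, three hosts each run with their own -w value (1 to 3) and the same -wg 3, and the per-host results are concatenated afterwards.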

host1$ sys-file-indexer -w 1 -wg 3 ... > result1.csv
host2$ sys-file-indexer -w 2 -wg 3 ... > result2.csv
host3$ sys-file-indexer -w 3 -wg 3 ... > result3.csv
host1$ cat result1.csv result2.csv result3.csv > result.csv
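
One common way to divide files between cooperating workers is to hash each path and take the result modulo the number of workers, so that every file belongs to exactly one worker and no coordination is needed. The sketch below shows that technique only; the README does not say how sys-file-indexer assigns files, and the names partitionOf and mine as well as the FNV-1a hash are assumptions.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionOf maps a path to a worker number in 1..groupSize.
// groupSize must be at least 1.
func partitionOf(path string, groupSize int) int {
	h := fnv.New32a()
	h.Write([]byte(path)) // Write on a hash.Hash never returns an error
	return int(h.Sum32()%uint32(groupSize)) + 1
}

// mine reports whether worker w (1-based) out of groupSize workers
// is responsible for path under this hypothetical scheme.
func mine(path string, w, groupSize int) bool {
	return partitionOf(path, groupSize) == w
}

func main() {
	paths := []string{"a/one.txt", "a/two.txt", "b/three.txt", "b/four.txt"}
	// Simulate three workers, each keeping only its own share of the paths.
	for w := 1; w <= 3; w++ {
		for _, p := range paths {
			if mine(p, w, 3) {
				fmt.Printf("worker %d: %s\n", w, p)
			}
		}
	}
}
```

Because the assignment depends only on the path and the group size, each host can decide locally which files to process, and concatenating the per-host outputs yields every file exactly once.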

TODO

  • Scan multiple directories
