indexer

command
v1.0.4 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 25, 2018 License: GPL-2.0 Imports: 6 Imported by: 0

README

timefind_indexer
================

`timefind_indexer' reads in a configuration file describing a source and outputs an
index in CSV format containing a list of filenames, timestamp of the earliest
record, timestamp of the latest record, and the time that the file was last
modified.

Using `timefind' in conjunction with these indexes, a user can downselect the
number of files based on a time range.

Dependencies and Building
=========================

1. Build configuration file. (e.g., SOURCENAME.conf.json)
   See [Single Source Configuration File].

2. Run timefind_indexer.

    ./timefind_indexer -h

```
    Usage: timefind_indexer [-huv] [-c PATH]
      -c, --config=PATH  Path to configuration file (can be used multiple times)
      -h, --help         Show this help message and exit
      -u, --unixtime     write Unix time to indexes instead of RFC 3339
      -v, --verbose      Verbose progress indicators and messages
```

After building your configuration file, you can run the timefind_indexer:

    ./timefind_indexer -c SOURCENAME.conf.json

Single Source Configuration File
================================

Each distinct data source requires its own configuration file.
The name of the configuration file (or source) will be the name of the index:

    source name => source configuration filename => index filename
    dns         => dns.conf.json                 => dns.csv

Note that the configuration filename MUST end in ".conf.json".

Some example valid configuration filenames:

    dns.conf.json
    great_pcap.conf.json
    http_traffic.conf.json

A basic source file for DNS data (named "dns.conf.json") might look like this:

    {
        "indexDir": "/index/pcap",
        "type": "pcap",
        "paths": ["/data/pcap"],
        "include": ["*.gz"],
        "exclude": []
    }

The index directory ("indexDir") is where the indexed data will be stored. 
After the timefind_indexer has finished running, the indexes can be found in the 
in a .csv file located in "indexDir".

The index filename is the same as the source name.

Recursive directory support: When the timefind_indexer is run, the directory structure
that is traversed when indexing "paths" is created in the "indexDir." One index
file is created per directory. For directories containing subdirectories with
an index file in them, the index file in the subdirectory is indexed and
written to the directory currently being indexed.

For example, if we use the above configuration file ("dns.conf.json"), and the
directory structure is as follows:

    /data/pcap/
    /data/pcap/a/
    /data/pcap/a/b/
    /data/pcap/c/

The indexes will be generated in the following format:

    /index/pcap/example.csv
    /index/pcap/a/example.csv
    /index/pcap/a/b/example.csv
    /index/pcap/c/example.csv

See [Index Format] for more details.

Each source config file has the components "type", "paths", "include",
and "exclude". 

"type" depends on the file format and which dates you wish to record from each
file. See [Data Types and Processors] for the types of data that the timefind_indexer
supports. If you don't see your data type listed, you will probably have to
write a processor for it.

"paths" is one or more filepaths containing files that you wish to index. 

"include" is a file pattern that specifies which files you wish to index. 
"exclude" is a file pattern specifies which files you do not want indexed.

Index Format
============

Indexes are in CSV format:

    filename,begin_timestamp,end_timestamp,last_modified_time

Timestamps are in Unix timestamp format with nanosecond precision.

Recursive directory support: An index can contain entries that are files
(absolute path) or directories (relative path). An index entry that is a
directory is a pointer to the existence of an index within that
directory and the time range it covers.

A sample index file "pcap.csv":

    2010-01-01,            100, 199, 9999
    2010-01-02,            200, 299, 9999
    /data/pcap/example.gz, 300, 302, 9999

Since /data/pcap/example.gz is an absolute path, that entry references a
particular file.

"2010-01-01" and "2010-01-02" are directories, which means we need to look
into "2010-01-01/pcap.csv" and "2010-01-02/pcap.csv" to potentially pull
out files in a given time range.

The index file "2010-01-01/pcap.csv" might look something like:

    /data/pcap/2010-01-01/ab1.gz, 100, 105, 9999
    /data/pcap/2010-01-01/cd2.gz, 103, 107, 9999
    /data/pcap/2010-01-01/ef3.gz, 180, 199, 9999
    another_directory,            150, 180, 9999

Again, after searching this index, if an index entry that matches our
desired time range is a directory (denoted by a relative path), we
traverse to that directory's index and recursively process until we find
the matching file entries, if any.

Data Types and Processors
=========================

The timefind_indexer reads data files and indexes the earliest and latest time found in
each file. It has the ability to index data classified under the following
categories:

1. "cpp": 
  
    Unix timestamp is the first number listed on each line. Stores timestamp as
    a string and parses it to time.

2. "bomgar":
    Searches for the expression "when='Unix timestamp'" on each line. Stores
    timestamp as a string and parses it to time.

3. "bluecoat":
    Searches for a date of the format "YYYY-MM-DD HH:MM:SS" on each line.
    Stores date as a string and parses it to time.

4. "codevision": 
    Searches for the expression "timestamp=YYYY-MM-DDTHH:MM:SS-ZZ:ZZ" on each
    line.  Stores date listed inside the expressison as a string and parses it
    to time.

5. "cer":
    Searches for the expression "receieved='YYYY-MM-DD HH:MM:SS.SSSSSS-ZZ:ZZ'"
    on each line. Stores date listed inside the expression as a string and
    parses it to time.

6. "sep": 
    Searches for the expression "Event Time: YYYY-MM-DD HH:MM:SS" on each line.
    Stores date listed inside the expression as a string and parses it to time.
    If the expression is not found, timefind_indexer searches for the expression "Begin:
    YYYY-MM-DD HH:MM:SS" on each line. The date listed inside the expression is
    stored as a string and is parsed to a time. If the expression is not found,
    timefind_indexer uses the time listed at the beginning of each line. This time is
    either of the format "Jan 2 2006 15:04:05" or the format "Jan 2 15:04:05"

7. "juniper":
    Searches for a date of the format "YYYY-MM-DD HH:MM:SS" on each line.
    Stores date as a string and parses it to time. If a date of this format is
    not found, timefind_indexer uses the time listed at the beginning of each line. This
    time is either of the format "Jan 2 2006 15:04:05" or the format "Jan 2
    15:04:05"

8. "email":
    Searches for the expression "[DATETIME]YYYY.MM.DD HH:MM:SS.SSSSSSS" on each
    line.  Stores date listed inside the expression as a string and parses it
    to time. 

9. "text": 
    Stores the time listed at the beginning of each line as a string and parses
    it to time. This time is either of the format "Jan 2 2006 15:04:05 or the
    format "Jan 2 15:04:05"

10. "snare":
    Searches for a date of the format "Mon Jan 02 15:04:05 2006" on each line.
    Stores date as a string and parses it to time. If a date of this format is
    not found, the time listed at the beginning of each line is used. This time
    is of the format "YYYY-MM-DDTHH:MM:SS-ZZZZ"

11. "iod":
    Searches for a date of the format "YYYY-MM-DDTHH:MM:SS-ZZZZ" on each line.
    Stores date as a string and parses it to time.

12. "win_messages":
    Searches for a date of the format "Mon Jan 2 15:04:05 2006" on each line.
    If a date of this format is not found, timefind_indexer searches for a date of the
    format "YYYY-MM-DDTHH:MM:SS-ZZ:ZZ" on each line. Stores date as a string
    and parses it to time.

13. "wireless":
    Searches for the expression "Time=YYYY-MM-DDTHH:MM:SS" on each line. Stores
    date listed inside the expression as a string and parses it to time. If the
    expression is not found, the date listed at the beginning of each line is
    used. This time is either of the format "Jan 2 15:04:05 2006" or the format
    "Jan 2 15:04:05"

14. "stealthwatch":
    Searches for a date of the format "YYYY-MM-DDTHH:MM:SS" on each line.
    Stores the date listed inside the expression as a string and parses it to
    time.

15. "pcap":
    Retrieves time found in pcap file type

16. "fsdb_time_col_1":
    Retrieves time found in the *first* column of an fsdb-formatted,
    tab-delimited file.  At the moment, this timefind_indexer does not read the fsdb
    header; it simply ignores it (along with any comments).

    If you're getting errors with reading timestamps, check to make sure the
    file is tab-delimited.

16. "fsdb_time_col_2":
    Retrieves time found in the *second* column of an fsdb-formatted,
    tab-delimited file. See "fsdb_time_col_1" for additional details.

Documentation

The Go Gopher

There is no documentation for this package.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL