siegfried

package module
v1.6.4
Warning

This package is not in the latest version of its module.

Published: Sep 5, 2016 License: Apache-2.0 Imports: 22 Imported by: 0

README

Siegfried

Siegfried is a signature-based file format identification tool, implementing:

  • the National Archives UK's PRONOM file format signatures
  • freedesktop.org's MIME-info file format signatures
  • the Library of Congress's FDD file format signatures (beta).
Version

1.6.4

Usage

Command line
sf file.ext
sf DIR
Options
sf -csv file.ext | DIR                     // Output CSV rather than YAML
sf -json file.ext | DIR                    // Output JSON rather than YAML
sf -droid file.ext | DIR                   // Output DROID CSV rather than YAML
sf -                                       // Read list of files piped to stdin
sf -nr DIR                                 // Don't scan subdirectories
sf -z file.zip | DIR                       // Decompress and scan zip, tar, gzip, warc, arc
sf -hash md5 file.ext | DIR                // Calculate md5, sha1, sha256, sha512, or crc hash
sf -sig custom.sig file.ext                // Use a custom signature file
sf -home c:\junk -sig custom.sig file.ext  // Use a custom home directory
sf -serve hostname:port                    // Server mode
sf -version                                // Display version information
sf -throttle 10ms DIR                      // Pause for duration (e.g. 1s) between file scans
sf -log [comma-sep opts] file.ext | DIR    // Log errors etc. to stderr (default) or stdout
sf -log e,w file.ext | DIR                 // Log errors and warnings to stderr
sf -log u,o file.ext | DIR                 // Log unknowns to stdout
sf -log d,s file.ext | DIR                 // Log debugging and slow messages to stderr
sf -log p,t DIR > results.yaml             // Log progress and time while redirecting results
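
The options above can also be assembled programmatically when sf is driven from another program. The following is a minimal sketch, assuming sf is on the system path; the scanOptions type and the sfArgs helper are hypothetical names introduced here for illustration, and only flags listed above are used.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"time"
)

// scanOptions is a hypothetical helper type (not part of siegfried).
// Each field maps to one sf flag from the options list.
type scanOptions struct {
	Format    string        // "csv", "json" or "droid"; empty keeps the default YAML
	NoRecurse bool          // -nr: don't scan subdirectories
	Unpack    bool          // -z: decompress and scan archives
	Hash      string        // -hash: md5, sha1, sha256, sha512 or crc
	Throttle  time.Duration // -throttle: pause between file scans
}

// sfArgs turns the options into an argument list, flags first, target last.
func sfArgs(o scanOptions, target string) []string {
	var args []string
	switch o.Format {
	case "csv", "json", "droid":
		args = append(args, "-"+o.Format)
	}
	if o.NoRecurse {
		args = append(args, "-nr")
	}
	if o.Unpack {
		args = append(args, "-z")
	}
	if o.Hash != "" {
		args = append(args, "-hash", o.Hash)
	}
	if o.Throttle > 0 {
		args = append(args, "-throttle", o.Throttle.String())
	}
	return append(args, target)
}

func main() {
	args := sfArgs(scanOptions{Format: "json", Hash: "md5", Throttle: 10 * time.Millisecond}, "DIR")
	fmt.Println("sf", args)

	// Only attempt the scan when an sf binary can be found.
	if _, err := exec.LookPath("sf"); err != nil {
		return
	}
	cmd := exec.Command("sf", args...)
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, err)
	}
}
```

Keeping the argument construction in its own function makes it easy to check the flag list without running a scan.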

Signature files

By default, siegfried uses the latest PRONOM signatures without buffer limits (i.e. it may do full file scans). To use MIME-info or LOC signatures, or to add buffer limits or other customisations, use the roy tool to build your own signature file.

Install

With Go installed:
go get github.com/richardlehane/siegfried/cmd/sf

sf -update
Or, without Go installed:
Win:

Download a pre-built binary from the releases page. Unzip to a location in your system path. Then run:

sf -update
Mac Homebrew (or Linuxbrew):
brew install mistydemeo/digipres/siegfried

Or, for the most recent updates, you can install from this fork:

brew install richardlehane/digipres/siegfried
Ubuntu/Debian (64 bit):
wget -qO - https://bintray.com/user/downloadSubjectPublicKey?username=bintray | sudo apt-key add -
echo "deb http://dl.bintray.com/siegfried/debian wheezy main" | sudo tee -a /etc/apt/sources.list
sudo apt-get update && sudo apt-get install siegfried

Changes

v1.6.4 (2016-09-05)
Added
  • roy inspect FMT command now inspects sets e.g. roy inspect @pdfa
  • roy inspect priorities command generates graphs of priority relations
Fixed
Changed
  • use fwac rather than wac package for performance
  • roy inspect FMT command speed up by building without reports and without the doubles filter
  • -reports flag removed for roy harvest and roy build commands
  • -reports flag changed for roy inspect command, now a boolean that, if set, will cause the signature(s) to be built from the PRONOM report(s), rather than the DROID XML file. This is slower but can be a more accurate representation.

Rights

Copyright 2016 Richard Lehane

Licensed under the Apache License, Version 2.0

Announcements

Join the Google Group for updates, signature releases, and help.

Contributing

Like siegfried and want to get involved in its development? That'd be wonderful! There are some notes on the wiki to get you started, and please get in touch.

Thanks

Thanks TNA for http://www.nationalarchives.gov.uk/pronom/ and http://www.nationalarchives.gov.uk/information-management/projects-and-work/droid.htm

Thanks Ross for https://github.com/exponential-decay/skeleton-test-suite-generator and http://exponentialdecay.co.uk/sd/index.htm, both are very handy!

Thanks Misty for the brew and ubuntu packaging

Documentation

Overview

Package siegfried identifies file formats

Example:

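// Load a previously built signature file.
s, err := siegfried.Load("pronom.sig")
if err != nil {
	log.Fatal(err)
}
f, err := os.Open("file")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
// Identify takes the reader first, then the name and mime strings
// (see the Identify signature below).
c, err := s.Identify(f, "filename", "")
if err != nil {
	log.Fatal(err)
}
// Drain the channel of identifications.
for id := range c {
	fmt.Println(id)
}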

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Siegfried

type Siegfried struct {
	C time.Time // signature create time
	// contains filtered or unexported fields
}

Siegfried structs are persistent objects that can be serialised to disk and used to identify file formats. They contain three matchers as well as a slice of identifiers. When identifiers are added to a Siegfried struct, they are registered with each matcher.
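
The registration pattern described above can be pictured with a toy model. This is only a sketch under stated assumptions: the identifier, matcher and toySiegfried types, and the matcher names "extension", "container" and "byte", are hypothetical stand-ins and not siegfried's own types or its core interfaces.

```go
package main

import "fmt"

// identifier is a hypothetical stand-in: it carries signatures keyed by
// the name of the matcher that should receive them.
type identifier struct {
	name string
	sigs map[string][]string
}

// matcher is a hypothetical stand-in that simply collects signatures.
type matcher struct {
	name string
	sigs []string
}

// toySiegfried holds a fixed set of matchers plus the identifiers added so far.
type toySiegfried struct {
	matchers []*matcher
	ids      []identifier
}

// newToy initialises the three (assumed) matchers.
func newToy() *toySiegfried {
	return &toySiegfried{matchers: []*matcher{
		{name: "extension"},
		{name: "container"},
		{name: "byte"},
	}}
}

// add registers the identifier's signatures with every matcher,
// then records the identifier itself.
func (s *toySiegfried) add(id identifier) {
	for _, m := range s.matchers {
		m.sigs = append(m.sigs, id.sigs[m.name]...)
	}
	s.ids = append(s.ids, id)
}

func main() {
	s := newToy()
	s.add(identifier{name: "pronom", sigs: map[string][]string{
		"extension": {"pdf"},
		"byte":      {"%PDF-"},
	}})
	for _, m := range s.matchers {
		fmt.Printf("%s matcher: %d signature(s)\n", m.name, len(m.sigs))
	}
}
```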

func Load

func Load(path string) (*Siegfried, error)

Load creates a Siegfried struct and loads content from path

func New

func New() *Siegfried

New creates a new Siegfried struct. It initializes the three matchers.

Example:

s := New()
p, err := pronom.New() // create a new PRONOM identifier
if err != nil {
	log.Fatal(err)
}
err = s.Add(p) // add the identifier to the Siegfried
if err != nil {
	log.Fatal(err)
}
err = s.Save("pronom.sig") // save the Siegfried
if err != nil {
	log.Fatal(err)
}

func (*Siegfried) Add

func (s *Siegfried) Add(i core.Identifier) error

Add adds an identifier to a Siegfried struct.

func (*Siegfried) Blame

func (s *Siegfried) Blame(idx, ct int, cn string) string

Blame checks with the byte matcher to see which identification results subscribe to a particular result or test tree index. It can be used when identifying in a debug mode to check which identification results trigger which strikes.

func (*Siegfried) Buffer

func (s *Siegfried) Buffer() *siegreader.Buffer

Buffer returns the last buffer inspected. This prevents unnecessary double-up of IO, e.g. when unzipping files post-identification.
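
The idea of reusing already-read bytes instead of reading the source twice can be sketched with the standard library alone. This is an illustration of the general technique and does not use siegreader.Buffer; the isGzip, inspectOnce and gzipReader helpers are hypothetical names.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
	"log"
)

// isGzip reports whether data starts with the gzip magic number (0x1f 0x8b).
func isGzip(data []byte) bool {
	return len(data) >= 2 && data[0] == 0x1f && data[1] == 0x8b
}

// inspectOnce reads src a single time. The same in-memory bytes are used
// first to recognise the format and then, if it is gzip, to decompress it.
func inspectOnce(src io.Reader) (bool, []byte, error) {
	data, err := io.ReadAll(src)
	if err != nil {
		return false, nil, err
	}
	if !isGzip(data) {
		return false, data, nil
	}
	zr, err := gzip.NewReader(bytes.NewReader(data))
	if err != nil {
		return true, nil, err
	}
	defer zr.Close()
	out, err := io.ReadAll(zr)
	return true, out, err
}

// gzipReader builds a small gzip stream in memory for the demonstration.
func gzipReader(s string) io.Reader {
	var b bytes.Buffer
	zw := gzip.NewWriter(&b)
	if _, err := zw.Write([]byte(s)); err != nil {
		log.Fatal(err)
	}
	if err := zw.Close(); err != nil {
		log.Fatal(err)
	}
	return &b
}

func main() {
	ok, out, err := inspectOnce(gzipReader("hello, siegfried"))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(ok, string(out))
}
```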

func (*Siegfried) Fields

func (s *Siegfried) Fields() [][]string

Fields returns a slice of the names of the fields in each identifier.

func (*Siegfried) Identify

func (s *Siegfried) Identify(r io.Reader, name, mime string) (chan core.Identification, error)

Identify identifies a stream or file object. It takes an io.Reader, the name of the file/stream (if unknown, give an empty string), and a mime string. It returns a channel of identifications and an error.
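
The pattern of returning a channel of results together with an error can be sketched with a toy function. The toyIdentify function, the magic table and its labels are assumptions made for illustration; this is a simple prefix check and not siegfried's matching algorithm.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
	"log"
	"strings"
)

// magic is a hypothetical table of format labels and leading byte sequences.
var magic = []struct {
	label  string
	prefix []byte
}{
	{"PDF", []byte("%PDF-")},
	{"PNG", []byte("\x89PNG\r\n\x1a\n")},
	{"GIF", []byte("GIF8")},
}

// toyIdentify peeks at the first bytes of r and sends every matching label
// on the returned channel, closing it when done.
func toyIdentify(r io.Reader) (chan string, error) {
	head, err := bufio.NewReader(r).Peek(8)
	if err != nil && err != io.EOF {
		return nil, err // a short stream (io.EOF) is fine; other errors are not
	}
	c := make(chan string)
	go func() {
		defer close(c)
		for _, m := range magic {
			if bytes.HasPrefix(head, m.prefix) {
				c <- m.label
			}
		}
	}()
	return c, nil
}

// identifyString drains the channel into a slice, as a caller would.
func identifyString(s string) []string {
	c, err := toyIdentify(strings.NewReader(s))
	if err != nil {
		log.Fatal(err)
	}
	var out []string
	for id := range c {
		out = append(out, id)
	}
	return out
}

func main() {
	for _, sample := range []string{"%PDF-1.4\n", "GIF89a", "plain text"} {
		fmt.Printf("%q: %v\n", sample, identifyString(sample))
	}
}
```

Because the channel is unbuffered, the caller should range over it until it is closed; otherwise the sending goroutine would block.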

func (*Siegfried) Inspect

func (s *Siegfried) Inspect(t core.MatcherType) string

Inspect returns a string containing detail about the various matchers in the Siegfried struct.

func (*Siegfried) JSON

func (s *Siegfried) JSON() string

JSON representation of a Siegfried struct. This is the provenance block at the beginning of sf results and includes descriptions for each identifier.

func (*Siegfried) Save

func (s *Siegfried) Save(path string) error

Save persists a Siegfried struct to disk (path)

func (*Siegfried) String

func (s *Siegfried) String() string

String representation of a Siegfried struct

func (*Siegfried) Update

func (s *Siegfried) Update(t string) bool

Update checks whether a Siegfried struct is due for update, by testing whether the time given is after the time the signature was created.
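
The comparison described above can be sketched as follows. The toySig type, its update method and the mustSig helper are hypothetical, and the RFC 3339 time format is an assumption; the format that Update actually expects for its string argument is not stated here.

```go
package main

import (
	"fmt"
	"log"
	"time"
)

// toySig is a hypothetical stand-in holding only a create time,
// mirroring the C field of the Siegfried struct.
type toySig struct {
	C time.Time
}

// update reports whether the given time (assumed RFC 3339) is after the
// signature's create time. Unparseable input is treated as "not due".
func (s toySig) update(t string) bool {
	latest, err := time.Parse(time.RFC3339, t)
	if err != nil {
		return false
	}
	return latest.After(s.C)
}

// mustSig builds a toySig from an RFC 3339 create time.
func mustSig(created string) toySig {
	c, err := time.Parse(time.RFC3339, created)
	if err != nil {
		log.Fatal(err)
	}
	return toySig{C: c}
}

func main() {
	s := mustSig("2016-09-05T00:00:00Z")
	fmt.Println(s.update("2016-10-01T00:00:00Z")) // later than the create time
	fmt.Println(s.update("2016-01-01T00:00:00Z")) // earlier than the create time
}
```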

func (*Siegfried) YAML

func (s *Siegfried) YAML() string

YAML representation of a Siegfried struct. This is the provenance block at the beginning of sf results and includes descriptions for each identifier.

Directories

Path Synopsis
cmd
roy
sf
Package config sets up defaults used by both the SF and roy tools Config options can be overridden with build flags e.g.
pkg
core
Package core defines the Identifier, Identification, Recorder, Matcher and Result interfaces.
core/bytematcher
Package bytematcher builds a matching engine from a set of signatures and performs concurrent matching against an input siegreader.Buffer.
core/bytematcher/frames
Package frames describes the Frame interface.
core/bytematcher/frames/tests
Package tests exports shared frames and signatures for use by the other bytematcher packages
core/bytematcher/patterns
Package patterns describes the Pattern interface.
core/bytematcher/patterns/tests
Package tests exports shared patterns for use by the other bytematcher packages
core/persist
Package persist marshals and unmarshals siegfried signatures as binary data
core/priority
Package priority creates a subordinate-superiors map of identifications.
core/siegreader
Package siegreader implements multiple independent Readers (and ReverseReaders) from a single Buffer.
loc
pronom
Define custom patterns (implementing the siegfried.Pattern interface) for the different patterns allowed by the PRONOM spec.
pronom/mappings
This file contains struct mappings to unmarshal three different PRONOM XML formats: the signature file format, the report format, and the container format
