mimemagic

v1.2.0
Published: Mar 25, 2021 License: GPL-2.0 Imports: 9 Imported by: 16

README

mimemagic

Powerful and versatile MIME sniffing package using pre-compiled glob patterns, magic number signatures, xml document namespaces, and tree magic for mounted volumes, generated from the XDG shared-mime-info database.

License

The generated code in magicsigs.go, globs.go, treemagicsigs.go, namespaces.go and mediatypes.go makes this a derivative work of shared-mime-info, and therefore falls under the GPL-2.0-or-later license. See the discussion.

For an MIT-licensed branch with the generated code removed, please see this. However, redistributing a compiled executable under that license still requires supplying a (hypothetical) permissively licensed freedesktop.org.xml file for the parser to process.

Features

  • All in native Go, no outside dependencies or C library bindings
  • 1003 MIME types, with a description, an acronym (where available), common aliases, extensions, icons, and subclasses
  • 493 magic signature tests (comprising 1147 individual patterns), featuring range searches and bit masks, as per the XDG specification
  • 1099 glob patterns, for filename-based matching
  • 11 Tree Magic signatures and 28 XML namespace/local name pairs, offered for completeness' sake
  • The XML file parser is included, so you can generate your own MIME definitions
  • A fully featured, fast CLI based on this library is also included; it outperforms the native 'file' and KDE's 'kmimetypefinder'
  • Cross-platform support

Installation

The library:

go get github.com/zRedShift/mimemagic

The CLI:

go get github.com/zRedShift/mimemagic/cmd/mimemagic

API

See the Godoc reference, and cmd/mimemagic for an example implementation.

Usage

The library:

package main

import (
	"fmt"
	"strings"

	"github.com/zRedShift/mimemagic"
)

func main() {
	// Ignoring Read errors that might arise
	mimeType, _ := mimemagic.MatchFilePath("sample.svgz", -1)

	// image/svg+xml-compressed
	fmt.Println(mimeType.MediaType())

	// compressed SVG image
	fmt.Println(mimeType.Comment)

	// SVG (Scalable Vector Graphics)
	fmt.Printf("%s (%s)\n", mimeType.Acronym, mimeType.ExpandedAcronym)

	// application/gzip
	fmt.Println(strings.Join(mimeType.SubClassOf, ", "))

	// .svgz
	fmt.Println(strings.Join(mimeType.Extensions, ", "))

	// This is an image.
	switch mimeType.Media {
	case "image":
		fmt.Println("This is an image.")
	case "video":
		fmt.Println("This is a video file.")
	case "audio":
		fmt.Println("This is an audio file.")
	case "application":
		fmt.Println("This is an application.")
	default:
		fmt.Printf("This is a(n) %s.\n", mimeType.Media)
	}

	// true
	fmt.Println(mimeType.IsExtension(".svgz"))
}

The CLI:

Usage: mimemagic [options] <file> ...
Determines the MIME type of the given file(s).

Options:
  -c    Determine the MIME type of the file(s) using only its content.
  -f    Determine the MIME type of the file(s) using only the file name. Does
        not check for the file's existence. The -c flag takes precedence.
  -i    Output the MIME type in a human readable format.
  -l int
        The number of bytes from the beginning of the file mimemagic will
        examine. Reads the entire file if set to a negative value. By default
        mimemagic will only read the first 512 from stdin, however setting this
        flag to a non-default negative value will override this. (default -1)
  -t    Determine the MIME type of the directory/mounted volume using tree
        magic. Can't be used in conjunction with -c, -f or -x.
  -x    Determine the MIME type of the xml file(s) using the local names and
        namespaces within. Can't be used in conjunction with -c, -f or -t.

Arguments:
  file
        The file(s) to test. '-' to read from stdin. If '-' is set, all other
        inputs will be ignored.

Examples:
  $ mimemagic -c sample.svgz
    	application/gzip
  $ mimemagic *.svg*
    	Olympic_rings_with_transparent_rims.svg: image/svg+xml
    	Piano.svg.png: image/png
    	RAID_5.svg: image/svg+xml
    	sample.svgz: image/svg+xml-compressed
  $ cat /dev/urandom | mimemagic -
    	application/octet-stream
  $ ls software; mimemagic -i -t software/
    	autorun
    	UNIX software

Benchmarks

See Benchmarks. For Match(), the average across more than 400 completely different files (each representing a unique MIME type) is 13 ± 7 μs/op. For MatchGlob() it's 900 ± 200 ns/op, and for MatchMagic() it's 12 ± 7 μs/op.

Documentation

Overview

Package mimemagic implements MIME sniffing using pre-compiled glob patterns, magic number signatures, xml document namespaces, and tree magic for mounted volumes, generated from the XDG shared-mime-info database.

To generate your own database, remove the leading space in front of go:generate in the directive below (go generate only recognizes lines that begin with exactly //go:generate), point it to the directory with the freedesktop.org package files (freedesktop.org.xml, if it exists, is always processed first and Override.xml is always processed last), and run go generate:

go:generate go run github.com/zRedShift/mimemagic/cmd/parser /usr/share/mime/packages

To use the default freedesktop.org.xml file provided in this package:

go:generate go run github.com/zRedShift/mimemagic/cmd/parser cmd/parser

globs.go is generated unformatted, so it's a good idea to format it afterwards:

go:generate go fmt globs.go

Constants

const (
	// Default behaviour relies on MatchGlob if it returns a sole
	// match, else it defers to the first magic match.
	Default = iota
	// Magic prefers MatchMagic in case of a contention.
	Magic
	// Glob prefers MatchGlob in case of a contention.
	Glob
)

Types

type MediaType

type MediaType struct {
	Media, Subtype, Comment, Acronym, ExpandedAcronym, Icon, GenericIcon string
	Alias, SubClassOf, Extensions                                        []string
	// contains filtered or unexported fields
}

MediaType stores all the parsed values of a MIME type within a shared-mime-info package.

func Match

func Match(data []byte, filename string, preference ...int) MediaType

Match determines the MIME type of a file given as a byte slice together with its filename. Anonymous buffers should use MatchMagic. Preference is an optional value that lets you prioritize glob or magic matching in case of a contention. A contention occurs when both magic and glob matches are found but can't be reconciled via aliases or subclasses.
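
The contention rule can be illustrated with a small, self-contained sketch. This is not the package's implementation: the resolve function, the subClassOf table, and the application/octet-stream fallback are assumptions made for illustration, and alias handling is left out. Only the three preference names are taken from the package constants.

```go
package main

import "fmt"

// Preference values, named after the package constants.
const (
	Default = iota
	Magic
	Glob
)

// subClassOf is a tiny, hypothetical stand-in for the subclass data
// that the real database carries for every MIME type.
var subClassOf = map[string][]string{
	"image/svg+xml-compressed": {"application/gzip"},
}

// isSubclass reports whether child lists parent as a direct parent type.
func isSubclass(child, parent string) bool {
	for _, p := range subClassOf[child] {
		if p == parent {
			return true
		}
	}
	return false
}

// resolve chooses one MIME type from the glob and magic candidates.
func resolve(globs, magics []string, pref int) string {
	switch {
	case len(globs) == 0 && len(magics) == 0:
		return "application/octet-stream" // assumed fallback
	case len(globs) == 0:
		return magics[0]
	case len(magics) == 0:
		return globs[0]
	}
	// Reconcile: a glob candidate that equals, or is a subclass of,
	// a magic candidate is the more specific answer.
	for _, g := range globs {
		for _, m := range magics {
			if g == m || isSubclass(g, m) {
				return g
			}
		}
	}
	// Contention: the two methods disagree and cannot be reconciled.
	switch pref {
	case Magic:
		return magics[0]
	case Glob:
		return globs[0]
	default:
		if len(globs) == 1 {
			return globs[0] // a sole glob match wins by default
		}
		return magics[0] // otherwise defer to the first magic match
	}
}

func main() {
	fmt.Println(resolve([]string{"image/svg+xml-compressed"}, []string{"application/gzip"}, Default))
	fmt.Println(resolve([]string{"text/plain"}, []string{"image/png"}, Magic))
}
```

In the first call the glob result is a subclass of the magic result, so there is no contention and the more specific type is returned. In the second call the two results are unrelated, so the preference decides.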

func MatchFile

func MatchFile(f *os.File, limAndPref ...int) (MediaType, error)

MatchFile is an *os.File convenience wrapper for MatchReader.

func MatchFilePath

func MatchFilePath(path string, limAndPref ...int) (m MediaType, err error)

MatchFilePath is a file path convenience wrapper for MatchReader.

func MatchGlob

func MatchGlob(filename string) MediaType

MatchGlob determines the MIME type of the file using exclusively its filename.
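
To give a feel for filename-only matching, here is a minimal sketch built on the standard library's path/filepath.Match. It is not how the package stores or evaluates its pre-compiled globs. The rules table, the matchGlobSketch name, the "longest pattern wins" tie-break, the lower-casing of the name, and the application/octet-stream fallback are all assumptions for illustration.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// globRule pairs a shell-style pattern with a MIME type (hypothetical table).
type globRule struct {
	pattern string
	mime    string
}

var rules = []globRule{
	{"*.svg", "image/svg+xml"},
	{"*.svgz", "image/svg+xml-compressed"},
	{"*.png", "image/png"},
	{"*.gz", "application/gzip"},
	{"*.tar.gz", "application/x-compressed-tar"},
}

// matchGlobSketch returns the MIME type of the longest matching pattern.
// The file is never opened; only its base name is inspected.
func matchGlobSketch(filename string) string {
	name := strings.ToLower(filepath.Base(filename))
	best := globRule{mime: "application/octet-stream"} // assumed fallback
	for _, r := range rules {
		ok, err := filepath.Match(r.pattern, name)
		if err != nil || !ok {
			continue
		}
		if len(r.pattern) > len(best.pattern) {
			best = r
		}
	}
	return best.mime
}

func main() {
	for _, name := range []string{"Piano.svg.png", "backup.tar.gz", "sample.SVGZ", "README"} {
		fmt.Printf("%s: %s\n", name, matchGlobSketch(name))
	}
}
```

Because only the name is examined, a file such as Piano.svg.png is classified by its final extension, regardless of what it contains.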

func MatchMagic

func MatchMagic(data []byte) MediaType

MatchMagic determines the MIME type of the file in byte slice form. For an io.Reader wrapper see MatchReader (blank filename).
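
The following sketch shows, in simplified form, what a magic signature test with a range search and a bit mask can look like. The signature type, its fields, and the example rules are hypothetical and far simpler than the generated tables in this package; the sketch only illustrates the general idea of comparing masked bytes at one of several start offsets.

```go
package main

import "fmt"

// signature is a hypothetical, simplified magic rule.
type signature struct {
	mime     string
	offset   int    // first start offset to try
	rangeLen int    // how many consecutive start offsets to try (at least 1)
	value    []byte // bytes to look for
	mask     []byte // optional, same length as value; nil compares all bits
}

// equalMasked compares window and value byte by byte under the mask.
func equalMasked(window, value, mask []byte) bool {
	for i := range value {
		m := byte(0xff)
		if mask != nil {
			m = mask[i]
		}
		if window[i]&m != value[i]&m {
			return false
		}
	}
	return true
}

// matches reports whether the value occurs at any start offset in
// [offset, offset+rangeLen).
func (s signature) matches(data []byte) bool {
	for start := s.offset; start < s.offset+s.rangeLen; start++ {
		end := start + len(s.value)
		if end > len(data) {
			return false // later offsets cannot fit either
		}
		if equalMasked(data[start:end], s.value, s.mask) {
			return true
		}
	}
	return false
}

func main() {
	png := signature{mime: "image/png", rangeLen: 1, value: []byte("\x89PNG\r\n\x1a\n")}
	// Hypothetical ranged rule: the marker may start at any of the first 16 offsets.
	ranged := signature{mime: "application/pdf", rangeLen: 16, value: []byte("%PDF-")}
	// Hypothetical masked rule: only the high nibble of the first byte matters.
	masked := signature{mime: "application/x-example", rangeLen: 1, value: []byte{0x40}, mask: []byte{0xf0}}

	fmt.Println(png.matches([]byte("\x89PNG\r\n\x1a\nIHDR")))
	fmt.Println(ranged.matches([]byte("junk%PDF-1.7")))
	fmt.Println(masked.matches([]byte{0x4a}))
}
```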

func MatchReader

func MatchReader(r io.Reader, filename string, limAndPref ...int) (MediaType, error)

MatchReader is an io.Reader wrapper for Match that can be supplied with a filename, a limit on the data to read, and a preference for one of the matching methods in case of a contention. If limit is negative or omitted, the reader is read only as far as the longest magic signature in the database requires.

func MatchTreeMagic

func MatchTreeMagic(path string) (MediaType, error)

MatchTreeMagic determines if the path or the directory of the file supplied in the path matches any common mounted volume signatures and returns their x-content MIME type. It returns the inode/directory MediaType when identification is negative for a directory, and application/octet-stream when it is negative for a file.

func MatchXML

func MatchXML(data []byte) MediaType

MatchXML determines the MIME type of an XML file given as a byte slice. It returns application/octet-stream if the file isn't valid XML, and application/xml if the identification comes back negative.
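
Matching on namespace and local name pairs can be sketched with encoding/xml. This is an illustration under stated assumptions rather than the package's code: the namespaces table holds two example entries, matchXMLSketch is a hypothetical name, and the sketch only inspects the first start element instead of validating the whole document.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// namespaces maps a root element's namespace and local name to a MIME type.
// The entries are examples, not the package's generated table.
var namespaces = map[xml.Name]string{
	{Space: "http://www.w3.org/2000/svg", Local: "svg"}:    "image/svg+xml",
	{Space: "http://www.w3.org/1999/xhtml", Local: "html"}: "application/xhtml+xml",
}

// matchXMLSketch classifies data by its first start element.
func matchXMLSketch(data []byte) string {
	dec := xml.NewDecoder(bytes.NewReader(data))
	for {
		tok, err := dec.Token()
		if err != nil {
			// No element found before an error or EOF: treat as not XML.
			return "application/octet-stream"
		}
		if start, ok := tok.(xml.StartElement); ok {
			if mime, known := namespaces[start.Name]; known {
				return mime
			}
			return "application/xml" // XML, but no known namespace
		}
	}
}

func main() {
	svg := `<?xml version="1.0"?><svg xmlns="http://www.w3.org/2000/svg"></svg>`
	fmt.Println(matchXMLSketch([]byte(svg)))
	fmt.Println(matchXMLSketch([]byte(`<note/>`)))
	fmt.Println(matchXMLSketch([]byte("plain text")))
}
```

Decoder.Token resolves namespace prefixes to their URLs, which is why the root element's Name can be used directly as the map key.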

func MatchXMLReader

func MatchXMLReader(r io.Reader, limit int) MediaType

MatchXMLReader is an io.Reader wrapper for MatchXML that can be supplied with a limit on the data to read.

func (MediaType) IsExtension

func (m MediaType) IsExtension(ext string) bool

IsExtension checks if the extension ext is associated with the MIME type. The extension should begin with a leading dot, as in ".html".

func (MediaType) MediaType

func (m MediaType) MediaType() string

MediaType returns the MIME type in the format of the MIME spec.

Directories

Path Synopsis
cmd
