mimetype

v1.1.2-0...-a675e6a
Published: Sep 6, 2020 License: MIT Imports: 4 Imported by: 0

README

mimetype

A package for detecting MIME types and extensions based on magic numbers

No C bindings, zero dependencies and thread safe

Features

  • fast and precise MIME type and file extension detection
  • long list of supported MIME types
  • common file formats are prioritized
  • small and simple API
  • handles MIME type aliases
  • thread safe
  • low memory usage: only the file header is read into memory

Install

go get github.com/gabriel-vasile/mimetype

Usage

There are quick examples and GoDoc for full reference.

Structure

mimetype uses a hierarchical structure to hold the MIME type detection logic. This reduces the number of calls needed to detect the file type. The reason behind this choice is that some file formats are used as containers for other file formats. For example, Microsoft Office files are just zip archives containing specific metadata files. Once a file has been identified as a zip, there is no need to check whether it is a text file, but it is worth checking whether it is a Microsoft Office file.
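
The idea can be illustrated with a minimal, standard-library-only sketch. Everything here is hypothetical and simplified for illustration: the node type, the detect function, and the matchers (in particular the "word/" check for Office documents) are assumptions, not the package's actual internals.

```go
package main

import (
	"bytes"
	"fmt"
)

// node is a hypothetical tree node: a MIME string, a matcher for that
// format, and the more specific formats that build on it.
type node struct {
	mime     string
	match    func(header []byte) bool
	children []*node
}

// detect assumes n already matched and descends into the first child
// whose matcher accepts the header. Siblings of a matched node are
// never checked, which is what keeps the number of calls low.
func detect(n *node, header []byte) string {
	for _, c := range n.children {
		if c.match(header) {
			return detect(c, header)
		}
	}
	return n.mime
}

// looksLikeText is a deliberately naive text check: non-empty input
// with no control bytes other than tab, newline and carriage return.
func looksLikeText(h []byte) bool {
	if len(h) == 0 {
		return false
	}
	for _, b := range h {
		if b < 0x20 && b != '\t' && b != '\n' && b != '\r' {
			return false
		}
	}
	return true
}

// buildTree returns a tiny illustrative hierarchy rooted at
// application/octet-stream.
func buildTree() *node {
	docx := &node{
		mime: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
		// Simplified assumption: a zip mentioning "word/" is a Word file.
		match: func(h []byte) bool { return bytes.Contains(h, []byte("word/")) },
	}
	zip := &node{
		mime:     "application/zip",
		match:    func(h []byte) bool { return bytes.HasPrefix(h, []byte("PK\x03\x04")) },
		children: []*node{docx},
	}
	text := &node{mime: "text/plain", match: looksLikeText}
	return &node{
		mime:     "application/octet-stream",
		match:    func([]byte) bool { return true },
		children: []*node{zip, text},
	}
}

func main() {
	root := buildTree()
	fmt.Println(detect(root, []byte("PK\x03\x04 word/document.xml")))
	fmt.Println(detect(root, []byte("just some text")))
	fmt.Println(detect(root, []byte{0x00, 0x01, 0x02}))
}
```

In this sketch, input that matches the zip signature is only tested against zip-based formats; the text matcher is never called for it.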

To prevent loading entire files into memory, when detecting from a reader or from a file mimetype limits itself to reading only the header of the input.

Performance

Thanks to the hierarchical structure, searching for common formats first, and limiting itself to file headers, mimetype performs on par with the standard library's http.DetectContentType while outperforming the alternative filetype package.

Benchmarks were run on an Intel Xeon Gold 6136 24 core CPU @ 3.00GHz. Lower is better.

                            mimetype  http.DetectContentType      filetype
BenchmarkMatchTar-24       250 ns/op         400 ns/op           3778 ns/op
BenchmarkMatchZip-24       524 ns/op         351 ns/op           4884 ns/op
BenchmarkMatchJpeg-24      103 ns/op         228 ns/op            839 ns/op
BenchmarkMatchGif-24       139 ns/op         202 ns/op            751 ns/op
BenchmarkMatchPng-24       165 ns/op         221 ns/op           1176 ns/op

Contributing

See CONTRIBUTING.md.

Documentation

Overview

Package mimetype uses magic number signatures to detect the MIME type of a file.

mimetype stores the list of MIME types in a tree structure with "application/octet-stream" at the root of the hierarchy. The hierarchy approach minimizes the number of checks that need to be done on the input and allows for more precise results once the base type of file has been identified.

Example (Check)

To check if some bytes/reader/file has a specific MIME type, first perform a detect on the input and then test against the MIME.

Compared with a plain string comparison such as mime.String() == "application/zip", the mime.Is("application/zip") method has the following advantages: it handles MIME aliases, is case insensitive, ignores optional MIME parameters, and ignores any leading and trailing whitespace.

package main

import (
	"fmt"

	"github.com/gabriel-vasile/mimetype"
)

func main() {
	mime, err := mimetype.DetectFile("testdata/zip.zip")
	// application/x-zip is an alias of application/zip,
	// therefore Is returns true both times.
	fmt.Println(mime.Is("application/zip"), mime.Is("application/x-zip"), err)

}
Output:

true true <nil>

Example (Detect)

To find the MIME type of some input, perform a detect. In addition to the basic Detect,

mimetype.Detect([]byte) *MIME

there are shortcuts for detecting from a reader:

mimetype.DetectReader(io.Reader) (*MIME, error)

or from a file:

mimetype.DetectFile(string) (*MIME, error)

package main

import (
	"fmt"
	"io/ioutil"
	"os"

	"github.com/gabriel-vasile/mimetype"
)

func main() {
	file := "testdata/pdf.pdf"

	// Detect the MIME type of a file stored as a byte slice.
	data, _ := ioutil.ReadFile(file) // ignoring error for brevity's sake
	mime := mimetype.Detect(data)
	fmt.Println(mime.String(), mime.Extension())

	// Detect the MIME type of a reader.
	reader, _ := os.Open(file) // ignoring error for brevity's sake
	mime, rerr := mimetype.DetectReader(reader)
	fmt.Println(mime.String(), mime.Extension(), rerr)

	// Detect the MIME type of a file.
	mime, ferr := mimetype.DetectFile(file)
	fmt.Println(mime.String(), mime.Extension(), ferr)

}
Output:

application/pdf .pdf
application/pdf .pdf <nil>
application/pdf .pdf <nil>

Example (TextVsBinary)

If a binary file is defined as "a computer file that is not a text file", the two can be differentiated by searching for the text/plain MIME type in the detected MIME type's hierarchy.

package main

import (
	"fmt"

	"github.com/gabriel-vasile/mimetype"
)

func main() {
	detectedMIME, err := mimetype.DetectFile("testdata/xml.xml")

	isBinary := true
	for mime := detectedMIME; mime != nil; mime = mime.Parent() {
		if mime.Is("text/plain") {
			isBinary = false
		}
	}

	fmt.Println(isBinary, detectedMIME, err)

}
Output:

false text/xml; charset=utf-8 <nil>

Functions

func EqualsAny

func EqualsAny(s string, mimes ...string) bool

EqualsAny reports whether the MIME type s is equal to any MIME type in mimes. The equality test is done on the "type/subtype" section; it ignores any optional MIME parameters, ignores leading and trailing whitespace, and is case insensitive.
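
The comparison rules above can be sketched with the standard library alone. The normalize and equalsAny helpers below are hypothetical illustrations of those rules, not the package's implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize reduces a MIME string to its lowercase "type/subtype"
// section: parameters after ';' are dropped and surrounding
// whitespace is trimmed.
func normalize(s string) string {
	if i := strings.IndexByte(s, ';'); i >= 0 {
		s = s[:i]
	}
	return strings.ToLower(strings.TrimSpace(s))
}

// equalsAny reports whether s equals any of mimes after normalization.
func equalsAny(s string, mimes ...string) bool {
	s = normalize(s)
	for _, m := range mimes {
		if normalize(m) == s {
			return true
		}
	}
	return false
}

func main() {
	allowed := []string{"text/plain", "text/html", "text/csv"}
	fmt.Println(equalsAny(" Text/Plain; charset=utf-8 ", allowed...))
	fmt.Println(equalsAny("application/zip", allowed...))
}
```

Note that this sketch does not handle aliases; in the package that is the job of the Is method.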

Example
package main

import (
	"fmt"

	"github.com/gabriel-vasile/mimetype"
)

func main() {
	allowed := []string{"text/plain", "text/html", "text/csv"}
	mime, _ := mimetype.DetectFile("testdata/utf8.txt")

	if mimetype.EqualsAny(mime.String(), allowed...) {
		fmt.Printf("%s is allowed\n", mime)
	} else {
		fmt.Printf("%s is not allowed\n", mime)
	}

}
Output:

text/plain; charset=utf-8 is allowed

Types

type MIME

type MIME struct {
	// contains filtered or unexported fields
}

MIME struct holds information about a file format: the string representation of the MIME type, the extension and the parent file format.

func Detect

func Detect(in []byte) *MIME

Detect returns the MIME type found from the provided byte slice.

The result is always a valid MIME type, with application/octet-stream returned when identification failed.

Example
package main

import (
	"fmt"
	"io/ioutil"

	"github.com/gabriel-vasile/mimetype"
)

func main() {
	data, err := ioutil.ReadFile("testdata/zip.zip")
	mime := mimetype.Detect(data)

	fmt.Println(mime.String(), err)

}
Output:

application/zip <nil>

func DetectFile

func DetectFile(file string) (*MIME, error)

DetectFile returns the MIME type of the provided file.

The result is always a valid MIME type; application/octet-stream is returned when identification fails, whether or not an error occurred. Any error returned relates to opening or reading the input file.

To prevent loading entire files into memory, DetectFile reads at most matchers.ReadLimit bytes from the input file.

Example
package main

import (
	"fmt"

	"github.com/gabriel-vasile/mimetype"
)

func main() {
	mime, err := mimetype.DetectFile("testdata/zip.zip")

	fmt.Println(mime.String(), err)

}
Output:

application/zip <nil>

func DetectReader

func DetectReader(r io.Reader) (*MIME, error)

DetectReader returns the MIME type of the provided reader.

The result is always a valid MIME type; application/octet-stream is returned when identification fails, whether or not an error occurred. Any error returned relates to reading from the input reader.

DetectReader assumes the reader offset is at the start. If the input is a ReadSeeker that has already been read from, it should be rewound before detection:

reader.Seek(0, io.SeekStart)

To prevent loading entire files into memory, DetectReader reads at most matchers.ReadLimit bytes from the reader.

Example
package main

import (
	"fmt"
	"os"

	"github.com/gabriel-vasile/mimetype"
)

func main() {
	data, oerr := os.Open("testdata/zip.zip")
	mime, merr := mimetype.DetectReader(data)

	fmt.Println(mime.String(), oerr, merr)

}
Output:

application/zip <nil> <nil>

func (*MIME) Extension

func (m *MIME) Extension() string

Extension returns the file extension associated with the MIME type. It includes the leading dot, as in ".html". When the file format does not have an extension, the empty string is returned.

func (*MIME) Is

func (m *MIME) Is(expectedMIME string) bool

Is checks whether this MIME type, or any of its aliases, is equal to the expected MIME type. MIME type equality test is done on the "type/subtype" section, ignores any optional MIME parameters, ignores any leading and trailing whitespace, and is case insensitive.

Example
package main

import (
	"fmt"

	"github.com/gabriel-vasile/mimetype"
)

func main() {
	mime, err := mimetype.DetectFile("testdata/pdf.pdf")

	pdf := mime.Is("application/pdf")
	xdf := mime.Is("application/x-pdf")
	txt := mime.Is("text/plain")
	fmt.Println(pdf, xdf, txt, err)

}
Output:

true true false <nil>

func (*MIME) Parent

func (m *MIME) Parent() *MIME

Parent returns the parent MIME type from the hierarchy. Each MIME type has a non-nil parent, except for the root MIME type.

For example, the application/json and text/html MIME types have text/plain as their parent because they are text files that happen to contain JSON or HTML. Another example is the ZIP format, which is used as a container for Microsoft Office files, EPUB files, JAR files and others.

func (*MIME) String

func (m *MIME) String() string

String returns the string representation of the MIME type, e.g., "application/zip".

Directories

Path          Synopsis
internal
  json        Package json provides a JSON value parser state machine.
  matchers    Package matchers holds the matching functions used to find MIME types.
