mimetype


v1.2.2 · Published: Apr 5, 2021 · License: MIT · Imports: 6 · Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/dpnwr/mimetype

README

mimetype

A package for detecting MIME types and extensions based on magic numbers

No C bindings, zero dependencies and thread safe

Features

fast and precise MIME type and file extension detection
long list of supported MIME types
common file formats are prioritized
small and simple API
handles MIME type aliases
thread safe
low memory usage (only the file header is buffered)

Install

go get github.com/gabriel-vasile/mimetype

Usage

// data is a []byte and r is an io.Reader.
mime := mimetype.Detect(data)
// OR
mime, err := mimetype.DetectReader(r)
// OR
mime, err := mimetype.DetectFile("/path/to/file")
fmt.Println(mime.String(), mime.Extension())

See the runnable Go Playground examples.

Structure

mimetype uses a hierarchical structure to organize the MIME type detection logic. This reduces the number of checks needed to detect the file type. The reason behind this choice is that some file formats are used as containers for other file formats. For example, Microsoft Office files are zip archives containing specific metadata files. Once a file has been identified as a zip, there is no need to check whether it is a text file, but it is worth checking whether it is a Microsoft Office file.
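The idea of a detection hierarchy can be sketched as a tree in which a node's children are tried only after the node itself matched. The sketch below is an illustration under stated assumptions, not mimetype's internal code: the `node` type, `detect`, and `buildTree` are hypothetical names, and the docx check is a deliberately crude placeholder that only runs once the zip signature has matched.

```go
package main

import (
	"bytes"
	"fmt"
)

// node is a hypothetical detection node: a MIME type, a matcher for its
// signature, and more specific formats that are tried only after it matched.
type node struct {
	mime     string
	match    func([]byte) bool
	children []*node
}

// detect returns the most specific MIME type whose chain of matchers
// accepts in. Children are checked only when their parent matched, so a
// zip archive is never tested against unrelated formats.
func (n *node) detect(in []byte) string {
	for _, c := range n.children {
		if c.match(in) {
			return c.detect(in)
		}
	}
	return n.mime
}

// buildTree wires a tiny illustrative hierarchy. The docx check is a crude
// placeholder; real Office detection inspects zip entries more carefully.
func buildTree() *node {
	docx := &node{
		mime:  "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
		match: func(b []byte) bool { return bytes.Contains(b, []byte("word/")) },
	}
	zip := &node{
		mime:     "application/zip",
		match:    func(b []byte) bool { return bytes.HasPrefix(b, []byte("PK\x03\x04")) },
		children: []*node{docx},
	}
	png := &node{
		mime:  "image/png",
		match: func(b []byte) bool { return bytes.HasPrefix(b, []byte("\x89PNG\r\n\x1a\n")) },
	}
	// The root has no matcher of its own; it is the fallback when no child matches.
	return &node{mime: "application/octet-stream", children: []*node{png, zip}}
}

func main() {
	root := buildTree()
	fmt.Println(root.detect([]byte("PK\x03\x04...word/document.xml")))
	fmt.Println(root.detect([]byte("PK\x03\x04...data.bin")))
	fmt.Println(root.detect([]byte("no known signature")))
}
```

Because the docx matcher sits under the zip node, a plain text file that happens to contain "word/" is never classified as docx.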

To avoid loading entire files into memory, when detecting from a reader or a file, mimetype reads only the header of the input.

Performance

Thanks to the hierarchical structure, searching for common formats first, and reading only file headers, mimetype performs comparably to the standard library's http.DetectContentType (faster on most formats below, slower on zip) and outperforms the filetype package on every format below.

Benchmarks were run on an Intel Xeon Gold 6136 24-core CPU @ 3.00GHz. Lower is better.

                            mimetype  http.DetectContentType      filetype
BenchmarkMatchTar-24       250 ns/op         400 ns/op           3778 ns/op
BenchmarkMatchZip-24       524 ns/op         351 ns/op           4884 ns/op
BenchmarkMatchJpeg-24      103 ns/op         228 ns/op            839 ns/op
BenchmarkMatchGif-24       139 ns/op         202 ns/op            751 ns/op
BenchmarkMatchPng-24       165 ns/op         221 ns/op           1176 ns/op

Contributing

See CONTRIBUTING.md.

Documentation

Overview

Package mimetype uses magic number signatures to detect the MIME type of a file.

Example (Detect)

package main

import (
	"bytes"
	"fmt"
	"os"

	"github.com/gabriel-vasile/mimetype"
)

func main() {
	testBytes := []byte("This random text has a MIME type of text/plain; charset=utf-8.")

	mime := mimetype.Detect(testBytes)
	fmt.Println(mime.Is("text/plain"), mime.String(), mime.Extension())

	mime, err := mimetype.DetectReader(bytes.NewReader(testBytes))
	fmt.Println(mime.Is("text/plain"), mime.String(), mime.Extension(), err)

	mime, err = mimetype.DetectFile("a nonexistent file")
	fmt.Println(mime.Is("application/octet-stream"), mime.String(), os.IsNotExist(err))
}

Output:

true text/plain; charset=utf-8 .txt
true text/plain; charset=utf-8 .txt <nil>
true application/octet-stream true

Example (DetectReader)

Pure io.Readers (meaning those without a Seek method) cannot be read twice. This means that once DetectReader has been called on an io.Reader, that reader is missing the bytes representing the header of the file. To detect the MIME type and then reuse the input, use a buffer, io.TeeReader, and io.MultiReader to create a new reader containing the original, unaltered data.

If the input is an io.ReadSeeker instead, call input.Seek(0, io.SeekStart) before reusing it.

package main

import (
	"bytes"
	"fmt"
	"io"
	"io/ioutil"

	"github.com/gabriel-vasile/mimetype"
)

// Pure io.Readers (meaning those without a Seek method) cannot be read twice.
// This means that once DetectReader has been called on an io.Reader, that reader
// is missing the bytes representing the header of the file.
// To detect the MIME type and then reuse the input, use a buffer, io.TeeReader,
// and io.MultiReader to create a new reader containing the original, unaltered data.
//
// If the input is an io.ReadSeeker instead, call input.Seek(0, io.SeekStart)
// before reusing it.
func main() {
	testBytes := []byte("This random text has a MIME type of text/plain; charset=utf-8.")
	input := bytes.NewReader(testBytes)

	mime, recycledInput, err := recycleReader(input)

	// Verify recycledInput contains the original input.
	text, _ := ioutil.ReadAll(recycledInput)
	fmt.Println(mime, bytes.Equal(testBytes, text), err)
}

// recycleReader returns the MIME type of input and a new reader
// containing the whole data from input.
func recycleReader(input io.Reader) (mimeType string, recycled io.Reader, err error) {
	// header will store the bytes mimetype uses for detection.
	header := bytes.NewBuffer(nil)

	// After DetectReader, the data read from input is copied into header.
	mime, err := mimetype.DetectReader(io.TeeReader(input, header))

	// Concatenate back the header to the rest of the file.
	// recycled now contains the complete, original data.
	recycled = io.MultiReader(header, input)

	return mime.String(), recycled, err
}

Output:

text/plain; charset=utf-8 true <nil>

Example (TextVsBinary)

Considering the definition of a binary file as "a computer file that is not a text file", binary and text files can be differentiated by searching for the text/plain MIME type in their MIME hierarchy.

package main

import (
	"fmt"

	"github.com/gabriel-vasile/mimetype"
)

func main() {
	testBytes := []byte("This random text has a MIME type of text/plain; charset=utf-8.")
	detectedMIME := mimetype.Detect(testBytes)

	isBinary := true
	for mime := detectedMIME; mime != nil; mime = mime.Parent() {
		if mime.Is("text/plain") {
			isBinary = false
		}
	}

	fmt.Println(isBinary, detectedMIME)
}

Output:

false text/plain; charset=utf-8

Example (Whitelist)

package main

import (
	"fmt"

	"github.com/gabriel-vasile/mimetype"
)

func main() {
	testBytes := []byte("This random text has a MIME type of text/plain; charset=utf-8.")
	allowed := []string{"text/plain", "application/zip", "application/pdf"}
	mime := mimetype.Detect(testBytes)

	if mimetype.EqualsAny(mime.String(), allowed...) {
		fmt.Printf("%s is allowed\n", mime)
	} else {
		fmt.Printf("%s is not allowed\n", mime)
	}
}

Output:

text/plain; charset=utf-8 is allowed

Index

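func EqualsAny(s string, mimes ...string) bool
func SetLimit(limit uint32)
type MIME
    func Detect(in []byte) *MIME
    func DetectFile(file string) (*MIME, error)
    func DetectReader(r io.Reader) (*MIME, error)
    func (m *MIME) Extension() string
    func (m *MIME) Is(expectedMIME string) bool
    func (m *MIME) Parent() *MIME
    func (m *MIME) String() string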

Functions

func EqualsAny

func EqualsAny(s string, mimes ...string) bool

EqualsAny reports whether the MIME type s is equal to any of the MIME types in mimes. The equality test compares only the "type/subtype" part, ignores optional MIME parameters and leading or trailing whitespace, and is case-insensitive.
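To make the comparison rule concrete, here is an illustrative re-implementation of the documented semantics using only the standard library. `equalsAny` and `essence` are hypothetical names, not the library's source; in real code you would call mimetype.EqualsAny directly.

```go
package main

import (
	"fmt"
	"strings"
)

// equalsAny reports whether s matches any entry in mimes, following the
// documented rule: compare "type/subtype" only, drop parameters, trim
// whitespace, and ignore case. Illustrative sketch, not mimetype's code.
func equalsAny(s string, mimes ...string) bool {
	s = essence(s)
	for _, m := range mimes {
		if essence(m) == s {
			return true
		}
	}
	return false
}

// essence strips everything from the first ';' on, trims surrounding
// whitespace, and lowercases the rest.
func essence(m string) string {
	if i := strings.IndexByte(m, ';'); i != -1 {
		m = m[:i]
	}
	return strings.ToLower(strings.TrimSpace(m))
}

func main() {
	fmt.Println(equalsAny(" Text/Plain; charset=utf-8", "application/pdf", "text/plain")) // true
	fmt.Println(equalsAny("text/html", "text/plain"))                                     // false
}
```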

func SetLimit

func SetLimit(limit uint32)

SetLimit sets the maximum number of bytes read from input when detecting the MIME type. Increasing the limit improves detection of file formats that store their magic numbers towards the end of the file, such as docx, pptx, and xlsx. A limit of 0 means the whole input is used.

Types

type MIME

type MIME struct {
	// contains filtered or unexported fields
}

MIME struct holds information about a file format: the string representation of the MIME type, the extension and the parent file format.
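Since the MIME struct's fields are unexported, the following sketch uses a hypothetical `mimeNode` type holding the three pieces of information listed above, and walks the parent chain the same way the TextVsBinary example does. The chain shown (jar under zip under application/octet-stream) is assumed for illustration and is not taken from the library's tables.

```go
package main

import "fmt"

// mimeNode is a hypothetical stand-in for MIME: the MIME string, the
// extension, and the parent format (nil for the root).
type mimeNode struct {
	mime      string
	extension string
	parent    *mimeNode
}

func (m *mimeNode) String() string    { return m.mime }
func (m *mimeNode) Extension() string { return m.extension }
func (m *mimeNode) Parent() *mimeNode { return m.parent }

// ancestry lists m and all of its parents, most specific first.
func ancestry(m *mimeNode) []string {
	var chain []string
	for ; m != nil; m = m.Parent() {
		chain = append(chain, m.String())
	}
	return chain
}

func main() {
	// Assumed root and hierarchy, for this sketch only.
	root := &mimeNode{mime: "application/octet-stream"}
	zip := &mimeNode{mime: "application/zip", extension: ".zip", parent: root}
	jar := &mimeNode{mime: "application/jar", extension: ".jar", parent: zip}

	fmt.Println(jar.Extension(), ancestry(jar))
	// .jar [application/jar application/zip application/octet-stream]
}
```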

func Detect

func Detect(in []byte) *MIME

Detect returns the MIME type found from the provided byte slice.

The result is always a valid MIME type, with application/octet-stream returned when identification failed.

func DetectFile

func DetectFile(file string) (*MIME, error)

DetectFile returns the MIME type of the provided file.

The result is always a valid MIME type, with application/octet-stream returned when identification failed, with or without an error. Any error returned is related to opening and reading from the input file.

func DetectReader

func DetectReader(r io.Reader) (*MIME, error)

DetectReader returns the MIME type of the provided reader.

The result is always a valid MIME type, with application/octet-stream returned when identification failed, with or without an error. Any error returned is related to reading from the input reader.

DetectReader assumes the reader offset is at the start. If the input is an io.ReadSeeker you previously read from, it should be rewound before detection:

reader.Seek(0, io.SeekStart)

func (*MIME) Extension

func (m *MIME) Extension() string

Extension returns the file extension associated with the MIME type. It includes the leading dot, as in ".html". When the file format does not have an extension, the empty string is returned.

func (*MIME) Is

func (m *MIME) Is(expectedMIME string) bool

Is checks whether this MIME type, or any of its aliases, is equal to the expected MIME type. The equality test compares only the "type/subtype" part, ignores optional MIME parameters and leading or trailing whitespace, and is case-insensitive.
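What distinguishes Is from a plain string comparison is the alias lookup. The sketch below illustrates that rule with a hypothetical `mimeType` type and `is` method; it is not the library's implementation, and the application/xml alias for text/xml is chosen purely for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// mimeType is a hypothetical type with a canonical MIME string and aliases.
type mimeType struct {
	mime    string
	aliases []string
}

// is reports whether expected matches m's MIME or any alias, comparing only
// "type/subtype", ignoring parameters, whitespace, and case.
func (m *mimeType) is(expected string) bool {
	expected = essence(expected)
	if essence(m.mime) == expected {
		return true
	}
	for _, a := range m.aliases {
		if essence(a) == expected {
			return true
		}
	}
	return false
}

// essence drops parameters, trims whitespace, and lowercases.
func essence(s string) string {
	if i := strings.IndexByte(s, ';'); i != -1 {
		s = s[:i]
	}
	return strings.ToLower(strings.TrimSpace(s))
}

func main() {
	// Alias chosen for illustration only.
	xml := &mimeType{mime: "text/xml; charset=utf-8", aliases: []string{"application/xml"}}
	fmt.Println(xml.is("application/xml"), xml.is("TEXT/XML"), xml.is("text/plain"))
	// true true false
}
```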

func (*MIME) Parent

func (m *MIME) Parent() *MIME

Parent returns the parent MIME type from the hierarchy. Each MIME type has a non-nil parent, except for the root MIME type.

For example, the application/json and text/html MIME types have text/plain as their parent because they are text files that happen to contain JSON or HTML. Another example is the ZIP format, which is used as a container for Microsoft Office files, EPUB files, JAR files, and others.

func (*MIME) String

func (m *MIME) String() string

String returns the string representation of the MIME type, e.g., "application/zip".


Directories

Path	Synopsis
internal/json	Package json provides a JSON value parser state machine.
internal/matchers	Package matchers holds the matching functions used to find MIME types.
