compressor

package module
v0.0.9 Latest
Published: Jan 30, 2023 License: Apache-2.0 Imports: 36 Imported by: 0

README

compressor

Easy creation and extraction of archives, as well as compression and decompression of files of different formats

Documentation

Index

Constants

const (
	// Additional compression methods not offered by archive/zip.
	ZipMethodBzip2 = 12
	ZipMethodLzma  = 14
	ZipMethodZstd  = 93
	ZipMethodXz    = 95
)

Variables

var ZlibHeader = []byte{0x78}
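To illustrate why a single leading byte can serve as a zlib marker, the sketch below compresses a short input with the standard library's compress/zlib and inspects the first byte of the result. It does not use this package; the helper name firstZlibByte is hypothetical.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
)

// firstZlibByte (hypothetical helper) compresses data with the standard
// library and returns the first byte of the resulting zlib stream.
func firstZlibByte(data []byte) (byte, error) {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	if _, err := zw.Write(data); err != nil {
		return 0, err
	}
	// Close flushes the stream; the header is written by this point.
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Bytes()[0], nil
}

func main() {
	b, err := firstZlibByte([]byte("hello"))
	if err != nil {
		panic(err)
	}
	// The first byte is the zlib CMF byte (deflate, 32 KiB window).
	fmt.Printf("first byte: 0x%02x\n", b)
}
```

A one-byte check is weak evidence on its own, so a real matcher would likely combine it with the filename or with further header validation.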

Functions

func FileSystem

func FileSystem(ctx context.Context, root string) (fs.FS, error)

FileSystem opens the file at root as a read-only file system. The root may be a path to a directory, an archive file, a compressed archive file, a compressed file, or any other file on disk. If root is a directory, its contents are accessed directly from the disk's file system. If root is an archive file, its contents are accessed like a normal directory; compressed archive files are transparently decompressed as their contents are accessed. If root is any other file, it is the only file in the file system; if that file is compressed, it is transparently decompressed as it is read. This function essentially provides uniform read access to different kinds of files: directories, archives, compressed archives, and individual files are all treated the same way. Except for zip files, the returned FS values are guaranteed to implement fs.ReadDirFS and fs.StatFS, and may also implement fs.SubFS.
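Since the returned value is a plain fs.FS, callers can probe for the optional interfaces mentioned above and walk it with the standard io/fs helpers. The sketch below is a minimal illustration using only the standard library: fstest.MapFS stands in for whatever FileSystem would return, and capabilities, listAll, and demoFS are hypothetical helpers.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// capabilities reports which optional io/fs interfaces fsys implements.
func capabilities(fsys fs.FS) (readDir, stat, sub bool) {
	_, readDir = fsys.(fs.ReadDirFS)
	_, stat = fsys.(fs.StatFS)
	_, sub = fsys.(fs.SubFS)
	return
}

// listAll walks fsys from its root and returns every visited path.
func listAll(fsys fs.FS) ([]string, error) {
	var names []string
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		names = append(names, p)
		return nil
	})
	return names, err
}

// demoFS returns an in-memory file system used in place of a real
// directory or archive (an assumption made for this sketch).
func demoFS() fs.FS {
	return fstest.MapFS{"docs/readme.txt": {Data: []byte("hi")}}
}

func main() {
	fsys := demoFS()
	rd, st, sub := capabilities(fsys)
	fmt.Println("ReadDirFS:", rd, "StatFS:", st, "SubFS:", sub)

	names, err := listAll(fsys)
	if err != nil {
		panic(err)
	}
	fmt.Println(names)
}
```

The same two helpers would work unchanged on a directory, an archive, or a single file, which is the point of exposing everything through fs.FS.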

func RegisterFormat

func RegisterFormat(format Format)

RegisterFormat registers the format. It must be called during init. Duplicate formats by name are not allowed and will cause a panic.
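The register-once behavior described above can be sketched with a name-keyed map. This is an illustration only: the types format and named, the registry variable, and the helpers register and tryRegister are all hypothetical, and the package's actual internals may differ.

```go
package main

import "fmt"

// format is a hypothetical stand-in for the Format interface, reduced to
// the one method a name-keyed registry needs.
type format interface{ Name() string }

// named is a trivial format used only for this sketch.
type named string

func (n named) Name() string { return string(n) }

// registry maps format names to formats (assumed layout).
var registry = map[string]format{}

// register adds f and panics if a format with the same name exists.
func register(f format) {
	name := f.Name()
	if _, dup := registry[name]; dup {
		panic("format " + name + " is already registered")
	}
	registry[name] = f
}

// tryRegister reports whether registering f panicked.
func tryRegister(f format) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	register(f)
	return false
}

// Registration happens in init, before main runs.
func init() { register(named(".tar")) }

func main() {
	fmt.Println("duplicate panics:", tryRegister(named(".tar")))
	fmt.Println("new name panics:", tryRegister(named(".demo")))
}
```

Registering in init keeps the registry effectively read-only once the program starts, which avoids the need for locking in this sketch.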

func TopDirOpen

func TopDirOpen(fsys fs.FS, name string) (fs.File, error)

TopDirOpen is a special Open() function that can be useful if a file system root was created by extracting an archive. It first tries the file name as given, but if that returns an error, it tries the name without the first path element. In other words, if "a/b/c" returns an error, "b/c" is tried instead.

Consider an archive that contains a file "a/b/c". When the archive is extracted, its contents may be created without a new parent/root folder to contain them, so the path of the same file outside the archive may lack an exclusive root or parent container. A file system created for the same files extracted to disk is therefore likely to be rooted at one of the top-level files/folders from the archive rather than at a parent folder. For example, the file known as "a/b/c" when rooted at the archive becomes "b/c" after extraction when rooted at the "a" folder on disk (because no new, exclusive top-level folder was created). This difference in paths can make it difficult to use archives and directories uniformly; the TopDir* functions try to smooth over the difference.

Some extraction utilities do create a container folder for the contents of the archive when extracting, in which case the user can give that path as the root. In that case the TopDir* functions are not necessary (but not harmful either). They are primarily useful if you are not sure whether the root is an archive file or an extracted archive, because they work with the same file name/path regardless of whether a top-level directory is present.
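The fallback can be sketched in a few lines over any fs.FS. The function topDirOpen below is a hypothetical re-implementation written for illustration, not the package's source; fstest.MapFS plays the role of an extracted archive whose top-level folder "a" became the root.

```go
package main

import (
	"fmt"
	"io/fs"
	"strings"
	"testing/fstest"
)

// topDirOpen tries name as given and, on error, retries without the
// first path element (so "a/b/c" falls back to "b/c").
func topDirOpen(fsys fs.FS, name string) (fs.File, error) {
	f, err := fsys.Open(name)
	if err == nil {
		return f, nil
	}
	_, rest, found := strings.Cut(name, "/")
	if !found {
		return nil, err // nothing to strip; report the original error
	}
	return fsys.Open(rest)
}

// openable reports whether name can be opened through topDirOpen.
func openable(fsys fs.FS, name string) bool {
	f, err := topDirOpen(fsys, name)
	if err != nil {
		return false
	}
	f.Close()
	return true
}

// extractedFS mimics an archive containing "a/b/c" that was extracted
// so that the folder "a" itself is the file system root.
func extractedFS() fs.FS {
	return fstest.MapFS{"b/c": {Data: []byte("x")}}
}

func main() {
	fsys := extractedFS()
	fmt.Println(openable(fsys, "a/b/c")) // archive-style path
	fmt.Println(openable(fsys, "b/c"))   // on-disk path
	fmt.Println(openable(fsys, "a/x/y")) // missing either way
}
```

Both the archive-style path and the on-disk path resolve to the same file, so calling code does not need to know which kind of root it was given.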

func TopDirReadDir

func TopDirReadDir(fsys fs.FS, name string) ([]fs.DirEntry, error)

TopDirReadDir is like TopDirOpen but for ReadDir.

func TopDirStat

func TopDirStat(fsys fs.FS, name string) (fs.FileInfo, error)

TopDirStat is like TopDirOpen but for Stat.

Types

type Archival

type Archival interface {
	Format
	Archiver
	Extractor
}

Archival is an archival format with both archive and extract methods.

type ArchiveFS

type ArchiveFS struct {
	// set one of these:
	Path   string            // path to the archive file on disk
	Stream *io.SectionReader // stream from which to read archive

	Format  Archival        // the archive format
	Prefix  string          // optional subdirectory in which to root the fs
	Context context.Context // optional
}

ArchiveFS allows accessing an archive (or a compressed archive) using a consistent file system interface. Essentially, it allows traversing and reading the contents of an archive just like any normal directory on disk. The contents of compressed archives are transparently decompressed. A valid ArchiveFS value must set either Path or Stream. If Path is set, a literal file is opened from disk. If Stream is set, new SectionReaders are implicitly created to access the stream, enabling safe, concurrent access.

Because of the Go file system APIs (see the io/fs package), ArchiveFS performs poorly with fs.WalkDir() on archives that contain many files. The fs.WalkDir() API requires listing the contents of each directory in turn, and the only way to guarantee a complete listing of a folder is to traverse the whole archive and build a slice. When this is done for the root of an archive with many files, performance tends toward O(n^2), because the entire archive is traversed for every folder that is enumerated (WalkDir calls ReadDir recursively). If you do not need the contents walked in directory order, prefer calling Extract() on the archive type directly; it performs an O(n) walk of the contents in archive order rather than the slower directory-tree order.

func (ArchiveFS) Open

func (f ArchiveFS) Open(name string) (archiveFile fs.File, err error)

Open opens the named file from the archive. If name is ".", the archive file itself will be opened as a directory file.

func (ArchiveFS) ReadDir

func (f ArchiveFS) ReadDir(name string) ([]fs.DirEntry, error)

ReadDir reads the named directory from within the archive.

func (ArchiveFS) Stat

func (f ArchiveFS) Stat(name string) (fs.FileInfo, error)

Stat stats the named file from within the archive. If name is "." then the archive file itself is statted and treated as a directory file.

func (*ArchiveFS) Sub

func (f *ArchiveFS) Sub(dir string) (fs.FS, error)

Sub returns an FS corresponding to the subtree rooted at dir.

type Archiver

type Archiver interface {
	// Archive writes an archive file to output with the given files.
	// Context cancellation must be honored.
	Archive(ctx context.Context, output io.Writer, files []File) error
}

Archiver can create a new archive.

type ArchiverAsync

type ArchiverAsync interface {
	Archiver

	// Use ArchiveAsync if you cannot pre-assemble a list of all files for the archive.
	// Close the file channel after all files have been sent.
	ArchiveAsync(ctx context.Context, output io.Writer, files <-chan File) error
}

ArchiverAsync is an Archiver that can also create archives asynchronously, pumping files into the channel as they are discovered.
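The channel-based pattern can be sketched with the standard library's archive/tar: a producer goroutine sends entries as it discovers them and closes the channel when done, while the consumer writes each entry to the output. The entry type and the functions archiveAsync and countEntries are hypothetical stand-ins, and context handling is omitted for brevity.

```go
package main

import (
	"archive/tar"
	"bytes"
	"fmt"
	"io"
)

// entry is a hypothetical, simplified stand-in for the File type.
type entry struct {
	Name string
	Data []byte
}

// archiveAsync drains files until the channel is closed, writing each
// entry to a tar stream on output.
func archiveAsync(output io.Writer, files <-chan entry) error {
	tw := tar.NewWriter(output)
	for e := range files {
		hdr := &tar.Header{Name: e.Name, Mode: 0o644, Size: int64(len(e.Data))}
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if _, err := tw.Write(e.Data); err != nil {
			return err
		}
	}
	return tw.Close()
}

// countEntries archives names through a producer goroutine, then reads
// the tar back and returns how many entries it contains.
func countEntries(names []string) (int, error) {
	files := make(chan entry)
	go func() {
		defer close(files) // closing signals that no more files will arrive
		for _, n := range names {
			files <- entry{Name: n, Data: []byte(n)}
		}
	}()

	var buf bytes.Buffer
	if err := archiveAsync(&buf, files); err != nil {
		return 0, err
	}

	tr := tar.NewReader(&buf)
	count := 0
	for {
		_, err := tr.Next()
		if err == io.EOF {
			return count, nil
		}
		if err != nil {
			return count, err
		}
		count++
	}
}

func main() {
	n, err := countEntries([]string{"a.txt", "b.txt", "c.txt"})
	if err != nil {
		panic(err)
	}
	fmt.Println("entries archived:", n)
}
```

A production version would also select on ctx.Done() in both the producer and the consumer so that neither side blocks forever if the other stops early.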

type Brotli

type Brotli struct {
	Quality int
}

Brotli facilitates brotli compression.

func (Brotli) Match

func (br Brotli) Match(filename string, stream io.Reader) (MatchResult, error)

func (Brotli) Name

func (Brotli) Name() string

func (Brotli) OpenReader

func (Brotli) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Brotli) OpenWriter

func (br Brotli) OpenWriter(w io.Writer) (io.WriteCloser, error)

type Bz2

type Bz2 struct {
	CompressionLevel int
}

Bz2 facilitates bzip2 compression.

func (Bz2) Match

func (bz Bz2) Match(filename string, stream io.Reader) (MatchResult, error)

func (Bz2) Name

func (Bz2) Name() string

func (Bz2) OpenReader

func (Bz2) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Bz2) OpenWriter

func (bz Bz2) OpenWriter(w io.Writer) (io.WriteCloser, error)

type CompressedArchive

type CompressedArchive struct {
	Compression
	Archival
}

CompressedArchive combines a compression format on top of an archive format (e.g. "tar.gz") and provides both functionalities in a single type. This ensures that archive functions are wrapped by compressors and decompressors. However, compressed archives have some limitations. For example, files cannot be inserted/appended because of complexities with modifying existing compression state. As this type is intended to compose compression and archive formats, both must be specified for the value to be valid, or its methods will return errors.
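The layering that CompressedArchive describes can be sketched with the standard library alone: a tar writer writes into a gzip writer, and reading reverses the order. This is an illustration of the composition, not the package's implementation; writeTarGz, readTarGz, and roundTrip are hypothetical helpers.

```go
package main

import (
	"archive/tar"
	"bytes"
	"compress/gzip"
	"fmt"
	"io"
)

// writeTarGz writes files (name -> contents) as a gzip-compressed tar.
func writeTarGz(output io.Writer, files map[string]string) error {
	gz := gzip.NewWriter(output) // compression wraps the output
	tw := tar.NewWriter(gz)      // the archive writes into the compressor
	for name, body := range files {
		hdr := &tar.Header{Name: name, Mode: 0o644, Size: int64(len(body))}
		if err := tw.WriteHeader(hdr); err != nil {
			return err
		}
		if _, err := io.WriteString(tw, body); err != nil {
			return err
		}
	}
	// Close inner-to-outer so each layer can flush its trailer.
	if err := tw.Close(); err != nil {
		return err
	}
	return gz.Close()
}

// readTarGz decompresses input and returns the archived files.
func readTarGz(input io.Reader) (map[string]string, error) {
	gz, err := gzip.NewReader(input)
	if err != nil {
		return nil, err
	}
	defer gz.Close()

	tr := tar.NewReader(gz)
	out := map[string]string{}
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		body, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		out[hdr.Name] = string(body)
	}
}

// roundTrip archives files in memory and reads them back.
func roundTrip(files map[string]string) (map[string]string, error) {
	var buf bytes.Buffer
	if err := writeTarGz(&buf, files); err != nil {
		return nil, err
	}
	return readTarGz(&buf)
}

func main() {
	got, err := roundTrip(map[string]string{"a.txt": "alpha"})
	if err != nil {
		panic(err)
	}
	fmt.Println(got["a.txt"])
}
```

Because the whole tar stream passes through one compressor, appending a file later would mean rewriting the compressed stream, which is consistent with the insert/append limitation noted above.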

func (CompressedArchive) Archive

func (caf CompressedArchive) Archive(ctx context.Context, output io.Writer, files []File) error

Archive adds files to the output archive while compressing the result.

func (CompressedArchive) Extract

func (caf CompressedArchive) Extract(ctx context.Context, sourceArchive io.Reader, pathsInArchive []string, handleFile FileHandler) error

Extract reads files out of an archive while decompressing the results.

func (CompressedArchive) Match

func (caf CompressedArchive) Match(filename string, stream io.Reader) (MatchResult, error)

Match matches if the input matches both the compression and archive format.

func (CompressedArchive) Name

func (caf CompressedArchive) Name() string

Name returns a concatenation of the archive format name and the compression format name.

type Compression

type Compression interface {
	Format
	Compressor
	Decompressor
}

Compression is a compression format with both compress and decompress methods.

type Compressor

type Compressor interface {
	// OpenWriter wraps w with a new writer that compresses what is written.
	// The writer must be closed when writing is finished.
	OpenWriter(w io.Writer) (io.WriteCloser, error)
}

Compressor can compress data by wrapping a writer.

type Decompressor

type Decompressor interface {
	// OpenReader wraps r with a new reader that decompresses what is read.
	// The reader must be closed when reading is finished.
	OpenReader(r io.Reader) (io.ReadCloser, error)
}

Decompressor can decompress data by wrapping a reader.
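A type satisfies Compressor and Decompressor simply by providing OpenWriter and OpenReader with the signatures above. The sketch below does this with the standard library's compress/flate; the deflate type and the roundTripString helper are hypothetical and are not formats provided by this package.

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
)

// deflate is a hypothetical format whose methods match the Compressor
// and Decompressor method signatures.
type deflate struct{ Level int }

// OpenWriter wraps w so that everything written is compressed.
func (d deflate) OpenWriter(w io.Writer) (io.WriteCloser, error) {
	fw, err := flate.NewWriter(w, d.Level)
	if err != nil {
		return nil, err
	}
	return fw, nil
}

// OpenReader wraps r so that everything read is decompressed.
func (deflate) OpenReader(r io.Reader) (io.ReadCloser, error) {
	return flate.NewReader(r), nil
}

// roundTripString compresses s, decompresses it, and returns the result.
func roundTripString(s string) (string, error) {
	d := deflate{Level: flate.DefaultCompression}

	var buf bytes.Buffer
	w, err := d.OpenWriter(&buf)
	if err != nil {
		return "", err
	}
	if _, err := io.WriteString(w, s); err != nil {
		return "", err
	}
	// The writer must be closed so the final block is flushed.
	if err := w.Close(); err != nil {
		return "", err
	}

	r, err := d.OpenReader(&buf)
	if err != nil {
		return "", err
	}
	defer r.Close()
	out, err := io.ReadAll(r)
	return string(out), err
}

func main() {
	out, err := roundTripString("hello hello hello")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

Forgetting to close the writer is the most common mistake with this pattern: the output then lacks its final block and decompression fails or returns truncated data.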

type DirFS

type DirFS string

DirFS allows accessing a directory on disk with a consistent file system interface.

func (DirFS) Open

func (f DirFS) Open(name string) (fs.File, error)

Open opens the named file.

func (DirFS) ReadDir

func (f DirFS) ReadDir(name string) ([]fs.DirEntry, error)

ReadDir returns a listing of all the files in the named directory.

func (DirFS) Stat

func (f DirFS) Stat(name string) (fs.FileInfo, error)

Stat returns info about the named file.

func (DirFS) Sub

func (f DirFS) Sub(dir string) (fs.FS, error)

Sub returns an FS corresponding to the subtree rooted at dir.

type Extractor

type Extractor interface {
	// Extract reads the files at pathsInArchive from sourceArchive.
	// If pathsInArchive is nil, all files are extracted without restriction.
	// If pathsInArchive is empty, no files are extracted.
	// If a path refers to a directory, all files within it are extracted.
	// Extracted files are passed to the handleFile callback for processing.
	// Context cancellation must be honored.
	Extract(ctx context.Context, sourceArchive io.Reader, pathsInArchive []string, handleFile FileHandler) error
}

Extractor can extract files from an archive.
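The difference between a nil and an empty pathsInArchive is easy to miss, so here is a minimal sketch of the selection rules as documented in the interface comment. The function shouldExtract is a hypothetical helper written for illustration; the package's own matching may handle more cases.

```go
package main

import (
	"fmt"
	"strings"
)

// shouldExtract sketches the documented rules: a nil list selects
// everything, an empty (non-nil) list selects nothing, and a listed
// path selects itself plus everything beneath it.
func shouldExtract(pathsInArchive []string, name string) bool {
	if pathsInArchive == nil {
		return true
	}
	name = strings.TrimSuffix(name, "/")
	for _, p := range pathsInArchive {
		p = strings.TrimSuffix(p, "/")
		if name == p || strings.HasPrefix(name, p+"/") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(shouldExtract(nil, "a/b"))             // nil: everything
	fmt.Println(shouldExtract([]string{}, "a/b"))      // empty: nothing
	fmt.Println(shouldExtract([]string{"a"}, "a/b/c")) // inside directory "a"
	fmt.Println(shouldExtract([]string{"a"}, "ab/c"))  // different directory
}
```

Comparing against p+"/" rather than p alone prevents "a" from accidentally matching a sibling such as "ab".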

type File

type File struct {
	fs.FileInfo

	// The file header as used/provided by the archive format.
	Header interface{}

	// The path of the file as it appears in the archive.
	FileName string

	// For symbolic and hard links.
	// Not supported by all archive formats.
	LinkTarget string

	// A callback function that opens a file to read its contents.
	// The file must be closed when the reading is finished.
	// Not used for files that have no content (directories and links).
	Open func() (io.ReadCloser, error)
}

File is a file abstraction for interacting with archives.

func FilesFromDisk

func FilesFromDisk(options *FromDiskOptions, filenames map[string]string) (files []File, err error)

FilesFromDisk returns a list of files by walking the directories in the given filenames map. The keys are the names on disk, and the values are their associated names in the archive. Map keys that specify directories on disk are walked and added to the archive recursively, rooted at the named directory. Keys must use the platform's path separator (backslash on Windows; slash everywhere else). For convenience, map keys that end in a separator ('/', or '\' on Windows) enumerate only the contents of the folder, without adding the folder itself to the archive. Map values should typically use slash ('/') as the separator regardless of platform, as most archive formats standardize on that rune as the directory separator for filenames within an archive. For convenience, map values that are the empty string are interpreted as the base name of the file (without path) in the root of the archive, and map values that end in a slash use the base name of the file in the given folder of the archive. The files are gathered according to the settings in options. This function is primarily used when preparing a list of files to add to an archive.
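The rules for map values can be sketched as a small pure function. This covers only how a single top-level entry is named in the archive, not the directory walk or the trailing-separator rule for keys; nameInArchive is a hypothetical helper, not part of the package.

```go
package main

import (
	"fmt"
	"path"
	"path/filepath"
	"strings"
)

// nameInArchive sketches the documented map-value rules:
//   - "" places the file at the archive root under its base name
//   - a value ending in "/" places the base name inside that folder
//   - anything else is used verbatim as the name in the archive
func nameInArchive(diskPath, archiveValue string) string {
	base := filepath.Base(diskPath) // disk paths use the platform separator
	switch {
	case archiveValue == "":
		return base
	case strings.HasSuffix(archiveValue, "/"):
		return path.Join(archiveValue, base) // archive paths use "/"
	default:
		return archiveValue
	}
}

func main() {
	disk := filepath.Join("home", "me", "notes.txt")
	fmt.Println(nameInArchive(disk, ""))
	fmt.Println(nameInArchive(disk, "docs/"))
	fmt.Println(nameInArchive(disk, "renamed.txt"))
}
```

Note the deliberate split between path/filepath for names on disk and path for names in the archive, mirroring the separator guidance above.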

func (File) Stat

func (f File) Stat() (fs.FileInfo, error)

type FileFS

type FileFS struct {
	Path        string       // path to the file on disk
	Compression Decompressor // if file is compressed, setting this field will transparently decompress reads
}

FileFS allows accessing a file on disk using a consistent file system interface. The value should be the path to a regular file, not a directory. This file will be the only entry in the file system and will be at its root. It can be accessed within the file system by the name "." or by its file name. If the file is compressed, set the Compression field so that reads from the file are transparently decompressed.

func (FileFS) Open

func (f FileFS) Open(name string) (fs.File, error)

Open opens the named file, which must be the file used to create the file system.

func (FileFS) ReadDir

func (f FileFS) ReadDir(name string) ([]fs.DirEntry, error)

ReadDir returns a directory listing with the file as the singular entry.

func (FileFS) Stat

func (f FileFS) Stat(name string) (fs.FileInfo, error)

Stat stats the named file, which must be the file used to create the file system.

type FileHandler

type FileHandler func(ctx context.Context, f File) error

FileHandler is a callback function used to handle files as they are read from an archive; it is similar to fs.WalkDirFunc. Handler functions that open their files must not overlap or run concurrently, because files may be read from the same sequential stream; always close the file before returning. If the special error value fs.SkipDir is returned, the directory of the file (or the file itself, if it is a directory) will not be walked. Note that because the contents of an archive are not necessarily ordered, skipping directories requires memory, and skipping a large number of directories can consume a lot of memory. Any other returned error terminates the walk.
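The reason skipping directories costs memory can be seen in a small sketch: because entries may arrive in any order, the walker has to remember every skipped directory for the rest of the walk. The handler type and the functions walkEntries, underAny, and demoWalk are hypothetical simplifications that use entry names instead of the File type.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"path"
	"strings"
)

// handler is a simplified stand-in for FileHandler.
type handler func(name string, isDir bool) error

// underAny reports whether name is one of dirs or lies beneath one.
func underAny(dirs []string, name string) bool {
	for _, d := range dirs {
		if d == "." || name == d || strings.HasPrefix(name, d+"/") {
			return true
		}
	}
	return false
}

// walkEntries calls handle for each entry in archive order and honors
// fs.SkipDir by remembering skipped directories until the walk ends.
func walkEntries(entries []string, handle handler) ([]string, error) {
	var visited, skipped []string // skipped grows with every SkipDir
	for _, name := range entries {
		isDir := strings.HasSuffix(name, "/")
		clean := strings.TrimSuffix(name, "/")
		if underAny(skipped, clean) {
			continue
		}
		err := handle(clean, isDir)
		if errors.Is(err, fs.SkipDir) {
			dir := clean
			if !isDir {
				dir = path.Dir(clean) // skip the file's directory
			}
			skipped = append(skipped, dir)
			continue
		}
		if err != nil {
			return visited, err // any other error ends the walk
		}
		visited = append(visited, clean)
	}
	return visited, nil
}

// demoWalk skips "docs" in an archive whose entries are not grouped.
func demoWalk() ([]string, error) {
	entries := []string{"docs/", "docs/a.txt", "src/main.go", "docs/b.txt"}
	return walkEntries(entries, func(name string, isDir bool) error {
		if name == "docs" {
			return fs.SkipDir
		}
		return nil
	})
}

func main() {
	visited, err := demoWalk()
	if err != nil {
		panic(err)
	}
	fmt.Println(visited)
}
```

In the demo, "docs/b.txt" appears after an unrelated entry, which is why the skip list cannot be discarded as soon as the walker leaves the directory.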

type Format

type Format interface {
	// Name returns the name of the format.
	Name() string

	// Match returns true if the given name/stream is recognized. One of the arguments is optional:
	// filename may be empty when working with an unnamed stream,
	// or stream may be empty when working with only a filename.
	// The filename should consist only of the base name, not a path component,
	// and is typically used for matching by file extension.
	// However, matching by reading the stream is preferred.
	// Match reads only as many bytes as needed to determine a match.
	// To preserve the stream through matching, either buffer what Match reads
	// or seek back to the position the stream was at before Match was called.
	Match(filename string, stream io.Reader) (MatchResult, error)
}

Format represents either an archive or a compression format.

func Identify

func Identify(filename string, stream io.Reader) (Format, io.Reader, error)

Identify iterates over the registered formats and returns the one that matches the given filename and/or stream. It can identify compressed files (.gz, .xz...), archive files (.tar, .zip...), and compressed archive files (tar.gz, tar.bz2...). The returned Format value can be type-asserted to determine its capabilities. If no matching format is found, the error fmt.Errorf("no formats matched") is returned. The returned io.Reader is always non-nil and reads from the same point as the reader that was passed in; use it in place of the input stream after calling Identify(), because it preserves and re-reads the bytes that were consumed during identification.
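The "use the returned reader" rule exists because identification consumes bytes from the stream. One common way to give them back, sketched below with the standard library, is to buffer the header and chain it in front of the remaining stream with io.MultiReader. The functions peek and peekAndReadAll are hypothetical; the package's internal mechanism may differ.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// peek reads up to n bytes from stream and returns them together with a
// reader that replays those bytes before continuing with the rest.
func peek(stream io.Reader, n int) ([]byte, io.Reader, error) {
	head := make([]byte, n)
	read, err := io.ReadFull(stream, head)
	// A short stream is not an error here; we simply got fewer bytes.
	if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
		return nil, stream, err
	}
	head = head[:read]
	return head, io.MultiReader(bytes.NewReader(head), stream), nil
}

// peekAndReadAll peeks n bytes of s, then reads everything from the
// returned reader to show that no bytes were lost.
func peekAndReadAll(s string, n int) (head, all string, err error) {
	h, r, err := peek(strings.NewReader(s), n)
	if err != nil {
		return "", "", err
	}
	rest, err := io.ReadAll(r)
	return string(h), string(rest), err
}

func main() {
	head, all, err := peekAndReadAll("hello world", 5)
	if err != nil {
		panic(err)
	}
	fmt.Printf("peeked %q, full stream still reads %q\n", head, all)
}
```

Continuing to read from the original stream instead of the returned reader would silently drop the peeked header bytes.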

type FromDiskOptions

type FromDiskOptions struct {
	// If true, symbolic links will be dereferenced,
	// that is, the link will not be added as a link,
	// but what the link points to will be added as a file.
	FollowSymboliclinks bool

	// If true, some file attributes will not be preserved.
	// Name, size, type, and permissions will still be preserved.
	ClearAttributes bool
}

FromDiskOptions specifies options for gathering files from the disk.

type Gz

type Gz struct {
	// Gzip compression level.
	// If 0, DefaultCompression is assumed rather than no compression.
	CompressionLevel int

	// Use a fast parallel Gzip implementation.
	// This is effective only for large streams (about 1 MB or more).
	Multithreaded bool
}

Gz facilitates gzip compression.

func (Gz) Match

func (gz Gz) Match(filename string, stream io.Reader) (MatchResult, error)

func (Gz) Name

func (Gz) Name() string

func (Gz) OpenReader

func (gz Gz) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Gz) OpenWriter

func (gz Gz) OpenWriter(w io.Writer) (io.WriteCloser, error)

type Inserter

type Inserter interface {
	// Context cancellation must be honored.
	Insert(ctx context.Context, archive io.ReadWriteSeeker, files []File) error
}

Inserter can insert files into an existing archive.

type Lz4

type Lz4 struct {
	CompressionLevel int
}

Lz4 facilitates LZ4 compression.

func (Lz4) Match

func (lz Lz4) Match(filename string, stream io.Reader) (MatchResult, error)

func (Lz4) Name

func (Lz4) Name() string

func (Lz4) OpenReader

func (Lz4) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Lz4) OpenWriter

func (lz Lz4) OpenWriter(w io.Writer) (io.WriteCloser, error)

type MatchResult

type MatchResult struct {
	ByName,
	ByStream bool
}

MatchResult reports whether a format was matched by name, by stream, or by both. Matching by name usually means matching by file extension, and matching by stream usually means reading the first few bytes of the stream (its header). A stream match is generally more reliable, because filenames do not always indicate the contents of a file, if a filename exists at all.

func (MatchResult) Matched

func (mr MatchResult) Matched() bool

Matched returns true if a match was made by either name or stream.
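To make the two kinds of match concrete, the sketch below checks for gzip both by extension and by the two-byte gzip magic number (0x1f 0x8b). The matchResult type mirrors the shape of MatchResult, while matchGzip and matchBytes are hypothetical helpers and not the package's own Gz.Match.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

// matchResult mirrors the shape of MatchResult for this sketch.
type matchResult struct {
	ByName, ByStream bool
}

// matched reports whether either kind of match succeeded.
func (mr matchResult) matched() bool { return mr.ByName || mr.ByStream }

// gzipMagic is the two-byte signature at the start of a gzip stream.
var gzipMagic = []byte{0x1f, 0x8b}

// matchGzip checks the extension and, if a stream is given, its header.
func matchGzip(filename string, stream io.Reader) (matchResult, error) {
	var mr matchResult
	if strings.HasSuffix(strings.ToLower(filename), ".gz") {
		mr.ByName = true
	}
	if stream != nil {
		head := make([]byte, len(gzipMagic))
		n, err := io.ReadFull(stream, head)
		if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
			return mr, err
		}
		mr.ByStream = bytes.Equal(head[:n], gzipMagic)
	}
	return mr, nil
}

// matchBytes runs matchGzip against an in-memory stream.
func matchBytes(filename string, data []byte) (byName, byStream bool) {
	mr, _ := matchGzip(filename, bytes.NewReader(data))
	return mr.ByName, mr.ByStream
}

func main() {
	// A gzip stream with a misleading name still matches by stream.
	mr, err := matchGzip("notes.txt", bytes.NewReader([]byte{0x1f, 0x8b, 0x08}))
	if err != nil {
		panic(err)
	}
	fmt.Println(mr.ByName, mr.ByStream, mr.matched())
}
```

Keeping the two results separate lets a caller prefer a stream match over a name match when they disagree.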

type Rar

type Rar struct {
	// If true, errors that occurred while reading or writing a file in the archive
	// will be logged and the operation will continue for the remaining files.
	ContinueOnError bool

	// Password to open archives.
	Password string
}

func (Rar) Archive

func (r Rar) Archive(_ context.Context, _ io.Writer, _ []File) error

Archive is not implemented for RAR, but the method exists so that Rar satisfies the Archival interface.

func (Rar) Extract

func (r Rar) Extract(ctx context.Context, sourceArchive io.Reader, pathsInArchive []string, handleFile FileHandler) error

func (Rar) Match

func (r Rar) Match(filename string, stream io.Reader) (MatchResult, error)

func (Rar) Name

func (Rar) Name() string

type SevenZip

type SevenZip struct {
	// If true, errors that occurred while reading or writing a file in the archive
	// will be logged and the operation will continue for the remaining files.
	ContinueOnError bool

	// The password, if dealing with an encrypted archive.
	Password string
}

func (SevenZip) Archive

func (z SevenZip) Archive(_ context.Context, _ io.Writer, _ []File) error

Archive is not implemented for 7z, but the method exists so that SevenZip satisfies the Archival interface.

func (SevenZip) Extract

func (z SevenZip) Extract(ctx context.Context, sourceArchive io.Reader, pathsInArchive []string, handleFile FileHandler) error

Extract extracts files from z, implementing the Extractor interface. sourceArchive must also implement io.ReaderAt and io.Seeker, which, oddly enough, are disjoint from io.Reader, the type the method signature requires. This signature was chosen for the interface because anything that supports ReadAt() or Seek() can generally also be Read() from. Because of the nature of the 7z archive format, an error is returned if sourceArchive is not an io.Seeker and io.ReaderAt.

func (SevenZip) Match

func (z SevenZip) Match(filename string, stream io.Reader) (MatchResult, error)

func (SevenZip) Name

func (z SevenZip) Name() string

type Sz

type Sz struct{}

Sz facilitates Snappy compression.

func (Sz) Match

func (sz Sz) Match(filename string, stream io.Reader) (MatchResult, error)

func (Sz) Name

func (sz Sz) Name() string

func (Sz) OpenReader

func (Sz) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Sz) OpenWriter

func (Sz) OpenWriter(w io.Writer) (io.WriteCloser, error)

type Tar

type Tar struct {
	// If true, errors that occurred while reading or writing a file in the archive
	// will be logged and the operation will continue for the remaining files.
	ContinueOnError bool
}

func (Tar) Archive

func (t Tar) Archive(ctx context.Context, output io.Writer, files []File) error

func (Tar) ArchiveAsync

func (t Tar) ArchiveAsync(ctx context.Context, output io.Writer, files <-chan File) error

func (Tar) Extract

func (t Tar) Extract(ctx context.Context, sourceArchive io.Reader, pathsInArchive []string, handleFile FileHandler) error

func (Tar) Insert

func (t Tar) Insert(ctx context.Context, into io.ReadWriteSeeker, files []File) error

func (Tar) Match

func (t Tar) Match(filename string, stream io.Reader) (MatchResult, error)

func (Tar) Name

func (Tar) Name() string

type Xz

type Xz struct{}

Xz facilitates xz compression.

func (Xz) Match

func (x Xz) Match(filename string, stream io.Reader) (MatchResult, error)

func (Xz) Name

func (Xz) Name() string

func (Xz) OpenReader

func (Xz) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Xz) OpenWriter

func (Xz) OpenWriter(w io.Writer) (io.WriteCloser, error)

type Zip

type Zip struct {
	// Only compress files which are not already in a compressed format.
	SelectiveCompression bool

	// Method or algorithm for compressing stored files.
	Compression uint16

	// If true, errors that occurred while reading or writing a file in the archive
	// will be logged and the operation will continue for the remaining files.
	ContinueOnError bool

	// Encoding for files in zip archives whose names and comments are not UTF-8 encoded.
	TextEncoding string
}

func (Zip) Archive

func (z Zip) Archive(ctx context.Context, output io.Writer, files []File) error

func (Zip) ArchiveAsync

func (z Zip) ArchiveAsync(ctx context.Context, output io.Writer, files <-chan File) error

func (Zip) Extract

func (z Zip) Extract(ctx context.Context, sourceArchive io.Reader, pathsInArchive []string, handleFile FileHandler) error

Extract extracts files from z, implementing the Extractor interface. sourceArchive must also implement io.ReaderAt and io.Seeker, which, oddly enough, are disjoint from io.Reader, the type the method signature requires. This signature was chosen for the interface because anything that supports ReadAt() or Seek() can generally also be Read() from. Because of the nature of the zip archive format, an error is returned if sourceArchive is not an io.Seeker and io.ReaderAt.
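The requirement can be sketched as a type assertion on the incoming io.Reader, followed by the standard library's archive/zip reader, which needs random access and the total size. The interface seekableReaderAt and the functions listZip, buildZip, and listFromBytes are hypothetical names used only in this illustration.

```go
package main

import (
	"archive/zip"
	"bytes"
	"errors"
	"fmt"
	"io"
)

// seekableReaderAt combines the two interfaces random access needs.
type seekableReaderAt interface {
	io.ReaderAt
	io.Seeker
}

// listZip accepts an io.Reader, mirroring the Extract signature, but
// fails unless the value also supports ReadAt and Seek.
func listZip(sourceArchive io.Reader) ([]string, error) {
	sra, ok := sourceArchive.(seekableReaderAt)
	if !ok {
		return nil, errors.New("input must implement io.ReaderAt and io.Seeker")
	}
	// Seeking to the end yields the size that zip.NewReader requires.
	size, err := sra.Seek(0, io.SeekEnd)
	if err != nil {
		return nil, err
	}
	zr, err := zip.NewReader(sra, size)
	if err != nil {
		return nil, err
	}
	var names []string
	for _, f := range zr.File {
		names = append(names, f.Name)
	}
	return names, nil
}

// buildZip creates a small in-memory zip archive for the demo.
func buildZip(names ...string) ([]byte, error) {
	var buf bytes.Buffer
	zw := zip.NewWriter(&buf)
	for _, n := range names {
		w, err := zw.Create(n)
		if err != nil {
			return nil, err
		}
		if _, err := w.Write([]byte(n)); err != nil {
			return nil, err
		}
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// listFromBytes lists data via a seekable or a non-seekable reader.
func listFromBytes(data []byte, seekable bool) ([]string, error) {
	if seekable {
		return listZip(bytes.NewReader(data)) // has ReadAt and Seek
	}
	return listZip(bytes.NewBuffer(data)) // *bytes.Buffer has neither
}

func main() {
	data, err := buildZip("a.txt", "b.txt")
	if err != nil {
		panic(err)
	}
	names, err := listFromBytes(data, true)
	fmt.Println(names, err)
	_, err = listFromBytes(data, false)
	fmt.Println("non-seekable input:", err)
}
```

In practice this means an *os.File or a bytes.Reader works as input, whereas a network stream or a pipe would first need to be buffered to memory or disk.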

func (Zip) Match

func (z Zip) Match(filename string, stream io.Reader) (MatchResult, error)

func (Zip) Name

func (z Zip) Name() string

type Zlib

type Zlib struct {
	CompressionLevel int
}

Zlib facilitates zlib compression.

func (Zlib) Match

func (zz Zlib) Match(filename string, stream io.Reader) (MatchResult, error)

func (Zlib) Name

func (Zlib) Name() string

func (Zlib) OpenReader

func (Zlib) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Zlib) OpenWriter

func (zz Zlib) OpenWriter(w io.Writer) (io.WriteCloser, error)

type Zstd

type Zstd struct {
	EncoderOptions []zstd.EOption
	DecoderOptions []zstd.DOption
}

Zstd facilitates Zstandard compression.

func (Zstd) Match

func (zs Zstd) Match(filename string, stream io.Reader) (MatchResult, error)

func (Zstd) Name

func (Zstd) Name() string

func (Zstd) OpenReader

func (zs Zstd) OpenReader(r io.Reader) (io.ReadCloser, error)

func (Zstd) OpenWriter

func (zs Zstd) OpenWriter(w io.Writer) (io.WriteCloser, error)

Directories

Path Synopsis
test
src
