fastzip

Version: v0.1.11
Published: Jun 9, 2023 License: MIT Imports: 24 Imported by: 9

README

Fastzip is an opinionated Zip archiver and extractor with a focus on speed.

  • Archiving and extraction of files and directories can only occur within a specified directory.
  • Permissions, ownership (uid, gid on linux/unix) and modification times are preserved.
  • Buffers used for copying files are recycled to reduce allocations.
  • Files are archived and extracted concurrently.
  • By default, the excellent github.com/klauspost/compress/flate library is used for compression and decompression.
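
The first point, confining work to a specified directory, can be illustrated with a small path check. This is a minimal sketch using only the standard library; withinRoot is a hypothetical helper, not fastzip's implementation, and it compares paths lexically without resolving symlinks.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// withinRoot reports whether target lies inside root after lexical cleaning.
// Hypothetical helper for illustration; symlinks are not resolved.
func withinRoot(root, target string) (bool, error) {
	rel, err := filepath.Rel(root, target)
	if err != nil {
		return false, err
	}
	// A relative path that starts with ".." climbs out of root.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return false, nil
	}
	return true, nil
}

func main() {
	for _, p := range []string{
		"/data/root/a/b.txt",
		"/data/root/../etc/passwd",
		"/data/rootx/file",
	} {
		ok, err := withinRoot("/data/root", p)
		fmt.Println(p, ok, err)
	}
}
```

A production check would likely also need to consider symlinks that point outside the root.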

Example

Archiver
// Create archive file
w, err := os.Create("archive.zip")
if err != nil {
  panic(err)
}
defer w.Close()

// Create new Archiver. Go does not expand "~", so substitute a real
// directory path here.
a, err := fastzip.NewArchiver(w, "~/fastzip-archiving")
if err != nil {
  panic(err)
}
defer a.Close()

// Register a non-default level compressor if required
// a.RegisterCompressor(zip.Deflate, fastzip.FlateCompressor(1))

// Walk directory, adding the files we want to add
files := make(map[string]os.FileInfo)
err = filepath.Walk("~/fastzip-archiving", func(pathname string, info os.FileInfo, err error) error {
	// Stop on walk errors; info may be nil when err is set
	if err != nil {
		return err
	}
	files[pathname] = info
	return nil
})
if err != nil {
  panic(err)
}

// Archive
if err = a.Archive(context.Background(), files); err != nil {
  panic(err)
}

Extractor
// Create new extractor (Go does not expand "~"; use a real path)
e, err := fastzip.NewExtractor("archive.zip", "~/fastzip-extraction")
if err != nil {
  panic(err)
}
defer e.Close()

// Extract archive files
if err = e.Extract(context.Background()); err != nil {
  panic(err)
}

Benchmarks

The benchmarks archive and extract a Go 1.13 GOROOT directory (342M, 10308 files).

StandardFlate uses the standard library's compress/flate; NonStandardFlate uses klauspost/compress/flate. Both run at level 5. The benchmarks were run on a 24-core server with an SSD. Each test was run with the WithArchiverConcurrency and WithExtractorConcurrency options set to 1, 2, 4, 8 and 16.

$ go test -bench Benchmark* -archivedir go1.13 -benchtime=30s -timeout=20m

goos: linux
goarch: amd64
pkg: github.com/saracen/fastzip
BenchmarkArchiveStore_1-24                            39         788604969 ns/op         421.66 MB/s     9395405 B/op     266271 allocs/op
BenchmarkArchiveStandardFlate_1-24                     2        16154127468 ns/op         20.58 MB/s    12075824 B/op     257251 allocs/op
BenchmarkArchiveStandardFlate_2-24                     4        8686391074 ns/op          38.28 MB/s    15898644 B/op     260757 allocs/op
BenchmarkArchiveStandardFlate_4-24                     7        4391603068 ns/op          75.72 MB/s    19295604 B/op     260871 allocs/op
BenchmarkArchiveStandardFlate_8-24                    14        2291624196 ns/op         145.10 MB/s    21999205 B/op     260970 allocs/op
BenchmarkArchiveStandardFlate_16-24                   16        2105056696 ns/op         157.96 MB/s    29237232 B/op     261225 allocs/op
BenchmarkArchiveNonStandardFlate_1-24                  6        6011250439 ns/op          55.32 MB/s    11070960 B/op     257204 allocs/op
BenchmarkArchiveNonStandardFlate_2-24                  9        3629347294 ns/op          91.62 MB/s    18870130 B/op     262279 allocs/op
BenchmarkArchiveNonStandardFlate_4-24                 18        1766182097 ns/op         188.27 MB/s    22976928 B/op     262349 allocs/op
BenchmarkArchiveNonStandardFlate_8-24                 34        1002516188 ns/op         331.69 MB/s    29860872 B/op     262473 allocs/op
BenchmarkArchiveNonStandardFlate_16-24                46         757112363 ns/op         439.20 MB/s    42036132 B/op     262714 allocs/op
BenchmarkExtractStore_1-24                            20        1625582744 ns/op         202.66 MB/s    22900375 B/op     330528 allocs/op
BenchmarkExtractStore_2-24                            42         786644031 ns/op         418.80 MB/s    22307976 B/op     329272 allocs/op
BenchmarkExtractStore_4-24                            92         384075767 ns/op         857.76 MB/s    22247288 B/op     328667 allocs/op
BenchmarkExtractStore_8-24                           165         215884636 ns/op        1526.02 MB/s    22354996 B/op     328459 allocs/op
BenchmarkExtractStore_16-24                          226         157087517 ns/op        2097.20 MB/s    22258691 B/op     328393 allocs/op
BenchmarkExtractStandardFlate_1-24                     6        5501808448 ns/op          23.47 MB/s    86148462 B/op     495586 allocs/op
BenchmarkExtractStandardFlate_2-24                    13        2748387174 ns/op          46.99 MB/s    84232141 B/op     491343 allocs/op
BenchmarkExtractStandardFlate_4-24                    21        1511063035 ns/op          85.47 MB/s    84998750 B/op     490124 allocs/op
BenchmarkExtractStandardFlate_8-24                    32         995911009 ns/op         129.67 MB/s    86188957 B/op     489574 allocs/op
BenchmarkExtractStandardFlate_16-24                   46         652641882 ns/op         197.88 MB/s    88256113 B/op     489575 allocs/op
BenchmarkExtractNonStandardFlate_1-24                  7        4989810851 ns/op          25.88 MB/s    64552948 B/op     373541 allocs/op
BenchmarkExtractNonStandardFlate_2-24                 13        2478287953 ns/op          52.11 MB/s    63413947 B/op     373183 allocs/op
BenchmarkExtractNonStandardFlate_4-24                 26        1333552250 ns/op          96.84 MB/s    63546389 B/op     373925 allocs/op
BenchmarkExtractNonStandardFlate_8-24                 37         817039739 ns/op         158.06 MB/s    64354655 B/op     375357 allocs/op
BenchmarkExtractNonStandardFlate_16-24                63         566984549 ns/op         227.77 MB/s    65444227 B/op     379664 allocs/op
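
To compare rows, it can help to turn ns/op into ratios. The sketch below does that arithmetic for three pairs of rows; the numbers are copied from the table above and speedup is a hypothetical helper name.

```go
package main

import "fmt"

// speedup returns how many times faster the run taking nsFast is
// than the run taking nsSlow (both in ns/op).
func speedup(nsSlow, nsFast float64) float64 {
	return nsSlow / nsFast
}

func main() {
	// ns/op values copied from the benchmark table.
	fmt.Printf("archive, NonStandardFlate, 1 vs 16 workers: %.2fx\n",
		speedup(6011250439, 757112363))
	fmt.Printf("archive, 16 workers, StandardFlate vs NonStandardFlate: %.2fx\n",
		speedup(2105056696, 757112363))
	fmt.Printf("extract, Store, 1 vs 16 workers: %.2fx\n",
		speedup(1625582744, 157087517))
}
```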

Documentation

Constants

This section is empty.

Variables

var (
	ErrMinConcurrency = errors.New("concurrency must be at least 1")
)

Functions

func FlateCompressor

func FlateCompressor(level int) func(w io.Writer) (io.WriteCloser, error)

FlateCompressor returns a pooled performant zip.Compressor configured to a specified compression level. Invalid flate levels will panic.
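
As a rough illustration of what a "pooled" compressor with this signature could look like, here is a minimal sketch built on the standard library's compress/flate and sync.Pool. The names pooledFlateCompressor, pooledWriter and roundTrip are hypothetical, and this is not fastzip's actual implementation (which uses klauspost/compress by default).

```go
package main

import (
	"bytes"
	"compress/flate"
	"fmt"
	"io"
	"sync"
)

// pooledWriter hands its flate.Writer back to the pool on Close.
type pooledWriter struct {
	fw   *flate.Writer
	pool *sync.Pool
}

func (p *pooledWriter) Write(b []byte) (int, error) { return p.fw.Write(b) }

func (p *pooledWriter) Close() error {
	if p.fw == nil {
		return nil // already closed
	}
	err := p.fw.Close()
	p.pool.Put(p.fw)
	p.fw = nil
	return err
}

// pooledFlateCompressor returns a function with the zip.Compressor shape.
// It panics on an invalid level, mirroring the documented behaviour.
func pooledFlateCompressor(level int) func(w io.Writer) (io.WriteCloser, error) {
	if _, err := flate.NewWriter(io.Discard, level); err != nil {
		panic(err)
	}
	pool := &sync.Pool{}
	return func(w io.Writer) (io.WriteCloser, error) {
		// Reuse a writer from the pool when one is available.
		if fw, ok := pool.Get().(*flate.Writer); ok {
			fw.Reset(w)
			return &pooledWriter{fw: fw, pool: pool}, nil
		}
		fw, err := flate.NewWriter(w, level)
		if err != nil {
			return nil, err
		}
		return &pooledWriter{fw: fw, pool: pool}, nil
	}
}

// roundTrip compresses data with comp and decompresses it again.
func roundTrip(comp func(io.Writer) (io.WriteCloser, error), data []byte) ([]byte, error) {
	var buf bytes.Buffer
	wc, err := comp(&buf)
	if err != nil {
		return nil, err
	}
	if _, err := wc.Write(data); err != nil {
		return nil, err
	}
	if err := wc.Close(); err != nil {
		return nil, err
	}
	r := flate.NewReader(&buf)
	defer r.Close()
	return io.ReadAll(r)
}

func main() {
	comp := pooledFlateCompressor(5)
	for i := 0; i < 2; i++ { // the second pass can reuse the pooled writer
		out, err := roundTrip(comp, []byte("hello hello hello"))
		fmt.Println(string(out), err)
	}
}
```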

func FlateDecompressor

func FlateDecompressor() func(r io.Reader) io.ReadCloser

FlateDecompressor returns a pooled performant zip.Decompressor.

func StdFlateCompressor

func StdFlateCompressor(level int) func(w io.Writer) (io.WriteCloser, error)

StdFlateCompressor returns a pooled standard library zip.Compressor configured to a specified compression level. Invalid flate levels will panic.

func StdFlateDecompressor

func StdFlateDecompressor() func(r io.Reader) io.ReadCloser

StdFlateDecompressor returns a pooled standard library zip.Decompressor.

func ZstdCompressor added in v0.1.10

func ZstdCompressor(level int) func(w io.Writer) (io.WriteCloser, error)

func ZstdDecompressor added in v0.1.10

func ZstdDecompressor() func(r io.Reader) io.ReadCloser

ZstdDecompressor returns a pooled zstd decoder.

Types

type Archiver

type Archiver struct {
	// contains filtered or unexported fields
}

Archiver is an opinionated Zip archiver.

Only regular files, symlinks and directories are supported. Only files that are children of the specified chroot directory will be archived.

Access permissions, ownership (unix) and modification times are preserved.

func NewArchiver

func NewArchiver(w io.Writer, chroot string, opts ...ArchiverOption) (*Archiver, error)

NewArchiver returns a new Archiver.

func (*Archiver) Archive

func (a *Archiver) Archive(ctx context.Context, files map[string]os.FileInfo) (err error)

Archive archives all files, symlinks and directories.

func (*Archiver) Close

func (a *Archiver) Close() error

Close closes the underlying ZipWriter.

func (*Archiver) RegisterCompressor

func (a *Archiver) RegisterCompressor(method uint16, comp zip.Compressor)

RegisterCompressor registers custom compressors for a specified method ID. The common methods Store and Deflate are built in.

func (*Archiver) Written

func (a *Archiver) Written() (bytes, entries int64)

Written returns how many bytes and entries have been written to the archive. Written can be called whilst archiving is in progress.
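
Counters that can be read while work is in progress are commonly built on atomic operations. The sketch below shows that general pattern with sync/atomic; the progress type and simulate function are hypothetical and only assume, rather than confirm, how fastzip tracks its totals.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// progress is a hypothetical stand-in for the counters behind Written.
type progress struct {
	bytes   int64
	entries int64
}

// add records one finished entry of n bytes.
func (p *progress) add(n int64) {
	atomic.AddInt64(&p.bytes, n)
	atomic.AddInt64(&p.entries, 1)
}

// Written can be called from any goroutine while workers are running.
func (p *progress) Written() (bytes, entries int64) {
	return atomic.LoadInt64(&p.bytes), atomic.LoadInt64(&p.entries)
}

// simulate runs the given number of workers, each writing one entry.
func simulate(workers int, size int64) (bytes, entries int64) {
	var p progress
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.add(size)
		}()
	}
	wg.Wait()
	return p.Written()
}

func main() {
	b, e := simulate(100, 10)
	fmt.Println(b, e)
}
```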

type ArchiverOption

type ArchiverOption func(*archiverOptions) error

ArchiverOption is an option used when creating an archiver.
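
ArchiverOption follows the functional options pattern: each option is a function that mutates an options struct and may return an error. The sketch below shows the pattern in isolation. The options struct, its fields and the helper names are hypothetical, since archiverOptions is unexported, and returning an error like ErrMinConcurrency from the concurrency option is an assumption.

```go
package main

import (
	"errors"
	"fmt"
	"runtime"
)

var errMinConcurrency = errors.New("concurrency must be at least 1")

// options is a hypothetical stand-in for the unexported archiverOptions.
type options struct {
	concurrency int
	bufferSize  int
}

// option has the same shape as ArchiverOption.
type option func(*options) error

func withConcurrency(n int) option {
	return func(o *options) error {
		if n < 1 {
			return errMinConcurrency
		}
		o.concurrency = n
		return nil
	}
}

func withBufferSize(n int) option {
	return func(o *options) error {
		o.bufferSize = n
		return nil
	}
}

// newOptions starts from defaults and applies each option in order.
func newOptions(opts ...option) (*options, error) {
	o := &options{concurrency: runtime.GOMAXPROCS(0), bufferSize: 2 << 20}
	for _, opt := range opts {
		if err := opt(o); err != nil {
			return nil, err
		}
	}
	return o, nil
}

func main() {
	o, err := newOptions(withConcurrency(4), withBufferSize(1<<20))
	fmt.Println(o.concurrency, o.bufferSize, err)

	_, err = newOptions(withConcurrency(0))
	fmt.Println(err)
}
```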

func WithArchiverBufferSize added in v0.1.2

func WithArchiverBufferSize(n int) ArchiverOption

WithArchiverBufferSize sets the buffer size for each file to be compressed concurrently. If a compressed file's data exceeds the buffer size, a temporary file is written (to the stage directory) to hold the additional data. The default is 2 mebibytes, so if concurrency is 16, 32 mebibytes of memory will be allocated.
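
The memory figure above is simply buffer size multiplied by concurrency. A tiny sketch of that arithmetic, with bufferMemory as a hypothetical helper name:

```go
package main

import "fmt"

const mib = 1 << 20

// bufferMemory returns the total bytes held by per-file buffers
// when concurrency files are compressed at once.
func bufferMemory(bufferSize, concurrency int) int {
	return bufferSize * concurrency
}

func main() {
	total := bufferMemory(2*mib, 16)
	fmt.Println(total/mib, "MiB")
}
```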

func WithArchiverConcurrency

func WithArchiverConcurrency(n int) ArchiverOption

WithArchiverConcurrency will set the maximum number of files to be compressed concurrently. The default is set to GOMAXPROCS.

func WithArchiverMethod

func WithArchiverMethod(method uint16) ArchiverOption

WithArchiverMethod sets the zip method to be used for compressible files.

func WithArchiverOffset

func WithArchiverOffset(n int64) ArchiverOption

WithArchiverOffset sets the offset of the beginning of the zip data. This should be used when zip data is appended to an existing file.
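
The standard library's archive/zip exposes the same idea as (*zip.Writer).SetOffset. The sketch below uses only the standard library to append zip data after an existing prefix and read it back. It illustrates what an offset means and assumes, without confirming, that WithArchiverOffset behaves equivalently; appendZip and readEntry are hypothetical helpers.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
)

// appendZip writes a one-entry zip archive after prefix and returns
// the combined bytes.
func appendZip(prefix []byte, name string, content []byte) ([]byte, error) {
	buf := bytes.NewBuffer(append([]byte(nil), prefix...))
	zw := zip.NewWriter(buf)
	// Tell the writer that zip data starts len(prefix) bytes into the file.
	zw.SetOffset(int64(len(prefix)))

	w, err := zw.Create(name)
	if err != nil {
		return nil, err
	}
	if _, err := w.Write(content); err != nil {
		return nil, err
	}
	if err := zw.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// readEntry opens the combined bytes as a zip archive and reads one entry.
func readEntry(data []byte, name string) ([]byte, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	for _, f := range zr.File {
		if f.Name == name {
			rc, err := f.Open()
			if err != nil {
				return nil, err
			}
			defer rc.Close()
			return io.ReadAll(rc)
		}
	}
	return nil, fmt.Errorf("entry %q not found", name)
}

func main() {
	data, err := appendZip([]byte("EXISTING FILE CONTENT"), "hello.txt", []byte("hi"))
	if err != nil {
		panic(err)
	}
	out, err := readEntry(data, "hello.txt")
	fmt.Println(string(out), err)
}
```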

func WithStageDirectory

func WithStageDirectory(dir string) ArchiverOption

WithStageDirectory sets the directory to be used to stage compressed files before they're written to the archive. The default is the directory to be archived.

type Extractor

type Extractor struct {
	// contains filtered or unexported fields
}

Extractor is an opinionated Zip file extractor.

Files are extracted in parallel. Only regular files, symlinks and directories are supported. Files can only be extracted to the specified chroot directory.

Access permissions, ownership (unix) and modification times are preserved.

func NewExtractor

func NewExtractor(filename, chroot string, opts ...ExtractorOption) (*Extractor, error)

NewExtractor opens a zip file and returns a new extractor.

Close() should be called to close the extractor's underlying zip.Reader when done.

func NewExtractorFromReader added in v0.1.1

func NewExtractorFromReader(r io.ReaderAt, size int64, chroot string, opts ...ExtractorOption) (*Extractor, error)

NewExtractorFromReader returns a new extractor, reading from the reader provided.

The size of the archive should be provided.

Unlike with NewExtractor(), calling Close() on the extractor is unnecessary.

func (*Extractor) Close

func (e *Extractor) Close() error

Close closes the underlying ZipReader.

func (*Extractor) Extract

func (e *Extractor) Extract(ctx context.Context) (err error)

Extract extracts files, creates symlinks and directories from the archive.

func (*Extractor) Files

func (e *Extractor) Files() []*zip.File

Files returns the files within the archive.

func (*Extractor) RegisterDecompressor

func (e *Extractor) RegisterDecompressor(method uint16, dcomp zip.Decompressor)

RegisterDecompressor registers custom decompressors for a specified method ID. The common methods Store and Deflate are built in.

func (*Extractor) Written

func (e *Extractor) Written() (bytes, entries int64)

Written returns how many bytes and entries have been written to disk. Written can be called whilst extraction is in progress.

type ExtractorOption

type ExtractorOption func(*extractorOptions) error

ExtractorOption is an option used when creating an extractor.

func WithExtractorChownErrorHandler

func WithExtractorChownErrorHandler(fn func(name string, err error) error) ExtractorOption

WithExtractorChownErrorHandler sets an error handler to be called if errors are encountered when trying to preserve ownership of extracted files. Returning nil will continue extraction, returning any error will cause Extract() to error.

func WithExtractorConcurrency

func WithExtractorConcurrency(n int) ExtractorOption

WithExtractorConcurrency will set the maximum number of files being extracted concurrently. The default is set to GOMAXPROCS.
