zip

package
v1.9.0
Warning

This package is not in the latest version of its module.

Published: Feb 23, 2022 License: Apache-2.0, BSD-3-Clause Imports: 13 Imported by: 0

README

Walking zipped content in a memory optimised manner

Intent

The following modifications have been made to the standard library's archive/zip package so that zipped content can be iterated through in a memory-considerate manner.

The standard package has been optimised for goals other than ours. The upstream implementation collects a slice of *File whose length equals the number of files contained in the zip. This implementation works in an iterative/streaming manner instead: as soon as a *File is encountered in the zip's directory, it is passed to a function provided by the user for processing.

Package API

Two top-level functions have been introduced, zip.WalkZipFile and zip.WalkZipReaderAt. They are akin to the standard package's zip.OpenReader and zip.NewReader functions, respectively. The difference is that each accepts an additional WalkFn parameter. Each immediately walks through the contents of the zip file provided by the file or reader, passing every entry to the given WalkFn.

When passed a *File, the WalkFn should return a bool and an error. A false bool or a non-nil error will cause the walk to stop and return the error to the user.
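The sketch below illustrates this contract under an assumption. The page does not give this package's import path, so the sketch uses the standard library's archive/zip to build and read a small archive. A hypothetical walkEntries helper applies the same stop rules. The point is the early return on `false`. The memory behaviour of the real walk is not reproduced, because walkEntries ranges over archive/zip's pre-built slice of files.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"strings"
)

// WalkFn mirrors the documented callback type, but over archive/zip's File.
type WalkFn func(*zip.File) (bool, error)

// walkEntries is a hypothetical stand-in for the package's walk: it passes
// each entry to fn and stops at the first false or non-nil error. Unlike the
// package, it ranges over archive/zip's pre-built r.File slice.
func walkEntries(r *zip.Reader, fn WalkFn) error {
	for _, f := range r.File {
		ok, err := fn(f)
		if err != nil || !ok {
			return err // nil when the callback simply asked to stop
		}
	}
	return nil
}

// buildZip writes an in-memory archive of empty files with the given names.
func buildZip(names ...string) ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for _, name := range names {
		if _, err := w.Create(name); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// firstMatch stops the walk at the first entry whose name has the suffix and
// reports how many entries the callback saw.
func firstMatch(data []byte, suffix string) (string, int, error) {
	r, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return "", 0, err
	}
	found, visited := "", 0
	err = walkEntries(r, func(f *zip.File) (bool, error) {
		visited++
		if strings.HasSuffix(f.Name, suffix) {
			found = f.Name
			return false, nil // stop: later entries are never visited
		}
		return true, nil // keep walking
	})
	return found, visited, err
}

func main() {
	data, err := buildZip("a.txt", "b.json", "c.json")
	if err != nil {
		panic(err)
	}
	name, visited, err := firstMatch(data, ".json")
	fmt.Println(name, visited, err) // b.json 2 <nil>
}
```

In this sketch, returning `false` with a nil error ends the walk without reporting a failure. Returning a non-nil error ends it and surfaces that error.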

Changes made relative to the codebase in the standard zip package
  • All Write-related code has been removed where possible. We are only interested in reading.
  • All ways to create a Reader/ReadCloser have been removed from the package API, leaving the Walk functions that we have implemented as the only entrypoints.
  • After making the above changes, all unused code has been removed apart from some constants that were part of constant groups.
  • The Reader.init method has been renamed to Reader.walk to represent the functionality that it now holds. More details on these changes can be found below.
  • The Reader.Files field, a []*File, has been removed so that it is impossible to incur the memory penalty of accumulating a slice of *File details.
  • All tests that were appropriate to keep have been migrated to equivalents of the originals that use the new Walk functions. An extra test file, reader_local_test.go, has been introduced containing tests specific to our implementation.
  • Some tests that use standard library internal packages have been removed, as it is impossible for us to reference them.
  • Examples have been removed.
  • Some general changes have been made so that the codebase meets our linter requirements. Examples are matching method receiver names across method definitions, some extra error checking, and removal of unnecessary conversions between value types.
Changes to Reader.init/Reader.walk
  • Reader.init is where the standard package reads through the directory of the zip file to accumulate a slice of *File, one for each file in the zip. This method has been changed to accept a WalkFn, to which each *File is passed instead of being appended to a slice. The WalkFn is passed into the two top-level Walk functions by the caller.
  • All logic related to Reader.Files has been removed.
  • When passed a *File, the given WalkFn should return a bool and an error. A false bool or a non-nil error will cause the walk to stop and return the error to the user.
  • If the WalkFn never returns false or a non-nil error, the expected number of directory records is compared against the number of files iterated through. This is the same check that was previously made against the length of Reader.Files. If the numbers do not match, an error is returned from the top-level Walk functions.
  • File.readDataDescriptor is no longer called on each file as it is iterated. This change was made for performance reasons: it is an expensive call, and the information it populates is not required unless the File is being opened. It is now called within the File.Open method, so it only runs for files that are actually opened. A user can therefore decide whether a file should be opened from, for example, its filename or size. Only the files that are opened then incur the f.readDataDescriptor call.
Changes to File
  • File.readDataDescriptor now accepts a bodyOffset rather than calling findBodyOffset to determine it. This change was made because File.Open now finds the body offset when it is called. It can then pass that offset into readDataDescriptor, which avoids findBodyOffset being called multiple times per Open.

Documentation

Overview

Package zip provides support for reading and writing ZIP archives.

See: https://www.pkware.com/appnote

This package does not support disk spanning.

A note about ZIP64:

To be backwards compatible, the FileHeader has both 32-bit and 64-bit Size fields. The 64-bit fields will always contain the correct value, and for normal archives both fields will be the same. For files requiring the ZIP64 format, the 32-bit fields will be 0xffffffff and the 64-bit fields must be used instead.

Index

Constants

const (
	Store   uint16 = 0 // no compression
	Deflate uint16 = 8 // DEFLATE compressed
)

Compression methods.

Variables

var (
	ErrFormat    = errors.New("zip: not a valid zip file")
	ErrAlgorithm = errors.New("zip: unsupported compression algorithm")
	ErrChecksum  = errors.New("zip: checksum error")
)

Functions

func WalkZipFile

func WalkZipFile(name string, walkFn WalkFn) error

WalkZipFile will open the Zip file specified by name and walk the contents of it, passing each *File encountered to the given WalkFn.

func WalkZipReaderAt

func WalkZipReaderAt(r io.ReaderAt, size int64, walkFn WalkFn) error

WalkZipReaderAt will use the given ReaderAt, which is assumed to have the given size in bytes, walking the contents of it and passing each *File encountered to the given WalkFn.

Types

type Compressor

type Compressor func(w io.Writer) (io.WriteCloser, error)

A Compressor returns a new compressing writer, writing to w. The WriteCloser's Close method must be used to flush pending data to w. The Compressor itself must be safe to invoke from multiple goroutines simultaneously, but each returned writer will be used only by one goroutine at a time.

type Decompressor

type Decompressor func(r io.Reader) io.ReadCloser

A Decompressor returns a new decompressing reader, reading from r. The ReadCloser's Close method must be used to release associated resources. The Decompressor itself must be safe to invoke from multiple goroutines simultaneously, but each returned reader will be used only by one goroutine at a time.

type File

type File struct {
	FileHeader
	// contains filtered or unexported fields
}

A File is a single file in a ZIP archive. The file information is in the embedded FileHeader. The file content can be accessed by calling Open.

func (*File) DataOffset

func (f *File) DataOffset() (offset int64, err error)

DataOffset returns the offset of the file's possibly-compressed data, relative to the beginning of the zip file.

Most callers should instead use Open, which transparently decompresses data and verifies checksums.

func (*File) Open

func (f *File) Open() (io.ReadCloser, error)

Open returns a ReadCloser that provides access to the File's contents. Multiple files may be read concurrently.
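Per the README, the data descriptor is now read only inside Open. A callback can therefore decide from header fields alone which entries to open. The following sketch is written against archive/zip's File, which has the same Open method shape. The .txt filter, the 1 MiB maxRead limit, and the helper names are illustrative choices, not part of this package.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"strings"
)

const maxRead = 1 << 20 // hypothetical per-entry limit (1 MiB)

// readSmallText opens an entry only if its name and declared size qualify,
// and always closes the ReadCloser returned by Open.
func readSmallText(f *zip.File) (string, bool, error) {
	if !strings.HasSuffix(f.Name, ".txt") || f.UncompressedSize64 > maxRead {
		return "", false, nil // skipped without calling Open
	}
	rc, err := f.Open()
	if err != nil {
		return "", false, err
	}
	defer rc.Close()
	b, err := io.ReadAll(io.LimitReader(rc, maxRead))
	if err != nil {
		return "", false, err
	}
	return string(b), true, nil
}

// buildSample writes an archive with one text entry and one binary entry.
func buildSample() ([]byte, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for name, body := range map[string]string{"a.txt": "hi", "b.bin": "xx"} {
		fw, err := w.Create(name)
		if err != nil {
			return nil, err
		}
		if _, err := fw.Write([]byte(body)); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// textContents applies readSmallText to every entry and collects the results.
func textContents(data []byte) (map[string]string, error) {
	zr, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return nil, err
	}
	out := map[string]string{}
	for _, f := range zr.File {
		s, ok, err := readSmallText(f)
		if err != nil {
			return nil, err
		}
		if ok {
			out[f.Name] = s
		}
	}
	return out, nil
}

func main() {
	data, err := buildSample()
	if err != nil {
		panic(err)
	}
	contents, err := textContents(data)
	fmt.Println(contents, err) // map[a.txt:hi] <nil>
}
```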

func (*File) OpenRaw

func (f *File) OpenRaw() (io.Reader, error)

OpenRaw returns a Reader that provides access to the File's contents without decompression.

type FileHeader

type FileHeader struct {
	// Name is the name of the file.
	//
	// It must be a relative path, not start with a drive letter (such as "C:"),
	// and must use forward slashes instead of back slashes. A trailing slash
	// indicates that this file is a directory and should have no data.
	//
	// When reading zip files, the Name field is populated from
	// the zip file directly and is not validated for correctness.
	// It is the caller's responsibility to sanitize it as
	// appropriate, including canonicalizing slash directions,
	// validating that paths are relative, and preventing path
	// traversal through filenames ("../../../").
	Name string

	// Comment is any arbitrary user-defined string shorter than 64KiB.
	Comment string

	// NonUTF8 indicates that Name and Comment are not encoded in UTF-8.
	//
	// By specification, the only other encoding permitted should be CP-437,
	// but historically many ZIP readers interpret Name and Comment as whatever
	// the system's local character encoding happens to be.
	//
	// This flag should only be set if the user intends to encode a non-portable
	// ZIP file for a specific localized region. Otherwise, the Writer
	// automatically sets the ZIP format's UTF-8 flag for valid UTF-8 strings.
	NonUTF8 bool

	CreatorVersion uint16
	ReaderVersion  uint16
	Flags          uint16

	// Method is the compression method. If zero, Store is used.
	Method uint16

	// Modified is the modified time of the file.
	//
	// When reading, an extended timestamp is preferred over the legacy MS-DOS
	// date field, and the offset between the times is used as the timezone.
	// If only the MS-DOS date is present, the timezone is assumed to be UTC.
	//
	// When writing, an extended timestamp (which is timezone-agnostic) is
	// always emitted. The legacy MS-DOS date field is encoded according to the
	// location of the Modified time.
	Modified     time.Time
	ModifiedTime uint16 // Deprecated: Legacy MS-DOS time; use Modified instead.
	ModifiedDate uint16 // Deprecated: Legacy MS-DOS date; use Modified instead.

	CRC32              uint32
	CompressedSize     uint32 // Deprecated: Use CompressedSize64 instead.
	UncompressedSize   uint32 // Deprecated: Use UncompressedSize64 instead.
	CompressedSize64   uint64
	UncompressedSize64 uint64
	Extra              []byte
	ExternalAttrs      uint32 // Meaning depends on CreatorVersion
}

FileHeader describes a file within a zip file. See the zip spec for details.
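The Name field comment above leaves sanitising to the caller. One possible check, a sketch rather than a complete defence, uses only the path and strings packages. The safeName function and its exact rules are assumptions drawn from that comment: relative paths, no drive letter, forward slashes only, and no "../" traversal. Directory entries should be recognised by their trailing slash before cleaning, because path.Clean removes it.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// safeName is a hypothetical check for a Name read from an archive. It
// rejects backslashes, drive letters, absolute paths and "../" traversal,
// and returns the cleaned forward-slash path. Check for a trailing "/"
// (a directory entry) before calling, because path.Clean removes it.
func safeName(name string) (string, error) {
	switch {
	case name == "":
		return "", errors.New("empty name")
	case strings.Contains(name, `\`):
		return "", fmt.Errorf("%q: backslash in name", name)
	case len(name) >= 2 && name[1] == ':':
		return "", fmt.Errorf("%q: drive letter", name)
	case path.IsAbs(name):
		return "", fmt.Errorf("%q: absolute path", name)
	}
	clean := path.Clean(name)
	if clean == "." || clean == ".." || strings.HasPrefix(clean, "../") {
		return "", fmt.Errorf("%q: escapes the extraction root", name)
	}
	return clean, nil
}

func main() {
	for _, name := range []string{"docs/./guide/../intro.txt", "../../etc/passwd", `C:\x`} {
		clean, err := safeName(name)
		fmt.Printf("%q -> %q, %v\n", name, clean, err)
	}
}
```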

func FileInfoHeader

func FileInfoHeader(fi fs.FileInfo) (*FileHeader, error)

FileInfoHeader creates a partially-populated FileHeader from an fs.FileInfo. Because fs.FileInfo's Name method returns only the base name of the file it describes, it may be necessary to modify the Name field of the returned header to provide the full path name of the file. If compression is desired, callers should set the FileHeader.Method field; it is unset by default.

func (*FileHeader) FileInfo

func (h *FileHeader) FileInfo() fs.FileInfo

FileInfo returns an fs.FileInfo for the FileHeader.

func (*FileHeader) ModTime deprecated

func (h *FileHeader) ModTime() time.Time

ModTime returns the modification time in UTC using the legacy ModifiedDate and ModifiedTime fields.

Deprecated: Use Modified instead.

func (*FileHeader) Mode

func (h *FileHeader) Mode() (mode fs.FileMode)

Mode returns the permission and mode bits for the FileHeader.

func (*FileHeader) SetModTime deprecated

func (h *FileHeader) SetModTime(t time.Time)

SetModTime sets the Modified, ModifiedTime, and ModifiedDate fields to the given time in UTC.

Deprecated: Use Modified instead.

func (*FileHeader) SetMode

func (h *FileHeader) SetMode(mode fs.FileMode)

SetMode changes the permission and mode bits for the FileHeader.

type ReadCloser

type ReadCloser struct {
	Reader
	// contains filtered or unexported fields
}

A ReadCloser is a Reader that must be closed when no longer needed.

func (*ReadCloser) Close

func (rc *ReadCloser) Close() error

Close closes the Zip file, rendering it unusable for I/O.

type Reader

type Reader struct {
	Comment string
	// contains filtered or unexported fields
}

A Reader serves content from a ZIP archive.

func (*Reader) RegisterDecompressor

func (z *Reader) RegisterDecompressor(method uint16, dcomp Decompressor)

RegisterDecompressor registers or overrides a custom decompressor for a specific method ID. If a decompressor for a given method is not found, Reader will default to looking up the decompressor at the package level.

type WalkFn

type WalkFn func(*File) (bool, error)

WalkFn is called for every file in a zip. The file passed to it may be used for the duration of the callback, but it should not be held onto for use after the walk has completed. WalkFn should return a bool indicating whether the walk of the zip should continue, along with an error indicating whether processing of the given file was successful. If an error or false is returned, the walk will not proceed to subsequent files.
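One way to respect the retention rule is to copy the fields you need into your own values inside the callback. In the sketch below, WalkFn, entryInfo, summarize and sliceWalker are hypothetical. sliceWalker drives the callback over the standard library's archive/zip Reader as a stand-in. With this package, the walker would presumably be a closure around WalkZipFile, but that usage is assumed, not shown on this page.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
)

// WalkFn mirrors the documented callback type, using archive/zip's File.
type WalkFn func(*zip.File) (bool, error)

// entryInfo holds plain values copied out of a *zip.File.
type entryInfo struct {
	Name string
	Size uint64
}

// summarize copies what it needs inside the callback instead of retaining
// the *zip.File values. walk is any function that drives a WalkFn.
func summarize(walk func(WalkFn) error) ([]entryInfo, error) {
	var out []entryInfo
	err := walk(func(f *zip.File) (bool, error) {
		out = append(out, entryInfo{Name: f.Name, Size: f.UncompressedSize64})
		return true, nil
	})
	return out, err
}

// sliceWalker adapts archive/zip's Reader, a stand-in for this package.
func sliceWalker(zr *zip.Reader) func(WalkFn) error {
	return func(fn WalkFn) error {
		for _, f := range zr.File {
			if ok, err := fn(f); err != nil || !ok {
				return err
			}
		}
		return nil
	}
}

// buildReader writes a one-entry archive and opens it with archive/zip.
func buildReader() (*zip.Reader, error) {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	fw, err := w.Create("a.txt")
	if err != nil {
		return nil, err
	}
	if _, err := fw.Write([]byte("hello")); err != nil {
		return nil, err
	}
	if err := w.Close(); err != nil {
		return nil, err
	}
	data := buf.Bytes()
	return zip.NewReader(bytes.NewReader(data), int64(len(data)))
}

func main() {
	zr, err := buildReader()
	if err != nil {
		panic(err)
	}
	infos, err := summarize(sliceWalker(zr))
	fmt.Println(infos, err) // [{a.txt 5}] <nil>
}
```

Taking the walker as a parameter keeps summarize independent of where the entries come from.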
