packfile

package
v0.0.0-...-2ee2aa9
Warning: This package is not in the latest version of its module.
Published: Mar 28, 2022 License: MIT Imports: 17 Imported by: 0

Documentation

Overview

Package packfile contains methods and structs to read and write packfiles

Constants

const (
	ExtPackfile = ".pack"
	ExtIndex    = ".idx"
)

List of the file extensions used for packfiles and their indexes

Variables

var (
	// ErrIntOverflow is an error returned when the packfile couldn't
	// be parsed because some data couldn't fit in an int64
	ErrIntOverflow = errors.New("int64 overflow")
	// ErrInvalidMagic is an error returned when a file doesn't have
	// the expected magic.
	ErrInvalidMagic = errors.New("invalid magic")
	// ErrInvalidVersion is an error returned when a file has an
	// unsupported version
	ErrInvalidVersion = errors.New("invalid version")
	// ErrInvalidObjectSize represents an object whose size doesn't
	// match the expected size
	ErrInvalidObjectSize = errors.New("invalid object")
)
var OidWalkStop = errors.New("stop walking") //nolint // the linter expects all errors to start with Err, but since here we're faking an error we don't want that

OidWalkStop is a fake error that an OidWalkFunc can return to tell WalkOids() to stop

Functions

This section is empty.

Types

type OidWalkFunc

type OidWalkFunc = func(oid ginternals.Oid) error

OidWalkFunc represents a function that will be applied to every oid found by WalkOids()

type Pack

type Pack struct {
	// contains filtered or unexported fields
}

Pack represents a packfile. A packfile contains a header, a content section, and a footer.

Header: 12 bytes

The first 4 bytes contain the magic ('P', 'A', 'C', 'K')
The next 4 bytes contain the version (0, 0, 0, 2)
The last 4 bytes contain the number of objects in the packfile
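The header layout above can be sketched as a small parser. This is only an illustration, not the package's own code: parsePackHeader, errInvalidMagic, and errInvalidVersion are hypothetical names, and the sketch assumes the integers are stored big-endian and accepts version 2 only, as described above.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// Hypothetical sentinels that mirror ErrInvalidMagic and ErrInvalidVersion.
var (
	errInvalidMagic   = errors.New("invalid magic")
	errInvalidVersion = errors.New("invalid version")
)

// parsePackHeader (hypothetical helper) validates the 12-byte header and
// returns the number of objects announced by the packfile.
func parsePackHeader(b []byte) (uint32, error) {
	if len(b) < 12 {
		return 0, errors.New("header too short")
	}
	// Bytes 0-3: the magic.
	if !bytes.Equal(b[0:4], []byte("PACK")) {
		return 0, errInvalidMagic
	}
	// Bytes 4-7: the version (assumed big-endian, only 2 is accepted here).
	if binary.BigEndian.Uint32(b[4:8]) != 2 {
		return 0, errInvalidVersion
	}
	// Bytes 8-11: the number of objects.
	return binary.BigEndian.Uint32(b[8:12]), nil
}

func main() {
	header := []byte{'P', 'A', 'C', 'K', 0, 0, 0, 2, 0, 0, 0, 3}
	count, err := parsePackHeader(header)
	fmt.Println(count, err) // 3 <nil>
}
```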

Content: Variable size

         The content contains all the objects of the packfile, each zlib
         compressed.
         Before every zlib-compressed object come a few bytes of
         metadata about the object (the type and size of the object).
         The size of the metadata is variable, so every byte starts with
         an MSB (Most Significant Bit, the leftmost bit of a byte) that
         indicates whether the next byte is also part of the size.
         The very first byte of the metadata contains:
         - The MSB (1 bit)
         - The type of the object (3 bits)
         - The beginning of the size (4 bits)
         The subsequent bytes contain:
         - The MSB (1 bit)
         - The next part of the size (7 bits)
         The chunks of the size are little-endian encoded (right to left):
         Final_size = [part_2][part_1][part_0]
         /!\ The size of the object cannot be used to extract the
         object. The size corresponds to the real size of the object
         and not the size of the zlib-compressed object (which is
         what we have here). It's possible that the compressed object
         has a bigger size than the decompressed object.

Footer: 20 bytes

Contains the SHA1 sum of the whole packfile, computed over everything that precedes the footer (so without this SHA)

https://github.com/git/git/blob/master/Documentation/technical/pack-format.txt
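The footer check can be illustrated with the standard library alone. In this sketch, verifyTrailer and appendTrailer are hypothetical helpers, and the whole packfile is assumed to be in memory, which a real reader would likely avoid for large files.

```go
package main

import (
	"bytes"
	"crypto/sha1"
	"fmt"
)

// appendTrailer (hypothetical helper) returns body followed by its SHA1 sum,
// which is how the 20-byte footer is described above.
func appendTrailer(body []byte) []byte {
	sum := sha1.Sum(body)
	out := make([]byte, 0, len(body)+sha1.Size)
	out = append(out, body...)
	return append(out, sum[:]...)
}

// verifyTrailer (hypothetical helper) reports whether the last 20 bytes of
// pack match the SHA1 sum of everything before them.
func verifyTrailer(pack []byte) bool {
	if len(pack) < sha1.Size {
		return false
	}
	split := len(pack) - sha1.Size
	sum := sha1.Sum(pack[:split])
	return bytes.Equal(sum[:], pack[split:])
}

func main() {
	// A header-only packfile announcing 0 objects, plus its footer.
	pack := appendTrailer([]byte{'P', 'A', 'C', 'K', 0, 0, 0, 2, 0, 0, 0, 0})
	fmt.Println(verifyTrailer(pack)) // true

	pack[11] = 1 // corrupt the object count
	fmt.Println(verifyTrailer(pack)) // false
}
```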

func NewFromFile

func NewFromFile(fs afero.Fs, filePath string) (pack *Pack, err error)

NewFromFile returns a pack object from the given file. The pack will need to be closed using Close().

func (*Pack) Close

func (pck *Pack) Close() error

Close frees the resources

func (*Pack) GetObject

func (pck *Pack) GetObject(oid ginternals.Oid) (*object.Object, error)

GetObject returns the object that has the given SHA

func (*Pack) ID

func (pck *Pack) ID() ginternals.Oid

ID returns the ID of the packfile

func (*Pack) ObjectCount

func (pck *Pack) ObjectCount() uint32

ObjectCount returns the number of objects in the packfile

func (*Pack) WalkOids

func (pck *Pack) WalkOids(f OidWalkFunc) error

WalkOids walks over all the OIDs of the packfile

type PackIndex

type PackIndex struct {
	// contains filtered or unexported fields
}

PackIndex represents a packfile's index file (.idx). The index contains data that helps parse the packfile. It is made of a header, 5 layers, and a footer.

Header: 8 bytes - See indexHeader for the header format

Layer1: 1024 bytes - Contains 256 entries of 4 bytes.

Each entry contains the CUMULATIVE number of objects having
an oid starting with oid[0], meaning all the objects whose first
byte is lower than or equal to oid[0].
(oid[0] is a single byte, 0 <= oid[0] <= 255).
It's used to count how many objects have a SHA starting with
a specific value.
Example:
oid[0] represents the value of the first 2 chars of a SHA.
So for 9b91da06e69613397b38e0808e0ba5ee6983251b, oid[0]
is equal to '9b' which corresponds to 155.
You'll then find the CUMULATIVE object count at the
position 155 * 4 in layer1.
To get the total number of objects starting with 9b, you will need
to look at the previous entry (9a at 154 * 4), and do
total_at_9b = cumul_9b - cumul_9a
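The subtraction above could look like this in code. It is a sketch under assumptions: objectsWithFirstByte is a hypothetical helper, entries are assumed to be big-endian, and the first entry (00) is treated as having no predecessor, so its cumulative count is already its total.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// objectsWithFirstByte (hypothetical helper) returns how many objects have
// an oid whose first byte is first, using the 1024 bytes of layer1.
func objectsWithFirstByte(layer1 []byte, first byte) (uint32, error) {
	if len(layer1) != 256*4 {
		return 0, errors.New("layer1 must be 1024 bytes")
	}
	// Cumulative count up to and including `first` (assumed big-endian).
	cumul := binary.BigEndian.Uint32(layer1[int(first)*4:])
	if first == 0 {
		// No previous entry: the cumulative count is the total.
		return cumul, nil
	}
	prev := binary.BigEndian.Uint32(layer1[(int(first)-1)*4:])
	return cumul - prev, nil
}

func main() {
	// Build a fake layer1: 10 objects up to 0x9a, 13 objects from 0x9b on.
	layer1 := make([]byte, 256*4)
	for i := 0; i < 256; i++ {
		var cumul uint32
		switch {
		case i == 0x9a:
			cumul = 10
		case i >= 0x9b:
			cumul = 13
		}
		binary.BigEndian.PutUint32(layer1[i*4:], cumul)
	}
	total, err := objectsWithFirstByte(layer1, 0x9b)
	fmt.Println(total, err) // 3 <nil>
}
```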

Layer2: x*20 bytes - Contains the IDs (20 bytes each) of all the objects
contained in the packfile (x being the number of objects)

Layer3: x*4 bytes - Contains a CRC (Cyclic Redundancy Check) value
for each object. It's used to check that data did not get corrupted
by network operations.
https://en.wikipedia.org/wiki/Cyclic_redundancy_check

Layer4: x*4 bytes - Contains the offset of each object inside the packfile.

The first bit (and not byte, 1 byte = 8 bits) of each entry
(called MSB for Most Significant Bit) is used to store a special
value, and is not part of the offset:

If the packfile is < 2GB
  - The MSB will always be 0
  - The remaining bits (31, because it's 4 bytes of 8 bits
    minus the MSB, so 4*8-1) correspond to the offset of
    the object in the packfile.

If the packfile is > 2GB
  - The MSB may be 0 or 1
  - If 0, then the next 31 bits contain the offset of
    the object in the packfile.
  - If 1, then the packfile offset doesn't fit in 31 bits and
    has been stored in layer5. In that case the next 31 bits
    correspond to the position of the offset in
    layer5.

Layer5: y*8 bytes - Only exists for packfiles bigger than 2GB.

Basically the same as Layer4, but the offsets are stored on 8 bytes
instead of 4, because 4 bytes are too small to store those
offsets.
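How layer4 and layer5 combine can be sketched as below. resolveOffset is a hypothetical helper, not the code behind GetObjectOffset, and the sketch assumes both layers are big-endian and already loaded in memory.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// resolveOffset (hypothetical helper) returns the packfile offset of the
// object stored at position pos in layer4, following the MSB rule into
// layer5 when needed.
func resolveOffset(layer4, layer5 []byte, pos int) (uint64, error) {
	if pos < 0 || pos >= len(layer4)/4 {
		return 0, errors.New("position out of range in layer4")
	}
	raw := binary.BigEndian.Uint32(layer4[pos*4:])
	if raw&0x80000000 == 0 {
		// MSB is 0: the 31 remaining bits are the offset itself.
		return uint64(raw), nil
	}
	// MSB is 1: the 31 remaining bits are a position in layer5.
	idx := uint64(raw & 0x7fffffff)
	if idx >= uint64(len(layer5)/8) {
		return 0, errors.New("position out of range in layer5")
	}
	return binary.BigEndian.Uint64(layer5[idx*8:]), nil
}

func main() {
	layer4 := []byte{
		0x00, 0x00, 0x00, 0x0c, // object 0: offset 12
		0x80, 0x00, 0x00, 0x00, // object 1: see entry 0 of layer5
	}
	layer5 := []byte{0, 0, 0, 1, 0, 0, 0, 0} // 1<<32 = 4294967296

	small, _ := resolveOffset(layer4, layer5, 0)
	big, _ := resolveOffset(layer4, layer5, 1)
	fmt.Println(small, big) // 12 4294967296
}
```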

Footer: 40 bytes - Contains 2 SHA1 sums of 20 bytes each

The first is the SHA1 sum of the packfile
The second is the SHA1 sum of the index file minus this SHA

Resources:
https://codewords.recurse.com/issues/three/unpacking-git-packfiles#idx-files
https://git-scm.com/docs/pack-format
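Putting the sizes above together, the position of each layer in the .idx file can be derived from the number of objects alone. The sketch below only does that arithmetic; idxLayout and layoutFor are hypothetical names and not part of the package.

```go
package main

import "fmt"

// idxLayout (hypothetical type) holds the byte position at which each
// layer starts inside the index file.
type idxLayout struct {
	Layer1, Layer2, Layer3, Layer4, Layer5 int64
}

// layoutFor (hypothetical helper) computes the layer positions for an
// index describing objectCount objects.
func layoutFor(objectCount int64) idxLayout {
	const (
		headerSize = 8       // header: 8 bytes
		layer1Size = 256 * 4 // 256 entries of 4 bytes
		oidSize    = 20      // layer2: 20 bytes per object
		crcSize    = 4       // layer3: 4 bytes per object
		offsetSize = 4       // layer4: 4 bytes per object
	)
	var l idxLayout
	l.Layer1 = headerSize
	l.Layer2 = l.Layer1 + layer1Size
	l.Layer3 = l.Layer2 + objectCount*oidSize
	l.Layer4 = l.Layer3 + objectCount*crcSize
	// Layer5 starts right after layer4. When there is no 8-byte offset,
	// it is empty and the 40-byte footer starts here instead.
	l.Layer5 = l.Layer4 + objectCount*offsetSize
	return l
}

func main() {
	fmt.Printf("%+v\n", layoutFor(3))
	// {Layer1:8 Layer2:1032 Layer3:1092 Layer4:1104 Layer5:1116}
}
```

For a packfile small enough to need no layer5, the index size is therefore Layer5 + 40 bytes, which gives 1156 bytes for 3 objects.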

func NewIndex

func NewIndex(r readutil.BufferedReader) (idx *PackIndex, err error)

NewIndex returns an index object from the given reader

func (*PackIndex) GetObjectOffset

func (idx *PackIndex) GetObjectOffset(oid ginternals.Oid) (uint64, error)

GetObjectOffset returns the offset of the given oid in the packfile. If the object is not found, ginternals.ErrObjectNotFound is returned.
