varint

package module

v1.0.0 Latest Latest Go to latest Published: Dec 1, 2021 License: BSD-3-Clause Imports: 0 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/chmike/varint

Links

Open Source Insights

README ¶

Variable length unsigned integer encoding

Small unsigned integer values are more frequent than big values. A substantial compression may then be achieved by dropping the insignificant 0 bits when serializing integers in a byte sequence. The byte length of the encoded integer will then vary according to the number of significant bits.

The length of the encoded integer must be encoded with the integer bits so that the original integer can be restored. There exist different ways to encode a varying length integer. Here we consider only the encoding of uint64 values.

The most popular of such encoding is LEB128. It allows to encode integers of any byte size. It is the encoding used in UTF8 characters and in the encoding/binary package for Uvarint. The maximum byte length of an uint64 encoded in LEB128 is 10 bytes.

This package implements the prefix code. The significant bytes are serialized in big endian order, and the most significant bits of the first byte set to 1 encode the number of bytes that follow the first byte. This encoding requires less bit fiddling than the postfiy code to encode or decode. The maximum byte length of an uint64 prefix encoded is 9 bytes.

An alternate encoding is the postfix code. In this encoding the significant bytes are serialized in little endian order. The byte length is encoded as a number of 0 bits in the last byte which appears first in the encoded byte sequence. It may seam more performant than the prefix code since words may be written in memory in one instruction on little endian computers, but the bit fiddling adds a non nigligible overhead. Benchmarks comparing prefix en postfix code show indeed that prefix code is more performant when coded in Go.

There exist other encoding as the one use by SQLight, but we don't discuss them here.

Usage

To use this package you must first make sure that you have a go.mod file in your project. To create a go.mod file, you must issue the following command in your terminal where you replace myProgram with the name of your program.

$ go mod init myProgram

The second step is to import this package in the go source files that will use its functions.

import "github.com/chmike/varint"

Encoding

To encode a uint64 value, use the following function. It will serialize the value in the byte slice and return the number of bytes written. If the slice is too small to hold the encoded integer, the function returns 0. The function never panics.

func varint.Encode([]byte, uint64) int

In your program, you would encode the value 1234 in the byte slice b like this:

l := varint.Encode(b, 1234)
if l == 0 {
  // b is too small to hold the encoded value
}
b = b[l:]

Decoding

To decode an encoded uint64 value, use the following function. It will deserialize the value and return the number of bytes read. If the slice is too small to contain a value (slice empty or value truncated), the function returns the value 0 and 0 bytes read. The function never panics.

func varint.Decode([]byte) (uint64,int)

In your program, you would decode an encoded value in the byte slice b like this:

l, v := varint.Decod(b)
if l == 0 {
  // b is empty or the value is truncated
}
b = b[l:]

difference with the Uvarint functions

These functions have the same API as the binary.PutUvarint() and the binary.Uvarint functions and can conveniently replace them. There are a number of differences though.

the Uvarint() and PatUvarint functions may panic
the function PutUvarint() will never return 0
Uvarint() may return a negative number of bytes read in some conditions.

Preformance

The extensive study of performance comparing different implementation of postfix and prefix encoding led to this code. Its performance is compared with the binary.PutUvarint() and binary.Uvarint() and presented in the following graphic. The data is made available in docs/data.txt.

benchmarks

The LEB128 encoding with binary.PutUvarint() for small values up to 14 significant bits is faster than the varint.Encode() function. This is because it is inlined due to the simplicity of the code. But varint.Decode() is faster than binary.Uvarint().

The following graphic shows the time required to encode and decode a uint64 value. This clearly shows that varint is faster. Since a value is usually written once and read many times, it shows that prefix encoding is a more efficient encoding than LEB128.

benchmark encode and decode

Documentation ¶

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func Decode ¶

func Decode(b []byte) (uint64, int)

Decode returns the varying length integer in front of b and the number of bytes read or zero if b is empty or the integer is truncated. This function doesn't panic.

func Encode ¶

func Encode(b []byte, v uint64) int

Encode v as a prefixed with 1 bits variable length integer into b. Returns the number of bytes written or 0 if b is too small. The encoded integer is at most 9 bytes long. This function doesn't panic.

Types ¶

This section is empty.

Source Files ¶

View all Source files

varint.go

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL