kanzi

package module
v1.9.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: May 10, 2021 License: Apache-2.0, Apache-2.0 Imports: 3 Imported by: 0

README

kanzi

Kanzi is a modern, modular, expendable and efficient lossless data compressor implemented in Go.

  • modern: state-of-the-art algorithms are implemented and multi-core CPUs can take advantage of the built-in multi-tasking.
  • modular: entropy codec and a combination of transforms can be provided at runtime to best match the kind of data to compress.
  • expandable: clean design with heavy use of interfaces as contracts makes integrating and expanding the code easy. No dependencies.
  • efficient: the code is optimized for efficiency (trade-off between compression ratio and speed).

Kanzi supports a wide range of compression ratios and can compress many files more than most common compressors (at the cost of decompression speed). It is not compatible with standard compression formats.

For more details, check https://github.com/flanglet/kanzi-go/wiki.

Credits

Matt Mahoney, Yann Collet, Jan Ondrus, Yuta Mori, Ilya Muravyov, Neal Burns, Fabian Giesen, Jarek Duda, Ilya Grebnov

Disclaimer

Use at your own risk. Always keep a backup of your files. The bitstream format is not yet finalized.

Build Status Go Report Card Total alerts Documentation

Silesia corpus benchmark

i7-7700K @4.20GHz, 32GB RAM, Ubuntu 18.04.05

go1.15rc1

Kanzi version 1.8 Go implementation. Block size is 100 MB.

Compressor Encoding (sec) Decoding (sec) Size
Original 211,938,580
Gzip 1.6 -4 3.4 1.1 71,045,115
Kanzi -l 1 2.4 1.3 69,840,720
Kanzi -l 1 -j 6 0.9 0.4 69,840,720
Zstd 1.4.5 -2 0.7 0.3 69,636,234
Zstd 1.4.5 -2 -T6 0.3 0.3 69,636,234
Gzip 1.6 -5 4.4 1.1 69,143,980
Brotli 1.0.9 -2 --large_window=30 1.5 0.9 68,033,377
Gzip 1.6 -9 14.3 1.0 67,631,990
Kanzi -l 2 5.0 2.3 60,147,109
Kanzi -l 2 -j 6 1.8 0.8 60,147,109
Zstd 1.4.5 -13 16.0 0.3 58,125,865
Zstd 1.4.5 -13 -T6 11.3 0.3 58,125,865
Orz 1.5.0 7.6 2.0 57,564,831
Brotli 1.0.9 -9 --large_window=30 36.6 0.8 56,232,817
Lzma 5.2.2 -3 24.3 2.4 55,743,540
Kanzi -l 3 10.8 6.8 54,996,910
Kanzi -l 3 -j 6 3.9 2.3 54,996,910
Bzip2 1.0.6 -9 14.9 5.2 54,506,769
Zstd 1.4.5 -19 61.8 0.3 53,261,006
Zstd 1.4.5 -19 -T6 53.4 0.3 53,261,006
Kanzi -l 4 12.3 6.9 51,739,977
Kanzi -l 4 -j 6 4.2 2.1 51,739,977
Brotli 1.0.9 --large_window=30 356.2 0.9 49,383,136
Lzma 5.2.2 -9 65.0 2.4 48,780,457
Kanzi -l 5 15.5 11.2 48,067,650
Kanzi -l 5 -j 6 5.3 3.7 48,067,650
Kanzi -l 6 26.3 22.2 46,543,124
Kanzi -l 6 -j 6 8.5 7.0 46,543,124
Tangelo 2.4 83.2 85.9 44,862,127
zpaq v7.14 m4 t1 107.3 112.2 42,628,166
zpaq v7.14 m4 t12 108.1 111.5 42,628,166
Kanzi -l 7 63.2 64.7 41,804,239
Kanzi -l 7 -j 6 23.3 21.9 41,804,239
Tangelo 2.0 302.0 310.9 41,267,068
Kanzi -l 8 85.1 86.0 40,423,483
Kanzi -l 8 -j 6 32.7 33.5 40,423,483
zpaq v7.14 m5 t1 343.1 352.0 39,112,924
zpaq v7.14 m5 t12 344.3 350.4 39,112,924

enwik8

i7-7700K @4.20GHz, 32GB RAM, Ubuntu 18.04.05

go1.15rc1

Kanzi version 1.8 Go implementation. Block size is 100 MB. 1 thread

Compressor Encoding (sec) Decoding (sec) Size
Original 100,000,000
Kanzi -l 1 1.40 0.75 32,654,135
Kanzi -l 2 2.48 1.22 27,410,862
Kanzi -l 3 5.22 3.36 25,670,935
Kanzi -l 4 5.02 2.64 22,481,393
Kanzi -l 5 7.17 4.50 21,232,214
Kanzi -l 6 10.99 8.34 20,951,898
Kanzi -l 7 24.10 24.39 19,515,358
Kanzi -l 8 31.84 32.79 19,099,778

Build

There are no dependencies, making the project easy to build.

Option 1: go get

cd $GOPATH

go get github.com/flanglet/kanzi-go

cd src/github.com/flanglet/kanzi-go/app

go build Kanzi.go BlockCompressor.go BlockDecompressor.go InfoPrinter.go

Option 2: git clone

cd $GOPATH/src

mkdir github.com; cd github.com

mkdir flanglet; cd flanglet

git clone https://github.com/flanglet/kanzi-go.git

cd kanzi-go/app

go build Kanzi.go BlockCompressor.go BlockDecompressor.go InfoPrinter.go

Documentation

Index

Constants

View Source
const (
	EVT_COMPRESSION_START     = 0 // Compression starts
	EVT_DECOMPRESSION_START   = 1 // Decompression starts
	EVT_BEFORE_TRANSFORM      = 2 // Transform forward/inverse starts
	EVT_AFTER_TRANSFORM       = 3 // Transform forward/inverse ends
	EVT_BEFORE_ENTROPY        = 4 // Entropy encoding/decoding starts
	EVT_AFTER_ENTROPY         = 5 // Entropy encoding/decoding ends
	EVT_COMPRESSION_END       = 6 // Compression ends
	EVT_DECOMPRESSION_END     = 7 // Decompression ends
	EVT_AFTER_HEADER_DECODING = 8 // Compression header decoding ends
)
View Source
const (
	ERR_MISSING_PARAM       = 1
	ERR_BLOCK_SIZE          = 2
	ERR_INVALID_CODEC       = 3
	ERR_CREATE_COMPRESSOR   = 4
	ERR_CREATE_DECOMPRESSOR = 5
	ERR_OUTPUT_IS_DIR       = 6
	ERR_OVERWRITE_FILE      = 7
	ERR_CREATE_FILE         = 8
	ERR_CREATE_BITSTREAM    = 9
	ERR_OPEN_FILE           = 10
	ERR_READ_FILE           = 11
	ERR_WRITE_FILE          = 12
	ERR_PROCESS_BLOCK       = 13
	ERR_CREATE_CODEC        = 14
	ERR_INVALID_FILE        = 15
	ERR_STREAM_VERSION      = 16
	ERR_CREATE_STREAM       = 17
	ERR_INVALID_PARAM       = 18
	ERR_CRC_CHECK           = 19
	ERR_UNKNOWN             = 127
)

Variables

View Source
var LOG2 = [...]uint32{}/* 256 elements not displayed */

LOG2 is an array with 256 elements: int(Math.log2(x-1))

View Source
var LOG2_4096 = [...]uint32{}/* 257 elements not displayed */

LOG2_4096 is an array with 256 elements: 4096*Math.log2(x)

View Source
var SQUASH [4096]int

SQUASH contains p = 1/(1 + exp(-d)), d scaled by 8 bits, p scaled by 12 bits

View Source
var STRETCH [4096]int

STRETCH is the inverse of squash. d = ln(p/(1-p)), d scaled by 8 bits, p by 12 bits. d has range -2047 to 2047 representing -8 to 8. p in [0..4095].

Functions

func Abs

func Abs(x int32) int32

Abs returns the absolute value of the input without a branch

func ComputeFirstOrderEntropy1024 added in v1.8.0

func ComputeFirstOrderEntropy1024(blockLen int, histo []int) int

ComputeFirstOrderEntropy1024 computes the order 0 entropy of the block and scales the result by 1024 (result in the [0..1024] range) Fills in the histogram with order 0 frequencies. Incoming array size must be at least 256

func ComputeHistogram

func ComputeHistogram(block []byte, freqs []int, isOrder0, withTotal bool)

ComputeHistogram computes the order 0 or order 1 histogram for the input block and returns it in the 'freqs' slice. If withTotal is true, the last spot in each frequencies order 0 array is for the total (each order 0 frequency slice must be of length 257 in this case).

func ComputeJobsPerTask

func ComputeJobsPerTask(jobsPerTask []uint, jobs, tasks uint) []uint

ComputeJobsPerTask computes the number of jobs associated with each task given a number of jobs available and a number of tasks to perform. The provided 'jobsPerTask' slice is returned as result.

func IsPowerOf2

func IsPowerOf2(x int32) bool

IsPowerOf2 returns true if the input value is a power of two

func Log2

func Log2(x uint32) (uint32, error)

Log2 returns a fast, integer rounded value for log2(x)

func Log2NoCheck

func Log2NoCheck(x uint32) uint32

Log2NoCheck does the same as Log2() minus a null check on input value

func Log2_1024

func Log2_1024(x uint32) (uint32, error)

Log2_1024 returns 1024 * log2(x). Max error is around 0.1%

func Lsb

func Lsb(x int32) int32

Lsb returns the least significant bit

func Max

func Max(x, y int32) int32

Max returns the maximum of 2 values without a branch

func Min

func Min(x, y int32) int32

Min returns the minimum of 2 values without a branch

func Msb

func Msb(x int32) int32

Msb returns the most significant bit

func PositiveOrNull

func PositiveOrNull(x int32) int32

PositiveOrNull returns the input value if positive else 0

func ResetLsb

func ResetLsb(x int32) int32

ResetLsb sets the Least significan bit to 0

func RoundUpPowerOfTwo

func RoundUpPowerOfTwo(x int32) int32

RoundUpPowerOfTwo returns the smnallest power of two greater than or equal to the input value

func Squash

func Squash(d int) int

Squash returns p = 1/(1 + exp(-d)), d scaled by 8 bits, p scaled by 12 bits

Types

type ByteTransform

type ByteTransform interface {
	// Forward applies the function to the src and writes the result
	// to the destination. Returns number of bytes read, number of bytes
	// written and possibly an error.
	Forward(src, dst []byte) (uint, uint, error)

	// Inverse applies the reverse function to the src and writes the result
	// to the destination. Returns number of bytes read, number of bytes
	// written and possibly an error.
	Inverse(src, dst []byte) (uint, uint, error)

	// MaxEncodedLen returns the max size required for the encoding output buffer
	MaxEncodedLen(srcLen int) int
}

ByteTransform is a function that transforms the input byte slice and writes the result in the output byte slice. The result may have a different size.

type DataType added in v1.9.0

type DataType int

DataType captures the type of input data

const (
	DT_UNDEFINED  DataType = 0
	DT_TEXT       DataType = 1
	DT_MULTIMEDIA DataType = 2
	DT_X86        DataType = 3
	DT_NUMERIC    DataType = 4
	DT_BASE64     DataType = 5
	DT_DNA        DataType = 6
	DT_BIN        DataType = 7
)

type EntropyDecoder

type EntropyDecoder interface {
	// Read decodes data from the bitstream and return it in the provided buffer.
	// Return the number of bytes read from the bitstream
	Read(block []byte) (int, error)

	// BitStream returns the underlying bitstream
	BitStream() InputBitStream

	// Dispose must be called before getting rid of the entropy decoder
	// Trying to decode after a call to dispose gives undefined behavior
	Dispose()
}

EntropyDecoder entropy decodes data from a bitstream

type EntropyEncoder

type EntropyEncoder interface {
	// Write encodes the data provided into the bitstream. Return the number of bytes
	// written to the bitstream
	Write(block []byte) (int, error)

	// BitStream returns the underlying bitstream
	BitStream() OutputBitStream

	// Dispose must be called before getting rid of the entropy encoder
	// Trying to encode after a call to dispose gives undefined behavior
	Dispose()
}

EntropyEncoder entropy encodes data to a bitstream

type Event

type Event struct {
	// contains filtered or unexported fields
}

Event a compression/decompression event

func NewEvent

func NewEvent(evtType, id int, size int64, hash uint32, hashing bool, evtTime time.Time) *Event

NewEvent creates a new Event instance with size and hash info

func NewEventFromString

func NewEventFromString(evtType, id int, msg string, evtTime time.Time) *Event

NewEventFromString creates a new Event instance that wraps a message

func (*Event) Hash

func (this *Event) Hash() uint32

Hash returns the hash info

func (*Event) Hashing

func (this *Event) Hashing() bool

Hashing returns true if the event contains a hash info

func (*Event) ID

func (this *Event) ID() int

ID returns the id info

func (*Event) Size

func (this *Event) Size() int64

Size returns the size info

func (*Event) String

func (this *Event) String() string

String returns a string representation of this event. If the event wraps a message, the the message is returned. Owtherwise a string is built from the fields.

func (*Event) Time

func (this *Event) Time() time.Time

Time returns the time info

func (*Event) Type

func (this *Event) Type() int

Type returns the type info

type InputBitStream

type InputBitStream interface {
	// ReadBit returns the next bit in the bitstream. Panics if closed or EOS is reached.
	ReadBit() int

	// ReadBits reads 'length' (in [1..64]) bits from the bitstream .
	// Returns the bits read as an uint64.
	// Panics if closed or EOS is reached.
	ReadBits(length uint) uint64

	// ReadArray reads 'length' bits from the bitstream and put them in the byte slice.
	// Returns the number of bits read.
	// Panics if closed or EOS is reached.
	ReadArray(bits []byte, length uint) uint

	// Close makes the bitstream unavailable for further reads.
	Close() (bool, error)

	// Read returns the number of bits read
	Read() uint64

	// HasMoreToRead returns false when the bitstream is closed or the EOS has been reached
	HasMoreToRead() (bool, error)
}

InputBitStream is a bitstream reader

type IntTransform

type IntTransform interface {
	// Forward applies the function to the src and writes the result
	// to the destination. Returns number of bytes read, number of bytes
	// written and possibly an error.
	Forward(src, dst []int) (uint, uint, error)

	// Inverse applies the reverse function to the src and writes the result
	// to the destination. Returns number of bytes read, number of bytes
	// written and possibly an error.
	Inverse(src, dst []int) (uint, uint, error)

	// MaxEncodedLen returns the max size required for the encoding output buffer
	// If the max size of the output buffer is not known, return -1
	MaxEncodedLen(srcLen int) int
}

IntTransform is a function that transforms the input int slice and writes the result in the output int slice. The result may have a different size.

type Listener

type Listener interface {
	ProcessEvent(evt *Event)
}

Listener is an interface implemented by event processors

type OutputBitStream

type OutputBitStream interface {
	// WriteBit writes the least significant bit of the input integer
	// Panics if closed or an IO error is received.
	WriteBit(bit int)

	// WriteBits writes the least significant bits of 'bits' to the bitstream.
	// Length is the number of bits to write (in [1..64]).
	// Returns the number of bits written.
	// Panics if closed or an IO error is received.
	WriteBits(bits uint64, length uint) uint

	// WriteArray writes bits out of the byte slice. Length is the number of bits.
	// Returns the number of bits written.
	// Panics if closed or an IO error is received.
	WriteArray(bits []byte, length uint) uint

	// Close makes the bitstream unavailable for further writes.
	Close() (bool, error)

	// Written returns the number of bits written
	Written() uint64
}

OutputBitStream is a bitstream writer

type Predictor

type Predictor interface {
	// Update updates the internal probability model based on the observed bit
	Update(bit byte)

	// Get returns the value representing the probability of the next bit being 1
	// in the [0..4095] range.
	// E.G. 410 represents roughly a probability of 10% for 1
	Get() int
}

Predictor predicts the probability of the next bit being 1.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL