solidblock

v0.0.0-...-45df20a
Published: Apr 26, 2019 License: MIT Imports: 6 Imported by: 4

README

Solidblock

Solidblock is a Go library providing io.Readers for solid compression and codec binding/chaining.

Solid Compression Reader

Wrapped around a compressed solid block of concatenated files, it provides sequential access to the files:

// file contents
files := [][]byte{
    []byte("file 1\n"),
    []byte("file 2\n"),
}

// file metadata
var metadata struct {
    sizes []uint64
    crcs  []uint32
}
metadata.sizes = []uint64{
    uint64(len(files[0])),
    uint64(len(files[1])),
}
metadata.crcs = []uint32{
    crc32.ChecksumIEEE(files[0]),
    crc32.ChecksumIEEE(files[1]),
}

// Concatenate files to compressed block
block := new(bytes.Buffer)
w := gzip.NewWriter(block)
w.Write(files[0])
w.Write(files[1])
w.Close()

// Open gzip reader to compressed block
r, err := gzip.NewReader(block)
if err != nil {
    panic(err)
}

// Create a new solidblock reader
s := solidblock.New(r, metadata.sizes, metadata.crcs)

for {
    err := s.Next()
    if err == io.EOF {
        break
    }
    if err != nil {
        panic(err)
    }

    io.Copy(os.Stdout, s)
}

Codec Binding

To improve compression, some codecs (such as BCJ2) split data into multiple streams that compress better individually. solidblock.Binder provides a simple way to pair the inputs and outputs of various codecs/readers.

For example:

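func BCJ2Decoder(inputs []io.Reader) ([]io.Reader, error) {
    // 1. take 4 input readers
    // 2. do magic
    // 3. return 1 reader
    //
    // Placeholder so the sketch compiles; a real BCJ2 decoder goes here.
    return nil, errors.New("BCJ2 decoding not implemented")
}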

func GzipDecoder(inputs []io.Reader) ([]io.Reader, error) {
    if len(inputs) != 1 {
        panic("unsupported input configuration")
    }
    r, err := gzip.NewReader(inputs[0])
    if err != nil {
        return nil, err
    }
    return []io.Reader{r}, nil
}

file, err := os.Open("file")
if err != nil {
    panic(err)
}

// Assume file has 4 concatenated streams. 3 of the streams are from a BCJ2
// encoder, compressed to gzip streams. 1 is the 4th stream of the BCJ2 encoder,
// but left uncompressed.
// io.NewSectionReader takes an offset and a length, so each illustrative
// 100-byte stream starts where the previous one ends.
streams := make([]io.Reader, 4)
streams[0] = io.NewSectionReader(file, 0, 100)
streams[1] = io.NewSectionReader(file, 100, 100)
streams[2] = io.NewSectionReader(file, 200, 100)
streams[3] = io.NewSectionReader(file, 300, 100)

// Create a new binder
binder := solidblock.NewBinder()

// Create gzip decompressors for the first 3 initial input streams.
gzip0InputIDs, gzip0OutputIDs := binder.AddCodec(GzipDecoder, 1, 1)
gzip1InputIDs, gzip1OutputIDs := binder.AddCodec(GzipDecoder, 1, 1)
gzip2InputIDs, gzip2OutputIDs := binder.AddCodec(GzipDecoder, 1, 1)

// Create BCJ2 decoder for the 4 gzip decoded streams.
// The BCJ2 output ID is not needed below, so it is discarded.
bcj2InputIDs, _ := binder.AddCodec(BCJ2Decoder, 4, 1)

// Connect initial streams to gzip decoders
binder.Reader(streams[0], gzip0InputIDs[0])
binder.Reader(streams[1], gzip1InputIDs[0])
binder.Reader(streams[2], gzip2InputIDs[0])

// Connect 4th initial stream straight to 4th input of BCJ2 decoder.
binder.Reader(streams[3], bcj2InputIDs[3])

// Pair the 3 gzip output streams to the 1st, 2nd, 3rd input of BCJ2 decoder.
binder.Pair(gzip0OutputIDs[0], bcj2InputIDs[0])
binder.Pair(gzip1OutputIDs[0], bcj2InputIDs[1])
binder.Pair(gzip2OutputIDs[0], bcj2InputIDs[2])

// Create single output to read from
outputs, err := binder.Outputs()
if err != nil {
    panic(err)
}
if len(outputs) != 1 {
    panic("output should only contain one stream")
}

io.Copy(os.Stdout, outputs[0])

A picture says 60 lines of code...

                                        +------------+
   concatenated file                    |bcj2 decoder+--->io.Reader
+--------------------+                  +-+--+--+--+-+
|                    |                    ^  ^  ^  ^
|  +--------------+  |   +------------+   |  |  |  |
|  |gzipped stream+------>gzip decoder+---+  |  |  |
|  +--------------+  |   +------------+      |  |  |
|                    |                       |  |  |
|  +--------------+  |   +------------+      |  |  |
|  |gzipped stream+------>gzip decoder+------+  |  |
|  +--------------+  |   +------------+         |  |
|                    |                          |  |
|  +--------------+  |   +------------+         |  |
|  |gzipped stream+------>gzip decoder+---------+  |
|  +--------------+  |   +------------+            |
|                    |                             |
|  +--------------+  |                             |
|  | uncompressed +--------------------------------+
|  |    stream    |  |
|  +--------------+  |
|                    |
+--------------------+

Documentation

Constants

This section is empty.

Variables

var (
	// ErrInputIsUnbound is returned when an input hasn't been bound to a
	// reader or paired with an output.
	ErrInputIsUnbound = errors.New("input is unbound")

	// ErrUnexpectedOutputCount is returned when the number of io.Readers
	// returned from a codec handler doesn't match the number specified when
	// adding the codec.
	ErrUnexpectedOutputCount = errors.New("unexpected output count")
)
var (
	// ErrChecksumMismatch is returned when a file's crc check fails.
	ErrChecksumMismatch = errors.New("checksum mismatch")
)

Functions

This section is empty.

Types

type Binder

type Binder struct {
	// contains filtered or unexported fields
}

Binder holds information regarding codecs, their inputs/outputs and how they join together.

func NewBinder

func NewBinder() *Binder

NewBinder returns a new binder.

func (*Binder) AddCodec

func (b *Binder) AddCodec(fn func([]io.Reader) ([]io.Reader, error), inputs, outputs int) (in, out []int)

AddCodec adds a handler function for processing information from input(s) and producing output(s).
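
To make the handler shape concrete, here is a minimal sketch of two functions with the signature AddCodec accepts, wired together by hand rather than through a Binder. concatDecoder is a toy multi-input codec invented for this example (it is not BCJ2), and gzipBytes and decode are hypothetical helpers; only standard library calls are used.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"errors"
	"fmt"
	"io"
)

// codec mirrors the handler type accepted by AddCodec.
type codec func([]io.Reader) ([]io.Reader, error)

// Compile-time checks that both handlers have the expected shape.
var (
	_ codec = gzipDecoder
	_ codec = concatDecoder
)

// gzipDecoder takes 1 input and returns 1 decompressed output.
func gzipDecoder(inputs []io.Reader) ([]io.Reader, error) {
	if len(inputs) != 1 {
		return nil, errors.New("gzip decoder expects exactly 1 input")
	}
	r, err := gzip.NewReader(inputs[0])
	if err != nil {
		return nil, err
	}
	return []io.Reader{r}, nil
}

// concatDecoder is a toy multi-input codec: it joins N inputs into 1 output.
func concatDecoder(inputs []io.Reader) ([]io.Reader, error) {
	return []io.Reader{io.MultiReader(inputs...)}, nil
}

// gzipBytes compresses s so the example has something to decode.
// Writes to a bytes.Buffer do not fail, so errors are ignored here.
func gzipBytes(s string) io.Reader {
	var buf bytes.Buffer
	w := gzip.NewWriter(&buf)
	w.Write([]byte(s))
	w.Close()
	return &buf
}

// decode feeds each stream through gzipDecoder, then joins the results with
// concatDecoder. This is the manual wiring a binder would otherwise describe.
func decode(streams ...io.Reader) (string, error) {
	var decoded []io.Reader
	for _, s := range streams {
		out, err := gzipDecoder([]io.Reader{s})
		if err != nil {
			return "", err
		}
		decoded = append(decoded, out...)
	}
	out, err := concatDecoder(decoded)
	if err != nil {
		return "", err
	}
	data, err := io.ReadAll(out[0])
	return string(data), err
}

func main() {
	s, err := decode(gzipBytes("hello, "), gzipBytes("world\n"))
	if err != nil {
		panic(err)
	}
	fmt.Print(s)
}
```

Returning an error instead of panicking on a bad input count keeps the failure visible to whoever collects the outputs.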

func (*Binder) Outputs

func (b *Binder) Outputs() ([]io.Reader, error)

Outputs returns any unbound output readers to read from.

func (*Binder) Pair

func (b *Binder) Pair(in int, out int)

Pair pairs two streams, binding an in stream to an out stream.

func (*Binder) Reader

func (b *Binder) Reader(r io.Reader, in int)

Reader binds a reader to an in stream.

type Solidblock

type Solidblock struct {
	// contains filtered or unexported fields
}

Solidblock provides sequential access to files that have been concatenated into a single compressed data block.

Example
package main

import (
	"bytes"
	"compress/gzip"
	"hash/crc32"
	"io"
	"os"

	"github.com/saracen/solidblock"
)

func main() {
	// file contents
	files := [][]byte{
		[]byte("file 1\n"),
		[]byte("file 2\n"),
	}

	// file metadata
	var metadata struct {
		sizes []uint64
		crcs  []uint32
	}
	metadata.sizes = []uint64{
		uint64(len(files[0])),
		uint64(len(files[1])),
	}
	metadata.crcs = []uint32{
		crc32.ChecksumIEEE(files[0]),
		crc32.ChecksumIEEE(files[1]),
	}

	// Concatenate files to compressed block
	block := new(bytes.Buffer)
	w := gzip.NewWriter(block)
	w.Write(files[0])
	w.Write(files[1])
	w.Close()

	// Open gzip reader to compressed block
	r, err := gzip.NewReader(block)
	if err != nil {
		panic(err)
	}

	// Create a new solidblock reader
	s := solidblock.New(r, metadata.sizes, metadata.crcs)

	for {
		err := s.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			panic(err)
		}

		io.Copy(os.Stdout, s)
	}

}
Output:

file 1
file 2

func New

func New(r io.Reader, sizes []uint64, crcs []uint32) *Solidblock

New returns a new solidblock reader.

func (*Solidblock) Next

func (fr *Solidblock) Next() error

Next advances to the next file entry in the solid block.

Calling Next without reading the current file is supported. Decompression of the current file occurs only when Read is called. Any skipped files still need to be decompressed, but their contents are discarded.

io.EOF is returned at the end of the input.

func (*Solidblock) Read

func (fr *Solidblock) Read(p []byte) (int, error)

Read reads from the current file in the solid block. It returns (0, io.EOF) when it reaches the end of that file, until Next is called to advance to the next file.

func (*Solidblock) Size

func (fr *Solidblock) Size() int64
