erasure

package module
v0.0.0-...-0747d90 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 18, 2019 License: MIT Imports: 2 Imported by: 0

README

go-erasure Build Status Coverage Status

Disclaimer: I recommend the klauspost/reedsolomon erasure coding library over this one as it is more performant and has better support for multiple architectures.

Go bindings for erasure coding (Reed-Solomon coding).

Erasure coding is similar to RAID based parity encoding, but is more generalized and powerful. When defining an erasure code, you specify a k and m variable. m is the number of shards you wish to encode and k is the number shards it takes to recreate your original data. Hence k must be less than m and usually not equal (as that would be a pointless encoding). The real magic with erasure coding is that fact that ANY k of the m shards can recreate the original data. For example, a erasure coding scheme of k=8 and m=12 means any four of the encoded shards can be lost while the original data can still be constructed from the valid remaining eight shards.

This library is aimed at simplicity and performance. It only has three methods including a constructor which are all thread-safe! Internally it uses Cgo to utilize a complex C library. For a more in-depth look into this library be sure to check out the Intel® Storage Acceleration Library and especially their corresponding video. One feature it does add is an optimization for decoding. Since there are m choose k possible inverse matrices for decoding, this library caches them (via lazy-loading) so as reduce the amount of time decoding. It does so by utilizing a trie where the sorted error list of shards is the key to the trie and the corresponding decode matrix is the value.

I hope you find it useful and pull requests are welcome!

Usage

See the GoDoc for an API reference

Encode and decode random data
package main

import (
  "bytes"
  "log"
  "math/rand"
  
  "github.com/somethingnew2-0/go-erasure"
)

func corrupt(source, errList []byte, shardLength int) []byte {
	corrupted := make([]byte, len(source))
	copy(corrupted, source)
	for _, err := range errList {
		for i := 0; i < shardLength; i++ {
			corrupted[int(err)*shardLength+i] = 0x00
		}
	}
	return corrupted
}

func main() {
	m := 12
	k := 8
	shardLength := 16 // Length of a shard
	size := k * shardLength // Length of the data blob to encode

	code := erasure.NewCode(m, k, size)

	source := make([]byte, size)
	for i := range source {
		source[i] = byte(rand.Int63() & 0xff) //0x62
	}

	encoded := code.Encode(source)

	errList := []byte{0, 2, 3, 4}

	corrupted := corrupt(append(source, encoded...), errList, shardLength)

	recovered := code.Decode(corrupted, errList, true)

	if !bytes.Equal(source, recovered) {
		log.Fatal("Source was not sucessfully recovered with 4 errors")
	}
}

Development

To start run source dev.sh or more simply . dev.sh to setup the git hooks and GOPATH for this project.

Run go test or go test -bench . to test the unit tests and benchmark tests.

Documentation

Overview

Erasure coding is similar to RAID based parity encoding, but is more generalized and powerful. When defining an erasure code, you specify a K and M variable. M is the number of shards you wish to encode and K is the number shards it takes to recreate your original data. Hence K must be less than M and usually not equal (as that would be a pointless encoding). The real magic with erasure coding is that fact that ANY K of the M shards can recreate the original data. For example, a erasure coding scheme of K=8 and M=12 means any four of the encoded shards can be lost while the original data can still be constructed from the valid remaining eight shards.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

This section is empty.

Types

type Code

type Code struct {
	M            int
	K            int
	ShardLength  int
	EncodeMatrix []byte
	// contains filtered or unexported fields
}

Manages state of the erasure coding scheme and its values should be considered read-only.

func NewCode

func NewCode(m int, k int, size int) *Code

Constructor for creating a new erasure coding scheme. M is the total number of shards output by the encoding. K is the number of shards that can recreate any data that was encoded. Size is the size of the byte array to encode. It should be divisible by K as each shard will be Size / K in length. The maximum value for K and M is 127.

func (*Code) Decode

func (c *Code) Decode(encoded []byte, errList []byte, cache bool) (recovered []byte)

Data buffer to decode must be of the (M/K)*Size given in the constructor. The error list must contain M-K values, corresponding to the shards with errors (eg. [0, 2, 4, 6]). Cache stores the decode matrices in a trie, enabling a faster decode with a memory tradeoff. The returned decoded data is the orignal data of length Size

func (*Code) Encode

func (c *Code) Encode(data []byte) (encoded []byte)

The data buffer to encode must be of the length Size given in the constructor. The returned encoded buffer is (M-K)*Shard length, since the first Size bytes of the encoded data is just the original data due to the identity matrix.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL