erasure

package module

v0.0.0-...-0747d90 Latest Latest Go to latest Published: Oct 18, 2019 License: MIT Imports: 2 Imported by: 0

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

github.com/somethingnew2-0/go-erasure

Links

Open Source Insights

README ¶

go-erasure

Disclaimer: I recommend the klauspost/reedsolomon erasure coding library over this one as it is more performant and has better support for multiple architectures.

Go bindings for erasure coding (Reed-Solomon coding).

Erasure coding is similar to RAID based parity encoding, but is more generalized and powerful. When defining an erasure code, you specify a k and m variable. m is the number of shards you wish to encode and k is the number shards it takes to recreate your original data. Hence k must be less than m and usually not equal (as that would be a pointless encoding). The real magic with erasure coding is that fact that ANY k of the m shards can recreate the original data. For example, a erasure coding scheme of k=8 and m=12 means any four of the encoded shards can be lost while the original data can still be constructed from the valid remaining eight shards.

This library is aimed at simplicity and performance. It only has three methods including a constructor which are all thread-safe! Internally it uses Cgo to utilize a complex C library. For a more in-depth look into this library be sure to check out the Intel® Storage Acceleration Library and especially their corresponding video. One feature it does add is an optimization for decoding. Since there are m choose k possible inverse matrices for decoding, this library caches them (via lazy-loading) so as reduce the amount of time decoding. It does so by utilizing a trie where the sorted error list of shards is the key to the trie and the corresponding decode matrix is the value.

I hope you find it useful and pull requests are welcome!

Usage

See the GoDoc for an API reference

Encode and decode random data

package main

import (
  "bytes"
  "log"
  "math/rand"
  
  "github.com/somethingnew2-0/go-erasure"
)

func corrupt(source, errList []byte, shardLength int) []byte {
	corrupted := make([]byte, len(source))
	copy(corrupted, source)
	for _, err := range errList {
		for i := 0; i < shardLength; i++ {
			corrupted[int(err)*shardLength+i] = 0x00
		}
	}
	return corrupted
}

func main() {
	m := 12
	k := 8
	shardLength := 16 // Length of a shard
	size := k * shardLength // Length of the data blob to encode

	code := erasure.NewCode(m, k, size)

	source := make([]byte, size)
	for i := range source {
		source[i] = byte(rand.Int63() & 0xff) //0x62
	}

	encoded := code.Encode(source)

	errList := []byte{0, 2, 3, 4}

	corrupted := corrupt(append(source, encoded...), errList, shardLength)

	recovered := code.Decode(corrupted, errList, true)

	if !bytes.Equal(source, recovered) {
		log.Fatal("Source was not sucessfully recovered with 4 errors")
	}
}

Development

To start run source dev.sh or more simply . dev.sh to setup the git hooks and GOPATH for this project.

Run go test or go test -bench . to test the unit tests and benchmark tests.

Documentation ¶

Overview ¶

Erasure coding is similar to RAID based parity encoding, but is more generalized and powerful. When defining an erasure code, you specify a K and M variable. M is the number of shards you wish to encode and K is the number shards it takes to recreate your original data. Hence K must be less than M and usually not equal (as that would be a pointless encoding). The real magic with erasure coding is that fact that ANY K of the M shards can recreate the original data. For example, a erasure coding scheme of K=8 and M=12 means any four of the encoded shards can be lost while the original data can still be constructed from the valid remaining eight shards.

Index ¶

type Code
- func NewCode(m int, k int, size int) *Code
- func (c *Code) Decode(encoded []byte, errList []byte, cache bool) (recovered []byte)
- func (c *Code) Encode(data []byte) (encoded []byte)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

This section is empty.

Types ¶

type Code ¶

type Code struct {
	M            int
	K            int
	ShardLength  int
	EncodeMatrix []byte
	// contains filtered or unexported fields
}

Manages state of the erasure coding scheme and its values should be considered read-only.

func NewCode ¶

func NewCode(m int, k int, size int) *Code

Constructor for creating a new erasure coding scheme. M is the total number of shards output by the encoding. K is the number of shards that can recreate any data that was encoded. Size is the size of the byte array to encode. It should be divisible by K as each shard will be Size / K in length. The maximum value for K and M is 127.

func (*Code) Decode ¶

func (c *Code) Decode(encoded []byte, errList []byte, cache bool) (recovered []byte)

Data buffer to decode must be of the (M/K)*Size given in the constructor. The error list must contain M-K values, corresponding to the shards with errors (eg. [0, 2, 4, 6]). Cache stores the decode matrices in a trie, enabling a faster decode with a memory tradeoff. The returned decoded data is the orignal data of length Size

func (*Code) Encode ¶

func (c *Code) Encode(data []byte) (encoded []byte)

The data buffer to encode must be of the length Size given in the constructor. The returned encoded buffer is (M-K)*Shard length, since the first Size bytes of the encoded data is just the original data due to the identity matrix.

Source Files ¶

View all Source files

erasure.go

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL