reedsolomon

package module
v1.1.3
Published: Dec 17, 2019 License: MIT Imports: 5 Imported by: 12

README

Reed-Solomon


Introduction:

  • Erasure Codes (based on Reed-Solomon Codes) engine in pure Go.

  • It's a systematic code, which means the input data is embedded in the encoded output.

  • High Performance: more than 15 GB/s per physical core.

  • High Reliability:

  1. At least two companies are using this library in their storage systems (more than dozens of PB of data).
  2. Full tests of Galois Field calculations and invertible matrices (you can also find the mathematical proofs in this repo).
  • Based on Klauspost ReedSolomon & Intel ISA-L with some additional changes/optimizations.

  • It's the backend of XRS (Erasure Codes which can save about 30% I/O in the reconstruction process).

Specification

Math
  • Coding over GF(2^8).

  • Primitive Polynomial: x^8 + x^4 + x^3 + x^2 + 1 (0x1d).

  • Cauchy Matrix is the generator matrix.

    • Any submatrix of the encoding matrix is invertible (see the proof here).
  • Galois Field Tool: Generates the primitive polynomial and its log, exponent, multiplication, and inverse tables, etc. (see the multiplication sketch below).

  • Inverse Matrices Tool: Calculate the number of inverse matrices with specific data & parity number.

XP has written an excellent article (here, in Chinese) about how Erasure Codes work and the math behind them. It's a good starting point.
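
To make the table-driven arithmetic concrete, here is a minimal sketch, assuming nothing about this package's internals: it builds GF(2^8) log/exponent tables from the primitive polynomial x^8+x^4+x^3+x^2+1 and multiplies two field elements with them. The names expTbl, logTbl, and gfMul are illustrative only.

```go
package main

import "fmt"

// 0x11d = x^8 + x^4 + x^3 + x^2 + 1 (the 0x1d above omits the implicit x^8 bit).
const primitivePoly = 0x11d

var (
	expTbl [512]byte // doubled so gfMul can skip a "mod 255"
	logTbl [256]int
)

func init() {
	x := 1
	for i := 0; i < 255; i++ {
		expTbl[i] = byte(x)
		logTbl[x] = i
		x <<= 1 // multiply by the generator 0x02
		if x&0x100 != 0 {
			x ^= primitivePoly // reduce modulo the primitive polynomial
		}
	}
	for i := 255; i < 512; i++ {
		expTbl[i] = expTbl[i-255]
	}
}

// gfMul multiplies two GF(2^8) elements via log/exp lookups: a*b = exp(log(a)+log(b)).
func gfMul(a, b byte) byte {
	if a == 0 || b == 0 {
		return 0
	}
	return expTbl[logTbl[a]+logTbl[b]]
}

func main() {
	fmt.Printf("0x57 * 0x13 = 0x%02x\n", gfMul(0x57, 0x13))
}
```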

Accelerate
  • SIMD: Screaming Fast Galois Field Arithmetic Using Intel SIMD Instructions

  • Reduce memory I/O: Write cache-friendly code. When multiplying two matrices we have to read the data many times, keep temporary results, and write them back to memory. If we can keep more data in the CPU's cache instead of reading from and writing to memory again and again, performance improves a lot.

  • Cache inverse matrices: It only saves thousands of nanoseconds, which isn't much, but it's still meaningful for small data (see the sketch after this section).

  • ...

Here (in Chinese) is an article about how to write a fast Erasure Codes engine. (Written by me years ago; it needs an update, but the main ideas still apply.)
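
The following is a minimal sketch of the inverse-matrix caching idea, not this package's implementation; the matrix type, the invert callback, and the key scheme are assumptions for illustration.

```go
package rscache

import "sync"

type matrix [][]byte

// inverseCache maps "which rows survived" to the already-inverted decode matrix.
var inverseCache sync.Map

// cachedInverse returns the inverse of the square submatrix formed by the
// survived rows of the encoding matrix, computing and caching it on first use.
func cachedInverse(encode matrix, has []int, invert func(matrix) (matrix, error)) (matrix, error) {
	key := keyOf(has) // the survived row indexes uniquely identify the submatrix
	if m, ok := inverseCache.Load(key); ok {
		return m.(matrix), nil
	}
	sub := make(matrix, len(has))
	for i, r := range has {
		sub[i] = encode[r]
	}
	inv, err := invert(sub)
	if err != nil {
		return nil, err
	}
	inverseCache.Store(key, inv)
	return inv, nil
}

func keyOf(has []int) string {
	b := make([]byte, len(has))
	for i, v := range has {
		b[i] = byte(v) // data+parity <= 256, so each index fits in one byte
	}
	return string(b)
}
```

For small vectors the inversion is a noticeable fraction of the total reconstruction cost, which is why the cache matters mostly for small data.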

Performance

Performance depends mainly on:

  • CPU instruction extension.

  • Number of data/parity row vectors.

Platform:

AWS c5d.xlarge (Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz)

All tests run on a single core.

Encode:

I/O = (data + parity) * vector_size / cost

Base means no SIMD.

| Data | Parity | Vector size | AVX512 I/O (MB/S) | AVX2 I/O (MB/S) | Base I/O (MB/S) |
|------|--------|-------------|-------------------|-----------------|-----------------|
| 10   | 2      | 4KB         | 29683.69          | 21371.43        | 910.45          |
| 10   | 2      | 1MB         | 17664.67          | 15505.58        | 917.26          |
| 10   | 2      | 8MB         | 10363.05          | 9323.60         | 914.62          |
| 10   | 4      | 4KB         | 17708.62          | 12705.35        | 531.82          |
| 10   | 4      | 1MB         | 11970.42          | 9804.57         | 536.31          |
| 10   | 4      | 8MB         | 7957.9            | 6941.69         | 534.82          |
| 12   | 4      | 4KB         | 16902.12          | 12065.14        | 511.95          |
| 12   | 4      | 1MB         | 11478.86          | 9392.33         | 514.24          |
| 12   | 4      | 8MB         | 7949.81           | 6760.49         | 513.06          |
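
As a worked example of the formula above, the first row (10 data + 2 parity, 4KB vectors, AVX512) implies cost ≈ (10 + 2) × 4KB / 29683.69 MB/s ≈ 1.6 µs per Encode call.
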
Reconstruct:

I/O = (data + reconstruct_data_num) * vector_size / cost

| Data | Parity | Vector size | Reconstruct Data Num | AVX512 I/O (MB/S) |
|------|--------|-------------|----------------------|-------------------|
| 10   | 4      | 4KB         | 1                    | 29830.36          |
| 10   | 4      | 4KB         | 2                    | 21649.61          |
| 10   | 4      | 4KB         | 3                    | 17088.41          |
| 10   | 4      | 4KB         | 4                    | 14567.26          |

Update:

I/O = (2 + parity_num + parity_num) * vector_size / cost

| Data | Parity | Vector size | AVX512 I/O (MB/S) |
|------|--------|-------------|-------------------|
| 10   | 4      | 4KB         | 36444.13          |

Replace:

I/O = (parity_num + parity_num + replace_data_num) * vector_size / cost

| Data | Parity | Vector size | Replace Data Num | AVX512 I/O (MB/S) |
|------|--------|-------------|------------------|-------------------|
| 10   | 4      | 4KB         | 1                | 78464.33          |
| 10   | 4      | 4KB         | 2                | 50068.71          |
| 10   | 4      | 4KB         | 3                | 38808.11          |
| 10   | 4      | 4KB         | 4                | 32457.60          |
| 10   | 4      | 4KB         | 5                | 28679.46          |
| 10   | 4      | 4KB         | 6                | 26151.85          |

PS:

Keep in mind that these benchmarks differ from encoding/decoding in practice: inside a benchmark loop, the CPU cache may help a lot.

  • Klauspost ReedSolomon: It's the most widely used Erasure Codes library in Go. Impressive performance, a friendly API, and support for multiple platforms (with fast Galois Field arithmetic). It inspired me a lot.

  • Intel ISA-L: The ideas of Cauchy matrix and saving memory I/O are from it.

Documentation

Overview

Warning: Don't modify the Galois Field tables; the table data is generated by mathtool/gentbls.go.

Package reedsolomon implements Erasure Codes (systematic codes). It's based on Reed-Solomon Codes over GF(2^8) with primitive polynomial x^8+x^4+x^3+x^2+1.

Galois Field arithmetic uses Intel SIMD instructions (AVX512 or AVX2).

Index

Constants

This section is empty.

Variables

var (
	ErrMismatchVects    = errors.New("too few/many vects given")
	ErrZeroVectSize     = errors.New("vect size is 0")
	ErrMismatchVectSize = errors.New("vects size mismatched")
)
var (
	ErrNoNeedReconst   = errors.New("no need reconst")
	ErrTooManyLost     = errors.New("too many lost")
	ErrHasLostConflict = errors.New("dpHas&lost are conflicting")
)
var (
	ErrMismatchParityNum = errors.New("parity number mismatched")
	ErrIllegalVectIndex  = errors.New("illegal vect index")
)
var (
	ErrTooManyReplace  = errors.New("too many data for replacing")
	ErrMismatchReplace = errors.New("number of replaceRows and data mismatch")
)
var EnableAVX512 = true

EnableAVX512 may slow down the CPU clock (or maybe not). TODO: needs more research: https://lemire.me/blog/2018/04/19/by-how-much-does-avx-512-slow-down-your-cpu-a-first-experiment/

You can modify it before creating a new RS.
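
For example (a fragment; it assumes the package is imported as reedsolomon):

```go
reedsolomon.EnableAVX512 = false // opt out of AVX512; set this before New
r, err := reedsolomon.New(10, 4)
```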

var ErrIllegalVects = errors.New("illegal data/parity number: <= 0 or data+parity > 256")
var ErrNotSquare = errors.New("not a square matrix")
var ErrSingularMatrix = errors.New("matrix is singular")

Functions

func SplitNeedReconst

func SplitNeedReconst(dataCnt int, needReconst []int) (dataNeed, parityNeed []int)

SplitNeedReconst splits the needReconst indexes into lost data and lost parity indexes.
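
A hypothetical call (with dataCnt = 10, index 3 is a data vector and index 11 is a parity vector):

```go
dataNeed, parityNeed := reedsolomon.SplitNeedReconst(10, []int{3, 11})
// Expected: dataNeed = [3], parityNeed = [11], assuming indexes below
// dataCnt are data and the rest are parity.
```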

Types

type RS

type RS struct {
	DataNum   int // DataNum is the number of data row vectors.
	ParityNum int // ParityNum is the number of parity row vectors.

	GenMatrix matrix // Generator matrix.
	// contains filtered or unexported fields
}

RS Reed-Solomon Codes receiver.

func New

func New(dataNum, parityNum int) (r *RS, err error)

New creates an RS with the given data & parity numbers.

func (*RS) Encode

func (r *RS) Encode(vects [][]byte) (err error)

Encode encodes data to generate parity. It multiplies the generator matrix by vects[:r.DataNum] to get the parity vectors and writes them into vects[r.DataNum:].
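
A minimal usage sketch based on the signatures above; the import path github.com/templexxx/reedsolomon is assumed from the module name.

```go
package main

import "github.com/templexxx/reedsolomon"

func main() {
	r, err := reedsolomon.New(10, 4) // 10 data + 4 parity
	if err != nil {
		panic(err)
	}
	vects := make([][]byte, r.DataNum+r.ParityNum)
	for i := range vects {
		vects[i] = make([]byte, 4096) // all vectors must share the same non-zero size
	}
	// Fill vects[:r.DataNum] with your data here; Encode then writes the
	// parity into vects[r.DataNum:].
	if err := r.Encode(vects); err != nil {
		panic(err)
	}
}
```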

func (*RS) Reconst

func (r *RS) Reconst(vects [][]byte, dpHas, needReconst []int) (err error)

Reconst reconstructs missing vectors. vects: all vectors, len(vects) = dataNum + parityNum. dpHas: indexes of the surviving data & parity vectors; at least dataNum indexes are needed. needReconst: indexes of the vectors that need to be reconstructed.

e.g., in 3+2 the whole index set is [0,1,2,3,4]. If vects[0] and vects[4] are lost and need to be reconstructed (maybe you only need vects[0], in which case needReconst should be [0], not [0,4]), then dpHas will be [1,2,3], and you must make sure that vects[1], vects[2] and vects[3] contain correct data. Results are written into vects[0] & vects[4] directly.
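
A fragment matching the 3+2 example above; surv1..surv3 and vectSize are placeholders, and pre-allocating the lost slot is an assumption here.

```go
r, _ := reedsolomon.New(3, 2)
vects := make([][]byte, 5)
vects[1], vects[2], vects[3] = surv1, surv2, surv3 // correct surviving vectors
vects[0] = make([]byte, vectSize)                  // destination for the reconstructed data
// dpHas = [1,2,3] (at least dataNum survivors), needReconst = [0] only.
err := r.Reconst(vects, []int{1, 2, 3}, []int{0})
```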

func (*RS) Replace

func (r *RS) Replace(data [][]byte, replaceRows []int, parity [][]byte) (err error)

Replace replaces oldData vectors with 0 or replaces 0 with newData vectors.

In practice, if len(replaceRows) > dataNum-parityNum, it's better to use Encode, because Replace needs to read len(replaceRows) + parityNum vectors; if there are too many replaceRows, the cost may be larger than Encode (which only needs to read dataNum vectors). Think of a model with one EC compute node and dataNum+parityNum data nodes.

It's used in two situations: 1. We didn't have enough data to fill a stripe but still did the EC encode; once we finally get enough data, we need to replace several zero vectors with the new vectors that hold that data. 2. After compaction, we may have several useless vectors in a stripe; we need to replace these useless vectors with zero vectors to free space.

Warning: the data vectors and replaceRows must be in the same order (data[i] corresponds to replaceRows[i]).
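
A fragment for situation 1 above; newData7, newData8 and the existing parity vectors are placeholders. Rows 7 and 8 of a 10+4 stripe were originally encoded as zero vectors and now receive real data.

```go
r, _ := reedsolomon.New(10, 4)
// data[i] corresponds to replaceRows[i]; both are kept in ascending row order.
err := r.Replace([][]byte{newData7, newData8}, []int{7, 8}, parity)
```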

func (*RS) Update

func (r *RS) Update(oldData []byte, newData []byte, row int, parity [][]byte) (err error)

Update updates the parity when one data vector changes. row: the changed data vector's index among the whole vectors.
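
A fragment; oldVect, newVect and parity are placeholders. Data vector 3 changes, and the existing parity vectors are updated accordingly.

```go
r, _ := reedsolomon.New(10, 4)
// row = 3: the changed vector's index among the whole data+parity vectors.
err := r.Update(oldVect, newVect, 3, parity)
```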

Directories

Path Synopsis
This tool calculates the number of inverse matrices for a specific data & parity number.
