ten

v0.0.0-...-f8d227e Latest
Published: Aug 19, 2020 License: MIT Imports: 5 Imported by: 0

README

ten

Efficient binary encoding for tensors. The format is supported by WebDataset with the .ten filename extension.

The reference implementation was developed in Python by tmbdev at Nvidia.

Installation

go get github.com/AlexanderEkdahl/ten

This package uses no external dependencies outside of the Go standard library.

Usage

// Write an encoded 3x3 float32 tensor named "policy" to w (an io.Writer).
e := ten.NewEncoder(w)
err := e.Encode([]float32{1, 2, 3, 4, 5, 6, 7, 8, 9}, []int{3, 3}, "policy")

// Read the next encoded tensor from r (an io.Reader).
d := ten.NewDecoder(r)
tensorData, shape, info, err := d.Decode()

float16 is not supported because Go has no built-in float16 type (#32022).

Benchmarks

$ go test -bench .
goos: linux
goarch: amd64
pkg: github.com/AlexanderEkdahl/ten
BenchmarkDecoder/100-16         	  531340	      2107 ns/op	 189.88 MB/s
BenchmarkDecoder/500-16         	  224413	      5368 ns/op	 372.55 MB/s
BenchmarkDecoder/1000-16        	  120788	      9430 ns/op	 424.20 MB/s
BenchmarkDecoder/10000-16       	   15444	     78619 ns/op	 508.79 MB/s
BenchmarkEncoder/100-16         	  920988	      1231 ns/op	 324.97 MB/s
BenchmarkEncoder/500-16         	  324014	      3880 ns/op	 515.49 MB/s
BenchmarkEncoder/1000-16        	  159812	      7117 ns/op	 562.01 MB/s
BenchmarkEncoder/10000-16       	   18616	     64677 ns/op	 618.46 MB/s

Documentation

Overview

Package ten provides efficient binary encoding for tensors. The format is 8-byte aligned, so the data can be used directly for computations when transmitted, for example, via RDMA. The format is supported by WebDataset with the `.ten` filename extension. It is also used by Tensorcom and Tensorcom RDMA, and can be used for fast tensor storage with LMDB and in disk files (which can be memory mapped).

Data is encoded as a series of chunks:

  • magic number (int64)
  • length in bytes (int64)
  • bytes (multiple of 64 bytes long)
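
The chunk layout above can be sketched with the standard library alone. This is an illustration rather than the package's own implementation: the helper names writeChunk, paddedLen and chunkSize are hypothetical, and little-endian byte order, zero padding, and a length field that holds the unpadded payload size are all assumptions. Only the magic bytes are taken from the package's MagicNumber variable.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// magic mirrors the value of the package's MagicNumber variable.
var magic = []byte{0x7e, 0x54, 0x65, 0x6e, 0x42, 0x69, 0x6e, 0x7e}

// paddedLen rounds n up to the next multiple of 64.
func paddedLen(n int) int {
	return (n + 63) / 64 * 64
}

// writeChunk is a hypothetical helper that frames payload as one chunk:
// magic number, int64 length, then the payload zero-padded to a multiple of
// 64 bytes. Little-endian order and an unpadded length field are assumptions.
func writeChunk(w io.Writer, payload []byte) error {
	if _, err := w.Write(magic); err != nil {
		return err
	}
	if err := binary.Write(w, binary.LittleEndian, int64(len(payload))); err != nil {
		return err
	}
	if _, err := w.Write(payload); err != nil {
		return err
	}
	_, err := w.Write(make([]byte, paddedLen(len(payload))-len(payload)))
	return err
}

// chunkSize returns the total number of bytes writeChunk emits for payload,
// or -1 if writing fails.
func chunkSize(payload []byte) int {
	var buf bytes.Buffer
	if err := writeChunk(&buf, payload); err != nil {
		return -1
	}
	return buf.Len()
}

func main() {
	// A 36-byte payload (nine float32 values) pads to 64 bytes: 8 + 8 + 64.
	fmt.Println(chunkSize(make([]byte, 36)))
}
```

Under these assumptions every chunk occupies 16 bytes of framing plus the padded payload, which keeps each chunk boundary 8-byte aligned.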

Arrays are a header chunk followed by a data chunk. Header chunks have the following structure:

  • dtype (int64)
  • 8 byte array name
  • ndim (int64)
  • dim[0]
  • dim[1]
  • ...
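
As a rough sketch of the header structure listed above, the fields can be laid out with encoding/binary. The helper names buildHeader and headerLen are hypothetical, and several details are assumptions not stated here: little-endian byte order, int64 dimension values, zero padding of names shorter than 8 bytes, and the placeholder dtype code (the real dtype codes are not listed in this documentation).

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// dtypePlaceholder is a made-up value; the real dtype codes are not listed here.
const dtypePlaceholder int64 = 0

// buildHeader is a hypothetical helper that lays out a header payload as
// dtype, 8-byte name, ndim, then one int64 per dimension.
func buildHeader(dtype int64, name string, shape []int) ([]byte, error) {
	if len(name) > 8 {
		return nil, errors.New("name can not exceed 8 bytes")
	}
	var nameField [8]byte // shorter names are zero-padded (assumption)
	copy(nameField[:], name)

	fields := []interface{}{dtype, nameField, int64(len(shape))}
	for _, d := range shape {
		fields = append(fields, int64(d))
	}

	var buf bytes.Buffer
	for _, f := range fields {
		if err := binary.Write(&buf, binary.LittleEndian, f); err != nil {
			return nil, err
		}
	}
	return buf.Bytes(), nil
}

// headerLen returns the header payload size in bytes, or -1 on error.
func headerLen(name string, shape []int) int {
	h, err := buildHeader(dtypePlaceholder, name, shape)
	if err != nil {
		return -1
	}
	return len(h)
}

func main() {
	// dtype (8) + name (8) + ndim (8) + two dimensions (16)
	fmt.Println(headerLen("policy", []int{3, 3}))
}
```

With this layout the header payload is 24 bytes plus 8 bytes per dimension, before any chunk padding is applied.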

Variables

var (
	ErrMagicNumberMismatch     = fmt.Errorf("magic number mismatch")
	ErrNegativeLength          = fmt.Errorf("negative length")
	ErrNegativeDimensions      = fmt.Errorf("negative dimensions")
	ErrDecodingUnsupportedType = fmt.Errorf("unsupported data type")
)

Decoding errors

var (
	ErrTooManyDimensions = fmt.Errorf("too many dimensions")
	ErrInfoTooLong       = fmt.Errorf("info can not exceed 8 bytes")
)

Encoding errors

var MagicNumber = []byte{0x7e, 0x54, 0x65, 0x6e, 0x42, 0x69, 0x6e, 0x7e}

MagicNumber is the magic number before every chunk.
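
The eight magic bytes are the ASCII text ~TenBin~, which makes chunk boundaries easy to spot in a hex dump. The sketch below copies the value into a local variable so that it runs without the package; hasMagic is a hypothetical helper, not part of the package API.

```go
package main

import (
	"bytes"
	"fmt"
)

// magicNumber mirrors the value of the package's MagicNumber variable.
var magicNumber = []byte{0x7e, 0x54, 0x65, 0x6e, 0x42, 0x69, 0x6e, 0x7e}

// hasMagic reports whether b begins with the magic number.
func hasMagic(b []byte) bool {
	return bytes.HasPrefix(b, magicNumber)
}

func main() {
	fmt.Printf("%q\n", magicNumber) // prints "~TenBin~"
	fmt.Println(hasMagic([]byte("~TenBin~ followed by a length")))
	fmt.Println(hasMagic([]byte("not a chunk")))
}
```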

Types

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

A Decoder reads and decodes tensor data from an input stream.

func NewDecoder

func NewDecoder(r io.Reader) *Decoder

NewDecoder returns a new decoder that reads from r.

func (*Decoder) Decode

func (d *Decoder) Decode() (tensorData interface{}, shape []int, info string, err error)

Decode reads the next ten-encoded tensor from its input.
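
Because tensorData is returned as interface{}, callers need a type switch or type assertion to reach the typed values. The sketch below assumes the concrete value is a typed slice such as []float32, mirroring what the README passes to Encode; the exact set of returned types is not documented here, and describe is a hypothetical helper.

```go
package main

import "fmt"

// describe is a hypothetical helper that summarizes a decoded tensor.
// The slice types handled here are assumptions about what Decode returns.
func describe(tensorData interface{}, shape []int) string {
	switch t := tensorData.(type) {
	case []float32:
		return fmt.Sprintf("[]float32 with %d elements, shape %v", len(t), shape)
	case []float64:
		return fmt.Sprintf("[]float64 with %d elements, shape %v", len(t), shape)
	case []int32:
		return fmt.Sprintf("[]int32 with %d elements, shape %v", len(t), shape)
	default:
		return fmt.Sprintf("unsupported type %T", tensorData)
	}
}

func main() {
	// Stand-in values for what a Decode call might return.
	var tensorData interface{} = []float32{1, 2, 3, 4, 5, 6, 7, 8, 9}
	shape := []int{3, 3}
	fmt.Println(describe(tensorData, shape))
}
```

A default branch is worth keeping so that an unexpected element type surfaces as a clear message instead of a failed assertion.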

type Encoder

type Encoder struct {
	// contains filtered or unexported fields
}

An Encoder writes tensors to an output stream.

func NewEncoder

func NewEncoder(w io.Writer) *Encoder

NewEncoder returns a new encoder that writes to w.

func (*Encoder) Encode

func (e *Encoder) Encode(tensorData interface{}, shape []int, info string) error

Encode writes the tensor encoding of tensorData to the stream along with a custom info header.
