bin

package module
v0.4.0 Latest
Warning

This package is not in the latest version of its module.

Published: Aug 3, 2023 License: MIT Imports: 9 Imported by: 1

README

Binmap

I've found that using the stdlib binary interface to read and write data is a little cumbersome and tedious, since any operation can result in an error. While this makes sense given the problem domain, the API leaves something to be desired.

I'd love to have a way to batch operations, so I don't have so much if err != nil. If an error occurs at any point, then I'm able to fail fast and handle one error at the end.
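
The "batch the operations, check one error at the end" idea can be sketched with the standard library alone. The block below is a minimal illustration, not binmap's implementation: stickyWriter and encodeHeader are hypothetical names invented here. The helper remembers the first error and skips every later write, so the caller inspects a single error value.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// stickyWriter is a hypothetical helper (not part of binmap): it remembers
// the first error and turns every later write into a no-op.
type stickyWriter struct {
	w   io.Writer
	err error
}

// write encodes v in big-endian order unless an earlier write already failed.
func (s *stickyWriter) write(v any) {
	if s.err != nil {
		return
	}
	s.err = binary.Write(s.w, binary.BigEndian, v)
}

// encodeHeader batches three writes and reports a single error.
func encodeHeader(id uint64, flags uint16, ok bool) ([]byte, error) {
	var buf bytes.Buffer
	sw := &stickyWriter{w: &buf}
	sw.write(id)
	sw.write(flags)
	sw.write(ok)
	return buf.Bytes(), sw.err
}

func main() {
	data, err := encodeHeader(7, 2, true)
	fmt.Println(len(data), err)
}
```

The trade-off in this sketch is that the error no longer says which field failed unless the helper records that as well.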

I'd also like to work easily with io.Readers rather than having to read everything into memory first to dissect it piecemeal. While this can be accomplished with binary.Read, I still have the issue of too much error handling code cluttered around the code I want to write.

Why not just use Gob?

Go provides a binary encoding/decoding protocol for Go types with Gob. This is useful for Go-only environments, but it's a bit cumbersome to customize for cases that fall outside the "Gob way" of serializing/deserializing structs.

I needed a library that allows me to more exactly and compactly specify a binary mapping protocol with a nice to use API, and I didn't see anything that met this need directly in the standard library (see existing API notes above).

If you're only using Go, don't need to conform to some external protocol or define your own, and don't need to do too much format customization, then Gob may be the right choice for you.

There are other, standardized formats available that may be a better fit, depending on your use case and constraints.

Goals

  • I'd like to have an easier to use interface for reading/writing binary data.
  • I'd like to declare binary IO operations, execute them, and handle a single error at the end.
  • I'd like to be able to reuse binary IO operations, and even pass them into more complex pipelines.
  • I'd like to be able to declare dynamic behavior, like when the size of the next read is determined by the current field.
  • I'd like to declare a read loop based on a read field value, and pass the loop construct to a larger pipeline.
  • Struct tag field binding would be fantastic, but reflection is... fraught. I'll see how this goes, and I'll probably take some hints from how the stdlib is handling this.
    • There's too much possibility of dynamic or dependent logic with a lot of binary payloads, and the number of edge cases for implementing this is more than I want to deal with.
    • I'm pretty happy with the API for mapping definition so far, and I'd rather simplify that than get into reflection with struct field tags. I feel like it's much more understandable (and thus maintainable) code.

How it works

This package centers around the Mapper interface. A Mapper is anything that knows how to read and write binary data. The Any mapper makes it easy to adapt this to new data types with custom logic.

Any given Mapper is expected to be short-lived, especially if the underlying data representation in Go code is changing often. This mechanism makes heavy use of pointers, and even pointer-pointers in some cases, which means that there's a fair bit of indirection to make this work. There are also a lot of generics in this code to both limit input types to what is readily supported, and to keep duplication to a minimum.

Note that using different mapper procedures between software versions is effectively a breaking change, and should be handled like you would handle database migrations. There are certain patterns that make this easier to work with, explained below.

Directly supported types

There are several primitive types that are directly supported. Keep in mind that type restrictions mostly come from what binary.Read and binary.Write support, and this package also inherits the design constraints of simplicity over speed mentioned in the encoding/binary docs.

  • Integers with Int.
    • Note that int and uint are not supported because these are not necessarily of a known binary size at compile time.
  • Floats with Float.
  • Booleans with Bool.
  • Bytes with Byte, and byte slices with FixedBytes and LenBytes.
  • Complex 64/128 with Complex.
  • Signed and unsigned varints with Varint/Uvarint.
  • General slice mappers are provided with Slice, LenSlice, and DynamicSlice.
  • Size types with Size, which are restricted to any known-size, unsigned integer.
  • Strings, both with FixedString for fixed-width string fields, and null-terminated strings with NullTermString.
    • Plain strings are always encoded as UTF-8 strings.
    • There are UTF-16 variants of these mappers that have the "Uni16" prefix.
    • In the case where you're reading/writing win32 UTF-16 strings - which are consistently encoded little-endian - and that conflicts with your endianness policy, there is an OverrideEndian function to express this policy change with a single mapper.
  • More interesting types, such as Map for arbitrary maps, and even DataTable for persisting structs-of-arrays.
  • As already mentioned, the Any mapper can be used to add arbitrary mapping logic for any type you'd like to express.
    • An Any mapper just needs a ReadFunc and WriteFunc.
    • This mapper function doesn't require a target because it's intended to be flexible, and the assumption is that a target would be available in a closure context.
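
The note above about int and uint can be checked directly against encoding/binary, which this package builds on. In the short sketch below, fixedSize is a hypothetical wrapper around binary.Size, which returns -1 for values that have no fixed encoded size.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// fixedSize reports how many bytes encoding/binary needs to encode v,
// or -1 when v has no fixed binary size.
func fixedSize(v any) int {
	return binary.Size(v)
}

func main() {
	fmt.Println(fixedSize(int64(0)))  // 8
	fmt.Println(fixedSize(uint16(0))) // 2
	fmt.Println(fixedSize(int(0)))    // -1: the width of int depends on the platform
}
```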

Common patterns

Binary serialization can get pretty complicated, depending on the data structures involved. Fortunately, there are some commonly used patterns, library features, and guidelines that help manage this complexity.

  • There are few assumptions made about or constraints applied to your data representation, but all persisted data must either be of a fixed size when persisted, or include an unambiguous delimiter (like a null terminator for a string).
    • This means that you are charged with managing things like binary format migrations and validation (see the versioned mapping and validated read sections below).
  • Any given Mapper is not intended to live very long in memory. It's generally a single-use construct.
  • Mapping is not concurrency safe by default. This library makes no attempt to "lock/unlock" an object in any way before, during, or after (de)serialization, unless your mapper is wrapped with the Lock or RWLock helpers.
  • Panics that happen within a Mapper's Read or Write methods will be propagated to the caller, unless it's wrapped with an OnPanic helper.

See the example directory for more details.

Mapper method

Defining a mapper method that returns a consistent Mapper for a struct's data, and then using it to implement Read and Write methods, works well in practice.

import (
	"encoding/binary"
	bin "github.com/saylorsolutions/binmap"
	"io"
)

type User struct {
	username string
}

func (u *User) mapper() bin.Mapper {
	return bin.NullTermString(&u.username)
}

func (u *User) Read(r io.Reader) error {
	return u.mapper().Read(r, binary.BigEndian)
}

func (u *User) Write(w io.Writer) error {
	return u.mapper().Write(w, binary.BigEndian)
}

Mapper Sequence

The previous pattern can be extended to map more fields with MapSequence. This provides a tremendous level of flexibility, since the result of MapSequence is itself a Mapper.

import (
	"encoding/binary"
	bin "github.com/saylorsolutions/binmap"
	"io"
)

type User struct {
	id           uint64
	username     string
	passwordHash []byte
}

func (u *User) mapper() bin.Mapper {
	return bin.MapSequence(
		bin.Int(&u.id),
		bin.NullTermString(&u.username),
		bin.DynamicSlice(&u.passwordHash, bin.Byte),
	)
}

func (u *User) Read(r io.Reader) error {
	return u.mapper().Read(r, binary.BigEndian)
}

func (u *User) Write(w io.Writer) error {
	return u.mapper().Write(w, binary.BigEndian)
}

Mapper of Mappers

Once the previous patterns have been established, extensions may be made for additional types within your data. Types included in your top-level structure can themselves have a mapper method that specifies how they are binary mapped.

Note: The use of LenSlice here is an arbitrary choice, and not a requirement of embedding slices of types in other types. It's generally preferred to use DynamicSlice unless you're encoding the length of a slice as a field in your struct, or you always know the length of a slice ahead of time.

package main

import (
	"encoding/binary"
	bin "github.com/saylorsolutions/binmap"
	"io"
)

type Contact struct {
	email          string
	allowMarketing bool
}

func (c *Contact) mapper() bin.Mapper {
	return bin.MapSequence(
		bin.FixedString(&c.email, 128),
		bin.Bool(&c.allowMarketing),
	)
}

type User struct {
	id           uint64
	username     string
	passwordHash []byte
	numContacts  uint16
	contacts     []Contact
}

func (u *User) mapper() bin.Mapper {
	return bin.MapSequence(
		bin.Int(&u.id),
		bin.NullTermString(&u.username),
		bin.DynamicSlice(&u.passwordHash, bin.Byte),
		bin.LenSlice(&u.contacts, &u.numContacts, func(c *Contact) bin.Mapper {
			return c.mapper()
		}),
	)
}

func (u *User) Read(r io.Reader) error {
	return u.mapper().Read(r, binary.BigEndian)
}

func (u *User) Write(w io.Writer) error {
	return u.mapper().Write(w, binary.BigEndian)
}

This makes reading a struct from a binary source straightforward, with a single error to handle regardless of the mapping logic expressed.

func ReadUser(r io.Reader) (*User, error) {
	u := new(User)
	if err := u.Read(r); err != nil {
		return nil, err
	}
	return u, nil
}

ValidateRead and NormalizeWrite

Input validation is important, especially in cases where changes in persisted data could lead to changes to a struct's internal, unexposed state. This can easily be added in the Read and Write methods added above, or with ValidateRead and NormalizeWrite, to ensure that business rule constraints are encoded as part of the persistence logic.

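var ErrNoContact = errors.New("all users must have a contact")

// ValidateRead takes an AfterReadHandler, func(err error) error.
// The err argument is assumed here to be the result of the wrapped read.
mapper = bin.ValidateRead(mapper, func(err error) error {
	if err != nil {
		return err
	}
	if len(u.contacts) == 0 {
		return ErrNoContact
	}
	return nil
})
// NormalizeWrite takes a BeforeWriteHandler, func() error.
mapper = bin.NormalizeWrite(mapper, func() error {
	if len(u.contacts) == 0 {
		return ErrNoContact
	}
	return nil
})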

For more complex logic, it may be preferable to use EventHandler.

EventHandler

Additional, custom logic can be added as part of mapping with NewEventHandler.

Note: An After* handler will be run regardless, but an error returned from an After* handler will only be propagated if the underlying read/write operation returns a nil error.

mapper = bin.NewEventHandler(mapper, bin.EventHandler{
	BeforeRead: func() error {
		log.Println("About to read a thing...")
		return nil
	},
	AfterRead: func(err error) error {
		if err != nil {
			log.Println("Uh-oh, failed to read:", err)
		} else {
			log.Println("Successfully read a thing")
		}
		return err
	},
})

Versioned mapping

A binary representation of state can be stored permanently, so it's important to consider versioned mapping if the binary representation is expected to change (often or not), since that change is effectively a breaking change.

This can be handled pretty easily with a little forethought.

import (
	"encoding/binary"
	"errors"
	bin "github.com/saylorsolutions/binmap"
	"io"
)

type version = byte

const (
	v1 version = iota + 1
	v2
)

type User struct {
	username string
}

func (u *User) mapperV1() bin.Mapper {
	return bin.NullTermString(&u.username)
}

func (u *User) mapperV2() bin.Mapper {
	return bin.FixedString(&u.username, 32)
}

func (u *User) mapper() bin.Mapper {
	return bin.Any(
		func(r io.Reader, endian binary.ByteOrder) error {
			var v version
			if err := bin.Byte(&v).Read(r, endian); err != nil {
				return err
			}
			switch v {
			case v1:
				return u.mapperV1().Read(r, endian)
			case v2:
				return u.mapperV2().Read(r, endian)
			default:
				return errors.New("unknown version")
			}
		},
		func(w io.Writer, endian binary.ByteOrder) error {
			var v = v2
			return bin.MapSequence(
				bin.Byte(&v),
				u.mapperV2(),
			).Write(w, endian)
		},
	)
}

func (u *User) Read(r io.Reader) error {
	return u.mapper().Read(r, binary.BigEndian)
}

func (u *User) Write(w io.Writer) error {
	return u.mapper().Write(w, binary.BigEndian)
}

Documentation

Constants

This section is empty.

Variables

var (
	ErrNilReadWrite = errors.New("nil read source or write target")
	ErrPanic        = errors.New("panic during Read or Write")
)
var (
	ErrUnbalancedTable = errors.New("unbalanced data table")
)

Functions

This section is empty.

Types

type AfterReadHandler added in v0.4.0

type AfterReadHandler = func(err error) error

type AfterWriteHandler added in v0.4.0

type AfterWriteHandler = func(err error) error

type AnyComplex added in v0.3.0

type AnyComplex interface {
	complex64 | complex128
}

type AnyFloat

type AnyFloat interface {
	float32 | float64
}

type AnyInt

type AnyInt interface {
	int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64
}

type BeforeReadHandler added in v0.4.0

type BeforeReadHandler = func() error

type BeforeWriteHandler added in v0.4.0

type BeforeWriteHandler = func() error

type EventHandler added in v0.4.0

type EventHandler struct {
	BeforeRead  BeforeReadHandler
	AfterRead   AfterReadHandler
	BeforeWrite BeforeWriteHandler
	AfterWrite  AfterWriteHandler
	// contains filtered or unexported fields
}

func (*EventHandler) Read added in v0.4.0

func (h *EventHandler) Read(r io.Reader, endian binary.ByteOrder) (err error)

func (*EventHandler) Write added in v0.4.0

func (h *EventHandler) Write(w io.Writer, endian binary.ByteOrder) (err error)

type FieldMapper added in v0.3.0

type FieldMapper interface {
	// contains filtered or unexported methods
}

FieldMapper provides the logic necessary to read and write DataTable fields. Created with MapField.

func MapField added in v0.3.0

func MapField[T any](target *[]T, mapFn func(*T) Mapper) FieldMapper

MapField will associate a Mapper to each element in a target slice within a FieldMapper.

type KeyMapper added in v0.3.0

type KeyMapper[K comparable] func(key *K) Mapper

type Mapper

type Mapper interface {
	// Read data from a binary source.
	Read(r io.Reader, endian binary.ByteOrder) error
	// Write data to a binary target.
	Write(w io.Writer, endian binary.ByteOrder) error
}

Mapper is any procedure that knows how to read from and write to binary data, given an endianness policy.

func Any

func Any(read ReadFunc, write WriteFunc) Mapper

Any is provided to make it easy to create a custom Mapper for any given type.

func Bool

func Bool(b *bool) Mapper

Bool will map a single boolean.

func Byte

func Byte(b *byte) Mapper

Byte will map a single byte.

func Complex added in v0.3.0

func Complex[T AnyComplex](target *T) Mapper

Complex will map a complex64/128 number.

func DataTable added in v0.3.0

func DataTable(length *uint32, mappers ...FieldMapper) Mapper

DataTable will construct a Mapper that orchestrates reading and writing a data table. This is very helpful for situations where the caller is using the array-of-structs to struct-of-arrays optimization, and wants to persist this table. Each FieldMapper will be used to read a single field element, making up a DataTable row, before returning to the first FieldMapper to start the next row. The length parameter will be set during read, and read during write, to ensure that all mapped fields are of the same length.
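
To make the row-wise idea concrete, here is a standard-library sketch of persisting a struct-of-arrays. It is not DataTable's implementation, and the exact wire layout (a uint32 row count followed by interleaved fields) is an assumption made for illustration. The names table, errUnbalanced, and roundTripTable are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// errUnbalanced is a hypothetical stand-in for an unbalanced-table error.
var errUnbalanced = errors.New("columns differ in length")

// table is a struct-of-arrays: one slice per field.
type table struct {
	ids    []uint32
	scores []float32
}

// write stores the row count, then each row's fields side by side.
func (t *table) write(w io.Writer) error {
	if len(t.ids) != len(t.scores) {
		return errUnbalanced
	}
	if err := binary.Write(w, binary.BigEndian, uint32(len(t.ids))); err != nil {
		return err
	}
	for i := range t.ids {
		if err := binary.Write(w, binary.BigEndian, t.ids[i]); err != nil {
			return err
		}
		if err := binary.Write(w, binary.BigEndian, t.scores[i]); err != nil {
			return err
		}
	}
	return nil
}

// read restores the row count first, then fills both columns row by row.
// A real reader should bound n before allocating from untrusted input.
func (t *table) read(r io.Reader) error {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return err
	}
	t.ids, t.scores = make([]uint32, n), make([]float32, n)
	for i := range t.ids {
		if err := binary.Read(r, binary.BigEndian, &t.ids[i]); err != nil {
			return err
		}
		if err := binary.Read(r, binary.BigEndian, &t.scores[i]); err != nil {
			return err
		}
	}
	return nil
}

// roundTripTable writes the given columns and reads them back.
func roundTripTable(ids []uint32, scores []float32) (table, error) {
	var buf bytes.Buffer
	in := table{ids: ids, scores: scores}
	if err := in.write(&buf); err != nil {
		return table{}, err
	}
	var out table
	err := out.read(&buf)
	return out, err
}

func main() {
	out, err := roundTripTable([]uint32{10, 20}, []float32{1.5, 2.5})
	fmt.Println(out.ids, out.scores, err)
}
```

The repeated error checks in this sketch are the clutter that a declarative mapper is meant to absorb.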

func DynamicSlice

func DynamicSlice[E any](target *[]E, mapVal func(*E) Mapper) Mapper

DynamicSlice tries to accomplish a happy medium between LenSlice and Slice. A uint32 will be used to store the size of the given slice, but it's not necessary to read this from a field, rather it will be discovered at write time. This means that the size will be available at read time by first reading the uint32 with LenSlice, without requiring a caller provided field. In a scenario where a slice in a struct is used, this makes it easier to read and write because the struct doesn't need to store the size in a field.
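
The length-prefix scheme described above can be sketched with encoding/binary directly. This is an illustration under the stated assumption of a uint32 count followed by the elements, not DynamicSlice's actual code. The names writeUint16s, readUint16s, and roundTripUint16s are hypothetical.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// writeUint16s writes the element count as a uint32, then the elements.
// The count is discovered from the slice at write time.
func writeUint16s(w io.Writer, vals []uint16) error {
	if err := binary.Write(w, binary.BigEndian, uint32(len(vals))); err != nil {
		return err
	}
	return binary.Write(w, binary.BigEndian, vals)
}

// readUint16s reads the uint32 count first, then exactly that many elements.
// A real reader should bound n before allocating from untrusted input.
func readUint16s(r io.Reader) ([]uint16, error) {
	var n uint32
	if err := binary.Read(r, binary.BigEndian, &n); err != nil {
		return nil, err
	}
	vals := make([]uint16, n)
	if err := binary.Read(r, binary.BigEndian, vals); err != nil {
		return nil, err
	}
	return vals, nil
}

// roundTripUint16s returns the decoded slice and the encoded size in bytes.
func roundTripUint16s(vals []uint16) ([]uint16, int, error) {
	var buf bytes.Buffer
	if err := writeUint16s(&buf, vals); err != nil {
		return nil, 0, err
	}
	n := buf.Len()
	out, err := readUint16s(&buf)
	return out, n, err
}

func main() {
	out, n, err := roundTripUint16s([]uint16{1, 2, 3})
	fmt.Println(out, n, err)
}
```

No separate length field is needed on the Go side, which is the convenience the paragraph above describes.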

func FixedBytes

func FixedBytes[S SizeType](buf *[]byte, length S) Mapper

FixedBytes maps a byte slice of a known length.

func FixedString

func FixedString(s *string, length int) Mapper

FixedString will map a string with a max length that is known ahead of time. The target string will not contain any trailing zero bytes if the encoded string is less than the space allowed.
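
A fixed-width string field can be sketched as zero padding on write and trimming on read. The block below is a stand-alone illustration, not FixedString's implementation. Rejecting over-long strings is an assumption of this sketch, and writeFixed, readFixed, and roundTripFixed are hypothetical names.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// writeFixed writes s into a zero-padded field of exactly width bytes.
func writeFixed(w io.Writer, s string, width int) error {
	if len(s) > width {
		return errors.New("string longer than field")
	}
	field := make([]byte, width) // make zero-fills the padding
	copy(field, s)
	_, err := w.Write(field)
	return err
}

// readFixed reads width bytes and drops the trailing zero padding.
func readFixed(r io.Reader, width int) (string, error) {
	field := make([]byte, width)
	if _, err := io.ReadFull(r, field); err != nil {
		return "", err
	}
	return string(bytes.TrimRight(field, "\x00")), nil
}

// roundTripFixed returns the decoded string and the encoded size in bytes.
func roundTripFixed(s string, width int) (string, int, error) {
	var buf bytes.Buffer
	if err := writeFixed(&buf, s, width); err != nil {
		return "", 0, err
	}
	n := buf.Len()
	out, err := readFixed(&buf, width)
	return out, n, err
}

func main() {
	out, n, err := roundTripFixed("alice", 8)
	fmt.Printf("%q %d %v\n", out, n, err)
}
```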

func Float

func Float[T AnyFloat](f *T) Mapper

Float will map any floating point value.

func Int

func Int[T AnyInt](i *T) Mapper

Int will map any integer, excluding int.

func LenBytes

func LenBytes[S SizeType](buf *[]byte, length *S) Mapper

LenBytes is used for situations where an arbitrarily sized byte slice is encoded after its length. This mapper will read the length, and then length number of bytes into a byte slice. The mapper will write the length and bytes in the same order.

func LenSlice

func LenSlice[E any, S SizeType](target *[]E, count *S, mapVal func(*E) Mapper) Mapper

LenSlice is for situations where a slice is encoded with its length prepended. Otherwise, this behaves exactly like Slice.

func Lock added in v0.4.0

func Lock(mapper Mapper, mux *sync.Mutex) Mapper

Lock will manage locking and unlocking a sync.Mutex before/after a read/write.

func Map added in v0.3.0

func Map[K comparable, V any](target *map[K]V, keyMapper KeyMapper[K], valMapper ValMapper[V]) Mapper

func MapSequence

func MapSequence(mappings ...Mapper) Mapper

MapSequence creates a Mapper that uses each given Mapper in order.
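
One way such a combinator could be built is shown below, as a sketch only. The local mapper interface mirrors the shape of Mapper, while u16, sequence, point, and roundTripPoint are hypothetical names that do not exist in binmap. The point is that a sequence of mappers satisfies the same interface as its parts, so it can be nested in a larger sequence.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

// mapper mirrors the shape of the Mapper interface for this sketch only.
type mapper interface {
	Read(r io.Reader, endian binary.ByteOrder) error
	Write(w io.Writer, endian binary.ByteOrder) error
}

// u16 is a hypothetical mapper bound to a *uint16 target.
type u16 struct{ target *uint16 }

func (m u16) Read(r io.Reader, e binary.ByteOrder) error {
	return binary.Read(r, e, m.target)
}

func (m u16) Write(w io.Writer, e binary.ByteOrder) error {
	return binary.Write(w, e, *m.target)
}

// sequence runs its mappers in order and stops at the first error.
type sequence []mapper

func (s sequence) Read(r io.Reader, e binary.ByteOrder) error {
	for _, m := range s {
		if err := m.Read(r, e); err != nil {
			return err
		}
	}
	return nil
}

func (s sequence) Write(w io.Writer, e binary.ByteOrder) error {
	for _, m := range s {
		if err := m.Write(w, e); err != nil {
			return err
		}
	}
	return nil
}

// point composes two field mappers; the result is itself a mapper.
type point struct{ x, y uint16 }

func (p *point) mapper() mapper {
	return sequence{u16{&p.x}, u16{&p.y}}
}

// roundTripPoint writes a point and reads it back into a fresh value.
func roundTripPoint(x, y uint16) (point, error) {
	var buf bytes.Buffer
	in := point{x, y}
	if err := in.mapper().Write(&buf, binary.BigEndian); err != nil {
		return point{}, err
	}
	var out point
	err := out.mapper().Read(&buf, binary.BigEndian)
	return out, err
}

func main() {
	out, err := roundTripPoint(3, 4)
	fmt.Println(out.x, out.y, err)
}
```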

func NewEventHandler added in v0.4.0

func NewEventHandler(mapper Mapper, handler EventHandler) Mapper

func NormalizeWrite added in v0.4.0

func NormalizeWrite(mapper Mapper, normalizer BeforeWriteHandler) Mapper

NormalizeWrite will run the normalizer before writing with the mapper.

func NullTermString

func NullTermString(s *string) Mapper

NullTermString will read and write a null-byte terminated string. The string should not contain a null terminator; one will be added on write.
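
Null-terminated framing can be sketched with plain io calls. This is an illustration rather than NullTermString's implementation. Rejecting embedded null bytes is an assumption of this sketch, and writeCString, readCString, and roundTripTwo are hypothetical names. Reading one byte at a time keeps the reader from consuming bytes that belong to the next field.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"strings"
)

// writeCString writes s followed by a single zero byte.
func writeCString(w io.Writer, s string) error {
	if strings.IndexByte(s, 0) >= 0 {
		return errors.New("string contains a null byte")
	}
	_, err := w.Write(append([]byte(s), 0))
	return err
}

// readCString reads byte by byte until it sees the terminator.
func readCString(r io.Reader) (string, error) {
	var out []byte
	b := make([]byte, 1)
	for {
		if _, err := io.ReadFull(r, b); err != nil {
			return "", err
		}
		if b[0] == 0 {
			return string(out), nil
		}
		out = append(out, b[0])
	}
}

// roundTripTwo writes two strings back to back and reads both again.
func roundTripTwo(a, b string) (string, string, error) {
	var buf bytes.Buffer
	if err := writeCString(&buf, a); err != nil {
		return "", "", err
	}
	if err := writeCString(&buf, b); err != nil {
		return "", "", err
	}
	first, err := readCString(&buf)
	if err != nil {
		return "", "", err
	}
	second, err := readCString(&buf)
	return first, second, err
}

func main() {
	first, second, err := roundTripTwo("alice", "bob")
	fmt.Println(first, second, err)
}
```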

func OnPanic added in v0.4.0

func OnPanic(mapper Mapper, panicHandler func(any) error) Mapper

OnPanic will recover a panic from a Read or Write operation, and return the error returned from panicHandler wrapped with ErrPanic. If no error is returned from panicHandler, then a plain ErrPanic error will be returned.
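
The recover-and-wrap behavior described above can be sketched generically. The code below is not OnPanic itself: errPanic, guard, demo, and isPanicErr are hypothetical names, and the way the handler's error is attached to the sentinel is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// errPanic is a hypothetical sentinel playing the role of ErrPanic.
var errPanic = errors.New("panic during operation")

// guard runs op and converts a panic into an error that wraps errPanic.
// If handler returns nil, the plain sentinel is returned.
func guard(op func() error, handler func(any) error) (err error) {
	defer func() {
		if rec := recover(); rec != nil {
			err = errPanic
			if herr := handler(rec); herr != nil {
				err = fmt.Errorf("%w: %v", errPanic, herr)
			}
		}
	}()
	return op()
}

// demo triggers a runtime panic (write to a nil map) inside a guarded call.
func demo() error {
	return guard(func() error {
		var m map[string]int
		m["x"] = 1 // panics: assignment to entry in nil map
		return nil
	}, func(rec any) error {
		return fmt.Errorf("recovered: %v", rec)
	})
}

// isPanicErr reports whether err wraps the sentinel.
func isPanicErr(err error) bool {
	return errors.Is(err, errPanic)
}

func main() {
	err := demo()
	fmt.Println(isPanicErr(err), err)
}
```

Callers can then test for the sentinel with errors.Is while still seeing the handler's detail in the message.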

func OverrideEndian added in v0.3.0

func OverrideEndian(m Mapper, endian binary.ByteOrder) Mapper

OverrideEndian will override the endian settings for a single operation. This is useful for UTF-16 strings which are often read/written little-endian.
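
Why byte order matters for UTF-16 can be shown with the standard library alone. The sketch below encodes the same string in both byte orders. It does not use OverrideEndian, and encodeUTF16, decodeUTF16, and leRoundTrip are hypothetical helpers.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"unicode/utf16"
)

// encodeUTF16 converts s to UTF-16 code units and writes them in the
// requested byte order.
func encodeUTF16(s string, order binary.ByteOrder) ([]byte, error) {
	units := utf16.Encode([]rune(s))
	var buf bytes.Buffer
	if err := binary.Write(&buf, order, units); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeUTF16 reverses encodeUTF16 for an even-length byte slice.
func decodeUTF16(data []byte, order binary.ByteOrder) (string, error) {
	units := make([]uint16, len(data)/2)
	if err := binary.Read(bytes.NewReader(data), order, units); err != nil {
		return "", err
	}
	return string(utf16.Decode(units)), nil
}

// leRoundTrip encodes s as little-endian UTF-16 and decodes it again.
func leRoundTrip(s string) ([]byte, string, error) {
	data, err := encodeUTF16(s, binary.LittleEndian)
	if err != nil {
		return nil, "", err
	}
	out, err := decodeUTF16(data, binary.LittleEndian)
	return data, out, err
}

func main() {
	le, _ := encodeUTF16("Hi", binary.LittleEndian)
	be, _ := encodeUTF16("Hi", binary.BigEndian)
	fmt.Printf("% x\n% x\n", le, be) // 48 00 69 00, then 00 48 00 69
}
```

Decoding with the wrong order would swap the bytes of every code unit, which is why a per-field override is useful when the rest of the payload is big-endian.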

func RWLock added in v0.4.0

func RWLock(mapper Mapper, mux *sync.RWMutex) Mapper

RWLock will manage locking and unlocking a sync.RWMutex before/after a read/write. Writing the mapper only requires read locking, while reading with the mapper requires write locking since state is being mutated.

func Size

func Size[S SizeType](size *S) Mapper

Size maps any value that can reasonably be used to express a size.

func Slice

func Slice[E any, S SizeType](target *[]E, count S, mapVal func(*E) Mapper) Mapper

Slice will produce a Mapper that reads and writes a slice of values. The slice length must be known ahead of time. The mapVal function will be used to create a Mapper for each element of the slice, and the returned Mapper will orchestrate the slice construction according to the given function.

func Uni16FixedString added in v0.3.0

func Uni16FixedString(s *string, wcharlen int) Mapper

Uni16FixedString is the same as FixedString, except that it works with UTF-16 strings.

func Uni16NullTermString added in v0.3.0

func Uni16NullTermString(s *string) Mapper

Uni16NullTermString is the same as NullTermString, except that it works with UTF-16 strings.

func Uvarint added in v0.3.0

func Uvarint(target *uint64) Mapper

Uvarint encodes 16, 32, or 64-bit unsigned integers as a variable length integer. This is generally more efficient than reading/writing the full byte length.

func ValidateRead added in v0.4.0

func ValidateRead(mapper Mapper, validator AfterReadHandler) Mapper

ValidateRead will run the validator function after reading with the mapper.

func Varint added in v0.3.0

func Varint(target *int64) Mapper

Varint encodes 16, 32, or 64-bit signed integers as a variable length integer. This is generally more efficient than reading/writing the full byte length.
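
The size behavior of variable-length integers can be checked with encoding/binary's own varint functions. The helpers uvarintLen and varintLen below are hypothetical names, and whether binmap delegates to these functions is an assumption. Small magnitudes take one or two bytes, while the largest 64-bit values take ten, which is more than the fixed eight.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// uvarintLen returns how many bytes the unsigned varint encoding of v uses.
func uvarintLen(v uint64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutUvarint(buf, v)
}

// varintLen does the same for the signed (zig-zag) encoding.
func varintLen(v int64) int {
	buf := make([]byte, binary.MaxVarintLen64)
	return binary.PutVarint(buf, v)
}

func main() {
	fmt.Println(uvarintLen(1))          // 1
	fmt.Println(uvarintLen(300))        // 2
	fmt.Println(uvarintLen(^uint64(0))) // 10
	fmt.Println(varintLen(-1))          // 1
	fmt.Println(varintLen(64))          // 2
}
```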

type ReadFunc

type ReadFunc func(r io.Reader, endian binary.ByteOrder) error

ReadFunc is a function that reads data from a binary source.

type SizeType

type SizeType interface {
	uint8 | uint16 | uint32 | uint64
}

type ValMapper added in v0.3.0

type ValMapper[V any] func(val *V) Mapper

type WriteFunc

type WriteFunc func(w io.Writer, endian binary.ByteOrder) error

WriteFunc is a function that writes data to a binary target.

Directories

Path Synopsis
example
