avrox

package module
v0.7.2
Published: Mar 20, 2024 License: MIT Imports: 15 Imported by: 1

README

AvroX

AvroX enables Avro formatted data to be discoverable in a closed system. The idea behind it was to create a method for encoding structured data for NATS messages in a more compact format than using JSON + JSON schemas.

This package is still experimental!

TL;DR: Don't use it for anything important!

This package is a work in progress (WIP). Feel free to try it out, but don't expect it to be suitable for production use, and anticipate daily changes to the API. We publish this primarily because some of the tools we also publish utilize this package, and we believe it could become very useful eventually.

Take note of our disclaimer below!

What it delivers:
  • Highly concise binary encoding for small data sizes.
  • A JSON schema with additional data documentation.
  • Optional further data compression (we are currently using Snappy, but we support up to 7 types with AvroX).
  • Avro supports good native types for time, date, and binary data. This works for Go because of the wonderful hamba/avro/v2 package.
  • A usage experience similar to other marshaller implementations.
  • A three-level versioning identifier (similar to semver) of the form N.S.V, which stands for namespace, schema, version.
  • Support for unmarshalling to a given list of schemas (unions) where the destinations can be a nil type or a concrete type. After identifying which schema the source data uses, it either returns a newly allocated value or uses the given storage.
  • Some basic types like string, int, map[string]any can be directly marshalled, while also utilizing Avro.
  • The unmarshaller automatically detects JSON (for manual debugging) as an alternative to Avro data (may get removed soon).
  • Seamless integration with the NATS CLI Tool --translate option through the use of our message converter tool msgcvt. This will also eventually support schema storage within NATS.
  • The schema can be used in an interpreted or compiled manner (we do not use compiled Avro so far).
  • Schema registry with namespace support, accommodating both public and private schemas.
  • AvroX data can be discovered in a binary stream (although this is just an experiment).
  • Avro schemas can also be autogenerated through avscgen, which is currently still proprietary and may eventually be released to the public by us.
  • We are also working on an auto indexer that can generate indexes for the messages in a stream, based on indexing information that can be added to a schema's fields (a bit like adding indexes when using a database).

(This list is not exhaustive...)

How we arrived at the current stage:

During our research into alternative formats for storing a large volume of small data in a NATS JetStream, we examined various formats:

  • JSON + JSON Schema was our original idea. However, the overhead became rather significant when storing millions of messages. Ensuring the schema and JSON were in sync also required extra steps during implementation and testing. We believed there must be a more elegant solution, which led us to begin our search.
  • Gob was our first alternative, but it quickly became apparent that it actually increased data size when used with numerous individual messages and struct tags. It also required recompilation and lacked discoverability. Additionally, Go code is not inherently a schema. Parsing structs and struct tags to generate documents was quite cumbersome.
  • ProtoBuf necessitated recompilation and a considerable amount of additional tooling, and it generated excessive code. We previously used it alongside Twirp before deciding to employ NATS for messaging at the border too (see: https://github.com/oderwat/go-nats-app). Twirp inspired us to consider supporting JSON as an alternative to the endpoints.
  • CBOR, with its Go package fxamacker/cbor, looked promising and somewhat reduced data size, but not significantly enough. It also lacked a robust schema representation. However, it could be parsed without the schema, like JSON. While working on this, we realized that a shareable, simple text schema was what we needed.
  • BSON was briefly considered but quickly ruled out.

As we experimented with various implementations and formats, our desired features became increasingly clear.

  1. It should have a small storage size.
  2. We want a mandatory schema for documentation and discovery.
  3. It should be very easy to use and plug in, just like other marshallers.
  4. Debugging messages should be possible without recompiling the used tools.
  5. It should not hinder prototyping or the creation of quick tools.
  6. There should be a way to bypass it and revert to using JSON.
  7. It should be safe and performant.
  8. While an interpreted schema is beneficial, there should also be a way to generate specialized code for increased performance.

Disclaimer

This code and documentation are works in progress, and everything may change without further notice. We are sure there are bugs to fix and optimisations to make. This project utilizes the GPT-4 language model for generating some of its content.

MIT License / Copyright 2023 by METATEXX GmbH

Documentation

Constants

const (
	CompNone   CompressionID = 0
	CompSnappy CompressionID = 1
	CompFlate  CompressionID = 2 // Uses -1 as compression parameter
	CompGZip   CompressionID = 3 // Uses -1 as compression parameter
	CompMax    CompressionID = 255

	// NamespacePrivate means that it is not registered and we use private schemas
	NamespacePrivate NamespaceID = 0
	// NamespaceBasic is reserved for the basic types and structs that are implemented through avrox
	NamespaceBasic NamespaceID = 1
	// NamespaceReserved1 is reserved for later
	NamespaceReserved1 NamespaceID = 2
	// NamespaceReserved2 is reserved for later
	NamespaceReserved2 NamespaceID = 3
	// NamespaceReserved3 is reserved for later
	NamespaceReserved3 NamespaceID = 4
	NamespaceMax       NamespaceID = 65535

	// Schema 0 means that it is not defined (but may belong to a namespace)
	SchemaUndefined SchemaID = 0
	SchemaMax       SchemaID = 16777215 // Schema<<8 | Version

	MagicFieldName = "Magic"
)
const MagicLen = 8 // The struct will be aligned to 8 bytes anyway

Variables

var (
	ErrLengthInvalid           = errors.New("data length should be exactly 8 bytes")
	ErrNamespaceIDOutOfRange   = errors.New("namespace must be between 0 and 31")
	ErrCompressionIDOutOfRange = errors.New("compression must be between 0 and 7")
	ErrCompressionUnsupported  = errors.New("compression type is unsupported")
	ErrSchemaIDOutOfRange      = errors.New("schema must be between 0 and 8191")
	ErrMarkerInvalid           = fmt.Errorf("data should start with magic marker (0x%02x)", Marker)
	ErrParityCheckFailed       = errors.New("parity check failed")
	ErrMarshallingFailed       = errors.New("marshalling failed")
	ErrMissingMagicField       = errors.New("missing magic field in struct")
	ErrMarshallAnyWithoutPtr   = errors.New("no ptr src for MarshalAny")
	ErrSchemaNil               = errors.New("schema is nil")
	ErrSchemaInvalid           = errors.New("schema is invalid")
	ErrDecompress              = errors.New("can not decompress")
	ErrDataFormatNotDetected   = errors.New("message format was not detected")
	ErrNoData                  = errors.New("no data")
	ErrNoBasicNamespace        = errors.New("no basic namespace")
	ErrNoBasicSchema           = errors.New("no basic schema")
	ErrNoBasicString           = errors.New("no basic string")
	ErrNoBasicInt              = errors.New("no basic int")
	ErrNoBasicByteSlice        = errors.New("no basic byte slice")
	ErrNoBasicMapStringAny     = errors.New("no basic map string any")
	ErrNoBasicTime             = errors.New("no basic time")
	ErrNoBasicRawDate          = errors.New("no basic rawdate")
	ErrNoBasicDecimal          = errors.New("no basic decimal")
	ErrWrongNamespace          = errors.New("namespace from schemer does not fit the magic entry")
	ErrWrongSchema             = errors.New("schema from schemer does not fit the magic entry")
	ErrNotAvroX                = errors.New("data is not avrox")
	ErrNoPointerDestination    = errors.New("not a pointer destination")
	ErrSchemerNotFound         = errors.New("schema from schemer is not in the given slice")
)
var BasicByteSliceAVSC string
var BasicDecimalAVSC string
var BasicIntAVSC string
var BasicMapStringAnyAVSC string
var BasicRawDateAVSC string
var BasicStringAVSC string
var BasicTimeAVSC string
var Marker byte = 0x93 // one of the best when analysing our data

Functions

func AvroDate

func AvroDate(t time.Time) time.Time

AvroDate truncates a Go time.Time to the value that gets stored in the Avro logical type date. It also makes sure that the time is expressed in UTC.

func AvroTime

func AvroTime(t time.Time) time.Time

AvroTime truncates a Go time.Time to the value that gets stored in the Avro logical time type (which has a granularity of milliseconds, while Go has nanoseconds). It also makes sure that the time is expressed in UTC.

func CompressData added in v0.1.0

func CompressData(data []byte, cID CompressionID) ([]byte, error)

func DecodeMagic

func DecodeMagic(data []byte) (NamespaceID, SchemaID, CompressionID, error)

func DecompressData added in v0.1.0

func DecompressData(data []byte, cID CompressionID) ([]byte, error)

func IsMagic

func IsMagic(data []byte) bool
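
The error values above suggest that AvroX data starts with the Marker byte and that the magic is MagicLen bytes long. The sketch below performs only these two checks. It is a weaker, hypothetical pre-check (looksLikeMagic is not part of this package) and deliberately ignores the parity check and the bit layout of the magic, which this page does not document.

```go
package main

import "fmt"

const (
	magicLen = 8    // mirrors MagicLen
	marker   = 0x93 // mirrors the default value of Marker
)

// looksLikeMagic reports whether data is long enough to hold a magic and
// starts with the marker byte. It does not validate anything else.
func looksLikeMagic(data []byte) bool {
	return len(data) >= magicLen && data[0] == marker
}

func main() {
	fmt.Println(looksLikeMagic([]byte{0x93, 0, 0, 0, 0, 0, 0, 0}))
	fmt.Println(looksLikeMagic([]byte(`{"json":true}`)))
}
```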

func JoinedSchemas

func JoinedSchemas(schemers ...Schemer) string

JoinedSchemas returns a JSON array containing the schemas of all schemers passed as arguments.

func Marshal

func Marshal(src Schemer, cID CompressionID, schema avro.Schema) ([]byte, error)

func MarshalAny

func MarshalAny(src any, schema avro.Schema, nID NamespaceID, sID SchemaID, cID CompressionID) ([]byte, error)

func MarshalBasic

func MarshalBasic(src any, cID CompressionID) ([]byte, error)

func Unmarshal

func Unmarshal(data []byte, dst Schemer, schema avro.Schema) error

Unmarshal uses the given schema for unmarshalling and checks that it fits the decoded data. The function is faster when a schema is given. When no schema is given, it parses the schema from the Schemer info. When a schema is given, it checks that the schema matches the Schemer info.

func UnmarshalAny

func UnmarshalAny[T any](data []byte, schema avro.Schema, dst *T) (NamespaceID, SchemaID, error)

func UnmarshalBasic

func UnmarshalBasic(src []byte) (any, error)

func UnmarshalInt

func UnmarshalInt(data []byte) (int, error)

func UnmarshalSchemer

func UnmarshalSchemer(src []byte, schemers ...Schemer) (any, error)

UnmarshalSchemer expects a slice of pre-allocated schemers and uses the magic in the data to unmarshal into the matching one. It returns the schemer that was used as any. If no schema fits, it returns an error.

func UnmarshalString

func UnmarshalString(data []byte) (string, error)

func UnpackSchemVer

func UnpackSchemVer(schemaVerID SchemaID) (int, int)
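
Assuming the layout given in the SchemaID comment (Schema<<8 | Version), unpacking is a shift and a mask. The sketch below uses a hypothetical helper named unpack and plain ints instead of the package types, and it prints the result in the N.S.V notation used in the comments on the Magic fields.

```go
package main

import "fmt"

// unpack splits a packed id into schema number and version, assuming the
// low 8 bits hold the version and the remaining bits hold the schema.
func unpack(packed int) (schema, version int) {
	return packed >> 8, packed & 0xFF
}

func main() {
	const namespaceBasic = 1                // mirrors NamespaceBasic
	const basicByteSliceSchemaID = 3<<8 + 1 // mirrors BasicByteSliceSchemaID

	s, v := unpack(basicByteSliceSchemaID)
	fmt.Printf("%d.%d.%d\n", namespaceBasic, s, v)
}
```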

Types

type BasicByteSlice

type BasicByteSlice struct {
	Magic [MagicLen]byte // 1.3.1
	Value []byte
}

func (BasicByteSlice) NamespaceID

func (BasicByteSlice) NamespaceID() NamespaceID

NamespaceID returns the namespace id for the BasicByteSlice struct type

func (BasicByteSlice) Schema

func (BasicByteSlice) Schema() string

Schema returns the AVRO schema for the BasicByteSlice struct type

func (BasicByteSlice) SchemaID

func (BasicByteSlice) SchemaID() SchemaID

SchemaID returns the schema id for the BasicByteSlice struct type

type BasicDecimal

type BasicDecimal struct {
	Magic [MagicLen]byte // 1.6.1
	Value *big.Rat
}

BasicDecimal is the container type to store a *big.Rat value in a single avro schema

func (BasicDecimal) NamespaceID

func (BasicDecimal) NamespaceID() NamespaceID

NamespaceID returns the namespace id for the BasicDecimal struct type

func (BasicDecimal) Schema

func (BasicDecimal) Schema() string

Schema returns the AVRO schema for the BasicDecimal struct type

func (BasicDecimal) SchemaID

func (BasicDecimal) SchemaID() SchemaID

SchemaID returns the schema id for the BasicDecimal struct type

type BasicInt

type BasicInt struct {
	Magic [MagicLen]byte // 1.2.1
	Value int
}

func (BasicInt) NamespaceID

func (BasicInt) NamespaceID() NamespaceID

NamespaceID returns the namespace id for the BasicInt struct type

func (BasicInt) Schema

func (BasicInt) Schema() string

Schema returns the AVRO schema for the BasicInt struct type

func (BasicInt) SchemaID

func (BasicInt) SchemaID() SchemaID

SchemaID returns the schema id for the BasicInt struct type

type BasicMapStringAny

type BasicMapStringAny struct {
	Magic [MagicLen]byte // 1.4.1
	Value map[string]any
}

func (BasicMapStringAny) NamespaceID

func (BasicMapStringAny) NamespaceID() NamespaceID

NamespaceID returns the namespace id for the BasicMapStringAny struct type

func (BasicMapStringAny) Schema

func (BasicMapStringAny) Schema() string

Schema returns the AVRO schema for the BasicMapStringAny struct type

func (BasicMapStringAny) SchemaID

func (BasicMapStringAny) SchemaID() SchemaID

SchemaID returns the schema id for the BasicMapStringAny struct type

type BasicRawDate added in v0.3.0

type BasicRawDate struct {
	Magic [MagicLen]byte // 1.7.1
	Value rawdate.RawDate
}

BasicRawDate is the container type to store a date (without time) in a single avro schema

func (BasicRawDate) NamespaceID added in v0.3.0

func (BasicRawDate) NamespaceID() NamespaceID

NamespaceID returns the namespace id for the BasicRawDate struct type

func (BasicRawDate) Schema added in v0.3.0

func (BasicRawDate) Schema() string

Schema returns the AVRO schema for the BasicRawDate struct type

func (BasicRawDate) SchemaID added in v0.3.0

func (BasicRawDate) SchemaID() SchemaID

SchemaID returns the schema id for the BasicRawDate struct type

type BasicString

type BasicString struct {
	Magic [MagicLen]byte // 1.1.1
	Value string
}

BasicString is the container type to store a string in a single avro schema

func (BasicString) NamespaceID

func (BasicString) NamespaceID() NamespaceID

NamespaceID returns the namespace id for the BasicString struct type

func (BasicString) Schema

func (BasicString) Schema() string

Schema returns the AVRO schema for the BasicString struct type

func (BasicString) SchemaID

func (BasicString) SchemaID() SchemaID

SchemaID returns the schema id for the BasicString struct type

type BasicTime

type BasicTime struct {
	Magic [MagicLen]byte // 1.5.1
	Value time.Time
}

BasicTime is the container type to store a timestamp in a single avro schema

func (BasicTime) NamespaceID

func (BasicTime) NamespaceID() NamespaceID

NamespaceID returns the namespace id for the BasicTime struct type

func (BasicTime) Schema

func (BasicTime) Schema() string

Schema returns the AVRO schema for the BasicTime struct type

func (BasicTime) SchemaID

func (BasicTime) SchemaID() SchemaID

SchemaID returns the schema id for the BasicTime struct type

type CompressionID

type CompressionID int

type Magic

type Magic [MagicLen]byte

func EncodeMagic

func EncodeMagic(namespace NamespaceID, schema SchemaID, compression CompressionID) (Magic, error)

func EncodePrivateMagic

func EncodePrivateMagic(compression CompressionID) (Magic, error)

func MustEncodeBasicMagic

func MustEncodeBasicMagic(schemaID SchemaID, compression CompressionID) Magic

func MustEncodePrivateMagic

func MustEncodePrivateMagic(compression CompressionID) Magic

type NamespaceID

type NamespaceID int

type SchemaID

type SchemaID int // Schema<<8 | 8 bit version

const (
	// BasicStringSchemaID is the id for the avro schema of struct BasicString
	BasicStringSchemaID SchemaID = 1<<8 + 1

	// BasicIntSchemaID is the id for the avro schema of struct BasicInt
	BasicIntSchemaID SchemaID = 2<<8 + 1

	// BasicByteSliceSchemaID is the id for the avro schema of struct BasicByteSlice
	BasicByteSliceSchemaID SchemaID = 3<<8 + 1

	// BasicMapStringAnySchemaID is the id for the avro schema of struct BasicMapStringAny
	BasicMapStringAnySchemaID SchemaID = 4<<8 + 1

	// BasicTimeSchemaID is the id for the avro schema of struct BasicTime
	BasicTimeSchemaID SchemaID = 5<<8 + 1

	// BasicDecimalSchemaID is the id for the avro schema of struct BasicDecimal (*big.Rat / decimal.fixed)
	BasicDecimalSchemaID SchemaID = 6<<8 + 1

	// BasicRawDateSchemaID is the id for the avro schema of struct BasicRawDate (rawdate.RawDate)
	BasicRawDateSchemaID SchemaID = 7<<8 + 1
)

func PackSchemVer

func PackSchemVer(schemaID SchemaID, version int) SchemaID
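
The constants above follow the pattern schema<<8 + version, and SchemaMax (16777215) leaves 24 bits for the packed value. The sketch below packs both parts under that assumption. The helper name pack and the range check are additions for illustration; PackSchemVer itself returns no error, and this page does not show how it treats out-of-range input.

```go
package main

import (
	"errors"
	"fmt"
)

const schemaMax = 16777215 // mirrors SchemaMax (24 bits)

var errOutOfRange = errors.New("schema or version out of range")

// pack combines a schema number (upper 16 bits) and a version (lower 8 bits).
func pack(schema, version int) (int, error) {
	if schema < 0 || schema > schemaMax>>8 || version < 0 || version > 0xFF {
		return 0, errOutOfRange
	}
	return schema<<8 | version, nil
}

func main() {
	for _, in := range [][2]int{{1, 1}, {7, 1}, {65535, 255}, {65536, 0}} {
		id, err := pack(in[0], in[1])
		fmt.Println(in, id, err)
	}
}
```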

type Schemer

type Schemer interface {
	NamespaceID() NamespaceID
	SchemaID() SchemaID
	Schema() string
}

Directories

Path Synopsis
examples
nats
Package rawdate provides a simple date handling utility without time.