molecule

v1.0.0
Published: Dec 17, 2021 License: MIT Imports: 8 Imported by: 4

README

Molecule

Molecule is a Go library for parsing protobufs in an efficient and zero-allocation manner. The API is loosely based on the excellent jsonparser Go JSON parsing library.

This library is in alpha and the API could change. The current APIs are fairly low level, but additional helpers may be added in the future to make certain operations more ergonomic.

Rationale

The standard Unmarshal protobuf interface in Go makes it difficult to manually control allocations when parsing protobufs. In addition, it's common to only require access to a subset of an individual protobuf's fields. These issues make it hard to use protobuf in performance-critical paths.

This library attempts to solve those problems by introducing a streaming, zero-allocation interface that allows users to have complete control over which fields are parsed, and how/when objects are allocated.

The downside, of course, is that molecule is more difficult to use (and easier to misuse) than the standard protobuf libraries, so it's recommended that it only be used in situations where performance is important. It is not a general-purpose replacement for proto.Unmarshal(). Users should familiarize themselves with the proto3 encoding before attempting to use this library.
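As a starting point for the proto3 encoding, the sketch below shows how a field key is laid out on the wire. It is not part of molecule and uses only the standard library; `splitKey` is a hypothetical helper name. A key is a varint whose low three bits hold the wire type and whose remaining bits hold the field number.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// splitKey decodes the varint key at the start of b and splits it into a
// field number and a wire type. The last result is the number of bytes
// consumed, or a value <= 0 if the varint is malformed (see binary.Uvarint).
func splitKey(b []byte) (int32, uint8, int) {
	key, n := binary.Uvarint(b)
	if n <= 0 {
		return 0, 0, n
	}
	return int32(key >> 3), uint8(key & 0x7), n
}

func main() {
	// 0x0a = 1<<3 | 2: field 1, wire type 2 (length-delimited).
	fieldNum, wireType, n := splitKey([]byte{0x0a})
	fmt.Println(fieldNum, wireType, n)

	// 0xb2 0x01 is the two-byte varint for 178 = 22<<3 | 2.
	fieldNum, wireType, n = splitKey([]byte{0xb2, 0x01})
	fmt.Println(fieldNum, wireType, n)
}
```

The first call reports field 1 with wire type 2 after reading one byte; the second reports field 22 with wire type 2 after reading two.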

Features

  1. Unmarshal all protobuf primitive types with a streaming, zero-allocation API.
  2. Support for iterating through protobuf messages in a streaming fashion.
  3. Support for iterating through packed protobuf repeated fields (arrays) in a streaming fashion.

Not Supported

  1. Proto2 syntax (some things will probably work, but nothing is tested).
  2. Repeated fields that are not encoded using the "packed" encoding (in theory they can be parsed using this library, but there aren't any special helpers).
  3. Map fields. It should be possible to parse maps using this library's API, but it would be a bit tedious. I plan on adding better support for this once I settle on a reasonable API.
  4. Probably lots of other things.
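To make item 2 concrete: on the wire, an unpacked repeated field is simply the same field number written several times, one key/value pair per element. The sketch below is a standard-library illustration of collecting such elements, not molecule code; `collectVarints` is a hypothetical name. With MessageEach, the callback would presumably see one call per element instead.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// collectVarints walks msg and returns every varint value stored under
// field number want. It only understands the varint (0) and
// length-delimited (2) wire types, which is enough for this sketch.
func collectVarints(msg []byte, want uint64) ([]uint64, error) {
	var out []uint64
	for len(msg) > 0 {
		key, n := binary.Uvarint(msg)
		if n <= 0 {
			return nil, errors.New("malformed key")
		}
		msg = msg[n:]
		switch key & 0x7 {
		case 0: // varint
			v, n := binary.Uvarint(msg)
			if n <= 0 {
				return nil, errors.New("malformed varint")
			}
			msg = msg[n:]
			if key>>3 == want {
				out = append(out, v)
			}
		case 2: // length-delimited: skip the payload
			l, n := binary.Uvarint(msg)
			if n <= 0 || uint64(len(msg)-n) < l {
				return nil, errors.New("malformed length")
			}
			msg = msg[n+int(l):]
		default:
			return nil, errors.New("unsupported wire type")
		}
	}
	return out, nil
}

func main() {
	// Field 3 written three times as individual varints (unpacked):
	// key 0x18 = 3<<3 | 0, followed by the values 1, 2 and 3.
	msg := []byte{0x18, 0x01, 0x18, 0x02, 0x18, 0x03}
	vals, err := collectVarints(msg, 3)
	fmt.Println(vals, err)
}
```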

Examples

The godocs have numerous runnable examples.

Attributions

This library is mostly a thin wrapper around other people's work:

  1. The interface was inspired by the jsonparser library.
  2. The codec for interacting with protobuf streams was lifted from this protobuf reflection library. The code was manually vendored instead of imported to reduce dependencies.

Dependencies

The core molecule library has zero external dependencies. The go.sum file does contain some dependencies introduced by the tests package; however, those should not be included transitively when using this library.

Documentation

Overview

Example

Example demonstrates how the molecule library can be used to parse a protobuf message.

// Proto definitions:
//
//   message Test {
//     string string_field = 1;
//     int64 int64_field = 2;
//     repeated int64 repeated_int64_field = 3;
//   }

m := &simple.Test{
	StringField: "hello world!",
	Int64Field:  10,
}
marshaled, err := proto.Marshal(m)
if err != nil {
	panic(err)
}

var (
	buffer   = codec.NewBuffer(marshaled)
	strVal   Value
	int64Val Value
)
err = MessageEach(buffer, func(fieldNum int32, value Value) (bool, error) {
	if fieldNum == 1 {
		strVal = value
	}
	if fieldNum == 2 {
		int64Val = value
	}

	// Continue scanning.
	return true, nil
})
if err != nil {
	panic(err)
}

str, err := strVal.AsStringUnsafe()
if err != nil {
	panic(err)
}
int64V, err := int64Val.AsInt64()
if err != nil {
	panic(err)
}

fmt.Println("StringField:", str)
fmt.Println("Int64Field:", int64V)
Output:

StringField: hello world!
Int64Field: 10
Example (Nested)

Example_nested demonstrates how to use the MessageEach function to decode a nested message.

// Proto definitions:
//
//   message Test {
//       string string_field = 1;
//       int64 int64_field = 2;
//       repeated int64 repeated_int64_field = 3;
//   }
//
//   message Nested {
//       Test nested_message = 1;
//   }

var (
	test   = &simple.Test{StringField: "Hello world!"}
	nested = &simple.Nested{NestedMessage: test}
)
marshaled, err := proto.Marshal(nested)
if err != nil {
	panic(err)
}

var (
	buffer = codec.NewBuffer(marshaled)
	strVal Value
)
err = MessageEach(buffer, func(fieldNum int32, value Value) (bool, error) {
	if fieldNum == 1 {
		packedArr, err := value.AsBytesUnsafe()
		if err != nil {
			return false, err
		}

		buffer := codec.NewBuffer(packedArr)
		err = MessageEach(buffer, func(fieldNum int32, value Value) (bool, error) {
			if fieldNum == 1 {
				strVal = value
			}
			// Found it, stop scanning.
			return false, nil
		})
		if err != nil {
			return false, err
		}

		// Found it, stop scanning.
		return false, nil
	}
	// Continue scanning.
	return true, nil
})
if err != nil {
	panic(err)
}

str, err := strVal.AsStringUnsafe()
if err != nil {
	panic(err)
}

fmt.Println("NestedMessage.StringField:", str)
Output:

NestedMessage.StringField: Hello world!
Example (Repeated)

Example_repeated demonstrates how to use the PackedRepeatedEach function to decode a repeated field encoded in the packed (proto3) format.

// Proto definitions:
//
//   message Test {
//     string string_field = 1;
//     int64 int64_field = 2;
//     repeated int64 repeated_int64_field = 3;
//   }

int64s := []int64{1, 2, 3, 4, 5, 6, 7}
m := &simple.Test{RepeatedInt64Field: int64s}
marshaled, err := proto.Marshal(m)
if err != nil {
	panic(err)
}

var (
	buffer          = codec.NewBuffer(marshaled)
	unmarshaledInts = []int64{}
)
err = MessageEach(buffer, func(fieldNum int32, value Value) (bool, error) {
	if fieldNum == 3 {
		packedArr, err := value.AsBytesUnsafe()
		if err != nil {
			return false, err
		}

		buffer := codec.NewBuffer(packedArr)
		if err := PackedRepeatedEach(buffer, codec.FieldType_INT64, func(v Value) (bool, error) {
			vInt64, err := v.AsInt64()
			if err != nil {
				return false, err
			}
			unmarshaledInts = append(unmarshaledInts, vInt64)
			return true, nil
		}); err != nil {
			return false, err
		}

		// Found it, stop scanning.
		return false, nil
	}
	// Continue scanning.
	return true, nil
})
if err != nil {
	panic(err)
}

fmt.Println("Int64s:", unmarshaledInts)
Output:

Int64s: [1 2 3 4 5 6 7]

Index

Examples

Constants

This section is empty.

Variables

This section is empty.

Functions

func MessageEach

func MessageEach(buffer *codec.Buffer, fn MessageEachFn) error

MessageEach iterates over each top-level field in the message stored in buffer and calls fn on each one.

Example (SelectAField)

ExampleMessageEach_SelectAField demonstrates how the MessageEach function can be used to select an individual field.

// Proto definitions:
//
//   message Test {
//     string string_field = 1;
//     int64 int64_field = 2;
//     repeated int64 repeated_int64_field = 3;
//   }

m := &simple.Test{StringField: "hello world!"}
marshaled, err := proto.Marshal(m)
if err != nil {
	panic(err)
}

var (
	buffer = codec.NewBuffer(marshaled)
	strVal Value
)
err = MessageEach(buffer, func(fieldNum int32, value Value) (bool, error) {
	if fieldNum == 1 {
		strVal = value
		// Found it, stop scanning.
		return false, nil
	}
	// Continue scanning.
	return true, nil
})
if err != nil {
	panic(err)
}

str, err := strVal.AsStringUnsafe()
if err != nil {
	panic(err)
}

fmt.Println("StringField:", str)
Output:

StringField: hello world!

func Next

func Next(buffer *codec.Buffer, value *Value) (fieldNum int32, err error)

Next populates the given value with the next field value in the buffer and returns its field number, or an error if one was encountered while reading it.

func PackedRepeatedEach

func PackedRepeatedEach(buffer *codec.Buffer, fieldType codec.FieldType, fn PackedRepeatedEachFn) error

PackedRepeatedEach iterates over each value in the packed repeated field stored in buffer and calls fn on each one.

The fieldType argument should match the type of the value stored in the repeated field.

PackedRepeatedEach only supports repeated fields encoded using packed encoding.
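The fieldType argument matters because a packed payload carries no per-element keys: it is just the element encodings laid end to end. The following standard-library sketch, which is an illustration rather than molecule's implementation, decodes a packed payload of varint elements; `decodePackedVarints` is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodePackedVarints decodes the payload of a packed repeated field whose
// elements are varints (for example, repeated int64). The payload is the
// length-delimited body, without the field key or length prefix.
func decodePackedVarints(payload []byte) ([]int64, error) {
	var out []int64
	for len(payload) > 0 {
		v, n := binary.Uvarint(payload)
		if n <= 0 {
			return nil, errors.New("malformed varint in packed field")
		}
		out = append(out, int64(v))
		payload = payload[n:]
	}
	return out, nil
}

func main() {
	// The values 1 through 7 take one byte each; 300 takes two (0xac 0x02).
	vals, err := decodePackedVarints([]byte{1, 2, 3, 4, 5, 6, 7, 0xac, 0x02})
	fmt.Println(vals, err)
}
```

Reading the same bytes as fixed 32-bit elements would give entirely different values, which is why the caller has to state the element type.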

Example

ExamplePackedRepeatedEach demonstrates how to use the PackedRepeatedEach function to decode a repeated field encoded in the packed (proto3) format.

// Proto definitions:
//
//   message Test {
//     string string_field = 1;
//     int64 int64_field = 2;
//     repeated int64 repeated_int64_field = 3;
//   }

int64s := []int64{1, 2, 3, 4, 5, 6, 7}
m := &simple.Test{RepeatedInt64Field: int64s}
marshaled, err := proto.Marshal(m)
if err != nil {
	panic(err)
}

var (
	buffer          = codec.NewBuffer(marshaled)
	unmarshaledInts = []int64{}
)
err = MessageEach(buffer, func(fieldNum int32, value Value) (bool, error) {
	if fieldNum == 3 {
		packedArr, err := value.AsBytesUnsafe()
		if err != nil {
			return false, err
		}

		buffer := codec.NewBuffer(packedArr)
		err = PackedRepeatedEach(buffer, codec.FieldType_INT64, func(v Value) (bool, error) {
			vInt64, err := v.AsInt64()
			if err != nil {
				return false, err
			}
			unmarshaledInts = append(unmarshaledInts, vInt64)
			return true, nil
		})
		if err != nil {
			return false, err
		}

		// Found it, stop scanning.
		return false, nil
	}
	// Continue scanning.
	return true, nil
})
if err != nil {
	panic(err)
}

fmt.Println("Int64s:", unmarshaledInts)
Output:

Int64s: [1 2 3 4 5 6 7]

Types

type MessageEachFn

type MessageEachFn func(fieldNum int32, value Value) (bool, error)

MessageEachFn is a function that will be called for each top-level field in a message passed to MessageEach.

type PackedRepeatedEachFn

type PackedRepeatedEachFn func(value Value) (bool, error)

PackedRepeatedEachFn is a function that is called for each value in a repeated field.
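Both callback types return (bool, error). Judging by the examples in this document, returning true continues the scan and returning false stops it. The sketch below shows that contract with a hypothetical `each` driver over plain integers; it is not molecule's code, and passing the callback's error straight back to the caller is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
)

// eachFn mirrors the shape of the callbacks above: the bool reports
// whether iteration should continue.
type eachFn func(v int64) (bool, error)

// each calls fn for every value, stopping early when fn returns false or a
// non-nil error. Only an error is reported back to the caller.
func each(values []int64, fn eachFn) error {
	for _, v := range values {
		keepGoing, err := fn(v)
		if err != nil {
			return err
		}
		if !keepGoing {
			return nil
		}
	}
	return nil
}

// visitsUntil scans values until it sees target and returns how many times
// the callback ran. A negative value aborts the scan with an error.
func visitsUntil(values []int64, target int64) (int, error) {
	visits := 0
	err := each(values, func(v int64) (bool, error) {
		visits++
		if v < 0 {
			return false, errors.New("negative value")
		}
		return v != target, nil
	})
	return visits, err
}

func main() {
	fmt.Println(visitsUntil([]int64{1, 2, 3, 4}, 2))  // stops at the second value
	fmt.Println(visitsUntil([]int64{1, -2, 3, 4}, 3)) // aborts with an error
}
```

Stopping early is what makes selecting a single field cheap: the remaining bytes of the message are never decoded.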

type ProtoStream

type ProtoStream struct {

	// The BufferFactory creates new, empty buffers as needed.  Users may
	// override this function to provide pre-initialized buffers of a larger
	// size, or from a buffer pool, for example.
	BufferFactory func() []byte
	// contains filtered or unexported fields
}

A ProtoStream supports writing protobuf data in a streaming fashion. Its methods will write their output to the wrapped `io.Writer`. Zero values are not included.

ProtoStream instances are *not* threadsafe and *not* re-entrant.
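The remark that zero values are not included can be pictured with a small standard-library sketch. It is an illustration of the idea, not ProtoStream's implementation, and `appendVarintField` is a hypothetical name: a zero value produces no bytes at all, which is safe because a proto3 reader treats an absent scalar field as its zero value.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendVarintField appends a varint field (wire type 0) to dst. A zero
// value is skipped entirely, so the field is simply absent from the output.
func appendVarintField(dst []byte, fieldNumber int, value uint64) []byte {
	if value == 0 {
		return dst
	}
	var buf [2 * binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], uint64(fieldNumber)<<3) // wire type 0
	n += binary.PutUvarint(buf[n:], value)
	return append(dst, buf[:n]...)
}

func main() {
	var out []byte
	out = appendVarintField(out, 1, 0)  // skipped
	out = appendVarintField(out, 2, 10) // key 0x10, value 0x0a
	fmt.Printf("% x\n", out)
}
```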

func NewProtoStream

func NewProtoStream(outputWriter io.Writer) *ProtoStream

NewProtoStream creates a new ProtoStream writing to the given Writer. If the writer is nil, the stream cannot be used until it has been set with `Reset`.

func (*ProtoStream) Bool

func (ps *ProtoStream) Bool(fieldNumber int, value bool) error

Bool writes a value of proto type bool to the stream.

func (*ProtoStream) Bytes

func (ps *ProtoStream) Bytes(fieldNumber int, value []byte) error

Bytes writes the given bytes to the stream.

func (*ProtoStream) Double

func (ps *ProtoStream) Double(fieldNumber int, value float64) error

Double writes a value of proto type double to the stream.

func (*ProtoStream) DoublePacked

func (ps *ProtoStream) DoublePacked(fieldNumber int, values []float64) error

DoublePacked writes a slice of values of proto type double to the stream, in packed form.

func (*ProtoStream) Embedded

func (ps *ProtoStream) Embedded(fieldNumber int, inner func(*ProtoStream) error) error

Embedded is used for constructing embedded messages. It calls the given function with a new ProtoStream, then embeds the result in the current stream.

NOTE: if the inner function creates an empty message (such as for a struct at its zero value), that empty message will still be added to the stream.
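The reason an embedded message is built by a separate function follows from the wire format: the inner message is length-delimited, so its byte length has to be known before its first byte is written. The sketch below illustrates that with plain byte slices and the standard library; the helper names are hypothetical and this is not how ProtoStream is implemented internally.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// appendUvarint appends v to dst in varint form.
func appendUvarint(dst []byte, v uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], v)
	return append(dst, buf[:n]...)
}

// appendString appends a string field (wire type 2) to dst.
func appendString(dst []byte, fieldNumber int, s string) []byte {
	dst = appendUvarint(dst, uint64(fieldNumber)<<3|2)
	dst = appendUvarint(dst, uint64(len(s)))
	return append(dst, s...)
}

// appendEmbedded runs inner against a fresh buffer, then appends the result
// to dst as a length-delimited field. The inner message has to be complete
// before it is written, because its length prefix comes first.
func appendEmbedded(dst []byte, fieldNumber int, inner func([]byte) []byte) []byte {
	body := inner(nil)
	dst = appendUvarint(dst, uint64(fieldNumber)<<3|2)
	dst = appendUvarint(dst, uint64(len(body)))
	return append(dst, body...)
}

func main() {
	out := appendEmbedded(nil, 11, func(b []byte) []byte {
		return appendString(b, 1, "hi")
	})
	fmt.Printf("% x\n", out)

	// An empty inner message still produces a key and a zero length.
	empty := appendEmbedded(nil, 11, func(b []byte) []byte { return b })
	fmt.Printf("% x\n", empty)
}
```

The second call mirrors the NOTE above: even with nothing inside, the key and a zero length are still emitted.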

Example
/* Encoding the following:
 *
 * message MultiSearch {
 *   string api_key = 10;
 *   repeated SearchRequest request = 11;
 * }
 *
 * message SearchRequest {
 *   string query = 1;
 *   int32 page_number = 2;
 *   int32 result_per_page = 3;
 * }
 */
var err error
output := bytes.NewBuffer([]byte{})
ps := NewProtoStream(output)

// values copied from the .proto file
const fieldAPIKey = 10
const fieldRequest = 11
const fieldQuery int = 1
const fieldPageNumber int = 2
const fieldResultPerPage int = 3

err = ps.String(fieldAPIKey, "abc-123")
if err != nil {
	panic(err)
}

err = ps.Embedded(fieldRequest, func(ps *ProtoStream) error {
	err = ps.String(fieldQuery, "author=octavia+butler")
	if err != nil {
		return err
	}

	err = ps.Int32(fieldPageNumber, 2)
	if err != nil {
		return err
	}

	err = ps.Int32(fieldResultPerPage, 100)
	if err != nil {
		return err
	}

	return nil
})
if err != nil {
	panic(err)
}

err = ps.Embedded(fieldRequest, func(ps *ProtoStream) error {
	err = ps.String(fieldQuery, "author=margaret+atwood")
	if err != nil {
		return err
	}

	err = ps.Int32(fieldPageNumber, 0)
	if err != nil {
		return err
	}

	err = ps.Int32(fieldResultPerPage, 10)
	if err != nil {
		return err
	}

	return nil
})
if err != nil {
	panic(err)
}

// The encoded result is in `output.Bytes()`.
Output:

func (*ProtoStream) Fixed32

func (ps *ProtoStream) Fixed32(fieldNumber int, value uint32) error

Fixed32 writes a value of proto type fixed32 to the stream.

func (*ProtoStream) Fixed32Packed

func (ps *ProtoStream) Fixed32Packed(fieldNumber int, values []uint32) error

Fixed32Packed writes a slice of values of proto type fixed32 to the stream, in packed form.

func (*ProtoStream) Fixed64

func (ps *ProtoStream) Fixed64(fieldNumber int, value uint64) error

Fixed64 writes a value of proto type fixed64 to the stream.

func (*ProtoStream) Fixed64Packed

func (ps *ProtoStream) Fixed64Packed(fieldNumber int, values []uint64) error

Fixed64Packed writes a slice of values of proto type fixed64 to the stream, in packed form.

func (*ProtoStream) Float

func (ps *ProtoStream) Float(fieldNumber int, value float32) error

Float writes a value of proto type float to the stream.

func (*ProtoStream) FloatPacked

func (ps *ProtoStream) FloatPacked(fieldNumber int, values []float32) error

FloatPacked writes a slice of values of proto type float to the stream, in packed form.
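For fixed-width types such as float, the packed payload is simply the elements laid end to end, four bytes each in little-endian order. The following is a standard-library sketch of that layout only (the field key and length prefix are left out), with `packFloats` as a hypothetical name rather than anything from this package.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// packFloats returns the payload of a packed repeated float field: each
// value occupies exactly four bytes, little-endian, with no separators.
func packFloats(values []float32) []byte {
	out := make([]byte, 4*len(values))
	for i, v := range values {
		binary.LittleEndian.PutUint32(out[4*i:], math.Float32bits(v))
	}
	return out
}

func main() {
	payload := packFloats([]float32{1, -2})
	fmt.Printf("% x\n", payload)
}
```

Because every element has the same width, the element count is just the payload length divided by four.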

func (*ProtoStream) Int32

func (ps *ProtoStream) Int32(fieldNumber int, value int32) error

Int32 writes a value of proto type int32 to the stream.
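When choosing between int32 and sint32 for a field, it helps to know that the protobuf encoding sign-extends a negative int32 to 64 bits before writing it as a varint. The sketch below measures the resulting sizes with the standard library; `int32VarintLen` is a hypothetical helper and the figures describe the wire format, not anything specific to this package.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// int32VarintLen reports how many bytes v occupies as an int32 varint.
// Negative values are sign-extended to 64 bits before encoding.
func int32VarintLen(v int32) int {
	var buf [binary.MaxVarintLen64]byte
	return binary.PutUvarint(buf[:], uint64(int64(v)))
}

func main() {
	for _, v := range []int32{1, 300, -1} {
		fmt.Printf("%d takes %d byte(s)\n", v, int32VarintLen(v))
	}
}
```

A field that frequently holds negative numbers is usually better declared as sint32, which stays compact for small magnitudes of either sign.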

func (*ProtoStream) Int32Packed

func (ps *ProtoStream) Int32Packed(fieldNumber int, values []int32) error

Int32Packed writes a slice of values of proto type int32 to the stream, in packed form.

func (*ProtoStream) Int64

func (ps *ProtoStream) Int64(fieldNumber int, value int64) error

Int64 writes a value of proto type int64 to the stream.

func (*ProtoStream) Int64Packed

func (ps *ProtoStream) Int64Packed(fieldNumber int, values []int64) error

Int64Packed writes a slice of values of proto type int64 to the stream, in packed form.

func (*ProtoStream) Reset

func (ps *ProtoStream) Reset(outputWriter io.Writer)

Reset sets the Writer to which this ProtoStream streams. If the writer is nil, then the ProtoStream cannot be used until Reset is called with a non-nil value.

func (*ProtoStream) Sfixed32

func (ps *ProtoStream) Sfixed32(fieldNumber int, value int32) error

Sfixed32 writes a value of proto type sfixed32 to the stream.

func (*ProtoStream) Sfixed32Packed

func (ps *ProtoStream) Sfixed32Packed(fieldNumber int, values []int32) error

Sfixed32Packed writes a slice of values of proto type sfixed32 to the stream, in packed form.

func (*ProtoStream) Sfixed64

func (ps *ProtoStream) Sfixed64(fieldNumber int, value int64) error

Sfixed64 writes a value of proto type sfixed64 to the stream.

func (*ProtoStream) Sfixed64Packed

func (ps *ProtoStream) Sfixed64Packed(fieldNumber int, values []int64) error

Sfixed64Packed writes a slice of values of proto type sfixed64 to the stream, in packed form.

func (*ProtoStream) Sint32

func (ps *ProtoStream) Sint32(fieldNumber int, value int32) error

Sint32 writes a value of proto type sint32 to the stream.

func (*ProtoStream) Sint32Packed

func (ps *ProtoStream) Sint32Packed(fieldNumber int, values []int32) error

Sint32Packed writes a slice of values of proto type sint32 to the stream, in packed form.

Example
/* Encoding the following:
 *
 * message Numbers {
 *   repeated sint32 number = 22;
 * }
 */
var err error
output := bytes.NewBuffer([]byte{})
ps := NewProtoStream(output)

const fieldNumber = 22

numbers := []int32{20, -30, -31, 1999}

err = ps.Sint32Packed(fieldNumber, numbers)
if err != nil {
	panic(err)
}

res := bytes.NewReader(output.Bytes())
key, _ := binary.ReadUvarint(res)
fmt.Printf("key: 0x%x = 22<<3 + 2\n", key)
leng, _ := binary.ReadUvarint(res)
fmt.Printf("length: 0x%x\n", leng)
v, _ := binary.ReadUvarint(res)
fmt.Printf("v[0]: 0x%x\n", v)
v, _ = binary.ReadUvarint(res)
fmt.Printf("v[1]: 0x%x\n", v)
v, _ = binary.ReadUvarint(res)
fmt.Printf("v[2]: 0x%x\n", v)
v, _ = binary.ReadUvarint(res)
fmt.Printf("v[3]: 0x%x\n", v)
Output:

key: 0xb2 = 22<<3 + 2
length: 0x5
v[0]: 0x28
v[1]: 0x3b
v[2]: 0x3d
v[3]: 0xf9e
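The values in the output above are the ZigZag forms of the inputs, which is how the sint32 and sint64 types keep small negative numbers short. The standard-library sketch below reproduces them; `zigzag32` and `unzigzag32` are hypothetical helper names, not functions exported by this package.

```go
package main

import "fmt"

// zigzag32 maps a signed 32-bit value onto an unsigned one so that values
// of small magnitude, positive or negative, become small varints.
func zigzag32(v int32) uint32 {
	return uint32(v<<1) ^ uint32(v>>31)
}

// unzigzag32 reverses zigzag32.
func unzigzag32(u uint32) int32 {
	return int32(u>>1) ^ -int32(u&1)
}

func main() {
	for _, v := range []int32{20, -30, -31, 1999} {
		fmt.Printf("%d -> 0x%x\n", v, zigzag32(v))
	}
}
```

The four mapped values are 0x28, 0x3b, 0x3d and 0xf9e. The first three fit in one varint byte and the last needs two, which accounts for the length of 0x5.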

func (*ProtoStream) Sint64

func (ps *ProtoStream) Sint64(fieldNumber int, value int64) error

Sint64 writes a value of proto type sint64 to the stream.

func (*ProtoStream) Sint64Packed

func (ps *ProtoStream) Sint64Packed(fieldNumber int, values []int64) error

Sint64Packed writes a slice of values of proto type sint64 to the stream, in packed form.

func (*ProtoStream) String

func (ps *ProtoStream) String(fieldNumber int, value string) error

String writes a string to the stream.

func (*ProtoStream) Uint32

func (ps *ProtoStream) Uint32(fieldNumber int, value uint32) error

Uint32 writes a value of proto type uint32 to the stream.

func (*ProtoStream) Uint32Packed

func (ps *ProtoStream) Uint32Packed(fieldNumber int, values []uint32) error

Uint32Packed writes a slice of values of proto type uint32 to the stream, in packed form.

func (*ProtoStream) Uint64

func (ps *ProtoStream) Uint64(fieldNumber int, value uint64) error

Uint64 writes a value of proto type uint64 to the stream.

func (*ProtoStream) Uint64Packed

func (ps *ProtoStream) Uint64Packed(fieldNumber int, values []uint64) error

Uint64Packed writes a slice of values of proto type uint64 to the stream, in packed form.

type Value

type Value struct {
	// WireType is the protobuf wire type that was used to encode the field.
	WireType codec.WireType
	// Number will contain the value for any fields encoded with the
	// following wire types:
	//
	// 1. varint
	// 2. Fixed32
	// 3. Fixed64
	Number uint64
	// Bytes will contain the value for any fields encoded with the
	// following wire types:
	//
	// 1. bytes
	//
	// Bytes is an unsafe view over the bytes in the buffer. To obtain a "safe" copy
	// call value.AsBytesSafe() or copy Bytes directly.
	Bytes []byte
}

Value represents a protobuf value. It contains the original wiretype that the value was encoded with as well as a variety of helper methods for interpreting the raw value based on the field's actual type.
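The reason several helper methods exist for one Number field is that the wire type alone does not say how the bits should be read. As a rough standard-library illustration (the function names are hypothetical and this is not the package's implementation), the same raw uint64 can be read in several ways:

```go
package main

import (
	"fmt"
	"math"
)

// asInt64 reads the raw bits as a two's-complement int64.
func asInt64(raw uint64) int64 { return int64(raw) }

// asSint64 reads the raw bits as a ZigZag-encoded sint64.
func asSint64(raw uint64) int64 { return int64(raw>>1) ^ -int64(raw&1) }

// asBool reads any non-zero value as true.
func asBool(raw uint64) bool { return raw != 0 }

// asDouble reads the raw bits as an IEEE 754 double.
func asDouble(raw uint64) float64 { return math.Float64frombits(raw) }

func main() {
	const raw = 0x3b
	fmt.Println(asInt64(raw), asSint64(raw), asBool(raw))

	// 0x3ff8000000000000 is the bit pattern of the double 1.5.
	fmt.Println(asDouble(0x3ff8000000000000))
}
```

Only the schema says which reading is the right one, so the caller has to pick the accessor that matches the field's declared type.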

func (*Value) AsBool

func (v *Value) AsBool() (bool, error)

AsBool interprets the value as a bool.

func (*Value) AsBytesSafe

func (v *Value) AsBytesSafe() ([]byte, error)

AsBytesSafe interprets the value as a byte slice by allocating a safe copy of the underlying data.

func (*Value) AsBytesUnsafe

func (v *Value) AsBytesUnsafe() ([]byte, error)

AsBytesUnsafe interprets the value as a byte slice. The returned []byte is an unsafe view over the underlying bytes. Use AsBytesSafe() to obtain a "safe" []byte that is a copy of the underlying data.
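What "unsafe view" means in practice can be shown with ordinary Go slices. The sketch below does not call into this package; `viewAndCopy` is a hypothetical helper that returns both a slice sharing the buffer's memory and an independent copy, so the difference shows up once the buffer is reused.

```go
package main

import "fmt"

// viewAndCopy returns two slices holding the first n bytes of buf: a view
// that shares buf's backing array and an independent copy.
func viewAndCopy(buf []byte, n int) (view, safe []byte) {
	view = buf[:n]
	safe = append([]byte(nil), buf[:n]...)
	return view, safe
}

func main() {
	buf := []byte("hello world!")
	view, safe := viewAndCopy(buf, 5)

	// Reusing the buffer, as a pooled reader might, rewrites the view too.
	copy(buf, "HELLO")

	fmt.Println(string(view))
	fmt.Println(string(safe))
}
```

The view now reads HELLO while the copy still reads hello. The unsafe accessors avoid the allocation, at the price that the result is only valid while the original buffer is left untouched.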

func (*Value) AsDouble

func (v *Value) AsDouble() (float64, error)

AsDouble interprets the value as a double.

func (*Value) AsFixed32

func (v *Value) AsFixed32() (uint32, error)

AsFixed32 interprets the value as a fixed32.

func (*Value) AsFixed64

func (v *Value) AsFixed64() (uint64, error)

AsFixed64 interprets the value as a fixed64.

func (*Value) AsFloat

func (v *Value) AsFloat() (float32, error)

AsFloat interprets the value as a float.

func (*Value) AsInt32

func (v *Value) AsInt32() (int32, error)

AsInt32 interprets the value as an int32.

func (*Value) AsInt64

func (v *Value) AsInt64() (int64, error)

AsInt64 interprets the value as an int64.

func (*Value) AsSFixed32

func (v *Value) AsSFixed32() (int32, error)

AsSFixed32 interprets the value as an sfixed32.

func (*Value) AsSFixed64

func (v *Value) AsSFixed64() (int64, error)

AsSFixed64 interprets the value as an sfixed64.

func (*Value) AsSint32

func (v *Value) AsSint32() (int32, error)

AsSint32 interprets the value as a sint32.

func (*Value) AsSint64

func (v *Value) AsSint64() (int64, error)

AsSint64 interprets the value as a sint64.

func (*Value) AsStringSafe

func (v *Value) AsStringSafe() (string, error)

AsStringSafe interprets the value as a string by allocating a safe copy of the underlying data.

func (*Value) AsStringUnsafe

func (v *Value) AsStringUnsafe() (string, error)

AsStringUnsafe interprets the value as a string. The returned string is an unsafe view over the underlying bytes. Use AsStringSafe() to obtain a "safe" string that is a copy of the underlying data.

func (*Value) AsUint32

func (v *Value) AsUint32() (uint32, error)

AsUint32 interprets the value as a uint32.

func (*Value) AsUint64

func (v *Value) AsUint64() (uint64, error)

AsUint64 interprets the value as a uint64.

Directories

Path           Synopsis
src/codec      Package codec contains all the logic required for interacting with the protobuf raw encoding.
src/proto      Package simple is a generated protocol buffer package.
src/protowire  Package protowire parses and formats the raw wire encoding.
