protein

package module
v1.15.0
Published: Feb 19, 2018 License: Apache-2.0 Imports: 27 Imported by: 0

README

Protein


Protein is an encoding/decoding library for Protobuf that comes with schema-versioning and runtime-decoding capabilities.

It has multiple use-cases, including but not limited to:

  • setting up schema registries
  • decoding Protobuf payloads without the need to know their schema at compile-time (all the while keeping strong-typing around!)
  • identifying & preventing applicative bugs and data corruption issues
  • creating custom-made container formats for on-disk storage
  • ...and more!

Protein is divided into 3 sub-components that all play a critical role in its operation:

  • The protoscan API walks through the symbols of the running executable to find every instantiated protobuf schema, collects them, builds the dependency trees that link them together, and finally computes the deterministic versioning hashes that ultimately define them.
  • The protostruct API creates structure definitions at runtime using the dependency trees of protobuf schemas previously computed by the protoscan API.
  • The Transcoder class implements the high-level, user-facing encoding/decoding API that ties all of this together and offers the tools to cover the various use-cases cited above.

A blog post detailing the inner workings of these components is in the works and will be available soon.

Have a look at the Quickstart section to get started.


IMPORTANT NOTE REGARDING VENDORING (I.E. IT WON'T COMPILE)

Protein makes use of Go's linkname feature in order to sniff protobuf schemas straight from memory.

The go:linkname directive instructs the compiler to declare a local symbol as an alias for an external one, whether it is public or private. This allows Protein to bind to some of the private methods of the official protobuf package for various reasons (see here and there for more information).

Unfortunately, vendoring modifies symbol names due to the way mangling works; e.g. a symbol called github.com/gogo/protobuf/protoc-gen-gogo/generator.(*Generator).goTag actually becomes github.com/myname/myproject/vendor/github.com/gogo/protobuf/protoc-gen-gogo/generator.(*Generator).goTag once the gogo/protobuf package gets vendored inside the myname/myproject package.

These modifications to the symbol names result in the dreaded relocation target <symbol-name> not defined error at compile time.

The good news is that Protein provides all that's necessary to fix those errors automatically; just run the following commands:

$ go get -u github.com/znly/linkname-gen
$ go generate ./vendor/github.com/znly/protein/...

And voilà, it compiles again!



Usage

Building

Install dependencies:

$ make deps

Build Protein:

$ go build ./...

Quickstart

This quickstart demonstrates the use of the Protein package in order to:

  • initialize a Transcoder
  • sniff the local protobuf schemas from memory
  • synchronize the local schema-database with a remote datastore (redis in this example)
  • use a Transcoder to encode & decode protobuf payloads using an already known schema
  • use a Transcoder to decode protobuf payloads without any prior knowledge of their schema

The complete code for the following quickstart can be found here.

Prerequisites

First, we need to set up a local redis server that will be used as a schema registry later on:

$ docker run -p 6379:6379 --name schema-reg --rm redis:3.2 redis-server

Then we open up a pool of connections to this server using garyburd/redigo:

p := &redis.Pool{
  Dial: func() (redis.Conn, error) {
    return redis.DialURL("redis://localhost:6379/0")
  },
}
defer p.Close()

Initializing a Transcoder

// initialize a `Transcoder` that is transparently kept in-sync with
// a `redis` datastore.
trc, err := NewTranscoder(
  // this context defines the timeout & deadline policies when pushing
  // schemas to the local `redis`; i.e. it is forwarded to the
  // `TranscoderSetter` function that is passed below
  context.Background(),
  // the schemas found in memory will be versioned using a MD5 hash
  // algorithm, prefixed by the 'PROT-' string
  protoscan.MD5, "PROT-",
  // configure the `Transcoder` to push every protobuf schema it can find
  // in memory into the specified `redis` connection pool
  TranscoderOptSetter(NewTranscoderSetterRedis(p)),
  // configure the `Transcoder` to query the given `redis` connection pool
  // when it cannot find a specific protobuf schema in its local cache
  TranscoderOptGetter(NewTranscoderGetterRedis(p)),
)

Now that the Transcoder has been initialized, the local redis datastore should contain all the protobuf schemas that were sniffed from memory, as defined by their respective versioning hashes:

$ docker run -it --link schema-reg:redis --rm redis:3.2 redis-cli -h redis -p 6379 -c KEYS '*'

  1) "PROT-31c64ad1c6476720f3afee6881e6f257"
  2) "PROT-56b347c6c212d3176392ab9bf5bb21ee"
  3) "PROT-c2dbc910081a372f31594db2dc2adf72"
  4) "PROT-09595a7e58d28b081d967b69cb00e722"
  5) "PROT-05dc5bd440d980600ecc3f1c4a8e315d"
  6) "PROT-8cbb4e79fdeadd5f0ff0971bbf7de31e"
     ... etc ...

Encoding stuff

We'll create a simple object for testing encoding & decoding functions, using the TestSchemaXXX protobuf schema defined here:

obj := &test.TestSchemaXXX{
  Ids: map[int32]string{
    42:  "the-answer",
    666: "the-devil",
  },
}

Encoding is nothing spectacular: the Transcoder hides the "hard" work of bundling the versioning metadata within the payload:

// wrap the object and its versioning metadata within a `ProtobufPayload` object,
// then serialize the bundle as a protobuf binary blob
payload, err := trc.Encode(obj)
if err != nil {
  log.Fatal(err)
}

Decoding stuff

We'll try to decode the previous payload into the following object:

var myObj test.TestSchemaXXX

Trying to decode the payload using the standard proto.Unmarshal method will fail in quite cryptic ways since vanilla protobuf is unaware of how Protein bundles the versioning metadata within the payload... Don't do this!

_ = proto.Unmarshal(payload, &myObj) // NOPE NOPE NOPE!

Using the Transcoder, on the other hand, will properly unbundle the data from the metadata before unmarshalling the payload:

err = trc.DecodeAs(payload, &myObj)
if err != nil {
  log.Fatal(err)
}
fmt.Println(myObj.Ids[42]) // prints the answer!

Decoding stuff dynamically (!)

Runtime-decoding does not require any more effort on the part of the end-user than simple decoding does, although a lot of work is actually happening behind the scenes:

  1. the versioning metadata is extracted from the payload
  2. the corresponding schema as well as its dependencies are lazily fetched from the redis datastore if they're not already available in the local cache (using the TranscoderGetter that was passed to the constructor)
  3. a structure-type definition is created from these schemas using Go's reflection APIs, with the right protobuf tags & hints for the protobuf deserializer to do its thing
  4. an instance of this structure is created, then the payload is unmarshalled into it

myRuntimeObj, err := trc.Decode(context.Background(), payload)
if err != nil {
  log.Fatal(err)
}
myRuntimeIDs := myRuntimeObj.Elem().FieldByName("IDs")
fmt.Println(myRuntimeIDs.MapIndex(reflect.ValueOf(int32(666)))) // prints the devil!

Error handling

Protein uses the pkg/errors package to handle error propagation throughout the call stack; please take a look at the related documentation for more information on how to properly handle these errors.

Logging

Protein rarely logs, but when it does, it uses the global logger from Uber's Zap package.
You can thus control the behavior of Protein's logger however you like by calling zap.ReplaceGlobals at your convenience.

For more information, see Zap's documentation.

Monitoring

Protein does not offer any kind of monitoring hooks, yet.

Performance

Configuration:

MacBook Pro (Retina, 15-inch, Mid 2015)
Intel(R) Core(TM) i7-4870HQ CPU @ 2.50GHz

Encoding:

## gogo/protobuf ##

BenchmarkTranscoder_Encode/gogo/protobuf        300000    4287 ns/op
BenchmarkTranscoder_Encode/gogo/protobuf-2     1000000    2195 ns/op
BenchmarkTranscoder_Encode/gogo/protobuf-4     1000000    1268 ns/op
BenchmarkTranscoder_Encode/gogo/protobuf-8     1000000    1258 ns/op
BenchmarkTranscoder_Encode/gogo/protobuf-24    1000000    1536 ns/op

## znly/protein ##

BenchmarkTranscoder_Encode/znly/protein         300000    5556 ns/op
BenchmarkTranscoder_Encode/znly/protein-2       500000    2680 ns/op
BenchmarkTranscoder_Encode/znly/protein-4      1000000    1638 ns/op
BenchmarkTranscoder_Encode/znly/protein-8      1000000    1798 ns/op
BenchmarkTranscoder_Encode/znly/protein-24     1000000    2288 ns/op

Decoding:

## gogo/protobuf ##

BenchmarkTranscoder_DecodeAs/gogo/protobuf      300000    5970 ns/op
BenchmarkTranscoder_DecodeAs/gogo/protobuf-2    500000    3226 ns/op
BenchmarkTranscoder_DecodeAs/gogo/protobuf-4   1000000    2125 ns/op
BenchmarkTranscoder_DecodeAs/gogo/protobuf-8   1000000    2015 ns/op
BenchmarkTranscoder_DecodeAs/gogo/protobuf-24  1000000    2380 ns/op

## znly/protein ##

BenchmarkTranscoder_Decode/znly/protein        200000     6777 ns/op
BenchmarkTranscoder_Decode/znly/protein-2      500000     3986 ns/op
BenchmarkTranscoder_Decode/znly/protein-4      500000     2630 ns/op
BenchmarkTranscoder_Decode/znly/protein-8      500000     2973 ns/op
BenchmarkTranscoder_Decode/znly/protein-24     500000     3037 ns/op

See transcoder_test.go for the actual benchmarking code.

Contributing

Contributions of any kind are welcome; whether it is to fix a bug, clarify some documentation/comments or simply correct English mistakes and typos: do feel free to send us a pull request.

Protein is pretty-much frozen in terms of features; if you still find it to be lacking something, please file an issue to discuss it first.
Also, do not hesitate to open an issue if some piece of documentation looks either unclear or incomplete to you, or is just missing entirely.

Code contributions must be thoroughly tested and documented.

Running tests

$ docker-compose -f test/docker-compose.yml up
$ ## wait for the datastores to be up & running, then
$ make test

Running benchmarks

$ make bench

Authors

See AUTHORS for the list of contributors.

See also

License

The Apache License version 2.0 (Apache2) - see LICENSE for more details.

Copyright (c) 2017 Zenly hello@zen.ly @zenlyapp

Documentation

Overview

Package protein is an encoding/decoding library for Protobuf that comes with schema-versioning and runtime-decoding capabilities.

It has diverse use-cases, including but not limited to:

  • setting up schema registries
  • decoding Protobuf payloads without the need to know their schema at compile-time
  • identifying & preventing applicative bugs and data corruption issues
  • creating custom-made container formats for on-disk storage
  • ...and more!

Package protein is a generated protocol buffer package.

It is generated from these files:

github.com/znly/protein/protobuf/protobuf_payload.proto
github.com/znly/protein/protobuf/protobuf_schema.proto

It has these top-level messages:

ProtobufPayload
ProtobufSchema

Example

This example demonstrates the use of the Protein package in order to: initialize a `Transcoder`, sniff the local protobuf schemas from memory, synchronize the local schema-database with a remote datastore (here `redis`), use a `Transcoder` to encode & decode protobuf payloads using an already known schema, and use a `Transcoder` to decode protobuf payloads without any prior knowledge of their schema.

// A local `redis` server must be up & running for this example to work:
//
//   $ docker run -p 6379:6379 --name my-redis --rm redis:3.2 redis-server

// open up a new `redis` connection pool
redisURI := os.Getenv("PROT_REDIS_URI")
p := &redis.Pool{
	Dial: func() (redis.Conn, error) {
		return redis.DialURL(redisURI)
	},
}
defer p.Close()

/* INITIALIZATION */

// initialize a `Transcoder` that is transparently kept in-sync with
// a `redis` datastore.
trc, err := NewTranscoder(
	// this context defines the timeout & deadline policies when pushing
	// schemas to the local `redis`; i.e. it is forwarded to the
	// `TranscoderSetter` function that is passed below
	context.Background(),
	// the schemas found in memory will be versioned using a MD5 hash
	// algorithm, prefixed by the 'PROT-' string
	protoscan.MD5, "PROT-",
	// configure the `Transcoder` to push every protobuf schema it can find
	// in memory into the specified `redis` connection pool
	TranscoderOptSetter(NewTranscoderSetterRedis(p)),
	// configure the `Transcoder` to query the given `redis` connection pool
	// when it cannot find a specific protobuf schema in its local cache
	TranscoderOptGetter(NewTranscoderGetterRedis(p)))
if err != nil {
	panic(err)
}

// At this point, the local `redis` datastore should contain all the
// protobuf schemas known to the `Transcoder`, as defined by their respective
// versioning hashes:
//
//   $ docker run -it --link my-redis:redis --rm redis:3.2 redis-cli -h redis -p 6379 -c KEYS '*'
//
//   1) "PROT-31c64ad1c6476720f3afee6881e6f257"
//   2) "PROT-56b347c6c212d3176392ab9bf5bb21ee"
//   3) "PROT-c2dbc910081a372f31594db2dc2adf72"
//   4) "PROT-09595a7e58d28b081d967b69cb00e722"
//   5) "PROT-05dc5bd440d980600ecc3f1c4a8e315d"
//   6) "PROT-8cbb4e79fdeadd5f0ff0971bbf7de31e"
//   ... etc ...

/* ENCODING */

// create a simple object to be serialized
ts, _ := types.TimestampProto(time.Now())
obj := &test.TestSchemaXXX{
	Ids: map[int32]string{
		42:  "the-answer",
		666: "the-devil",
	},
	Ts: *ts,
}

// wrap the object and its versioning metadata within a `ProtobufPayload`
// object, then serialize the bundle as a protobuf binary blob
payload, err := trc.Encode(obj)
if err != nil {
	log.Fatal(err)
}

/* DECODING */

var myObj test.TestSchemaXXX

// /!\ This will fail in cryptic ways since vanilla protobuf is unaware
// of how Protein bundles the versioning metadata within the payload...
// Don't do this!
_ = proto.Unmarshal(payload, &myObj)

// this will properly unbundle the data from the metadata before
// unmarshalling the payload
err = trc.DecodeAs(payload, &myObj)
if err != nil {
	log.Fatal(err)
}
fmt.Println("A:", myObj.Ids[42]) // prints the answer!

/* RUNTIME-DECODING */

// empty the content of the local cache of protobuf schemas in order to
// make sure that the `Transcoder` will have to lazily fetch the schema
// and its dependencies from the `redis` datastore during the decoding
trc.sm = NewSchemaMap()

// the `Transcoder` will do a lot of stuff behind the scenes so it can
// successfully decode the payload:
// 1. the versioning metadata is extracted from the payload
// 2. the corresponding schema as well as its dependencies are lazily
//    fetched from the `redis` datastore (using the `TranscoderGetter` that
//    was passed to the constructor)
// 3. a structure-type definition is created from these schemas using Go's
//    reflection APIs, with the right protobuf tags & hints for the protobuf
//    deserializer to do its thing
// 4. an instance of this structure is created, then the payload is
//    unmarshalled into it
myRuntimeObj, err := trc.Decode(context.Background(), payload)
if err != nil {
	log.Fatal(err)
}
myRuntimeIDs := myRuntimeObj.Elem().FieldByName("IDs")
fmt.Println("B:", myRuntimeIDs.MapIndex(reflect.ValueOf(int32(666)))) // prints the devil!
Output:

A: the-answer
B: the-devil

Index

Examples

Constants

This section is empty.

Variables

View Source
var (
	// TranscoderOptGetter is used to configure the `TranscoderGetter` used by
	// the `Transcoder`.
	// See `TranscoderGetter` documentation for more information.
	TranscoderOptGetter = func(getter TranscoderGetter) TranscoderOpt {
		return func(trc *Transcoder) { trc.getter = getter }
	}
	// TranscoderOptSetter is used to configure the `TranscoderSetter` used by
	// the `Transcoder`.
	// See `TranscoderSetter` documentation for more information.
	TranscoderOptSetter = func(setter TranscoderSetter) TranscoderOpt {
		return func(trc *Transcoder) { trc.setter = setter }
	}
	// TranscoderOptSetterMulti is used to configure the `TranscoderSetterMulti`
	// used by the `Transcoder`.
	// See `TranscoderSetterMulti` documentation for more information.
	TranscoderOptSetterMulti = func(setterM TranscoderSetterMulti) TranscoderOpt {
		return func(trc *Transcoder) { trc.setterMulti = setterM }
	}
	// TranscoderOptSerializer is used to configure the `TranscoderSerializer`
	// used by the `Transcoder`.
	// See `TranscoderSerializer` documentation for more information.
	TranscoderOptSerializer = func(serializer TranscoderSerializer) TranscoderOpt {
		return func(trc *Transcoder) { trc.serializer = serializer }
	}
	// TranscoderOptDeserializer is used to configure the `TranscoderDeserializer`
	// used by the `Transcoder`.
	// See `TranscoderDeserializer` documentation for more information.
	TranscoderOptDeserializer = func(deserializer TranscoderDeserializer) TranscoderOpt {
		return func(trc *Transcoder) { trc.deserializer = deserializer }
	}
)

Functions

func CreateStructType

func CreateStructType(schemaUID string, sm *SchemaMap) (reflect_raw.Type, error)

CreateStructType constructs a new structure-type definition at runtime from a `ProtobufSchema` tree.

This newly-created structure embeds all the necessary tags & hints for the protobuf SDK to deserialize payloads into it. I.e., it allows for runtime-decoding of protobuf payloads.

This is a complex and costly operation; it is strongly recommended to cache the result, as the `Transcoder` does.

It requires the new `reflect` APIs provided by Go 1.7+.

Example

This simple example demonstrates how to use the protostruct API in order to create a structure-type at runtime for a given protobuf schema, specified by its fully-qualified name.

// sniff all of the local protobuf schemas and store them in a `SchemaMap`
sm, err := ScanSchemas(protoscan.SHA256, "PROT-")
if err != nil {
	zap.L().Fatal(err.Error())
}

// create a structure-type definition for the '.test.TestSchemaXXX'
// protobuf schema
structType, err := CreateStructType(
	sm.GetByFQName(".test.TestSchemaXXX").SchemaUID, sm)
if err != nil {
	zap.L().Fatal(err.Error())
}

// pretty-print the resulting structure-type
structType = Clean(structType) // remove tags to ease reading
b, err := format.Source(       // gofmt
	[]byte(fmt.Sprintf("type TestSchemaXXX %s", structType)))
if err != nil {
	zap.L().Fatal(err.Error())
}
fmt.Println(string(b))
Output:

type TestSchemaXXX struct {
	SchemaUID string
	FQNames   []string
	Weathers  []int32
	TSStd     time.Time
	DurStd    []*time.Duration
	Deps      map[string]*struct {
		Key   string
		Value string
	}
	IDs map[int32]string
	TS  struct {
		Seconds int64
		Nanos   int32
	}
	Ots *struct {
		TS *struct {
			Seconds int64
			Nanos   int32
		}
	}
	Nss []struct {
		Key   string
		Value string
	}
}

Types

type ProtobufPayload

type ProtobufPayload struct {
	// SchemaUID is the unique, deterministic & versioned identifier of the
	// `payload`'s schema.
	SchemaUID string `protobuf:"bytes,1,opt,name=schema_uid,json=schemaUid,proto3" json:"schema_uid,omitempty"`
	// Payload is the actual, marshaled protobuf payload.
	Payload []byte `protobuf:"bytes,2,opt,name=payload,proto3" json:"payload,omitempty"`
}

ProtobufPayload is a protobuf payload annotated with the unique versioning identifier of its schema.

This allows a `ProtobufPayload` to be decoded at runtime using Protein's `Transcoder`.

See `ScanSchemas`'s documentation for more information.

func (*ProtobufPayload) Descriptor

func (*ProtobufPayload) Descriptor() ([]byte, []int)

func (*ProtobufPayload) GetPayload

func (m *ProtobufPayload) GetPayload() []byte

func (*ProtobufPayload) GetSchemaUID

func (m *ProtobufPayload) GetSchemaUID() string

func (*ProtobufPayload) GoString

func (this *ProtobufPayload) GoString() string

func (*ProtobufPayload) ProtoMessage

func (*ProtobufPayload) ProtoMessage()

func (*ProtobufPayload) Reset

func (m *ProtobufPayload) Reset()

func (*ProtobufPayload) String

func (m *ProtobufPayload) String() string

type ProtobufSchema

type ProtobufSchema struct {
	// SchemaUID is the unique, deterministic & versioned identifier of this
	// schema.
	SchemaUID string `protobuf:"bytes,1,opt,name=schema_uid,json=schemaUid,proto3" json:"schema_uid,omitempty"`
	// FQName is the fully-qualified name of this schema,
	// e.g. `.google.protobuf.Timestamp`.
	FQName string `protobuf:"bytes,2,opt,name=fq_name,json=fqName,proto3" json:"fq_name,omitempty"`
	// Descriptor is either a Message or an Enum protobuf descriptor.
	//
	// Types that are valid to be assigned to Descr:
	//	*ProtobufSchema_Message
	//	*ProtobufSchema_Enum
	Descr isProtobufSchema_Descr `protobuf_oneof:"descr"`
	// Deps contains every direct and indirect dependencies that this schema
	// relies on.
	//
	// Key: the dependency's `schemaUID`
	// Value: the dependency's fully-qualified name
	Deps map[string]string `` /* 132-byte string literal not displayed */
}

ProtobufSchema is a versioned protobuf Message or Enum descriptor that can be used to decode `ProtobufPayload`s at runtime.

See `ScanSchemas`'s documentation for more information.

func (*ProtobufSchema) Descriptor

func (*ProtobufSchema) Descriptor() ([]byte, []int)

func (*ProtobufSchema) GetDeps

func (m *ProtobufSchema) GetDeps() map[string]string

func (*ProtobufSchema) GetDescr

func (m *ProtobufSchema) GetDescr() isProtobufSchema_Descr

func (*ProtobufSchema) GetEnum

func (*ProtobufSchema) GetFQName

func (m *ProtobufSchema) GetFQName() string

func (*ProtobufSchema) GetMessage

func (*ProtobufSchema) GetSchemaUID

func (m *ProtobufSchema) GetSchemaUID() string

func (*ProtobufSchema) GoString

func (this *ProtobufSchema) GoString() string

func (*ProtobufSchema) ProtoMessage

func (*ProtobufSchema) ProtoMessage()

func (*ProtobufSchema) Reset

func (m *ProtobufSchema) Reset()

func (*ProtobufSchema) String

func (m *ProtobufSchema) String() string

func (*ProtobufSchema) XXX_OneofFuncs

func (*ProtobufSchema) XXX_OneofFuncs() (func(msg proto.Message, b *proto.Buffer) error, func(msg proto.Message, tag, wire int, b *proto.Buffer) (bool, error), func(msg proto.Message) (n int), []interface{})

XXX_OneofFuncs is for the internal use of the proto package.

type ProtobufSchema_Enum

type ProtobufSchema_Enum struct {
	Enum *google_protobuf.EnumDescriptorProto `protobuf:"bytes,31,opt,name=enm,oneof"`
}

func (*ProtobufSchema_Enum) GoString

func (this *ProtobufSchema_Enum) GoString() string

type ProtobufSchema_Message

type ProtobufSchema_Message struct {
	Message *google_protobuf.DescriptorProto `protobuf:"bytes,30,opt,name=msg,oneof"`
}

func (*ProtobufSchema_Message) GoString

func (this *ProtobufSchema_Message) GoString() string

type SchemaMap

type SchemaMap struct {
	// contains filtered or unexported fields
}

SchemaMap is a thread-safe mapping & reverse-mapping of `ProtobufSchema`s.

It atomically maintains two data-structures in parallel: a map of schemaUIDs to `ProtobufSchema`s, and a map of fully-qualified schema names to schemaUIDs.

The `SchemaMap` is the main data-structure behind a `Transcoder`, used to store and retrieve every `ProtobufSchema`s that have been cached locally.

func LoadSchemas added in v1.2.1

func LoadSchemas(fileDescriptorProtos map[string][]byte,
	hasher protoscan.Hasher, hashPrefix string, failOnDuplicate ...bool,
) (*SchemaMap, error)

LoadSchemas is the exact same thing as `ScanSchemas` except for the fact that it uses the given set of user-specified `fileDescriptorProtos` instead of scanning for matching symbols.

See `ScanSchemas`'s documentation for more information.

func NewSchemaMap

func NewSchemaMap() *SchemaMap

NewSchemaMap returns a new SchemaMap with its maps & locks pre-allocated.

func ScanSchemas

func ScanSchemas(
	hasher protoscan.Hasher, hashPrefix string, failOnDuplicate ...bool,
) (*SchemaMap, error)

ScanSchemas retrieves every protobuf schema instantiated by any of the currently loaded protobuf libraries (e.g. `golang/protobuf`, `gogo/protobuf`...), computes the dependency graphs that link them, builds the `ProtobufSchema` objects, then returns a new `SchemaMap` filled with all those schemas.

A `ProtobufSchema` is a data-structure that holds a protobuf descriptor as well as a map of all its dependencies' schemaUIDs. A schemaUID uniquely & deterministically identifies a protobuf schema based on its descriptor and all of its dependencies' descriptors. It essentially is the versioned identifier of a schema. For more information, see `ProtobufSchema`'s documentation as well as the `protoscan` implementation, especially `descriptor_tree.go`.

The specified `hasher` is the actual function used to compute these schemaUIDs; see the `protoscan.Hasher` documentation for more information. The `hashPrefix` string will be prepended to the resulting hash that was computed via the `hasher` function. E.g. by passing `protoscan.MD5` as a `Hasher` and `PROT-` as a `hashPrefix`, the resulting schemaUIDs will be of the form 'PROT-<MD5hex>'.

As a schema and/or its dependencies evolve, each historic version of them will thus have been stored with its own unique identifier.

`failOnDuplicate` is an optional parameter that defaults to true; have a look at the `ScanSchemas` implementation to understand what it does and when (if ever) you would need to set it to false instead.

Finally, have a look at the `protoscan` sub-packages as a whole for more information about how all of this machinery works; the code is heavily documented.

Example

This example demonstrates how to use Protein's `ScanSchemas` function in order to sniff all the locally instantiated schemas into a `SchemaMap`, then walk over this map in order to print each schema as well as its dependencies.

// sniff local protobuf schemas into a `SchemaMap` using a MD5 hasher,
// and prefixing each resulting UID with 'PROT-'
sm, err := ScanSchemas(protoscan.MD5, "PROT-")
if err != nil {
	zap.L().Fatal(err.Error())
}

// walk over the map to print the schemas and their respective dependencies
var output string
sm.ForEach(func(ps *ProtobufSchema) error {
	// discard schemas not originating from protein's test package
	if !strings.HasPrefix(ps.GetFQName(), ".test") {
		return nil
	}
	output += fmt.Sprintf("[%s] %s\n", ps.GetSchemaUID(), ps.GetFQName())
	for uid, name := range ps.GetDeps() {
		// discard dependencies not originating from protein's test package
		if strings.HasPrefix(name, ".test") {
			output += fmt.Sprintf(
				"[%s] depends on: [%s] %s\n", ps.GetSchemaUID(), uid, name,
			)
		}
	}
	return nil
})

// sort rows so the output stays deterministic
rows := strings.Split(output, "\n")
sort.Strings(rows)
for i, r := range rows { // prettying
	if strings.Contains(r, "depends on") {
		rows[i] = "\t" + strings.Join(strings.Split(r, " ")[1:], " ")
	}
}
output = strings.Join(rows, "\n")
fmt.Println(output)
Output:

[PROT-048ddab197df688302a76296293ba101] .test.OtherTestSchemaXXX
[PROT-1ed0887b99e22551676141f133ee3813] .test.TestSchemaXXX
	depends on: [PROT-048ddab197df688302a76296293ba101] .test.OtherTestSchemaXXX
	depends on: [PROT-393cb6dc1b4fc350cf10ca99f429301d] .test.TestSchemaXXX.WeatherType
	depends on: [PROT-6926276ca6306966d1a802c3b8f75298] .test.TestSchemaXXX.IdsEntry
	depends on: [PROT-c43da9745d68bd3cb97dc0f4905f3279] .test.TestSchemaXXX.NestedEntry
	depends on: [PROT-f6be24770f6e8d5edc8ef12c94a23010] .test.TestSchemaXXX.DepsEntry
[PROT-393cb6dc1b4fc350cf10ca99f429301d] .test.TestSchemaXXX.WeatherType
[PROT-3fecf73710581dfb3f46718988b9316e] .test.TestSchema.GhostType
[PROT-4f6928d2737ba44dac0e3df123f80284] .test.TestSchema.DepsEntry
[PROT-6926276ca6306966d1a802c3b8f75298] .test.TestSchemaXXX.IdsEntry
[PROT-8b244a1a35e88f1e1aad8915dd603021] .test.TestSchema
	depends on: [PROT-3fecf73710581dfb3f46718988b9316e] .test.TestSchema.GhostType
	depends on: [PROT-4f6928d2737ba44dac0e3df123f80284] .test.TestSchema.DepsEntry
[PROT-c43da9745d68bd3cb97dc0f4905f3279] .test.TestSchemaXXX.NestedEntry
[PROT-f6be24770f6e8d5edc8ef12c94a23010] .test.TestSchemaXXX.DepsEntry
	depends on: [PROT-c43da9745d68bd3cb97dc0f4905f3279] .test.TestSchemaXXX.NestedEntry

func (*SchemaMap) Add

func (sm *SchemaMap) Add(schemas map[string]*ProtobufSchema) *SchemaMap

Add walks over the given map of `schemas` and adds them to the `SchemaMap` while making sure to atomically maintain both the internal map and reverse-map.

func (*SchemaMap) ForEach

func (sm *SchemaMap) ForEach(f func(s *ProtobufSchema) error) error

ForEach applies the specified function `f` to every entry in the `SchemaMap`.

ForEach is guaranteed to see a consistent view of the internal mapping.

func (*SchemaMap) GetByFQName

func (sm *SchemaMap) GetByFQName(fqName string) *ProtobufSchema

GetByFQName returns the first `ProtobufSchema` that matches the specified fully-qualified name (e.g. `.google.protobuf.timestamp`).

If more than one version of the same schema is stored in the `SchemaMap`, a fully-qualified name will naturally point to several distinct schemaUIDs. When this happens, `GetByFQName` will always return the first one to have been inserted in the map.

This is thread-safe.

func (*SchemaMap) GetByUID

func (sm *SchemaMap) GetByUID(schemaUID string) *ProtobufSchema

GetByUID returns the `ProtobufSchema` associated with the specified `schemaUID`.

This is thread-safe.

func (*SchemaMap) Size

func (sm *SchemaMap) Size() int

Size returns the number of schemas stored in the `SchemaMap`.

type Transcoder

type Transcoder struct {
	// contains filtered or unexported fields
}

A Transcoder is a protobuf encoder/decoder with schema-versioning as well as runtime-decoding capabilities.

func NewTranscoder

func NewTranscoder(ctx context.Context,
	hasher protoscan.Hasher, hashPrefix string, opts ...TranscoderOpt,
) (*Transcoder, error)

NewTranscoder returns a new `Transcoder`.

See `ScanSchemas`'s documentation for more information regarding the use of `hasher` and `hashPrefix`.

See `TranscoderOpt`'s documentation for the list of available options.

The given context is passed to the user-specified `TranscoderSetter`, if any.

func NewTranscoderFromSchemaMap added in v1.2.1

func NewTranscoderFromSchemaMap(ctx context.Context,
	sm *SchemaMap, opts ...TranscoderOpt,
) (*Transcoder, error)

NewTranscoderFromSchemaMap returns a new `Transcoder` backed by the user-specified `sm` schema-map. This is reserved for advanced usages.

When using the vanilla `NewTranscoder` constructor, the schema-map is internally computed using the `ScanSchemas` function of the protoscan API. This constructor allows the developer to build this map themselves when needed; more often than not, this is achieved by using the `LoadSchemas` function from the protoscan API.

func (*Transcoder) Decode

func (t *Transcoder) Decode(
	ctx context.Context, payload []byte,
) (reflect.Value, error)

Decode decodes the given protein-encoded `payload` into a dynamically generated structure-type.

It is used when you need to work with protein-encoded data in a completely agnostic way (e.g. when you merely know the respective names of the fields you're interested in, such as a generic data-enricher for example).

When decoding a specific version of a schema for the first-time in the lifetime of a `Transcoder`, a structure-type must be created from the dependency tree of this schema. This is a costly operation that involves a lot of reflection, see `CreateStructType` documentation for more information. Fortunately, the resulting structure-type is cached so that it can be freely re-used by later calls to `Decode`; i.e. you pay the price only once.

Also, when trying to decode a specific schema for the first time, `Decode` might not have all of its dependencies directly available in its local `SchemaMap`, in which case it will call the user-defined `TranscoderGetter` in the hope that it can return these missing dependencies. This user-defined function may or may not do some kind of I/O; the given context will be passed to it.

Once again, this price is paid only once.

func (*Transcoder) DecodeAs

func (t *Transcoder) DecodeAs(payload []byte, msg proto.Message) error

DecodeAs decodes the given protein-encoded `payload` into the specified protobuf `Message` using the standard protobuf methods, thus bypassing all of the runtime-decoding and schema versioning machinery.

It is very often used when you need to work with protein-encoded data in a non-agnostic way (i.e. when you know beforehand how you want to decode and interpret the data).

`DecodeAs` basically adds zero overhead compared to a straightforward `proto.Unmarshal` call.

`DecodeAs` never does any kind of I/O.

func (*Transcoder) Encode

func (t *Transcoder) Encode(msg proto.Message, fqName ...string) ([]byte, error)

Encode bundles the given protobuf `Message` and its associated versioning metadata together within a `ProtobufPayload`, marshals it all together in a byte-slice then returns the result.

`Encode` needs the message's fully-qualified name in order to reverse-lookup its schemaUID (i.e. its versioning hash).

In order to find this name, `Encode` checks several places until one of them returns a result; if none does, the encoding fails. In search order, those places are: 1. the `fqName` parameter; if it isn't set, then 2. the `golang/protobuf` package is queried for the FQN; if it isn't available there, then 3. the `gogo/protobuf` package is queried too, as a last resort.

Note that a single fully-qualified name might point to multiple schemaUIDs if multiple versions of that schema are currently available in the `SchemaMap`. When this happens, the first schemaUID from the list will be used, which corresponds to the first version of the schema to have ever been added to the local `SchemaMap` (i.e. the oldest one).

func (*Transcoder) FQName added in v1.3.2

func (t *Transcoder) FQName(ctx context.Context, schemaUID string) string

FQName returns the fully-qualified name of the protobuf schema associated with `schemaUID`.

Only if this schema cannot be found in the local cache will it try to fetch it from the remote registry via a call to `GetAndUpsert`.

An empty string is returned if the schema is found neither locally nor remotely.

func (*Transcoder) GetAndUpsert added in v1.1.0

func (t *Transcoder) GetAndUpsert(
	ctx context.Context, schemaUID string,
) (map[string]*ProtobufSchema, error)

GetAndUpsert retrieves the `ProtobufSchema` associated with the specified `schemaUID`, plus all of its direct & indirect dependencies.

The retrieval process is done in two steps:

- First, the root schema, as identified by `schemaUID`, is fetched from the local `SchemaMap`; if it cannot be found in there, it'll try to retrieve it via the user-defined `TranscoderGetter`, as passed to the constructor of the `Transcoder`. If it cannot be found in there either, then a schema-not-found error is returned.

- Second, this exact same process is applied for every direct & indirect dependency of the root schema. Once again, a schema-not-found error is returned if one or more dependencies couldn't be found (the returned error indicates which ones).

The `ProtobufSchema`s found during this process are both:

- added to the local `SchemaMap` so that they don't need to be searched for ever again during the lifetime of this `Transcoder`, and
- returned to the caller as a flattened map.

func (*Transcoder) LoadState added in v1.5.0

func (t *Transcoder) LoadState(path string) error

LoadState loads the state of the Transcoder from disk. The current state is not overwritten; it is merely appended to.

The on-disk format is the following:

PROT-xxx:::base64(schema1)\nPROT-xxx:::base64(schema2)\n...

This can be useful in situations such as shell implementations or CLI tools, where you don't want to re-fetch all the schemas you depend on at every restart. This is in absolutely no way designed with performance in mind.

func (*Transcoder) SaveState added in v1.5.0

func (t *Transcoder) SaveState(path string) error

SaveState saves the current state of the Transcoder to disk.

The on-disk format is the following:

PROT-xxx:::base64(schema1)\nPROT-xxx:::base64(schema2)\n...

This can be useful in situations such as shell implementations or CLI tools, where you don't want to re-fetch all the schemas you depend on at every restart. This is in absolutely no way designed with performance in mind.

type TranscoderDeserializer

type TranscoderDeserializer func(payload []byte, ps *ProtobufSchema) error

A TranscoderDeserializer is used to deserialize the payloads returned by a `TranscoderGetter` into a `ProtobufSchema`. See `TranscoderGetter` documentation for more information.

The default `TranscoderDeserializer` unwraps the schema from its `ProtobufPayload` wrapper; i.e. it uses Protein's decoding to decode the schema.

type TranscoderGetter

type TranscoderGetter func(ctx context.Context, schemaUID string) ([]byte, error)

A TranscoderGetter is called by the `Transcoder` when it cannot find a specific `schemaUID` in its local `SchemaMap`.

The function returns a byte-slice that will be deserialized into a `ProtobufSchema` by a `TranscoderDeserializer` (see above).

A `TranscoderGetter` is typically used to fetch `ProtobufSchema`s from a remote data-store. To that end, several ready-to-use implementations are provided by this package for different protocols: memcached, redis & CQL (i.e. cassandra). See `transcoder_helpers.go` for more information.

The default `TranscoderGetter` always returns a not-found error.

func NewTranscoderGetterCassandra

func NewTranscoderGetterCassandra(s *gocql.Session,
	table, keyCol, dataCol string,
) TranscoderGetter

NewTranscoderGetterCassandra returns a `TranscoderGetter` suitable for querying a binary blob from a cassandra-compatible store.

The <table> column-family is expected to have (at least) the following columns:

TABLE (
  <keyCol> ascii,
  <dataCol> blob,
  PRIMARY KEY (<keyCol>)
)

The given context is forwarded to `gocql`.

func NewTranscoderGetterMemcached

func NewTranscoderGetterMemcached(c *memcache.Client) TranscoderGetter

NewTranscoderGetterMemcached returns a `TranscoderGetter` suitable for querying a binary blob from a memcached-compatible store.

The specified context will be ignored.

func NewTranscoderGetterRedis

func NewTranscoderGetterRedis(p *redis.Pool) TranscoderGetter

NewTranscoderGetterRedis returns a `TranscoderGetter` suitable for querying a binary blob from a redis-compatible store.

The specified context will be ignored.

type TranscoderOpt

type TranscoderOpt func(trc *Transcoder)

A TranscoderOpt is passed to the `Transcoder` constructor to configure various options.

type TranscoderSerializer

type TranscoderSerializer func(ps *ProtobufSchema) ([]byte, error)

A TranscoderSerializer is used to serialize `ProtobufSchema`s before passing them to a `TranscoderSetter`. See `TranscoderSetter` documentation for more information.

The default `TranscoderSerializer` wraps the schema within a `ProtobufPayload`; i.e. it uses Protein's encoding to encode the schema.

type TranscoderSetter

type TranscoderSetter func(ctx context.Context, schemaUID string, payload []byte) error

A TranscoderSetter is called by the `Transcoder` for every schema that it can find in memory.

The function receives a byte-slice that corresponds to a `ProtobufSchema` which has been previously serialized by a `TranscoderSerializer` (see above).

A `TranscoderSetter` is typically used to push the local `ProtobufSchema`s sniffed from memory into a remote data-store. To that end, several ready-to-use implementations are provided by this package for different protocols: memcached, redis & CQL (i.e. cassandra). See `transcoder_helpers.go` for more information.

The default `TranscoderSetter` is a no-op.

func NewTranscoderSetterCassandra

func NewTranscoderSetterCassandra(s *gocql.Session,
	table, keyCol, dataCol string,
) TranscoderSetter

NewTranscoderSetterCassandra returns a `TranscoderSetter` suitable for setting a binary blob into a cassandra-compatible store.

The <table> column-family is expected to have (at least) the following columns:

TABLE (
  <keyCol> ascii,
  <dataCol> blob,
  PRIMARY KEY (<keyCol>)
)

The given context is forwarded to `gocql`.

func NewTranscoderSetterMemcached

func NewTranscoderSetterMemcached(c *memcache.Client) TranscoderSetter

NewTranscoderSetterMemcached returns a `TranscoderSetter` suitable for setting a binary blob into a memcached-compatible store.

The specified context will be ignored.

func NewTranscoderSetterRedis

func NewTranscoderSetterRedis(p *redis.Pool) TranscoderSetter

NewTranscoderSetterRedis returns a `TranscoderSetter` suitable for setting a binary blob into a redis-compatible store.

The specified context will be ignored.

type TranscoderSetterMulti added in v1.8.0

type TranscoderSetterMulti func(
	ctx context.Context, schemaUIDs []string, payloads [][]byte,
) error

A TranscoderSetterMulti is called by the `Transcoder` at the end of the initial schema scan in order to publish all the newly found schemas at once.

If a `TranscoderSetterMulti` is set up, it takes precedence over any vanilla `TranscoderSetter` that might also be configured. In case of failure, the client will fall back to the simple `TranscoderSetter`, if any.

The function receives a slice of byte-slices that corresponds to all the `ProtobufSchema`s that were found in memory, and which have been previously serialized by a `TranscoderSerializer` (see above).

A `TranscoderSetterMulti` is typically used to push the local `ProtobufSchema`s sniffed from memory into a remote data-store. To that end, a ready-to-use implementation is provided by this package for redis. See `transcoder_helpers.go` for more information.

Unlike its vanilla `TranscoderSetter` counterpart, this -Multi version guarantees a single round-trip to the remote database, independently of the number of schemas that were sniffed from memory; i.e. it guarantees constant latency.

The default `TranscoderSetterMulti` falls back to the simple `TranscoderSetter`.

func NewTranscoderSetterMultiRedis added in v1.8.0

func NewTranscoderSetterMultiRedis(p *redis.Pool) TranscoderSetterMulti

NewTranscoderSetterMultiRedis returns a `TranscoderSetterMulti` suitable for setting a set of binary blobs into a redis-compatible store.

The specified context will be ignored.

Directories

Path Synopsis
failure Package failure lists all the possible errors that can be returned either by `protein` or any of its sub-packages.
protobuf/test Package test is a generated protocol buffer package.
protoscan Package protoscan provides the necessary tools & APIs to find, extract, version and build the dependency trees of all the protobuf schemas that have been instantiated by one or more protobuf libraries (golang/protobuf, gogo/protobuf...).
