ogórek

Version: v1.2.0
Published: Mar 10, 2021 License: MIT Imports: 12 Imported by: 76

README

ogórek

ogórek is a Go library for encoding and decoding pickles.

Fuzz Testing

Fuzz tests are implemented for both the decoder and the encoder. To run them, do the following:

go get github.com/dvyukov/go-fuzz/go-fuzz
go get github.com/dvyukov/go-fuzz/go-fuzz-build
go-fuzz-build github.com/kisielk/og-rek
go-fuzz -bin=./ogórek-fuzz.zip -workdir=./fuzz

Documentation

Overview

Package ogórek(*) is a library for decoding/encoding Python's pickle format.

Use Decoder to decode a pickle from an input stream, for example:

d := ogórek.NewDecoder(r)
obj, err := d.Decode() // obj is interface{} representing decoded Python object

Use Encoder to encode an object as a pickle into an output stream, for example:

e := ogórek.NewEncoder(w)
err := e.Encode(obj)

The following table summarizes the mapping of basic types between Python and Go:

Python        Go
------        --

None       ↔  ogórek.None
bool       ↔  bool
int        ↔  int64
int        ←  int, intX, uintX
long       ↔  *big.Int
float      ↔  float64
float      ←  floatX
list       ↔  []interface{}
tuple      ↔  ogórek.Tuple
dict       ↔  map[interface{}]interface{}

str        ↔  string         (+)
bytes      ↔  ogórek.Bytes   (~)
bytearray  ↔  []byte
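
Since Decode returns interface{}, callers typically inspect the result with a type switch. The following is a minimal sketch of that pattern for the rows of the table that map to built-in or standard library Go types. The function name describe is hypothetical, and the ogórek-specific types (None, Tuple, Bytes) are left to the default branch so that the sketch compiles without the library.

```go
package main

import (
	"fmt"
	"math/big"
)

// describe reports which Go type a decoded value has, following the
// mapping table. ogórek-specific types are deliberately not handled
// here and fall through to the default branch.
func describe(obj interface{}) string {
	switch v := obj.(type) {
	case bool:
		return fmt.Sprintf("bool %t", v)
	case int64:
		return fmt.Sprintf("int %d", v)
	case *big.Int:
		return "long " + v.String()
	case float64:
		return fmt.Sprintf("float %g", v)
	case string:
		return fmt.Sprintf("str %q", v)
	case []byte:
		return fmt.Sprintf("bytearray of %d bytes", len(v))
	case []interface{}:
		return fmt.Sprintf("list of %d items", len(v))
	case map[interface{}]interface{}:
		return fmt.Sprintf("dict of %d items", len(v))
	default:
		return fmt.Sprintf("other %T", v)
	}
}

func main() {
	fmt.Println(describe(int64(42)))
	fmt.Println(describe(big.NewInt(7)))
	fmt.Println(describe([]interface{}{"a", 1.5}))
}
```

Note that a plain Go int is not produced by decoding according to the table (Python int maps to int64), so in this sketch it ends up in the default branch.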

Python classes and instances are mapped to Class and Call, for example:

Python                          Go
------                          --

decimal.Decimal            ↔    ogórek.Class{"decimal", "Decimal"}
decimal.Decimal("3.14")    ↔    ogórek.Call{
                                    ogórek.Class{"decimal", "Decimal"},
                                    ogórek.Tuple{"3.14"},
                                }

In particular, because classes and calls are only represented as data and never executed, it is by default safe on the Go side to decode pickles from untrusted sources(^).
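
Because a decoded instance is plain data, the application decides what, if anything, to do with it. The following sketch converts a decoded decimal.Decimal("3.14") into a *big.Rat. The types Class, Tuple and Call are local stand-ins that mirror the documented ogórek types so that the example compiles without the library, and decimalFromCall is a hypothetical helper, not part of ogórek.

```go
package main

import (
	"fmt"
	"math/big"
)

// Local stand-ins mirroring the documented ogórek.Class, ogórek.Tuple
// and ogórek.Call declarations.
type Class struct{ Module, Name string }
type Tuple []interface{}
type Call struct {
	Callable Class
	Args     Tuple
}

// decimalFromCall converts a decoded decimal.Decimal("...") instance
// into a *big.Rat. It returns false for anything that is not such a
// call or whose argument is not a parsable number.
func decimalFromCall(obj interface{}) (*big.Rat, bool) {
	call, ok := obj.(Call)
	if !ok || call.Callable != (Class{"decimal", "Decimal"}) || len(call.Args) != 1 {
		return nil, false
	}
	text, ok := call.Args[0].(string)
	if !ok {
		return nil, false
	}
	// SetString returns (nil, false) when text is not a valid number.
	return new(big.Rat).SetString(text)
}

func main() {
	obj := Call{Class{"decimal", "Decimal"}, Tuple{"3.14"}}
	if r, ok := decimalFromCall(obj); ok {
		fmt.Println(r.FloatString(2))
	}
}
```

Nothing from the pickle is executed here: the helper only matches on the module and class names and parses the string argument itself.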

Pickle protocol versions

The pickle stream format has evolved over time. The original protocol version 0 is human-readable; versions 1 and 2 extend it in a backward-compatible way with binary encodings for efficiency. Protocol version 2 is the highest protocol version understood by the standard pickle module of Python2. Protocol version 3 added ways to represent Python bytes objects from Python3(~). Protocol version 4 further enhances version 3 and switches completely to binary-only encoding. Protocol version 5 added support for out-of-band data(%). Please see https://docs.python.org/3/library/pickle.html#data-stream-format for details.

On decoding, ogórek detects which protocol is being used and automatically handles all necessary details.
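
To illustrate what such detection can look like at the byte level, here is a minimal sketch that uses only the standard library. It relies on the fact that pickles of protocol 2 and later start with the PROTO opcode (0x80) followed by a version byte, while protocols 0 and 1 carry no version marker. This is not ogórek's implementation, which handles the protocol internally; sniffProtocol and sniffBytes are hypothetical names.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
)

// opProto is the PROTO opcode; the byte following it holds the
// protocol version.
const opProto = 0x80

// sniffProtocol peeks at the start of a pickle stream without
// consuming it. It returns the declared version and true when the
// stream starts with PROTO, and false when it does not, which means
// protocol 0 or 1.
func sniffProtocol(r *bufio.Reader) (int, bool, error) {
	head, err := r.Peek(1)
	if err != nil {
		return 0, false, err
	}
	if head[0] != opProto {
		return 0, false, nil
	}
	head, err = r.Peek(2)
	if err != nil {
		return 0, false, err
	}
	return int(head[1]), true, nil
}

// sniffBytes is a convenience wrapper for in-memory pickles.
func sniffBytes(data []byte) (int, bool, error) {
	return sniffProtocol(bufio.NewReader(bytes.NewReader(data)))
}

func main() {
	// Protocol 2 pickle of the integer 1: PROTO 2, BININT1 1, STOP.
	p2 := []byte{0x80, 0x02, 'K', 0x01, '.'}
	// Protocol 0 pickle of the integer 1: INT "1\n", STOP.
	p0 := []byte("I1\n.")

	for _, data := range [][]byte{p2, p0} {
		version, declared, err := sniffBytes(data)
		fmt.Println(version, declared, err)
	}
}
```

Because Peek does not advance the reader, the same bufio.Reader could afterwards be passed on to a decoder that expects the full stream.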

On encoding, ogórek by default produces pickles with protocol 2 for compatibility with Python2. By default, Bytes will thus be unpickled as str on Python2 and as bytes on Python3. If an earlier protocol is desired or, conversely, if Bytes needs to be encoded efficiently (the protocol 2 encoding for bytes is far from optimal) and compatibility with pure Python2 is not an issue, the protocol to use for encoding can be specified explicitly, for example:

e := ogórek.NewEncoderWithConfig(w, &ogórek.EncoderConfig{
	Protocol: 3,
})
err := e.Encode(obj)

See EncoderConfig.Protocol for details.

Persistent references

Pickle was originally created for serialization in the ZODB (http://zodb.org) object database, where on-disk objects can reference each other, similarly to how one in-RAM object can hold a reference to another in-RAM object.

When a pickle with such a persistent reference is decoded, ogórek represents the reference with a Ref placeholder, similarly to Class and Call. However, it is possible to hook into decoding and process such references in an application-specific way, for example by loading the referenced object from the database:

d := ogórek.NewDecoderWithConfig(r, &ogórek.DecoderConfig{
	PersistentLoad: ...
})
obj, err := d.Decode()
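
What such a hook might look like is sketched below. Ref is a local stand-in that mirrors the documented ogórek.Ref so that the example compiles without the library; Account, db and loadRef are hypothetical and only illustrate the idea of resolving a persistent ID against application storage.

```go
package main

import (
	"errors"
	"fmt"
)

// Ref is a local stand-in mirroring the documented ogórek.Ref.
type Ref struct {
	Pid interface{}
}

// Account is a hypothetical application object kept in a database.
type Account struct {
	Owner string
}

// db is a hypothetical in-memory "database" keyed by persistent ID.
var db = map[string]*Account{
	"acc-1": {Owner: "alice"},
}

// loadRef has the shape documented for DecoderConfig.PersistentLoad:
// func(ref Ref) (interface{}, error). It resolves string persistent
// IDs against db and reports an error for anything else.
func loadRef(ref Ref) (interface{}, error) {
	pid, ok := ref.Pid.(string)
	if !ok {
		return nil, fmt.Errorf("unsupported persistent ID type %T", ref.Pid)
	}
	acc, ok := db[pid]
	if !ok {
		return nil, errors.New("no object with persistent ID " + pid)
	}
	return acc, nil
}

func main() {
	obj, err := loadRef(Ref{Pid: "acc-1"})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(obj.(*Account).Owner)
}
```

With the real library the parameter type would be ogórek.Ref and the function would be assigned to the PersistentLoad field of DecoderConfig. The type assertion on Pid matters because, as the Ref documentation notes, the persistent ID can be an arbitrary object for protocols later than 0.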

Similarly, for encoding, an application can hook into the serialization process and turn pointers to some in-RAM objects into persistent references.
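
A possible shape for the encoding-side hook is sketched below, again with a local stand-in for Ref so that the example compiles without the library. Account, its ID field and refFor are hypothetical; the only parts taken from the documentation are the function signature and the rule that a nil result means "encode the object regularly".

```go
package main

import "fmt"

// Ref is a local stand-in mirroring the documented ogórek.Ref.
type Ref struct {
	Pid interface{}
}

// Account is a hypothetical application type. ID is its persistent
// ID and is empty while the object has not been stored yet.
type Account struct {
	ID    string
	Owner string
}

// refFor has the shape documented for EncoderConfig.PersistentRef:
// func(obj interface{}) *Ref. Stored accounts become persistent
// references; everything else is left to regular encoding.
func refFor(obj interface{}) *Ref {
	acc, ok := obj.(*Account)
	if !ok || acc == nil || acc.ID == "" {
		return nil // encode regularly
	}
	return &Ref{Pid: acc.ID}
}

func main() {
	stored := &Account{ID: "acc-1", Owner: "alice"}
	fresh := &Account{Owner: "bob"}

	fmt.Println(refFor(stored) != nil)
	fmt.Println(refFor(fresh) != nil)
}
```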

Please see DecoderConfig.PersistentLoad and EncoderConfig.PersistentRef for details.

--------

(*) ogórek is Polish for "pickle".

(+) for Python2, both str and unicode are decoded into string, with Python str being treated as UTF-8 encoded. Correspondingly, for protocol ≤ 2 a Go string is encoded as a UTF-8 encoded Python str, and for protocol ≥ 3 as unicode.

(~) bytes can be produced only by Python3 or zodbpickle (https://pypi.org/project/zodbpickle), not by standard Python2. Respectively, for protocol ≤ 2, what ogórek produces is unpickled as bytes by Python3 or zodbpickle, and as str by Python2.

(^) contrary to the Python implementation, where a malicious pickle can cause the decoder to run arbitrary code, including e.g. os.system("rm -rf /").

(%) ogórek currently does not support out-of-band data.

Variables

var ErrInvalidPickleVersion = errors.New("invalid pickle version")

Types

type Bytes added in v1.1.0

type Bytes string

Bytes represents Python's bytes.

type Call

type Call struct {
	Callable Class
	Args     Tuple
}

Call represents Python's call.

type Class

type Class struct {
	Module, Name string
}

Class represents a Python class.

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder is a decoder for pickle streams.

func NewDecoder

func NewDecoder(r io.Reader) *Decoder

NewDecoder constructs a new Decoder which will decode the pickle stream in r.

func NewDecoderWithConfig added in v1.1.0

func NewDecoderWithConfig(r io.Reader, config *DecoderConfig) *Decoder

NewDecoderWithConfig is similar to NewDecoder, but allows specifying decoder configuration.

func (*Decoder) Decode

func (d *Decoder) Decode() (interface{}, error)

Decode decodes the pickle stream and returns the result or an error.

type DecoderConfig added in v1.1.0

type DecoderConfig struct {
	// PersistentLoad, if !nil, will be used by decoder to handle persistent references.
	//
	// Whenever the decoder finds an object reference in the pickle stream
	// it will call PersistentLoad. If PersistentLoad returns !nil object
	// without error, the decoder will use that object instead of Ref in
	// the resulting Go object.
	//
	// An example use-case for PersistentLoad is to transform persistent
	// references in a ZODB database of form (type, oid) tuple, into
	// equivalent-to-type Go ghost object, e.g. equivalent to zodb.BTree.
	//
	// See Ref documentation for more details.
	PersistentLoad func(ref Ref) (interface{}, error)
}

DecoderConfig allows tuning the Decoder.

type Encoder

type Encoder struct {
	// contains filtered or unexported fields
}

An Encoder encodes Go data structures into a pickle byte stream.

func NewEncoder

func NewEncoder(w io.Writer) *Encoder

NewEncoder returns a new Encoder with default configuration.

func NewEncoderWithConfig added in v1.1.0

func NewEncoderWithConfig(w io.Writer, config *EncoderConfig) *Encoder

NewEncoderWithConfig is similar to NewEncoder, but allows specifying the encoder configuration.

func (*Encoder) Encode

func (e *Encoder) Encode(v interface{}) error

Encode writes the pickle encoding of v to w, the encoder's writer.

type EncoderConfig added in v1.1.0

type EncoderConfig struct {
	// Protocol specifies which pickle protocol version should be used.
	Protocol int

	// PersistentRef, if !nil, will be used by encoder to encode objects as persistent references.
	//
	// Whenever the encoder sees a pointer to a Go struct object, it will call
	// PersistentRef to find out how to encode that object. If PersistentRef
	// returns nil, the object is encoded regularly. If !nil, the object
	// will be encoded as an object reference.
	//
	// See Ref documentation for more details.
	PersistentRef func(obj interface{}) *Ref
}

EncoderConfig allows tuning the Encoder.

type None

type None struct{}

None is a representation of Python's None.

type OpcodeError

type OpcodeError struct {
	Key byte
	Pos int
}

OpcodeError is the error that Decode returns when it sees an unknown pickle opcode.

func (OpcodeError) Error

func (e OpcodeError) Error() string
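
A caller may want to tell decoding failures apart. The sketch below shows one way to do that with errors.Is and errors.As from the standard library. ErrInvalidPickleVersion and OpcodeError are local stand-ins mirroring the documented declarations, and classify is a hypothetical helper. The sketch assumes that Decode returns OpcodeError by value, which the value receiver suggests but the documentation does not state, and the message produced by Error here is likewise an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins mirroring the documented declarations.
var ErrInvalidPickleVersion = errors.New("invalid pickle version")

type OpcodeError struct {
	Key byte
	Pos int
}

// The message format is an assumption; the documentation only gives
// the method signature.
func (e OpcodeError) Error() string {
	return fmt.Sprintf("unknown opcode %#x at position %d", e.Key, e.Pos)
}

// classify turns a decoding error into a short diagnostic. errors.Is
// and errors.As also look through errors wrapped with %w.
func classify(err error) string {
	var opErr OpcodeError
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, ErrInvalidPickleVersion):
		return "invalid pickle version"
	case errors.As(err, &opErr):
		return fmt.Sprintf("bad opcode %#x at %d", opErr.Key, opErr.Pos)
	default:
		return "other: " + err.Error()
	}
}

func main() {
	wrapped := fmt.Errorf("decode: %w", OpcodeError{Key: 0xff, Pos: 3})
	fmt.Println(classify(wrapped))
	fmt.Println(classify(ErrInvalidPickleVersion))
}
```

If the library turned out to return a pointer instead, the errors.As target would have to be declared as *OpcodeError for the match to succeed.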

type Ref

type Ref struct {
	// persistent ID of referenced object.
	//
	// used to be string for protocol 0, but "upgraded" to be arbitrary
	// object for later protocols.
	Pid interface{}
}

Ref is the default representation for a Python persistent reference.

Such references are used when one pickle somehow references another pickle in e.g. a database.

See https://docs.python.org/3/library/pickle.html#pickle-persistent for details.

See DecoderConfig.PersistentLoad and EncoderConfig.PersistentRef for ways to tune Decoder and Encoder to handle persistent references with user-specified application logic.

type Tuple

type Tuple []interface{}

Tuple is a representation of Python's tuple.

type TypeError

type TypeError struct {
	// contains filtered or unexported fields
}

func (*TypeError) Error

func (te *TypeError) Error() string
