nanojson

Version: v0.1.0
Published: Feb 21, 2018 | License: MIT | Imports: 11 | Imported by: 1

README

nanojson

Parse JSON in nanoseconds, not microseconds

A WIP JSON decoder/encoder for Go. At the moment parsing works, though it requires a lot of manual work on the user end. Planned features:

  • Encoding of Values
  • Static code generation for marshaling/unmarshaling of types

Why?

Yes, there are already plenty of JSON encoders/decoders for Go out there. jsonparser has a nice list of them, and related benchmarks. As you can see, all except jsonparser require memory allocations, and even for the small payload they take at least a microsecond to complete.

nanojson is here to bring nanoseconds to json. (Read: make parsing usually take less than a microsecond).

Status

Very WIP.

Documentation

Constants

const (
	KindInvalid uint8 = iota
	KindString
	KindNumber
	KindObject
	KindArray
	KindTrue
	KindFalse
	KindNull
)

nanojson packs all JSON data types into a Value. To identify which data type a Value holds, inspect its Kind field, which will be one of the 8 constants above.
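As a minimal sketch of inspecting Kind, the snippet below mirrors the constants locally so that it compiles without importing nanojson. The helper kindName is hypothetical and not part of the package.

```go
package main

import "fmt"

// Local mirror of the Kind constants listed above (an assumption made so the
// sketch is self-contained; the real ones live in nanojson).
const (
	KindInvalid uint8 = iota
	KindString
	KindNumber
	KindObject
	KindArray
	KindTrue
	KindFalse
	KindNull
)

// kindName is a hypothetical helper that maps a Kind to a readable JSON type
// name.
func kindName(k uint8) string {
	switch k {
	case KindString:
		return "string"
	case KindNumber:
		return "number"
	case KindObject:
		return "object"
	case KindArray:
		return "array"
	case KindTrue, KindFalse:
		return "boolean"
	case KindNull:
		return "null"
	default:
		return "invalid"
	}
}

func main() {
	for k := KindInvalid; k <= KindNull; k++ {
		fmt.Println(k, kindName(k))
	}
}
```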

Variables

var Pools = struct {
	ValueSlice     *sync.Pool
	EncodeStateBuf *sync.Pool
	Value          *sync.Pool
	PropertyMap    *sync.Pool
}{
	ValueSlice: &sync.Pool{
		New: func() interface{} {
			return make([]Value, 0, 1024)
		},
	},
	EncodeStateBuf: &sync.Pool{
		New: func() interface{} {
			return make([]byte, 255)
		},
	},
	Value: &sync.Pool{
		New: func() interface{} {
			return &Value{}
		},
	},
	PropertyMap: &sync.Pool{
		New: func() interface{} {
			return make(map[string]int)
		},
	},
}

Pools are sync.Pools used to efficiently reuse data structures that would otherwise escape to the heap (and would have to be allocated every time, slowing down the whole package). Normally, users of the package don't need to fine-tune them, but if you have particular needs it may come in handy to change some of them.

ValueSlice

ValueSlice is used when creating child elements of a Value - that is, when there is an object or an array. Since the slices from ValueSlice enter the domain of the user, Children slices are not automatically returned to the pool. They will be returned, however, if the user practices good hygiene and calls Recycle on the root value once done with the parsed JSON data. Recycling greatly improves the speed of parsing.

The cap of the slices in the pool is vital. When parsing a JSON object or array, nanojson cycles through the child elements and appends values to the slice, as long as the cap is not reached. Once len == cap, the parser gives the slice back to the pool and switches to using append to grow the slice and add new elements. This, of course, incurs a costly memory allocation.
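The effect of the cap can be observed with plain Go slices. The sketch below is not nanojson's parser: fill is a hypothetical helper that appends n elements to a buffer and reports whether the result still shares the buffer's backing array, which is only the case while the cap is not exceeded.

```go
package main

import "fmt"

// fill appends n ints to buf (starting from length 0) and reports whether the
// result still uses buf's original backing array. Once append has to grow
// past cap(buf), Go allocates a new array and the addresses differ.
func fill(buf []int, n int) (out []int, reused bool) {
	out = buf[:0]
	for i := 0; i < n; i++ {
		out = append(out, i)
	}
	reused = cap(buf) > 0 && n > 0 && &out[0] == &buf[:1][0]
	return out, reused
}

func main() {
	_, within := fill(make([]int, 0, 4), 4)
	_, beyond := fill(make([]int, 0, 4), 5)
	fmt.Println("within cap reuses array:", within)
	fmt.Println("beyond cap reuses array:", beyond)
}
```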

By default, the pool always returns a []Value with a capacity of 1024, on the assumption that most APIs return at most that many elements in arrays and objects. The downside is that even a simple [1,2,3] takes up about 120 KB of memory (on 64-bit systems). This may seem disastrous, especially when considering a potentially dangerous payload such as the one built by the following Python expression:

"[" + ("[" * 40 + "1337" + ",[[1337]]]" * 40 + ",") * 500 + "[[7331]]" + "]"

However, keep in mind that modern machines have virtual memory, which can help absorb such abuses of memory, so you probably don't need to worry about exceeding physical RAM.
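For testing from Go, the same payload can be built with strings.Repeat. This is a direct translation of the Python expression above; the function name nestedPayload is an assumption for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// nestedPayload builds the deeply nested payload from the Python expression:
// 500 repetitions of a chunk that opens 40 arrays and closes them again, each
// time adding further nested arrays.
func nestedPayload() string {
	chunk := strings.Repeat("[", 40) + "1337" + strings.Repeat(",[[1337]]]", 40) + ","
	return "[" + strings.Repeat(chunk, 500) + "[[7331]]" + "]"
}

func main() {
	p := nestedPayload()
	fmt.Println("payload length in bytes:", len(p))
}
```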

If you want to replace ValueSlice and need an estimate of how much memory a []Value takes, it's unsafe.Sizeof(Value{}) * cap.
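A minimal sketch of that estimate, together with a replacement pool that uses a smaller capacity. The Value type here is a local stand-in holding only the exported fields, so its size is smaller than that of the real struct (which also has unexported fields). The helpers sliceBytes and newValueSlicePool are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"unsafe"
)

// Value is a local stand-in for nanojson.Value with only the exported fields
// (assumption: the real struct is larger because of its unexported fields).
type Value struct {
	Kind     uint8
	Value    []byte
	Key      []byte
	Children []Value
}

// sliceBytes estimates the memory taken by the backing array of a []Value
// with the given capacity: unsafe.Sizeof(Value{}) * cap.
func sliceBytes(capacity int) uintptr {
	return unsafe.Sizeof(Value{}) * uintptr(capacity)
}

// newValueSlicePool returns a pool whose New function hands out empty
// []Value slices with the given capacity.
func newValueSlicePool(capacity int) *sync.Pool {
	return &sync.Pool{
		New: func() interface{} {
			return make([]Value, 0, capacity)
		},
	}
}

func main() {
	pool := newValueSlicePool(64)
	s := pool.Get().([]Value)
	fmt.Println("cap:", cap(s), "estimated bytes:", sliceBytes(cap(s)))
}
```

With the real package, such a pool would presumably be installed by assigning it to Pools.ValueSlice before any parsing takes place.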

EncodeStateBuf

When calling EncodeJSON on a Value, the EncodeStateBuf pool is used to retrieve a []byte of length 255. (Note: for the moment it MUST be 255; no other size is allowed.) The buffer is used mostly to batch calls to Write. This change showed roughly a 0.75x improvement in speed in our benchmarks, although it places more strain on the encoder than on the writer.

Functions

func Unmarshal

func Unmarshal(data []byte, v interface{}) error

Unmarshal is a shorthand function that calls UnmarshalOptions.Unmarshal with the options set to their zero value. For more information about the unmarshaling process, refer to the documentation of UnmarshalOptions.Unmarshal.

Types

type LeftoverError

type LeftoverError []byte

LeftoverError is returned by Parse when the given data contains more than the expected JSON value followed by optional trailing whitespace (any of ' ', '\t', '\n', '\r'). It contains the additional data, and the error message reports the number of bytes left over.

func (*LeftoverError) Error

func (e *LeftoverError) Error() string
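Since Error has a pointer receiver, the error value is a *LeftoverError, and the trailing bytes can be recovered with errors.As. The sketch below uses a local stand-in for the type; the body of Error and the helper leftover are assumptions for illustration only.

```go
package main

import (
	"errors"
	"fmt"
)

// LeftoverError is a local stand-in mirroring the documented type.
type LeftoverError []byte

// Error reports the number of leftover bytes (assumed wording, not the
// package's actual message).
func (e *LeftoverError) Error() string {
	return fmt.Sprintf("%d bytes left over", len(*e))
}

// leftover is a hypothetical helper that returns the trailing bytes carried
// by err, or nil if err is not (and does not wrap) a *LeftoverError.
func leftover(err error) []byte {
	var le *LeftoverError
	if errors.As(err, &le) {
		return []byte(*le)
	}
	return nil
}

func main() {
	rest := LeftoverError("garbage")
	err := fmt.Errorf("parse: %w", &rest)
	fmt.Printf("error: %v, leftover: %s\n", err, leftover(err))
}
```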

type ParseError

type ParseError struct {
	// What were we parsing when the error was found?
	Kind uint8

	// Position in the byte slice passed to Parse, and character that triggered
	// the error.
	Pos  int
	Char byte

	// Proper reason why the error happened.
	Reason string
}

ParseError is a general error that occurred during parsing.

func (*ParseError) Error

func (d *ParseError) Error() string

type UnmarshalOptions

type UnmarshalOptions struct {
	// By default (false), Unmarshal will copy its data parameter to a new array
	// - that is because the caller might want to retain the original data,
	// whereas Value.Parse in nanojson actually rewrites the original byte
	// slice. Set to true if you don't care if we touch your data parameter.
	DisableDataCopy bool

	// When assigning a Value to a []byte, normally it is enough to do a simple
	// assignment which points at the reference in the original data array.
	// If CopyData is true, however, the value is copied over.
	// It only makes sense to set this to true if DisableDataCopy is also true.
	// (An example of appropriate use would be if you use a []byte from a pool
	// to pass to Unmarshal and you want to retain the struct in which you
	// Unmarshal for long after you give back the []byte to the pool. This would
	// ensure that the data in the struct doesn't become invalid.)
	// An important note: when the destination is a string the value is always
	// copied over regardless.
	CopyData bool
}

UnmarshalOptions specifies the options for parsing JSON data in nanojson.

func (*UnmarshalOptions) Unmarshal

func (u *UnmarshalOptions) Unmarshal(data []byte, v interface{}) error

Unmarshal will parse the first valid JSON value inside of data, and do its best to unmarshal it into v. If v is not a pointer or if v is nil, then Unmarshal will return an error. Unmarshal is not exactly backwards-compatible with the encoding/json equivalent, so some changes may be necessary, but the process should be rather painless if you are not using interfaces.

If after the first value in data, there are more non-whitespace bytes, then an error of type LeftoverError is returned, and unmarshaling is interrupted.

Unmarshaling is loosely typed: some JSON values are automatically converted to fit the destination Go type. Specifically:

bool: true if Kind == KindTrue, a string containing only "true", or a non-0
      number. Rejected if it's an array, object or null.
numbers: strconv.ParseInt/Uint/Float, even if the JSON is a string. Will
         return an error if the strconv functions return one, or if it
         overflows the Go value.
string, []byte, [X]byte: the JSON string, or the raw unparsed number. Empty
                         otherwise.
slices, arrays: will convert each child into its Go representation, except
                if the element is uint8/byte. (see above)
maps: rejected if key is not a string or JSON is not an object. Will set
      each key to match the converted JSON value.
interface: if *Value implements the interface, then a clone of the original
           *Value will be assigned to it. Otherwise, it is rejected.
struct: matching like encoding/json, except it's case sensitive.
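To make the bool rule concrete, here is a minimal sketch of one way to read it. It is an interpretation of the list above, not the package's code: looseBool is a hypothetical function, and the Kind constants are mirrored locally.

```go
package main

import (
	"fmt"
	"strconv"
)

// Local mirror of the documented Kind constants.
const (
	KindInvalid uint8 = iota
	KindString
	KindNumber
	KindObject
	KindArray
	KindTrue
	KindFalse
	KindNull
)

// looseBool converts a JSON value (its kind plus raw bytes) to a bool:
// true for KindTrue, for a string containing only "true", or for a non-zero
// number; arrays, objects and null are rejected.
func looseBool(kind uint8, raw []byte) (bool, error) {
	switch kind {
	case KindTrue:
		return true, nil
	case KindFalse:
		return false, nil
	case KindString:
		return string(raw) == "true", nil
	case KindNumber:
		f, err := strconv.ParseFloat(string(raw), 64)
		if err != nil {
			return false, err
		}
		return f != 0, nil
	default:
		return false, fmt.Errorf("cannot convert kind %d to bool", kind)
	}
}

func main() {
	b, err := looseBool(KindNumber, []byte("2.5"))
	fmt.Println(b, err)
	_, err = looseBool(KindNull, nil)
	fmt.Println(err)
}
```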

The biggest difference to note here is that the unmarshaling process will not automatically create a Go value for you when you specify interface{}. On the contrary, it will simply set it to a *nanojson.Value, and you will have to take care of handling the value dynamically. Note that in this case, as well as when a field in your struct is a Value or *Value, the Value must be cloned first to ensure data integrity, which incurs a costly alloc+copy (especially if the element has children!).

So what should you do when you need to handle a JSON value dynamically? Implement the Unmarshaler interface in a type you define. This way, the *Value will not be cloned, instead it will be passed by reference - and it will be your burden to ensure data integrity.

Unmarshal is also backwards-compatible with json.Unmarshaler. It is important to note, though, that since nanojson's parsing process rewrites the original byte slice to decode strings, this incurs an EncodeJSON to a temporary buffer before calling UnmarshalJSON.

type Unmarshaler

type Unmarshaler interface {
	UnmarshalValue(v *Value) error
}

Unmarshaler is the interface implemented by types capable of unmarshaling a description of themselves from a nanojson Value. If UnmarshalValue needs to retain the Value, it should always clone it and never hold the reference beyond its lifespan.
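A minimal sketch of an implementation that retains data safely by copying the bytes rather than keeping a reference into the Value. Value, the Kind constants and the interface are mirrored locally so the snippet is self-contained; RawText is a hypothetical type.

```go
package main

import "fmt"

// Local mirrors of the documented constants, Value fields and interface.
const (
	KindInvalid uint8 = iota
	KindString
	KindNumber
	KindObject
	KindArray
	KindTrue
	KindFalse
	KindNull
)

type Value struct {
	Kind     uint8
	Value    []byte
	Key      []byte
	Children []Value
}

type Unmarshaler interface {
	UnmarshalValue(v *Value) error
}

// RawText is a hypothetical type that keeps the text of a JSON string or
// number. It copies v.Value so it stays valid after v (or the underlying
// byte slice) is reused.
type RawText []byte

func (r *RawText) UnmarshalValue(v *Value) error {
	if v.Kind != KindString && v.Kind != KindNumber {
		return fmt.Errorf("RawText: unsupported kind %d", v.Kind)
	}
	*r = append((*r)[:0], v.Value...) // copy, do not alias v.Value
	return nil
}

// Compile-time check that *RawText satisfies the interface.
var _ Unmarshaler = (*RawText)(nil)

func main() {
	v := &Value{Kind: KindString, Value: []byte("hello")}
	var r RawText
	if err := r.UnmarshalValue(v); err != nil {
		fmt.Println(err)
		return
	}
	v.Value[0] = 'J' // later changes to v do not affect r
	fmt.Printf("%s %s\n", r, v.Value)
}
```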

type Value

type Value struct {
	// Kind of the Value. See the constants Kind*.
	Kind uint8

	// Value is filled in the cases of KindNumber and KindString. For
	// KindString, Value is actually the parsed string, with all the slash
	// escaping replaced with their Go representation. Numbers are placed as
	// they are, and parsing of them is left to the user.
	// In encoding, if Kind is KindString, the Value is appropriately escaped.
	// Otherwise, in the case of KindNumber, Value is copied with no operation
	// in-between. (So yes, KindNumber can be used to encode raw JSON data.)
	Value []byte

	// Key is only set if the upper Value is of KindObject - in this case,
	// it will be set to the parsed string of the key.
	Key []byte

	// Children is set in case the Kind is KindObject or KindArray. The children
	// properties are listed in the Children - in the case of KindObject, the
	// children will also have their "Key" field set.
	Children []Value
	// contains filtered or unexported fields
}

Value specifies a single value in the JSON data. Value is the raw representation of JSON data before it is placed into a Go value. For efficient use, get a Value from a pool (e.g. Pools.Value) and put it back once done, taking care to reset it before placing it back.

func (*Value) Clone

func (v *Value) Clone() *Value

Clone creates a copy of a Value that is completely independent of the original Value. It will still depend on the original slice of bytes not being changed.

func (*Value) EncodeJSON

func (v *Value) EncodeJSON(w io.Writer) (int, error)

EncodeJSON encodes v to its JSON representation, and writes the result to w.

func (*Value) Parse

func (v *Value) Parse(b []byte) error

Parse parses a single value from b. To ensure zero allocation, Parse reserves the right to modify b's content (e.g. for parsing strings); for this reason, you must pass a copy as b if you wish to retain the original.

It is also important to note that v will hold references to parts of b, therefore v's content is only valid as long as b is not modified.

Parse will read only the first value inside of b, and expects the rest to be exclusively whitespace. If anything else is found, then an error of type LeftoverError is returned.
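To illustrate why an in-place decoder clobbers its input, and how copying first avoids that, here is a simplified sketch. It is not nanojson's parser: unescapeInPlace is a hypothetical function that only handles a few backslash escapes.

```go
package main

import "fmt"

// unescapeInPlace decodes a few backslash escapes by overwriting b itself and
// returns the shortened slice. No allocation is needed, but b's original
// content is lost.
func unescapeInPlace(b []byte) []byte {
	w := 0 // write index; never runs ahead of the read index r
	for r := 0; r < len(b); r++ {
		c := b[r]
		if c == '\\' && r+1 < len(b) {
			r++
			switch b[r] {
			case 'n':
				c = '\n'
			case 't':
				c = '\t'
			default:
				c = b[r] // covers \" \\ and \/
			}
		}
		b[w] = c
		w++
	}
	return b[:w]
}

func main() {
	original := []byte(`line1\nline2`)
	// Copy first, so that original survives the in-place decoding.
	work := append([]byte(nil), original...)
	decoded := unescapeInPlace(work)
	fmt.Printf("%q\n%q\n", original, decoded)
}
```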

func (*Value) Property

func (v *Value) Property(s string) *Value

Property gets the child element of the object whose Key is s, or nil if it does not exist.

Property will use an internal property map to find the desired element if possible; this is the case of a Value which has been created or modified by Parse. If not available, it will iterate through the items. The only issue which may arise with this is the case where v.Children has been modified and v was created through Parse - in that case, Property may return nil even if one of the children does have the desired key. In that case, the user can create their own logic for finding the key, which should be pretty trivial.
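Such a fallback lookup could be as simple as a linear scan over Children. A minimal sketch, with Value mirrored locally and findChild as a hypothetical helper:

```go
package main

import "fmt"

// Value is a local stand-in with the documented exported fields.
type Value struct {
	Kind     uint8
	Value    []byte
	Key      []byte
	Children []Value
}

// findChild scans v.Children and returns a pointer to the first child whose
// Key equals key, or nil if there is none. It does not rely on any internal
// property map, so it keeps working after Children has been modified.
func findChild(v *Value, key string) *Value {
	for i := range v.Children {
		if string(v.Children[i].Key) == key {
			return &v.Children[i]
		}
	}
	return nil
}

func main() {
	obj := &Value{Children: []Value{
		{Key: []byte("id"), Value: []byte("42")},
		{Key: []byte("name"), Value: []byte("gopher")},
	}}
	if c := findChild(obj, "name"); c != nil {
		fmt.Printf("%s\n", c.Value)
	}
	fmt.Println(findChild(obj, "missing") == nil)
}
```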

func (*Value) Recycle

func (v *Value) Recycle()

Recycle gives all the Children slices back to the pool, recursively, so that they can be reused in future parses. Callers must not retain references to v.Children or any of its values - if they wish to retain values, they should copy them or not call Recycle(). Keeping references to the children's .Value or .Key is allowed, as it is a reference to the slice and not actually reused.

func (*Value) Reset

func (v *Value) Reset()

Reset resets the Value's fields, each to its zero value.
