jsonmuncher

package module
v0.0.0-...-475bb07 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Oct 30, 2018 License: MIT Imports: 6 Imported by: 0

README

GoDoc Go Report Card License

JSON Muncher

A highly efficient streaming JSON parser for Go.

But why though?

Do we really need yet another JSON parser? There are a dozen or so other projects that solve this problem, would one of those not suffice?

Different situations call for different approaches. Each of these projects exists because someone found the existing solutions lacking in some way, a gap that needed filling. This might be related to speed, memory footprint, ease of use, some combination of these, or something else entirely. I've found that none of the existing parsers fill my particular gap.

JSON Muncher is designed to be as fast as possible, but its primary focus is memory efficiency. It employs the following design concepts:

  • Interactive. Each step of the parse is explicitly triggered by the caller. In some ways this might be seen as detrimental, as it means a little more code is often needed to parse a file than is required by other parsers. However, it also means you have finer control over how the parse progresses, which can allow for drastic efficiency improvements.
  • Streaming. Rather than load the entire file into memory at once, JSON Muncher reads only what it needs from the input stream. This heavily reduces the memory footprint, and is especially helpful when parsing very large files.
  • No memory allocation (almost). Allocating memory can be costly, in time as well as in space. JSON Muncher avoids allocations as much as possible, without sacrificing usability. However, sometimes allocations are necessary. Allocations are only made in these extremely limited cases:
    • Two or three allocations are made to initialize the input buffer. This only happens once, when the parse begins.
    • An error is allocated if there is a problem parsing the stream. This happens at most once (usually not at all).
    • If a numeric literal in the JSON stream is too long, a temporary buffer is allocated to store it during parsing. This only happens if the literal exceeds 32 characters in length, which is extremely unlikely in practice.

Performance

If you want to see the raw benchmarks or run them yourself, go here.

Speed
Library Small JSON Medium JSON Large JSON Huge JSON
github.com/antonholmquist/jason 24120 ns/op 63887 ns/op 1044236 ns/op 9854035703 ns/op
github.com/bcicen/jstream 40813 ns/op 84099 ns/op 1969405 ns/op 5866067676 ns/op
github.com/bitly/go-simplejson 12923 ns/op 59926 ns/op 1000596 ns/op 5876533891 ns/op
github.com/ugorji/go/codec 9467 ns/op 56543 ns/op 798903 ns/op 7739377297 ns/op
github.com/jeffail/gabs 12494 ns/op 54700 ns/op 870205 ns/op 5661634428 ns/op
github.com/mreiferson/go-ujson 10737 ns/op 41953 ns/op 679211 ns/op 4026916938 ns/op
github.com/json-iterator/go 10582 ns/op 30852 ns/op 410814 ns/op 3225396334 ns/op
github.com/a8m/djson 9475 ns/op 33466 ns/op 531208 ns/op 3048507867 ns/op
encoding/json (interface streaming) 9651 ns/op 55407 ns/op 931944 ns/op 5966943523 ns/op
encoding/json (struct streaming) 9257 ns/op 44838 ns/op 655320 ns/op 5735771777 ns/op
encoding/json (interface) 12018 ns/op 55449 ns/op 842066 ns/op 5701351770 ns/op
encoding/json (struct) 11285 ns/op 42425 ns/op 606856 ns/op 5435946960 ns/op
github.com/francoispqt/gojay 7603 ns/op 15310 ns/op 153690 ns/op 2090216626 ns/op
github.com/pquerna/ffjson 9163 ns/op 21394 ns/op 248859 ns/op 2415598919 ns/op
github.com/mailru/easyjson 7948 ns/op 15691 ns/op 175398 ns/op 2051049192 ns/op
github.com/buger/jsonparser 7322 ns/op 16174 ns/op 111023 ns/op 1135070002 ns/op
github.com/darthfennec/jsonmuncher 5937 ns/op 13783 ns/op 94460 ns/op 761287513 ns/op
Memory
Library Small JSON Medium JSON Large JSON Huge JSON
github.com/antonholmquist/jason 8333 B/op 22443 B/op 421071 B/op 4191166648 B/op
github.com/bcicen/jstream 13289 B/op 14713 B/op 438465 B/op 1129458008 B/op
github.com/bitly/go-simplejson 3337 B/op 20603 B/op 392635 B/op 2563080408 B/op
github.com/ugorji/go/codec 2304 B/op 5789 B/op 57458 B/op 2667890632 B/op
github.com/jeffail/gabs 2649 B/op 14440 B/op 265079 B/op 1517427480 B/op
github.com/mreiferson/go-ujson 2633 B/op 15203 B/op 288540 B/op 1593388936 B/op
github.com/json-iterator/go 2001 B/op 7615 B/op 118218 B/op 1839351112 B/op
github.com/a8m/djson 2345 B/op 13659 B/op 261144 B/op 1489389136 B/op
encoding/json (interface streaming) 2217 B/op 17036 B/op 341692 B/op 2214184568 B/op
encoding/json (struct streaming) 1608 B/op 7692 B/op 136168 B/op 2167391392 B/op
encoding/json (interface) 2521 B/op 13964 B/op 261799 B/op 1489402024 B/op
encoding/json (struct) 1912 B/op 4626 B/op 56264 B/op 1442501104 B/op
github.com/francoispqt/gojay 1520 B/op 6474 B/op 102668 B/op 1911409520 B/op
github.com/pquerna/ffjson 1752 B/op 4346 B/op 55977 B/op 1442499717 B/op
github.com/mailru/easyjson 1304 B/op 3952 B/op 55096 B/op 1510499090 B/op
github.com/buger/jsonparser 1168 B/op 3536 B/op 49616 B/op 360846475 B/op
github.com/darthfennec/jsonmuncher 496 B/op 1264 B/op 4336 B/op 4336 B/op
Allocations
Library Small JSON Medium JSON Large JSON Huge JSON
github.com/antonholmquist/jason 104 allocs/op 248 allocs/op 3284 allocs/op 49634480 allocs/op
github.com/bcicen/jstream 40 allocs/op 172 allocs/op 5484 allocs/op 28533146 allocs/op
github.com/bitly/go-simplejson 39 allocs/op 220 allocs/op 2845 allocs/op 28034667 allocs/op
github.com/ugorji/go/codec 12 allocs/op 36 allocs/op 254 allocs/op 18503643 allocs/op
github.com/jeffail/gabs 47 allocs/op 232 allocs/op 3041 allocs/op 29534794 allocs/op
github.com/mreiferson/go-ujson 46 allocs/op 284 allocs/op 4021 allocs/op 34534906 allocs/op
github.com/json-iterator/go 32 allocs/op 101 allocs/op 1379 allocs/op 17002141 allocs/op
github.com/a8m/djson 34 allocs/op 201 allocs/op 2746 allocs/op 28034807 allocs/op
encoding/json (interface streaming) 38 allocs/op 217 allocs/op 2889 allocs/op 28034497 allocs/op
encoding/json (struct streaming) 22 allocs/op 34 allocs/op 256 allocs/op 10502057 allocs/op
encoding/json (interface) 39 allocs/op 213 allocs/op 2881 allocs/op 28034854 allocs/op
encoding/json (struct) 23 allocs/op 30 allocs/op 248 allocs/op 10502039 allocs/op
github.com/francoispqt/gojay 13 allocs/op 20 allocs/op 178 allocs/op 10002027 allocs/op
github.com/pquerna/ffjson 21 allocs/op 25 allocs/op 243 allocs/op 10502030 allocs/op
github.com/mailru/easyjson 15 allocs/op 19 allocs/op 232 allocs/op 9502023 allocs/op
github.com/buger/jsonparser 7 allocs/op 7 allocs/op 7 allocs/op 1000012 allocs/op
github.com/darthfennec/jsonmuncher 6 allocs/op 6 allocs/op 6 allocs/op 6 allocs/op

API Reference

GoDoc

The API lives in the jsonmuncher package.

Parse()
func Parse(r io.Reader, size int) (JsonValue, error)

The Parse function starts parsing a stream. The stream is passed as the first argument, and can be anything that implements the io.Reader interface. The second argument is the size of the read buffer, in bytes.

This function will allocate a read buffer of the appropriate size, read a chunk of the stream into it, and start parsing. It returns a JsonValue, which can be used to further parse the stream.

The buffer size can be anything, but a power of two is recommended, to avoid potential hardware-related slowness. 4096 (4KiB) or 8192 (8KiB) are generally both good buffer sizes.

JsonValue

This struct represents a value parsed from the stream. It has two exported fields:

type JsonValue struct {
    Type   JsonType
    Status JsonStatus
}
  • Type describes what kind of data the JsonValue contains. Its value might be Null, Bool, Number, String, Array, or Object.
  • Status describes the read status of the JsonValue. Its value is one of the following:
    • Incomplete: There was a problem parsing the value, or some other error was encountered. As long as you aren't ignoring errors, you shouldn't see this.
    • Working: The value is partially parsed, and can be parsed further by reading more from the stream.
    • Complete: The value was parsed in its entirety, and can no longer read from the stream.

The rest of the API consists of methods on the JsonValue struct.

ValueBool()
func (data *JsonValue) ValueBool() (bool, error)

If this JsonValue is a Bool, return the value (true or false). Otherwise, return an error.

ValueNum()
func (data *JsonValue) ValueNum() (float64, error)

If this JsonValue is a Number, return the value as a double-precision float. Otherwise, return an error.

Read()
func (data *JsonValue) Read(b []byte) (int, error)

An implementation of the io.Reader interface. If this JsonValue is a String, write its contents into the argument slice, and return the number of bytes written. Otherwise, return an error.

NextKey()
func (data *JsonValue) NextKey() (JsonValue, error)

If this JsonValue is an Object, read to the next key and return it. Otherwise, return an error.

NextValue()
func (data *JsonValue) NextValue() (JsonValue, error)

If this JsonValue is an Object or Array, read to the next value and return it. Otherwise, return an error.

In the case of an Array, each time NextValue() is called, the next item in the array will be returned. In the case of an Object, alternate between calling NextKey() and NextValue() to get each key/value pair in the object, or else data will be skipped (if you miss calling a NextKey() or NextValue(), that key or value is discarded).

When NextKey() or NextValue() is called but the object or array has been read to the end, a jsonmuncher.EndOfValue error is returned.

Close()
func (data *JsonValue) Close() error

An implementation of the io.Closer interface. When called, this "closes" the JsonValue by simply discarding the remainder of the value from the stream. Use this if you don't want the rest of the value's data, as it's a good deal faster than parsing. This is effective when used with Strings, Numbers, Objects, or Arrays.

NextKey(), NextValue(), and Close() cannot be called on an Object or Array that has a partially-parsed child; you must fully parse or Close() a child value before continuing to read its parent. Otherwise, an error is returned.

Compare()
func (data *JsonValue) Compare(vals ...string) (string, bool, error)

A helper function, designed to check the value of a String without allocating memory. Consumes the string in the process. Given one or more strings as arguments, compare the value against each argument. Return true with the matched string if there was an exact match, and return false otherwise.

FindKey()
func (data *JsonValue) FindKey(keys ...string) (string, JsonValue, bool, error)

A helper function, designed to quickly search an Object without allocating memory. Given one or more strings as arguments, check each key in the object and compare it to each argument. When an exact match is found, read the corresponding value from the object, and return true with the matched string and the value. If the object is exhausted and no match is found, return false.

To compare against multiple strings while avoiding allocations, Compare() and FindKey() both must sort their arguments. Both functions perform an in-place sort, but this step is faster if the arguments are already ordered correctly. For this reason, it's recommended that when these functions are used, arguments are passed in alphabetical order. Also, if a slice of arguments is passed using something like jsonval.Compare(sliceval ...), the slice will be sorted in-place by the function, so it may not be in the same order after the function runs.

Documentation

Overview

Package jsonmuncher is a very performant streaming JSON parser.

Index

Constants

This section is empty.

Variables

View Source
var EndOfValue = errors.New("End of value reached")

EndOfValue denotes that the end of an object or array has already been reached, and no new elements can be read. This is returned by the NextKey() and NextValue() functions.

View Source
var ErrIncomplete = errors.New("Status incomplete denotes failed read")

ErrIncomplete is returned when a call is made to a value with an "Incomplete" status, meaning a read error had occurred during a previous operation.

View Source
var ErrNoParamsSpecified = errors.New("At least one argument should be provided")

ErrNoParamsSpecified is returned from the Compare() and FindKey() functions when no arguments are passed. These variadic functions expect at least one argument.

View Source
var ErrWorkingChild = errors.New("Unable to consume when child element is partially read")

ErrWorkingChild is returned when NextKey(), NextValue(), or Close() is called on an object or array, but one of its elements is only partially read. The child element must be fully read or closed first.

Functions

This section is empty.

Types

type ErrTypeMismatch

type ErrTypeMismatch struct {
	Provided JsonType
	Expected []JsonType
}

ErrTypeMismatch is returned when a JsonValue method specific to a particular JSON type is called on a different JSON type. For example, ValueNum() will return this error if called on any JsonValue that isn't a Number.

func (ErrTypeMismatch) Error

func (e ErrTypeMismatch) Error() string

Error implements error for ErrTypeMismatch.

type ErrUnexpectedChar

type ErrUnexpectedChar struct {
	Offset      uint64
	ProvidedEOF bool
	Provided    byte
	Expected    []byte
	CustomMsg   string
}

ErrUnexpectedChar is returned whenever a syntactic parse error is encountered: an illegal character or an unexpected EOF.

func (ErrUnexpectedChar) Error

func (e ErrUnexpectedChar) Error() string

Error implements error for ErrUnexpectedChar.

type JsonStatus

type JsonStatus byte

JsonStatus represents the current read status of a JsonValue.

const (
	// Incomplete means there was a read or parse error while parsing the value.
	Incomplete JsonStatus = iota
	// Working means the value is currently in the process of being parsed.
	Working
	// Complete means the value has been parsed successfully in its entirety.
	Complete
)

type JsonType

type JsonType byte

JsonType represents the data type of a JsonValue.

const (
	// Null values are always 'null'.
	Null JsonType = iota
	// Bool values are 'true' or 'false'.
	Bool
	// Number values are double-precision floating point numbers.
	Number
	// String values are unicode strings.
	String
	// Array values are ordered collections of arbitrary JSON values.
	Array
	// Object values are maps from strings to arbitrary JSON values.
	Object
)

type JsonValue

type JsonValue struct {

	// Type is the data type of this value.
	Type JsonType
	// Status is the read status of this value.
	Status JsonStatus
	// contains filtered or unexported fields
}

JsonValue represents a JSON value. This is the primary structure used in this library.

func Parse

func Parse(r io.Reader, size int) (JsonValue, error)

Parse takes an io.Reader and begins to parse from it, returning a JsonValue. This function also takes a size (in bytes) to use when creating the read buffer.

func (*JsonValue) Close

func (data *JsonValue) Close() error

Close implements the io.Closer interface for JsonValues. Closing a JsonValue discards the remainder of that value from the stream. This is a fast way to ignore unimportant parts of the input to reach useful information.

func (*JsonValue) Compare

func (data *JsonValue) Compare(vals ...string) (string, bool, error)

Compare is a helper function, designed to read a String value and compare it against one or more arguments. If the String value matches none of the arguments, false is returned. Otherwise, true is returned along with the matched string.

func (*JsonValue) FindKey

func (data *JsonValue) FindKey(keys ...string) (string, JsonValue, bool, error)

FindKey is a helper function, designed to find a specific value in an Object. It reads the object until it finds the first key that matches one of the provided arguments. If no keys match, false is returned. Otherwise, true is returned along with the matched string and the value associated with the matched key. Note that this will discard everything in the stream prior to the matched key/value pair.

func (*JsonValue) NextKey

func (data *JsonValue) NextKey() (JsonValue, error)

NextKey reads the next key from an Object, and returns it as a JsonValue. If the next part of the Object to parse is a value, that value is discarded, and the following key is returned. If the end of the Object is found, an EndOfValue error is returned.

func (*JsonValue) NextValue

func (data *JsonValue) NextValue() (JsonValue, error)

NextValue reads the next value from an Object or Array, and returns it as a JsonValue. If this is used on an Object, and the next part of the Object to parse is a key, that key is discarded and the corresponding value is returned. If the end of the Object or Array is found, an EndOfValue error is returned.

func (*JsonValue) Read

func (data *JsonValue) Read(b []byte) (int, error)

Read implements the io.Reader interface for JsonValues (specifically, String values). Reading from this interface provides the value of the string.

func (*JsonValue) ValueBool

func (data *JsonValue) ValueBool() (bool, error)

ValueBool returns the value of a Bool.

func (*JsonValue) ValueNum

func (data *JsonValue) ValueNum() (float64, error)

ValueNum returns the value of a Number.

Directories

Path Synopsis

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL