jsonparser

v0.0.0-...-9a172a2
Published: Feb 13, 2017 License: MIT Imports: 7 Imported by: 0

README

Alternative JSON parser for Go (so far fastest)

It does not require you to know the structure of the payload (e.g. to create structs) and lets you access fields by providing the path to them. It is up to 10 times faster than the standard encoding/json package (depending on payload size and usage) and allocates no memory. See the benchmarks below.

Rationale

Originally I made this for a project that relies on a lot of 3rd party APIs that can be unpredictable and complex. I love simplicity and prefer to avoid external dependencies. encoding/json requires you to know your data structures exactly, or forces you to use map[string]interface{} instead, which is very slow and hard to manage. I investigated what's on the market and found that most libraries are just wrappers around encoding/json. There are a few options with their own parsers (ffjson, easyjson), but they still require you to create data structures.
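To make the "hard to manage" point concrete, here is a minimal sketch of the map[string]interface{} route using only the standard library. The `followers` helper and the trimmed payload are illustrative and not part of jsonparser.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// followers digs person.github.followers out of a generic map.
// Every level needs its own type assertion, and numbers arrive as float64.
func followers(data []byte) (int64, error) {
	var root map[string]interface{}
	if err := json.Unmarshal(data, &root); err != nil {
		return 0, err
	}
	person, ok := root["person"].(map[string]interface{})
	if !ok {
		return 0, errors.New("person is missing or not an object")
	}
	github, ok := person["github"].(map[string]interface{})
	if !ok {
		return 0, errors.New("github is missing or not an object")
	}
	n, ok := github["followers"].(float64)
	if !ok {
		return 0, errors.New("followers is missing or not a number")
	}
	return int64(n), nil
}

func main() {
	data := []byte(`{"person":{"github":{"handle":"buger","followers":109}}}`)
	n, err := followers(data)
	fmt.Println(n, err)
}
```

The whole document is decoded into maps before a single field can be read, which is the cost the path-based API is meant to avoid.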

The goal of this project is to push JSON parser performance without sacrificing compliance or developer experience.

JsonValue Abstraction

JsonValue is a small abstraction built on top of the extremely fast base functions (Get, ArrayEach, ObjectEach). Instead of dealing with byte arrays, it allows you to parse your JSON in a simpler OO style. We were also able to fix and improve some functionality that could not be changed before because of backward compatibility. These abstractions were made to make parsing unknown, deeply nested JSON structures easy and error free. The raw base functions have some caveats which the abstractions fix or hide. The JsonValue API does require a few small memory allocations for the wrapper structs, but these are still extremely minimal.

The examples below will document both the raw and the OO way of parsing.

Example

For the given JSON our goal is to extract the user's full name, number of github followers and avatar.

import "github.com/buger/jsonparser"

...

data := []byte(`{
  "person": {
    "name": {
      "first": "Leonid",
      "last": "Bugaev",
      "fullName": "Leonid Bugaev"
    },
    "github": {
      "handle": "buger",
      "followers": 109
    },
    "avatars": [
      { "url": "https://avatars1.githubusercontent.com/u/14009?v=3&s=460", "type": "thumbnail" }
    ]
  },
  "company": {
    "name": "Acme"
  }
}`)

Raw base

// You can specify the key path by providing arguments to the Get function
jsonparser.Get(data, "person", "name", "fullName")

// There are `GetInt` and `GetBoolean` helpers if you know the key's data type exactly
jsonparser.GetInt(data, "person", "github", "followers")

// When you get an object, it returns a []byte slice pointing to the data containing it
// For `company` it will be `{"name": "Acme"}`
jsonparser.Get(data, "company")

// If the key doesn't exist, an error is returned
var size int64
if value, err := jsonparser.GetInt(data, "company", "size"); err == nil {
  size = value
}

// You can use `ArrayEach` helper to iterate items [item1, item2 .... itemN]
jsonparser.ArrayEach(data, func(value []byte, dataType jsonparser.ValueType, offset int, err error) {
	fmt.Println(jsonparser.Get(value, "url"))
}, "person", "avatars")

// Or you can access fields by index!
jsonparser.GetString(data, "person", "avatars", "[0]", "url")

// You can use `ObjectEach` helper to iterate objects { "key1":object1, "key2":object2, .... "keyN":objectN }
jsonparser.ObjectEach(data, func(key []byte, value []byte, dataType jsonparser.ValueType, offset int) error {
	fmt.Printf("Key: '%s'\n Value: '%s'\n Type: %s\n", string(key), string(value), dataType)
	return nil
}, "person", "name")

// The most efficient way to extract multiple keys is `EachKey`

paths := [][]string{
  []string{"person", "name", "fullName"},
  []string{"person", "avatars", "[0]", "url"},
  []string{"company", "url"},
}
jsonparser.EachKey(data, func(idx int, value []byte, vt jsonparser.ValueType, err error){
  switch idx {
  case 0: // []string{"person", "name", "fullName"}
    ...
  case 1: // []string{"person", "avatars", "[0]", "url"}
    ...
  case 2: // []string{"company", "url"},
    ...
  }
}, paths...)

// For more information see docs below

With JsonValue

// Always start by calling ParseJson on your JSON data; this wraps your data and provides an easy-to-use API
json := jsonparser.ParseJson(data)

// You can specify key path by providing arguments to Get function (or directly to ParseJson, result is identical)
//this returns a *JsonValue object containing the data you requested
jsonName := json.Get("person", "name", "fullName")

//you can easily check the type of data you got and retrieve it
if jsonName.IsString() {
    fmt.Println(jsonName.GetString())
} else {
    //complain when name is not a string
    fmt.Printf("Help I got an %v!", jsonName.Type)
}

// Instead of using `Get` you can immediately use `GetString`, `GetInt`, `GetFloat`, `GetBool`, etc. with a path if you know what data type to expect
json.GetInt("person", "github", "followers")

// When you give the path to an object, you get a JsonValue (as always) which you can use to further parse the json object
company := json.Get("company") // `company` => `{"name": "Acme"}`
company.GetString("name")
//this is identical to
json.GetString("company", "name")
//also identical to (you get the idea..)
name, err := json.Get("company", "name").GetString()

// If at any point parsing fails because your JSON is malformed or you provided an invalid path, the JsonValue will contain an error.
// Use `JsonValue.Err()` to check this. `JsonValue` implements `Error()`, so it is a valid Go `error`.
test := json.Get("company", "doesnotexist").Get("nono")
if test.Err() != nil {
    fmt.Println("Parsing failed: ", test) //will print the error
}
// If you use a `GetXYZ` function on a JsonValue with a parse error, it will simply return an error.
var size int64
if value, err := json.GetInt("company", "size"); err == nil {
  size = value
}

// You can use the `ArrayEach` helper to iterate over the items of a JSON array [item1, item2 .... itemN]
json.Get("person", "avatars").ArrayEach(func(avatar *jsonparser.JsonValue) {
	fmt.Println(avatar.GetString("url"))
})

// Or you can access fields by index!
json.GetString("person", "avatars", "[0]", "url")

// Or parse the whole array and convert it to a Go slice (of JsonValues)
avatars := json.Get("person", "avatars")
if avatars.IsArray() {
    // ToArray returns ([]*JsonValue, error), so check the error before ranging
    if items, err := avatars.ToArray(); err == nil {
        for _, avatar := range items {
            fmt.Println(avatar.GetString("url"))
        }
    }
}

// You can use the `ObjectEach` helper to iterate over objects { "key1":object1, "key2":object2, .... "keyN":objectN }
json.Get("person", "name").ObjectEach(func(key string, value *jsonparser.JsonValue) {
	fmt.Printf("Key: '%s' Value: '%s' Type: %s\n", key, value.String(), value.Type)
})

// Or use `ToMap` to convert the whole object to a Go map
if fields, err := json.Get("person", "name").ToMap(); err == nil {
    for key, value := range fields {
        fmt.Printf("Key: '%s' Value: '%s' Type: %s\n", key, value.String(), value.Type)
    }
}

// The most efficient way to extract multiple keys is `AllKeys`
paths := [][]string{
  {"person", "name", "fullName"},
  {"person", "avatars", "[0]", "url"},
  {"company", "url"},
}
if values, err := json.AllKeys(paths...); err == nil {
    for _, value := range values {
        fmt.Println(value.String())
    }
}

// For more information see docs below

Need to speedup your app?

I'm available for consulting and can help you push your app performance to the limits. Ping me at: leonsbox@gmail.com.

Reference

The library API is really simple. You just need the Get method to perform any operation. The rest are just helpers around it.

You can also view the API at godoc.org

Get
func Get(data []byte, keys ...string) (value []byte, dataType jsonparser.ValueType, offset int, err error)

Receives data structure, and key path to extract value from.

Returns:

  • value - Pointer to original data structure containing key value, or just empty slice if nothing found or error
  • dataType - Can be: NotExist, String, Number, Object, Array, Boolean or Null
  • offset - Offset from provided data structure where key value ends. Used mostly internally, for example for ArrayEach helper.
  • err - If the key is not found or any other parsing issue, it should return error. If key not found it also sets dataType to NotExist
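As a rough illustration of how a dataType can be derived cheaply, the sketch below guesses the kind of a raw JSON value from its first byte. The `valueKind` type and `kindOf` function are hypothetical stand-ins written for this example; they are not the library's ValueType or its detection logic, and the input is assumed to have leading whitespace already trimmed.

```go
package main

import "fmt"

// valueKind mirrors the value types named above; it is a local stand-in,
// not the library's ValueType.
type valueKind int

const (
	kindNotExist valueKind = iota
	kindString
	kindNumber
	kindObject
	kindArray
	kindBoolean
	kindNull
	kindUnknown
)

// kindOf guesses the type of a raw JSON value from its first byte.
func kindOf(raw []byte) valueKind {
	if len(raw) == 0 {
		return kindNotExist
	}
	switch c := raw[0]; {
	case c == '"':
		return kindString
	case c == '{':
		return kindObject
	case c == '[':
		return kindArray
	case c == 't' || c == 'f':
		return kindBoolean
	case c == 'n':
		return kindNull
	case c == '-' || (c >= '0' && c <= '9'):
		return kindNumber
	}
	return kindUnknown
}

func main() {
	fmt.Println(kindOf([]byte(`{"name": "Acme"}`)) == kindObject)
	fmt.Println(kindOf([]byte(`109`)) == kindNumber)
}
```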

Accepts multiple keys to specify the path to a JSON value (when querying nested structures). If no keys are provided, it will try to extract the closest JSON value (a simple one or an object/array), which is useful for reading streams or arrays; see the ArrayEach implementation.

Note that keys can also be array indexes: jsonparser.GetString(data, "person", "avatars", "[0]", "url"), pretty cool, yeah?
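How the library recognizes an index key internally is not shown in this README. As a sketch of the idea only, a hypothetical helper `indexFromKey` could tell a "[N]" segment apart from an ordinary object key like this:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// indexFromKey reports whether key has the "[N]" form and, if so, returns N.
// It is an illustrative helper, not a jsonparser function.
func indexFromKey(key string) (int, bool) {
	if len(key) < 3 || !strings.HasPrefix(key, "[") || !strings.HasSuffix(key, "]") {
		return 0, false
	}
	n, err := strconv.Atoi(key[1 : len(key)-1])
	if err != nil || n < 0 {
		return 0, false
	}
	return n, true
}

func main() {
	for _, key := range []string{"person", "avatars", "[0]", "url"} {
		idx, isIndex := indexFromKey(key)
		fmt.Println(key, idx, isIndex)
	}
}
```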

GetString
func GetString(data []byte, keys ...string) (val string, err error)

Returns strings, properly handling escaped and unicode characters. Note that this will cause additional memory allocations.

GetUnsafeString

If you need a string in your app and are ready to sacrifice support for escaped symbols in favor of speed, use GetUnsafeString. It returns a string mapped to the existing byte slice memory, without any allocations:

s, _ := jsonparser.GetUnsafeString(data, "person", "name", "title")
switch s {
  case "CEO":
    ...
  case "Engineer":
    ...
  ...
}

Note that unsafe here means that the string shares memory with the underlying byte slice and is only valid for as long as that slice is. In most cases this means you can use the string only in the current context and should not pass it anywhere else, whether through channels or any other way.
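The sketch below shows one common way to map a string onto existing byte slice memory, and why the result is fragile. `unsafeString` is a hypothetical helper written for this example; the library's own implementation may differ.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeString reinterprets b as a string without copying.
// The result aliases b, so it is only valid while b is alive and unmodified.
func unsafeString(b []byte) string {
	return *(*string)(unsafe.Pointer(&b))
}

func main() {
	buf := []byte("CEO")
	s := unsafeString(buf)
	fmt.Println(s == "CEO")

	// Mutating the buffer silently changes the supposedly immutable string.
	buf[0] = 'X'
	fmt.Println(s == "XEO")
}
```

If the buffer is reused for the next payload, every string handed out this way changes underneath its holder, which is why such strings should stay in the current context.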

GetBoolean, GetInt and GetFloat
func GetBoolean(data []byte, keys ...string) (val bool, err error)

func GetFloat(data []byte, keys ...string) (val float64, err error)

func GetInt(data []byte, keys ...string) (val int64, err error)

If you know the key's type, you can use the helpers above. If the key's data type does not match, an error is returned.

ArrayEach
func ArrayEach(data []byte, cb func(value []byte, dataType jsonparser.ValueType, offset int, err error), keys ...string) (offset int, err error)

Needed for iterating over arrays; accepts a callback function with the same arguments that Get returns.

ObjectEach
func ObjectEach(data []byte, callback func(key []byte, value []byte, dataType ValueType, offset int) error, keys ...string) (err error)

Needed for iterating over objects; accepts a callback function. Example:

var handler func([]byte, []byte, jsonparser.ValueType, int) error
handler = func(key []byte, value []byte, dataType jsonparser.ValueType, offset int) error {
	// do stuff here
	return nil
}
jsonparser.ObjectEach(myJson, handler)

EachKey
func EachKey(data []byte, cb func(idx int, value []byte, dataType jsonparser.ValueType, err error), paths ...[]string) int

When you need to read multiple keys and are not afraid of a low-level API, EachKey is your friend. It reads the payload only once and calls the callback function as soon as a path is found. By contrast, every call to Get has to process the payload again. Depending on the payload, EachKey can be several times faster than Get. Paths can use nested keys as well!

paths := [][]string{
	[]string{"uuid"},
	[]string{"tz"},
	[]string{"ua"},
	[]string{"st"},
}
var data SmallPayload

jsonparser.EachKey(smallFixture, func(idx int, value []byte, vt jsonparser.ValueType, err error){
	switch idx {
	case 0:
		data.Uuid, _ = jsonparser.ParseString(value)
	case 1:
		v, _ := jsonparser.ParseInt(value)
		data.Tz = int(v)
	case 2:
		data.Ua, _ = jsonparser.ParseString(value)
	case 3:
		v, _ := jsonparser.ParseInt(value)
		data.St = int(v)
	}
}, paths...)
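To illustrate the single-pass idea on its own, here is a standard-library analogue that walks a JSON object once and dispatches on the index of each wanted key. `eachTopLevelKey` is a hypothetical helper; it only handles top-level keys and, unlike EachKey, it allocates, so treat it as a sketch of the control flow rather than of the library's performance.

```go
package main

import (
	"bytes"
	"encoding/json"
	"errors"
	"fmt"
)

// eachTopLevelKey walks a JSON object once and calls cb for every wanted key,
// passing the key's position in keys and the raw, undecoded value.
func eachTopLevelKey(data []byte, keys []string, cb func(idx int, raw []byte)) error {
	index := make(map[string]int, len(keys))
	for i, k := range keys {
		index[k] = i
	}
	dec := json.NewDecoder(bytes.NewReader(data))
	tok, err := dec.Token()
	if err != nil {
		return err
	}
	if d, ok := tok.(json.Delim); !ok || d != '{' {
		return errors.New("payload is not a JSON object")
	}
	for dec.More() {
		keyTok, err := dec.Token()
		if err != nil {
			return err
		}
		key, _ := keyTok.(string)
		var raw json.RawMessage
		if err := dec.Decode(&raw); err != nil {
			return err
		}
		if i, ok := index[key]; ok {
			cb(i, raw)
		}
	}
	return nil
}

func main() {
	data := []byte(`{"uuid":"abc","tz":3,"ua":"curl","st":200}`)
	err := eachTopLevelKey(data, []string{"uuid", "st"}, func(idx int, raw []byte) {
		fmt.Println(idx, string(raw))
	})
	fmt.Println(err)
}
```

Calling a lookup once per key would restart the scan each time; here the payload is traversed a single time no matter how many keys are requested.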

What makes it so fast?

  • It does not rely on encoding/json, reflection or interface{}, the only real package dependency is bytes.
  • Operates with JSON payload on byte level, providing you pointers to the original data structure: no memory allocation.
  • No automatic type conversions: by default everything is a []byte, but it provides the value type, so you can convert it yourself (a few helpers are included).
  • Does not parse full record, only keys you specified

Benchmarks

There are 3 benchmark types, trying to simulate real-life usage for small, medium and large JSON payloads. For each metric, the lower value is better. Time/op is in nanoseconds. Values better than standard encoding/json marked as bold text. Benchmarks run on standard Linode 1024 box.
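If you want a quick feel for the allocs/op column without running the full benchmark suite, the standard testing package can count allocations directly. This is a minimal sketch with a made-up payload; `decodeAllocs` and `scanAllocs` are hypothetical helpers and are not the benchmarks referenced in this README.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"testing"
)

var (
	payload = []byte(`{"uuid":"de305d54","tz":-6,"ua":"curl/7.43.0","st":200}`)
	needle  = []byte(`"st":`)
)

// decodeAllocs reports the average number of allocations for a full
// encoding/json decode into map[string]interface{}.
func decodeAllocs() float64 {
	return testing.AllocsPerRun(100, func() {
		var m map[string]interface{}
		_ = json.Unmarshal(payload, &m)
	})
}

// scanAllocs reports the average number of allocations for locating a key
// by scanning the raw bytes.
func scanAllocs() float64 {
	return testing.AllocsPerRun(100, func() {
		_ = bytes.Index(payload, needle)
	})
}

func main() {
	fmt.Printf("decode into map: %.0f allocs/op\n", decodeAllocs())
	fmt.Printf("byte scan:       %.0f allocs/op\n", scanAllocs())
}
```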

Compared libraries:

TLDR

If you want to skip the next sections, we have 2 winners: jsonparser and easyjson. jsonparser is up to 10 times faster than the standard encoding/json package (depending on payload size and usage), and almost infinitely (literally) better in memory consumption because it operates on data at the byte level and provides direct slice pointers. easyjson wins on CPU in the medium tests, and frankly I'm impressed with this package: its results are remarkable considering that it is almost a drop-in replacement for encoding/json (it requires some code generation).

It's hard to fully compare jsonparser with easyjson (or ffjson): they are true parsers and fully process the record, unlike jsonparser, which parses only the keys you specify.

If you are searching for a replacement for encoding/json while keeping structs, easyjson is an amazing choice. If you want to process dynamic JSON, have memory constraints, or want more control over your data, you should try jsonparser.

jsonparser performance heavily depends on usage, and it works best when you do not need to process the full record, only some keys. The more calls you need to make, the slower it will be. In contrast, easyjson (or ffjson, encoding/json) parses the record only once, and then you can make as many calls as you want.

With great power comes great responsibility! :)

Small payload

Each test processes 190 bytes of http log as a JSON record. It should read multiple fields. https://github.com/buger/jsonparser/blob/master/benchmark/benchmark_small_payload_test.go

| Library | time/op | bytes/op | allocs/op |
| --- | --- | --- | --- |
| encoding/json struct | 7879 | 880 | 18 |
| encoding/json interface{} | 8946 | 1521 | 38 |
| Jeffail/gabs | 10053 | 1649 | 46 |
| bitly/go-simplejson | 10128 | 2241 | 36 |
| antonholmquist/jason | 27152 | 7237 | 101 |
| github.com/ugorji/go/codec | 8806 | 2176 | 31 |
| mreiferson/go-ujson | 7008 | 1409 | 37 |
| pquerna/ffjson | 3769 | 624 | 15 |
| mailru/easyjson | 2002 | 192 | 9 |
| buger/jsonparser | 1367 | 0 | 0 |
| buger/jsonparser (EachKey API) | 809 | 0 | 0 |

Winners are ffjson, easyjson and jsonparser, where jsonparser is up to 9.8x faster than encoding/json and 4.6x faster than ffjson, and slightly faster than easyjson. If you look at memory allocation, jsonparser has no rivals, as it makes no data copies and operates on raw []byte structures and pointers into them.

Medium payload

Each test processes a 2.4kb JSON record (based on Clearbit API). It should read multiple nested fields and 1 array.

https://github.com/buger/jsonparser/blob/master/benchmark/benchmark_medium_payload_test.go

| Library | time/op | bytes/op | allocs/op |
| --- | --- | --- | --- |
| encoding/json struct | 57749 | 1336 | 29 |
| encoding/json interface{} | 79297 | 10627 | 215 |
| Jeffail/gabs | 83807 | 11202 | 235 |
| bitly/go-simplejson | 88187 | 17187 | 220 |
| antonholmquist/jason | 94099 | 19013 | 247 |
| github.com/ugorji/go/codec | 114719 | 6712 | 152 |
| mreiferson/go-ujson | 56972 | 11547 | 270 |
| pquerna/ffjson | 20298 | 856 | 20 |
| mailru/easyjson | 10512 | 336 | 12 |
| buger/jsonparser | 15955 | 0 | 0 |
| buger/jsonparser (EachKey API) | 8916 | 0 | 0 |

The difference between ffjson and jsonparser in CPU usage is smaller, while the memory consumption difference is growing. On the other hand easyjson shows remarkable performance for medium payload.

gabs, go-simplejson and jason are based on encoding/json and map[string]interface{} and are really only helpers for unstructured JSON; their performance correlates with encoding/json interface{}, so they skip the next round. go-ujson, while it has its own parser, shows the same performance as encoding/json and also skips the next round. The same goes for ugorji/go/codec, which showed unexpectedly bad performance for complex payloads.

Large payload

Each test processes a 24kb JSON record (based on Discourse API). It should read 2 arrays and, for each item in an array, get a few fields. Basically, it means processing a full JSON file.

https://github.com/buger/jsonparser/blob/master/benchmark/benchmark_large_payload_test.go

| Library | time/op | bytes/op | allocs/op |
| --- | --- | --- | --- |
| encoding/json struct | 748336 | 8272 | 307 |
| encoding/json interface{} | 1224271 | 215425 | 3395 |
| pquerna/ffjson | 312271 | 7792 | 298 |
| mailru/easyjson | 154186 | 6992 | 288 |
| buger/jsonparser | 85308 | 0 | 0 |

jsonparser is now the winner, but do not forget that it is a far more lightweight parser than ffjson or easyjson: they have to parse all the data, while jsonparser parses only what you need. ffjson, easyjson and jsonparser all have their own parsing code and do not depend on encoding/json or interface{}, which is one of the reasons they are so fast. easyjson also uses a bit of the unsafe package to reduce memory consumption (in theory this can lead to unexpected GC issues, but I have not tested it enough).

Also, the last benchmark did not include an EachKey test, because in this particular case we need to read a lot of array values, and using ArrayEach is more efficient.

Questions and support

All bug reports and suggestions should go through GitHub Issues. If you have private questions, you can send them directly to me: leonsbox@gmail.com

Contributing

  1. Fork it
  2. Create your feature branch (git checkout -b my-new-feature)
  3. Commit your changes (git commit -am 'Added some feature')
  4. Push to the branch (git push origin my-new-feature)
  5. Create new Pull Request

Development

All my development happens in Docker, and the repo includes some Make tasks to simplify development.

  • make build - builds the Docker image; usually needs to be run only once
  • make test - runs tests
  • make fmt - runs go fmt
  • make bench - runs benchmarks (to run only a single benchmark, modify the BENCHMARK variable in the Makefile)
  • make profile - runs a benchmark and generates 3 files: cpu.out, mem.mprof and the benchmark.test binary, which can be used with go tool pprof
  • make bash - enters the container (I use it for running go tool pprof above)

Documentation

Constants

const (
	NotExist = ValueType(iota)
	String
	Number
	Object
	Array
	Boolean
	Null
	Unknown
)

Variables

var (
	KeyPathNotFoundError       = errors.New("Key path not found")
	UnknownValueTypeError      = errors.New("Unknown value type")
	MalformedJsonError         = errors.New("Malformed JSON error")
	MalformedStringError       = errors.New("Value is string, but can't find closing '\"' symbol")
	MalformedArrayError        = errors.New("Value is array, but can't find closing ']' symbol")
	MalformedObjectError       = errors.New("Value looks like object, but can't find closing '}' symbol")
	MalformedValueError        = errors.New("Value looks like Number/Boolean/None, but can't find its end: ',' or '}' symbol")
	MalformedStringEscapeError = errors.New("Encountered an invalid escape sequence in a string")
)

Errors

Functions

func ArrayEach

func ArrayEach(data []byte, cb func(value []byte, dataType ValueType, offset int, err error), keys ...string) (offset int, err error)

ArrayEach is used when iterating arrays, accepts a callback function with the same return arguments as `Get`.

func EachKey

func EachKey(data []byte, cb func(int, []byte, ValueType, error), paths ...[]string) int

func GetBoolean

func GetBoolean(data []byte, keys ...string) (val bool, err error)

GetBoolean returns the value retrieved by `Get`, cast to a bool if possible. If the key's data type does not match, it will return an error.

func GetFloat

func GetFloat(data []byte, keys ...string) (val float64, err error)

GetFloat returns the value retrieved by `Get`, cast to a float64 if possible. If the key's data type does not match, it will return an error.

func GetInt

func GetInt(data []byte, keys ...string) (val int64, err error)

GetInt returns the value retrieved by `Get`, cast to an int64 if possible. If the key's data type does not match, it will return an error.

func GetString

func GetString(data []byte, keys ...string) (val string, err error)

GetString returns the value retrieved by `Get`, cast to a string if possible, trying to properly handle escape and utf8 symbols. If the key's data type does not match, it will return an error.

func GetUnsafeString

func GetUnsafeString(data []byte, keys ...string) (val string, err error)

GetUnsafeString returns the value retrieved by `Get`. It creates the string without memory allocation by mapping the string to the slice memory. It does not handle escape symbols.

func ObjectEach

func ObjectEach(data []byte, callback func(key []byte, value []byte, dataType ValueType, offset int) error, keys ...string) (err error)

ObjectEach iterates over the key-value pairs of a JSON object, invoking a given callback for each such entry

func ParseBoolean

func ParseBoolean(b []byte) (bool, error)

ParseBoolean parses a Boolean ValueType into a Go bool (not particularly useful, but here for completeness)

func ParseFloat

func ParseFloat(b []byte) (float64, error)

ParseFloat parses a Number ValueType into a Go float64

func ParseInt

func ParseInt(b []byte) (int64, error)

ParseInt parses a Number ValueType into a Go int64
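For intuition about what parsing a Number straight from bytes can look like, here is a minimal sketch that never converts the slice to a string. `parseIntSketch` is a hypothetical function written for this example; it is not the library's ParseInt and it skips overflow checks for brevity.

```go
package main

import (
	"errors"
	"fmt"
)

// parseIntSketch converts ASCII digits in b to an int64 without converting
// b to a string first. Overflow is not checked, to keep the sketch short.
func parseIntSketch(b []byte) (int64, error) {
	if len(b) == 0 {
		return 0, errors.New("empty input")
	}
	neg := b[0] == '-'
	if neg {
		b = b[1:]
		if len(b) == 0 {
			return 0, errors.New("sign without digits")
		}
	}
	var n int64
	for _, c := range b {
		if c < '0' || c > '9' {
			return 0, errors.New("not an integer")
		}
		n = n*10 + int64(c-'0')
	}
	if neg {
		n = -n
	}
	return n, nil
}

func main() {
	n, err := parseIntSketch([]byte("109"))
	fmt.Println(n, err)
}
```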

func ParseString

func ParseString(b []byte) (string, error)

ParseString parses a String ValueType into a Go string (the main parsing work is unescaping the JSON string)

func Unescape

func Unescape(in, out []byte) ([]byte, error)

Unescape unescapes the string contained in 'in' and returns it as a slice. If 'in' contains no escaped characters:

Returns 'in'.

Else, if 'out' is of sufficient capacity (guaranteed if cap(out) >= len(in)):

'out' is used to build the unescaped string and is returned with no extra allocation

Else:

A new slice is allocated and returned.
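The three-way buffer policy above can be sketched as follows. `unescapeSketch` is a simplified, hypothetical stand-in for Unescape: it follows the same rules for choosing the output buffer but handles only single-character escapes, not \uXXXX sequences.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

// unescapeSketch follows the buffer policy described above. It handles only
// the single-character escapes (\" \\ \/ \n \t \r \b \f), not \uXXXX.
func unescapeSketch(in, out []byte) ([]byte, error) {
	if bytes.IndexByte(in, '\\') < 0 {
		return in, nil // nothing to unescape: hand back the input untouched
	}
	if cap(out) < len(in) {
		out = make([]byte, 0, len(in)) // caller's buffer is too small: allocate
	}
	out = out[:0]
	for i := 0; i < len(in); i++ {
		c := in[i]
		if c != '\\' {
			out = append(out, c)
			continue
		}
		i++
		if i == len(in) {
			return nil, errors.New("dangling backslash")
		}
		switch in[i] {
		case '"', '\\', '/':
			out = append(out, in[i])
		case 'n':
			out = append(out, '\n')
		case 't':
			out = append(out, '\t')
		case 'r':
			out = append(out, '\r')
		case 'b':
			out = append(out, '\b')
		case 'f':
			out = append(out, '\f')
		default:
			return nil, errors.New("unsupported escape sequence")
		}
	}
	return out, nil
}

func main() {
	buf := make([]byte, 0, 64) // reusable scratch buffer
	res, err := unescapeSketch([]byte(`say \"hi\"`), buf)
	fmt.Println(string(res), err)
}
```

Because the unescaped output is never longer than the input, a scratch buffer with cap(out) >= len(in) is always large enough, which is why that condition guarantees no extra allocation.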

Types

type JsonValue

type JsonValue struct {
	Type ValueType
	// contains filtered or unexported fields
}

func ParseJson

func ParseJson(data []byte, keys ...string) *JsonValue

func (*JsonValue) AllKeys

func (jv *JsonValue) AllKeys(paths ...[]string) (res []*JsonValue, err error)

func (*JsonValue) ArrayEach

func (jv *JsonValue) ArrayEach(cb func(value *JsonValue)) error

func (*JsonValue) ArrayEachWithError

func (jv *JsonValue) ArrayEachWithError(cb func(value *JsonValue) error) error

func (*JsonValue) ArrayEachWithIndex

func (jv *JsonValue) ArrayEachWithIndex(cb func(idx int, value *JsonValue)) error

func (*JsonValue) EachKey

func (jv *JsonValue) EachKey(cb func(idx int, value *JsonValue), paths ...[]string) error

func (*JsonValue) Err

func (jv *JsonValue) Err() error

func (*JsonValue) Error

func (jv *JsonValue) Error() string

func (*JsonValue) Get

func (jv *JsonValue) Get(keys ...string) *JsonValue

func (*JsonValue) GetBool

func (jv *JsonValue) GetBool(keys ...string) (bool, error)

func (*JsonValue) GetBoolArray

func (jv *JsonValue) GetBoolArray(keys ...string) ([]bool, error)

func (*JsonValue) GetFloat

func (jv *JsonValue) GetFloat(keys ...string) (float64, error)

func (*JsonValue) GetFloatArray

func (jv *JsonValue) GetFloatArray(keys ...string) ([]float64, error)

func (*JsonValue) GetInt

func (jv *JsonValue) GetInt(keys ...string) (int64, error)

func (*JsonValue) GetIntArray

func (jv *JsonValue) GetIntArray(keys ...string) ([]int64, error)

func (*JsonValue) GetString

func (jv *JsonValue) GetString(keys ...string) (string, error)

func (*JsonValue) GetStringArray

func (jv *JsonValue) GetStringArray(keys ...string) (res []string, err error)

func (*JsonValue) GetStringUnsafe

func (jv *JsonValue) GetStringUnsafe(keys ...string) (string, error)

func (*JsonValue) Index

func (jv *JsonValue) Index(indices ...int) *JsonValue

func (*JsonValue) IsArray

func (jv *JsonValue) IsArray() bool

func (*JsonValue) IsBoolean

func (jv *JsonValue) IsBoolean() bool

func (*JsonValue) IsFloat

func (jv *JsonValue) IsFloat() bool

func (*JsonValue) IsInt

func (jv *JsonValue) IsInt() bool

func (*JsonValue) IsNumber

func (jv *JsonValue) IsNumber() bool

func (*JsonValue) IsObject

func (jv *JsonValue) IsObject() bool

func (*JsonValue) IsString

func (jv *JsonValue) IsString() bool

func (*JsonValue) ObjectEach

func (jv *JsonValue) ObjectEach(cb func(key string, value *JsonValue)) error

func (*JsonValue) RawBytes

func (jv *JsonValue) RawBytes() []byte

func (*JsonValue) String

func (jv *JsonValue) String() string

func (*JsonValue) ToArray

func (jv *JsonValue) ToArray() ([]*JsonValue, error)

func (*JsonValue) ToMap

func (jv *JsonValue) ToMap() (res map[string]*JsonValue, err error)

type ValueType

type ValueType int

Data types available in valid JSON data.

func Get

func Get(data []byte, keys ...string) (value []byte, dataType ValueType, offset int, err error)

Get - Receives data structure, and key path to extract value from.

Returns:

  • `value` - Pointer to original data structure containing key value, or just empty slice if nothing found or error
  • `dataType` - Can be: `NotExist`, `String`, `Number`, `Object`, `Array`, `Boolean` or `Null`
  • `offset` - Offset from provided data structure where key value ends. Used mostly internally, for example for `ArrayEach` helper.
  • `err` - If key not found or any other parsing issue it should return error. If key not found it also sets `dataType` to `NotExist`

Accepts multiple keys to specify the path to a JSON value (when querying nested structures). If no keys are provided, it will try to extract the closest JSON value (a simple one or an object/array), which is useful for reading streams or arrays; see the `ArrayEach` implementation.

Get calls the internal get function, but will strip quotes from strings returned. (breaks abstraction, but kept for compatibility)

func (ValueType) String

func (vt ValueType) String() string
