simdjson

package module

Version: v0.3.0 | Published: Jan 25, 2021 | License: Apache-2.0 | Imports: 17 | Imported by: 1

Repository

github.com/dgraph-io/simdjson-go

README ¶

simdjson-go

Introduction

This is a Golang port of simdjson, a high performance JSON parser developed by Daniel Lemire and Geoff Langdale. It makes extensive use of SIMD instructions to achieve parsing performance of gigabytes of JSON per second.

Performance wise, simdjson-go runs on average at about 40% to 60% of the speed of simdjson. Compared to Golang's standard package encoding/json, simdjson-go is about 10x faster.

Features

simdjson-go is a validating parser, meaning that it validates and checks, among other things, numerical values and booleans. These values are therefore available in the appropriate integer, float64 and bool representations after parsing.

Additionally simdjson-go has the following features:

No 4 GB object limit
Support for ndjson (newline delimited json)
Pure Go (no need for cgo)

Requirements

simdjson-go has the following requirements for parsing:

A CPU with both AVX2 and CLMUL is required (Haswell from 2013 onwards should do for Intel, for AMD a Ryzen/EPYC CPU (Q1 2017) should be sufficient). This can be checked using the provided SupportedCPU() function.

The package does not provide a fallback for unsupported CPUs, but serialized data can be deserialized on an unsupported CPU.

When built with gccgo, SupportedCPU() always reports an unsupported CPU, since gccgo cannot compile the assembly.

Usage

Run the following command to install simdjson-go:

go get -u github.com/minio/simdjson-go

In order to parse a JSON byte stream, you either call simdjson.Parse() or simdjson.ParseND() for newline delimited JSON files. Both of these functions return a ParsedJson struct that can be used to navigate the JSON object by calling Iter().

Using the type Iter you can call Advance() to iterate over the tape, like so:

for {
    typ := iter.Advance()

    switch typ {
    case simdjson.TypeRoot:
        if typ, tmp, err = iter.Root(tmp); err != nil {
            return
        }

        if typ == simdjson.TypeObject {
            if obj, err = tmp.Object(obj); err != nil {
                return
            }

            e := obj.FindKey(key, &elem)
            if e != nil && elem.Type == simdjson.TypeString {
                v, _ := elem.Iter.StringBytes()
                fmt.Println(string(v))
            }
        }

    default:
        return
    }
}

When you advance the Iter you get the type of the next value queued.

Each type has helpers for accessing its data:

Type	Action on Iter
TypeNone	Nothing follows. Iter done
TypeNull	Null value
TypeString	`String()`/`StringBytes()`
TypeInt	`Int()`/`Float()`
TypeUint	`Uint()`/`Float()`
TypeFloat	`Float()`
TypeBool	`Bool()`
TypeObject	`Object()`
TypeArray	`Array()`
TypeRoot	`Root()`

You can also get the next value as an interface{} using the Interface() method.

Note that arrays and objects that are null are always returned as TypeNull.

The complex types return helpers that will help parse each of the underlying structures.

It is up to you to keep track of the nesting level you are operating at.

For any Iter it is possible to marshal the recursive content of the Iter using MarshalJSON() or MarshalJSONBuffer(...).

Currently, it is not possible to unmarshal into structs.

Parsing Objects

If you are only interested in one key in an object you can use FindKey to quickly select it.

An object can be traversed manually using NextElement(dst *Iter) (name string, t Type, err error). The key of the element is returned as a string, the type of the value is returned as t, and the provided Iter is set to an iterator that gives access to the content.

There is also NextElementBytes, which provides the same but without the need to allocate a string.

All elements of the object can be retrieved using the fairly lightweight Parse method, which provides a slice of all elements and a map from each key to its index in that slice.

All elements of the object can be returned as map[string]interface{} using the Map method on the object. This will naturally perform allocations for all elements.

Parsing Arrays

Arrays in JSON can have mixed types. To iterate over the array with mixed types use the Iter method to get an iterator.

There are methods that allow you to retrieve all elements as a single type: []int64, []uint64, []float64 and []string.

Number parsing

Numbers in JSON are untyped and are returned by the following rules in order:

If there is any floating-point notation, such as an exponent or a decimal point, it is always returned as float64.
If the number is a pure integer and fits within an int64, it is returned as such.
If the number is a pure positive integer and fits within a uint64, it is returned as such.
If the number is otherwise a valid number, it is returned as float64.

If the number was converted from integer notation to a float due to not fitting inside int64/uint64 the FloatOverflowedInteger flag is set, which can be retrieved using (Iter).FloatFlags() method.
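The rule order above can be sketched with the standard library alone. The `classify` helper below is hypothetical and is not part of simdjson-go; it also relies on `strconv`, which accepts some inputs that JSON does not (for example a leading `+`), so it illustrates only the ordering of the rules, not the validation.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// classify is a hypothetical helper (not part of simdjson-go) that applies
// the four rules in order. It returns the Go type the number would be
// reported as, and whether an integer literal overflowed into a float.
func classify(s string) (kind string, overflowed bool, err error) {
	// Rule 1: a decimal point or an exponent always means float64.
	if strings.ContainsAny(s, ".eE") {
		if _, err := strconv.ParseFloat(s, 64); err != nil {
			return "", false, err
		}
		return "float64", false, nil
	}
	// Rule 2: a pure integer that fits in int64.
	if _, err := strconv.ParseInt(s, 10, 64); err == nil {
		return "int64", false, nil
	}
	// Rule 3: a pure positive integer that fits in uint64.
	if _, err := strconv.ParseUint(s, 10, 64); err == nil {
		return "uint64", false, nil
	}
	// Rule 4: any other valid number becomes float64. Reaching this point
	// means integer notation did not fit int64/uint64.
	if _, err := strconv.ParseFloat(s, 64); err != nil {
		return "", false, err
	}
	return "float64", true, nil
}

func main() {
	for _, s := range []string{"42", "18446744073709551615", "18446744073709551616", "1e3"} {
		kind, overflowed, err := classify(s)
		fmt.Println(s, kind, overflowed, err)
	}
}
```

The `overflowed` result plays the role that the FloatOverflowedInteger flag plays in the package.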

JSON numbers follow JavaScript’s double-precision floating-point format.

Represented in base 10 with no superfluous leading zeros (e.g. 67, 1, 100).
Include digits between 0 and 9.
Can be a negative number (e.g. -10).
Can be a fraction (e.g. 0.5); at least one digit is required before the decimal point.
Can also have an exponent of 10, prefixed by e or E with a plus or minus sign to indicate positive or negative exponentiation.
Octal and hexadecimal formats are not supported.
Can not have a value of NaN (Not A Number) or Infinity.
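As a compact restatement of this list, the sketch below checks a string against the number grammar of the JSON specification (RFC 8259) with a regular expression. The `isJSONNumber` helper is hypothetical and only illustrates the syntax rules; it is not how simdjson-go validates numbers.

```go
package main

import (
	"fmt"
	"regexp"
)

// jsonNumber follows the RFC 8259 grammar: optional minus, an integer part
// without superfluous leading zeros, an optional fraction and an optional
// exponent.
var jsonNumber = regexp.MustCompile(`^-?(0|[1-9][0-9]*)(\.[0-9]+)?([eE][+-]?[0-9]+)?$`)

// isJSONNumber is a hypothetical helper that reports whether s is a
// syntactically valid JSON number.
func isJSONNumber(s string) bool {
	return jsonNumber.MatchString(s)
}

func main() {
	for _, s := range []string{"67", "-10", "0.5", "1E-3", "012", "0x1F", "NaN"} {
		fmt.Println(s, isJSONNumber(s))
	}
}
```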

Parsing NDJSON stream

Newline delimited JSON is sent as packets, with each line being a root element.

Here is an example that counts the number of "Make": "HOND" entries in NDJSON similar to this:

{"Age":20, "Make": "HOND"}
{"Age":22, "Make": "TLSA"}

import (
	"bytes"
	"fmt"
	"io"
	"log"

	"github.com/minio/simdjson-go"
)

func findHondas(r io.Reader) {
	// Temp values, reused across iterations to avoid allocations.
	var tmpO simdjson.Object
	var tmpE simdjson.Element
	var tmpI simdjson.Iter
	var nFound int

	// Communication
	reuse := make(chan *simdjson.ParsedJson, 10)
	res := make(chan simdjson.Stream, 10)

	simdjson.ParseNDStream(r, res, reuse)
	// Read results in blocks...
	for got := range res {
		if got.Error != nil {
			if got.Error == io.EOF {
				break
			}
			log.Fatal(got.Error)
		}

		all := got.Value.Iter()
		// NDJSON is a sequence of root objects.
		for all.Advance() == simdjson.TypeRoot {
			// Read inside root.
			t, i, err := all.Root(&tmpI)
			if err != nil {
				log.Println("got err", err)
				continue
			}
			if t != simdjson.TypeObject {
				log.Println("got type", t.String())
				continue
			}

			// Prepare object.
			obj, err := i.Object(&tmpO)
			if err != nil {
				log.Println("got err", err)
				continue
			}

			// Find Make key. FindKey returns nil if the key is not present.
			elem := obj.FindKey("Make", &tmpE)
			if elem == nil {
				log.Println("Make key not found")
				continue
			}
			if elem.Type != simdjson.TypeString {
				log.Println("got type", elem.Type.String())
				continue
			}

			// Get value as bytes.
			asB, err := elem.Iter.StringBytes()
			if err != nil {
				log.Println("got err", err)
				continue
			}
			if bytes.Equal(asB, []byte("HOND")) {
				nFound++
			}
		}
		// Offer the block for reuse without blocking,
		// since the parser may not consume it.
		select {
		case reuse <- got.Value:
		default:
		}
	}
	fmt.Println("Found", nFound, "Hondas")
}
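For cross-checking results on small inputs, the same count can be computed with the standard library only. This is a minimal sketch under stated assumptions: `countMake` is a hypothetical helper, it decodes one line at a time with encoding/json, and it inherits the default line-length limit of bufio.Scanner, so it is a reference for correctness rather than for speed.

```go
package main

import (
	"bufio"
	"bytes"
	"encoding/json"
	"fmt"
)

// countMake is a hypothetical helper that counts the NDJSON lines whose
// "Make" field equals want, using only the standard library.
func countMake(ndjson []byte, want string) (int, error) {
	sc := bufio.NewScanner(bytes.NewReader(ndjson))
	n := 0
	for sc.Scan() {
		line := bytes.TrimSpace(sc.Bytes())
		if len(line) == 0 {
			continue // tolerate blank lines between records
		}
		// Only the field of interest is decoded; other keys are ignored.
		var rec struct {
			Make string `json:"Make"`
		}
		if err := json.Unmarshal(line, &rec); err != nil {
			return n, err
		}
		if rec.Make == want {
			n++
		}
	}
	return n, sc.Err()
}

func main() {
	data := []byte("{\"Age\":20, \"Make\": \"HOND\"}\n{\"Age\":22, \"Make\": \"TLSA\"}\n")
	n, err := countMake(data, "HOND")
	fmt.Println(n, err)
}
```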

More examples can be found in the examples subdirectory and further documentation can be found at godoc.

Serializing parsed json

It is possible to serialize parsed JSON for more compact storage and faster load time.

To create a new serializer use NewSerializer. This serializer can be reused for several JSON blocks.

The serializer will provide string deduplication and compression of elements. This can be fine-tuned using the CompressMode setting.

To serialize a block of parsed data use the Serialize method.

To read back use the Deserialize method. For deserializing the compression mode does not need to match since it is read from the stream.

Example of speed for serializer/deserializer on parking-citations-1M.

Compress Mode	% of JSON size	Serialize Speed	Deserialize Speed
None	177.26%	425.70 MB/s	2334.33 MB/s
Fast	17.20%	412.75 MB/s	1234.76 MB/s
Default	16.85%	411.59 MB/s	1242.09 MB/s
Best	10.91%	337.17 MB/s	806.23 MB/s

In some cases the speed difference and compression difference will be bigger.

Performance vs simdjson

Based on the same set of JSON test files, the graph below shows a comparison between simdjson and simdjson-go.

simdjson-vs-go-comparison

These numbers were measured on a MacBook Pro equipped with a 3.1 GHz Intel Core i7. Also, to make it a fair comparison, the constant GOLANG_NUMBER_PARSING was set to false (default is true) in order to use the same number parsing function (which is faster at the expense of some precision; see more below).

In addition the constant ALWAYS_COPY_STRINGS was set to false (default is true) for non-streaming use case scenarios where the full JSON message is kept in memory (similar to the simdjson behaviour).

Performance vs `encoding/json` and `json-iterator/go`

Below is a performance comparison to Golang's standard package encoding/json based on the same set of JSON test files.

$ benchcmp                    encoding_json.txt      simdjson-go.txt
benchmark                     old MB/s               new MB/s         speedup
BenchmarkApache_builds-8      106.77                  948.75           8.89x
BenchmarkCanada-8              54.39                  519.85           9.56x
BenchmarkCitm_catalog-8       100.44                 1565.28          15.58x
BenchmarkGithub_events-8      159.49                  848.88           5.32x
BenchmarkGsoc_2018-8          152.93                 2515.59          16.45x
BenchmarkInstruments-8         82.82                  811.61           9.80x
BenchmarkMarine_ik-8           48.12                  422.43           8.78x
BenchmarkMesh-8                49.38                  371.39           7.52x
BenchmarkMesh_pretty-8         73.10                  784.89          10.74x
BenchmarkNumbers-8            160.69                  434.85           2.71x
BenchmarkRandom-8              66.56                  615.12           9.24x
BenchmarkTwitter-8             79.05                 1193.47          15.10x
BenchmarkTwitterescaped-8      83.96                  536.19           6.39x
BenchmarkUpdate_center-8       73.92                  860.52          11.64x

Also simdjson-go uses less additional memory and allocations.

Here is another benchmark comparison to json-iterator/go:

$ benchcmp                    json-iterator.txt      simdjson-go.txt
benchmark                     old MB/s               new MB/s         speedup
BenchmarkApache_builds-8      154.65                  948.75           6.13x
BenchmarkCanada-8              40.34                  519.85          12.89x
BenchmarkCitm_catalog-8       183.69                 1565.28           8.52x
BenchmarkGithub_events-8      170.77                  848.88           4.97x
BenchmarkGsoc_2018-8          225.13                 2515.59          11.17x
BenchmarkInstruments-8        120.39                  811.61           6.74x
BenchmarkMarine_ik-8           61.71                  422.43           6.85x
BenchmarkMesh-8                50.66                  371.39           7.33x
BenchmarkMesh_pretty-8         90.36                  784.89           8.69x
BenchmarkNumbers-8             52.61                  434.85           8.27x
BenchmarkRandom-8              85.87                  615.12           7.16x
BenchmarkTwitter-8            139.57                 1193.47           8.55x
BenchmarkTwitterescaped-8     102.28                  536.19           5.24x
BenchmarkUpdate_center-8      101.41                  860.52           8.49x

AVX512 Acceleration

Stage 1 has been optimized using AVX512 instructions. Under full CPU load (8 threads) the AVX512 code is about 1 GB/sec (15%) faster than the AVX2 code.

benchmark                                   AVX2 MB/s    AVX512 MB/s     speedup
BenchmarkFindStructuralBitsParallelLoop      7225.24      8302.96         1.15x

These benchmarks were generated on a c5.2xlarge EC2 instance with a Xeon Platinum 8124M CPU at 3.0 GHz.

Design

simdjson-go follows the same two stage design as simdjson. During the first stage the structural elements ({, }, [, ], :, and ,) are detected and forwarded as offsets in the message buffer to the second stage. The second stage builds a tape format of the structure of the JSON document.

Note that in contrast to simdjson, simdjson-go outputs uint32 increments (as opposed to absolute values) to the second stage. This allows arbitrarily large JSON files to be parsed (as long as a single (string) element does not surpass 4 GB...).

Also, for better performance, both stages run concurrently as separate goroutines, and a Go channel is used to communicate between the two stages.
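The shape of this pipeline can be illustrated with a scalar sketch that uses no SIMD. Everything here is hypothetical and simplified: stage 1 scans byte by byte and sends uint32 increments over a channel, and stage 2 merely tracks nesting depth instead of building a tape.

```go
package main

import (
	"fmt"
	"strings"
)

// maxDepth is a hypothetical, scalar stand-in for the two-stage design.
// It returns the deepest nesting level of objects and arrays in msg.
func maxDepth(msg []byte) int {
	offsets := make(chan uint32, 64)

	// Stage 1: find structural characters outside strings and send the
	// distance from the previous one (an increment, not an absolute offset).
	go func() {
		defer close(offsets)
		prev := 0
		inString, escaped := false, false
		for i, c := range msg {
			switch {
			case escaped:
				escaped = false // this character was escaped by a backslash
			case inString && c == '\\':
				escaped = true
			case c == '"':
				inString = !inString
			case !inString && strings.IndexByte("{}[]:,", c) >= 0:
				offsets <- uint32(i - prev)
				prev = i
			}
		}
	}()

	// Stage 2: rebuild absolute positions from the increments and walk the
	// structure.
	pos, depth, deepest := 0, 0, 0
	for inc := range offsets {
		pos += int(inc)
		switch msg[pos] {
		case '{', '[':
			depth++
			if depth > deepest {
				deepest = depth
			}
		case '}', ']':
			depth--
		}
	}
	return deepest
}

func main() {
	fmt.Println(maxDepth([]byte(`{"a":[1,{"b":"}]"}]}`)))
}
```

Because only the gap between two structural characters has to fit in a uint32, the running position in stage 2 can grow past 4 GB.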

Stage 1

Stage 1 has been converted from the original C code (containing the SIMD intrinsics) to Golang assembly using c2goasm. It essentially consists of five separate steps, being:

find_odd_backslash_sequences: detect backslash characters used to escape quotes
find_quote_mask_and_bits: generate a mask with bits turned on for characters between quotes
find_whitespace_and_structurals: generate a mask for whitespace plus a mask for the structural characters
finalize_structurals: combine the masks computed above into a final mask where each active bit represents the position of a structural character in the input message.
flatten_bits_incremental: output the active bits in the final mask as incremental offsets.

For more details you can take a look at the various test cases in find_subroutines_amd64_test.go to see how the individual routines can be invoked (typically with a 64 byte input buffer that generates one or more 64-bit masks).

There is one final routine, find_structural_bits_in_slice, that ties it all together and is invoked with a slice of the message buffer in order to find the incremental offsets.
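To make the mask-based steps more concrete, here is a scalar sketch for a single block of at most 64 bytes. It is an illustration under stated assumptions, not the package's assembly: the function names are hypothetical, escaped quotes are ignored, whitespace is not tracked, and the quote mask is computed with a shift-based prefix XOR rather than a carry-less multiply.

```go
package main

import (
	"fmt"
	"math/bits"
)

// prefixXor sets bit i of the result to the XOR of bits 0..i of x. Applied
// to a mask of quote positions, it marks everything from an opening quote
// up to (but not including) the closing quote.
func prefixXor(x uint64) uint64 {
	x ^= x << 1
	x ^= x << 2
	x ^= x << 4
	x ^= x << 8
	x ^= x << 16
	x ^= x << 32
	return x
}

// structuralMask returns a mask with one bit per structural character that
// lies outside a string. block must hold at most 64 bytes.
func structuralMask(block []byte) uint64 {
	var quotes, structurals uint64
	for i, c := range block {
		switch c {
		case '"':
			quotes |= uint64(1) << uint(i)
		case '{', '}', '[', ']', ':', ',':
			structurals |= uint64(1) << uint(i)
		}
	}
	// Drop structural characters that sit between quotes.
	return structurals &^ prefixXor(quotes)
}

// flattenIncremental turns the set bits of mask into offsets, each relative
// to the previous set bit (the first one is relative to position 0).
func flattenIncremental(mask uint64) []uint32 {
	var out []uint32
	prev := 0
	for mask != 0 {
		pos := bits.TrailingZeros64(mask)
		out = append(out, uint32(pos-prev))
		prev = pos
		mask &= mask - 1 // clear the lowest set bit
	}
	return out
}

func main() {
	mask := structuralMask([]byte(`{"a":"b,c"}`))
	fmt.Printf("%064b\n", mask)
	fmt.Println(flattenIncremental(mask))
}
```

In the example input the comma inside "b,c" is masked out, so only the opening brace, the colon and the closing brace remain.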

Stage 2

During Stage 2 the tape structure is constructed. It is essentially a single function that jumps around as it finds the various structural characters and builds the hierarchy of the JSON document that it processes. The values of the JSON elements such as strings, integers, booleans etc. are parsed and written to the tape.

Any errors (such as an array not being closed or a missing closing brace) are detected and reported back as errors to the client.

Tape format

Similarly to simdjson, simdjson-go parses the structure onto a 'tape' format. With this format it is possible to skip over arrays and (sub)objects as the sizes are recorded in the tape.

simdjson-go format is exactly the same as the simdjson tape format with the following 2 exceptions:

In order to support ndjson, it is possible to have more than one root element on the tape. Also, to allow for fast navigation over root elements, a root points to the next root element (and as such the last root element points 1 index past the length of the tape).
Strings are handled differently, unlike simdjson the string size is not prepended in the String buffer but is added as an additional element to the tape itself (much like integers and floats).
- In case ALWAYS_COPY_STRINGS is false: Only strings that contain special characters are copied to the String buffer in which case the payload from the tape is the offset into the String buffer. For string values without special characters the tape's payload points directly into the message buffer.
- In case ALWAYS_COPY_STRINGS is true (default): Strings are always copied to the String buffer.

For more information, see TestStage2BuildTape in stage2_build_tape_test.go.
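The benefit of recording sizes on the tape can be shown with a toy tape. The layout below is an assumption made for illustration (a start entry whose payload is the index just past the matching close, and two entries for numbers and strings); it is not taken from the package source, and `entry`, `next`, `countElements` and `sampleTape` are hypothetical names.

```go
package main

import "fmt"

// entry packs a tag into the top byte and a payload into the low 56 bits.
func entry(tag byte, payload uint64) uint64 {
	return uint64(tag)<<56 | payload&0xffffffffffffff
}

// next returns the tape index of the value that follows the one at index i,
// without visiting anything nested inside it.
func next(tape []uint64, i int) int {
	switch byte(tape[i] >> 56) {
	case '{', '[':
		// Assumed layout: payload is the index just past the matching close.
		return int(tape[i] & 0xffffffffffffff)
	case 'l', 'u', 'd', '"':
		return i + 2 // tag entry followed by one extra entry
	default:
		return i + 1
	}
}

// countElements counts the direct children of the array starting at index i.
func countElements(tape []uint64, i int) int {
	n := 0
	for j := i + 1; byte(tape[j]>>56) != ']'; j = next(tape, j) {
		n++
	}
	return n
}

// sampleTape is a hand-built toy tape for the document [1,[2,3],4].
func sampleTape() []uint64 {
	return []uint64{
		entry('[', 12),   // 0: outer array, ends before index 12
		entry('l', 0), 1, // 1-2
		entry('[', 9),    // 3: inner array, ends before index 9
		entry('l', 0), 2, // 4-5
		entry('l', 0), 3, // 6-7
		entry(']', 3),    // 8
		entry('l', 0), 4, // 9-10
		entry(']', 0),    // 11
	}
}

func main() {
	fmt.Println(countElements(sampleTape(), 0))
}
```

Counting the children of the outer array jumps from index 3 straight to index 9, so the inner array is never traversed.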

Non streaming use cases

The best performance is obtained by keeping the JSON message fully mapped in memory and setting the ALWAYS_COPY_STRINGS constant to false. This prevents duplicate copies of string values being made but mandates that the original JSON buffer is kept alive until the ParsedJson object is no longer needed (i.e. iteration over the tape format has been completed).

In case the JSON message buffer is freed earlier (or for streaming use cases where memory is reused) ALWAYS_COPY_STRINGS should be set to true (which is the default behaviour).
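The trade-off is the ordinary Go distinction between a sub-slice and a copy. The sketch below uses plain byte slices and a hypothetical `aliasVsCopy` function; it does not involve the parser, but it shows why a value that points into the message buffer changes when that buffer is reused.

```go
package main

import "fmt"

// aliasVsCopy is a hypothetical demonstration: it takes a string value from
// a message buffer both as a sub-slice and as a copy, then overwrites the
// buffer as a streaming reader that reuses memory would.
func aliasVsCopy() (aliased, copied string) {
	buf := []byte(`{"k":"hello"}`)

	alias := buf[6:11]                      // points into buf, no allocation
	cp := append([]byte(nil), buf[6:11]...) // independent copy

	// The buffer is reused for the next message.
	copy(buf, `{"k":"world"}`)

	return string(alias), string(cp)
}

func main() {
	aliased, copied := aliasVsCopy()
	fmt.Println(aliased, copied)
}
```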

Fuzz Tests

simdjson-go has been extensively fuzz tested to ensure that input cannot generate crashes and that output matches the standard library.

The fuzzers and corpus are contained in a separate repository at github.com/minio/simdjson-fuzz

The repo contains information on how to run them.

License

simdjson-go is released under the Apache License v2.0. You can find the complete text in the file LICENSE.

Contributing

Contributions are welcome, please send PRs for any enhancements.

If your PR includes parsing changes, please run the fuzz testers for a couple of hours.

Documentation ¶

Index ¶

Constants
Variables
func ParseNDStream(r io.Reader, res chan<- Stream, reuse <-chan *ParsedJson)
func SupportedCPU() bool
type Array
- func (a *Array) AsFloat() ([]float64, error)
- func (a *Array) AsInteger() ([]int64, error)
- func (a *Array) AsString() ([]string, error)
- func (a *Array) AsStringCvt() ([]string, error)
- func (a *Array) AsUint64() ([]uint64, error)
- func (a *Array) FirstType() Type
- func (a *Array) Interface() ([]interface{}, error)
- func (a *Array) Iter() Iter
- func (a *Array) MarshalJSON() ([]byte, error)
- func (a *Array) MarshalJSONBuffer(dst []byte) ([]byte, error)
type CompressMode
type Element
type Elements
- func (e Elements) Lookup(key string) *Element
- func (e Elements) MarshalJSON() ([]byte, error)
- func (e Elements) MarshalJSONBuffer(dst []byte) ([]byte, error)
type FloatFlag
- func (f FloatFlag) Flags(more ...FloatFlag) FloatFlags
type FloatFlags
- func (f FloatFlags) Contains(flag FloatFlag) bool
type Iter
- func (i *Iter) Advance() Type
- func (i *Iter) AdvanceInto() Tag
- func (i *Iter) AdvanceIter(dst *Iter) (Type, error)
- func (i *Iter) Array(dst *Array) (*Array, error)
- func (i *Iter) Bool() (bool, error)
- func (i *Iter) Float() (float64, error)
- func (i *Iter) FloatFlags() (float64, FloatFlags, error)
- func (i *Iter) Int() (int64, error)
- func (i *Iter) Interface() (interface{}, error)
- func (i *Iter) MarshalJSON() ([]byte, error)
- func (i *Iter) MarshalJSONBuffer(dst []byte) ([]byte, error)
- func (i *Iter) Object(dst *Object) (*Object, error)
- func (i *Iter) PeekNext() Type
- func (i *Iter) PeekNextTag() Tag
- func (i *Iter) Root(dst *Iter) (Type, *Iter, error)
- func (i *Iter) String() (string, error)
- func (i *Iter) StringBytes() ([]byte, error)
- func (i *Iter) StringCvt() (string, error)
- func (i *Iter) Type() Type
- func (i *Iter) Uint() (uint64, error)
type Object
- func (o *Object) FindKey(key string, dst *Element) *Element
- func (o *Object) Map(dst map[string]interface{}) (map[string]interface{}, error)
- func (o *Object) NextElement(dst *Iter) (name string, t Type, err error)
- func (o *Object) NextElementBytes(dst *Iter) (name []byte, t Type, err error)
- func (o *Object) Parse(dst *Elements) (*Elements, error)
type ParsedJson
- func Parse(b []byte, reuse *ParsedJson) (*ParsedJson, error)
- func ParseND(b []byte, reuse *ParsedJson) (*ParsedJson, error)
- func (pj *ParsedJson) Iter() Iter
- func (pj *ParsedJson) Reset()
type Serializer
- func NewSerializer() *Serializer
- func (s *Serializer) CompressMode(c CompressMode)
- func (s *Serializer) Deserialize(src []byte, dst *ParsedJson) (*ParsedJson, error)
- func (s *Serializer) Serialize(dst []byte, pj ParsedJson) []byte
type Stream
type Tag
- func (t Tag) String() string
- func (t Tag) Type() Type
type Type
- func (t Type) String() string

Constants ¶

const (
	TagString      = Tag('"')
	TagInteger     = Tag('l')
	TagUint        = Tag('u')
	TagFloat       = Tag('d')
	TagNull        = Tag('n')
	TagBoolTrue    = Tag('t')
	TagBoolFalse   = Tag('f')
	TagObjectStart = Tag('{')
	TagObjectEnd   = Tag('}')
	TagArrayStart  = Tag('[')
	TagArrayEnd    = Tag(']')
	TagRoot        = Tag('r')
	TagEnd         = Tag(0)
)

const JSONTAGMASK = 0xff << 56

const JSONVALUEMASK = 0xffffffffffffff

const STRINGBUFBIT = 0x80000000000000

const STRINGBUFMASK = 0x7fffffffffffff
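Read together, these masks suggest that a tape entry keeps its tag in the top 8 bits and its payload in the low 56 bits, and that a string payload is split into one flag bit and a 55-bit offset. The sketch below packs and unpacks entries on that assumption; the lower-case names and the `pack`/`unpack` helpers are hypothetical, and the meaning attached to the string buffer bit is a guess based on its name.

```go
package main

import "fmt"

// Local copies of the mask values listed above, typed as uint64.
const (
	jsonTagMask   = uint64(0xff) << 56
	jsonValueMask = uint64(0xffffffffffffff)
	stringBufBit  = uint64(0x80000000000000)
	stringBufMask = uint64(0x7fffffffffffff)
)

// pack combines a tag and a payload into one 64-bit tape entry.
func pack(tag byte, payload uint64) uint64 {
	return uint64(tag)<<56 | payload&jsonValueMask
}

// unpack splits a tape entry into its tag and payload.
func unpack(e uint64) (tag byte, payload uint64) {
	return byte((e & jsonTagMask) >> 56), e & jsonValueMask
}

func main() {
	e := pack('"', stringBufBit|1234)
	tag, payload := unpack(e)

	// Assumption: the flag says the offset refers to the string buffer
	// rather than to the message buffer.
	inStringBuf := payload&stringBufBit != 0
	offset := payload & stringBufMask

	fmt.Printf("tag=%c inStringBuf=%v offset=%d\n", tag, inStringBuf, offset)
}
```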

Variables ¶

var TagToType = [256]Type{
	TagString:      TypeString,
	TagInteger:     TypeInt,
	TagUint:        TypeUint,
	TagFloat:       TypeFloat,
	TagNull:        TypeNull,
	TagBoolTrue:    TypeBool,
	TagBoolFalse:   TypeBool,
	TagObjectStart: TypeObject,
	TagArrayStart:  TypeArray,
	TagRoot:        TypeRoot,
}

TagToType converts a tag to a type. For arrays and objects only the start tag will return a type. All non-existing tags return TypeNone.

Functions ¶

func ParseNDStream ¶

func ParseNDStream(r io.Reader, res chan<- Stream, reuse <-chan *ParsedJson)

ParseNDStream will parse a stream and return parsed JSON to the supplied result channel. The method will return immediately. Each element is contained within a root tag.

<root>Element 1</root><root>Element 2</root>...

Each result will contain an unspecified number of full elements, so it can be assumed that each result starts and ends with a root tag. The parser will keep parsing until writes to the result stream block. A stream is finished when a non-nil Error is returned. If the stream was parsed until the end, the Error value will be io.EOF. The channel will be closed after an error has been returned. An optional channel for returning consumed results can be provided. There is no guarantee that elements will be consumed, so always use non-blocking writes to the reuse channel.
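A non-blocking write in Go is a select with a default case. The sketch below shows the pattern with a placeholder type; `block` stands in for *ParsedJson and `offerForReuse` is a hypothetical helper name.

```go
package main

import "fmt"

// block is a placeholder for a parsed result that could be recycled.
type block struct{ id int }

// offerForReuse tries to hand b back for reuse. It never blocks: if the
// channel is full (or nobody is receiving), the block is simply dropped and
// left to the garbage collector.
func offerForReuse(reuse chan<- *block, b *block) bool {
	select {
	case reuse <- b:
		return true
	default:
		return false
	}
}

func main() {
	reuse := make(chan *block, 1)
	fmt.Println(offerForReuse(reuse, &block{id: 1})) // buffer has room
	fmt.Println(offerForReuse(reuse, &block{id: 2})) // buffer is full
}
```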

func SupportedCPU ¶

func SupportedCPU() bool

SupportedCPU will return whether the CPU is supported.

Types ¶

type Array ¶

type Array struct {
	// contains filtered or unexported fields
}

Array represents a JSON array. There are methods for retrieving the full array when all values have the same type. Otherwise an iterator can be retrieved.

func (*Array) AsFloat ¶

func (a *Array) AsFloat() ([]float64, error)

AsFloat returns the array values as float. Integers are automatically converted to float.

func (*Array) AsInteger ¶

func (a *Array) AsInteger() ([]int64, error)

AsInteger returns the array values as int64 values. Uints/Floats are automatically converted to int64 if they fit within the range.

func (*Array) AsString ¶

func (a *Array) AsString() ([]string, error)

AsString returns the array values as a slice of strings. No conversion is done.

func (*Array) AsStringCvt ¶

func (a *Array) AsStringCvt() ([]string, error)

AsStringCvt returns the array values as a slice of strings. Scalar types are converted. Root, Object and Arrays are not supported and will return an error if found.

func (*Array) AsUint64 ¶

func (a *Array) AsUint64() ([]uint64, error)

AsUint64 returns the array values as uint64 values. Ints/Floats are automatically converted to uint64 if they fit within the range.

func (*Array) FirstType ¶

func (a *Array) FirstType() Type

FirstType will return the type of the first element. If there are no elements, TypeNone is returned.

func (*Array) Interface ¶

func (a *Array) Interface() ([]interface{}, error)

Interface returns the array as a slice of interfaces. See Iter.Interface() for a reference on value types.

func (*Array) Iter ¶

func (a *Array) Iter() Iter

Iter returns the array as an iterator. This can be used for parsing mixed content arrays. The first value is ready with a call to Advance. Calling Advance after the last element returns TypeNone.

func (*Array) MarshalJSON ¶

func (a *Array) MarshalJSON() ([]byte, error)

MarshalJSON will marshal the entire remaining scope of the iterator.

func (*Array) MarshalJSONBuffer ¶

func (a *Array) MarshalJSONBuffer(dst []byte) ([]byte, error)

MarshalJSONBuffer will marshal all elements. An optional buffer can be provided for fewer allocations. Output will be appended to the destination.

type CompressMode ¶

type CompressMode uint8

const (
	// CompressNone no compression whatsoever.
	CompressNone CompressMode = iota

	// CompressFast will apply light compression,
	// but will not deduplicate strings which may affect deserialization speed.
	CompressFast

	// CompressDefault applies light compression and deduplicates strings.
	CompressDefault

	// CompressBest
	CompressBest
)

type Element ¶

type Element struct {
	// Name of the element
	Name string
	// Type of the element
	Type Type
	// Iter containing the element
	Iter Iter
}

Element represents an element in an object.

type Elements ¶

type Elements struct {
	Elements []Element
	Index    map[string]int
}

Elements contains all elements in an object, kept in original order. Index contains a lookup from object keys to positions in Elements.

func (Elements) Lookup ¶

func (e Elements) Lookup(key string) *Element

Lookup a key in elements and return the element. Returns nil if key doesn't exist. Keys are case sensitive.
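The ordered-slice-plus-index layout can be sketched with local stand-in types. The lower-case `element` and `elements` types and their methods below are hypothetical and only mirror the shape of the exported struct; they are not the package's implementation.

```go
package main

import "fmt"

// element is a simplified stand-in holding a name and a string value.
type element struct {
	Name  string
	Value string
}

// elements keeps entries in insertion order and indexes them by name.
type elements struct {
	Elements []element
	Index    map[string]int
}

// add appends an entry and records its position in the index.
func (e *elements) add(name, value string) {
	if e.Index == nil {
		e.Index = make(map[string]int)
	}
	e.Index[name] = len(e.Elements)
	e.Elements = append(e.Elements, element{Name: name, Value: value})
}

// lookup returns the entry for key, or nil if the key does not exist.
// Keys are compared exactly, so the lookup is case sensitive.
func (e elements) lookup(key string) *element {
	idx, ok := e.Index[key]
	if !ok {
		return nil
	}
	return &e.Elements[idx]
}

func main() {
	var obj elements
	obj.add("Age", "20")
	obj.add("Make", "HOND")

	if el := obj.lookup("Make"); el != nil {
		fmt.Println(el.Name, el.Value)
	}
	fmt.Println(obj.lookup("make") == nil)
}
```

Iterating over the slice preserves document order, while the map gives constant-time access by key.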

func (Elements) MarshalJSON ¶

func (e Elements) MarshalJSON() ([]byte, error)

MarshalJSON will marshal the entire remaining scope of the iterator.

func (Elements) MarshalJSONBuffer ¶

func (e Elements) MarshalJSONBuffer(dst []byte) ([]byte, error)

MarshalJSONBuffer will marshal all elements. An optional buffer can be provided for fewer allocations. Output will be appended to the destination.

type FloatFlag ¶

type FloatFlag uint64

FloatFlag is a flag recorded when parsing floats.

const (
	// FloatOverflowedInteger is set when number in JSON was in integer notation,
	// but under/overflowed both int64 and uint64 and therefore was parsed as float.
	FloatOverflowedInteger FloatFlag = 1 << iota
)

func (FloatFlag) Flags ¶

func (f FloatFlag) Flags(more ...FloatFlag) FloatFlags

Flags converts the flag to FloatFlags and optionally merges more flags.

type FloatFlags ¶

type FloatFlags uint64

FloatFlags are flags recorded when converting floats.

func (FloatFlags) Contains ¶

func (f FloatFlags) Contains(flag FloatFlag) bool

Contains returns whether f contains the specified flag.

type Iter ¶

type Iter struct {
	// contains filtered or unexported fields
}

Iter represents a section of JSON. To start iterating it, use Advance() or AdvanceIter() methods which will queue the first element. If an Iter is copied, the copy will be independent.

func (*Iter) Advance ¶

func (i *Iter) Advance() Type

Advance will read the type of the next element and queues up the value on the same level.

func (*Iter) AdvanceInto ¶

func (i *Iter) AdvanceInto() Tag

AdvanceInto will read the tag of the next element and move into and out of arrays, objects and root elements. This should only be used for strictly manual parsing.

func (*Iter) AdvanceIter ¶

func (i *Iter) AdvanceIter(dst *Iter) (Type, error)

AdvanceIter will read the type of the next element and return an iterator only containing the object. If dst and i are the same, both will contain the value inside.

func (*Iter) Array ¶

func (i *Iter) Array(dst *Array) (*Array, error)

Array will return the next element as an array. An optional destination can be given.

func (*Iter) Bool ¶

func (i *Iter) Bool() (bool, error)

Bool() returns the bool value.

func (*Iter) Float ¶

func (i *Iter) Float() (float64, error)

Float returns the float value of the next element. Integers are automatically converted to float.

func (*Iter) FloatFlags ¶

func (i *Iter) FloatFlags() (float64, FloatFlags, error)

FloatFlags returns the float value of the next element. This will include flags from parsing. Integers are automatically converted to float.

func (*Iter) Int ¶

func (i *Iter) Int() (int64, error)

Int returns the integer value of the next element. Integers and floats within range are automatically converted.

func (*Iter) Interface ¶

func (i *Iter) Interface() (interface{}, error)

Interface returns the value as an interface. Objects are returned as map[string]interface{}. Arrays are returned as []interface{}. Float values are returned as float64. Integer values are returned as int64 or uint64. String values are returned as string. Boolean values are returned as bool. Null values are returned as nil. Root objects are returned as []interface{}.
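Code that consumes such a value typically dispatches on the dynamic type. The sketch below uses only the standard library and a hypothetical `describe` function; the cases follow the list of types above.

```go
package main

import "fmt"

// describe is a hypothetical helper that names the kind of a value shaped
// like the ones listed above.
func describe(v interface{}) string {
	switch x := v.(type) {
	case nil:
		return "null"
	case bool:
		return "bool"
	case string:
		return "string"
	case int64:
		return "int64"
	case uint64:
		return "uint64"
	case float64:
		return "float64"
	case []interface{}:
		return fmt.Sprintf("array of %d values", len(x))
	case map[string]interface{}:
		return fmt.Sprintf("object with %d keys", len(x))
	default:
		return fmt.Sprintf("unexpected %T", x)
	}
}

func main() {
	doc := map[string]interface{}{
		"Age":  int64(20),
		"Make": "HOND",
		"Tags": []interface{}{"a", "b"},
	}
	fmt.Println(describe(doc))
	fmt.Println(describe(doc["Age"]))
	fmt.Println(describe(doc["Tags"]))
}
```

Note that both int64 and uint64 need their own case, unlike values decoded by encoding/json, where every number arrives as float64.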

func (*Iter) MarshalJSON ¶

func (i *Iter) MarshalJSON() ([]byte, error)

MarshalJSON will marshal the entire remaining scope of the iterator.

func (*Iter) MarshalJSONBuffer ¶

func (i *Iter) MarshalJSONBuffer(dst []byte) ([]byte, error)

MarshalJSONBuffer will marshal the remaining scope of the iterator including the current value. An optional buffer can be provided for fewer allocations. Output will be appended to the destination.

func (*Iter) Object ¶

func (i *Iter) Object(dst *Object) (*Object, error)

Object will return the next element as an object. An optional destination can be given.

func (*Iter) PeekNext ¶

func (i *Iter) PeekNext() Type

PeekNext will return the next value type. Returns TypeNone if next ends iterator.

func (*Iter) PeekNextTag ¶

func (i *Iter) PeekNextTag() Tag

PeekNextTag will return the tag at the current offset. Will return TagEnd if at end of iterator.

func (*Iter) Root ¶

func (i *Iter) Root(dst *Iter) (Type, *Iter, error)

Root() returns the object embedded in root as an iterator along with the type of the content of the first element of the iterator. An optional destination can be supplied to avoid allocations.

func (*Iter) String ¶

func (i *Iter) String() (string, error)

String() returns a string value.

func (*Iter) StringBytes ¶

func (i *Iter) StringBytes() ([]byte, error)

StringBytes() returns a byte array.

func (*Iter) StringCvt ¶

func (i *Iter) StringCvt() (string, error)

StringCvt() returns a string representation of the value. Root, Object and Arrays are not supported.

func (*Iter) Type ¶

func (i *Iter) Type() Type

Type returns the queued value type from the previous call to Advance.

func (*Iter) Uint ¶

func (i *Iter) Uint() (uint64, error)

Uint returns the unsigned integer value of the next element. Positive integers and floats within range are automatically converted.

type Object ¶

type Object struct {
	// contains filtered or unexported fields
}

Object represents a JSON object.

func (*Object) FindKey ¶

func (o *Object) FindKey(key string, dst *Element) *Element

FindKey will return a single named element. An optional destination can be given. The method will return nil if the element cannot be found. This should only be used to locate a single key where the object is no longer needed. The object will not be advanced.

func (*Object) Map ¶

func (o *Object) Map(dst map[string]interface{}) (map[string]interface{}, error)

Map will unmarshal into a map[string]interface{}. See Iter.Interface() for a reference on value types.

func (*Object) NextElement ¶

func (o *Object) NextElement(dst *Iter) (name string, t Type, err error)

NextElement sets dst to the next element and returns the name. TypeNone with nil error will be returned if there are no more elements.

func (*Object) NextElementBytes ¶

func (o *Object) NextElementBytes(dst *Iter) (name []byte, t Type, err error)

NextElementBytes sets dst to the next element and returns the name. TypeNone with nil error will be returned if there are no more elements. Contrary to NextElement this will not cause allocations.

func (*Object) Parse ¶

func (o *Object) Parse(dst *Elements) (*Elements, error)

Parse will return all elements and iterators. An optional destination can be given. The Object will be consumed.

type ParsedJson ¶

type ParsedJson struct {
	Message []byte
	Tape    []uint64
	Strings []byte
	// contains filtered or unexported fields
}

func Parse ¶

func Parse(b []byte, reuse *ParsedJson) (*ParsedJson, error)

Parse a block of data and return the parsed JSON. An optional block of previously parsed json can be supplied to reduce allocations.

func ParseND ¶

func ParseND(b []byte, reuse *ParsedJson) (*ParsedJson, error)

ParseND will parse newline delimited JSON. An optional block of previously parsed json can be supplied to reduce allocations.

func (*ParsedJson) Iter ¶

func (pj *ParsedJson) Iter() Iter

Iter returns a new Iter.

func (*ParsedJson) Reset ¶

func (pj *ParsedJson) Reset()

type Serializer ¶

type Serializer struct {
	// contains filtered or unexported fields
}

Serializer allows parsed JSON to be serialized and read back. A Serializer can be reused, but not used concurrently.

func NewSerializer ¶

func NewSerializer() *Serializer

NewSerializer will create and initialize a Serializer.

func (*Serializer) CompressMode ¶

func (s *Serializer) CompressMode(c CompressMode)

func (*Serializer) Deserialize ¶

func (s *Serializer) Deserialize(src []byte, dst *ParsedJson) (*ParsedJson, error)

Deserialize the content in src. Only basic sanity checks will be performed. Slight corruption will likely go through unnoticed. An optional destination can be provided.

func (*Serializer) Serialize ¶

func (s *Serializer) Serialize(dst []byte, pj ParsedJson) []byte

Serialize the data in pj and return the data. An optional destination can be provided.

type Stream ¶

type Stream struct {
	Value *ParsedJson
	Error error
}

A Stream is used to stream back results. Either Error or Value will be set on returned results.

type Tag ¶

type Tag uint8

Tag indicates the data type of a tape entry.

func (Tag) String ¶

func (t Tag) String() string

func (Tag) Type ¶

func (t Tag) Type() Type

Type converts a tag to a type. Only basic types and array+object start match a type.

type Type ¶

type Type uint8

Type is a JSON value type.

const (
	TypeNone Type = iota
	TypeNull
	TypeString
	TypeInt
	TypeUint
	TypeFloat
	TypeBool
	TypeObject
	TypeArray
	TypeRoot
)

func (Type) String ¶

func (t Type) String() string

String returns the type as a string.

Directories ¶

Path	Synopsis
examples
