jibby

v0.1.9
Published: Oct 28, 2021 License: Apache-2.0 Imports: 21 Imported by: 1

README

jibby - High-performance streaming JSON-to-BSON decoder in Go

jibby: A general term to describe an exceptionally positive vibe, attitude, or influence.

~ Urban Dictionary

The jibby package provides high-performance conversion of JSON objects to BSON documents. Key features include:

  • stream decoding - white space delimited or from a JSON array container
  • no reflection
  • minimal abstraction
  • minimal copy
  • allocation-friendly

Examples

import (
	"bufio"
	"bytes"
	"log"

	"github.com/xdg-go/jibby"
)

func ExampleUnmarshal() {
	json := `{"a": 1, "b": "foo"}`
	bson := make([]byte, 0, 256)

	bson, err := jibby.Unmarshal([]byte(json), bson)
	if err != nil {
		log.Fatal(err)
	}
}

func ExampleDecoder_Decode() {
	json := `{"a": 1, "b": "foo"}`
	bson := make([]byte, 0, 256)

	jsonReader := bufio.NewReaderSize(bytes.NewReader([]byte(json)), 8192)
	jib, err := jibby.NewDecoder(jsonReader)
	if err != nil {
		log.Fatal(err)
	}

	bson, err = jib.Decode(bson)
	if err != nil {
		log.Fatal(err)
	}
}

Extended JSON

Jibby optionally supports the MongoDB Extended JSON v2 format. There is limited support for the v1 format -- specifically, the $type and $regex keys use heuristics to determine whether these are extended JSON or MongoDB query operators.

Escape sequences are not supported in Extended JSON keys or number formats, only in naturally textual fields like $symbol, $code, etc. In practice, MongoDB Extended JSON generators should never output escape sequences in keys and number fields anyway.

Limitations

  • Maximum depth defaults to 200 levels of nesting (but is configurable).
  • Only well-formed UTF-8 encoding (including an optional BOM) is supported.
  • Numbers (floats and ints) must conform to the formats and limits of Go's strconv library.
  • Escape sequences are not supported in Extended JSON keys and some Extended JSON values.
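
To make the numeric limitation concrete, the sketch below shows where Go's strconv package itself draws the line for 64-bit integers and floats. It uses only the standard library; the helper names intOutOfRange and floatOutOfRange are made up for illustration, and how jibby reacts to such input (error or fallback) is not stated here and is not shown.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// intOutOfRange reports whether strconv rejects s as a 64-bit integer
// specifically because the value does not fit.
func intOutOfRange(s string) bool {
	_, err := strconv.ParseInt(s, 10, 64)
	return errors.Is(err, strconv.ErrRange)
}

// floatOutOfRange does the same for 64-bit floats.
func floatOutOfRange(s string) bool {
	_, err := strconv.ParseFloat(s, 64)
	return errors.Is(err, strconv.ErrRange)
}

func main() {
	fmt.Println(intOutOfRange("9223372036854775807")) // false: math.MaxInt64 fits
	fmt.Println(intOutOfRange("9223372036854775808")) // true: one past MaxInt64
	fmt.Println(floatOutOfRange("1e308"))             // false
	fmt.Println(floatOutOfRange("1e400"))             // true: overflows float64
}
```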

Testing

Jibby is extensively tested.

Jibby's JSON-to-BSON output is compared against reference output from the MongoDB Go driver. Extended JSON conversion is tested against the MongoDB BSON Corpus.

JSON parsing support is tested against data sets from Nicolas Seriot's Parsing JSON is a Minefield article. It behaves correctly against all "y" (must support) tests and "n" (must error) tests. It passes all "i" (implementation defined) tests except for cases exceeding Go's numerical precision or with invalid/unsupported Unicode encoding.

Performance

Performance varies based on the shape of the input data.

For a 92 MB mixed JSON dataset with some extended JSON:

           jibby 283.46 MB/s
   jibby extjson 207.42 MB/s
   driver bsonrw 43.77 MB/s
naive json->bson 43.25 MB/s

For a 4.3 MB pure JSON dataset with lots of arrays:

           jibby 107.15 MB/s
   jibby extjson 123.76 MB/s
   driver bsonrw 25.68 MB/s
naive json->bson 32.78 MB/s

The jibby and jibby extjson figures are jibby without and with extended JSON enabled, respectively. The driver bsonrw figures use the MongoDB Go driver in a streaming mode with bsonrw.NewExtJSONValueReader. The naive json->bson figures use Go's encoding/json to decode to map[string]interface{} and the Go driver's bson.Marshal function.

Copyright 2020 by David A. Golden. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"). You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0

Documentation

Overview

Package jibby is a high-performance, streaming JSON-to-BSON decoder. It decodes successive JSON objects into BSON documents from a buffered input byte stream while minimizing memory copies. Only UTF-8 encoding is supported and input text is expected to be well-formed.

Extended JSON

Jibby optionally supports the MongoDB Extended JSON v2 format (https://docs.mongodb.com/manual/reference/mongodb-extended-json/index.html). There is limited support for the v1 format -- specifically, the `$type` and `$regex` keys use heuristics to determine whether these are extended JSON or MongoDB query operators.

Escape sequences are not supported in Extended JSON keys or number formats, only in naturally textual fields like `$symbol`, `$code`, etc. In practice, MongoDB Extended JSON generators should never output escape sequences in keys and number fields anyway.

Testing

Jibby is extensively tested.

Jibby's JSON-to-BSON output is compared against reference output from the MongoDB Go driver. Extended JSON conversion is tested against the MongoDB BSON Corpus: https://github.com/mongodb/specifications/tree/master/source/bson-corpus.

JSON parsing support is tested against data sets from Nicolas Seriot's Parsing JSON is a Minefield article (http://seriot.ch/parsing_json.php). It behaves correctly against all "y" (must support) tests and "n" (must error) tests. It passes all "i" (implementation defined) tests except for cases exceeding Go's numerical precision or with invalid/unsupported Unicode encoding.

Variables

var ErrUnsupportedBOM = errors.New("unsupported byte order mark")

ErrUnsupportedBOM means that a UTF-16 or UTF-32 byte order mark was found.
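
For readers unfamiliar with byte order marks, the following standard-library sketch shows how the common BOMs can be told apart by their leading bytes. It is not jibby's implementation; classifyBOM is a hypothetical helper that only illustrates which prefixes fall on the "UTF-16 or UTF-32" side described above.

```go
package main

import (
	"bytes"
	"fmt"
)

// classifyBOM names the byte order mark at the start of b, or returns ""
// if there is none. UTF-32 LE is checked before UTF-16 LE because it
// begins with the same two bytes.
func classifyBOM(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}):
		return "UTF-8"
	case bytes.HasPrefix(b, []byte{0x00, 0x00, 0xFE, 0xFF}):
		return "UTF-32 BE"
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE, 0x00, 0x00}):
		return "UTF-32 LE"
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		return "UTF-16 BE"
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		return "UTF-16 LE"
	}
	return ""
}

func main() {
	fmt.Printf("%q\n", classifyBOM([]byte("\xEF\xBB\xBF{}")))     // "UTF-8"
	fmt.Printf("%q\n", classifyBOM([]byte("\xFF\xFE{\x00}\x00"))) // "UTF-16 LE"
	fmt.Printf("%q\n", classifyBOM([]byte("{}")))                 // ""
}
```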

Functions

func Unmarshal

func Unmarshal(in []byte, out []byte) ([]byte, error)

Unmarshal converts a single JSON object to a BSON document. The function takes an output buffer as an argument. If the buffer is not large enough, a new buffer will be allocated on demand. The final buffer is returned, just like with `append`. The function returns io.EOF if the input is empty.

Example
package main

import (
	"log"

	"github.com/xdg-go/jibby"
)

func main() {
	json := `{"a": 1, "b": "foo"}`
	bson := make([]byte, 0, 256)

	bson, err := jibby.Unmarshal([]byte(json), bson)
	if err != nil {
		log.Fatal(err)
	}

	// Do something with bson
	_ = bson
}
Output:
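
As one possible way to "do something with bson", the sketch below sanity-checks the outer framing of a returned document. Per the BSON specification, a document starts with a little-endian int32 holding its total length and ends with a 0x00 byte. The helper checkBSONFrame is hypothetical and uses only the standard library; it assumes the buffer holds exactly one document and does not validate the elements inside it.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// checkBSONFrame verifies the outer framing of a single BSON document:
// a little-endian int32 total length up front and a 0x00 terminator at
// the end. It does not inspect the elements in between.
func checkBSONFrame(doc []byte) error {
	if len(doc) < 5 {
		return errors.New("too short to be a BSON document")
	}
	n := int(binary.LittleEndian.Uint32(doc[:4]))
	if n != len(doc) {
		return fmt.Errorf("length prefix says %d bytes, buffer has %d", n, len(doc))
	}
	if doc[len(doc)-1] != 0x00 {
		return errors.New("missing trailing 0x00 terminator")
	}
	return nil
}

func main() {
	// The empty document {} is exactly five bytes in BSON.
	empty := []byte{0x05, 0x00, 0x00, 0x00, 0x00}
	fmt.Println(checkBSONFrame(empty))     // <nil>
	fmt.Println(checkBSONFrame(empty[:4])) // too short to be a BSON document
}
```

The same check could be applied to the slice returned by Unmarshal, provided the output buffer was passed in with zero length.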

func UnmarshalExtJSON

func UnmarshalExtJSON(in []byte, out []byte) ([]byte, error)

UnmarshalExtJSON converts a single Extended JSON object to a BSON document. It otherwise works like `Unmarshal`.

Types

type Decoder

type Decoder struct {
	// contains filtered or unexported fields
}

Decoder reads and decodes JSON objects to BSON from a buffered input stream. Objects may be separated by optional white space or may be in a well-formed JSON array at the top level.

func NewDecoder

func NewDecoder(json *bufio.Reader) (*Decoder, error)

NewDecoder returns a new decoder. If a UTF-8 byte-order-mark (BOM) exists, it will be stripped. Because only UTF-8 is supported, other BOMs are errors and will return ErrUnsupportedBOM. This function consumes leading white space and checks if the first character is '['. If so, the input format is expected to be a single JSON array of objects and the stream will consist of the objects in the array. Any read error (including io.EOF) during these checks will be returned.

If the bufio.Reader's size is less than 8192, it will be rebuffered. This is necessary to account for lookahead for long decimals to minimize copying.
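
A caller can sidestep the rebuffering by constructing the reader with a large enough buffer up front. The sketch below uses only the standard library: bufio.NewReaderSize returns its argument unchanged when it is already a *bufio.Reader of sufficient size and wraps it otherwise. The names minBufSize, sizedReader, and sizeAfterWrap are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// minBufSize is the threshold named in the NewDecoder documentation.
const minBufSize = 8192

// sizedReader returns a *bufio.Reader over src whose buffer holds at
// least minBufSize bytes.
func sizedReader(src io.Reader) *bufio.Reader {
	return bufio.NewReaderSize(src, minBufSize)
}

// sizeAfterWrap builds a reader with the requested buffer size, passes
// it through sizedReader, and reports the resulting buffer size.
func sizeAfterWrap(requested int) int {
	r := bufio.NewReaderSize(strings.NewReader(`{"a": 1}`), requested)
	return sizedReader(r).Size()
}

func main() {
	fmt.Println(sizeAfterWrap(64))    // 8192: too small, so it was wrapped
	fmt.Println(sizeAfterWrap(16384)) // 16384: already large enough
}
```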

func (*Decoder) Decode

func (d *Decoder) Decode(buf []byte) ([]byte, error)

Decode converts a single JSON object from the input stream into a BSON document. The function takes an output buffer as an argument. If the buffer is not large enough, a new buffer will be allocated when needed. The final buffer is returned, just like with `append`. The function returns io.EOF if no objects remain in the stream.

Example
package main

import (
	"bufio"
	"bytes"
	"log"

	"github.com/xdg-go/jibby"
)

func main() {
	json := `{"a": 1, "b": "foo"}`
	bson := make([]byte, 0, 256)

	jsonReader := bufio.NewReaderSize(bytes.NewReader([]byte(json)), 8192)
	jib, err := jibby.NewDecoder(jsonReader)
	if err != nil {
		log.Fatal(err)
	}

	bson, err = jib.Decode(bson)
	if err != nil {
		log.Fatal(err)
	}

	// Do something with bson
	_ = bson
}
Output:

func (*Decoder) ExtJSON

func (d *Decoder) ExtJSON(b bool)

ExtJSON toggles whether extended JSON is interpreted by the decoder. See https://docs.mongodb.com/manual/reference/mongodb-extended-json/index.html for the format. Jibby has limited support for the legacy extended JSON format.

func (*Decoder) MaxDepth

func (d *Decoder) MaxDepth(n int)

MaxDepth sets the maximum allowed depth of a JSON object. The default is 200.

type ParseError added in v0.1.4

type ParseError struct {
	// contains filtered or unexported fields
}

ParseError records JSON/Extended JSON parsing errors. It can include a small excerpt of text from the reader at the point of error.

func (*ParseError) Error added in v0.1.4

func (pe *ParseError) Error() string
