goj

package module
v0.0.0-...-d40fbc5 Latest
Published: Mar 18, 2022 License: BSD-2-Clause Imports: 8 Imported by: 0

README

Go JSON scanner

goj is a small, low-level JSON scanning library. It is representation-free: it provides no in-memory representation of the parsed data (that's your job).

Note: this is a soft fork of lloyd/goj, made to pick up some PRs I was interested in.

goj may be useful to you if the following are true:

  1. you need fast JSON parsing
  2. you do not need a streaming parser (the distinct JSON documents you are parsing are delimited in some fashion)
  3. you want to extract a subset of JSON documents, have your own in-memory data representation, or wish to transform JSON into a different format

Usage

Installation:

go get github.com/flowchartsman/goj

A program to extract and print the top-level .name property from a JSON file passed in on stdin:

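package main

import (
	"fmt"
	"io"
	"os"

	"github.com/flowchartsman/goj"
)

func main() {
	// Read the whole document up front; Parse works on a complete buffer.
	buf, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %s\n", err)
		os.Exit(1)
	}

	p := goj.NewParser()

	// depth counts the arrays and objects that currently enclose the scanner.
	depth := 0
	err = p.Parse(buf, func(t goj.Type, key []byte, value []byte) goj.Action {
		switch t {
		case goj.String:
			if depth == 1 && string(key) == "name" {
				fmt.Printf("%s\n", string(value))
				// Found it: stop scanning the rest of the document.
				return goj.Cancel
			}
		case goj.Array, goj.Object:
			depth++
		case goj.ArrayEnd, goj.ObjectEnd:
			depth--
		}
		return goj.Continue
	})

	// A parse cancelled by the callback is expected here, not a failure.
	if err != nil && err != goj.ClientCancelledParse {
		fmt.Fprintf(os.Stderr, "error: %s\n", err)
	}
}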

Performance

Unless a listing reports its own environment, the numbers below were measured on:

go version go1.11.1 linux/amd64
Intel(R) Xeon(R) CPU E5-2643 v4 @ 3.40GHz

Using the same JSON sample data as encoding/json, goj scans about 3x faster than Go's reflection-based JSON parsing:

$ go test -bench . -run 'XXX'
goos: darwin
goarch: amd64
pkg: github.com/flowchartsman/goj
cpu: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
BenchmarkGojScanning-16                      325           3332903 ns/op         582.22 MB/s
BenchmarkGojOffsetScanning-16                397           3156122 ns/op         614.83 MB/s
BenchmarkStdJSONScanning-16                   98          10903834 ns/op         177.96 MB/s
PASS
ok      github.com/flowchartsman/goj    5.180s

See test/bench_test.go for the source.

Compared against jq (a tiny and awesome tool written in C that extracts nested values from JSON data), goj is more than 4x faster:

$ go build example/main.go && time ./main < ~/4.9gb_sample.json > /dev/null
real 0m20.476s
user 0m18.838s
sys  0m1.734s

$ time jq -r .name < ~/4.9gb_sample.json > /dev/null
real   1m26.964s
user   1m25.515s
sys    0m1.372s

Compared against yajl (a fast streaming JSON parser written in C) in a fair fight, goj is about the same:

$ json_verify -s < ~/4.9gb_sample.json
...
real 0m14.504s
user 0m13.754s
sys  0m0.736s

$ go build cmd/prof/main.go && time ./main < ~/4.9gb_sample.json
...
real    0m14.171s
user    0m13.386s
sys     0m0.793s
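
The ratios quoted above can be rechecked from the "real" times. The following is a minimal sketch using only the standard library; the helper name speedup is made up for this illustration and is not part of goj. It relies on time.ParseDuration accepting the "XmY.YYYs" form that `time` prints.

```go
package main

import (
	"fmt"
	"time"
)

// speedup reports how many times longer the baseline run took than the
// candidate run. Both arguments use the "XmY.YYYs" form printed by `time`.
func speedup(baseline, candidate string) (float64, error) {
	b, err := time.ParseDuration(baseline)
	if err != nil {
		return 0, err
	}
	c, err := time.ParseDuration(candidate)
	if err != nil {
		return 0, err
	}
	return b.Seconds() / c.Seconds(), nil
}

func main() {
	// "real" times copied from the listings above.
	jq, err := speedup("1m26.964s", "0m20.476s")
	if err != nil {
		panic(err)
	}
	yajl, err := speedup("0m14.504s", "0m14.171s")
	if err != nil {
		panic(err)
	}
	fmt.Printf("jq / goj:   %.2fx\n", jq)
	fmt.Printf("yajl / goj: %.2fx\n", yajl)
}
```

The jq ratio comes out a little above 4 and the yajl ratio close to 1, which matches the "more than 4x" and "about the same" wording.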

License

BSD 2-Clause; see LICENSE.

Documentation

Variables

var ClientCancelledParse = &Error{
	e: "client cancelled parse",
}

Error returned from Parse when the callback cancels the parse.

var PageSize = uintptr(os.Getpagesize())

PageSize caches the operating system's page size in a global variable so that a system call is not made each time it is needed.

Functions

func ReadJSONNL

func ReadJSONNL(s io.Reader, cb func(t Type, key []byte, value []byte, line int64) bool) error

ReadJSONNL reads and parses newline-separated JSON from an io.Reader, invoking the callback with each token. It terminates if the callback returns false. Arguments to the callback:

t - token type
key - key, if parsing object key/value pairs
value - decoded value
line - line offset in the file. Distinct documents are indicated by distinct line numbers.
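
One way to use the line argument is to detect document boundaries by watching for a change in its value. The sketch below is an illustration only: it does not import goj, the lineCallback type is a local stand-in with a plain uint8 in place of Type, and docCounter is a hypothetical helper. The line numbers fed to it are hand-written rather than produced by ReadJSONNL.

```go
package main

import "fmt"

// lineCallback mirrors the shape of the callback accepted by ReadJSONNL,
// with uint8 standing in for goj's Type so the sketch runs on its own.
type lineCallback func(t uint8, key []byte, value []byte, line int64) bool

// docCounter returns a callback that counts distinct documents by watching
// for changes in the line argument, plus a function that reads the count.
// It makes no assumption about whether line numbering starts at 0 or 1.
func docCounter() (lineCallback, func() int) {
	var (
		seen bool
		last int64
		docs int
	)
	cb := func(t uint8, key, value []byte, line int64) bool {
		if !seen || line != last {
			seen, last = true, line
			docs++
		}
		return true // keep reading
	}
	return cb, func() int { return docs }
}

func main() {
	cb, count := docCounter()
	// Hand-written line numbers standing in for the tokens of three documents.
	for _, line := range []int64{0, 0, 0, 1, 1, 2} {
		cb(0, nil, nil, line)
	}
	fmt.Println(count()) // prints 3
}
```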

Types

type Action

type Action uint8

Action drives the behavior from the callback

const (
	// Continue parsing
	Continue Action = iota
	// Cancel the parsing
	Cancel
	// Skip the current content and invoke the callback with the skipped []byte once it is over
	Skip
)

func (Action) String

func (a Action) String() string

type Callback

type Callback func(what Type, key []byte, value []byte) Action

Callback is the signature of the client callback to the parsing routine. The routine is passed the type of entity parsed, a key if relevant (parsing inside an object), and a decoded value.
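
The callback contract can be exercised without the parser. The sketch below is an illustration under stated assumptions: Type, Action, and Callback are local stand-ins that mirror the documented shapes (their constant values do not match goj's), and event, drive, and topLevelName are hypothetical helpers that replay a hand-written token sequence. It is not how goj itself dispatches tokens.

```go
package main

import "fmt"

// Local stand-ins mirroring the documented shapes; not imported from goj,
// and the constant values here differ from goj's own.
type Type uint8

const (
	String Type = iota
	Object
	ObjectEnd
)

type Action uint8

const (
	Continue Action = iota
	Cancel
)

type Callback func(what Type, key []byte, value []byte) Action

// event is a hypothetical record of one callback invocation.
type event struct {
	what       Type
	key, value string
}

// drive replays events into cb and stops at the first Cancel.
// It reports whether the callback cancelled.
func drive(events []event, cb Callback) bool {
	for _, e := range events {
		if cb(e.what, []byte(e.key), []byte(e.value)) == Cancel {
			return true
		}
	}
	return false
}

// topLevelName returns the first string stored under "name" at depth 1.
func topLevelName(events []event) (string, bool) {
	depth, name := 0, ""
	found := drive(events, func(what Type, key, value []byte) Action {
		switch what {
		case String:
			if depth == 1 && string(key) == "name" {
				name = string(value) // copy out of the callback's slice
				return Cancel
			}
		case Object:
			depth++
		case ObjectEnd:
			depth--
		}
		return Continue
	})
	return name, found
}

func main() {
	// Roughly the tokens for {"meta":{"name":"inner"},"name":"goj"}.
	events := []event{
		{Object, "", ""},
		{Object, "meta", ""},
		{String, "name", "inner"},
		{ObjectEnd, "", ""},
		{String, "name", "goj"},
		{ObjectEnd, "", ""},
	}
	fmt.Println(topLevelName(events))
}
```

Tracking depth in a closure variable keeps the nested "name" from matching, and returning Cancel ends the replay as soon as the top-level value is found.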

type Error

type Error struct {
	// contains filtered or unexported fields
}

The Error object is provided by the Parser when an error is encountered.

func (*Error) Error

func (e *Error) Error() string

func (*Error) Verbose

func (e *Error) Verbose() string

Verbose returns a longer version of the error string, along with a limited portion of the JSON around which the error occurred.

type OffsetCallback

type OffsetCallback func(what Type, key []byte, start, end int) Action

OffsetCallback is the signature of the client callback to the offset parsing routine.

The routine is passed the type of entity parsed, a key if relevant (parsing inside an object), and the start and end offset of the value.

The start offset is inclusive and the end offset is exclusive, which allows for callers to simply re-slice their buffers with the received data without any manipulation.

The opening and closing of arrays and objects are corner cases: for an opening token the caller receives the start offset and -1 for end, and for a closing token the caller receives -1 for start and the end offset.
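
A minimal sketch of re-slicing with these offsets, including the -1 sentinel. The helper valueAt is hypothetical and not part of goj, and the offsets in main are hand-picked for illustration; whether the offsets for a string value include its surrounding quotes is not stated here, so the example simply picks the span without them.

```go
package main

import "fmt"

// valueAt re-slices buf using a half-open [start, end) pair. It returns
// false when either offset is the -1 sentinel used for array/object
// open and close tokens, or when the pair does not fit inside buf.
func valueAt(buf []byte, start, end int) ([]byte, bool) {
	if start < 0 || end < 0 || start > end || end > len(buf) {
		return nil, false
	}
	return buf[start:end], true
}

func main() {
	buf := []byte(`{"name":"goj"}`)

	// Hand-picked offsets covering the bytes g, o, j.
	if v, ok := valueAt(buf, 9, 12); ok {
		fmt.Printf("%s\n", v)
	}

	// An opening token carries a start offset and -1 for end: nothing to slice.
	if _, ok := valueAt(buf, 0, -1); !ok {
		fmt.Println("open token: no value to slice")
	}
}
```

Because the end offset is exclusive, buf[start:end] needs no adjustment, and the returned slice shares memory with buf rather than copying it.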

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser is the primary object provided by goj, created via the NewParser function. The various parsing routines are provided by this object, but it has no exported fields.

func NewParser

func NewParser() *Parser

NewParser allocates a new JSON scanner that may be reused.

func (*Parser) OffsetParse

func (p *Parser) OffsetParse(buf []byte, cb OffsetCallback) error

OffsetParse implements lazy parsing that allows for callers to decide how to read data from the byte slices.

The callback receives the indices of the raw data without any parsing, so the caller is responsible for any decoding that is needed. The only exception is object keys: the parser decodes a key before invoking the callback.

func (*Parser) Parse

func (p *Parser) Parse(buf []byte, cb Callback) (err error)

Parse parses a complete JSON document. Callback will be invoked once for each JSON entity found.

type Type

type Type uint8

Type represents the JSON value type.

const (
	// String represents a JSON string.
	String Type = iota
	// Integer represents a JSON number known to be a uint.
	Integer
	// NegInteger represents a JSON number known to be an int.
	NegInteger
	// Float represents a JSON number that is neither an int nor a uint.
	Float
	// True represents the JSON boolean 'true'.
	True
	// False represents the JSON boolean 'false'.
	False
	// Null represents the JSON null value.
	Null
	// Array represents the beginning of a JSON array.
	Array
	// ArrayEnd represents the end of a JSON array.
	ArrayEnd
	// Object represents the beginning of a JSON object.
	Object
	// ObjectEnd represents the end of a JSON object.
	ObjectEnd
	// SkippedData represents the []byte of data that was skipped.
	SkippedData
)

func (Type) String

func (t Type) String() string

Directories

Path Synopsis
cmd
