goj

Version: v1.0.0
Published: Sep 14, 2022 License: BSD-2-Clause Imports: 8 Imported by: 0

README

Go JSON scanner

goj is a small low-level JSON scanning library. It is representation-free, providing no in-memory representation (that's your job).

goj may be useful to you if the following are true:

  1. you need fast JSON parsing
  2. you do not need a streaming parser (the distinct JSON documents you are parsing are delimited in some fashion)
  3. you want to extract a subset of JSON documents, have your own in-memory data representation, or wish to transform JSON into a different format.

Usage

Installation:

go get github.com/lloyd/goj

A program to extract and print the top-level .name property from a JSON document passed in on stdin:

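package main

import (
	"fmt"
	"io"
	"os"

	"github.com/lloyd/goj"
)

func main() {
	// goj scans a complete in-memory document, so read all of stdin first.
	buf, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintf(os.Stderr, "error: %s\n", err)
		os.Exit(1)
	}

	p := goj.NewParser()

	// depth tracks how many arrays/objects are currently open.
	depth := 0
	err = p.Parse(buf, func(t goj.Type, key []byte, value []byte) goj.Action {
		switch t {
		case goj.String:
			// depth == 1 means we are directly inside the top-level object.
			if depth == 1 && string(key) == "name" {
				fmt.Printf("%s\n", value)
				return goj.Cancel
			}
		case goj.Array, goj.Object:
			depth++
		case goj.ArrayEnd, goj.ObjectEnd:
			depth--
		}
		return goj.Continue
	})

	// Cancelling from the callback is expected here, not a failure.
	if err != nil && err != goj.ClientCancelledParse {
		fmt.Fprintf(os.Stderr, "error: %s\n", err)
	}
}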

Performance

All numbers below are on:

go version go1.11.1 linux/amd64
Intel(R) Xeon(R) CPU E5-2643 v4 @ 3.40GHz

Using the same JSON sample data as encoding/json, goj scans about 3x faster than Go's reflection-based JSON parsing:

$ go test -bench . -run 'XXX'
goos: linux
goarch: amd64
pkg: github.com/lloyd/goj/test
BenchmarkGojScanning-24                   300         4836167 ns/op       401.24 MB/s
BenchmarkStdJSONScanning-24               100        13836559 ns/op       140.24 MB/s
PASS
ok      github.com/lloyd/goj/test       3.384s

See test/bench_test.go for the source.

Compared against jq (a tiny and awesome tool written in C that extracts nested values from JSON data), goj is more than 4x faster.

$ go build example/main.go && time ./main < ~/4.9gb_sample.json > /dev/null
real 0m20.476s
user 0m18.838s
sys  0m1.734s

$ time jq -r .name < ~/4.9gb_sample.json > /dev/null
real   1m26.964s
user   1m25.515s
sys    0m1.372s

Compared against yajl (a fast streaming JSON parser written in C) in a fair fight, goj performs about the same.

$ json_verify -s < ~/4.9gb_sample.json
...
real 0m14.504s
user 0m13.754s
sys  0m0.736s

$ go build cmd/prof/main.go && time ./main < ~/4.9gb_sample.json
...
real    0m14.171s
user    0m13.386s
sys     0m0.793s

License

BSD 2 Clause, see LICENSE.

Documentation

Constants

This section is empty.

Variables

var ClientCancelledParse = &Error{
	e: "client cancelled parse",
}

Error returned from Parse when the callback cancels parsing.
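
Since ClientCancelledParse signals a deliberate stop rather than a failure, callers will usually want to filter it out before reporting an error. The following is a minimal sketch of that check. It declares local stand-ins for Error and ClientCancelledParse (mirroring the declarations above) so that it compiles without the library, and the helper name parseFailed is hypothetical, not part of goj.

```go
package main

import (
	"errors"
	"fmt"
)

// Error and ClientCancelledParse are local stand-ins mirroring the goj
// declarations, so this sketch compiles on its own.
type Error struct {
	e string
}

func (e *Error) Error() string { return e.e }

var ClientCancelledParse = &Error{e: "client cancelled parse"}

// parseFailed (hypothetical helper) reports whether err is a genuine
// failure, as opposed to nil or the cancellation sentinel.
func parseFailed(err error) bool {
	return err != nil && !errors.Is(err, ClientCancelledParse)
}

func main() {
	fmt.Println(parseFailed(nil))                        // false
	fmt.Println(parseFailed(ClientCancelledParse))       // false
	fmt.Println(parseFailed(&Error{e: "unexpected EOF"})) // true
}
```

With the real package, the same comparison would be made against goj.ClientCancelledParse on the error returned by Parse.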

var PageSize = uintptr(os.Getpagesize())

PageSize caches the system page size in a global variable so that a system call is not made each time it is needed.

Functions

func ReadJSONNL

func ReadJSONNL(s io.Reader, cb func(t Type, key []byte, value []byte, line int64) bool) error

ReadJSONNL reads and parses newline-separated JSON from an io.Reader, invoking the callback with each token. Parsing terminates if the callback returns false. Arguments to the callback:

t - token type
key - key if parsing object key / value pairs
value - decoded value
line - line offset in file.  Distinct documents are indicated by a distinct line number.
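
The line argument is what lets a callback tell documents apart. As a rough illustration, the sketch below counts documents by watching for changes in line. Type is a local stand-in for goj's Type and the tokens are fed in by hand so that the example runs without the library; the docCounter type and its methods are hypothetical names.

```go
package main

import "fmt"

// Type is a local stand-in for goj's Type so the sketch compiles on its own.
type Type uint8

// docCounter (hypothetical) counts documents by tracking line changes.
type docCounter struct {
	started  bool
	lastLine int64
	docs     int
}

// callback has the shape of the function ReadJSONNL expects. It counts a
// new document whenever the line number differs from the previous token's,
// and always returns true so that parsing continues.
func (c *docCounter) callback(t Type, key []byte, value []byte, line int64) bool {
	if !c.started || line != c.lastLine {
		c.docs++
		c.started = true
		c.lastLine = line
	}
	return true
}

func main() {
	c := &docCounter{}
	// Simulated token stream: two tokens from one line, three from the next.
	for _, line := range []int64{0, 0, 1, 1, 1} {
		c.callback(0, nil, nil, line)
	}
	fmt.Println(c.docs) // 2
}
```

With the real package, c.callback would be passed as the cb argument of ReadJSONNL instead of being driven by the loop.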

Types

type Action

type Action uint8

Action is returned by the callback to tell the parser how to proceed.

const (
	// Continue carries on with the parsing
	Continue Action = iota
	// Cancel cancels the parsing
	Cancel
	// Skip skips the current content and invokes the callback with the skipped []byte once the skip is over
	Skip
)

func (Action) String

func (a Action) String() string

type Callback

type Callback func(what Type, key []byte, value []byte) Action

Callback is the signature of the client callback to the parsing routine. The routine is passed the type of entity parsed, a key if relevant (parsing inside an object), and a decoded value.
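
To illustrate the Callback signature and its Action return value, here is a small sketch of a callback that lets a fixed number of tokens through and then cancels. Type, Action, and Callback are local stand-ins that mirror the declarations above so the example runs without the library; limitTokens is a hypothetical helper, not part of goj.

```go
package main

import "fmt"

// Type, Action, and Callback are local stand-ins mirroring goj's declarations.
type Type uint8

type Action uint8

const (
	Continue Action = iota
	Cancel
	Skip
)

type Callback func(what Type, key []byte, value []byte) Action

// limitTokens (hypothetical helper) returns a Callback that continues for
// the first max tokens and asks the parser to cancel on the next one.
func limitTokens(max int) Callback {
	seen := 0
	return func(what Type, key []byte, value []byte) Action {
		seen++
		if seen > max {
			return Cancel
		}
		return Continue
	}
}

func main() {
	cb := limitTokens(2)
	for i := 0; i < 3; i++ {
		// The first two calls continue; the third cancels.
		fmt.Println(cb(0, nil, nil) == Continue)
	}
}
```

A closure is used here so that the callback can carry state (the token count) between invocations without any global variables.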

type Error

type Error struct {
	// contains filtered or unexported fields
}

The Error object is provided by the Parser when an error is encountered.

func (*Error) Error

func (e *Error) Error() string

func (*Error) Verbose

func (e *Error) Verbose() string

Verbose returns a longer version of the error string, along with a limited portion of the JSON around which the error occurred.

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser is the primary object provided by goj via the NewParser function. The various parsing routines are provided by this object, but it has no exported fields.

func NewParser

func NewParser() *Parser

NewParser allocates a new JSON scanner that may be reused.

func (*Parser) Parse

func (p *Parser) Parse(buf []byte, cb Callback) (err error)

Parse parses a complete JSON document. Callback will be invoked once for each JSON entity found.

type Type

type Type uint8

Type represents the JSON value type.

const (
	// String represents a JSON string.
	String Type = iota
	// Integer represents a JSON number known to be a uint.
	Integer
	// NegInteger represents a JSON number known to be an int.
	NegInteger
	// Float represents a JSON number that is neither an int nor a uint.
	Float
	// True represents the JSON boolean 'true'.
	True
	// False represents the JSON boolean 'false'.
	False
	// Null represents the JSON null value.
	Null
	// Array represents the beginning of a JSON array.
	Array
	// ArrayEnd represents the end of a JSON array.
	ArrayEnd
	// Object represents the beginning of a JSON object.
	Object
	// ObjectEnd represents the end of a JSON object.
	ObjectEnd
	// SkippedData represents the []byte of data that was skipped.
	SkippedData
)
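
The three numeric types suggest a natural mapping onto Go's uint64, int64, and float64. The sketch below shows one way a callback might convert a numeric token, assuming (this is not confirmed by the documentation above) that value holds the number's literal text. Type and its constants are a local, truncated stand-in for goj's declarations, and decodeNumber is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
)

// Type is a local stand-in mirroring the first few goj Type constants.
type Type uint8

const (
	String Type = iota
	Integer
	NegInteger
	Float
)

// decodeNumber (hypothetical helper) converts a numeric token to a Go value.
// ASSUMPTION: value contains the number's literal text, e.g. []byte("42").
func decodeNumber(t Type, value []byte) (interface{}, error) {
	s := string(value)
	switch t {
	case Integer:
		return strconv.ParseUint(s, 10, 64)
	case NegInteger:
		return strconv.ParseInt(s, 10, 64)
	case Float:
		return strconv.ParseFloat(s, 64)
	default:
		return nil, fmt.Errorf("token type %d is not numeric", t)
	}
}

func main() {
	v, err := decodeNumber(NegInteger, []byte("-7"))
	fmt.Println(v, err) // -7 <nil>
}
```

Returning interface{} keeps the sketch short; code with its own in-memory representation would more likely store each case directly in a typed field.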

func (Type) String

func (t Type) String() string

Directories

Path Synopsis
cmd
