collatejson

v0.0.0-...-85df4e1
Published: Feb 10, 2021 License: Apache-2.0 Imports: 6 Imported by: 4

README

The collatejson library, written in Go, provides encoding and decoding functions that transform JSON text into a binary representation without losing information. That is,

  • the binary representation should preserve sort order, such that sorting binary-encoded JSON documents must match sorting by functions that parse and compare JSON documents.
  • it must be possible to get back the original document, in semantically correct form, from its binary representation.
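
To illustrate why an order-preserving encoding is needed, here is a minimal sketch that uses only the standard library and is not part of collatejson. It sorts the same number literals once as raw text and once by parsed value; the helper names `sortAsText` and `sortAsNumbers` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
)

// sortAsText sorts JSON number literals byte-wise, the way a naive
// comparison of unencoded JSON text would.
func sortAsText(items []string) []string {
	out := append([]string(nil), items...)
	sort.Strings(out)
	return out
}

// sortAsNumbers parses each literal and sorts by numeric value.
func sortAsNumbers(items []string) []string {
	out := append([]string(nil), items...)
	sort.Slice(out, func(i, j int) bool {
		a, _ := strconv.ParseFloat(out[i], 64)
		b, _ := strconv.ParseFloat(out[j], 64)
		return a < b
	})
	return out
}

func main() {
	items := []string{"10", "9", "2.5"}
	fmt.Println(sortAsText(items))    // [10 2.5 9]
	fmt.Println(sortAsNumbers(items)) // [2.5 9 10]
}
```

The two orders disagree, which is the gap the binary representation is meant to close.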

Notes:

  • items in a property object are sorted by property name before the object is compared with another property object.

For API documentation and benchmarking, try:

godoc github.com/couchbaselabs/go-collatejson | less
cd go-collatejson
go test -test.bench=.

To measure the relative difference in sorting 100K elements using the encoding/json library and this library, try:

go test -test.bench=Sort

examples/* contains reference sort ordering for different json elements.

For known issues refer to TODO.rst

Documentation

Overview

Package collatejson supplies encoding and decoding functions that transform JSON text into a binary representation without losing information. That is,

  • the binary representation should preserve sort order, such that sorting binary-encoded JSON documents must match sorting by functions that parse and compare JSON documents.
  • it must be possible to get back the original document, in semantically correct form, from its binary representation.

Notes:

  • items in a property object are sorted by property name before the property values are compared.

Constants

const (
	PLUS  = 43
	MINUS = 45
	LT    = 60
	GT    = 62
	DOT   = 46
	ZERO  = 48
)

Constants used in text representation of basic data types.
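
As a quick check, the sketch below copies the constant block and compares each value with the ASCII character it appears to stand for. The mapping to characters is inferred from the names and values, not stated by the package.

```go
package main

import "fmt"

// Values copied from the constant block above.
const (
	PLUS  = 43
	MINUS = 45
	LT    = 60
	GT    = 62
	DOT   = 46
	ZERO  = 48
)

func main() {
	// Each constant equals the ASCII code of the character it names.
	fmt.Println(PLUS == '+', MINUS == '-', LT == '<', GT == '>', DOT == '.', ZERO == '0')
	// Byte order of the marker characters: '+' < '-' < '0' < '>'.
	fmt.Println(PLUS < MINUS, MINUS < ZERO, ZERO < GT)
}
```

Both lines print only `true` values.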

const (
	Terminator byte = iota
	TypeMissing
	TypeNull
	TypeFalse
	TypeTrue
	TypeNumber
	TypeString
	TypeLength
	TypeArray
	TypeObj
)

While encoding a JSON data element, whether basic or composite, the encoded string is prefixed with a type byte. `Terminator` terminates the encoded datum.

const MinBufferSize = 16

MinBufferSize is the minimum size of the target buffer used to encode or decode.

const MissingLiteral = Missing("~[]{}falsenilNA~")

MissingLiteral is a special string that denotes a missing item. IMPORTANT: the package assumes that MissingLiteral will not occur in the keyspace.

Variables

var ErrorNumberType = errors.New("collatejson.numberType")

ErrorNumberType means the configured number type is not supported by the codec.

var ErrorOutputLen = errors.New("collatejson.outputLen")

ErrorOutputLen means the output buffer has insufficient length.

var ErrorSuffixDecoding = errors.New("collatejson.suffixDecoding")

error codes

Functions

func DecodeFloat

func DecodeFloat(code, text []byte) []byte

DecodeFloat complements EncodeFloat; it returns `exponent` and `mantissa` in text format.

func DecodeInt

func DecodeInt(code, text []byte) (int, []byte)

DecodeInt complements EncodeInt; it returns the integer as text, which can be converted to an integer value using strconv.Atoi(return_value).

func DecodeLD

func DecodeLD(code, text []byte) []byte

DecodeLD complements EncodeLD; it returns the decimal as text, which can be converted to a float64 value using strconv.ParseFloat(return_value, 64).

func DecodeSD

func DecodeSD(code, text []byte) []byte

DecodeSD complements EncodeSD; it returns the decimal as text, which can be converted to a float64 value using strconv.ParseFloat(return_value, 64).

func EncodeFloat

func EncodeFloat(text, code []byte) []byte

EncodeFloat encodes a floating point number such that natural order is preserved as lexicographic order of the representation. Additionally it must be possible to get back the natural representation from its lexical representation.

A floating point number f takes a mantissa m ∈ [1/10 , 1) and an integer exponent e such that f = (10^e) * ±m.

encoding −0.1 × 10^11    - --7888+
encoding −0.1 × 10^10    - --7898+
encoding -1.4            - -885+
encoding -1.3            - -886+
encoding -1              - -88+
encoding -0.123          - 0876+
encoding -0.0123         - +1876+
encoding -0.001233       - +28766+
encoding -0.00123        - +2876+
encoding 0               0
encoding +0.00123        + -7123-
encoding +0.001233       + -71233-
encoding +0.0123         + -8123-
encoding +0.123          + 0123-
encoding +1              + +11-
encoding +1.3            + +113-
encoding +1.4            + +114-
encoding +0.1 × 10^10    + ++2101-
encoding +0.1 × 10^11    + ++2111-
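
The decomposition f = (10^e) * ±m with m ∈ [1/10, 1) can be sketched with the standard library alone. The helper `decompose` below is hypothetical and only splits a float64 into sign, mantissa digits and exponent; it does not produce the encoded strings in the table, and it does not handle zero, which is encoded separately.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// decompose returns the sign, the mantissa digits d such that
// m = 0.d lies in [1/10, 1), and the decimal exponent e, so that
// f = ±0.d × 10^e. Zero is not handled.
func decompose(f float64) (sign byte, mantissa string, exp int) {
	s := strconv.FormatFloat(f, 'e', -1, 64) // e.g. "-1.23e-03"
	sign = '+'
	if s[0] == '-' {
		sign = '-'
		s = s[1:]
	}
	i := strings.IndexByte(s, 'e')
	// "1.23" -> "123"; moving the point one place left adds 1 to the exponent.
	mantissa = strings.Replace(s[:i], ".", "", 1)
	e, _ := strconv.Atoi(s[i+1:])
	return sign, mantissa, e + 1
}

func main() {
	for _, f := range []float64{0.00123, 1.3, -1.4, 1e9} {
		sign, m, e := decompose(f)
		fmt.Printf("%c 0.%s x 10^%d\n", sign, m, e)
	}
	// + 0.123 x 10^-2
	// + 0.13 x 10^1
	// - 0.14 x 10^1
	// + 0.1 x 10^10
}
```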

func EncodeInt

func EncodeInt(text, code []byte) []byte

EncodeInt encodes an integer such that natural order is preserved as lexicographic order of the representation. Additionally it must be possible to get back the natural representation from its lexical representation.

Input `text` is also in textual representation, that is, strconv.Atoi(text) is the actual integer that is encoded.

Zero is encoded as '0'

func EncodeLD

func EncodeLD(text, code []byte) []byte

EncodeLD encodes a large decimal, a value that is greater than or equal to +1.0 or less than or equal to -1.0, such that natural order is preserved as lexicographic order of the representation. Additionally it must be possible to get back the natural representation from its lexical representation.

Input `text` is also in textual representation, that is, strconv.ParseFloat(text, 64) is the actual value that is encoded.

encoding -100.5         --68994>
encoding -10.5          --7>
encoding -3.145         -3854>
encoding -3.14          -385>
encoding -1.01          -198>
encoding -1             -1>
encoding -0.0001233     -09998766>
encoding -0.000123      -0999876>
encoding +0.000123      >0000123-
encoding +0.0001233     >00001233-
encoding +1             >1-
encoding +1.01          >101-
encoding +3.14          >314-
encoding +3.145         >3145-
encoding +10.5          >>2105-
encoding +100.5         >>31005-

func EncodeSD

func EncodeSD(text, code []byte) []byte

EncodeSD encodes a small decimal, a value greater than -1.0 and less than +1.0, such that natural order is preserved as lexicographic order of the representation. Additionally it must be possible to get back the natural representation from its lexical representation.

Input `text` is also in textual representation, that is, strconv.ParseFloat(text, 64) is the actual value that is encoded.

encoding -0.9995    -0004>
encoding -0.999     -000>
encoding -0.0123    -9876>
encoding -0.00123   -99876>
encoding -0.0001233 -9998766>
encoding -0.000123  -999876>
encoding +0.000123  >000123-
encoding +0.0001233 >0001233-
encoding +0.00123   >00123-
encoding +0.0123    >0123-
encoding +0.999     >999-
encoding +0.9995    >9995-

Caveats:

-0.0, 0.0 and +0.0 must be filtered out as integer ZERO `0`.
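
The table above suggests a simple rule: a positive value becomes `>` + fractional digits + `-`, and a negative value becomes `-` + nines' complement of the fractional digits + `>`. The sketch below is a reconstruction from the table only, not the library's implementation; the helper `encodeSD` is hypothetical and assumes well-formed input of the form `[+-]0.ddd`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// encodeSD reproduces the table entries for small decimals.
// Assumption: text looks like "0.123", "+0.123" or "-0.123".
func encodeSD(text string) string {
	neg := strings.HasPrefix(text, "-")
	text = strings.TrimLeft(text, "+-")
	frac := text[strings.IndexByte(text, '.')+1:] // digits after the point
	if !neg {
		return ">" + frac + "-"
	}
	out := []byte(frac)
	for i, d := range out {
		out[i] = '9' - (d - '0') // nines' complement of each digit
	}
	return "-" + string(out) + ">"
}

func main() {
	// Inputs in ascending numeric order, taken from the table.
	inputs := []string{"-0.9995", "-0.999", "-0.0123", "-0.00123",
		"+0.00123", "+0.0123", "+0.999", "+0.9995"}
	var codes []string
	for _, in := range inputs {
		codes = append(codes, encodeSD(in))
	}
	fmt.Println(codes)
	fmt.Println(sort.StringsAreSorted(codes)) // true
}
```

The final check shows that the byte-wise order of the encoded strings agrees with the numeric order of the inputs.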

Types

type ByteSlices

type ByteSlices [][]byte

ByteSlices implements sort.Interface for a slice of byte slices.

func (ByteSlices) Len

func (b ByteSlices) Len() int

func (ByteSlices) Less

func (b ByteSlices) Less(i, j int) bool

func (ByteSlices) Swap

func (b ByteSlices) Swap(i, j int)

type Codec

type Codec struct {
	// contains filtered or unexported fields
}

Codec structure

func NewCodec

func NewCodec(propSize int) *Codec

NewCodec creates a new codec object and returns a reference to it.

func (*Codec) Decode

func (codec *Codec) Decode(code, text []byte) ([]byte, error)

Decode decodes a slice of bytes into a JSON string and returns it as a slice of bytes. `text` is the output buffer for decoding and is expected to have enough capacity: at least 3x the size of input `code` and greater than MinBufferSize.

func (*Codec) Encode

func (codec *Codec) Encode(text, code []byte) ([]byte, error)

Encode encodes JSON documents into an order-preserving binary representation. `code` is the output buffer for encoding and is expected to have enough capacity: at least 3x the size of input `text` and greater than MinBufferSize.
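
The sizing rule for the output buffer can be captured in a small helper. The sketch below is an assumption-laden illustration: `outputBuffer` is a hypothetical name, MinBufferSize is redeclared locally, and the buffer is allocated with zero length and the required capacity.

```go
package main

import "fmt"

// Redeclared locally for illustration; same value as the package constant.
const MinBufferSize = 16

// outputBuffer returns an empty buffer whose capacity is at least
// 3x the input size and strictly greater than MinBufferSize.
func outputBuffer(input []byte) []byte {
	n := 3 * len(input)
	if n <= MinBufferSize {
		n = MinBufferSize + 1
	}
	return make([]byte, 0, n)
}

func main() {
	fmt.Println(cap(outputBuffer([]byte(`{"a":1}`)))) // 21
	fmt.Println(cap(outputBuffer([]byte("1"))))       // 17
	// With the real package, the buffer would be passed as the second
	// argument, e.g. codec.Encode(text, outputBuffer(text)).
}
```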

func (*Codec) NumberType

func (codec *Codec) NumberType(what string)

NumberType chooses the type of encoding and decoding for JSON numbers. It can be "float64", "int64" or "decimal". The default is "float64".

func (*Codec) SortbyArrayLen

func (codec *Codec) SortbyArrayLen(what bool)

SortbyArrayLen sorts array by length before sorting by array elements. Use `false` to sort only by array elements. Default is `true`.

func (*Codec) SortbyPropertyLen

func (codec *Codec) SortbyPropertyLen(what bool)

SortbyPropertyLen sorts property by length before sorting by property items. Use `false` to sort only by property items. Default is `true`.

func (*Codec) UseMissing

func (codec *Codec) UseMissing(what bool)

UseMissing makes the codec interpret the special string MissingLiteral and encode it as TypeMissing. Default is `true`.

type Length

type Length int64

Length is an internal type used for prefixing length of arrays and properties.

type Missing

type Missing string

Missing denotes a special type for an item that evaluates to _nothing_.

func (Missing) Equal

func (m Missing) Equal(n string) bool

Equal checks whether n is MissingLiteral.
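
A minimal sketch of how such a check might behave, with the type and constant redeclared locally. The body of `Equal` is an assumption (a plain string comparison against the receiver); the package's actual implementation is not shown on this page.

```go
package main

import "fmt"

// Redeclared locally for illustration.
type Missing string

const MissingLiteral = Missing("~[]{}falsenilNA~")

// Equal is assumed to compare n with the receiver's string value.
func (m Missing) Equal(n string) bool {
	return string(m) == n
}

func main() {
	fmt.Println(MissingLiteral.Equal("~[]{}falsenilNA~")) // true
	fmt.Println(MissingLiteral.Equal("null"))             // false
}
```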

Directories

Path Synopsis
tools
