cmpbin

package

v0.0.0-...-30d81ab Latest Latest Go to latest Published: Apr 26, 2024 License: Apache-2.0 Imports: 5 Imported by: 17

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

chromium.googlesource.com/infra/luci/luci-go

Links

Open Source Insights

Documentation ¶

Overview ¶

Package cmpbin provides binary serialization routines which ensure that the serialized objects maintain the same sort order of the original inputs when sorted bytewise (i.e. with memcmp). Additionally, serialized objects are concatenatable, and the concatenated items will behave as if they're compared field-to-field. So, for example, comparing each string in a []string would compare the same way as comparing the concatenation of those strings encoded with cmpbin. Simply concatenating the strings without encoding them will NOT retain this property, as you could not distinguish []string{"a", "aa"} from []string{"aa", "a"}. With cmpbin, these two would unambiguously sort as ("a", "aa") < ("aa", "a").

Notes on particular serialization schemes:

- Numbers: The number encoding is less efficient on average than Varint ("encoding/binary") for small numbers (it has a minimum encoded size of 2 bytes), but is more efficient for large numbers (it has a maximum encoded size of 9 bytes for a 64 bit int, unlike the largest Varint which has a 10b representation).

Both signed and unsigned numbers are encoded with the same scheme, and will sort together as signed numbers. Decoding with the incorrect routine will result in an ErrOverflow/ErrUnderflow error if the actual value is out of range.

The scheme works like:

given an 2's compliment value V
extract the sign (S) and magnitude (M) of V
Find the position of the highest bit (P), minus 1.
write (bits):
SPPPPPPP MMMMMMMM MM000000
S is 1
P's are the log2(M)-1
M's are the magnitude of V
0's are padding
Additionally, if the number is negative, invert the bits of all the bytes (e.g. XOR 0xFF). This makes the sign bit S 0 for negative numbers, and makes the ordering of the numbers correct when compared bytewise.

- Strings/[]byte Each byte in the encoded stream reserves the least significant bit as a stop bit (1 means that the string continues, 0 means that the string ends). The actual user data is shifted into the top 7 bits of every encoded byte. This results in a data inflation rate of 12.5%, but this overhead is constant (doesn't vary by the encoded content). Note that if space efficiency is very important and you are storing large strings on average, you could reduce the overhead by only placing the stop bit on every other byte or every 4th byte, etc. This would reduce the overhead to 6.25% or 3.125% accordingly (but would cause every string to round out to 2 or 4 byte chunks), and it would make the algorithm implementation more complex. The current implementation was chosen as good enough in light of the fact that pre-compressing regular data could save more than 12.5% overall, and that for incompressable data a commonly used encoding scheme (base64) has a full 25% overhead (and a generally more complex implementation).

- Floats Floats are tricky (really tricky) because they have lots of weird non-sortable special cases (like NaN). That said, for the majority of non-weird cases, the implementation here will sort real numbers the way that you would expect.

The implementation is derived from http://stereopsis.com/radix.html, and full credit for the original algorithm goes to Michael Herf. The algorithm is essentially:

if the number is positive, flip the top bit
if the number is negative, flip all the bits

Floats are not varint encoded, you could varint encode the mantissa (significand). This is only a 52 bit section, meaning that it is normally encoded with 6.5 bytes (a nybble is stolen from the second exponent byte). Assuming you used the numerical encoding above, shifted left by 4 bits, discarding the sign bit (since its already the MSb on the float, and then using 6 bits (instead of 7) to represent the number of significant bits in the mantissa (since there are only a maximum of 52), you could expect to see small-mantissa floats (of any characteristic) encoded in 3 bytes (this has 6 bits of mantissa), and the largest floats would have an encoded size of 9 bytes (with 2 wasted bits). However the implementation complexity would be higher.

The actual encoded values for special cases are (sorted high to low):

QNaN - 0xFFF8000000000000 // note that golang doesn't seem to actually have SNaN?
SNaN - 0xFFF0000000000001
+inf - 0xFFF0000000000000
MaxFloat64 - 0xFFEFFFFFFFFFFFFF
SmallestNonzeroFloat64 - 0x8000000000000001
0 - 0x8000000000000000
-0 - 0x7FFFFFFFFFFFFFFF
-SmallestNonzeroFloat64 - 0x7FFFFFFFFFFFFFFE
-MaxFloat64 - 0x0010000000000000
-inf - 0x000FFFFFFFFFFFFF

Index ¶

Constants
Variables
func ConcatBytes(itms ...[]byte) []byte
func IncrementBytes(bstr []byte) ([]byte, bool)
func InvertBytes(bs []byte) []byte
func ReadBytes(buf io.ByteReader) (ret []byte, n int, err error)
func ReadFloat64(buf io.Reader) (ret float64, n int, err error)
func ReadInt(r io.ByteReader) (ret int64, n int, err error)
func ReadString(buf io.ByteReader) (ret string, n int, err error)
func ReadUint(r io.ByteReader) (mag uint64, n int, err error)
func WriteBytes(buf io.ByteWriter, data []byte) (n int, err error)
func WriteFloat64(buf io.Writer, v float64) (n int, err error)
func WriteInt(w io.ByteWriter, val int64) (int, error)
func WriteString(buf io.ByteWriter, s string) (n int, err error)
func WriteUint(w io.ByteWriter, mag uint64) (int, error)
type InvertibleBytesBuffer
- func Invertible(b WriteableBytesBuffer) InvertibleBytesBuffer
type ReadableBytesBuffer
type WriteableBytesBuffer

Constants ¶

View Source

const (
	MaxIntLen16 = 3
	MaxIntLen32 = 5
	MaxIntLen64 = 9
)

MaxIntLenN is the maximum length of a cmpbin-encoded N-bit integer (signed or unsigned).

Variables ¶

View Source

var ErrByteLimitExceeded = errors.New("cmbpin: too big! tried to read > cmpbin.ReadByteLimit")

ErrByteLimitExceeded is returned from ReadBytes and ReadString when they attempt to read more than ReadByteLimit bytes.

View Source

var ErrOverflow = errors.New("cmpbin: varint overflows")

ErrOverflow is returned when reading an number which is too large for the destination type (either uint64 or int64)

View Source

var ErrUnderflow = errors.New("cmpbin: uvarint underflows")

ErrUnderflow is returned when reading an number which is too small for the destination type (either uint64 or int64)

View Source

var ReadByteLimit = int(math.Ceil(2 * 1024 * 1024 * 8 / 7))

ReadByteLimit is the limit of how many bytes ReadBytes and ReadString are willing to deserialize before returning ErrByteLimitExceeded. It is currently set to allow 2MB of user data (taking encoding size overhead into account).

Functions ¶

func ConcatBytes ¶

func ConcatBytes(itms ...[]byte) []byte

ConcatBytes is a convenience invocation of bytes.Join(itms, nil)

func IncrementBytes ¶

func IncrementBytes(bstr []byte) ([]byte, bool)

IncrementBytes attempts to increment a copy of bstr as if adding 1 to an integer.

If it overflows, the returned []byte will be all 0's, and the overflow bool will be true.

func InvertBytes ¶

func InvertBytes(bs []byte) []byte

InvertBytes simply inverts all the bytes in bs.

func ReadBytes ¶

func ReadBytes(buf io.ByteReader) (ret []byte, n int, err error)

ReadBytes reads an encoded []byte from buf, returning the number of bytes read, and any read error encountered.

func ReadFloat64 ¶

func ReadFloat64(buf io.Reader) (ret float64, n int, err error)

ReadFloat64 reads a memcmp-sortable float from buf (as written by WriteFloat64). It also returns the number of bytes read (8, unless an error occurs), and any read error encountered.

func ReadInt ¶

func ReadInt(r io.ByteReader) (ret int64, n int, err error)

ReadInt decodes a cmpbin-encoded number from a ByteReader. It returns the decoded value and the number of bytes read. The error may be Err{Over,Under}flow if the number is out of bounds. It may also return an error if the ByteReader returns an error.

func ReadString ¶

func ReadString(buf io.ByteReader) (ret string, n int, err error)

ReadString reads an encoded string from buf, returning the number of bytes read, and any read error encountered.

func ReadUint ¶

func ReadUint(r io.ByteReader) (mag uint64, n int, err error)

ReadUint decodes a cmpbin-encoded positive number from a ByteReader. It returns the decoded value and the number of bytes read. The error may be Err{Over,Under}flow if the number is out of bounds. It may also return an error if the ByteReader returns an error.

func WriteBytes ¶

func WriteBytes(buf io.ByteWriter, data []byte) (n int, err error)

WriteBytes writes an encoded []byte to buf, returning the number of bytes written, and any write error encountered.

func WriteFloat64 ¶

func WriteFloat64(buf io.Writer, v float64) (n int, err error)

WriteFloat64 writes a memcmp-sortable float to buf. It returns the number of bytes written (8, unless an error occurs), and any write error encountered.

func WriteInt ¶

func WriteInt(w io.ByteWriter, val int64) (int, error)

WriteInt val as a cmpbin Int to the ByteWriter. Returns the number of bytes written. Only returns an error if the underlying ByteWriter returns an error.

func WriteString ¶

func WriteString(buf io.ByteWriter, s string) (n int, err error)

WriteString writes an encoded string to buf, returning the number of bytes written, and any write error encountered.

func WriteUint ¶

func WriteUint(w io.ByteWriter, mag uint64) (int, error)

WriteUint writes mag to the ByteWriter. Returns the number of bytes written. Only returns an error if the underlying ByteWriter returns an error.

Types ¶

type InvertibleBytesBuffer ¶

type InvertibleBytesBuffer interface {
	WriteableBytesBuffer
	SetInvert(inverted bool)
}

InvertibleBytesBuffer is just like Buffer, except that it also has a stateful SetInvert() method, which will cause all reads and writes to/from it to be inverted (e.g. every byte XOR 0xFF).

In contexts where you need comparable byte sequences, like datastore queries, it requires manipulating the sortable fields (e.g. synthesizing them, parsing them, etc.). In particular, when you have a reverse-sorted field (e.g. high to low instead of low to high), it's achieved by having all the bits inverted.

Serialization formats (like gae/service/datastore.Serialize) can include delimiter information, which the parsers only know to parse non-inverted. If we don't have this Invertible buffer, we'd basically have to invert every byte in the []byte array when we're trying to decode a reverse-ordered field (including the bytes of all fields after the one we intend to parse) so that the parser can consume as many bytes as it needs (and it only knows the number of bytes it needs as it decodes them). This InvertibleBytesBuffer lets that happen on the fly without having to flip the whole underlying []byte.

If you know you need it, you'll know it's the right thing. If you're not sure then you definitely don't need it!

See InvertBytes for doing one-off []byte inversions.

func Invertible ¶

func Invertible(b WriteableBytesBuffer) InvertibleBytesBuffer

Invertible returns an InvertibleBuffer based on the Buffer.

type ReadableBytesBuffer ¶

type ReadableBytesBuffer interface {
	Len() int

	Read([]byte) (int, error)
	ReadByte() (byte, error)
}

ReadableBytesBuffer is the interface which corresponds to the subset of *bytes.Reader required for reading.

type WriteableBytesBuffer ¶

type WriteableBytesBuffer interface {
	ReadableBytesBuffer

	String() string
	Bytes() []byte

	Grow(int)

	Write([]byte) (int, error)
	WriteByte(c byte) error
	WriteString(s string) (int, error)
}

WriteableBytesBuffer is the interface which corresponds to the subset of *bytes.Buffer required for writing.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL