dsio

package
v0.3.0
Warning

This package is not in the latest version of its module.

Published: May 4, 2021 License: MIT Imports: 18 Imported by: 18

README

Performance

2018-12-04

go test github.com/qri-io/dataset/dsio -bench=.

BenchmarkCBORWriterArrays-2    	    3000	    431290 ns/op
BenchmarkCBORWriterObjects-2   	    2000	    698920 ns/op
BenchmarkCBORReader-2          	    1000	   1764549 ns/op
BenchmarkCSVWriterArrays-2     	    1000	   1548509 ns/op
BenchmarkCSVWriterObjects-2    	    1000	   1458219 ns/op
BenchmarkCSVReader-2           	    1000	   2008097 ns/op
BenchmarkJSONWriterArrays-2    	    1000	   1556416 ns/op
BenchmarkJSONWriterObjects-2   	    1000	   1562488 ns/op
BenchmarkJSONReader-2          	     500	   2984057 ns/op

2018-04-17

go test github.com/qri-io/dataset/dsio -bench=.

BenchmarkCBORWriterArrays-2    	    3000	    478424 ns/op
BenchmarkCBORWriterObjects-2   	    2000	    584435 ns/op
BenchmarkCBORReader-2          	     300	   5081171 ns/op
BenchmarkCSVWriterArrays-2     	    1000	   1369984 ns/op
BenchmarkCSVWriterObjects-2    	    1000	   1406440 ns/op
BenchmarkCSVReader-2           	    1000	   1463376 ns/op
BenchmarkJSONWriterArrays-2    	    1000	   1377027 ns/op
BenchmarkJSONWriterObjects-2   	    1000	   1558887 ns/op
BenchmarkJSONReader-2          	     500	   2607946 ns/op

2018-03-29

go test github.com/qri-io/dataset/dsio -bench=.

BenchmarkCBORWriterArrays-2    	    3000	    423851 ns/op
BenchmarkCBORWriterObjects-2   	    2000	    572609 ns/op
BenchmarkCBORReader-2          	     300	   5024830 ns/op
BenchmarkCSVWriterArrays-2     	    1000	   1448891 ns/op
BenchmarkCSVWriterObjects-2    	    1000	   1457973 ns/op
BenchmarkCSVReader-2           	    1000	   1454932 ns/op
BenchmarkJSONWriterArrays-2    	    1000	   1423156 ns/op
BenchmarkJSONWriterObjects-2   	    1000	   1620801 ns/op
BenchmarkJSONReader-2          	     300	   5286851 ns/op

Fuzz testing

From: https://medium.com/@dgryski/go-fuzz-github-com-arolek-ase-3c74d5a3150c

How to fuzz test:

go install github.com/qri-io/dataset/use_generate
cd $GOPATH
mkdir out
bin/use_generate
mkdir -p workdir/corpus
cp $GOPATH/out/* workdir/corpus/.

go get github.com/dvyukov/go-fuzz/go-fuzz
go get github.com/dvyukov/go-fuzz/go-fuzz-build
go install github.com/dvyukov/go-fuzz/go-fuzz
go install github.com/dvyukov/go-fuzz/go-fuzz-build

go-fuzz-build github.com/qri-io/dataset/dsio
go-fuzz -bin=dsio-fuzz.zip -workdir=workdir

Documentation

Overview

Package dsio defines writers & readers for operating on "container" data structures (objects and arrays)

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ColIndexToLetters

func ColIndexToLetters(colRef int) string

ColIndexToLetters converts a zero-based, numeric column identifier into a character code.
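The conversion likely follows the spreadsheet column-naming convention used by XLSX files (0 -> "A", 25 -> "Z", 26 -> "AA"). That mapping is an assumption, since the documentation above does not state it. The sketch below re-implements the convention with the standard library only; `colIndexToLetters` is a hypothetical name, not the package's code.

```go
package main

import "fmt"

// colIndexToLetters is a hypothetical stand-in for dsio.ColIndexToLetters.
// It assumes spreadsheet-style column names: 0 -> "A", 25 -> "Z", 26 -> "AA".
// A negative input yields an empty string because the loop never runs.
func colIndexToLetters(colRef int) string {
	var letters []byte
	// Each pass emits one base-26 "digit". Subtracting 1 after dividing
	// accounts for this naming scheme having no zero digit.
	for n := colRef; n >= 0; n = n/26 - 1 {
		letters = append([]byte{byte('A' + n%26)}, letters...)
	}
	return string(letters)
}

func main() {
	for _, i := range []int{0, 25, 26, 27, 701, 702} {
		fmt.Printf("%d -> %s\n", i, colIndexToLetters(i))
	}
}
```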

func ConvertFile added in v0.3.0

func ConvertFile(file qfs.File, in, out *dataset.Structure, limit, offset int, all bool) (data []byte, err error)

ConvertFile takes an input file and its structure (in), and converts the selection of entries set by limit, offset, and all to the structure specified by out

func Copy

func Copy(reader EntryReader, writer EntryWriter) error

Copy reads all entries from the reader and writes them to the writer
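A minimal sketch of the read-until-end loop that Copy describes, using only the standard library. It assumes readers signal the end of data by returning io.EOF. The `Entry` type mirrors the documented fields, while `entryReader`, `entryWriter`, `sliceReader`, `sliceWriter`, and `copyEntries` are hypothetical stand-ins. Whether dsio.Copy also closes the writer is not stated here, so the sketch leaves that to the caller.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Entry mirrors the fields documented for dsio.Entry.
type Entry struct {
	Index int
	Key   string
	Value interface{}
}

// entryReader and entryWriter are hypothetical minimal interfaces covering
// only the methods the copy loop needs.
type entryReader interface{ ReadEntry() (Entry, error) }
type entryWriter interface{ WriteEntry(Entry) error }

// sliceReader is a hypothetical in-memory reader that returns io.EOF once exhausted.
type sliceReader struct {
	entries []Entry
	pos     int
}

func (r *sliceReader) ReadEntry() (Entry, error) {
	if r.pos >= len(r.entries) {
		return Entry{}, io.EOF
	}
	e := r.entries[r.pos]
	r.pos++
	return e, nil
}

// sliceWriter is a hypothetical in-memory writer that appends every entry.
type sliceWriter struct{ entries []Entry }

func (w *sliceWriter) WriteEntry(e Entry) error {
	w.entries = append(w.entries, e)
	return nil
}

// copyEntries reads until io.EOF, writing each entry as it goes.
// Any other read or write error stops the copy and is returned.
func copyEntries(r entryReader, w entryWriter) error {
	for {
		ent, err := r.ReadEntry()
		if errors.Is(err, io.EOF) {
			return nil
		}
		if err != nil {
			return err
		}
		if err := w.WriteEntry(ent); err != nil {
			return err
		}
	}
}

func main() {
	r := &sliceReader{entries: []Entry{{Index: 0, Value: "a"}, {Index: 1, Value: "b"}}}
	w := &sliceWriter{}
	if err := copyEntries(r, w); err != nil {
		fmt.Println("copy failed:", err)
		return
	}
	fmt.Println("copied", len(w.entries), "entries")
}
```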

func EachEntry

func EachEntry(rr EntryReader, fn DataIteratorFunc) error

EachEntry calls fn on each row of a given EntryReader

func Fuzz

func Fuzz(data []byte) int

Fuzz is the entry point for go-fuzz. It returns 1 for a successful parse and 0 for a failure.
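go-fuzz calls a function with the signature `func Fuzz(data []byte) int` for each generated input, and a return value of 1 asks the fuzzer to favor that input. The sketch below uses encoding/json as a stand-in parser, since what dsio.Fuzz actually parses is not documented here; `package main` and `main` are only there so the example runs on its own.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Fuzz is a hypothetical go-fuzz entry point that uses encoding/json as a
// stand-in parser. It returns 1 when the input parses and 0 otherwise.
func Fuzz(data []byte) int {
	var v interface{}
	if err := json.Unmarshal(data, &v); err != nil {
		return 0
	}
	return 1
}

func main() {
	fmt.Println(Fuzz([]byte(`[1, 2, 3]`)), Fuzz([]byte(`[1, 2`)))
}
```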

func GetTopLevelType

func GetTopLevelType(st *dataset.Structure) (string, error)

GetTopLevelType returns the top-level type of the structure if it is a valid type ("array" or "object"); otherwise it returns an error
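A sketch of the check GetTopLevelType describes, assuming the top-level type is stored under the "type" key of the structure's schema map. `structure` and `getTopLevelType` are hypothetical stand-ins, not the real dataset.Structure definition or the package's implementation.

```go
package main

import (
	"errors"
	"fmt"
)

// structure is a hypothetical stand-in for dataset.Structure. The assumption
// is that the top-level type is stored as Schema["type"].
type structure struct {
	Schema map[string]interface{}
}

// getTopLevelType accepts only "array" or "object" as the top-level type.
func getTopLevelType(st *structure) (string, error) {
	if st == nil {
		return "", errors.New("nil structure")
	}
	// Indexing a nil map is safe in Go and yields a nil value, which fails
	// the type assertion below.
	t, ok := st.Schema["type"].(string)
	if !ok {
		return "", errors.New(`schema has no string "type" field`)
	}
	if t != "array" && t != "object" {
		return "", fmt.Errorf("invalid top-level type %q: must be \"array\" or \"object\"", t)
	}
	return t, nil
}

func main() {
	st := &structure{Schema: map[string]interface{}{"type": "array"}}
	t, err := getTopLevelType(st)
	fmt.Println(t, err)
}
```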

func HasHeaderRow

func HasHeaderRow(st *dataset.Structure) bool

HasHeaderRow checks Structure for the presence of the HeaderRow flag

func ReadAll added in v0.3.0

func ReadAll(r EntryReader) (interface{}, error)

ReadAll consumes an EntryReader, returning its values

func ReadAllArray added in v0.3.0

func ReadAllArray(r EntryReader) ([]interface{}, error)

ReadAllArray consumes an EntryReader with an "array" top level type, returning a []interface{} of values

func ReadAllObject added in v0.3.0

func ReadAllObject(r EntryReader) (map[string]interface{}, error)

ReadAllObject consumes an EntryReader with an "object" top level type, returning a map[string]interface{} of values
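ReadAllArray and ReadAllObject differ mainly in how entries are collected: the array form keeps values in read order, and the object form stores each value under its Entry.Key. The sketch below assumes readers end with io.EOF; `readAllArray`, `readAllObject`, `entryReader`, and `sliceReader` are hypothetical stand-ins that use only the standard library.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// Entry mirrors the fields documented for dsio.Entry.
type Entry struct {
	Index int
	Key   string
	Value interface{}
}

type entryReader interface{ ReadEntry() (Entry, error) }

// sliceReader is a hypothetical in-memory reader that returns io.EOF once exhausted.
type sliceReader struct {
	entries []Entry
	pos     int
}

func (r *sliceReader) ReadEntry() (Entry, error) {
	if r.pos >= len(r.entries) {
		return Entry{}, io.EOF
	}
	e := r.entries[r.pos]
	r.pos++
	return e, nil
}

// readAllArray collects entry values in the order they are read.
func readAllArray(r entryReader) ([]interface{}, error) {
	out := []interface{}{}
	for {
		ent, err := r.ReadEntry()
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out = append(out, ent.Value)
	}
}

// readAllObject collects entry values under each entry's Key.
func readAllObject(r entryReader) (map[string]interface{}, error) {
	out := map[string]interface{}{}
	for {
		ent, err := r.ReadEntry()
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		out[ent.Key] = ent.Value
	}
}

func main() {
	arr, _ := readAllArray(&sliceReader{entries: []Entry{{Index: 0, Value: "a"}, {Index: 1, Value: "b"}}})
	obj, _ := readAllObject(&sliceReader{entries: []Entry{{Key: "x", Value: 1}, {Key: "y", Value: 2}}})
	fmt.Println(arr, obj)
}
```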

Types

type CBORReader

type CBORReader struct {
	// contains filtered or unexported fields
}

CBORReader implements the EntryReader interface for the CBOR data format

func NewCBORReader

func NewCBORReader(st *dataset.Structure, r io.Reader) (*CBORReader, error)

NewCBORReader creates a reader from a structure and read source

func (*CBORReader) Close

func (r *CBORReader) Close() error

Close finalizes the reader

func (*CBORReader) ReadEntry

func (r *CBORReader) ReadEntry() (ent Entry, err error)

ReadEntry reads one CBOR record from the reader

func (*CBORReader) Structure

func (r *CBORReader) Structure() *dataset.Structure

Structure gives this reader's structure

type CBORWriter

type CBORWriter struct {
	// contains filtered or unexported fields
}

CBORWriter implements the EntryWriter interface for CBOR-formatted data

func NewCBORWriter

func NewCBORWriter(st *dataset.Structure, w io.Writer) (*CBORWriter, error)

NewCBORWriter creates a Writer from a structure and write destination

func (*CBORWriter) Close

func (w *CBORWriter) Close() error

Close finalizes the writer, indicating no more records will be written

func (*CBORWriter) Structure

func (w *CBORWriter) Structure() *dataset.Structure

Structure gives this writer's structure

func (*CBORWriter) WriteEntry

func (w *CBORWriter) WriteEntry(ent Entry) error

WriteEntry writes one CBOR record to the writer

type CSVReader

type CSVReader struct {
	// contains filtered or unexported fields
}

CSVReader implements the EntryReader interface for the CSV data format

func NewCSVReader

func NewCSVReader(st *dataset.Structure, r io.Reader) (*CSVReader, error)

NewCSVReader creates a reader from a structure and read source

func NewCSVReaderSize added in v0.2.0

func NewCSVReaderSize(st *dataset.Structure, r io.Reader, size int) (*CSVReader, error)

NewCSVReaderSize creates a reader from a structure, read source, and buffer size

func (*CSVReader) Close

func (r *CSVReader) Close() error

Close finalizes the reader

func (*CSVReader) ReadEntry

func (r *CSVReader) ReadEntry() (Entry, error)

ReadEntry reads one CSV record from the reader

func (*CSVReader) Structure

func (r *CSVReader) Structure() *dataset.Structure

Structure gives this reader's structure

type CSVWriter

type CSVWriter struct {
	// contains filtered or unexported fields
}

CSVWriter implements the EntryWriter interface for CSV-formatted data

func NewCSVWriter

func NewCSVWriter(st *dataset.Structure, w io.Writer) (*CSVWriter, error)

NewCSVWriter creates a Writer from a structure and write destination

func (*CSVWriter) Close

func (w *CSVWriter) Close() error

Close finalizes the writer, indicating no more records will be written

func (*CSVWriter) Structure

func (w *CSVWriter) Structure() *dataset.Structure

Structure gives this writer's structure

func (*CSVWriter) WriteEntry

func (w *CSVWriter) WriteEntry(ent Entry) error

WriteEntry writes one CSV record to the writer

type DataIteratorFunc

type DataIteratorFunc func(int, Entry, error) error

DataIteratorFunc is a function called for each "row" of a resource's raw data

type Entry

type Entry struct {
	// Index represents this entry's numeric position in a dataset.
	// This index may not refer to the overall position within the dataset,
	// as things like offsets affect where the index begins.
	Index int
	// Key is a string key for this entry,
	// only present when the top-level structure is a map.
	Key string
	// Value is the information contained within the row.
	Value interface{}
}

Entry is a "row" of a dataset

type EntryBuffer

type EntryBuffer struct {
	// contains filtered or unexported fields
}

EntryBuffer mimics the behavior of bytes.Buffer, but with structured data: Read and Write are replaced with ReadEntry and WriteEntry. Different data formats have idiosyncrasies that affect the behavior of buffers and their output. For example, EntryBuffer won't write things like CSV header rows or enclosing JSON arrays until after the writer's Close method has been called.

func NewEntryBuffer

func NewEntryBuffer(st *dataset.Structure) (*EntryBuffer, error)

NewEntryBuffer allocates a buffer. Buffers should always be created with NewEntryBuffer, which returns an error if the provided structure is invalid for reading or writing

func (*EntryBuffer) Bytes

func (b *EntryBuffer) Bytes() []byte

Bytes gives the raw contents of the underlying buffer

func (*EntryBuffer) Close

func (b *EntryBuffer) Close() error

Close closes the writer portion of the buffer, which will affect underlying contents.

func (*EntryBuffer) ReadEntry

func (b *EntryBuffer) ReadEntry() (Entry, error)

ReadEntry reads one "row" from the buffer

func (*EntryBuffer) Structure

func (b *EntryBuffer) Structure() *dataset.Structure

Structure gives the underlying structure this buffer is using

func (*EntryBuffer) WriteEntry

func (b *EntryBuffer) WriteEntry(e Entry) error

WriteEntry writes one "row" to the buffer

type EntryReadWriter

type EntryReadWriter interface {
	// Structure gives the structure being read and written
	Structure() *dataset.Structure
	// ReadEntry reads one row of structured data from the reader
	ReadEntry() (Entry, error)
	// WriteEntry writes one row of structured data to the ReadWriter
	WriteEntry(Entry) error
	// Close finalizes the ReadWriter, indicating all entries
	// have been written
	Close() error
	// Bytes gives the raw contents of the ReadWriter
	Bytes() []byte
}

EntryReadWriter combines EntryWriter and EntryReader behaviors

type EntryReader

type EntryReader interface {
	// Structure gives the structure being read
	Structure() *dataset.Structure
	// ReadEntry reads one row of structured data from the reader
	ReadEntry() (Entry, error)
	// Close finalizes the Reader
	Close() error
}

EntryReader is a generalized interface for reading ordered structured data

func NewEntryReader

func NewEntryReader(st *dataset.Structure, r io.Reader) (EntryReader, error)

NewEntryReader allocates an EntryReader based on a given structure

type EntryWriter

type EntryWriter interface {
	// Structure gives the structure being written
	Structure() *dataset.Structure
	// WriteEntry writes one "row" of structured data to the Writer
	WriteEntry(Entry) error
	// Close finalizes the writer, indicating all entries
	// have been written
	Close() error
}

EntryWriter is a generalized interface for writing structured data

func NewEntryWriter

func NewEntryWriter(st *dataset.Structure, w io.Writer) (EntryWriter, error)

NewEntryWriter allocates an EntryWriter based on a given structure

type IdentityReader

type IdentityReader struct {
	// contains filtered or unexported fields
}

IdentityReader is a dsio.EntryReader that works with native Go types

func NewIdentityReader

func NewIdentityReader(st *dataset.Structure, data interface{}) (*IdentityReader, error)

NewIdentityReader creates an EntryReader from native Go types. The data passed in must be of type []interface{} or map[string]interface{}

func (*IdentityReader) Close

func (r *IdentityReader) Close() error

Close finalizes the reader

func (*IdentityReader) ReadEntry

func (r *IdentityReader) ReadEntry() (Entry, error)

ReadEntry reads one row of structured data from the reader

func (*IdentityReader) Structure

func (r *IdentityReader) Structure() *dataset.Structure

Structure gives the structure being read

type IdentityWriter

type IdentityWriter struct {
	// contains filtered or unexported fields
}

IdentityWriter is a dsio.EntryWriter that works with native Go types

func (*IdentityWriter) Close

func (w *IdentityWriter) Close() error

Close finalizes the writer, indicating all entries have been written

func (*IdentityWriter) Structure

func (w *IdentityWriter) Structure() *dataset.Structure

Structure gives the structure being written

func (*IdentityWriter) WriteEntry

func (w *IdentityWriter) WriteEntry(e Entry) error

WriteEntry writes one "row" of structured data to the Writer

type JSONReader

type JSONReader struct {
	// contains filtered or unexported fields
}

JSONReader implements the EntryReader interface for the JSON data format

func NewJSONReader

func NewJSONReader(st *dataset.Structure, r io.Reader) (*JSONReader, error)

NewJSONReader creates a reader from a structure and read source

func NewJSONReaderSize

func NewJSONReaderSize(st *dataset.Structure, r io.Reader, size int) (*JSONReader, error)

NewJSONReaderSize creates a reader from a structure, read source, and buffer size

func (*JSONReader) Close

func (r *JSONReader) Close() error

Close finalizes the reader

func (*JSONReader) ReadEntry

func (r *JSONReader) ReadEntry() (Entry, error)

ReadEntry reads one JSON record from the reader

func (*JSONReader) Structure

func (r *JSONReader) Structure() *dataset.Structure

Structure gives this reader's structure

type JSONWriter

type JSONWriter struct {
	// contains filtered or unexported fields
}

JSONWriter implements the EntryWriter interface for JSON-formatted data

func NewJSONPrettyWriter added in v0.1.4

func NewJSONPrettyWriter(st *dataset.Structure, w io.Writer, indent string) (*JSONWriter, error)

NewJSONPrettyWriter creates a Writer that writes pretty indented JSON

func NewJSONWriter

func NewJSONWriter(st *dataset.Structure, w io.Writer) (*JSONWriter, error)

NewJSONWriter creates a Writer from a structure and write destination

func (*JSONWriter) Close

func (w *JSONWriter) Close() error

Close finalizes the writer, indicating no more records will be written

func (*JSONWriter) Structure

func (w *JSONWriter) Structure() *dataset.Structure

Structure gives this writer's structure

func (*JSONWriter) WriteEntry

func (w *JSONWriter) WriteEntry(ent Entry) error

WriteEntry writes one JSON record to the writer

type PagedReader

type PagedReader struct {
	Reader EntryReader
	Limit  int
	Offset int
}

PagedReader wraps a reader, skipping the first Offset entries and then reading at most Limit entries

func (*PagedReader) Close

func (r *PagedReader) Close() error

Close finalizes the reader, indicating no more records will be read

func (*PagedReader) ReadEntry

func (r *PagedReader) ReadEntry() (Entry, error)

ReadEntry returns an entry, taking offset and limit into account

func (*PagedReader) Structure

func (r *PagedReader) Structure() *dataset.Structure

Structure returns the wrapped reader's structure

type TrackedReader

type TrackedReader struct {
	// contains filtered or unexported fields
}

TrackedReader wraps a reader, keeping an internal count of the bytes read

func NewTrackedReader

func NewTrackedReader(r io.Reader) *TrackedReader

NewTrackedReader creates a new tracked reader

func (*TrackedReader) BytesRead

func (tr *TrackedReader) BytesRead() int

BytesRead gives the total number of bytes read from the underlying reader

func (*TrackedReader) Read

func (tr *TrackedReader) Read(p []byte) (n int, err error)

Read implements the io.Reader interface

type XLSXReader

type XLSXReader struct {
	// contains filtered or unexported fields
}

XLSXReader implements the EntryReader interface for the XLSX data format

func NewXLSXReader

func NewXLSXReader(st *dataset.Structure, r io.Reader) (*XLSXReader, error)

NewXLSXReader creates a reader from a structure and read source

func (*XLSXReader) Close

func (r *XLSXReader) Close() error

Close finalizes the reader, indicating no more records will be read

func (*XLSXReader) ReadEntry

func (r *XLSXReader) ReadEntry() (Entry, error)

ReadEntry reads one XLSX record from the reader

func (*XLSXReader) Structure

func (r *XLSXReader) Structure() *dataset.Structure

Structure gives this reader's structure

type XLSXWriter

type XLSXWriter struct {
	// contains filtered or unexported fields
}

XLSXWriter implements the EntryWriter interface for XLSX-formatted data

func NewXLSXWriter

func NewXLSXWriter(st *dataset.Structure, w io.Writer) (*XLSXWriter, error)

NewXLSXWriter creates a Writer from a structure and write destination

func (*XLSXWriter) Close

func (w *XLSXWriter) Close() error

Close finalizes the writer, indicating no more records will be written

func (*XLSXWriter) Structure

func (w *XLSXWriter) Structure() *dataset.Structure

Structure gives this writer's structure

func (*XLSXWriter) WriteEntry

func (w *XLSXWriter) WriteEntry(ent Entry) error

WriteEntry writes one XLSX record to the writer

Directories

Path Synopsis
Package replacecr defines a wrapper for replacing solo carriage return characters (\r) with carriage-return + line feed (\r\n)
