Documentation ¶
Overview ¶
Package jsonl parses JSON Lines DataSources. This parser uses https://github.com/tidwall/gjson to process data, and supports Schema column names formatted as gjson paths.
Functions ¶
func ParseJSONRow ¶
func ParseJSONRow(accessors []sif.ColumnAccessor, prefixes []*string, valueHandlers []JSONValueHandler, rowJSON gjson.Result, row sif.Row) error
ParseJSONRow parses a single JSON value (a gjson.Result) into a Row, according to a schema.
Types ¶
type JSONValueHandler ¶
JSONValueHandler is a function which converts a gjson.Result into a sif ColumnType value.
func BuildJSONValueHandlers ¶
func BuildJSONValueHandlers(schema sif.Schema) ([]JSONValueHandler, error)
BuildJSONValueHandlers produces a sequence of functions which matches the sequence of columns in schema. Each function is a precomputed handler capable of converting the JSON data in that column into data for the appropriate Sif ColumnType. Precomputing the handler for each column from the schema avoids a costly type switch for every column of every row.
type Parser ¶
type Parser struct {
// contains filtered or unexported fields
}
Parser produces partitions from JSONL data.
func CreateParser ¶
func CreateParser(conf *ParserConf) *Parser
CreateParser returns a new JSONL Parser. Columns are parsed lazily from each row of JSON using their column name, which should be a gjson path. Values within the JSON which do not correspond to a Schema column are ignored.
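To illustrate how column names act as paths into each JSON row, here is a rough standard-library sketch. It is not the Parser's implementation: `lookup` and `extractColumns` are hypothetical helpers, the path syntax is limited to dot-separated object keys (far less than gjson supports), and the whole row is decoded eagerly rather than lazily.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// lookup walks a dot-separated path through nested JSON objects.
// It is a simplified stand-in for a gjson path query: it handles
// object keys only, with no arrays, wildcards, or modifiers.
func lookup(doc map[string]any, path string) (any, bool) {
	var cur any = doc
	for _, key := range strings.Split(path, ".") {
		obj, ok := cur.(map[string]any)
		if !ok {
			return nil, false
		}
		cur, ok = obj[key]
		if !ok {
			return nil, false
		}
	}
	return cur, true
}

// extractColumns decodes one JSON line and returns only the values whose
// paths appear in columns. Everything else in the line is ignored, and
// columns that are absent from the line are left out of the result.
func extractColumns(line string, columns []string) (map[string]any, error) {
	var doc map[string]any
	if err := json.Unmarshal([]byte(line), &doc); err != nil {
		return nil, err
	}
	row := make(map[string]any, len(columns))
	for _, col := range columns {
		if v, ok := lookup(doc, col); ok {
			row[col] = v
		}
	}
	return row, nil
}

func main() {
	line := `{"user":{"name":"ada","age":36},"extra":true}`
	row, err := extractColumns(line, []string{"user.name", "user.age"})
	if err != nil {
		panic(err)
	}
	// "extra" is not a schema column, so it does not appear in row.
	fmt.Println(row)
}
```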
func (*Parser) Parse ¶
func (p *Parser) Parse(r io.Reader, source sif.DataSource, schema sif.Schema, onIteratorEnd func()) (sif.PartitionIterator, error)
Parse parses JSONL data to produce Partitions.
func (*Parser) PartitionSize ¶
PartitionSize returns the maximum size, in rows, of Partitions produced by this Parser.
type ParserConf ¶
type ParserConf struct {
	PartitionSize            int       // The maximum number of rows per Partition. Defaults to 128.
	HeaderLines              int       // The number of lines to ignore from the beginning of each file. Defaults to 0.
	Comment                  rune      // Lines beginning with the comment character are ignored. Cannot be equal to the Delimiter. Defaults to no comment character.
	MaxBufferSize            int       // Maximum size in bytes of the buffer used to read lines from the file
	ColumnNameSearchPrefixes []*string // An optional JSON path prefix to prepend to each column name when searching for that column
}
ParserConf configures a JSONL Parser, suitable for JSON Lines data.