jsonl

package jsonl, version v0.0.0-...-79c606f
Warning: This package is not in the latest version of its module.

Published: Jun 11, 2022 | License: Apache-2.0 | Imports: 10 | Imported by: 1

Documentation

Overview

Package jsonl parses JSON Lines DataSources. This parser uses https://github.com/tidwall/gjson to process data, and supports Schema column names formatted as gjson paths.
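Here is what "column names formatted as gjson paths" means in practice. A path such as `user.name` selects a nested object key, and `user.tags.1` selects the second element of an array. The following is a hypothetical, stdlib-only sketch, not gjson itself. `lookupPath` handles only dotted keys and numeric array indices, with no escapes, wildcards or modifiers. Unlike gjson, it decodes the whole line with encoding/json instead of scanning it lazily.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
	"strings"
)

// lookupPath is a hypothetical, simplified stand-in for gjson.Get: it resolves
// a dot-separated path (object keys, or numeric indices into arrays) against one
// line of JSON. It does not support gjson's escapes, wildcards, or modifiers.
func lookupPath(line, path string) (interface{}, bool) {
	var cur interface{}
	if err := json.Unmarshal([]byte(line), &cur); err != nil {
		return nil, false
	}
	for _, part := range strings.Split(path, ".") {
		switch node := cur.(type) {
		case map[string]interface{}:
			v, ok := node[part]
			if !ok {
				return nil, false
			}
			cur = v
		case []interface{}:
			i, err := strconv.Atoi(part)
			if err != nil || i < 0 || i >= len(node) {
				return nil, false
			}
			cur = node[i]
		default:
			// The path continues past a scalar value, so there is no match.
			return nil, false
		}
	}
	return cur, true
}

func main() {
	line := `{"user":{"name":"ada","tags":["x","y"]},"ignored":true}`
	// Only the column paths are looked up; "ignored" is decoded by
	// json.Unmarshal but never read.
	for _, col := range []string{"user.name", "user.tags.1", "user.age"} {
		v, ok := lookupPath(line, col)
		fmt.Printf("%s -> %v (found=%v)\n", col, v, ok)
	}
}
```

No column path refers to the `ignored` field, so it is never read. This mirrors the documented behavior that JSON values without a corresponding Schema column are ignored.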

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func ParseJSONRow

func ParseJSONRow(accessors []sif.ColumnAccessor, prefixes []*string, valueHandlers []JSONValueHandler, rowJSON gjson.Result, row sif.Row) error

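ParseJSONRow parses a single JSON row (rowJSON, already parsed as a gjson.Result) into row, according to a schema represented by its column accessors and value handlers. prefixes holds the optional column-name search prefixes.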
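The sketch below shows one possible role for the `prefixes` argument. Everything in it is an assumption made for illustration. Rows are plain maps, and `lookupFunc` stands in for a gjson lookup. A nil prefix means the bare column name, and prefixes are tried in order, with the first existing path winning. Prefix strings carry their own trailing `.`, and a column found under no prefix is stored as null. The real ParseJSONRow may resolve prefixes differently.

```go
package main

import (
	"fmt"
)

// lookupFunc reports whether a path exists in the current JSON row and returns
// its value. In the real package gjson plays this role; this type is hypothetical.
type lookupFunc func(path string) (interface{}, bool)

// handler converts a raw value and stores it in row (a stand-in for JSONValueHandler).
type handler func(v interface{}, row map[string]interface{}, column string) error

// resolveColumn tries each search prefix in order and returns the value of the
// first prefixed path that exists. A nil prefix means "use the bare column name".
// ASSUMPTION: first-match-wins ordering; the real ParseJSONRow may differ.
func resolveColumn(lookup lookupFunc, prefixes []*string, column string) (interface{}, bool) {
	if len(prefixes) == 0 {
		return lookup(column)
	}
	for _, p := range prefixes {
		path := column
		if p != nil {
			path = *p + column
		}
		if v, ok := lookup(path); ok {
			return v, true
		}
	}
	return nil, false
}

// parseRow fills row column by column using the precomputed handlers.
func parseRow(lookup lookupFunc, prefixes []*string, columns []string, handlers []handler, row map[string]interface{}) error {
	for i, col := range columns {
		v, ok := resolveColumn(lookup, prefixes, col)
		if !ok {
			row[col] = nil // ASSUMPTION: missing values become null
			continue
		}
		if err := handlers[i](v, row, col); err != nil {
			return fmt.Errorf("column %q: %w", col, err)
		}
	}
	return nil
}

func main() {
	flat := map[string]interface{}{"id": 7.0, "payload.name": "ada"}
	lookup := func(p string) (interface{}, bool) { v, ok := flat[p]; return v, ok }
	payload := "payload."
	prefixes := []*string{nil, &payload} // try the bare name first, then "payload."
	store := func(v interface{}, row map[string]interface{}, c string) error { row[c] = v; return nil }
	row := map[string]interface{}{}
	err := parseRow(lookup, prefixes, []string{"id", "name", "missing"}, []handler{store, store, store}, row)
	fmt.Println(row, err)
}
```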

Types

type JSONValueHandler

type JSONValueHandler func(gjson.Result, sif.ColumnAccessor, sif.Row) error

JSONValueHandler is a function which converts a gjson.Result into a value of a Sif ColumnType and stores it in the given Row using the given ColumnAccessor.

func BuildJSONValueHandlers

func BuildJSONValueHandlers(schema sif.Schema) ([]JSONValueHandler, error)

BuildJSONValueHandlers produces a sequence of functions matching the sequence of columns in schema. Each function is a precomputed handler that converts the JSON data for its column into data of the appropriate Sif ColumnType. Deciding once, from the schema, which accessor type to cast to avoids a costly type switch for every column of every row.
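The following is a minimal sketch of the precomputation idea. It uses hypothetical local types (`ColumnType`, `valueHandler`) and map-based rows in place of Sif's real ones. The type switch runs once per column inside `buildHandlers`, and the per-row loop only indexes into the resulting slice.

```go
package main

import (
	"fmt"
)

// ColumnType is a hypothetical stand-in for Sif's column types.
type ColumnType int

const (
	Int64Type ColumnType = iota
	StringType
)

// valueHandler mirrors the shape of JSONValueHandler: convert one decoded JSON
// value and store it in the row. Rows here are plain maps (an assumption).
type valueHandler func(v interface{}, row map[string]interface{}, col string) error

// buildHandlers does the type switch once per column, at setup time, so the
// per-row loop only calls handlers[i] without re-inspecting the schema.
func buildHandlers(types []ColumnType) ([]valueHandler, error) {
	handlers := make([]valueHandler, len(types))
	for i, t := range types {
		switch t {
		case Int64Type:
			handlers[i] = func(v interface{}, row map[string]interface{}, col string) error {
				f, ok := v.(float64) // encoding/json decodes numbers as float64
				if !ok {
					return fmt.Errorf("%s: expected number, got %T", col, v)
				}
				row[col] = int64(f)
				return nil
			}
		case StringType:
			handlers[i] = func(v interface{}, row map[string]interface{}, col string) error {
				s, ok := v.(string)
				if !ok {
					return fmt.Errorf("%s: expected string, got %T", col, v)
				}
				row[col] = s
				return nil
			}
		default:
			return nil, fmt.Errorf("unsupported column type %d", t)
		}
	}
	return handlers, nil
}

func main() {
	cols := []string{"id", "name"}
	handlers, err := buildHandlers([]ColumnType{Int64Type, StringType})
	if err != nil {
		panic(err)
	}
	rows := [][]interface{}{{1.0, "ada"}, {2.0, "grace"}}
	for _, values := range rows {
		row := map[string]interface{}{}
		for i, v := range values {
			// No type switch here: the handler for column i was chosen up front.
			if err := handlers[i](v, row, cols[i]); err != nil {
				panic(err)
			}
		}
		fmt.Println(row)
	}
}
```

The trade-off is a slice of closures built once per schema in exchange for no per-value branching on column type.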

type Parser

type Parser struct {
	// contains filtered or unexported fields
}

Parser produces Partitions from JSONL data.

func CreateParser

func CreateParser(conf *ParserConf) *Parser

CreateParser returns a new JSONL Parser. Columns are parsed lazily from each row of JSON using their column name, which should be a gjson path. Values within the JSON which do not correspond to a Schema column are ignored.

func (*Parser) Parse

func (p *Parser) Parse(r io.Reader, source sif.DataSource, schema sif.Schema, onIteratorEnd func()) (sif.PartitionIterator, error)

Parse parses JSONL data read from r to produce Partitions.

func (*Parser) PartitionSize

func (p *Parser) PartitionSize() int

PartitionSize returns the maximum size, in rows, of Partitions produced by this Parser.

type ParserConf

type ParserConf struct {
	PartitionSize            int       // The maximum number of rows per Partition. Defaults to 128.
	HeaderLines              int       // The number of lines to ignore from the beginning of each file. Defaults to 0.
	Comment                  rune      // Lines beginning with the comment character are ignored. Defaults to no comment character.
	MaxBufferSize            int       // The maximum size in bytes of the buffer used to read lines from the file.
	ColumnNameSearchPrefixes []*string // Optional JSON path prefixes to prepend to each column name when searching for that column.
}

ParserConf configures a JSONL Parser.
