parquet

Version: v0.0.0-...-040724e

Warning: This package is not in the latest version of its module.

Published: Jun 6, 2019 | License: BSD-3-Clause, GPL-2.0, BSD-3-Clause, + 1 more | Imports: 16 | Imported by: 2

Documentation

Overview

Package parquet provides tools for serializing data tables to and from Parquet files, whether those are files on disk, memory buffers, or io.Reader/io.Writer streams. Currently, the read methods are slow on files with hundreds of columns. As a workaround, the read methods support specifying a subset of columns to read (see Columns).

Current implementation

Serialization to and from Parquet currently uses the https://github.com/xitongsys/parquet-go library, which only supports serializing to and from a slice of structs. Data tables are therefore converted to (reflectively created) structs, which is slow.
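To illustrate what "reflectively created structs" involves, here is a minimal sketch, assuming a hypothetical column type (a name plus a typed slice) rather than the real data.Table API. It builds a struct type at runtime with reflect.StructOf, one exported field per column, and copies values in row by row. The field names (Col0, Col1, ...) and the parquet tag contents are assumptions; the exact tag syntax a serializer such as parquet-go expects (including type annotations) is not shown and would need to be checked against that library.

```go
package main

import (
	"fmt"
	"reflect"
)

// column is a hypothetical, minimal stand-in for one column of a data table.
type column struct {
	name   string
	values interface{} // a slice such as []int64 or []string
}

// rowStructType builds a struct type at runtime with one exported field per column.
func rowStructType(cols []column) reflect.Type {
	fields := make([]reflect.StructField, len(cols))
	for i, c := range cols {
		fields[i] = reflect.StructField{
			// Generated names are always valid exported identifiers,
			// whatever the column is actually called.
			Name: fmt.Sprintf("Col%d", i),
			Type: reflect.TypeOf(c.values).Elem(),
			// Assumed tag layout: only the column name is recorded here.
			Tag: reflect.StructTag(fmt.Sprintf(`parquet:"name=%s"`, c.name)),
		}
	}
	return reflect.StructOf(fields)
}

// toRows converts column-oriented data into a slice of runtime-built structs.
// Every cell goes through reflect.Value.Set, which is where the cost comes from.
func toRows(cols []column, numRows int) reflect.Value {
	t := rowStructType(cols)
	rows := reflect.MakeSlice(reflect.SliceOf(t), numRows, numRows)
	for i, c := range cols {
		src := reflect.ValueOf(c.values)
		for r := 0; r < numRows; r++ {
			rows.Index(r).Field(i).Set(src.Index(r))
		}
	}
	return rows
}

func main() {
	cols := []column{
		{name: "id", values: []int64{1, 2}},
		{name: "name", values: []string{"a", "b"}},
	}
	rows := toRows(cols, 2)
	fmt.Printf("%+v\n", rows.Interface())
}
```

The per-cell reflective copy, repeated for every column and row, is the overhead that writing Parquet directly from columns would avoid.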

Future development

To make serialization faster, it would be beneficial to write Parquet files directly, without using reflection. If a Go library for serializing Arrow to and from Parquet becomes available, it should be used instead of the current implementation.

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func TableFromBytes

func TableFromBytes(bytes []byte, opts ...ReadOpt) (*data.Table, error)

TableFromBytes reads a data.Table eagerly from a memory buffer.

func TableFromFile

func TableFromFile(filePath string, opts ...ReadOpt) (*data.Table, error)

TableFromFile reads a data.Table eagerly from a Parquet file.

func TableFromReader

func TableFromReader(reader io.Reader, opts ...ReadOpt) (*data.Table, error)

TableFromReader reads a data.Table eagerly from an io.Reader.

func TableToBytes

func TableToBytes(table *data.Table, opts ...WriteOpt) ([]byte, error)

TableToBytes writes a data.Table to a memory buffer.

func TableToFile

func TableToFile(table *data.Table, filePath string, opts ...WriteOpt) error

TableToFile writes a data.Table to a file on disk.

func TableToWriter

func TableToWriter(table *data.Table, writer io.Writer, opts ...WriteOpt) error

TableToWriter writes a data.Table to an io.Writer.
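The write functions mirror the read functions. A minimal sketch of how they might layer on a single io.Writer-based encoder follows, assuming a hypothetical table type and a placeholder encoder that writes "PAR1", the raw rows, and "PAR1" again. This is not a real Parquet encoding. The point of interest is the plumbing: bytes.Buffer for the in-memory case, and os.Create for the file case with the Close error propagated.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
)

// table is a hypothetical stand-in for data.Table.
type table struct{ rows []string }

// tableToWriter is a placeholder encoder: magic, payload, magic.
func tableToWriter(t *table, w io.Writer) error {
	if _, err := io.WriteString(w, "PAR1"); err != nil {
		return err
	}
	for _, r := range t.rows {
		if _, err := io.WriteString(w, r); err != nil {
			return err
		}
	}
	_, err := io.WriteString(w, "PAR1")
	return err
}

// tableToBytes encodes into an in-memory buffer.
func tableToBytes(t *table) ([]byte, error) {
	var buf bytes.Buffer
	if err := tableToWriter(t, &buf); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// tableToFile encodes into a new file on disk.
func tableToFile(t *table, path string) (err error) {
	f, err := os.Create(path)
	if err != nil {
		return err
	}
	// Close can itself fail, so its error is reported if nothing failed earlier.
	defer func() {
		if cerr := f.Close(); err == nil {
			err = cerr
		}
	}()
	return tableToWriter(t, f)
}

func main() {
	b, err := tableToBytes(&table{rows: []string{"x", "y"}})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", b)
}
```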

Types

type FileKeyValueMetadata

type FileKeyValueMetadata map[string]string

FileKeyValueMetadata represents the key-value pairs in file-level Parquet metadata, as defined in https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L924. If the file contains duplicate keys, the behavior is undefined.

func (FileKeyValueMetadata) Read

func (m FileKeyValueMetadata) Read() ReadOpt

Read returns a ReadOpt that populates this map as a side effect when the file is read.

func (FileKeyValueMetadata) Write

func (m FileKeyValueMetadata) Write() WriteOpt

Write returns a WriteOpt that writes all the key-value pairs in this map into the file metadata.
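Read and Write work because a Go map is a reference type: an option closure that captures the map can fill it in, or read from it, after the caller has handed it over. A minimal sketch of the pattern follows. readState, writeState, and keyValue are hypothetical stand-ins, since the real state types are unexported and undocumented. The sorted write order and "last duplicate wins" on read are choices made for this sketch only; the package itself leaves duplicate-key behavior undefined.

```go
package main

import (
	"fmt"
	"sort"
)

// keyValue is a hypothetical stand-in for one footer key-value entry.
type keyValue struct{ Key, Value string }

// readState and writeState are hypothetical; they only carry footer metadata here.
type readState struct{ fileMeta []keyValue }
type writeState struct{ fileMeta []keyValue }

type ReadOpt func(*readState) error
type WriteOpt func(*writeState) error

type FileKeyValueMetadata map[string]string

// Read returns an option that copies the footer entries into m.
// m must be non-nil: assigning into a nil map panics.
func (m FileKeyValueMetadata) Read() ReadOpt {
	return func(s *readState) error {
		for _, kv := range s.fileMeta {
			m[kv.Key] = kv.Value // with duplicate keys, the last one wins here
		}
		return nil
	}
}

// Write returns an option that appends every entry of m to the footer,
// sorted by key so the output is deterministic.
func (m FileKeyValueMetadata) Write() WriteOpt {
	return func(s *writeState) error {
		keys := make([]string, 0, len(m))
		for k := range m {
			keys = append(keys, k)
		}
		sort.Strings(keys)
		for _, k := range keys {
			s.fileMeta = append(s.fileMeta, keyValue{Key: k, Value: m[k]})
		}
		return nil
	}
}

func main() {
	var ws writeState
	in := FileKeyValueMetadata{"writer": "sketch", "version": "1"}
	if err := in.Write()(&ws); err != nil {
		fmt.Println("write error:", err)
		return
	}
	out := FileKeyValueMetadata{} // non-nil map to be populated by Read
	rs := readState{fileMeta: ws.fileMeta}
	if err := out.Read()(&rs); err != nil {
		fmt.Println("read error:", err)
		return
	}
	fmt.Println(out["writer"], out["version"])
}
```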

type ReadOpt

type ReadOpt func(*readState) error

ReadOpt sets an optional behavior when reading Parquet files.

func Columns

func Columns(columnNames ...data.ColumnName) ReadOpt

Columns returns a ReadOpt that selects a subset of columns to read from the source. If no column names are specified, all columns are read. This is an optimization for projections; it should become unnecessary once lazy access patterns are supported.
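ReadOpt follows Go's functional-options pattern: each option mutates a private state struct before reading begins. A minimal sketch of how a Columns option could drive a projection is shown below. ColumnName and readState are hypothetical stand-ins for data.ColumnName and the unexported state, and returning an error for an unknown column is an assumption. The real reader presumably saves time by skipping unselected column chunks in the file; this sketch only filters an in-memory map.

```go
package main

import "fmt"

// ColumnName is a hypothetical stand-in for data.ColumnName.
type ColumnName string

// readState is hypothetical: here it only records the requested projection.
type readState struct{ columns []ColumnName }

type ReadOpt func(*readState) error

// Columns records which columns to read; no names means "read everything".
func Columns(columnNames ...ColumnName) ReadOpt {
	return func(s *readState) error {
		s.columns = columnNames
		return nil
	}
}

// readColumns applies the options in order, stopping at the first error,
// then returns only the selected columns from an in-memory source.
func readColumns(all map[ColumnName][]int64, opts ...ReadOpt) (map[ColumnName][]int64, error) {
	var s readState
	for _, opt := range opts {
		if err := opt(&s); err != nil {
			return nil, err
		}
	}
	if len(s.columns) == 0 {
		return all, nil
	}
	out := make(map[ColumnName][]int64, len(s.columns))
	for _, name := range s.columns {
		col, ok := all[name]
		if !ok {
			return nil, fmt.Errorf("unknown column %q", name)
		}
		out[name] = col
	}
	return out, nil
}

func main() {
	src := map[ColumnName][]int64{"a": {1, 2}, "b": {3, 4}, "c": {5, 6}}
	got, err := readColumns(src, Columns("a", "c"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(len(got), got["c"])
}
```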

type WriteOpt

type WriteOpt func(*writeState) error

WriteOpt sets an optional behavior when creating Parquet files.
