Documentation ¶
Overview ¶
Package parquet provides tools for serializing data tables to and from Parquet, whether the data lives in a file on disk, in a memory buffer, or behind an io.Reader/io.Writer. The read functions are currently slow on files with hundreds of columns. As a workaround, they accept an option that restricts reading to a subset of columns.
Current implementation ¶
Serialization to and from Parquet currently uses the https://github.com/xitongsys/parquet-go library, which only supports serializing to and from a slice of structs. Data tables are therefore converted to structs whose types are created at runtime via reflection, which is slow.
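To make the cost of the reflective conversion concrete, here is a minimal sketch that builds a struct type at runtime with the standard reflect package and fills a slice of such structs cell by cell. The column type, the generated field names, and the `col` struct tag are assumptions made for illustration; this is not the package's actual conversion code.

```go
package main

import (
	"fmt"
	"reflect"
)

// column is a hypothetical stand-in for one column of a data table.
type column struct {
	name   string
	values interface{} // a slice such as []int64 or []string
}

// rowsFromColumns creates a struct type with one field per column at runtime
// and returns a slice of that struct type holding one element per row.
func rowsFromColumns(cols []column) (reflect.Value, error) {
	if len(cols) == 0 {
		return reflect.Value{}, fmt.Errorf("no columns")
	}
	fields := make([]reflect.StructField, len(cols))
	nRows := -1
	for i, c := range cols {
		v := reflect.ValueOf(c.values)
		if v.Kind() != reflect.Slice {
			return reflect.Value{}, fmt.Errorf("column %q is not a slice", c.name)
		}
		if nRows == -1 {
			nRows = v.Len()
		} else if v.Len() != nRows {
			return reflect.Value{}, fmt.Errorf("column %q has %d rows, want %d", c.name, v.Len(), nRows)
		}
		fields[i] = reflect.StructField{
			Name: fmt.Sprintf("F%d", i), // StructOf requires exported field names
			Type: v.Type().Elem(),
			Tag:  reflect.StructTag(fmt.Sprintf("col:%q", c.name)), // hypothetical tag
		}
	}
	rowType := reflect.StructOf(fields)
	rows := reflect.MakeSlice(reflect.SliceOf(rowType), nRows, nRows)
	// Every cell is copied through reflection, which is where the time goes.
	for i, c := range cols {
		v := reflect.ValueOf(c.values)
		for r := 0; r < nRows; r++ {
			rows.Index(r).Field(i).Set(v.Index(r))
		}
	}
	return rows, nil
}

func main() {
	rows, err := rowsFromColumns([]column{
		{name: "id", values: []int64{1, 2}},
		{name: "city", values: []string{"Oslo", "Lima"}},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(rows.Len(), rows.Index(1).Field(1).Interface())
}
```

The inner loop performs one reflective Set per cell, so the work grows with rows times columns, in line with the slowdown described above for wide tables.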
Future development ¶
To make serialization faster, it would be beneficial to write Parquet files directly, without reflection. If a Go library for serializing Arrow to and from Parquet becomes available, we should use it instead of the current implementation.
Index ¶
- func TableFromBytes(bytes []byte, opts ...ReadOpt) (*data.Table, error)
- func TableFromFile(filePath string, opts ...ReadOpt) (*data.Table, error)
- func TableFromReader(reader io.Reader, opts ...ReadOpt) (*data.Table, error)
- func TableToBytes(table *data.Table, opts ...WriteOpt) ([]byte, error)
- func TableToFile(table *data.Table, filePath string, opts ...WriteOpt) error
- func TableToWriter(table *data.Table, writer io.Writer, opts ...WriteOpt) error
- type FileKeyValueMetadata
- type ReadOpt
- type WriteOpt
Functions ¶
func TableFromBytes ¶
func TableFromBytes(bytes []byte, opts ...ReadOpt) (*data.Table, error)
TableFromBytes reads a data.Table eagerly from a memory buffer.
func TableFromFile ¶
func TableFromFile(filePath string, opts ...ReadOpt) (*data.Table, error)
TableFromFile reads a data.Table eagerly from a Parquet file.
func TableFromReader ¶
func TableFromReader(reader io.Reader, opts ...ReadOpt) (*data.Table, error)
TableFromReader reads a data.Table eagerly from an io.Reader.
func TableToBytes ¶
func TableToBytes(table *data.Table, opts ...WriteOpt) ([]byte, error)
TableToBytes writes a data.Table to a memory buffer.
func TableToFile ¶
func TableToFile(table *data.Table, filePath string, opts ...WriteOpt) error
TableToFile writes a data.Table to a file on disk.
Types ¶
type FileKeyValueMetadata ¶
FileKeyValueMetadata represents the key-value pairs in file-level Parquet metadata, as defined in: https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L924 . If the file contains duplicate keys, the behavior is undefined.
func (FileKeyValueMetadata) Read ¶
func (m FileKeyValueMetadata) Read() ReadOpt
Read returns a ReadOpt that populates this map as a side effect when the file is read.
func (FileKeyValueMetadata) Write ¶
func (m FileKeyValueMetadata) Write() WriteOpt
Write returns a WriteOpt that writes all key-value pairs in this map into the file metadata.
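The pattern behind Read and Write can be sketched as follows. The sketch assumes that FileKeyValueMetadata is a map[string]string and that WriteOpt mirrors the ReadOpt signature. The readState and writeState fields and the fakeRead and fakeWrite helpers are hypothetical stand-ins for the package's unexported internals, which may differ.

```go
package main

import (
	"errors"
	"fmt"
)

// readState and writeState are hypothetical stand-ins for the package's
// unexported state types.
type readState struct {
	onFileMeta []func(kv map[string]string) // hooks run once the footer is decoded
}

type writeState struct {
	fileMeta map[string]string // key-value pairs destined for the footer
}

type ReadOpt func(*readState) error
type WriteOpt func(*writeState) error

// FileKeyValueMetadata is assumed here to be a plain string map.
type FileKeyValueMetadata map[string]string

// Read registers a hook that copies the file's key-value pairs into m.
func (m FileKeyValueMetadata) Read() ReadOpt {
	return func(s *readState) error {
		if m == nil {
			return errors.New("metadata map must be non-nil to receive values")
		}
		s.onFileMeta = append(s.onFileMeta, func(kv map[string]string) {
			for k, v := range kv {
				m[k] = v
			}
		})
		return nil
	}
}

// Write copies every pair in m into the pending footer metadata.
func (m FileKeyValueMetadata) Write() WriteOpt {
	return func(s *writeState) error {
		if s.fileMeta == nil {
			s.fileMeta = make(map[string]string)
		}
		for k, v := range m {
			s.fileMeta[k] = v
		}
		return nil
	}
}

// fakeWrite applies the options and returns what would land in the footer.
func fakeWrite(opts ...WriteOpt) (map[string]string, error) {
	var s writeState
	for _, opt := range opts {
		if err := opt(&s); err != nil {
			return nil, err
		}
	}
	return s.fileMeta, nil
}

// fakeRead applies the options, then pretends the footer has been decoded.
func fakeRead(footer map[string]string, opts ...ReadOpt) error {
	var s readState
	for _, opt := range opts {
		if err := opt(&s); err != nil {
			return err
		}
	}
	for _, hook := range s.onFileMeta {
		hook(footer)
	}
	return nil
}

func main() {
	footer, err := fakeWrite(FileKeyValueMetadata{"origin": "sensor-7"}.Write())
	if err != nil {
		panic(err)
	}
	got := FileKeyValueMetadata{}
	if err := fakeRead(footer, got.Read()); err != nil {
		panic(err)
	}
	fmt.Println(got["origin"])
}
```

Because maps are reference types in Go, the value receiver is enough for Read to fill the caller's map, provided the caller initializes it before passing the option.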
type ReadOpt ¶
type ReadOpt func(*readState) error
ReadOpt sets an optional behavior when reading Parquet files.
func Columns ¶
func Columns(columnNames ...data.ColumnName) ReadOpt
Columns returns a ReadOpt that selects a subset of columns to read from the source. If no column names are specified, all columns are read. This is an optimization for projections that should no longer be needed once lazy access patterns are possible.
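A minimal sketch of how such a projection option can work. ColumnName is a local stand-in for data.ColumnName, and the readState field and the selectColumns helper are hypothetical. How the real package treats unknown column names is not documented here, so the error in this sketch is an assumption.

```go
package main

import "fmt"

// ColumnName is a local stand-in for data.ColumnName.
type ColumnName string

// readState is a hypothetical stand-in for the package's unexported type.
type readState struct {
	columns map[ColumnName]bool // nil means "read every column"
}

type ReadOpt func(*readState) error

// Columns restricts reading to the named columns; with no names it is a no-op.
func Columns(columnNames ...ColumnName) ReadOpt {
	return func(s *readState) error {
		if len(columnNames) == 0 {
			return nil // keep the default: all columns
		}
		if s.columns == nil {
			s.columns = make(map[ColumnName]bool, len(columnNames))
		}
		for _, n := range columnNames {
			s.columns[n] = true
		}
		return nil
	}
}

// selectColumns applies the options and returns the columns that would be
// read, preserving the order in which they appear in the file.
func selectColumns(available []ColumnName, opts ...ReadOpt) ([]ColumnName, error) {
	var s readState
	for _, opt := range opts {
		if err := opt(&s); err != nil {
			return nil, err
		}
	}
	if s.columns == nil {
		return available, nil
	}
	var out []ColumnName
	for _, n := range available {
		if s.columns[n] {
			out = append(out, n)
		}
	}
	// Assumption: asking for a column the file lacks is reported as an error.
	if len(out) < len(s.columns) {
		return nil, fmt.Errorf("%d requested column(s) not found", len(s.columns)-len(out))
	}
	return out, nil
}

func main() {
	avail := []ColumnName{"id", "city", "temp"}
	cols, err := selectColumns(avail, Columns("temp", "id"))
	if err != nil {
		panic(err)
	}
	fmt.Println(cols)
}
```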