parquet

package

v0.0.0-...-7924348 Published: Sep 4, 2020 License: MIT Imports: 14 Imported by: 1

Repository

github.com/bsm/parquet-go

Documentation ¶

Index ¶

Variables
func ReadFileMetaData(r io.ReadSeeker) (*parquetformat.FileMetaData, error)
type Column
type ColumnChunkReader
type File
- func FileFromReader(r io.ReadSeeker) (*File, error)
- func OpenFile(path string) (*File, error)
- func (f *File) Close() error
- func (f File) NewReader(col Column, rg int) (*ColumnChunkReader, error)
type Int96
type Schema
- func MakeSchema(meta *parquetformat.FileMetaData) (Schema, error)

Constants ¶

This section is empty.

Variables ¶

var (
	EndOfChunk = errors.New("EndOfChunk")
)

Functions ¶

func ReadFileMetaData ¶

func ReadFileMetaData(r io.ReadSeeker) (*parquetformat.FileMetaData, error)

ReadFileMetaData reads a parquetformat.FileMetaData object from r, which must provide read and seek access to data in parquet format.

Parquet format is described here: https://github.com/apache/parquet-format/blob/master/README.md

Types ¶

type Column ¶

type Column struct {
	// contains filtered or unexported fields
}

Column contains information about a single column in a parquet file.

func (Column) Index ¶

func (col Column) Index() int

Index is a 0-based index of col in its schema.

Column chunks in a row group have the same order as columns in the schema.

func (Column) MaxD ¶

func (col Column) MaxD() uint16

MaxD returns the maximum definition level for col.

A read value is not null when its definition level equals the maximum definition level.

func (Column) MaxR ¶

func (col Column) MaxR() uint16

MaxR returns the maximum repetition level for col.

func (Column) String ¶

func (col Column) String() string

func (Column) Type ¶

func (col Column) Type() parquetformat.Type

Type returns the type of col's values.

type ColumnChunkReader ¶

type ColumnChunkReader struct {
	// contains filtered or unexported fields
}

ColumnChunkReader reads data from a single column chunk of a parquet file.

func (*ColumnChunkReader) DictionaryPageHeader ¶

func (cr *ColumnChunkReader) DictionaryPageHeader() *parquetformat.PageHeader

DictionaryPageHeader returns a DICTIONARY_PAGE page header if the column chunk has one or nil otherwise.

func (*ColumnChunkReader) PageHeader ¶

func (cr *ColumnChunkReader) PageHeader() *parquetformat.PageHeader

PageHeader returns the header of the page that is about to be read or is currently being read.

If there was an error reading the last page (including EndOfChunk), PageHeader returns nil.

func (*ColumnChunkReader) Read ¶

func (cr *ColumnChunkReader) Read(values interface{}, dLevels []uint16, rLevels []uint16) (n int, err error)

Read reads up to len(dLevels) values into values and the corresponding definition and repetition levels into dLevels and rLevels, respectively. It panics if len(dLevels), len(rLevels) and len(values) are not all equal. It returns the number of values read (including nulls) and any error encountered.

Note that after Read returns, the values slice contains only non-null values. The number of these values can be less than n.

values must be a []interface{} or a slice whose element type corresponds to the column type (such as []int32 for an INT32 column or [][]byte for a BYTE_ARRAY column).

When there are not enough values in the current page to fill dLevels, Read does not advance to the next page and returns the number of values read. If this page was the last page in its column chunk and there is no more data to read, it returns the EndOfChunk error.
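Because values holds only the non-null entries while dLevels holds one entry per value read, callers usually need to line the two up again. The sketch below does this for a flat optional INT32 column, using the rule given for MaxD (a value is non-null when its definition level equals the maximum). It assumes the non-null values are packed at the front of values in read order. The helper expandNulls is hypothetical and not part of this package.

```go
package main

import "fmt"

// expandNulls spreads the compact non-null values across n slots, one per
// definition level. A slot stays nil when its definition level differs from
// maxD. Assumption: non-null values sit at the front of values, in order.
// expandNulls is a hypothetical helper, not part of the parquet package.
func expandNulls(values []int32, dLevels []uint16, n int, maxD uint16) []*int32 {
	out := make([]*int32, n)
	next := 0 // index of the next unused non-null value
	for i := 0; i < n; i++ {
		if dLevels[i] == maxD {
			out[i] = &values[next]
			next++
		}
	}
	return out
}

func main() {
	// Pretend a read returned n=4 with two non-null values packed first.
	values := []int32{10, 20, 0, 0}
	dLevels := []uint16{1, 0, 1, 0}
	for i, v := range expandNulls(values, dLevels, 4, 1) {
		if v == nil {
			fmt.Println(i, "null")
		} else {
			fmt.Println(i, *v)
		}
	}
}
```

For nested or repeated columns a definition level below the maximum can also mean an empty or missing ancestor, so this simple mapping applies only to the flat case.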

func (*ColumnChunkReader) SkipPage ¶

func (cr *ColumnChunkReader) SkipPage() error

SkipPage positions cr at the beginning of the next page, skipping all values in the current page.

It returns EndOfChunk if no more data is available.

type File ¶

type File struct {
	MetaData *parquetformat.FileMetaData
	Schema   Schema
	// contains filtered or unexported fields
}

func FileFromReader ¶

func FileFromReader(r io.ReadSeeker) (*File, error)

FileFromReader creates a File from an io.ReadSeeker.

func OpenFile ¶

func OpenFile(path string) (*File, error)

OpenFile opens a parquet file for reading.

func (*File) Close ¶

func (f *File) Close() error

Close frees up all resources held by f.

func (File) NewReader ¶

func (f File) NewReader(col Column, rg int) (*ColumnChunkReader, error)

NewReader creates a ColumnChunkReader for reading a single column chunk of column col from row group rg.

type Int96 ¶

type Int96 [12]byte

type Schema ¶

type Schema struct {
	// contains filtered or unexported fields
}

Schema describes the structure of the data that is stored in a parquet file.

A Schema can be created from a parquetformat.FileMetaData. The information stored in the RowGroups part of FileMetaData is not needed to create the schema.

TODO(ksh): provide a way to read FileMetaData without RowGroups.

Usually FileMetaData should be read from the same file as the data. When data is split across multiple parquet files, the metadata can be stored in a separate file, which is usually called "_common_metadata".

func MakeSchema ¶

func MakeSchema(meta *parquetformat.FileMetaData) (Schema, error)

MakeSchema creates a Schema from meta.

func (Schema) ColumnByName ¶

func (s Schema) ColumnByName(name string) (col Column, found bool)

ColumnByName returns a Column with the given name (individual elements are separated with ".").

func (Schema) ColumnByPath ¶

func (s Schema) ColumnByPath(path []string) (col Column, found bool)

ColumnByPath returns a Column for the given path.

func (Schema) Columns ¶

func (s Schema) Columns() []Column

Columns returns all columns defined in s.

func (Schema) DisplayString ¶

func (s Schema) DisplayString() string

DisplayString returns a string representation of s using a textual format similar to the one described in the Dremel paper and used by the parquet-mr project.
