
package colserde

import "github.com/cockroachdb/cockroach/pkg/col/colserde"


Package Files

arrowbatchconverter.go file.go record_batch.go

type ArrowBatchConverter

type ArrowBatchConverter struct {
    // contains filtered or unexported fields
}

ArrowBatchConverter converts batches to arrow column data ([]*array.Data) and back again.

func NewArrowBatchConverter

func NewArrowBatchConverter(typs []*types.T) (*ArrowBatchConverter, error)

NewArrowBatchConverter converts coldata.Batches to []*array.Data and back again according to the schema specified by typs. Converting data that does not conform to typs results in undefined behavior.
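A minimal sketch of constructing a converter. This assumes the cockroach `colserde` and `types` packages are importable; `types.Int` and `types.Bytes` are two of the builtin column types:

```go
func newConverter() (*colserde.ArrowBatchConverter, error) {
	// The schema is fixed at construction time; every batch converted
	// later must conform to these types.
	typs := []*types.T{types.Int, types.Bytes}
	return colserde.NewArrowBatchConverter(typs)
}
```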

func (*ArrowBatchConverter) ArrowToBatch

func (c *ArrowBatchConverter) ArrowToBatch(data []*array.Data, b coldata.Batch) error

ArrowToBatch converts []*array.Data to a coldata.Batch. There must not be more than coldata.BatchSize() elements in data. It's safe to call ArrowToBatch concurrently.

The passed in batch is overwritten, but after this method returns it stays valid as long as `data` stays valid. Callers can use this to control the lifetimes of the batches, saving allocations by passing them back into this function for reuse.

The passed in data is also mutated (we store nulls differently than arrow and the adjustment is done in place).

func (*ArrowBatchConverter) BatchToArrow

func (c *ArrowBatchConverter) BatchToArrow(batch coldata.Batch) ([]*array.Data, error)

BatchToArrow converts the first batch.Length elements of the batch into an arrow []*array.Data. It is assumed that the batch is not larger than coldata.BatchSize(). The returned []*array.Data may only be used until the next call to BatchToArrow.
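The two conversion directions compose into a round trip. A sketch, assuming the caller supplies both a populated source batch and a scratch destination batch (allocating a coldata.Batch is outside this package's API):

```go
// roundTrip serializes src to arrow data and deserializes it back into
// dst. Note that the arrow data returned by BatchToArrow is only valid
// until the next BatchToArrow call on this converter, and that dst stays
// valid only as long as that data does.
func roundTrip(c *colserde.ArrowBatchConverter, src, dst coldata.Batch) error {
	data, err := c.BatchToArrow(src)
	if err != nil {
		return err
	}
	// ArrowToBatch overwrites dst and mutates data in place (nulls are
	// stored differently than in arrow).
	return c.ArrowToBatch(data, dst)
}
```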

type FileDeserializer

type FileDeserializer struct {
    // contains filtered or unexported fields
}

FileDeserializer decodes columnar data batches from files encoded according to the arrow spec.

func NewFileDeserializerFromBytes

func NewFileDeserializerFromBytes(typs []*types.T, buf []byte) (*FileDeserializer, error)

NewFileDeserializerFromBytes constructs a FileDeserializer for an in-memory buffer.

func NewFileDeserializerFromPath

func NewFileDeserializerFromPath(typs []*types.T, path string) (*FileDeserializer, error)

NewFileDeserializerFromPath constructs a FileDeserializer by reading it from a file.

func (*FileDeserializer) Close

func (d *FileDeserializer) Close() error

Close releases any resources held by this deserializer.

func (*FileDeserializer) GetBatch

func (d *FileDeserializer) GetBatch(batchIdx int, b coldata.Batch) error

GetBatch fills in the given in-mem batch with the requested on-disk data.

func (*FileDeserializer) NumBatches

func (d *FileDeserializer) NumBatches() int

NumBatches returns the number of record batches stored in this file.

func (*FileDeserializer) Typs

func (d *FileDeserializer) Typs() []*types.T

Typs returns the in-memory types for the data stored in this file.
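A sketch of a typical read loop over a serialized file, assuming the caller supplies a scratch coldata.Batch to deserialize into and a callback to process each batch:

```go
// readAll decodes every record batch in the file at path, reusing b as
// scratch space and invoking process on each decoded batch.
func readAll(
	typs []*types.T, path string, b coldata.Batch, process func(coldata.Batch),
) error {
	d, err := colserde.NewFileDeserializerFromPath(typs, path)
	if err != nil {
		return err
	}
	defer d.Close()
	for i := 0; i < d.NumBatches(); i++ {
		// GetBatch fills b with the i'th on-disk record batch.
		if err := d.GetBatch(i, b); err != nil {
			return err
		}
		process(b)
	}
	return nil
}
```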

type FileSerializer

type FileSerializer struct {
    // contains filtered or unexported fields
}

FileSerializer converts our in-mem columnar batch representation into the arrow specification's file format. All batches serialized to a file must have the same schema.

func NewFileSerializer

func NewFileSerializer(w io.Writer, typs []*types.T) (*FileSerializer, error)

NewFileSerializer creates a FileSerializer for the given types. The caller is responsible for closing the given writer.

func (*FileSerializer) AppendBatch

func (s *FileSerializer) AppendBatch(batch coldata.Batch) error

AppendBatch adds one batch of columnar data to the file.

func (*FileSerializer) Finish

func (s *FileSerializer) Finish() error

Finish writes the footer metadata described by the arrow spec. Nothing can be called after Finish except Reset.

func (*FileSerializer) Reset

func (s *FileSerializer) Reset(w io.Writer) error

Reset can be called to reuse this FileSerializer with a new io.Writer after calling Finish. The types will remain the ones passed to the constructor. The caller is responsible for closing the given writer.
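The serializer lifecycle above (construct, append, finish) can be sketched as a helper that writes a slice of batches to an io.Writer; all batches are assumed to share the schema given by typs:

```go
// writeFile serializes batches to w in the arrow file format. The caller
// remains responsible for closing w.
func writeFile(w io.Writer, typs []*types.T, batches []coldata.Batch) error {
	s, err := colserde.NewFileSerializer(w, typs)
	if err != nil {
		return err
	}
	for _, b := range batches {
		if err := s.AppendBatch(b); err != nil {
			return err
		}
	}
	// Finish writes the footer; only Reset may be called afterwards,
	// e.g. s.Reset(nextWriter) to serialize another file with the same
	// schema.
	return s.Finish()
}
```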

type RecordBatchSerializer

type RecordBatchSerializer struct {
    // contains filtered or unexported fields
}

RecordBatchSerializer serializes RecordBatches in the standard Apache Arrow IPC format using flatbuffers. Note that only RecordBatch messages are supported. This is because the full spec would be too much to support (support for DictionaryBatches, Tensors, SparseTensors, and Schema messages would be needed) and we only need the part of the spec that allows us to send data. The IPC format is described in the Apache Arrow IPC documentation.

func NewRecordBatchSerializer

func NewRecordBatchSerializer(typs []*types.T) (*RecordBatchSerializer, error)

NewRecordBatchSerializer creates a new RecordBatchSerializer according to typs. Note that Serializing or Deserializing data that does not follow the passed in schema results in undefined behavior.

func (*RecordBatchSerializer) Deserialize

func (s *RecordBatchSerializer) Deserialize(data *[]*array.Data, bytes []byte) error

Deserialize deserializes an arrow IPC RecordBatch message contained in bytes into data. Deserializing a schema that does not match the schema given in NewRecordBatchSerializer results in undefined behavior.

func (*RecordBatchSerializer) Serialize

func (s *RecordBatchSerializer) Serialize(
    w io.Writer, data []*array.Data,
) (metadataLen uint32, dataLen uint64, _ error)

Serialize serializes data as an arrow RecordBatch message and writes it to w. Serializing a schema that does not match the schema given in NewRecordBatchSerializer results in undefined behavior.
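Serialize and Deserialize compose into an in-memory round trip of a single RecordBatch message. A sketch using a bytes.Buffer as the writer:

```go
// recordBatchRoundTrip writes data as an arrow RecordBatch message and
// reads it back. The serializer's schema (from NewRecordBatchSerializer)
// must match data, or behavior is undefined.
func recordBatchRoundTrip(
	s *colserde.RecordBatchSerializer, data []*array.Data,
) ([]*array.Data, error) {
	var buf bytes.Buffer
	// The returned metadata and data lengths are ignored here; callers
	// framing multiple messages in one stream may need them.
	if _, _, err := s.Serialize(&buf, data); err != nil {
		return nil, err
	}
	var out []*array.Data
	if err := s.Deserialize(&out, buf.Bytes()); err != nil {
		return nil, err
	}
	return out, nil
}
```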


Directories

arrowserde    Package arrowserde contains the flatbuffer generated code used for Apache Arrow serialization (and some small helpers associated with the generated code).

Package colserde imports 20 packages and is imported by 4 packages. Updated 2020-07-31.