cockroach: github.com/cockroachdb/cockroach/pkg/col/colserde Index | Files | Directories

package colserde

import "github.com/cockroachdb/cockroach/pkg/col/colserde"

Index

Package Files

arrowbatchconverter.go diskqueue.go file.go record_batch.go

type ArrowBatchConverter Uses

type ArrowBatchConverter struct {
    // contains filtered or unexported fields
}

ArrowBatchConverter converts batches to arrow column data ([]*array.Data) and back again.

func NewArrowBatchConverter Uses

func NewArrowBatchConverter(typs []coltypes.T) (*ArrowBatchConverter, error)

NewArrowBatchConverter converts coldata.Batches to []*array.Data and back again according to the schema specified by typs. Converting data that does not conform to typs results in undefined behavior.

func (*ArrowBatchConverter) ArrowToBatch Uses

func (c *ArrowBatchConverter) ArrowToBatch(data []*array.Data, b coldata.Batch) error

ArrowToBatch converts []*array.Data to a coldata.Batch. There must not be more than coldata.BatchSize() elements in data. It's safe to call ArrowToBatch concurrently.

The passed in batch is overwritten, but after this method returns it stays valid as long as `data` stays valid. Callers can use this to control the lifetimes of the batches, saving allocations when they can be reused (i.e. reused by passing them back into this function).

The passed in data is also mutated (we store nulls differently than arrow and the adjustment is done in place).

func (*ArrowBatchConverter) BatchToArrow Uses

func (c *ArrowBatchConverter) BatchToArrow(batch coldata.Batch) ([]*array.Data, error)

BatchToArrow converts the first batch.Length elements of the batch into an arrow []*array.Data. It is assumed that the batch is not larger than coldata.BatchSize(). The returned []*array.Data may only be used until the next call to BatchToArrow.

type DiskQueueCfg Uses

type DiskQueueCfg struct {
    Typs []coltypes.T
    // Path is where the temporary files should be created.
    Path string
    // Dir is the directory that will be created to store all of a Queue's files.
    // This directory will be removed on Close.
    Dir string
    // BufferSizeBytes is the number of bytes to buffer before compressing and
    // writing to disk.
    BufferSizeBytes int
    // MaxFileSizeBytes is the maximum size an on-disk file should reach before
    // rolling over to a new one.
    MaxFileSizeBytes int

    // TestingKnobs are used to test the queue implementation.
    TestingKnobs struct {
        // AlwaysCompress, if true, will skip a check that determines whether
        // compression is used for a given write or not given the percentage size
        // improvement. This allows us to test compression.
        AlwaysCompress bool
    }
}

DiskQueueCfg is a struct holding the configuration options for a DiskQueue.

func (*DiskQueueCfg) EnsureDefaults Uses

func (cfg *DiskQueueCfg) EnsureDefaults() error

EnsureDefaults ensures that optional fields are set to reasonable defaults. If any necessary options have been elided, an error is returned.

type FileDeserializer Uses

type FileDeserializer struct {
    // contains filtered or unexported fields
}

FileDeserializer decodes columnar data batches from files encoded according to the arrow spec.

func NewFileDeserializerFromBytes Uses

func NewFileDeserializerFromBytes(buf []byte) (*FileDeserializer, error)

NewFileDeserializerFromBytes constructs a FileDeserializer for an in-memory buffer.

func NewFileDeserializerFromPath Uses

func NewFileDeserializerFromPath(path string) (*FileDeserializer, error)

NewFileDeserializerFromPath constructs a FileDeserializer by reading it from a file.

func (*FileDeserializer) Close Uses

func (d *FileDeserializer) Close() error

Close releases any resources held by this deserializer.

func (*FileDeserializer) GetBatch Uses

func (d *FileDeserializer) GetBatch(batchIdx int, b coldata.Batch) error

GetBatch fills in the given in-mem batch with the requested on-disk data.

func (*FileDeserializer) NumBatches Uses

func (d *FileDeserializer) NumBatches() int

NumBatches returns the number of record batches stored in this file.

func (*FileDeserializer) Typs Uses

func (d *FileDeserializer) Typs() []coltypes.T

Typs returns the in-memory columnar types for the data stored in this file.

type FileSerializer Uses

type FileSerializer struct {
    // contains filtered or unexported fields
}

FileSerializer converts our in-mem columnar batch representation into the arrow specification's file format. All batches serialized to a file must have the same schema.

func NewFileSerializer Uses

func NewFileSerializer(w io.Writer, typs []coltypes.T) (*FileSerializer, error)

NewFileSerializer creates a FileSerializer for the given coltypes. The caller is responsible for closing the given writer.

func (*FileSerializer) AppendBatch Uses

func (s *FileSerializer) AppendBatch(batch coldata.Batch) error

AppendBatch adds one batch of columnar data to the file.

func (*FileSerializer) Finish Uses

func (s *FileSerializer) Finish() error

Finish writes the footer metadata described by the arrow spec. Nothing can be called after Finish except Reset.

func (*FileSerializer) Reset Uses

func (s *FileSerializer) Reset(w io.Writer) error

Reset can be called to reuse this FileSerializer with a new io.Writer after calling Finish. The types will remain the ones passed to the constructor. The caller is responsible for closing the given writer.

type Queue Uses

type Queue interface {
    // Enqueue enqueues a coldata.Batch to this queue. A zero-length batch should
    // be enqueued when no more elements will be enqueued.
    // WARNING: Selection vectors are ignored.
    Enqueue(coldata.Batch) error
    // Dequeue dequeues a coldata.Batch from the queue into the batch that is
    // passed in. The boolean returned specifies whether the queue was not empty
    // (i.e. whether there was a batch to Dequeue). If true is returned and the
    // batch has a length of zero, the Queue is finished and will not be Enqueued
    // to. If an error is returned, the batch and boolean returned are
    // meaningless.
    Dequeue(coldata.Batch) (bool, error)
    // Close closes any resources associated with the Queue.
    Close() error
}

Queue describes a simple queue interface to which coldata.Batches can be Enqueued and Dequeued.

func NewDiskQueue Uses

func NewDiskQueue(cfg DiskQueueCfg, fs vfs.FS) (Queue, error)

NewDiskQueue creates a Queue that spills to disk.

type RecordBatchSerializer Uses

type RecordBatchSerializer struct {
    // contains filtered or unexported fields
}

RecordBatchSerializer serializes RecordBatches in the standard Apache Arrow IPC format using flatbuffers. Note that only RecordBatch messages are supported. This is because the full spec would be too much to support (support for DictionaryBatches, Tensors, SparseTensors, and Schema messages would be needed) and we only need the part of the spec that allows us to send data. The IPC format is described here: https://arrow.apache.org/docs/format/IPC.html

func NewRecordBatchSerializer Uses

func NewRecordBatchSerializer(typs []coltypes.T) (*RecordBatchSerializer, error)

NewRecordBatchSerializer creates a new RecordBatchSerializer according to typs. Note that Serializing or Deserializing data that does not follow the passed in schema results in undefined behavior.

func (*RecordBatchSerializer) Deserialize Uses

func (s *RecordBatchSerializer) Deserialize(data *[]*array.Data, bytes []byte) error

Deserialize deserializes an arrow IPC RecordBatch message contained in bytes into data. Deserializing a schema that does not match the schema given in NewRecordBatchSerializer results in undefined behavior.

func (*RecordBatchSerializer) Serialize Uses

func (s *RecordBatchSerializer) Serialize(
    w io.Writer, data []*array.Data,
) (metadataLen uint32, dataLen uint64, _ error)

Serialize serializes data as an arrow RecordBatch message and writes it to w. Serializing a schema that does not match the schema given in NewRecordBatchSerializer results in undefined behavior.

Directories

PathSynopsis
arrowserdePackage arrowserde contains the flatbuffer generated code used for Apache Arrow serialization (and some small helpers associated with the generated code).

Package colserde imports 24 packages (graph) and is imported by 2 packages. Updated 2020-01-28. Refresh now. Tools for package owners.