arrowutils

package
v0.0.0-...-66c0612
Warning

This package is not in the latest version of its module.

Published: Apr 30, 2024 License: Apache-2.0 Imports: 18 Imported by: 0

Documentation

Index

Constants

This section is empty.

Variables

This section is empty.

Functions

func EnsureSameSchema

func EnsureSameSchema(records []arrow.Record) ([]arrow.Record, error)

EnsureSameSchema ensures that all the records have the same schema. When the schemas differ, virtual null columns are inserted into the records that are missing a column. Once the execution engine has static schemas, steps like this should be unnecessary.
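The idea of padding missing columns with nulls can be sketched without Arrow. The following is a minimal illustration only: the record type and the unifyColumns helper are hypothetical stand-ins, not part of this package, and a nil value stands in for NULL.

```go
package main

import (
	"fmt"
	"sort"
)

// record is a hypothetical stand-in for arrow.Record: a row count plus
// named columns. It is not the package's type.
type record struct {
	rows int
	cols map[string][]any
}

// unifyColumns returns the sorted union of all column names and adds an
// all-nil column of the right length to every record that lacks one.
func unifyColumns(recs []record) []string {
	seen := map[string]bool{}
	for _, r := range recs {
		for name := range r.cols {
			seen[name] = true
		}
	}
	names := make([]string, 0, len(seen))
	for name := range seen {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, r := range recs {
		for _, name := range names {
			if _, ok := r.cols[name]; !ok {
				// make leaves every element nil, which stands in for NULL.
				r.cols[name] = make([]any, r.rows)
			}
		}
	}
	return names
}

func main() {
	recs := []record{
		{rows: 2, cols: map[string][]any{"a": {1, 2}}},
		{rows: 1, cols: map[string][]any{"a": {3}, "b": {"x"}}},
	}
	fmt.Println(unifyColumns(recs), len(recs[0].cols["b"]))
}
```

Unlike this sketch, the documented function inserts virtual null columns, which avoid allocating storage for the padded values.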

func GetGroupsAndOrderedSetRanges

func GetGroupsAndOrderedSetRanges(
	firstGroup []any, arrs []arrow.Array,
) (*Int64Heap, *Int64Heap, []any, error)

GetGroupsAndOrderedSetRanges returns a min-heap of group ranges and a min-heap of ordered set ranges of the given arrow arrays, in that order. For example, given a single array with the values a a c d a b c, this function returns [2, 3, 4, 5, 6] for the group ranges and [4] for the ordered set ranges.

A group is a collection of values that are equal, and an ordered set is a collection of groups that are in increasing order. The ranges are determined by iterating over the arrays and comparing the current group value for each column. The firstGroup to compare against must be provided (it can be initialized to the values at index 0 of each array). The last group found is returned.
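The boundary detection described above can be reproduced for a single column of strings. This is a sketch under assumptions: groupAndSetRanges is a hypothetical helper that returns plain slices instead of heaps and handles only one column, so it is not the package's implementation.

```go
package main

import "fmt"

// groupAndSetRanges scans one column. It records the index at which each new
// group starts, and additionally records the index as an ordered set boundary
// when the new group's value is smaller than the previous group's value.
func groupAndSetRanges(first string, vals []string) (groups, sets []int64, last string) {
	cur := first
	for i, v := range vals {
		if v == cur {
			continue // still inside the current group
		}
		groups = append(groups, int64(i))
		if v < cur {
			sets = append(sets, int64(i)) // order broken: a new ordered set starts
		}
		cur = v
	}
	return groups, sets, cur
}

func main() {
	vals := []string{"a", "a", "c", "d", "a", "b", "c"}
	groups, sets, last := groupAndSetRanges(vals[0], vals)
	fmt.Println(groups, sets, last) // [2 3 4 5 6] [4] c
}
```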

func MakeNullArray

func MakeNullArray(mem memory.Allocator, dt arrow.DataType, len int) arrow.Array

MakeNullArray makes a physical arrow.Array full of NULLs of the given DataType.

func MergeRecords

func MergeRecords(
	mem memory.Allocator,
	records []arrow.Record,
	orderByCols []SortingColumn,
	limit uint64,
) (arrow.Record, error)

MergeRecords merges the given records into a single record. The records must all have the same schema. orderByCols identifies the columns that the input records and the resulting record are ordered by. While merging, the limit is checked before more rows are appended. If limit is 0, no limit is applied. Note that the given records must already be ordered by the given columns. WARNING: Only ascending ordering is currently supported.
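The merge-with-limit behavior can be illustrated on plain sorted slices. This is a sketch, assuming ascending int64 inputs and a simple linear scan over the input heads; mergeSorted is a hypothetical helper and the package may choose the next row differently.

```go
package main

import "fmt"

// mergeSorted merges ascending-sorted inputs into one ascending slice.
// A limit of 0 means no limit.
func mergeSorted(inputs [][]int64, limit uint64) []int64 {
	pos := make([]int, len(inputs)) // read cursor for each input
	var out []int64
	for {
		// The limit is checked before appending another row.
		if limit != 0 && uint64(len(out)) >= limit {
			return out
		}
		best := -1
		for i, in := range inputs {
			if pos[i] == len(in) {
				continue // this input is exhausted
			}
			if best == -1 || in[pos[i]] < inputs[best][pos[best]] {
				best = i
			}
		}
		if best == -1 {
			return out // every input is exhausted
		}
		out = append(out, inputs[best][pos[best]])
		pos[best]++
	}
}

func main() {
	in := [][]int64{{1, 4, 7}, {2, 3, 9}}
	fmt.Println(mergeSorted(in, 0)) // [1 2 3 4 7 9]
	fmt.Println(mergeSorted(in, 4)) // [1 2 3 4]
}
```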

func SortRecord

func SortRecord(r arrow.Record, columns []SortingColumn) (*array.Int32, error)

SortRecord sorts the given arrow.Record by columns. It returns an *array.Int32 of indices into the rows of record r, in sorted order.

Comparison is made sequentially by each column. When rows are equal in the first column, we compare the rows on the second column, and so on, until a column is found in which the rows are not equal.
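The column-by-column comparison can be sketched with the standard library. This assumes int64 columns, ascending order, and no nulls; sortIndices is a hypothetical helper and ignores the Direction and NullsFirst settings of SortingColumn.

```go
package main

import (
	"fmt"
	"sort"
)

// sortIndices returns row indices ordered by cols[0], then cols[1], and so on.
// The columns themselves are left untouched.
func sortIndices(cols [][]int64) []int32 {
	if len(cols) == 0 {
		return nil
	}
	idx := make([]int32, len(cols[0]))
	for i := range idx {
		idx[i] = int32(i)
	}
	sort.SliceStable(idx, func(a, b int) bool {
		for _, c := range cols {
			x, y := c[idx[a]], c[idx[b]]
			if x != y {
				return x < y // first column that differs decides
			}
		}
		return false // rows are equal in every column
	})
	return idx
}

func main() {
	cols := [][]int64{{2, 1, 2, 1}, {9, 5, 3, 4}}
	fmt.Println(sortIndices(cols)) // [3 1 2 0]
}
```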

func Take

func Take(ctx context.Context, r arrow.Record, indices *array.Int32) (arrow.Record, error)

Take uses indices, an array of row indexes, and returns a new record that contains only the rows specified in indices.

Use compute.WithAllocator to pass a custom memory.Allocator.
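What taking rows by index means can be shown on a single plain slice. This is only an illustration: take is a hypothetical generic helper over one column, whereas the documented function operates on every column of an arrow.Record.

```go
package main

import "fmt"

// take returns the elements of col at the given row indices, in the order
// the indices are listed.
func take[T any](col []T, indices []int32) []T {
	out := make([]T, 0, len(indices))
	for _, i := range indices {
		out = append(out, col[i])
	}
	return out
}

func main() {
	names := []string{"d", "b", "c", "a"}
	fmt.Println(take(names, []int32{3, 1, 2})) // [a b c]
}
```

Indices such as those returned by SortRecord can be applied this way to produce the sorted rows.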

func TakeColumn

func TakeColumn(ctx context.Context, a arrow.Array, idx int, arr []arrow.Array, indices *array.Int32) error

func TakeDictColumn

func TakeDictColumn(ctx context.Context, a *array.Dictionary, idx int, arr []arrow.Array, indices *array.Int32) error

Types

type ArrayConcatenator

type ArrayConcatenator struct {
	// contains filtered or unexported fields
}

ArrayConcatenator is an object that helps callers keep track of a slice of arrays and concatenate them into a single one when needed. This is more efficient and memory safe than using a builder.

func (*ArrayConcatenator) Add

func (c *ArrayConcatenator) Add(arr arrow.Array)

func (*ArrayConcatenator) Len

func (c *ArrayConcatenator) Len() int

func (*ArrayConcatenator) NewArray

func (c *ArrayConcatenator) NewArray(mem memory.Allocator) (arrow.Array, error)

func (*ArrayConcatenator) Release

func (c *ArrayConcatenator) Release()

type Direction

type Direction uint
const (
	Ascending Direction = iota
	Descending
)

type Int64Heap

type Int64Heap []int64

func (Int64Heap) Len

func (h Int64Heap) Len() int

func (Int64Heap) Less

func (h Int64Heap) Less(i, j int) bool

func (*Int64Heap) Pop

func (h *Int64Heap) Pop() any

func (*Int64Heap) PopNextNotEqual

func (h *Int64Heap) PopNextNotEqual(compare int64) (int64, bool)

PopNextNotEqual returns the next least element not equal to compare.

func (*Int64Heap) Push

func (h *Int64Heap) Push(x any)

func (Int64Heap) Swap

func (h Int64Heap) Swap(i, j int)

func (*Int64Heap) Unwrap

func (h *Int64Heap) Unwrap(scratch []int64) []int64

Unwrap unwraps the heap into the provided scratch space. The result is a slice that will have distinct ints in order. This helps with reiterating over the same heap.
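The method set above matches what container/heap expects, so the behavior of PopNextNotEqual and Unwrap can be approximated with the standard library. This is a sketch under assumptions: minHeap, popNextNotEqual, and unwrap are hypothetical stand-ins, and the assumption that unwrapping leaves the original heap intact is inferred from the description, not confirmed by it.

```go
package main

import (
	"container/heap"
	"fmt"
)

// minHeap is a hypothetical stand-in for the documented type.
type minHeap []int64

func (h minHeap) Len() int           { return len(h) }
func (h minHeap) Less(i, j int) bool { return h[i] < h[j] }
func (h minHeap) Swap(i, j int)      { h[i], h[j] = h[j], h[i] }
func (h *minHeap) Push(x any)        { *h = append(*h, x.(int64)) }
func (h *minHeap) Pop() any {
	old := *h
	n := len(old)
	x := old[n-1]
	*h = old[:n-1]
	return x
}

// popNextNotEqual pops elements until one differs from compare.
// It reports false if the heap runs out first.
func popNextNotEqual(h *minHeap, compare int64) (int64, bool) {
	for h.Len() > 0 {
		if v := heap.Pop(h).(int64); v != compare {
			return v, true
		}
	}
	return 0, false
}

// unwrap drains a copy of the heap into scratch, keeping distinct values in
// ascending order. The original heap is not modified (an assumption here).
func unwrap(h minHeap, scratch []int64) []int64 {
	cp := append(minHeap(nil), h...)
	scratch = scratch[:0]
	for cp.Len() > 0 {
		v := heap.Pop(&cp).(int64)
		if n := len(scratch); n == 0 || scratch[n-1] != v {
			scratch = append(scratch, v)
		}
	}
	return scratch
}

func main() {
	h := &minHeap{}
	for _, v := range []int64{5, 2, 2, 8} {
		heap.Push(h, v)
	}
	fmt.Println(unwrap(*h, nil)) // [2 5 8]
	v, ok := popNextNotEqual(h, 2)
	fmt.Println(v, ok) // 5 true
}
```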

type SortingColumn

type SortingColumn struct {
	Index      int
	Direction  Direction
	NullsFirst bool
}

SortingColumn describes a sorting column on an arrow.Record.

type VirtualNullArray

type VirtualNullArray struct {
	// contains filtered or unexported fields
}

VirtualNullArray is an arrow.Array that reports every element as null via the arrow.Array interface methods. This is useful if callers need to represent an array of len NULL values without allocating or storing a bitmap. This should only be used internally. If callers need a physical null array, call MakeNullArray.

func MakeVirtualNullArray

func MakeVirtualNullArray(dt arrow.DataType, len int) VirtualNullArray

func (VirtualNullArray) Data

func (n VirtualNullArray) Data() arrow.ArrayData

func (VirtualNullArray) DataType

func (n VirtualNullArray) DataType() arrow.DataType

func (VirtualNullArray) GetOneForMarshal

func (n VirtualNullArray) GetOneForMarshal(_ int) any

func (VirtualNullArray) IsNull

func (n VirtualNullArray) IsNull(_ int) bool

func (VirtualNullArray) IsValid

func (n VirtualNullArray) IsValid(_ int) bool

func (VirtualNullArray) Len

func (n VirtualNullArray) Len() int

func (VirtualNullArray) MarshalJSON

func (n VirtualNullArray) MarshalJSON() ([]byte, error)

func (VirtualNullArray) NullBitmapBytes

func (n VirtualNullArray) NullBitmapBytes() []byte

func (VirtualNullArray) NullN

func (n VirtualNullArray) NullN() int

func (VirtualNullArray) Release

func (n VirtualNullArray) Release()

func (VirtualNullArray) Retain

func (n VirtualNullArray) Retain()

func (VirtualNullArray) String

func (n VirtualNullArray) String() string

func (VirtualNullArray) ValueStr

func (n VirtualNullArray) ValueStr(_ int) string
