bow

v1.0.0
Published: Apr 7, 2023 License: Apache-2.0 Imports: 26 Imported by: 0

README

Bow

This project is experimental and not ready for production. The interface and methods are still changing heavily.

Bow is meant to be an efficient data manipulation framework based on Apache Arrow for the Go programming language. Inspired by Pandas, Bow aims to bring the last missing block required to make Golang a data science ready language.

Bow is currently developed internally at Metronlab, with a primary focus on time series. Don't hesitate to open issues and contribute to the library design.

Roadmap

Data types handling

  • implement string, int64, float64, bool data types
  • use go gen as a palliative for the lack of generics in Go
  • handle all Arrow data types

Serialization

  • expose native Arrow stringer
  • implement Parquet serialization
  • expose native Arrow CSV (through record / schema access)
  • expose native Arrow JSON
  • expose native Arrow IPC

Features

  • implement windowed data aggregations
  • implement windowed data interpolations
  • implement Fill methods to handle missing data
  • implement InnerJoin method
  • implement OuterJoin method
  • implement Select columns method
  • handle Arrow Schema metadata
  • implement Apply method
  • implement facade for all accessible features to simplify usage
  • improve Bow append method in collaboration with Arrow maintainers

Go to v1

  • complete Go native doc
  • examples for each method
  • implement package to compare Bow and Pandas performances
  • API frozen, new releases won't break your code
  • support dataframes with several columns having the same name

Documentation

Constants

const (
	// Unknown is placed first to be the default when allocating Type or []Type.
	Unknown = Type(iota)

	// Float64 and the following types are the native Arrow types supported by bow.
	Float64
	Int64
	Boolean
	String

	// InputDependent is used in aggregations when the output type is dependent on the input type.
	InputDependent

	// IteratorDependent is used in aggregations when the output type is dependent on the iterator type.
	IteratorDependent
)
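
The comment on Unknown relies on Go's zero-value rule: a freshly allocated Type or []Type holds 0, which is Unknown. The sketch below redeclares a local `Type` with the same first constants purely so that it compiles on its own; `newColTypes` is a hypothetical helper and is not part of bow.

```go
package main

import "fmt"

// Type is redeclared locally so the sketch is self-contained.
type Type int

const (
	// Unknown comes first, so it is the zero value of Type.
	Unknown = Type(iota)
	Float64
	Int64
	Boolean
	String
)

// newColTypes (hypothetical helper) allocates n types, all left at
// their zero value.
func newColTypes(n int) []Type {
	return make([]Type, n)
}

func main() {
	var zero Type
	fmt.Println(zero == Unknown) // true

	colTypes := newColTypes(3)
	colTypes[0] = Int64
	// Only the first entry was set; the others are still Unknown.
	fmt.Println(colTypes[0] == Int64, colTypes[1] == Unknown, colTypes[2] == Unknown) // true true true
}
```

This is the same pattern as `colTypes := make([]Type, len(colNames))` in the NewBowFromColBasedInterfaces example, where only the first column type is set explicitly.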

Variables

var ErrColTimeUnitNotFound = errors.New("column time unit not found in parquet metadata")

Functions

func GenStrategyDecremental added in v0.16.0

func GenStrategyDecremental(typ Type, seed int) interface{}

GenStrategyDecremental generates a number of type `typ` equal to the negation of the converted `seed` value.

func GenStrategyIncremental added in v0.16.0

func GenStrategyIncremental(typ Type, seed int) interface{}

GenStrategyIncremental generates a number of type `typ` equal to the converted `seed` value.

func GenStrategyRandom added in v0.16.0

func GenStrategyRandom(typ Type, seed int) interface{}

GenStrategyRandom generates a random number of type `typ`.

func GenStrategyRandomDecremental added in v0.16.0

func GenStrategyRandomDecremental(typ Type, seed int) interface{}

GenStrategyRandomDecremental generates a random number of type `typ` by using the `seed` value.

func GenStrategyRandomIncremental added in v0.16.0

func GenStrategyRandomIncremental(typ Type, seed int) interface{}

GenStrategyRandomIncremental generates a random number of type `typ` by using the `seed` value.

func ToBoolean added in v0.16.0

func ToBoolean(input interface{}) (output bool, ok bool)

ToBoolean attempts to convert `input` to bool. The second return value `ok` is false if the conversion failed. For numeric types, it returns true if the value is non-zero.
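
As a rough illustration of the value/ok contract and of the non-zero rule for numeric inputs, here is a minimal self-contained sketch. `toBooleanSketch` is a hypothetical stand-in, not the library's implementation; it only covers bool and a few numeric types and rejects everything else, which may differ from what ToBoolean accepts.

```go
package main

import "fmt"

// toBooleanSketch is a hypothetical stand-in for a ToBoolean-style
// conversion. It handles bool and a few numeric types only.
func toBooleanSketch(input interface{}) (output bool, ok bool) {
	switch v := input.(type) {
	case bool:
		return v, true
	case int:
		return v != 0, true
	case int64:
		return v != 0, true
	case float64:
		return v != 0, true
	default:
		// Unsupported input: report failure through ok.
		return false, false
	}
}

func main() {
	fmt.Println(toBooleanSketch(true))  // true true
	fmt.Println(toBooleanSketch(0))     // false true
	fmt.Println(toBooleanSketch(2.5))   // true true
	fmt.Println(toBooleanSketch("yes")) // false false
}
```

Callers should check `ok` before using `output`, since a failed conversion and a successful conversion to false both yield `output == false`.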

func ToFloat64

func ToFloat64(input interface{}) (output float64, ok bool)

ToFloat64 attempts to convert `input` to float64. The second return value `ok` is false if the conversion failed.

func ToInt64

func ToInt64(input interface{}) (output int64, ok bool)

ToInt64 attempts to convert `input` to int64. The second return value `ok` is false if the conversion failed.

func ToString

func ToString(input interface{}) (output string, ok bool)

ToString attempts to convert `input` to string. The second return value `ok` is false if the conversion failed.

Types

type Bow

type Bow interface {
	String() string
	Schema() *arrow.Schema
	ArrowRecord() *arrow.Record

	ColumnName(colIndex int) string
	NumRows() int
	NumCols() int

	ColumnType(colIndex int) Type
	ColumnIndex(colName string) (int, error)
	NewBufferFromCol(colIndex int) Buffer
	NewSeriesFromCol(colIndex int) Series

	Metadata() Metadata
	WithMetadata(metadata Metadata) Bow
	SetMetadata(key, value string) Bow

	GetRow(rowIndex int) map[string]interface{}
	GetRowsChan() <-chan map[string]interface{}

	GetValue(colIndex, rowIndex int) interface{}
	GetPrevValue(colIndex, rowIndex int) (value interface{}, resRowIndex int)
	GetNextValue(colIndex, rowIndex int) (value interface{}, resRowIndex int)
	GetPrevValues(colIndex1, colIndex2, rowIndex int) (value1, value2 interface{}, resRowIndex int)
	GetNextValues(colIndex1, colIndex2, rowIndex int) (value1, value2 interface{}, resRowIndex int)
	GetPrevRowIndex(colIndex, rowIndex int) int
	GetNextRowIndex(colIndex, rowIndex int) int

	GetInt64(colIndex, rowIndex int) (value int64, valid bool)
	GetPrevInt64(colIndex, rowIndex int) (value int64, resRowIndex int)
	GetNextInt64(colIndex, rowIndex int) (value int64, resRowIndex int)

	GetFloat64(colIndex, rowIndex int) (value float64, valid bool)
	GetPrevFloat64(colIndex, rowIndex int) (value float64, resRowIndex int)
	GetNextFloat64(colIndex, rowIndex int) (value float64, resRowIndex int)
	GetPrevFloat64s(colIndex1, colIndex2, rowIndex int) (value1, value2 float64, resRowIndex int)
	GetNextFloat64s(colIndex1, colIndex2, rowIndex int) (value1, value2 float64, resRowIndex int)

	Distinct(colIndex int) Bow

	Find(columnIndex int, value interface{}) int
	FindNext(columnIndex, rowIndex int, value interface{}) int
	Contains(columnIndex int, value interface{}) bool

	Filter(fns ...RowCmp) Bow
	MakeFilterValues(colIndex int, values ...interface{}) RowCmp

	AddCols(newCols ...Series) (Bow, error)
	RenameCol(colIndex int, newName string) (Bow, error)
	Apply(colIndex int, returnType Type, fn func(interface{}) interface{}) (Bow, error)
	Convert(colIndex int, t Type) (Bow, error)

	InnerJoin(other Bow) Bow
	OuterJoin(other Bow) Bow

	Diff(colIndices ...int) (Bow, error)

	NewSlice(i, j int) Bow
	Select(colIndices ...int) (Bow, error)
	NewEmptySlice() Bow
	DropNils(colIndices ...int) (Bow, error)
	SortByCol(colIndex int) (Bow, error)

	FillPrevious(colIndices ...int) (Bow, error)
	FillNext(colIndices ...int) (Bow, error)
	FillMean(colIndices ...int) (Bow, error)
	FillLinear(refColIndex, toFillColIndex int) (Bow, error)

	Equal(other Bow) bool
	IsColEmpty(colIndex int) bool
	IsColSorted(colIndex int) bool

	MarshalJSON() (buf []byte, err error)
	UnmarshalJSON(data []byte) error
	NewValuesFromJSON(jsonB JSONBow) error
	WriteParquet(path string, verbose bool) error
	GetParquetMetaColTimeUnit(colIndex int) (time.Duration, error)
}

Bow wraps the Apache Arrow arrow.Record interface, which is a collection of equal-length arrow.Array values matching a particular arrow.Schema. Its purpose is to add convenience methods to easily manipulate dataframes.

func AppendBows

func AppendBows(bows ...Bow) (Bow, error)

AppendBows attempts to append bows with equal schemas. Different schemas will lead to undefined behavior. Resulting metadata is copied from the first bow.

func NewBow

func NewBow(series ...Series) (Bow, error)

NewBow returns a new Bow from one or more Series.

Example
b, err := NewBow(
	NewSeries("col1", Int64, []int64{1, 2, 3, 4}, nil),
	NewSeries("col2", Float64, []float64{1.1, 2.2, 3.3, 4}, []bool{true, false, true, true}),
	NewSeries("col3", Boolean, []bool{true, false, true, false}, []bool{true, false, true, true}),
)
if err != nil {
	panic(err)
}

fmt.Println(b)
Output:

col1:int64  col2:float64  col3:bool
1           1.1           true
2           <nil>         <nil>
3           3.3           true
4           4             false

func NewBowEmpty added in v0.7.3

func NewBowEmpty() Bow

NewBowEmpty returns a new empty Bow.

func NewBowFromColBasedInterfaces added in v0.7.3

func NewBowFromColBasedInterfaces(colNames []string, colTypes []Type, colBasedData [][]interface{}) (Bow, error)

NewBowFromColBasedInterfaces returns a new Bow:

  • colNames contains the Series names
  • colTypes contains the Series data types, optional (if nil, the types are inferred automatically)
  • colBasedData contains the data itself as a two-dimensional slice, with the first dimension being the columns (colNames and colBasedData need to be of the same size)
Example
colNames := []string{"time", "value", "valueFromJSON"}
colTypes := make([]Type, len(colNames))
colTypes[0] = Int64
colBasedData := [][]interface{}{
	{1, 1.2, json.Number("3")},
	{1, json.Number("1.2"), 3},
	{json.Number("1.1"), 2, 1.3},
}

b, err := NewBowFromColBasedInterfaces(colNames, colTypes, colBasedData)
if err != nil {
	panic(err)
}

fmt.Println(b)
Output:

time:int64  value:int64  valueFromJSON:float64
1           1            1.1
1           <nil>        2
3           3            1.3

func NewBowFromParquet added in v0.12.0

func NewBowFromParquet(path string, verbose bool) (Bow, error)

NewBowFromParquet loads a parquet object from the file path, returning a new Bow. Only value columns are used to create the new Bow. When verbose is true, information about the loaded file is printed.

func NewBowFromRowBasedInterfaces

func NewBowFromRowBasedInterfaces(colNames []string, colTypes []Type, rowBasedData [][]interface{}) (Bow, error)

NewBowFromRowBasedInterfaces returns a new Bow:

  • colNames contains the Series names
  • colTypes contains the Series data types, required
  • rowBasedData contains the data itself as a two-dimensional slice, with the first dimension being the rows (colNames and rowBasedData need to be of the same size)
Example
colNames := []string{"time", "value", "valueFromJSON"}
colTypes := []Type{Int64, Int64, Float64}
rowBasedData := [][]interface{}{
	{1, 1, json.Number("1.1")},
	{1.2, json.Number("1.2"), 2},
	{json.Number("3"), 3, 1.3},
}

b, err := NewBowFromRowBasedInterfaces(colNames, colTypes, rowBasedData)
if err != nil {
	panic(err)
}

fmt.Println(b)
Output:

time:int64  value:int64  valueFromJSON:float64
1           1            1.1
1           <nil>        2
3           3            1.3
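
The column-based and row-based constructors take the same values in transposed layouts, which is why both examples on this page print the same table. As a minimal sketch of that relationship, the hypothetical helper `transpose` below (not part of bow) turns one layout into the other, assuming every inner slice has the same length.

```go
package main

import "fmt"

// transpose (hypothetical helper) converts row-based data into
// column-based data, or the reverse. It assumes a rectangular input.
func transpose(in [][]interface{}) [][]interface{} {
	if len(in) == 0 {
		return nil
	}
	out := make([][]interface{}, len(in[0]))
	for c := range out {
		out[c] = make([]interface{}, len(in))
		for r := range in {
			out[c][r] = in[r][c]
		}
	}
	return out
}

func main() {
	rowBased := [][]interface{}{
		{1, "a"},
		{2, "b"},
		{3, "c"},
	}
	fmt.Println(transpose(rowBased)) // [[1 2 3] [a b c]]
}
```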

func NewBowWithMetadata added in v0.12.0

func NewBowWithMetadata(metadata Metadata, series ...Series) (Bow, error)

NewBowWithMetadata returns a new Bow from Metadata and Series.

func NewGenBow added in v0.9.0

func NewGenBow(numRows int, options ...GenSeriesOptions) (Bow, error)

NewGenBow generates a new random Bow with `numRows` rows and optional GenSeriesOptions.

type Buffer

type Buffer struct {
	Data     interface{}
	DataType Type
	// contains filtered or unexported fields
}

Buffer is a mutable data structure with the purpose of easily building data Series with:

  • Data: slice of data
  • DataType: type of the data
  • nullBitmapBytes: slice of bytes representing valid or null values

func NewBuffer

func NewBuffer(size int, typ Type) Buffer

NewBuffer returns a new Buffer of size `size` and Type `typ`.

func NewBufferFromInterfaces

func NewBufferFromInterfaces(typ Type, data []interface{}) (Buffer, error)

NewBufferFromInterfaces returns a new typed Buffer with the data represented as a slice of interface{}, which may contain nil values.

func (*Buffer) GetValue added in v0.16.0

func (b *Buffer) GetValue(i int) interface{}

GetValue gets the value at index `i` from the Buffer.

func (Buffer) IsNull added in v0.16.0

func (b Buffer) IsNull(rowIndex int) bool

IsNull returns true if the value at row `rowIndex` is nil.

func (Buffer) IsSorted added in v0.17.0

func (b Buffer) IsSorted() bool

IsSorted returns true if the values of the Buffer are sorted in ascending order.

func (Buffer) IsValid added in v0.16.0

func (b Buffer) IsValid(rowIndex int) bool

IsValid returns true if the value at row `rowIndex` is valid.

func (Buffer) Len added in v0.16.0

func (b Buffer) Len() int

Len returns the size of the underlying slice of data in the Buffer.

func (Buffer) Less added in v0.17.0

func (b Buffer) Less(i, j int) bool

func (*Buffer) SetOrDrop

func (b *Buffer) SetOrDrop(i int, value interface{})

SetOrDrop sets the Buffer data at index `i` by attempting to convert `value` to its DataType. Sets the value to nil if the conversion failed or if `value` is nil.

func (*Buffer) SetOrDropStrict added in v0.16.0

func (b *Buffer) SetOrDropStrict(i int, value interface{})

SetOrDropStrict sets the Buffer data at index `i` by attempting a type assertion of `value` to its DataType. Sets the value to nil if the assertion failed or if `value` is nil.
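
The difference between the two setters is conversion versus type assertion. The sketch below illustrates it on a hypothetical, simplified `int64Buffer` with a boolean validity mask; the real Buffer stores its data and null bitmap differently, and the set of convertible types shown here is an assumption.

```go
package main

import "fmt"

// int64Buffer is a hypothetical, simplified buffer of int64 values.
// valid[i] is false when the value at index i is nil.
type int64Buffer struct {
	data  []int64
	valid []bool
}

func newInt64Buffer(size int) int64Buffer {
	return int64Buffer{data: make([]int64, size), valid: make([]bool, size)}
}

// setOrDrop converts value to int64 when it can, and marks the slot
// as nil otherwise (including when value is nil).
func (b *int64Buffer) setOrDrop(i int, value interface{}) {
	switch v := value.(type) {
	case int64:
		b.data[i], b.valid[i] = v, true
	case int:
		b.data[i], b.valid[i] = int64(v), true
	case float64:
		b.data[i], b.valid[i] = int64(v), true
	default:
		b.data[i], b.valid[i] = 0, false
	}
}

// setOrDropStrict only accepts values that already are int64.
func (b *int64Buffer) setOrDropStrict(i int, value interface{}) {
	v, ok := value.(int64)
	b.data[i], b.valid[i] = v, ok
}

func main() {
	b := newInt64Buffer(3)
	b.setOrDrop(0, 1.9)            // converted (truncated) to 1
	b.setOrDropStrict(1, 1.9)      // not an int64: dropped
	b.setOrDropStrict(2, int64(5)) // exact type: kept
	fmt.Println(b.data, b.valid)   // [1 0 5] [true false true]
}
```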

func (Buffer) Swap added in v0.17.0

func (b Buffer) Swap(i, j int)

Swap swaps the values of the Buffer at indices i and j.
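
Len, Less and Swap have the same shape as the methods of Go's standard `sort.Interface`. Whether Buffer is meant to be passed to the `sort` package directly is an assumption; the sketch below only shows how a type with these three methods works with `sort.IsSorted` and `sort.Sort`. It uses a hypothetical `int64Column` type and ignores null values.

```go
package main

import (
	"fmt"
	"sort"
)

// int64Column is a hypothetical, simplified stand-in for a typed buffer.
type int64Column struct {
	data []int64
}

// The three methods below satisfy sort.Interface.
func (c int64Column) Len() int           { return len(c.data) }
func (c int64Column) Less(i, j int) bool { return c.data[i] < c.data[j] }
func (c int64Column) Swap(i, j int)      { c.data[i], c.data[j] = c.data[j], c.data[i] }

// isSorted reports whether the column is in ascending order.
func isSorted(c int64Column) bool { return sort.IsSorted(c) }

// sortColumn sorts the column in place; the value receiver still
// shares the underlying slice storage with the caller.
func sortColumn(c int64Column) { sort.Sort(c) }

func main() {
	col := int64Column{data: []int64{3, 1, 2}}
	fmt.Println(isSorted(col)) // false
	sortColumn(col)
	fmt.Println(col.data, isSorted(col)) // [1 2 3] true
}
```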

type CommonRows added in v0.16.0

type CommonRows struct {
	// contains filtered or unexported fields
}

type GenSeriesOptions added in v0.16.0

type GenSeriesOptions struct {
	NumRows     int
	Name        string
	Type        Type
	GenStrategy GenStrategy
	MissingData bool
}

GenSeriesOptions are options to generate random Series:

  • NumRows: number of rows of the resulting Series
  • Name: name of the Series
  • Type: data type of the Series
  • GenStrategy: strategy of data generation
  • MissingData: sets whether the Series includes random nil values

type GenStrategy added in v0.16.0

type GenStrategy func(typ Type, seed int) interface{}

GenStrategy defines how random values are generated.
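
A strategy is an ordinary function value, so custom ones can be written with the same signature. The sketch below redeclares a local `Type` and a lowercase `genStrategy` so it compiles on its own, and mimics the documented incremental and decremental behavior. Treating the row index as the seed in `generate` is an assumption, and every name in the sketch is hypothetical.

```go
package main

import "fmt"

// Type and its constants are redeclared locally for the sketch.
type Type int

const (
	Unknown = Type(iota)
	Float64
	Int64
)

// genStrategy has the same shape as the documented GenStrategy type.
type genStrategy func(typ Type, seed int) interface{}

// convert turns an int into a value of the requested type.
func convert(typ Type, v int) interface{} {
	switch typ {
	case Int64:
		return int64(v)
	case Float64:
		return float64(v)
	default:
		return nil
	}
}

// incremental returns the converted seed.
func incremental(typ Type, seed int) interface{} { return convert(typ, seed) }

// decremental returns the negation of the converted seed.
func decremental(typ Type, seed int) interface{} { return convert(typ, -seed) }

// generate calls the strategy once per row, using the row index as
// seed (an assumption made for this sketch).
func generate(numRows int, typ Type, strategy genStrategy) []interface{} {
	out := make([]interface{}, numRows)
	for i := range out {
		out[i] = strategy(typ, i)
	}
	return out
}

func main() {
	fmt.Println(generate(3, Int64, incremental))   // [0 1 2]
	fmt.Println(generate(3, Float64, decremental)) // [0 -1 -2]
}
```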

type JSONBow added in v0.9.0

type JSONBow struct {
	Schema       JSONSchema               `json:"schema"`
	RowBasedData []map[string]interface{} `json:"data"`
}

JSONBow is a structure representing a Bow for JSON marshaling purpose.

func NewJSONBow added in v0.9.0

func NewJSONBow(b Bow) JSONBow

NewJSONBow returns a new JSONBow structure from a Bow.

type JSONSchema added in v0.9.0

type JSONSchema struct {
	Fields []jsonField `json:"fields"`
}

type Metadata added in v0.12.0

type Metadata struct {
	arrow.Metadata
}

Metadata wraps arrow.Metadata.

func NewMetadata added in v0.12.0

func NewMetadata(keys, values []string) Metadata

NewMetadata returns a new Metadata.

func (*Metadata) Set added in v0.14.0

func (m *Metadata) Set(newKey, newValue string) Metadata

Set returns a new Metadata with the key/value pair set. If the key already exists, it replaces its value.

func (*Metadata) SetMany added in v0.14.0

func (m *Metadata) SetMany(newKeys, newValues []string) Metadata

SetMany returns a new Metadata with the key/value pairs set. If a key already exists, it replaces its value.
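
Set and SetMany return a new Metadata rather than modifying the receiver. The sketch below illustrates that copy-then-set behavior on a hypothetical `metadata` type made of two parallel string slices; it does not use arrow.Metadata and is not the library's implementation.

```go
package main

import "fmt"

// metadata is a hypothetical stand-in: parallel slices of keys and values.
type metadata struct {
	keys   []string
	values []string
}

// set returns a copy of m with key set to value. An existing key has
// its value replaced; a new key is appended. The receiver is untouched.
func (m metadata) set(key, value string) metadata {
	out := metadata{
		keys:   append([]string(nil), m.keys...),
		values: append([]string(nil), m.values...),
	}
	for i, k := range out.keys {
		if k == key {
			out.values[i] = value
			return out
		}
	}
	out.keys = append(out.keys, key)
	out.values = append(out.values, value)
	return out
}

func main() {
	m := metadata{keys: []string{"unit"}, values: []string{"ms"}}
	m2 := m.set("unit", "s")
	m3 := m2.set("source", "sensor")

	fmt.Println(m.values)  // [ms]
	fmt.Println(m2.values) // [s]
	fmt.Println(m3.keys)   // [unit source]
}
```

Because the original value is left unchanged, the result has to be assigned, as in `m2 := m.set(...)`.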

type RowCmp added in v0.17.0

type RowCmp func(b Bow, i int) bool

RowCmp is the comparator type expected by Filter. It receives the whole Bow, so that comparators can be multidimensional (cross-column, for instance); the index argument `i` is the current row to compare.
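
To make the comparator shape concrete, here is a minimal sketch of a row filter driven by functions that receive the whole dataset and a row index. The `table` and `rowCmp` types are hypothetical stand-ins for Bow and RowCmp, and keeping a row only when every comparator returns true is an assumption about how several comparators combine.

```go
package main

import "fmt"

// table is a hypothetical stand-in for a Bow: equal-length int64 columns.
type table struct {
	cols [][]int64
}

// rowCmp has the same shape as RowCmp: whole dataset plus a row index.
type rowCmp func(t table, i int) bool

func (t table) numRows() int {
	if len(t.cols) == 0 {
		return 0
	}
	return len(t.cols[0])
}

// filter keeps the rows for which every comparator returns true
// (assumed combination rule for this sketch).
func (t table) filter(fns ...rowCmp) table {
	out := table{cols: make([][]int64, len(t.cols))}
	for i := 0; i < t.numRows(); i++ {
		keep := true
		for _, fn := range fns {
			if !fn(t, i) {
				keep = false
				break
			}
		}
		if !keep {
			continue
		}
		for c := range t.cols {
			out.cols[c] = append(out.cols[c], t.cols[c][i])
		}
	}
	return out
}

// firstLessThanSecond is a cross-column comparator: it needs two
// columns of the same row, hence the whole table as argument.
func firstLessThanSecond(t table, i int) bool {
	return t.cols[0][i] < t.cols[1][i]
}

func main() {
	t := table{cols: [][]int64{{1, 2, 3}, {3, 2, 1}}}
	fmt.Println(t.filter(firstLessThanSecond).cols) // [[1] [3]]
}
```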

type Series

type Series struct {
	Name  string
	Array arrow.Array
}

Series wraps the Apache Arrow arrow.Array interface, with the addition of a name. It represents an immutable sequence of values using the Arrow in-memory format.

func NewGenSeries added in v0.16.0

func NewGenSeries(o GenSeriesOptions) Series

NewGenSeries returns a new randomly generated Series.

func NewSeries

func NewSeries(name string, typ Type, dataArray, validityArray interface{}) Series

NewSeries returns a new Series from:

  • name: string
  • typ: Bow data Type
  • dataArray: slice of the data
  • validityArray: if nil, the data will be non-nil; otherwise it can be of type []bool or []byte to represent nil values
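
The effect of the validity argument can be pictured as a mask over the data: where the mask is false, the value reads back as nil, as in the `col2` column of the NewBow example. The sketch below shows this with a hypothetical `float64Series` type and a []bool mask only; it does not use Arrow and does not cover the []byte bitmap form.

```go
package main

import "fmt"

// float64Series is a hypothetical, simplified series: data plus an
// optional validity mask.
type float64Series struct {
	name  string
	data  []float64
	valid []bool // nil means every value is valid
}

func newFloat64Series(name string, data []float64, valid []bool) float64Series {
	return float64Series{name: name, data: data, valid: valid}
}

// value returns the value at index i, or nil when the mask marks it
// as invalid.
func (s float64Series) value(i int) interface{} {
	if s.valid != nil && !s.valid[i] {
		return nil
	}
	return s.data[i]
}

func main() {
	s := newFloat64Series("col2",
		[]float64{1.1, 2.2, 3.3, 4},
		[]bool{true, false, true, true},
	)
	for i := range s.data {
		fmt.Println(s.value(i))
	}
	// Prints:
	// 1.1
	// <nil>
	// 3.3
	// 4
}
```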

func NewSeriesFromBuffer added in v0.16.0

func NewSeriesFromBuffer(name string, buf Buffer) Series

NewSeriesFromBuffer returns a new Series from a name and a Buffer.

func NewSeriesFromInterfaces

func NewSeriesFromInterfaces(name string, typ Type, data []interface{}) Series

NewSeriesFromInterfaces returns a new Series from:

  • name: string
  • typ: Bow Type
  • data: a slice of interface{}, which may contain nil values

type Type

type Type int

func GetAllTypes added in v0.16.0

func GetAllTypes() []Type

GetAllTypes returns all Bow types.

func (Type) Convert

func (t Type) Convert(input interface{}) interface{}

func (Type) IsSupported added in v0.8.0

func (t Type) IsSupported() bool

IsSupported ensures that the Type t is currently supported by Bow and matches a convertible concrete type.

func (Type) String

func (t Type) String() string

String returns the string representation of the Type t.
