tada

package module
v0.0.0-...-192ff94
Published: Apr 16, 2020 License: Apache-2.0 Imports: 20 Imported by: 0

README

tada

tada (TAble DAta) is a package that enables test-driven data pipelines in pure Go.

tada combines concepts from pandas, spreadsheets, R, Apache Spark, and SQL. Its most common use cases are cleaning, aggregating, transforming, and analyzing data.

This package is a stable copy of tada@v0.8.8 and no changes will be made.

Documentation

Overview

Package tada is a stable copy of tada@v0.8.8 and no changes will be made.

Constants

This section is empty.

Variables

This section is empty.

Functions

func DisableWarnings

func DisableWarnings()

DisableWarnings prevents tada from writing warning messages to the default log writer.

func EnableWarnings

func EnableWarnings()

EnableWarnings allows tada to write warning messages to the default log writer.

func EqualDataFrames

func EqualDataFrames(a, b *DataFrame) bool

EqualDataFrames returns whether two dataframes are identical or not.

func EqualSeries

func EqualSeries(a, b *Series) bool

EqualSeries returns whether two Series are identical or not.

func GetOptionDefaultNullStrings

func GetOptionDefaultNullStrings() []string

GetOptionDefaultNullStrings returns the default list of strings that tada considers null.

func JoinOptionHow

func JoinOptionHow(how string) func(*joinConfig)

JoinOptionHow specifies how to join two Series or DataFrames. Supported options: left (i.e., left join), right, inner (default: left).

func JoinOptionLeftOn

func JoinOptionLeftOn(keys []string) func(*joinConfig)

JoinOptionLeftOn specifies the key(s) to use to join the left Series/DataFrame. Keys must be existing container names (either label level or column names). Default: no keys are specified, so shared label names are used automatically as keys.

func JoinOptionRightOn

func JoinOptionRightOn(keys []string) func(*joinConfig)

JoinOptionRightOn specifies the key(s) to use to join the right Series/DataFrame. Keys must be existing container names (either label level or column names). Default: no keys are specified, so shared label names are used automatically as keys.

func MakeMultiLevelLabels

func MakeMultiLevelLabels(labels []interface{}) ([]interface{}, error)

MakeMultiLevelLabels expects labels to be a slice of slices. It returns a product of these slices by repeating each label value n times, where n is the number of unique label values in the other slices.

For example, [["foo", "bar"], [1, 2, 3]] returns [["foo", "foo", "foo", "bar", "bar", "bar"], [1, 2, 3, 1, 2, 3]]

func PrettyDiff

func PrettyDiff(got, want interface{}) (bool, *tablediff.Differences, error)

PrettyDiff reads two structs into DataFrames, prints each as a stringified csv table, and returns whether they are equal. If not, returns the differences between the two.

func PrintOptionMaxCellWidth

func PrintOptionMaxCellWidth(n int)

PrintOptionMaxCellWidth changes the max rune width of any cell displayed when printing a Series or DataFrame to n (default: 30).

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]string{"corgilius", "barrius", "foo"},
	}).SetColNames([]string{"waldonius"})
	tada.PrintOptionMaxCellWidth(5)
	fmt.Println(df)
	tada.PrintOptionMaxCellWidth(30)
}
Output:

+---++-------+
| - || wa... |
|---||-------|
| 0 || co... |
| 1 || ba... |
| 2 ||   foo |
+---++-------+

func PrintOptionMaxColumns

func PrintOptionMaxColumns(n int)

PrintOptionMaxColumns changes the max number of columns displayed when printing a Series or DataFrame to n (default: 20).

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}, []float64{3, 4}, []float64{5, 6},
		[]float64{3, 4}, []float64{5, 6},
	}).SetColNames([]string{"A", "B", "C", "D", "E"})
	tada.PrintOptionMaxColumns(2)
	fmt.Println(df)
	tada.PrintOptionMaxColumns(20)
}
Output:

+---++---+-----+---+
| - || A | ... | E |
|---||---|-----|---|
| 0 || 1 | ... | 5 |
| 1 || 2 |     | 6 |
+---++---+-----+---+

func PrintOptionMaxRows

func PrintOptionMaxRows(n int)

PrintOptionMaxRows changes the max number of rows displayed when printing a Series or DataFrame to n (default: 50).

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2, 3, 4, 5, 6, 7, 8}}).SetColNames([]string{"A"})
	tada.PrintOptionMaxRows(6)
	fmt.Println(df)
	tada.PrintOptionMaxRows(50)
}
Output:

+-----++-----+
|  -  ||  A  |
|-----||-----|
|   0 ||   1 |
|   1 ||   2 |
|   2 ||   3 |
| ... || ... |
|   5 ||   6 |
|   6 ||   7 |
|   7 ||   8 |
+-----++-----+

func PrintOptionMergeRepeats

func PrintOptionMergeRepeats(set bool)

PrintOptionMergeRepeats (if true) instructs the String() function to merge repeated non-header values when printing a Series or DataFrame (default: true).

func PrintOptionWrapLines

func PrintOptionWrapLines(set bool)

PrintOptionWrapLines (if true) instructs the String() function to wrap overly-wide rows onto new lines instead of truncating them when printing a Series or DataFrame (default: truncate).

func ReadOptionDelimiter

func ReadOptionDelimiter(sep rune) func(*readConfig)

ReadOptionDelimiter configures a read function to use sep as a field delimiter for use in ReadCSV (default: ",").

func ReadOptionHeaders

func ReadOptionHeaders(n int) func(*readConfig)

ReadOptionHeaders configures a read function to expect n rows to be column headers (default: 1).

func ReadOptionLabels

func ReadOptionLabels(n int) func(*readConfig)

ReadOptionLabels configures a read function to expect the first n columns to be label levels (default: 0).

func ReadOptionSwitchDims

func ReadOptionSwitchDims() func(*readConfig)

ReadOptionSwitchDims configures a read function to expect columns to be the major dimension of csv data (default: expects rows to be the major dimension). For example, when reading this data:

[["foo", "bar"], ["baz", "qux"]]

	default                     ReadOptionSwitchDims()
	(major dimension: rows)     (major dimension: columns)
	foo bar                     foo baz
	baz qux                     bar qux

func SetOptionAddTimeFormat

func SetOptionAddTimeFormat(format string)

SetOptionAddTimeFormat adds format to the list of time formats that can be parsed when converting values from string to time.Time.

func SetOptionDefaultSeparator

func SetOptionDefaultSeparator(sep string)

SetOptionDefaultSeparator changes the separator used in group names and multi-level column names to sep (default: "|").

func SetOptionNaNStatus

func SetOptionNaNStatus(set bool)

SetOptionNaNStatus sets whether math.NaN() is considered a null value or not (default: true).

func SetOptionNullStrings

func SetOptionNullStrings(list []string)

SetOptionNullStrings replaces the default list of strings that tada considers null with list.

func WriteMockCSV

func WriteMockCSV(w io.Writer, n int, r io.Reader, options ...ReadOption) error

WriteMockCSV reads r (configured by options) and writes n mock rows to w, with column names and types inferred from the data in r. Regardless of the major dimension of r, the major dimension of the output is rows. Available options: ReadOptionHeaders, ReadOptionLabels, ReadOptionSwitchDims.

Default if no options are supplied: 1 header row, no labels, rows as major dimension

func WriteOptionDelimiter

func WriteOptionDelimiter(sep rune) func(*writeConfig)

WriteOptionDelimiter configures a write function to use sep as a field delimiter for use in write functions (default: ",").

func WriteOptionExcludeLabels

func WriteOptionExcludeLabels() func(*writeConfig)

WriteOptionExcludeLabels excludes the label levels from the output.

Types

type ApplyFn

type ApplyFn func(slice interface{}, isNull []bool) (equalLengthSlice interface{})

An ApplyFn is an anonymous function supplied to an Apply function to convert one slice to another. The function input will be a slice, and it must return a slice of equal length (though the type may be different). isNull contains the null status of every row in the input slice. The null status of a row may be changed by setting that row's isNull element within the function body.

type Binner

type Binner struct {
	AndLess bool
	AndMore bool
	Labels  []string
}

Binner supplies logic for the Bin() function. If `AndLess` is true, a bin is added that ranges between negative infinity and the first bin value. If `AndMore` is true, a bin is added that ranges between the last bin value and positive infinity. If `Labels` is not nil, then category names correspond to labels, and the number of labels must be one less than the number of bin values. Otherwise, category names are auto-generated from the range of the bin intervals.
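The relationship between bin values and labels can be sketched as follows. This is not tada's Bin() implementation: binLabel is a hypothetical helper, it ignores AndLess and AndMore, and it assumes each interval excludes its lower edge and includes its upper edge.

```go
package main

import "fmt"

// binLabel is a hypothetical helper that returns the label of the interval
// containing v. edges must be sorted, and labels must hold exactly one fewer
// entry than edges. Intervals are assumed to be (lower, upper].
func binLabel(v float64, edges []float64, labels []string) (string, bool) {
	if len(labels) != len(edges)-1 {
		return "", false
	}
	for i := range labels {
		if v > edges[i] && v <= edges[i+1] {
			return labels[i], true
		}
	}
	return "", false // v falls outside every interval
}

func main() {
	edges := []float64{1, 3, 5}
	labels := []string{"low", "high"}
	for _, v := range []float64{2, 3, 4, 6} {
		label, ok := binLabel(v, edges, labels)
		fmt.Println(v, label, ok)
	}
}
```

Three bin values define two intervals, which is why two labels are required here.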

type DType

type DType int

DType is a DataType that may be used in Sort() or Cast().

const (
	// Float64 -> float64
	Float64 DType = iota
	// String -> string
	String
	// DateTime -> time.Time
	DateTime // always tz-aware
	// Time -> civil.Time
	Time
	// Date -> civil.Date
	Date
)

type DataFrame

type DataFrame struct {
	// contains filtered or unexported fields
}

A DataFrame is one or more columns of data with one or more levels of aligned labels. A DataFrame is analogous to a spreadsheet.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}, []string{"baz", "qux"}},
	).SetName("foo")
	fmt.Println(df)
}
Output:

+---++---+-----+
| - || 0 |  1  |
|---||---|-----|
| 0 || 1 | baz |
| 1 || 2 | qux |
+---++---+-----+
name: foo

func ConcatSeries

func ConcatSeries(series ...*Series) (*DataFrame, error)

ConcatSeries merges multiple Series from left-to-right, one after the other, via left joins on shared keys. For advanced cases, use df.LookupAdvanced() + df.WithCol().

func NewDataFrame

func NewDataFrame(slices []interface{}, labels ...interface{}) *DataFrame

NewDataFrame creates a new DataFrame with slices (akin to column values) and optional labels. Slices must be comprised of supported slices, and each label must be a supported slice.

If no labels are supplied, a default label level is inserted ([]int incrementing from 0). Columns are named sequentially (e.g., 0, 1, etc) by default. Default column names are displayed on printing. Label levels are named *n (e.g., *0, *1, etc) by default. Default label names are hidden on printing.

Supported slice types: all variants of []float, []int, & []uint, []string, []bool, []time.Time, []interface{}, and 2-dimensional variants of each (e.g., [][]string, [][]float64).

func ReadCSV

func ReadCSV(r io.Reader, options ...ReadOption) (*DataFrame, error)

ReadCSV reads csv records in r into a DataFrame (configured by options). Rows must be the major dimension of r. For advanced cases, use the standard csv library NewReader().ReadAll() + tada.ReadCSVFromRecords(). Available options: ReadOptionHeaders, ReadOptionLabels, ReadOptionDelimiter.

Default if no options are supplied: 1 header row; no labels; field delimiter is ","

If no labels are supplied, a default label level is inserted ([]int incrementing from 0). If no headers are supplied, a default level of sequential column names (e.g., 0, 1, etc) is used. Default column names are displayed on printing. Label levels are named *i (e.g., *0, *1, etc) by default when first created. Default label names are hidden on printing.

Example
package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	data := "foo, bar\n baz, qux\n corge, fred"
	df, _ := tada.ReadCSV(strings.NewReader(data))
	fmt.Println(df)
}
Output:

+---++-------+------+
| - ||  foo  | bar  |
|---||-------|------|
| 0 ||   baz |  qux |
| 1 || corge | fred |
+---++-------+------+
Example (Delimiter)
package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	data := `foo|bar
	baz|qux
	corge|fred`
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionDelimiter('|'))
	fmt.Println(df)
}
Output:

+---++-------+------+
| - ||  foo  | bar  |
|---||-------|------|
| 0 ||   baz |  qux |
| 1 || corge | fred |
+---++-------+------+
Example (MultipleHeaders)
package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	data := "foo, bar\n baz, qux\n corge, fred"
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionHeaders(2))
	fmt.Println(df)
}
Output:

+---++-------+------+
|   ||  foo  | bar  |
| - ||  baz  | qux  |
|---||-------|------|
| 0 || corge | fred |
+---++-------+------+
Example (MultipleHeadersWithLabels)
package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	data := ", foo, bar\n labels, baz, qux\n 1, corge, fred"
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionHeaders(2), tada.ReadOptionLabels(1))
	fmt.Println(df)
}
Output:

+--------++-------+------+
|        ||  foo  | bar  |
| labels ||  baz  | qux  |
|--------||-------|------|
|      1 || corge | fred |
+--------++-------+------+
Example (NoHeaders)
package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	data := "foo, bar\n baz, qux\n corge, fred"
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionHeaders(0))
	fmt.Println(df)
}
Output:

+---++-------+------+
| - ||   0   |  1   |
|---||-------|------|
| 0 ||   foo |  bar |
| 1 ||   baz |  qux |
| 2 || corge | fred |
+---++-------+------+
Example (WithLabels)
package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	data := `foo, bar
	baz, qux
	corge, fred`
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionLabels(1))
	fmt.Println(df)
}
Output:

+-------++------+
|  foo  || bar  |
|-------||------|
|   baz ||  qux |
| corge || fred |
+-------++------+

func ReadCSVFromRecords

func ReadCSVFromRecords(records [][]string, options ...ReadOption) (ret *DataFrame, err error)

ReadCSVFromRecords reads records into a DataFrame (configured by options). Often used with encoding/csv.NewReader().ReadAll(). All columns will be read as []string. Available options: ReadOptionHeaders, ReadOptionLabels, ReadOptionSwitchDims.

Default if no options are supplied: 1 header row; no labels; rows as major dimension

If no labels are supplied, a default label level is inserted ([]int incrementing from 0). If no headers are supplied, a default level of sequential column names (e.g., 0, 1, etc) is used. Default column names are displayed on printing. Label levels are named *i (e.g., *0, *1, etc) by default when first created. Default label names are hidden on printing.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	data := [][]string{
		{"foo", "bar"},
		{"baz", "qux"},
		{"corge", "fred"},
	}
	df, _ := tada.ReadCSVFromRecords(data)
	fmt.Println(df)
}
Output:

+---++-------+------+
| - ||  foo  | bar  |
|---||-------|------|
| 0 ||   baz |  qux |
| 1 || corge | fred |
+---++-------+------+
Example (ColsAsMajorDimension)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	data := [][]string{
		{"foo", "bar"},
		{"baz", "qux"},
		{"corge", "fred"},
	}
	df, _ := tada.ReadCSVFromRecords(data, tada.ReadOptionSwitchDims())
	fmt.Println(df)
}
Output:

+---++-----+-----+-------+
| - || foo | baz | corge |
|---||-----|-----|-------|
| 0 || bar | qux |  fred |
+---++-----+-----+-------+

func ReadInterfaceRecords

func ReadInterfaceRecords(records [][]interface{}, options ...ReadOption) (ret *DataFrame, err error)

ReadInterfaceRecords reads records into a DataFrame (configured by options). All columns will be read as []interface{}. Available options: ReadOptionHeaders, ReadOptionLabels, ReadOptionSwitchDims.

Default if no options are supplied: 1 header row; no labels; rows as major dimension

If no labels are supplied, a default label level is inserted ([]int incrementing from 0). If no headers are supplied, a default level of sequential column names (e.g., 0, 1, etc) is used. Default column names are displayed on printing. Label levels are named *i (e.g., *0, *1, etc) by default when first created. Default label names are hidden on printing.

func ReadMatrix

func ReadMatrix(mat Matrix) *DataFrame

ReadMatrix reads data satisfying the gonum Matrix interface into a DataFrame. Panics if any slices in the matrix are shorter than the first slice.

func ReadStruct

func ReadStruct(strct interface{}, options ...ReadOption) (*DataFrame, error)

ReadStruct reads the exported fields in strct into a DataFrame. strct must be a struct or pointer to a struct. If any exported field in strct is nil, returns an error.

If a "tada" tag is present with the value "isNull", this field must be [][]bool with one equal-length slice for each exported field. These values will set the null status for each of the resulting value containers in the DataFrame, from left-to-right. If a "tada" tag has any other value, the resulting value container will have the same name as the tag value. Otherwise, the value container will have the same name as the exported field.

func ReadStructSlice

func ReadStructSlice(slice interface{}) (*DataFrame, error)

ReadStructSlice reads a slice of structs into a DataFrame with field names converted to column names, field values converted to column values, and default labels. The structs must all be of the same type.

A default label level named *0 is inserted ([]int incrementing from 0). Default label names are hidden on printing.

func (*DataFrame) Append

func (df *DataFrame) Append(other *DataFrame) *DataFrame

Append adds the other labels and values as new rows to the DataFrame. If the types of any container do not match, all the values in that container are coerced to string. Returns a new DataFrame.

func (*DataFrame) Apply

func (df *DataFrame) Apply(lambdas map[string]ApplyFn) *DataFrame

Apply applies an anonymous function to every row in a container based on lambdas, which is a map of container names (either column or label names) to anonymous functions. A row's null status can be set in-place within the anonymous function by accessing the []bool argument. Returns a new DataFrame.

func (*DataFrame) At

func (df *DataFrame) At(row, column int) *Element

At returns the Element at the row and column index positions. If row or column is out of range, returns nil.

func (*DataFrame) CSVRecords

func (df *DataFrame) CSVRecords(options ...WriteOption) [][]string

CSVRecords writes a DataFrame to a [][]string with rows as the major dimension. Null values are replaced with "(null)".

func (*DataFrame) Cast

func (df *DataFrame) Cast(containerAsType map[string]DType)

Cast coerces the underlying container values (column or label level) to []float64, []string, []time.Time (aka timezone-aware DateTime), []civil.Date, or []civil.Time and caches the []byte values of the container (if inexpensive). Use Cast to improve performance when calling multiple operations on values.

func (*DataFrame) Col

func (df *DataFrame) Col(name string) *Series

Col finds the first column with matching name and returns as a Series. Similar to SelectLabels(), but selects column values instead of label values.

func (*DataFrame) Cols

func (df *DataFrame) Cols(names ...string) *DataFrame

Cols returns all columns with matching names.

func (*DataFrame) Copy

func (df *DataFrame) Copy() *DataFrame

Copy returns a new DataFrame with identical values as the original but no shared objects (i.e., all internals are newly allocated).

func (*DataFrame) Count

func (df *DataFrame) Count() *Series

Count counts the number of non-null values in each column.

func (*DataFrame) DeduplicateNames

func (df *DataFrame) DeduplicateNames() *DataFrame

DeduplicateNames deduplicates the names of containers (label levels and columns) from left-to-right by appending _n to duplicate names, where n is equal to the number of times that name has already appeared. Returns a new DataFrame.

func (*DataFrame) DropCol

func (df *DataFrame) DropCol(name string) *DataFrame

DropCol drops the first column matching name. Returns a new DataFrame.

func (*DataFrame) DropLabels

func (df *DataFrame) DropLabels(name string) *DataFrame

DropLabels drops the first label level matching name. Returns a new DataFrame.

func (*DataFrame) DropNull

func (df *DataFrame) DropNull(subset ...string) *DataFrame

DropNull removes rows with a null value in any column. If subset is supplied, removes any rows with null values in any of the specified columns. Returns a new DataFrame.

func (*DataFrame) DropRow

func (df *DataFrame) DropRow(index int) *DataFrame

DropRow removes the row at the specified index. Returns a new DataFrame.

func (*DataFrame) EqualsCSV

func (df *DataFrame) EqualsCSV(includeLabels bool, want io.Reader, wantOptions ...ReadOption) (bool, *tablediff.Differences, error)

EqualsCSV reads want (configured by wantOptions) into a dataframe, converts both df and want into [][]string records, and evaluates whether the stringified values match. If they do not match, returns a tablediff.Differences object that can be printed to isolate their differences.

If includeLabels is true, then df's labels are included as columns.

func (*DataFrame) Err

func (df *DataFrame) Err() error

Err returns the most recent error attached to the DataFrame, if any.

func (*DataFrame) FillNull

func (df *DataFrame) FillNull(how map[string]NullFiller) *DataFrame

FillNull fills null values and makes them non-null based on how, a map of container names (either column or label names) and tada.NullFiller structs. For each container name in the map, the first field selected (i.e., not left blank) in its NullFiller struct is the strategy used to replace null values in that container. FillForward fills null values with the most recent non-null value in the container. FillBackward fills null values with the next non-null value in the container. FillZero fills null values with the zero value for that container type. FillFloat converts the container values to float64 and fills null values with the value supplied. If no field is selected, the container values are converted to float64 and all null values are filled with 0. Returns a new DataFrame.

func (*DataFrame) Filter

func (df *DataFrame) Filter(filters map[string]FilterFn) *DataFrame

Filter returns a new DataFrame with only rows that satisfy all of the filters, which is a map of container names (either column name or label name) and anonymous functions.

Rows with null values never satisfy a filter. If no filter is provided, Filter does nothing. For equality filtering on one or more containers, consider FilterByValue. Returns a new DataFrame.

Example
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	dt1 := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)
	dt2 := dt1.AddDate(0, 0, 1)
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2, 3}, []time.Time{dt1, dt2, dt1}},
	).
		SetColNames([]string{"foo", "bar"})
	fmt.Println(df)

	gt1 := func(val interface{}) bool { return val.(float64) > 1 }
	beforeDate := func(val interface{}) bool { return val.(time.Time).Before(dt2) }
	ret := df.Filter(map[string]tada.FilterFn{
		"foo": gt1,
		"bar": beforeDate,
	})
	fmt.Println(ret)
}
Output:

+---++-----+----------------------+
| - || foo |         bar          |
|---||-----|----------------------|
| 0 ||   1 | 2020-01-01T00:00:00Z |
| 1 ||   2 | 2020-01-02T00:00:00Z |
| 2 ||   3 | 2020-01-01T00:00:00Z |
+---++-----+----------------------+

+---++-----+----------------------+
| - || foo |         bar          |
|---||-----|----------------------|
| 2 ||   3 | 2020-01-01T00:00:00Z |
+---++-----+----------------------+

func (*DataFrame) FilterByValue

func (df *DataFrame) FilterByValue(filters map[string]interface{}) *DataFrame

FilterByValue returns the rows in the DataFrame satisfying all filters, which is a map of container names (either column or label names) to interface{} values. A filter is satisfied for a given row value if the stringified value in that container at that row matches the stringified interface{} value. Returns a new DataFrame.

func (*DataFrame) FilterCols

func (df *DataFrame) FilterCols(lambda func(string) bool, level int) *DataFrame

FilterCols returns the columns with names that satisfy lambda at the supplied column level. level should be 0 unless df has multiple column levels.

func (*DataFrame) FilterIndex

func (df *DataFrame) FilterIndex(container string, filterFn FilterFn) []int

FilterIndex returns the index positions of the rows in container that satisfy filterFn. A filter that matches no rows returns an empty []int. An out of range container returns nil.

func (*DataFrame) GetLabels

func (df *DataFrame) GetLabels() []interface{}

GetLabels returns label levels as interface{} slices within an []interface{} that may be supplied as the optional labels argument to NewSeries() or NewDataFrame(). NB: If supplying this output to either of these constructors, be sure to use the spread operator (...), or else the labels will not be read as separate levels.

func (*DataFrame) GroupBy

func (df *DataFrame) GroupBy(names ...string) *GroupedDataFrame

GroupBy groups the DataFrame rows that share the same stringified value in the container(s) (columns or labels) specified by names. If error occurs, writes error to GroupedDataFrame.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2, 3, 4},
	},
		[]string{"foo", "bar", "foo", "bar"}).
		SetColNames([]string{"baz"})
	g := df.GroupBy()
	fmt.Println(g)
}
Output:

+-----++-----+
|  -  || baz |
|-----||-----|
| foo ||   1 |
|     ||   3 |
| bar ||   2 |
|     ||   4 |
+-----++-----+

func (*DataFrame) HasCols

func (df *DataFrame) HasCols(colNames ...string) error

HasCols returns an error if the DataFrame does not contain all of the colNames supplied.

func (*DataFrame) HasLabels

func (df *DataFrame) HasLabels(labelNames ...string) error

HasLabels returns an error if the DataFrame does not contain all of the labelNames supplied.

func (*DataFrame) HasType

func (df *DataFrame) HasType(sliceType string) (labelIndex, columnIndex []int)

HasType returns the index positions of all label and column containers containing a slice of values where reflect.Type.String() == sliceType. Container index positions may then be supplied to df.SubsetLabels() or df.SubsetCols().

For example, to search for datetime labels: labels, _ := df.HasType("[]time.Time")

To search for float64 columns: _, cols := df.HasType("[]float64")

func (*DataFrame) Head

func (df *DataFrame) Head(n int) *DataFrame

Head returns the first n rows of the DataFrame. If n is greater than the length of the DataFrame, returns the entire DataFrame. In either case, returns a new DataFrame.

func (*DataFrame) InPlace

func (df *DataFrame) InPlace() *DataFrameMutator

InPlace returns a DataFrameMutator, which contains most of the same methods as DataFrame but never returns a new DataFrame. If you want to save memory and improve performance and do not need to preserve the original DataFrame, consider using InPlace().

func (*DataFrame) IndexOfContainer

func (df *DataFrame) IndexOfContainer(name string, columns bool) int

IndexOfContainer returns the index position of the first container with a name matching name (case-sensitive). If name does not match any container, -1 is returned. If columns is true, only column names will be searched. If columns is false, only label level names will be searched.

func (*DataFrame) InterfaceRecords

func (df *DataFrame) InterfaceRecords(options ...WriteOption) [][]interface{}

InterfaceRecords writes a DataFrame to a [][]interface{} with columns as the major dimension. Null values are replaced with "(null)".

func (*DataFrame) IsNull

func (df *DataFrame) IsNull(subset ...string) *DataFrame

IsNull returns all the rows with any null values. If subset is supplied, returns all the rows with all non-null values in the specified columns. Returns a new DataFrame.

func (*DataFrame) Iterator

func (df *DataFrame) Iterator() *DataFrameIterator

Iterator returns an iterator which may be used to access the values in each row as map[string]Element.

func (*DataFrame) LabelsAsSeries

func (df *DataFrame) LabelsAsSeries(name string) *Series

LabelsAsSeries finds the first label level with matching name and returns the values as a Series. Similar to Col(), but selects label values instead of column values. The labels in the Series are shared with the labels in the DataFrame. If label level name is default (prefixed with *), the prefix is removed.

func (*DataFrame) Len

func (df *DataFrame) Len() int

Len returns the number of rows in each column of the DataFrame.

func (*DataFrame) ListColNames

func (df *DataFrame) ListColNames() []string

ListColNames returns the name of all the columns in the DataFrame, in order. If df has multiple column levels, each column name is a single string with level values separated by "|" (may be changed with SetOptionDefaultSeparator). To return the names at a specific level, use ListColNamesAtLevel().

func (*DataFrame) ListColNamesAtLevel

func (df *DataFrame) ListColNamesAtLevel(level int) []string

ListColNamesAtLevel returns the name of all the columns in the DataFrame, in order, at the supplied column level. If level is out of range, returns a nil slice.

func (*DataFrame) ListLabelNames

func (df *DataFrame) ListLabelNames() []string

ListLabelNames returns the name of all the label levels in the DataFrame, in order.

func (*DataFrame) Lookup

func (df *DataFrame) Lookup(other *DataFrame, options ...JoinOption) (*DataFrame, error)

Lookup performs the lookup portion of a join of other onto df. Performs a left join unless a different join type is specified as an option. If left and right keys are supplied as options, those are used as lookup keys. Otherwise, the join will automatically use shared label names or return an error if none exist.

Lookup identifies the row alignment between df and other and returns the aligned values. Rows are aligned when: 1) one or more containers (either column or label level) in other share the same name as one or more containers in df, and 2) the stringified values in the other containers match the values in the df containers. For the following dataframes:

	df           other
	FOO BAR      FOO QUX
	bar 0        baz corge
	baz 1        qux waldo

Row 1 in df is "aligned" with row 0 in other, because those are the rows in which both share the same value ("baz") in a container with the same name ("foo"). The result of a lookup will be:

	FOO BAR
	bar (null)
	baz corge

Returns a new DataFrame.

func (*DataFrame) Max

func (df *DataFrame) Max() *Series

Max coerces the values in each column to float64 and returns the maximum non-null value in each column.

func (*DataFrame) Mean

func (df *DataFrame) Mean() *Series

Mean coerces the values in each column to float64 and calculates the mean of each column.

func (*DataFrame) Median

func (df *DataFrame) Median() *Series

Median coerces the values in each column to float64 and calculates the median of each column.

func (*DataFrame) Merge

func (df *DataFrame) Merge(other *DataFrame, options ...JoinOption) (*DataFrame, error)

Merge joins other onto df. Performs a left join unless a different join type is specified as an option. If left and right keys are supplied as options, those are used as lookup keys. Otherwise, the join will automatically use shared label names or return an error if none exist.

Merge identifies the row alignment between df and other and appends aligned values as new columns on df. Rows are aligned when 1) one or more containers (either column or label level) in other share the same name as one or more containers in df, and 2) the stringified values in the other containers match the values in the df containers. For the following dataframes:

    df          other
    FOO BAR     FOO QUX
    bar 0       baz corge
    baz 1       qux waldo

Row 1 in df is "aligned" with row 0 in other, because those are the rows in which both share the same value ("baz") in a container with the same name ("foo"). After merging, the result will be:

    df
    FOO BAR QUX
    bar 0   null
    baz 1   corge

Finally, all container names (columns and label names) are deduplicated after the merge so that they are unique. Returns a new DataFrame.

func (*DataFrame) Min

func (df *DataFrame) Min() *Series

Min coerces the values in each column to float64 and returns the minimum non-null value in each column.

func (*DataFrame) NUnique

func (df *DataFrame) NUnique() *Series

NUnique counts the number of unique non-null values in each column.

func (*DataFrame) Name

func (df *DataFrame) Name() string

Name returns the name of the DataFrame.

func (*DataFrame) NameOfCol

func (df *DataFrame) NameOfCol(n int) string

NameOfCol returns the name of the column at index position n. If n is out of range, returns "-out of range-".

func (*DataFrame) NameOfLabel

func (df *DataFrame) NameOfLabel(n int) string

NameOfLabel returns the name of the label level at index position n. If n is out of range, returns "-out of range-".

func (*DataFrame) NumColumns

func (df *DataFrame) NumColumns() int

NumColumns returns the number of columns in the DataFrame.

func (*DataFrame) NumLevels

func (df *DataFrame) NumLevels() int

NumLevels returns the number of label levels in the DataFrame.

func (*DataFrame) PivotTable

func (df *DataFrame) PivotTable(labels, columns, values, aggFunc string) (*DataFrame, error)

PivotTable creates a spreadsheet-style pivot table as a DataFrame by grouping rows using the unique values in labels, reducing the values in values using an aggFunc aggregation function, then promoting the unique values in columns to be new columns. labels, columns, and values should all refer to existing container names (either columns or labels). Supported aggFuncs: sum, mean, median, stdDev, count, min, max.
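
The group, reduce, and promote steps can be sketched with the standard library alone. This is an illustration of the idea for the sum aggFunc, not tada's implementation: the record type and pivotSum function are hypothetical, and returning labels and columns in first-appearance order is an assumption.

```go
package main

import "fmt"

// record is one hypothetical input row: a label value, a column value, and a number.
type record struct {
	label, column string
	value         float64
}

// pivotSum sums value for every (label, column) pair. Labels and columns are
// returned in first-appearance order, which is an assumption of this sketch.
func pivotSum(recs []record) (labels, cols []string, sums map[[2]string]float64) {
	sums = make(map[[2]string]float64)
	seenLabel, seenCol := map[string]bool{}, map[string]bool{}
	for _, r := range recs {
		if !seenLabel[r.label] {
			seenLabel[r.label] = true
			labels = append(labels, r.label)
		}
		if !seenCol[r.column] {
			seenCol[r.column] = true
			cols = append(cols, r.column)
		}
		sums[[2]string{r.label, r.column}] += r.value
	}
	return labels, cols, sums
}

func main() {
	recs := []record{
		{"east", "2019", 1}, {"east", "2020", 2},
		{"west", "2019", 3}, {"east", "2019", 4},
	}
	labels, cols, sums := pivotSum(recs)
	fmt.Println("label", cols)
	for _, l := range labels {
		row := make([]float64, len(cols))
		for i, c := range cols {
			row[i] = sums[[2]string{l, c}] // pairs that never occurred read as 0 here
		}
		fmt.Println(l, row)
	}
}
```

Each unique label becomes one output row and each unique column value becomes one output column; a real pivot table would likely mark missing pairs as null rather than 0.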

func (*DataFrame) PromoteToColLevel

func (df *DataFrame) PromoteToColLevel(name string) *DataFrame

PromoteToColLevel pivots an existing container (either column or label names) into a new column level. If promoting would use either the last column or index level, it returns an error. Each unique value in the stacked column is stacked above each existing column. Promotion can add new columns and remove label rows with duplicate values.

func (*DataFrame) Range

func (df *DataFrame) Range(first, last int) *DataFrame

Range returns the rows of the DataFrame starting at first and ending immediately prior to last (left-inclusive, right-exclusive). If either first or last is greater than the length of the DataFrame, an error is returned. Returns a new DataFrame.
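
The left-inclusive, right-exclusive convention is the same one Go uses for slice expressions. As a minimal sketch (rowRange is a hypothetical helper, and the checks for negative or inverted bounds are assumptions beyond what is stated above):

```go
package main

import (
	"errors"
	"fmt"
)

// rowRange returns a copy of rows[first:last] (left-inclusive, right-exclusive).
// The checks for negative or inverted bounds are assumptions of this sketch.
func rowRange(rows []string, first, last int) ([]string, error) {
	if first < 0 || first > last || last > len(rows) {
		return nil, errors.New("rowRange: first or last out of range")
	}
	out := make([]string, last-first)
	copy(out, rows[first:last])
	return out, nil
}

func main() {
	rows := []string{"a", "b", "c", "d"}

	got, err := rowRange(rows, 1, 3)
	fmt.Println(got, err) // [b c] <nil>

	_, err = rowRange(rows, 1, 5)
	fmt.Println(err != nil) // true
}
```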

func (*DataFrame) Reduce

func (df *DataFrame) Reduce(name string, lambda ReduceFn) (*Series, error)

Reduce uses lambda to reduce all columns to a Series named name with column names as labels and reduced values as row values. The type of the new Series is a slice with the same type as the first value outputted by the anonymous function.

func (*DataFrame) Relabel

func (df *DataFrame) Relabel() *DataFrame

Relabel resets the DataFrame labels to default labels (e.g., []int from 0 to df.Len()-1, with *0 as name). Returns a new DataFrame.

func (*DataFrame) ReorderCols

func (df *DataFrame) ReorderCols(colNames []string) *DataFrame

ReorderCols reorders the columns to be in the same order as specified by colNames. If a column is not specified, it is excluded from the resulting DataFrame. Returns a new DataFrame.

func (*DataFrame) ReorderLabels

func (df *DataFrame) ReorderLabels(levelNames []string) *DataFrame

ReorderLabels reorders the label levels to be in the same order as specified by levelNames. If a level is not specified, it is excluded from the resulting DataFrame. Returns a new DataFrame.

func (*DataFrame) Resample

func (df *DataFrame) Resample(how map[string]Resampler) *DataFrame

Resample coerces values to time.Time and truncates them by the logic supplied in how, which is a map of container names (either column or label names) to tada.Resampler structs. For each container name in the map, the first By field selected (i.e., not left blank) in its Resampler struct provides the resampling logic for that container. If the slice type is civil.Date or civil.Time before resampling, it will be returned as civil.Date or civil.Time after resampling.

Returns a new DataFrame.

func (*DataFrame) ResetLabels

func (df *DataFrame) ResetLabels(index ...string) *DataFrame

ResetLabels appends the label level(s) at the supplied index levels as columns and drops the level. If no index levels are supplied, all label levels are appended as columns and dropped as levels, and replaced by a default label column. Returns a new DataFrame.

func (*DataFrame) Series

func (df *DataFrame) Series() *Series

Series converts a single-columned DataFrame to a Series that shares the same underlying values and labels.

func (*DataFrame) SetAsLabels

func (df *DataFrame) SetAsLabels(colNames ...string) *DataFrame

SetAsLabels appends the column(s) supplied as colNames as label levels and drops the column(s). The number of colNames supplied must be less than the number of columns in the DataFrame. Returns a new DataFrame.

func (*DataFrame) SetColNames

func (df *DataFrame) SetColNames(colNames []string) *DataFrame

SetColNames sets the names of all the columns in the DataFrame and returns the entire DataFrame. If an error occurs, it is written to the DataFrame.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}, []string{"baz", "qux"}},
	).
		SetColNames([]string{"foo", "bar"})
	fmt.Println(df)
}
Output:

+---++-----+-----+
| - || foo | bar |
|---||-----|-----|
| 0 ||   1 | baz |
| 1 ||   2 | qux |
+---++-----+-----+

func (*DataFrame) SetLabelNames

func (df *DataFrame) SetLabelNames(levelNames []string) *DataFrame

SetLabelNames sets the names of all the label levels in the DataFrame and returns the entire DataFrame. If an error occurs, it is written to the DataFrame.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{[]float64{1, 2}}).
		SetLabelNames([]string{"baz"})
	fmt.Println(df)
}
Output:

+-----++---+
| baz || 0 |
|-----||---|
|   0 || 1 |
|   1 || 2 |
+-----++---+
Example (Multiple)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame(
		[]interface{}{[]float64{1, 2}},
		[]int{0, 1}, []string{"foo", "bar"},
	).
		SetColNames([]string{"A"}).
		SetLabelNames([]string{"baz", "qux"})
	fmt.Println(df)
}
Output:

+-----+-----++---+
| baz | qux || A |
|-----|-----||---|
|   0 | foo || 1 |
|   1 | bar || 2 |
+-----+-----++---+

func (*DataFrame) SetName

func (df *DataFrame) SetName(name string) *DataFrame

SetName sets the name of a DataFrame and returns the entire DataFrame.

func (*DataFrame) SetNulls

func (df *DataFrame) SetNulls(n int, nulls []bool) error

SetNulls overwrites the underlying boolean slice that records whether each value is null or not for the container at position n (either labels or columns).

func (*DataFrame) SetRows

func (df *DataFrame) SetRows(lambda ApplyFn, container string, rows []int) *DataFrame

SetRows applies lambda within container (either label or column name) to set the values at the specified row positions. The new values must be the same type as the existing values. Returns a new DataFrame.

func (*DataFrame) Shuffle

func (df *DataFrame) Shuffle(seed int64) *DataFrame

Shuffle randomizes the row order of the DataFrame. Returns a new DataFrame.

func (*DataFrame) Sort

func (df *DataFrame) Sort(by ...Sorter) *DataFrame

Sort sorts the values by zero or more Sorter specifications. If no Sorter is supplied, does not sort. If no DType is supplied for a Sorter, sorts as float64. DType is only used for the process of sorting. Once it has been sorted, data retains its original type. Returns a new DataFrame.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{2, 2, 1}, []string{"b", "c", "a"}},
	).
		SetColNames([]string{"foo", "bar"})
	fmt.Println(df)

	// first sort by foo in ascending order, then sort by bar in descending order
	ret := df.Sort(
		// Float64 is the default sorting DType, and ascending is the default ordering
		tada.Sorter{Name: "foo"},
		tada.Sorter{Name: "bar", DType: tada.String, Descending: true},
	)
	fmt.Println(ret)
}
Output:

+---++-----+-----+
| - || foo | bar |
|---||-----|-----|
| 0 ||   2 |   b |
| 1 ||     |   c |
| 2 ||   1 |   a |
+---++-----+-----+

+---++-----+-----+
| - || foo | bar |
|---||-----|-----|
| 2 ||   1 |   a |
| 1 ||   2 |   c |
| 0 ||     |   b |
+---++-----+-----+
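
The multi-key ordering in the example above (foo ascending, then bar descending to break ties) can be reproduced with the standard library. This is a sketch of the ordering rule only, not of how tada sorts internally; the row type and sortRows function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// row mirrors one row of the example: a float64 column foo and a string column bar.
type row struct {
	foo float64
	bar string
}

// sortRows returns a sorted copy: foo ascending, ties broken by bar descending.
func sortRows(rows []row) []row {
	out := append([]row(nil), rows...)
	sort.SliceStable(out, func(i, j int) bool {
		if out[i].foo != out[j].foo {
			return out[i].foo < out[j].foo
		}
		return out[i].bar > out[j].bar
	})
	return out
}

func main() {
	fmt.Println(sortRows([]row{{2, "b"}, {2, "c"}, {1, "a"}}))
	// [{1 a} {2 c} {2 b}]
}
```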

func (*DataFrame) StdDev

func (df *DataFrame) StdDev() *Series

StdDev coerces the values in each column to float64 and calculates the standard deviation of each column.

func (*DataFrame) String

func (df *DataFrame) String() string

String prints the DataFrame in table form, with the number of rows constrained by optionMaxRows, and the number of columns constrained by optionMaxColumns, which may be configured with PrintOptionMaxRows(n) and PrintOptionMaxColumns(n), respectively. By default, repeated values are merged together, but this behavior may be disabled with PrintOptionAutoMerge(false). By default, overly-wide non-header cells are truncated, but this behavior may be changed to wrapping with PrintOptionWrapLines(true).

func (*DataFrame) Struct

func (df *DataFrame) Struct(structPointer interface{}, options ...WriteOption) error

Struct writes the values of the df containers into structPointer. Returns an error if df does not contain, from left-to-right, the same container names and types as the exported fields that appear, from top-to-bottom, in structPointer. Exported struct fields must be types that are supported by NewDataFrame(). If a "tada" tag is present with the value "isNull", this field must be [][]bool. The null status of each value container in the DataFrame, from left-to-right, will be written into this field in equal-length slices. If df contains additional containers beyond those in structPointer, those are ignored.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame(
		[]interface{}{
			[]float64{1, 2},
		},
		[]string{"baz", "qux"},
	).SetLabelNames([]string{"foo"}).
		SetColNames([]string{"bar"})
	type output struct {
		Foo []string  `tada:"foo"`
		Bar []float64 `tada:"bar"`
	}
	var out output
	df.Struct(&out)
	fmt.Printf("%#v", out)
}
Output:

tada_test.output{Foo:[]string{"baz", "qux"}, Bar:[]float64{1, 2}}
Example (WithNulls)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame(
		[]interface{}{
			[]float64{1, 2},
		},
		[]string{"", "qux"},
	).SetLabelNames([]string{"foo"}).
		SetColNames([]string{"bar"})
	type output struct {
		Foo   []string  `tada:"foo"`
		Bar   []float64 `tada:"bar"`
		Nulls [][]bool  `tada:"isNull"`
	}
	var out output
	df.Struct(&out)
	fmt.Printf("%#v", out)
}
Output:

tada_test.output{Foo:[]string{"", "qux"}, Bar:[]float64{1, 2}, Nulls:[][]bool{[]bool{true, false}, []bool{false, false}}}

func (*DataFrame) Subset

func (df *DataFrame) Subset(index []int) *DataFrame

Subset returns only the rows specified at the index positions, in the order specified. Returns a new DataFrame.

func (*DataFrame) SubsetCols

func (df *DataFrame) SubsetCols(index []int) *DataFrame

SubsetCols returns only the columns specified at the index positions, in the order specified. Returns a new DataFrame.

func (*DataFrame) SubsetLabels

func (df *DataFrame) SubsetLabels(index []int) *DataFrame

SubsetLabels returns only the labels specified at the index positions, in the order specified. Returns a new DataFrame.

func (*DataFrame) Sum

func (df *DataFrame) Sum() *Series

Sum coerces the values in each column to float64 and sums each column.

func (*DataFrame) SumCols

func (df *DataFrame) SumCols(name string, colNames ...string) (*Series, error)

SumCols finds each column matching a supplied colName, coerces its values to float64, and adds them row-wise. The resulting Series is named name. If any column has a null value for a given row, that row is considered null.
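
The null rule can be sketched with plain slices. This is not tada's implementation: the column type and sumCols function are hypothetical, the inputs are assumed to be equal length, and the value stored for a null row (0 here) is an assumption.

```go
package main

import "fmt"

// column is a hypothetical float64 column with a parallel null mask.
type column struct {
	vals  []float64
	nulls []bool
}

// sumCols adds equal-length columns row by row. A row is null in the result
// if it is null in any input column.
func sumCols(cols ...column) column {
	if len(cols) == 0 {
		return column{}
	}
	n := len(cols[0].vals)
	out := column{vals: make([]float64, n), nulls: make([]bool, n)}
	for i := 0; i < n; i++ {
		for _, c := range cols {
			if c.nulls[i] {
				out.nulls[i] = true
				out.vals[i] = 0 // placeholder; the stored value for a null row is an assumption
				break
			}
			out.vals[i] += c.vals[i]
		}
	}
	return out
}

func main() {
	a := column{vals: []float64{1, 2, 3}, nulls: []bool{false, false, true}}
	b := column{vals: []float64{10, 20, 30}, nulls: []bool{false, false, false}}
	sum := sumCols(a, b)
	fmt.Println(sum.vals, sum.nulls) // [11 22 0] [false false true]
}
```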

func (*DataFrame) SwapLabels

func (df *DataFrame) SwapLabels(i, j string) *DataFrame

SwapLabels swaps the label levels with names i and j. Returns a new DataFrame.

func (*DataFrame) Tail

func (df *DataFrame) Tail(n int) *DataFrame

Tail returns the last n rows of the DataFrame. If n is greater than the length of the DataFrame, returns the entire DataFrame. In either case, returns a new DataFrame.

func (*DataFrame) Transpose

func (df *DataFrame) Transpose() *DataFrame

Transpose transposes rows into columns. Row values become column values, column names become labels, labels become column names (and multi-level labels become multi-level columns) and label level names swap with column level names. For example a DataFrame with 2 rows and 1 column has 2 columns and 1 row after transposition. Because rows can contain heterogenous types, every column is coerced to []interface{}.
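
The need for []interface{} is easiest to see on plain slices. In this standard-library sketch (transpose is a hypothetical helper operating on column-major data, not tada's method), a float64 column and a string column are transposed, so each new column holds one value of each type.

```go
package main

import "fmt"

// transpose turns column-major data into its transpose: each former row
// becomes a column. All input columns are assumed to have the same length.
func transpose(cols [][]interface{}) [][]interface{} {
	if len(cols) == 0 {
		return nil
	}
	nRows := len(cols[0])
	out := make([][]interface{}, nRows)
	for r := 0; r < nRows; r++ {
		out[r] = make([]interface{}, len(cols))
		for c := range cols {
			out[r][c] = cols[c][r]
		}
	}
	return out
}

func main() {
	cols := [][]interface{}{
		{1.0, 2.0},     // a float64 column
		{"baz", "qux"}, // a string column
	}
	// Each new column mixes a float64 with a string.
	fmt.Println(transpose(cols)) // [[1 baz] [2 qux]]
}
```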

func (*DataFrame) Where

func (df *DataFrame) Where(filters map[string]FilterFn, ifTrue, ifFalse interface{}) (*Series, error)

Where iterates over the rows in df and evaluates whether each one satisfies filters, which is a map of container names (either column or label names) to tada.FilterFn functions. If it does, Where returns ifTrue at that row position; if not, it returns ifFalse at that row position. Values are coerced from their original type to the selected field type for filtering, but after filtering retain their original type.

Returns an unnamed Series with a copy of the labels from the original DataFrame and null status based on the supplied values. If an unsupported value type is supplied as either ifTrue or ifFalse, returns an error.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]int{1, 2}},
	).
		SetColNames([]string{"foo"})
	fmt.Println(df)

	gt1 := func(val interface{}) bool { return val.(int) > 1 }
	ret, _ := df.Where(map[string]tada.FilterFn{"foo": gt1}, true, false)
	fmt.Println(ret)
}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

+---++-------+
| - ||       |
|---||-------|
| 0 || false |
| 1 ||  true |
+---++-------+

func (*DataFrame) WithCol

func (df *DataFrame) WithCol(name string, input interface{}) *DataFrame

WithCol resolves as follows:

If a scalar string is supplied as input and a column exists that matches name: rename the column to match input. In this case, name must already exist.

If a slice is supplied as input and a column exists that matches name: replace the values at this column to match input. If a slice is supplied as input and a column does not exist that matches name: append a new column named name and values matching input. If input is a slice, it must be the same length as the underlying DataFrame.

In all cases, returns a new DataFrame.

Example (Append)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}},
	).
		SetColNames([]string{"foo"})
	fmt.Println(df)

	ret := df.WithCol("bar", []bool{false, true})
	fmt.Println(ret)
}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

+---++-----+-------+
| - || foo |  bar  |
|---||-----|-------|
| 0 ||   1 | false |
| 1 ||   2 |  true |
+---++-----+-------+
Example (Overwrite)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}},
	).
		SetColNames([]string{"foo"})
	fmt.Println(df)

	ret := df.WithCol("foo", []string{"baz", "qux"})
	fmt.Println(ret)
}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

+---++-----+
| - || foo |
|---||-----|
| 0 || baz |
| 1 || qux |
+---++-----+
Example (Rename)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}},
	).
		SetColNames([]string{"foo"})
	fmt.Println(df)

	ret := df.WithCol("foo", "qux")
	fmt.Println(ret)
}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

+---++-----+
| - || qux |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

func (*DataFrame) WithLabels

func (df *DataFrame) WithLabels(name string, input interface{}) *DataFrame

WithLabels resolves as follows:

If a scalar string is supplied as input and a label level exists that matches name: rename the level to match input. In this case, name must already exist.

If a slice is supplied as input and a label level exists that matches name: replace the values at this level to match input. If a slice is supplied as input and a label level does not exist that matches name: append a new level named name and values matching input. If input is a slice, it must be the same length as the underlying DataFrame.

In all cases, returns a new DataFrame.

func (*DataFrame) WriteCSV

func (df *DataFrame) WriteCSV(w io.Writer, options ...WriteOption) error

WriteCSV converts a DataFrame to a csv with rows as the major dimension, and writes the output to w. Null values are replaced with "(null)".

type DataFrameIterator

type DataFrameIterator struct {
	// contains filtered or unexported fields
}

A DataFrameIterator iterates over the rows in a DataFrame.

func (*DataFrameIterator) Next

func (iter *DataFrameIterator) Next() bool

Next advances to next row. Returns false at end of iteration.

func (*DataFrameIterator) Row

func (iter *DataFrameIterator) Row() map[string]Element

Row returns the current row in the DataFrame as a map. The map keys are the names of containers (including label levels). Each map value is an Element containing an interface value and a boolean denoting whether the value is null. If multiple columns have the same header, only the Element of the left-most column is returned.

type DataFrameMutator

type DataFrameMutator struct {
	// contains filtered or unexported fields
}

A DataFrameMutator is used to change DataFrame values in place.

func (*DataFrameMutator) Append

func (df *DataFrameMutator) Append(other *DataFrame) error

Append adds the other labels and values as new rows to the DataFrame. If the types of any container do not match, all the values in that container are coerced to string. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) Apply

func (df *DataFrameMutator) Apply(lambdas map[string]ApplyFn) error

Apply applies an anonymous function to every row in a container based on lambdas, which is a map of container names (either column or label names) to anonymous functions. A row's null status can be changed in-place within the anonymous function. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) DeduplicateNames

func (df *DataFrameMutator) DeduplicateNames()

DeduplicateNames deduplicates the names of containers (label levels and columns) from left-to-right by appending _n to duplicate names, where n is equal to the number of times that name has already appeared. Modifies the underlying DataFrame in place.
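
The naming rule can be written out in a few lines. This is a standard-library sketch of the rule as described, not tada's code: dedupe is a hypothetical helper, and it does not handle the case where a generated name such as foo_1 collides with a name that already exists.

```go
package main

import "fmt"

// dedupe appends _n to each repeated name, where n is the number of times
// that name has already appeared to its left.
func dedupe(names []string) []string {
	seen := make(map[string]int, len(names))
	out := make([]string, len(names))
	for i, name := range names {
		if n := seen[name]; n > 0 {
			out[i] = fmt.Sprintf("%s_%d", name, n)
		} else {
			out[i] = name
		}
		seen[name]++
	}
	return out
}

func main() {
	fmt.Println(dedupe([]string{"foo", "bar", "foo", "foo"}))
	// [foo bar foo_1 foo_2]
}
```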

func (*DataFrameMutator) DropCol

func (df *DataFrameMutator) DropCol(name string) error

DropCol drops the first column matching name. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) DropLabels

func (df *DataFrameMutator) DropLabels(name string) error

DropLabels drops the first label level matching name. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) DropNull

func (df *DataFrameMutator) DropNull(subset ...string) error

DropNull removes rows with a null value in any column. If subset is supplied, removes any rows with null values in any of the specified columns. Modifies the underlying DataFrame.

func (*DataFrameMutator) DropRow

func (df *DataFrameMutator) DropRow(index int) error

DropRow removes the row at the specified index. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) FillNull

func (df *DataFrameMutator) FillNull(how map[string]NullFiller) error

FillNull fills null values and makes them non-null based on how. How is a map of container names (either column or label names) and NullFillers. For each container name supplied, the first field selected (i.e., not left blank) in the NullFiller is the strategy used to replace null values. FillForward fills null values with the most recent non-null value in the container. FillBackward fills null values with the next non-null value in the container. FillZero fills null values with the zero value for that container type. FillFloat converts the container values to float64 and fills null values with the value supplied. If no field is selected, the container values are converted to float64 and all null values are filled with 0. Modifies the underlying DataFrame.
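
The forward and backward strategies can be sketched on a value slice with a parallel null mask. This is an illustration of the idea, not tada's implementation: fillForward and fillBackward are hypothetical helpers, and leaving a null in place when there is no earlier (or later) non-null value is an assumption.

```go
package main

import "fmt"

// fillForward replaces each null with the most recent non-null value before it.
// Leading nulls with no earlier value stay null (an assumption of this sketch).
func fillForward(vals []float64, nulls []bool) {
	var last float64
	haveLast := false
	for i := range vals {
		if nulls[i] {
			if haveLast {
				vals[i], nulls[i] = last, false
			}
			continue
		}
		last, haveLast = vals[i], true
	}
}

// fillBackward replaces each null with the next non-null value after it.
// Trailing nulls with no later value stay null (an assumption of this sketch).
func fillBackward(vals []float64, nulls []bool) {
	var next float64
	haveNext := false
	for i := len(vals) - 1; i >= 0; i-- {
		if nulls[i] {
			if haveNext {
				vals[i], nulls[i] = next, false
			}
			continue
		}
		next, haveNext = vals[i], true
	}
}

func main() {
	vals := []float64{0, 1, 0, 0, 4}
	nulls := []bool{true, false, true, true, false}
	fillForward(vals, nulls)
	fmt.Println(vals, nulls) // [0 1 1 1 4] [true false false false false]

	vals = []float64{0, 1, 0, 0, 4}
	nulls = []bool{true, false, true, true, false}
	fillBackward(vals, nulls)
	fmt.Println(vals, nulls) // [1 1 4 4 4] [false false false false false]
}
```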

func (*DataFrameMutator) Filter

func (df *DataFrameMutator) Filter(filters map[string]FilterFn) error

Filter keeps only the rows that satisfy all of the filters, which is a map of container names (either column or label names) to anonymous functions.

Rows with null values never satisfy a filter. If no filter is provided, the function does nothing. For equality filtering on one or more containers, consider FilterByValue. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) FilterByValue

func (df *DataFrameMutator) FilterByValue(filters map[string]interface{}) error

FilterByValue returns the rows in the DataFrame satisfying all filters, which is a map of container names (either column or label names) to interface{} values. A filter is satisfied for a given row value if the stringified value in that container at that row matches the stringified interface{} value. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) FilterCols

func (df *DataFrameMutator) FilterCols(lambda func(string) bool, level int) error

FilterCols returns the columns with names that satisfy lambda at the supplied column level. level should be 0 unless df has multiple column levels.

func (*DataFrameMutator) IsNull

func (df *DataFrameMutator) IsNull(subset ...string) error

IsNull returns all the rows with any null values. If subset is supplied, returns all the rows with all non-null values in the specified columns. Modifies the underlying DataFrame.

func (*DataFrameMutator) Range

func (df *DataFrameMutator) Range(first, last int) error

Range returns the rows of the DataFrame starting at first and ending immediately prior to last (left-inclusive, right-exclusive). If first or last is out of range, an error is returned. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) Relabel

func (df *DataFrameMutator) Relabel()

Relabel resets the DataFrame labels to default labels (e.g., []int from 0 to df.Len()-1, with *0 as name). Modifies the underlying DataFrame in place.

func (*DataFrameMutator) ReorderCols

func (df *DataFrameMutator) ReorderCols(colNames []string) error

ReorderCols reorders the columns to be in the same order as specified by colNames. If a column is not specified, it is excluded from the resulting DataFrame. Modifies the underlying DataFrame.

func (*DataFrameMutator) ReorderLabels

func (df *DataFrameMutator) ReorderLabels(levelNames []string) error

ReorderLabels reorders the label levels to be in the same order as specified by levelNames. If a level is not specified, it is excluded from the resulting DataFrame. Modifies the underlying DataFrame.

func (*DataFrameMutator) Resample

func (df *DataFrameMutator) Resample(how map[string]Resampler) error

Resample coerces values to time.Time and truncates them by the logic supplied in how, which is a map of container names (either column or label names) to tada.Resampler structs. For each container name in the map, the first By field selected (i.e., not left blank) in its Resampler struct provides the resampling logic for that container. If the slice type is civil.Date or civil.Time before resampling, it will be returned as civil.Date or civil.Time after resampling.

Modifies the underlying DataFrame in place.

func (*DataFrameMutator) ResetLabels

func (df *DataFrameMutator) ResetLabels(labelLevels ...string) error

ResetLabels appends the label level(s) at the supplied index levels as columns and drops the level(s). If no index levels are supplied, all label levels are appended as columns and dropped as levels, and replaced by a default label column. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) SetAsLabels

func (df *DataFrameMutator) SetAsLabels(colNames ...string)

SetAsLabels appends the column(s) supplied as colNames as label levels and drops the column(s). The number of colNames supplied must be less than the number of columns in the DataFrame. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) SetRows

func (df *DataFrameMutator) SetRows(lambda ApplyFn, container string, rows []int) error

SetRows applies lambda within container (either label or column name) to set the values at the specified row positions. The new values must be the same type as the existing values. Modifies the underlying DataFrame.

func (*DataFrameMutator) Shuffle

func (df *DataFrameMutator) Shuffle(seed int64)

Shuffle randomizes the row order of the DataFrame. Modifies the underlying DataFrame.

func (*DataFrameMutator) Sort

func (df *DataFrameMutator) Sort(by ...Sorter) error

Sort sorts the values by zero or more Sorter specifications. If no Sorter is supplied, does not sort. If no DType is supplied for a Sorter, sorts as float64. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) Subset

func (df *DataFrameMutator) Subset(index []int) error

Subset returns only the rows specified at the index positions, in the order specified. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) SubsetCols

func (df *DataFrameMutator) SubsetCols(index []int) error

SubsetCols returns only the columns specified at the index positions, in the order specified. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) SubsetLabels

func (df *DataFrameMutator) SubsetLabels(index []int) error

SubsetLabels returns only the labels specified at the index positions, in the order specified. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) SwapLabels

func (df *DataFrameMutator) SwapLabels(i, j string) error

SwapLabels swaps the label levels with names i and j. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) WithCol

func (df *DataFrameMutator) WithCol(name string, input interface{}) error

WithCol resolves as follows:

If a scalar string is supplied as input and a column exists that matches name: rename the column to match input. In this case, name must already exist.

If a slice is supplied as input and a column exists that matches name: replace the values at this column to match input. If a slice is supplied as input and a column does not exist that matches name: append a new column named name and values matching input. If input is a slice, it must be the same length as the underlying DataFrame.

In all cases, modifies the underlying DataFrame in place.

Example (Rename)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}},
	).
		SetColNames([]string{"foo"})
	fmt.Println(df)

	df.InPlace().WithCol("foo", "qux")
	fmt.Println(df)
}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

+---++-----+
| - || qux |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

func (*DataFrameMutator) WithLabels

func (df *DataFrameMutator) WithLabels(name string, input interface{}) error

WithLabels resolves as follows:

If a scalar string is supplied as input and a label level exists that matches name: rename the level to match input. In this case, name must already exist.

If a slice is supplied as input and a label level exists that matches name: replace the values at this level to match input. If a slice is supplied as input and a label level does not exist that matches name: append a new level named name and values matching input. If input is a slice, it must be the same length as the underlying DataFrame.

In all cases, modifies the underlying DataFrame in place.

type Element

type Element struct {
	Val    interface{}
	IsNull bool
}

An Element is one {value, null status} pair in either a Series or DataFrame.

type FilterFn

type FilterFn func(value interface{}) bool

A FilterFn is an anonymous function supplied to a Filter or Where function. The function will be called on every val in the container.

type GroupedDataFrame

type GroupedDataFrame struct {
	// contains filtered or unexported fields
}

A GroupedDataFrame is a collection of row positions sharing the same group key. A GroupedDataFrame has a reference to an underlying DataFrame, which is used for reduce operations.

func (*GroupedDataFrame) Apply

func (g *GroupedDataFrame) Apply(cols []string, lambda ApplyFn) *GroupedDataFrame

Apply applies lambda to every group. Each lambda input will be a slice of grouped values (including values considered null) from a single column. Each lambda output must be a slice that is the same length as the input. A row's null status can be set in-place within the anonymous function by accessing the []bool argument.

func (*GroupedDataFrame) Col

func (g *GroupedDataFrame) Col(colName string) *GroupedSeries

Col isolates the Series at colName, which may be either a label level or column in the underlying DataFrame. Returns a GroupedSeries with the same groups and labels as in the GroupedDataFrame.

func (*GroupedDataFrame) Count

func (g *GroupedDataFrame) Count(colNames ...string) *DataFrame

Count returns the number of non-null values in each group for the columns in colNames.

func (*GroupedDataFrame) DataFrame

func (g *GroupedDataFrame) DataFrame() *DataFrame

DataFrame returns the GroupedDataFrame as a DataFrame, with group names as label levels, in order of appearance in the original DataFrame, and values grouped together by group name. Columns used as label levels are dropped.

func (*GroupedDataFrame) Earliest

func (g *GroupedDataFrame) Earliest(colNames ...string) *DataFrame

Earliest coerces the column values in colNames to time.Time and calculates the earliest timestamp of each group.

func (*GroupedDataFrame) Err

func (g *GroupedDataFrame) Err() error

Err returns the underlying error, if any.

func (*GroupedDataFrame) First

func (g *GroupedDataFrame) First(colNames ...string) *DataFrame

First returns the first row within each group for the columns in colNames.

func (*GroupedDataFrame) GetGroup

func (g *GroupedDataFrame) GetGroup(group string) *DataFrame

GetGroup returns the grouped rows sharing the same group key as a new DataFrame.

func (*GroupedDataFrame) GetLabels

func (g *GroupedDataFrame) GetLabels() []interface{}

GetLabels returns the grouped label levels as interface{} slices within an []interface{} that may be supplied as the optional labels argument to NewSeries() or NewDataFrame().

func (*GroupedDataFrame) HavingCount

func (g *GroupedDataFrame) HavingCount(lambda func(int) bool) *GroupedDataFrame

HavingCount removes any groups from g that do not satisfy the boolean function supplied in lambda. For each group, the input into lambda is the total number of values in the group (null or not-null).
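
The filtering step can be illustrated with plain maps. In this standard-library sketch (groupRows and havingCount are hypothetical helpers, not tada's internals), rows are grouped by key, and only groups whose total row count satisfies the predicate are kept, in first-appearance order.

```go
package main

import "fmt"

// groupRows maps each group key to its row positions and records the keys
// in the order in which they first appear.
func groupRows(keys []string) (order []string, groups map[string][]int) {
	groups = make(map[string][]int)
	for i, k := range keys {
		if _, ok := groups[k]; !ok {
			order = append(order, k)
		}
		groups[k] = append(groups[k], i)
	}
	return order, groups
}

// havingCount returns the keys of the groups whose row count satisfies keep.
func havingCount(order []string, groups map[string][]int, keep func(int) bool) []string {
	var out []string
	for _, k := range order {
		if keep(len(groups[k])) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	order, groups := groupRows([]string{"a", "b", "a", "c", "a", "b"})
	atLeastTwo := func(n int) bool { return n >= 2 }
	fmt.Println(havingCount(order, groups, atLeastTwo)) // [a b]
}
```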

func (*GroupedDataFrame) Iterator

func (g *GroupedDataFrame) Iterator() *GroupedDataFrameIterator

Iterator returns an iterator which may be used to access each group of rows as a new DataFrame, in the order in which the groups originally appeared.

func (*GroupedDataFrame) Last

func (g *GroupedDataFrame) Last(colNames ...string) *DataFrame

Last returns the last row within each group for the columns in colNames.

func (*GroupedDataFrame) Latest

func (g *GroupedDataFrame) Latest(colNames ...string) *DataFrame

Latest coerces the column values in colNames to time.Time and calculates the latest timestamp of each group.

func (*GroupedDataFrame) Len

func (g *GroupedDataFrame) Len() int

Len returns the number of group labels.

func (*GroupedDataFrame) ListGroups

func (g *GroupedDataFrame) ListGroups() []string

ListGroups returns a list of group keys in the order in which they originally appeared.

func (*GroupedDataFrame) Max

func (g *GroupedDataFrame) Max(colNames ...string) *DataFrame

Max coerces the column values in colNames to float64 and calculates the maximum of each group.

func (*GroupedDataFrame) Mean

func (g *GroupedDataFrame) Mean(colNames ...string) *DataFrame

Mean coerces the column values in colNames to float64 and calculates the mean of each group.

func (*GroupedDataFrame) Median

func (g *GroupedDataFrame) Median(colNames ...string) *DataFrame

Median coerces the column values in colNames to float64 and calculates the median of each group.

func (*GroupedDataFrame) Min

func (g *GroupedDataFrame) Min(colNames ...string) *DataFrame

Min coerces the column values in colNames to float64 and calculates the minimum of each group.

func (*GroupedDataFrame) NUnique

func (g *GroupedDataFrame) NUnique(colNames ...string) *DataFrame

NUnique returns the number of unique, non-null values in each group for the columns in colNames.

func (*GroupedDataFrame) Nth

func (g *GroupedDataFrame) Nth(index int, colNames ...string) *DataFrame

Nth returns the row at position index (if it exists) within each group for the columns in colNames.

func (*GroupedDataFrame) Reduce

func (g *GroupedDataFrame) Reduce(name string, cols []string, lambda ReduceFn) *DataFrame

Reduce iterates over the groups in the GroupedDataFrame and reduces each group of values into a single value using the function supplied in lambda. Reduce returns a new DataFrame named "name_originalDataFrameName" with columns named "name_originalColumnName" where each reduced group is represented by a single row.

The columns in the new DataFrame will be slices of reduced values with the same type as the ReduceFn output. With a ReduceFn that returns float64 values, for example, Reduce will iterate over the groups in each column, pass each group's slice of values to lambda, reduce each slice to a single float64 value, then concatenate these reduced values into new []float64 columns and return them in a new DataFrame.

func (*GroupedDataFrame) StdDev

func (g *GroupedDataFrame) StdDev(colNames ...string) *DataFrame

StdDev coerces the column values in colNames to float64 and calculates the standard deviation of each group.

func (*GroupedDataFrame) String

func (g *GroupedDataFrame) String() string

func (*GroupedDataFrame) Sum

func (g *GroupedDataFrame) Sum(colNames ...string) *DataFrame

Sum coerces the column values in colNames to float64 and calculates the sum of each group.

type GroupedDataFrameIterator

type GroupedDataFrameIterator struct {
	// contains filtered or unexported fields
}

GroupedDataFrameIterator iterates over all DataFrames in the group.

func (*GroupedDataFrameIterator) DataFrame

func (g *GroupedDataFrameIterator) DataFrame() *DataFrame

DataFrame returns the current grouped DataFrame.

func (*GroupedDataFrameIterator) Next

func (g *GroupedDataFrameIterator) Next() bool

Next advances to next grouped DataFrame. Returns false at end of iteration.

type GroupedSeries

type GroupedSeries struct {
	// contains filtered or unexported fields
}

A GroupedSeries is a collection of row positions sharing the same group key. A GroupedSeries has a reference to an underlying Series, which is used for reduce operations.
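The idea of "row positions sharing the same group key" can be sketched in plain Go. This is only an illustration of the concept and does not call tada: groupPositions is a hypothetical helper, and joining stringified label values with "|" is an arbitrary choice made for the sketch, not the library's internal key format.

```go
package main

import (
	"fmt"
	"strings"
)

// groupPositions maps each group key to the row positions that share it and
// returns the keys in order of first appearance.
// All label levels are assumed to have the same length.
func groupPositions(levels ...[]string) (keys []string, positions map[string][]int) {
	positions = make(map[string][]int)
	if len(levels) == 0 {
		return keys, positions
	}
	for row := range levels[0] {
		// build the compound key for this row from every label level
		parts := make([]string, len(levels))
		for j, level := range levels {
			parts[j] = level[row]
		}
		key := strings.Join(parts, "|")
		if _, seen := positions[key]; !seen {
			keys = append(keys, key)
		}
		positions[key] = append(positions[key], row)
	}
	return keys, positions
}

func main() {
	keys, positions := groupPositions(
		[]string{"foo", "baz", "foo", "baz"},
		[]string{"bar", "qux", "bar", "qux"},
	)
	for _, k := range keys {
		fmt.Println(k, positions[k])
	}
}
```

Storing positions rather than copies of the values is what lets a single grouping be reused by several reducers over the same underlying data.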

func (*GroupedSeries) Align

func (g *GroupedSeries) Align() *GroupedSeries

Align changes subsequent reduce operations for this group to return a Series aligned with the original Series labels (the default behavior is to return a Series with one label per group). If the original Series is:

	FOO
	baz 0
	baz 1
	bar 2
	bar 4

and it is grouped by the "foo" label, then the default g.Sum() reducer would return:

	FOO
	baz 1
	bar 6

After g.Align(), the g.Sum() reducer would return:

	FOO
	baz 1
	baz 1
	bar 6
	bar 6
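The difference between the default and aligned reducers can be sketched in plain Go using the same values. This models the described behavior only and does not call tada; groupSums and alignedSums are hypothetical helper names.

```go
package main

import "fmt"

// groupSums returns the group keys in order of first appearance
// and the sum of the values in each group.
func groupSums(labels []string, vals []float64) ([]string, map[string]float64) {
	sums := make(map[string]float64)
	var order []string
	for i, l := range labels {
		if _, seen := sums[l]; !seen {
			order = append(order, l)
		}
		sums[l] += vals[i]
	}
	return order, sums
}

// alignedSums repeats each group's sum on every row that belongs to the group,
// so the result has the same length and label order as the input.
func alignedSums(labels []string, vals []float64) []float64 {
	_, sums := groupSums(labels, vals)
	out := make([]float64, len(labels))
	for i, l := range labels {
		out[i] = sums[l]
	}
	return out
}

func main() {
	labels := []string{"baz", "baz", "bar", "bar"}
	vals := []float64{0, 1, 2, 4}

	// default: one row per group
	order, sums := groupSums(labels, vals)
	for _, k := range order {
		fmt.Println(k, sums[k])
	}

	// aligned: one row per original row
	fmt.Println(alignedSums(labels, vals))
}
```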

Example (Mean)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []int{0, 1, 0, 1}).
		SetName("foo").
		SetLabelNames([]string{"baz"})
	fmt.Println(s)

	// here, s.GroupBy("baz") is equivalent to s.GroupBy()
	g := s.GroupBy("baz")
	fmt.Println(g.Align().Mean())

}
Output:

+-----++-----+
| baz || foo |
|-----||-----|
|   0 ||   1 |
|   1 ||   2 |
|   0 ||   3 |
|   1 ||   4 |
+-----++-----+

+-----++----------+
| baz || mean_foo |
|-----||----------|
|   0 ||        2 |
|   1 ||        3 |
|   0 ||        2 |
|   1 ||        3 |
+-----++----------+

func (*GroupedSeries) Apply

func (g *GroupedSeries) Apply(lambda ApplyFn) *GroupedSeries

Apply applies lambda to every group. Each lambda input will be a slice of grouped values (including values considered null). Each lambda output must be a slice that is the same length as the input. A row's null status can be set in-place within the anonymous function by accessing the []bool argument.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []string{"bar", "bar", "foo", "bar"}, []int{0, 1, 2, 3}).
		SetName("foobar").
		SetLabelNames([]string{"baz", "qux"})
	fmt.Println(s)

	g := s.GroupBy("baz")
	// if group has at least 3 items, multiply by 2. otherwise set as null.
	modifyBigGroup := func(slice interface{}, isNull []bool) interface{} {
		vals, _ := slice.([]float64) // in normal usage, check the type assertion and handle an error
		ret := make([]float64, len(vals))
		if len(vals) >= 3 {
			for i := range ret {
				ret[i] = vals[i] * 2
			}
		} else {
			for i := range ret {
				isNull[i] = true
			}
		}
		return ret
	}
	fmt.Println(g.Apply(modifyBigGroup).Series())

}
Output:

+-----+-----++--------+
| baz | qux || foobar |
|-----|-----||--------|
| bar |   0 ||      1 |
|     |   1 ||      2 |
| foo |   2 ||      3 |
| bar |   3 ||      4 |
+-----+-----++--------+

+-----++--------+
| baz || foobar |
|-----||--------|
| bar ||      2 |
|     ||      4 |
|     ||      8 |
| foo || (null) |
+-----++--------+
Example (Align)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []string{"bar", "bar", "foo", "bar"}, []int{0, 1, 2, 3}).
		SetName("foobar").
		SetLabelNames([]string{"baz", "qux"})
	fmt.Println(s)

	g := s.GroupBy("baz")
	// if group has at least 3 items, multiply by 2. otherwise set as null.
	modifyBigGroup := func(slice interface{}, isNull []bool) interface{} {
		vals, _ := slice.([]float64) // in normal usage, check the type assertion and handle an error
		ret := make([]float64, len(vals))
		if len(vals) >= 3 {
			for i := range ret {
				ret[i] = vals[i] * 2
			}
		} else {
			for i := range ret {
				isNull[i] = true
			}
		}
		return ret
	}
	g.Align()
	fmt.Println(g.Apply(modifyBigGroup).Series())

}
Output:

+-----+-----++--------+
| baz | qux || foobar |
|-----|-----||--------|
| bar |   0 ||      1 |
|     |   1 ||      2 |
| foo |   2 ||      3 |
| bar |   3 ||      4 |
+-----+-----++--------+

+-----+-----++--------+
| baz | qux || foobar |
|-----|-----||--------|
| bar |   0 ||      2 |
|     |   1 ||      4 |
| foo |   2 || (null) |
| bar |   3 ||      8 |
+-----+-----++--------+

func (*GroupedSeries) Count

func (g *GroupedSeries) Count() *Series

Count returns the number of non-null values in each group.

func (*GroupedSeries) Earliest

func (g *GroupedSeries) Earliest() *Series

Earliest coerces the Series values to time.Time and calculates the earliest timestamp in each group.

func (*GroupedSeries) Err

func (g *GroupedSeries) Err() error

Err returns the underlying error, if any.

func (*GroupedSeries) First

func (g *GroupedSeries) First() *Series

First returns the first row in each group.

func (*GroupedSeries) GetGroup

func (g *GroupedSeries) GetGroup(group string) *Series

GetGroup returns the grouped rows sharing the same group key as a new Series.

func (*GroupedSeries) GetLabels

func (g *GroupedSeries) GetLabels() []interface{}

GetLabels returns the grouped label levels as interface{} slices within an []interface{} that may be supplied as the optional labels argument to NewSeries() or NewDataFrame().

func (*GroupedSeries) HavingCount

func (g *GroupedSeries) HavingCount(lambda func(int) bool) *GroupedSeries

HavingCount removes any groups from g that do not satisfy the boolean function supplied in lambda. For each group, the input into lambda is the total number of values in the group (null or not-null).

Example (Sum)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []int{0, 1, 1, 1}).
		SetName("foo").
		SetLabelNames([]string{"baz"})
	fmt.Println(s)

	countOf3 := func(n int) bool { return n == 3 }
	g := s.GroupBy("baz")
	fmt.Println(g.HavingCount(countOf3).Sum())

}
Output:

+-----++-----+
| baz || foo |
|-----||-----|
|   0 ||   1 |
|   1 ||   2 |
|     ||   3 |
|     ||   4 |
+-----++-----+

+-----++---------+
| baz || sum_foo |
|-----||---------|
|   1 ||       9 |
+-----++---------+

func (*GroupedSeries) Iterator

func (g *GroupedSeries) Iterator() *GroupedSeriesIterator

Iterator returns an iterator which may be used to access each group of rows as a new Series, in the order in which the groups originally appeared.

func (*GroupedSeries) Last

func (g *GroupedSeries) Last() *Series

Last returns the last row in each group.

func (*GroupedSeries) Latest

func (g *GroupedSeries) Latest() *Series

Latest coerces the Series values to time.Time and calculates the latest timestamp in each group.

func (*GroupedSeries) Len

func (g *GroupedSeries) Len() int

Len returns the number of group labels.

func (*GroupedSeries) ListGroups

func (g *GroupedSeries) ListGroups() []string

ListGroups returns a list of group keys in the order in which they originally appeared.

func (*GroupedSeries) Max

func (g *GroupedSeries) Max() *Series

Max coerces values to float64 and calculates the maximum of each group.

func (*GroupedSeries) Mean

func (g *GroupedSeries) Mean() *Series

Mean coerces values to float64 and calculates the mean of each group.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []int{0, 1, 0, 1}).
		SetName("foo").
		SetLabelNames([]string{"baz"})
	fmt.Println(s)

	// here, s.GroupBy("baz") is equivalent to s.GroupBy()
	g := s.GroupBy("baz")
	fmt.Println(g.Mean())

}
Output:

+-----++-----+
| baz || foo |
|-----||-----|
|   0 ||   1 |
|   1 ||   2 |
|   0 ||   3 |
|   1 ||   4 |
+-----++-----+

+-----++----------+
| baz || mean_foo |
|-----||----------|
|   0 ||        2 |
|   1 ||        3 |
+-----++----------+

func (*GroupedSeries) Median

func (g *GroupedSeries) Median() *Series

Median coerces values to float64 and calculates the median of each group.

func (*GroupedSeries) Min

func (g *GroupedSeries) Min() *Series

Min coerces values to float64 and calculates the minimum of each group.

func (*GroupedSeries) NUnique

func (g *GroupedSeries) NUnique() *Series

NUnique returns the number of unique values in each group.

func (*GroupedSeries) Nth

func (g *GroupedSeries) Nth(n int) *Series

Nth returns the row at position n (if it exists) within each group.

func (*GroupedSeries) Reduce

func (g *GroupedSeries) Reduce(name string, lambda ReduceFn) *Series

Reduce iterates over the groups in the GroupedSeries and reduces each group of values into a single value using the function supplied in lambda. Reduce returns a new Series named "name_originalColName" where each reduced group is represented by a single row.

The new Series will hold a slice of reduced values with the same type as the ReduceFn output. With a ReduceFn that returns float64 values, for example, Reduce will iterate over the groups, pass each group's slice of values to lambda, reduce each slice to a single float64 value, then concatenate these reduced values into a new []float64 and return it in a new Series.

Example
package main

import (
	"fmt"
	"math"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4, 5, 6}, []int{0, 0, 0, 1, 1, 1}).
		SetName("foo").
		SetLabelNames([]string{"baz"})
	fmt.Println(s)

	g := s.GroupBy("baz")
	maxOdd := func(slice interface{}, isNull []bool) (value interface{}, null bool) {
		vals := slice.([]float64)
		max := math.Inf(-1)
		for i := range vals {
			if !isNull[i] && int(vals[i])%2 == 1 && vals[i] > max {
				max = vals[i]
			}
		}
		return max, false
	}
	fmt.Println(g.Reduce("max_odd", maxOdd))

}
Output:

+-----++-----+
| baz || foo |
|-----||-----|
|   0 ||   1 |
|     ||   2 |
|     ||   3 |
|   1 ||   4 |
|     ||   5 |
|     ||   6 |
+-----++-----+

+-----++-------------+
| baz || max_odd_foo |
|-----||-------------|
|   0 ||           3 |
|   1 ||           5 |
+-----++-------------+

func (*GroupedSeries) Series

func (g *GroupedSeries) Series() *Series

Series returns the GroupedSeries as a Series, with group names as label levels, in order of appearance in the original Series, and values grouped together by group name.

func (*GroupedSeries) StdDev

func (g *GroupedSeries) StdDev() *Series

StdDev coerces values to float64 and calculates the standard deviation of each group.

func (*GroupedSeries) String

func (g *GroupedSeries) String() string

func (*GroupedSeries) Sum

func (g *GroupedSeries) Sum() *Series

Sum coerces values to float64 and calculates the sum of each group.

type GroupedSeriesIterator

type GroupedSeriesIterator struct {
	// contains filtered or unexported fields
}

GroupedSeriesIterator iterates over all Series in the group.

func (*GroupedSeriesIterator) Next

func (g *GroupedSeriesIterator) Next() bool

Next advances to next grouped Series. Returns false at end of iteration.

func (*GroupedSeriesIterator) Series

func (g *GroupedSeriesIterator) Series() *Series

Series returns the current grouped Series.

type JoinOption

type JoinOption func(*joinConfig)

A JoinOption configures a lookup or merge function. Available options: JoinOptionHow, JoinOptionLeftOn, and JoinOptionRightOn.

type Matrix

type Matrix interface {
	Dims() (r, c int)
	At(i, j int) float64
}

Matrix is an interface which is compatible with gonum's mat.Matrix interface.

type NullFiller

type NullFiller struct {
	FillForward  bool
	FillBackward bool
	FillZero     bool
	FillFloat    float64
}

NullFiller fills every row that has a null value and changes the row's status to not-null. If multiple fields are provided, they are resolved in the following order: 1) `FillForward` - fills with the last valid value, 2) `FillBackward` - fills with the next valid value, 3) `FillZero` - fills with the zero type of the slice, 4) `FillFloat` - coerces to float64 and fills with the value provided.
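The resolution order can be sketched in plain Go over a []float64 and its null flags. This is an illustration under stated assumptions, not tada's implementation: the filler type and fill function are hypothetical stand-ins, and leaving a row null when it has no valid neighbor in the fill direction is an assumption of this sketch.

```go
package main

import "fmt"

// filler mirrors the four NullFiller fields for this sketch only.
type filler struct {
	FillForward  bool
	FillBackward bool
	FillZero     bool
	FillFloat    float64
}

// fill returns copies of vals and isNull with null rows filled.
// Only the first strategy that is set is applied, in the documented order.
// Assumption: rows with no valid neighbor in the fill direction stay null.
func fill(vals []float64, isNull []bool, how filler) ([]float64, []bool) {
	out := append([]float64(nil), vals...)
	nulls := append([]bool(nil), isNull...)
	switch {
	case how.FillForward:
		for i := 1; i < len(out); i++ {
			if nulls[i] && !nulls[i-1] {
				out[i], nulls[i] = out[i-1], false
			}
		}
	case how.FillBackward:
		for i := len(out) - 2; i >= 0; i-- {
			if nulls[i] && !nulls[i+1] {
				out[i], nulls[i] = out[i+1], false
			}
		}
	case how.FillZero:
		for i := range out {
			if nulls[i] {
				out[i], nulls[i] = 0, false
			}
		}
	default:
		for i := range out {
			if nulls[i] {
				out[i], nulls[i] = how.FillFloat, false
			}
		}
	}
	return out, nulls
}

func main() {
	vals := []float64{1, 0, 0, 4}
	isNull := []bool{false, true, true, false}

	// FillForward takes priority even though FillZero is also set.
	out, nulls := fill(vals, isNull, filler{FillForward: true, FillZero: true})
	fmt.Println(out, nulls)
}
```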

type ReadOption

type ReadOption func(*readConfig)

A ReadOption configures a read function. Available read options: ReadOptionHeaders, ReadOptionLabels, ReadOptionDelimiter, and ReadOptionSwitchDims.

type ReduceFn

type ReduceFn func(slice interface{}, isNull []bool) (value interface{}, null bool)

A ReduceFn is an anonymous function supplied to a Reduce function to reduce a slice of values to one value and one null status per group. isNull contains the null status of every value in the group.
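A function with this signature might look like the sketch below, which averages the non-null values in a group. It is written in plain Go without importing tada; meanNonNull is a hypothetical name, and the sketch assumes that the group arrives as a []float64 and that isNull has the same length as the values.

```go
package main

import "fmt"

// meanNonNull has the same signature as ReduceFn.
// It averages the non-null float64 values in a group and marks the result
// null when the group has no usable values.
func meanNonNull(slice interface{}, isNull []bool) (value interface{}, null bool) {
	vals, ok := slice.([]float64)
	if !ok {
		return 0.0, true // unexpected type: report the group as null
	}
	var sum float64
	var n int
	for i, v := range vals {
		if !isNull[i] {
			sum += v
			n++
		}
	}
	if n == 0 {
		return 0.0, true // every value was null
	}
	return sum / float64(n), false
}

func main() {
	v, null := meanNonNull([]float64{1, 2, 6}, []bool{false, true, false})
	fmt.Println(v, null)
}
```

Returning the null flag separately lets a reducer signal "no result for this group" without reserving a sentinel value such as NaN.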

type Resampler

type Resampler struct {
	ByYear      bool
	ByMonth     bool
	ByDay       bool
	ByWeek      bool
	StartOfWeek time.Weekday
	ByDuration  time.Duration
	Location    *time.Location
}

Resampler supplies logic for the Resample() function. Only the first `By` field that is selected (i.e., not left as its zero value) is used; any others are ignored (if `ByWeek` is selected, it may be modified by `StartOfWeek`). `ByYear` truncates the timestamp by year. `ByMonth` truncates the timestamp by month. `ByDay` truncates the timestamp by day. `ByWeek` returns the first day of the most recent week (starting on `StartOfWeek`) relative to the timestamp. Otherwise, the timestamp is truncated by `ByDuration`. If `Location` is not provided, time.UTC is used as the default location.
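The truncation rules can be approximated with the standard time package. The sketch below does not call tada: truncateToMonth, startOfWeek, and bucketsFor are hypothetical helpers, and using time.Time.Truncate for the duration case is an assumption about how such truncation is typically done, not a statement about the library's internals.

```go
package main

import (
	"fmt"
	"time"
)

// truncateToMonth returns midnight on the first day of t's month.
func truncateToMonth(t time.Time) time.Time {
	return time.Date(t.Year(), t.Month(), 1, 0, 0, 0, 0, t.Location())
}

// startOfWeek returns midnight on the most recent day that falls on start,
// on or before t.
func startOfWeek(t time.Time, start time.Weekday) time.Time {
	back := (int(t.Weekday()) - int(start) + 7) % 7
	day := time.Date(t.Year(), t.Month(), t.Day(), 0, 0, 0, 0, t.Location())
	return day.AddDate(0, 0, -back)
}

// bucketsFor parses an RFC 3339 timestamp and returns its month bucket,
// its week bucket (weeks starting on Monday), and its 6-hour bucket.
func bucketsFor(rfc3339 string) (month, week, sixHour string, err error) {
	ts, err := time.Parse(time.RFC3339, rfc3339)
	if err != nil {
		return "", "", "", err
	}
	ts = ts.In(time.UTC) // UTC is the documented default location
	month = truncateToMonth(ts).Format("2006-01-02")
	week = startOfWeek(ts, time.Monday).Format("2006-01-02")
	sixHour = ts.Truncate(6 * time.Hour).Format(time.RFC3339)
	return month, week, sixHour, nil
}

func main() {
	// 2020-01-15 is a Wednesday
	month, week, sixHour, err := bucketsFor("2020-01-15T12:15:00Z")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(month, week, sixHour)
}
```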

type Series

type Series struct {
	// contains filtered or unexported fields
}

A Series is a single column of data with one or more levels of aligned labels.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}).SetName("foo")
	fmt.Println(s)
}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+
Example (NestedSlice)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([][]string{{"foo", "bar"}, {"baz"}, {}}).
		SetName("a")
	fmt.Println(s)
}
Output:

+---++-----------+
| - ||     a     |
|---||-----------|
| 0 || [foo bar] |
| 1 ||     [baz] |
| 2 ||    (null) |
+---++-----------+
Example (SetNaNStatus)
package main

import (
	"fmt"
	"math"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{0, math.NaN()})
	fmt.Println("isNull:", s.GetNulls())

	tada.SetOptionNaNStatus(false)
	s = tada.NewSeries([]float64{0, math.NaN()})
	fmt.Println("isNull:", s.GetNulls())

	tada.SetOptionNaNStatus(true)
}
Output:

isNull: [false true]
isNull: [false false]
Example (SetSentinelNulls)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]string{"foo", "", "(null)"})
	fmt.Println("default sentinel null values\n isNull:", s.GetNulls())

	tada.SetOptionNullStrings(nil)
	s = tada.NewSeries([]string{"foo", "", "(null)"})
	fmt.Println("remove defaults\n isNull:", s.GetNulls())

	tada.SetOptionNullStrings(tada.GetOptionDefaultNullStrings())
}
Output:

default sentinel null values
 isNull: [false true true]
remove defaults
 isNull: [false false false]
Example (Zscore)
package main

import (
	"fmt"
	"math"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4, 5}).SetName("foo")
	fmt.Println(s)

	vals := s.GetValuesAsFloat64()
	ret := make([]float64, s.Len())
	mean := s.Mean()
	std := s.StdDev()
	for i := range vals {
		val := (vals[i] - mean) / std
		ret[i] = math.Round((val * 100)) / 100 // round to 2 decimal points
	}
	df := s.DataFrame().WithCol("zscore_foo", ret)
	fmt.Println(df)
}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
| 2 ||   3 |
| 3 ||   4 |
| 4 ||   5 |
+---++-----+

+---++-----+------------+
| - || foo | zscore_foo |
|---||-----|------------|
| 0 ||   1 |      -1.41 |
| 1 ||   2 |      -0.71 |
| 2 ||   3 |          0 |
| 3 ||   4 |       0.71 |
| 4 ||   5 |       1.41 |
+---++-----+------------+

func NewSeries

func NewSeries(slice interface{}, labels ...interface{}) *Series

NewSeries constructs a Series from a slice of values and optional label slices. Slice and all labels must be supported slices.

If no labels are supplied, a default label level is inserted ([]int incrementing from 0). Series values are named 0 by default. The default values name is displayed on printing. Label levels are named *n (e.g., *0, *1, etc) by default. Default label names are hidden on printing.

Supported slice types: all variants of []float, []int, & []uint, []string, []bool, []time.Time, []interface{}, and 2-dimensional variants of each (e.g., [][]string, [][]float64).

func (*Series) Add

func (s *Series) Add(other *Series, ignoreNulls bool) *Series

Add coerces other and s to float64 values, aligns other with s, and adds the values in aligned rows, using the labels in s as an anchor. If ignoreNulls is true, then missing or null values are treated as 0. Otherwise, if a row in s does not align with any row in other, or if a row does align but either value is null, then the resulting value is null.
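The alignment and null rules can be sketched in plain Go. This models the described behavior only and does not call tada: the cell type and addAligned function are hypothetical, and the sketch assumes a single label level with unique labels in other.

```go
package main

import "fmt"

// cell is a float64 value with a null flag, used only in this sketch.
type cell struct {
	val  float64
	null bool
}

// addAligned adds other to s row by row, matching rows on label.
// The labels of s anchor the result, so labels that exist only in other
// do not appear in the output.
func addAligned(sLabels []string, s []cell, other map[string]cell, ignoreNulls bool) []cell {
	out := make([]cell, len(s))
	for i, label := range sLabels {
		o, found := other[label]
		switch {
		case ignoreNulls:
			// missing and null values count as 0
			var sum float64
			if !s[i].null {
				sum += s[i].val
			}
			if found && !o.null {
				sum += o.val
			}
			out[i] = cell{val: sum}
		case !found || o.null || s[i].null:
			out[i] = cell{null: true}
		default:
			out[i] = cell{val: s[i].val + o.val}
		}
	}
	return out
}

func main() {
	labels := []string{"a", "b", "c"}
	s := []cell{{val: 1}, {val: 2}, {val: 3}}
	other := map[string]cell{
		"a": {val: 10},
		"b": {null: true},
		// "c" is missing from other
	}
	fmt.Printf("%+v\n", addAligned(labels, s, other, false))
	fmt.Printf("%+v\n", addAligned(labels, s, other, true))
}
```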

func (*Series) Append

func (s *Series) Append(other *Series) *Series

Append adds the other labels and values as new rows to the Series. If the types of any container do not match, all the values in that container are coerced to string. Returns a new Series.

func (*Series) Apply

func (s *Series) Apply(lambda ApplyFn) *Series

Apply applies lambda, an anonymous function, to every row in the Series. A row's null status can be set in-place within the anonymous function by accessing the []bool argument. Returns a new Series.

Example (Float64)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3}).SetName("foo")
	fmt.Println(s)

	times2 := func(slice interface{}, isNull []bool) interface{} {
		vals := slice.([]float64)
		ret := make([]float64, len(vals))
		for i := range ret {
			ret[i] = vals[i] * 2
		}
		return ret
	}
	fmt.Println(s.Apply(times2))

}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
| 2 ||   3 |
+---++-----+

+---++-----+
| - || foo |
|---||-----|
| 0 ||   2 |
| 1 ||   4 |
| 2 ||   6 |
+---++-----+

func (*Series) At

func (s *Series) At(index int) *Element

At returns the Element at the index position. If index is out of range, returns nil.

func (*Series) Bin

func (s *Series) Bin(bins []float64, config *Binner) (*Series, error)

Bin coerces the Series values to float64 and categorizes each row based on which bin interval it falls within. bins should be a slice of sequential edges that form intervals (left exclusive, right inclusive). For example, [1, 3, 5] represents the intervals 1-3 (excluding 1, including 3), and 3-5 (excluding 3, including 5). If these bins were supplied for a Series with values [3, 4], the returned Series would have values ["1-3", "3-5"]. Null values are not categorized. For default behavior, supply nil as config.

To bin values below or above the bin intervals, or to supply custom labels, supply a tada.Binner as config. If custom labels are supplied, the length must be 1 less than the total number of bin edges. Otherwise, bin labels are auto-generated from the bin intervals.
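The left-exclusive, right-inclusive interval rule can be sketched in plain Go. This is an illustration of the rule only and does not call tada; binLabel is a hypothetical helper, and the "low-high" label format is copied from the example values in the description above.

```go
package main

import "fmt"

// binLabel returns the label of the interval containing v, and false if v
// falls outside every interval.
// Intervals are left exclusive and right inclusive: (edges[i], edges[i+1]].
func binLabel(v float64, edges []float64) (string, bool) {
	for i := 0; i+1 < len(edges); i++ {
		if v > edges[i] && v <= edges[i+1] {
			return fmt.Sprintf("%v-%v", edges[i], edges[i+1]), true
		}
	}
	return "", false
}

func main() {
	edges := []float64{1, 3, 5}
	for _, v := range []float64{1, 3, 4, 6} {
		label, ok := binLabel(v, edges)
		fmt.Println(v, label, ok)
	}
}
```

A value equal to the lowest edge (1 here) is not categorized, because every interval excludes its left edge.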

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 3, 5}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.Bin([]float64{0, 2, 4}, nil)
	fmt.Println(binned)
}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   3 |
| 2 ||   5 |
+---++-----+

+---++--------+
| - ||  foo   |
|---||--------|
| 0 ||    0-2 |
| 1 ||    2-4 |
| 2 || (null) |
+---++--------+
Example (AndMore)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 3, 5}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.Bin([]float64{0, 2, 4}, &tada.Binner{AndMore: true})
	fmt.Println(binned)
}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   3 |
| 2 ||   5 |
+---++-----+

+---++-----+
| - || foo |
|---||-----|
| 0 || 0-2 |
| 1 || 2-4 |
| 2 ||  >4 |
+---++-----+
Example (CustomLabels)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 3}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.Bin([]float64{0, 2, 4}, &tada.Binner{Labels: []string{"low", "high"}})
	fmt.Println(binned)
}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   3 |
+---++-----+

+---++------+
| - || foo  |
|---||------|
| 0 ||  low |
| 1 || high |
+---++------+

func (*Series) CSV

func (s *Series) CSV(options ...WriteOption) ([][]string, error)

CSV converts a Series to a DataFrame and returns as [][]string.

func (*Series) Cast

func (s *Series) Cast(containerAsType map[string]DType)

Cast casts the underlying container values (either label levels or Series values) to []float64, []string, []time.Time (aka timezone-aware DateTime), []civil.Date, or []civil.Time. To apply to Series values, supply an empty string name ("") or the Series name. Use Cast to improve performance when calling multiple operations on values.

Example (Date)
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{
		time.Date(2020, 1, 15, 12, 15, 0, 0, time.UTC),
	}).SetName("foo")
	fmt.Println(s)

	s.Cast(map[string]tada.DType{"foo": tada.Date})
	fmt.Println(s)
}
Output:

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:15:00Z |
+---++----------------------+

+---++------------+
| - ||    foo     |
|---||------------|
| 0 || 2020-01-15 |
+---++------------+
Example (Time)
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{
		time.Date(2020, 1, 15, 12, 15, 0, 0, time.UTC),
	}).SetName("foo")
	fmt.Println(s)

	s.Cast(map[string]tada.DType{"foo": tada.Time})
	fmt.Println(s)
}
Output:

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:15:00Z |
+---++----------------------+

+---++----------+
| - ||   foo    |
|---||----------|
| 0 || 12:15:00 |
+---++----------+

func (*Series) Copy

func (s *Series) Copy() *Series

Copy returns a deep copy of a Series with no shared references to the original.

func (*Series) Count

func (s *Series) Count() int

Count counts the number of non-null Series values.

func (*Series) CumSum

func (s *Series) CumSum() *Series

CumSum coerces the Series values to float64 and returns the cumulative sum at each row position.

func (*Series) DataFrame

func (s *Series) DataFrame() *DataFrame

DataFrame converts a Series to a 1-column DataFrame.

func (*Series) Divide

func (s *Series) Divide(other *Series, ignoreNulls bool) *Series

Divide coerces other and s to float64 values, aligns other with s, and divides the aligned values of s by other, using the labels in s as an anchor. Dividing by 0 always returns a null value. If ignoreNulls is true, then missing or null values are treated as 0. Otherwise, if a row in s does not align with any row in other, or if a row does align but either value is null, then the resulting value is null.

func (*Series) DropLabels

func (s *Series) DropLabels(name string) *Series

DropLabels removes the first label level matching name. Returns a new Series.

func (*Series) DropNull

func (s *Series) DropNull() *Series

DropNull returns all the rows with non-null values. Returns a new Series.

func (*Series) DropRow

func (s *Series) DropRow(index int) *Series

DropRow removes the row at the specified index. Returns a new Series.

func (*Series) Earliest

func (s *Series) Earliest() time.Time

Earliest coerces the Series values to time.Time and calculates the earliest timestamp.

func (*Series) EqualsCSV

func (s *Series) EqualsCSV(includeLabels bool, want io.Reader, wantOptions ...ReadOption) (bool, *tablediff.Differences, error)

EqualsCSV reads want (configured by wantOptions) into a dataframe, converts both s and want into [][]string records, and evaluates whether the stringified values match. If they do not match, returns a tablediff.Differences object that can be printed to isolate their differences.

If includeLabels is true, then s's labels are included as columns.

func (*Series) Err

func (s *Series) Err() error

Err returns the most recent error attached to the Series, if any.

func (*Series) FillNull

func (s *Series) FillNull(how NullFiller) *Series

FillNull fills all the null values and makes them not-null. Returns a new Series.

func (*Series) Filter

func (s *Series) Filter(filters map[string]FilterFn) *Series

Filter returns a new Series with only rows that satisfy all of the filters, which is a map of container names (either the Series name or label name) and anonymous functions. Filter may be applied to the Series values by supplying either the Series name or an empty string ("") as a key.

Rows with null values never satisfy a filter. If no filter is provided, the function does nothing. For equality filtering on one or more containers, consider FilterByValue.

func (*Series) FilterByValue

func (s *Series) FilterByValue(filters map[string]interface{}) *Series

FilterByValue returns the rows in the Series satisfying all filters, which is a map of container names (either the Series name or label name) to interface{} values. A filter is satisfied for a given row value if the stringified value in that container at that row matches the stringified interface{} value. FilterByValue may be applied to the Series values by supplying either the Series name or an empty string ("") as a key. Returns a new Series.
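Matching on stringified values means that values of different types can satisfy the same filter. The sketch below illustrates the idea in plain Go and does not call tada: matchIndexes is a hypothetical helper, and using fmt.Sprint as the stringification is an assumption made for this sketch.

```go
package main

import "fmt"

// matchIndexes returns the positions in vals whose stringified form equals
// the stringified form of want.
// Assumption: fmt.Sprint stands in for whatever stringification is used.
func matchIndexes(vals []interface{}, want interface{}) []int {
	target := fmt.Sprint(want)
	var idx []int
	for i, v := range vals {
		if fmt.Sprint(v) == target {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	vals := []interface{}{1.0, "1", 2, "foo"}
	// float64(1) and the string "1" both stringify to "1"
	fmt.Println(matchIndexes(vals, 1))
	fmt.Println(matchIndexes(vals, "foo"))
}
```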

func (*Series) FilterIndex

func (s *Series) FilterIndex(container string, filterFn FilterFn) []int

FilterIndex returns the index positions of the rows in container (either the Series name or label name) that satisfy filterFn. A filter that matches no rows returns an empty []int. An out-of-range container returns nil. FilterIndex may be applied to the Series values by supplying either the Series name or an empty string ("") as container.

func (*Series) GetLabels

func (s *Series) GetLabels() []interface{}

GetLabels returns label levels as interface{} slices within an []interface{} that may be supplied as the optional labels argument to NewSeries() or NewDataFrame().

func (*Series) GetNulls

func (s *Series) GetNulls() []bool

GetNulls returns whether each value is null or not.

func (*Series) GetValues

func (s *Series) GetValues() interface{}

GetValues returns a copy of the underlying Series data as an interface.

func (*Series) GetValuesAsFloat64

func (s *Series) GetValuesAsFloat64() []float64

GetValuesAsFloat64 coerces the Series values into []float64.

func (*Series) GetValuesAsString

func (s *Series) GetValuesAsString() []string

GetValuesAsString coerces the Series values into []string.

func (*Series) GetValuesAsTime

func (s *Series) GetValuesAsTime() []time.Time

GetValuesAsTime coerces the Series values into []time.Time.

func (*Series) GroupBy

func (s *Series) GroupBy(names ...string) *GroupedSeries

GroupBy groups the Series rows that share the same stringified value in the container(s) (columns or labels) specified by names. If an error occurs, writes the error to the GroupedSeries.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []string{"foo", "bar", "foo", "bar"})
	g := s.GroupBy()
	fmt.Println(g)
}
Output:

+-----++---+
|  -  || 0 |
|-----||---|
| foo || 1 |
|     || 3 |
| bar || 2 |
|     || 4 |
+-----++---+
Example (CompoundGroup)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []string{"foo", "baz", "foo", "baz"}, []string{"bar", "qux", "bar", "qux"})
	g := s.GroupBy()
	fmt.Println(g)
	// +-----+-----++---+
	// |  -  |  -  || 0 |
	// |-----|-----||---|
	// | foo | bar || 1 |
	// |     |     || 3 |
	// | baz | qux || 2 |
	// |     |     || 4 |
	// +-----+-----++---+
}
Output:

func (*Series) HasLabels

func (s *Series) HasLabels(labelNames ...string) error

HasLabels returns an error if the Series does not contain all of the labelNames supplied.

func (*Series) Head

func (s *Series) Head(n int) *Series

Head returns the first n rows of the Series. If n is greater than the length of the Series, returns the entire Series. In either case, returns a new Series.

func (*Series) InPlace

func (s *Series) InPlace() *SeriesMutator

InPlace returns a SeriesMutator, which contains most of the same methods as Series but never returns a new Series. If you want to save memory and improve performance and do not need to preserve the original Series, consider using InPlace().

func (*Series) IndexOfLabel

func (s *Series) IndexOfLabel(name string) int

IndexOfLabel returns the index position of the first label level with a name matching name (case-sensitive). If name does not match any container, -1 is returned.

func (*Series) IsNull

func (s *Series) IsNull() *Series

IsNull returns all the rows with null values. Returns a new Series.

func (*Series) Iterator

func (s *Series) Iterator() *SeriesIterator

Iterator returns an iterator which may be used to access the values in each row as map[string]Element.

func (*Series) LabelsAsSeries

func (s *Series) LabelsAsSeries(name string) *Series

LabelsAsSeries finds the first level with matching name and returns as a Series with all existing label levels (including itself). If label level name is default (prefixed with *), removes the prefix. Returns a new Series with shared labels.

func (*Series) Latest

func (s *Series) Latest() time.Time

Latest coerces the Series values to time.Time and calculates the latest timestamp.

func (*Series) Len

func (s *Series) Len() int

Len returns the number of rows in the Series.

func (*Series) ListLabelNames

func (s *Series) ListLabelNames() []string

ListLabelNames returns the names of all the label levels in the Series, ordered by position.

func (*Series) Lookup

func (s *Series) Lookup(other *Series, options ...JoinOption) (*Series, error)

Lookup performs the lookup portion of a join of other onto s. Performs a left join unless a different join type is specified as an option. If left and right keys are supplied as options, those are used as lookup keys. Otherwise, the join will automatically use shared label names or return an error if none exist.

Lookup identifies the row alignment between s and other and returns the aligned values. Rows are aligned when: 1) one or more containers (either column or label level) in other share the same name as one or more containers in s, and 2) the stringified values in the other containers match the values in the s containers. For the following dataframes:

s          other
FOO  BAR   FOO  QUX
bar  0     baz  corge
baz  1     qux  waldo

Row 1 in s is "aligned" with row 0 in other, because those are the rows in which both share the same value ("baz") in a container with the same name ("foo"). The result of a lookup will be:

FOO  BAR
bar  null
baz  corge

Returns a new Series.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}, []int{0, 1}).SetName("foo").SetLabelNames([]string{"a"})
	fmt.Println("--original Series--")
	fmt.Println(s)

	s2 := tada.NewSeries([]float64{4, 5}, []int{0, 10}).SetLabelNames([]string{"a"})
	fmt.Println("--Series to lookup--")
	fmt.Println(s2)

	fmt.Println("--result--")
	lookup, _ := s.Lookup(s2)
	fmt.Println(lookup)
}
Output:

--original Series--
+---++-----+
| a || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

--Series to lookup--
+----++---+
| a  || 0 |
|----||---|
|  0 || 4 |
| 10 || 5 |
+----++---+

--result--
+---++--------+
| a ||  foo   |
|---||--------|
| 0 ||      4 |
| 1 || (null) |
+---++--------+
Example (WithOptions)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}, []string{"foo", "bar"}, []int{0, 1}).SetLabelNames([]string{"a", "b"})
	fmt.Println("--original Series--")
	fmt.Println(s)

	s2 := tada.NewSeries([]float64{4, 5}, []int{0, 10}, []string{"baz", "bar"}).SetLabelNames([]string{"a", "b"})
	fmt.Println("--Series to lookup--")
	fmt.Println(s2)

	fmt.Println("--result--")
	lookup, _ := s.Lookup(
		s2,
		tada.JoinOptionHow("inner"),
		tada.JoinOptionLeftOn([]string{"a"}),
		tada.JoinOptionRightOn([]string{"b"}),
	)
	fmt.Println(lookup)
}
Output:

--original Series--
+-----+---++---+
|  a  | b || 0 |
|-----|---||---|
| foo | 0 || 1 |
| bar | 1 || 2 |
+-----+---++---+

--Series to lookup--
+----+-----++---+
| a  |  b  || 0 |
|----|-----||---|
|  0 | baz || 4 |
| 10 | bar || 5 |
+----+-----++---+

--result--
+-----+---++---+
|  a  | b || 0 |
|-----|---||---|
| bar | 1 || 5 |
+-----+---++---+

func (*Series) Max

func (s *Series) Max() float64

Max coerces the Series values to float64 and calculates the maximum.

func (*Series) Mean

func (s *Series) Mean() float64

Mean coerces the Series values to float64 and calculates the mean.

func (*Series) Median

func (s *Series) Median() float64

Median coerces the Series values to float64 and calculates the median.

func (*Series) Merge

func (s *Series) Merge(other *Series, options ...JoinOption) (*DataFrame, error)

Merge joins other onto s. Performs a left join unless a different join type is specified as an option. If left and right keys are supplied as options, those are used as lookup keys. Otherwise, the join will automatically use shared label names or return an error if none exist.

Merge identifies the row alignment between s and other and appends aligned values as new columns on s. Rows are aligned when: 1) one or more containers (either column or label level) in other share the same name as one or more containers in s, and 2) the stringified values in the other containers match the values in the s containers. For the following dataframes:

s          other
FOO  BAR   FOO  QUX
bar  0     baz  corge
baz  1     qux  waldo

Row 1 in s is "aligned" with row 0 in other, because those are the rows in which both share the same value ("baz") in a container with the same name ("foo"). After merging, the result will be:

s
FOO  BAR  QUX
bar  0    null
baz  1    corge

Finally, all container names (either the Series name or label name) are deduplicated after the merge so that they are unique. Returns a new DataFrame.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}, []int{0, 1}).SetName("foo")
	fmt.Println("--original Series--")
	fmt.Println(s)

	s2 := tada.NewSeries([]float64{4, 5}, []int{0, 10}).SetName("bar")
	fmt.Println("--Series to merge--")
	fmt.Println(s2)

	fmt.Println("--result--")
	merged, _ := s.Merge(s2)
	fmt.Println(merged)
}
Output:

--original Series--
+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

--Series to merge--
+----++-----+
| -  || bar |
|----||-----|
|  0 ||   4 |
| 10 ||   5 |
+----++-----+

--result--
+---++-----+--------+
| - || foo |  bar   |
|---||-----|--------|
| 0 ||   1 |      4 |
| 1 ||   2 | (null) |
+---++-----+--------+
Example (WithOptions)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}, []string{"foo", "bar"}, []int{0, 1}).SetLabelNames([]string{"a", "b"})
	fmt.Println("--original Series--")
	fmt.Println(s)

	s2 := tada.NewSeries([]float64{4, 5}, []int{0, 10}, []string{"baz", "bar"}).SetLabelNames([]string{"a", "b"})
	fmt.Println("--Series to lookup--")
	fmt.Println(s2)

	fmt.Println("--result--")
	merged, _ := s.Merge(s2,
		tada.JoinOptionHow("inner"),
		tada.JoinOptionLeftOn([]string{"a"}),
		tada.JoinOptionRightOn([]string{"b"}),
	)
	fmt.Println(merged)
}
Output:

--original Series--
+-----+---++---+
|  a  | b || 0 |
|-----|---||---|
| foo | 0 || 1 |
| bar | 1 || 2 |
+-----+---++---+

--Series to lookup--
+----+-----++---+
| a  |  b  || 0 |
|----|-----||---|
|  0 | baz || 4 |
| 10 | bar || 5 |
+----+-----++---+

--result--
+-----+---++---+-----+
|  a  | b || 0 | 0_1 |
|-----|---||---|-----|
| bar | 1 || 2 |   5 |
+-----+---++---+-----+

func (*Series) Min

func (s *Series) Min() float64

Min coerces the Series values to float64 and calculates the minimum.

func (*Series) Multiply

func (s *Series) Multiply(other *Series, ignoreNulls bool) *Series

Multiply coerces other and s to float64 values, aligns other with s, and multiplies the values in aligned rows, using the labels in s as an anchor. If ignoreNulls is true, then missing or null values are treated as 0. Otherwise, if a row in s does not align with any row in other, or if a row does align but either value is null, then the resulting value is null.
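
The alignment and null rules described above can be pictured with a small standard-library sketch. It is not tada's implementation: the cell type, the multiplyAligned helper, and the use of a single string label per row are all assumptions made for illustration.

```go
package main

import "fmt"

// cell is a hypothetical stand-in for one Series value plus its null flag.
type cell struct {
	v      float64
	isNull bool
}

// multiplyAligned multiplies each row of s by the row of other that carries
// the same label. The row order of s is preserved (s is the anchor).
func multiplyAligned(labels []string, s []cell, other map[string]cell, ignoreNulls bool) []cell {
	out := make([]cell, len(s))
	for i, label := range labels {
		left := s[i]
		right, found := other[label]
		missing := !found || right.isNull
		if !ignoreNulls && (left.isNull || missing) {
			// Strict mode: an unaligned row or a null operand yields null.
			out[i] = cell{isNull: true}
			continue
		}
		// Lenient mode: null or missing operands are treated as 0.
		if left.isNull {
			left.v = 0
		}
		if missing {
			right.v = 0
		}
		out[i] = cell{v: left.v * right.v}
	}
	return out
}

func main() {
	labels := []string{"a", "b", "c"}
	s := []cell{{v: 2}, {v: 3}, {isNull: true}}
	other := map[string]cell{"a": {v: 10}, "c": {v: 5}}

	fmt.Println(multiplyAligned(labels, s, other, false))
	fmt.Println(multiplyAligned(labels, s, other, true))
}
```

In strict mode only row "a" produces a value here, because "b" has no match in other and "c" is null in s.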

func (*Series) NUnique

func (s *Series) NUnique() int

NUnique counts the number of unique, non-null Series values.

func (*Series) Name

func (s *Series) Name() string

Name returns the name of the Series.

func (*Series) NameOfLabel

func (s *Series) NameOfLabel(n int) string

NameOfLabel returns the name of the label level at index position n. If n is out of range, returns "-out of range-".

func (*Series) Percentile

func (s *Series) Percentile() *Series

Percentile coerces the Series values to float64 and returns the percentile rank of each value. Uses the "exclusive" definition: a value's percentile is the % of all non-null values in the Series (including itself) that are below it.
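
As a worked illustration of the "exclusive" definition, the sketch below computes the fraction of values strictly below each value. It uses only the standard library; percentileExclusive is a hypothetical helper, it assumes every input value is non-null, and it is not taken from tada's source.

```go
package main

import "fmt"

// percentileExclusive returns, for each value, the share of all values
// (the value itself counts toward the total) that are strictly below it.
func percentileExclusive(vals []float64) []float64 {
	out := make([]float64, len(vals))
	for i, v := range vals {
		below := 0
		for _, other := range vals {
			if other < v {
				below++
			}
		}
		out[i] = float64(below) / float64(len(vals))
	}
	return out
}

func main() {
	fmt.Println(percentileExclusive([]float64{1, 2, 3, 4})) // [0 0.25 0.5 0.75]
}
```

Under this definition the lowest value always ranks 0 and no value reaches 1.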

func (*Series) PercentileBin

func (s *Series) PercentileBin(bins []float64, config *Binner) (*Series, error)

PercentileBin coerces the Series values to float64 and categorizes each value based on which percentile bin interval it falls within. Uses the "exclusive" definition: a value's percentile is the % of all non-null values in the Series (including itself) that are below it. bins should be a slice of sequential percentile edges (between 0 and 1) that form intervals (left inclusive, right exclusive). NB: left inclusive, right exclusive is the opposite of the interval inclusion rules for the Bin() function. For example, [0, .5, 1] represents the percentile intervals 0-50% (including 0%, excluding 50%) and 50%-100% (including 50%, excluding 100%). If these bins were supplied for a Series with values [1, 1000], the returned Series would have values [0-0.5, 0.5-1], because 1 is in the bottom 50% of values and 1000 is in the top 50% of values. Null values are not categorized. For default behavior, supply nil as config.

To bin values below or above the bin intervals, or to supply custom labels, supply a tada.Binner as config. If custom labels are supplied, the length must be 1 less than the total number of bin edges. Otherwise, bin labels are auto-generated from the bin intervals.

Example
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.PercentileBin([]float64{0, .5, 1}, nil)
	fmt.Println(binned)
}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
| 2 ||   3 |
| 3 ||   4 |
+---++-----+

+---++-------+
| - ||  foo  |
|---||-------|
| 0 || 0-0.5 |
| 1 ||       |
| 2 || 0.5-1 |
| 3 ||       |
+---++-------+
Example (CustomLabels)
package main

import (
	"fmt"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.PercentileBin([]float64{0, .5, 1}, &tada.Binner{Labels: []string{"Bottom 50%", "Top 50%"}})
	fmt.Println(binned)
}
Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
| 2 ||   3 |
| 3 ||   4 |
+---++-----+

+---++------------+
| - ||    foo     |
|---||------------|
| 0 || Bottom 50% |
| 1 ||            |
| 2 ||    Top 50% |
| 3 ||            |
+---++------------+

func (*Series) Range

func (s *Series) Range(first, last int) *Series

Range returns the rows of the Series starting at first and ending immediately prior to last (left-inclusive, right-exclusive). If either first or last is out of range, a Series error is returned. In all cases, returns a new Series.
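
The left-inclusive, right-exclusive convention matches Go slice expressions. A minimal sketch follows, assuming "out of range" means first < 0, last > length, or first > last (the exact checks tada performs are not stated here); rangeRows is a hypothetical helper.

```go
package main

import "fmt"

// rangeRows returns a copy of vals[first:last], or an error if the bounds
// fall outside the data. The bounds checks are an assumption for illustration.
func rangeRows(vals []float64, first, last int) ([]float64, error) {
	if first < 0 || last > len(vals) || first > last {
		return nil, fmt.Errorf("range [%d:%d) out of bounds for %d rows", first, last, len(vals))
	}
	out := make([]float64, last-first)
	copy(out, vals[first:last]) // copy so the caller cannot alias the input
	return out, nil
}

func main() {
	rows, err := rangeRows([]float64{10, 20, 30, 40}, 1, 3)
	fmt.Println(rows, err) // [20 30] <nil>
}
```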

func (*Series) Rank

func (s *Series) Rank() *Series

Rank coerces the Series values to float64 and returns the rank of each (in ascending order - where 1 is the rank of the lowest value). Rows with the same value share the same rank.

func (*Series) Reduce

func (s *Series) Reduce(lambda ReduceFn) (value interface{}, isNull bool)

Reduce reduces all Series values to a single value and null status using lambda.

func (*Series) Relabel

func (s *Series) Relabel() *Series

Relabel resets the Series labels to default labels (e.g., []int from 0 to s.Len()-1, with *0 as name). Returns a new Series.

func (*Series) Resample

func (s *Series) Resample(by Resampler) *Series

Resample coerces the Series values to time.Time and truncates them by the logic supplied in tada.Resampler. If slice type is civil.Date or civil.Time before resampling, it will be returned as civil.Date or civil.Time after resampling.

Returns a new Series.

Example (ByHalfHour)
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{
		time.Date(2020, 1, 15, 12, 15, 0, 0, time.UTC),
		time.Date(2020, 1, 15, 12, 45, 0, 0, time.UTC),
	}).SetName("foo")
	fmt.Println(s)

	byHalfHour := tada.Resampler{ByDuration: 30 * time.Minute}
	fmt.Println(s.Resample(byHalfHour))
}
Output:

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:15:00Z |
| 1 || 2020-01-15T12:45:00Z |
+---++----------------------+

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:00:00Z |
| 1 || 2020-01-15T12:30:00Z |
+---++----------------------+
Example (ByHour)
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{time.Date(2020, 1, 15, 12, 30, 0, 0, time.UTC)}).SetName("foo")
	fmt.Println(s)

	byHour := tada.Resampler{ByDuration: time.Hour}
	fmt.Println(s.Resample(byHour))
}
Output:

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:30:00Z |
+---++----------------------+

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:00:00Z |
+---++----------------------+
Example (ByMonth)
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{time.Date(2020, 1, 15, 12, 30, 0, 0, time.UTC)}).SetName("foo")
	fmt.Println(s)

	byMonth := tada.Resampler{ByMonth: true}
	fmt.Println(s.Resample(byMonth))
}
Output:

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:30:00Z |
+---++----------------------+

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-01T00:00:00Z |
+---++----------------------+
Example (ByWeek)
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/stable/041620/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{time.Date(2020, 1, 15, 12, 30, 0, 0, time.UTC)}).SetName("foo")
	fmt.Println(s)

	byWeek := tada.Resampler{ByWeek: true, StartOfWeek: time.Sunday}
	fmt.Println(s.Resample(byWeek))
}
Output:

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:30:00Z |
+---++----------------------+

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-12T00:00:00Z |
+---++----------------------+

func (*Series) RollingDuration

func (s *Series) RollingDuration(d time.Duration) *GroupedSeries

RollingDuration iterates over each row in Series, coerces the values to time.Time, and groups each set of subsequent rows that are within d of the current row.

func (*Series) RollingN

func (s *Series) RollingN(n int) *GroupedSeries

RollingN iterates over each row in Series and groups each set of n subsequent rows after the current row.

func (*Series) SetLabelNames

func (s *Series) SetLabelNames(levelNames []string) *Series

SetLabelNames sets the names of all the label levels in the Series and returns the entire Series. If an error occurs, it is written to the Series.

func (*Series) SetName

func (s *Series) SetName(name string) *Series

SetName modifies the name of a Series in place and returns the original Series.

func (*Series) SetRows

func (s *Series) SetRows(lambda ApplyFn, rows []int) *Series

SetRows applies lambda, an anonymous function, to set the values at the specified row positions. The new values must be the same type as the existing values. Returns a new Series.

func (*Series) Shift

func (s *Series) Shift(n int) *Series

Shift replaces the value in row i with the value in row i - n, or null if that index is out of range. Returns a new Series.
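
The index arithmetic can be sketched as follows. A positive n moves values toward later rows and a negative n moves them toward earlier rows. The shiftCell type and shift helper are hypothetical and only illustrate the rule stated above.

```go
package main

import "fmt"

// shiftCell is a hypothetical stand-in for one Series value plus its null flag.
type shiftCell struct {
	v      float64
	isNull bool
}

// shift returns a new slice in which row i holds the value from row i-n,
// or a null cell when i-n falls outside the data.
func shift(vals []shiftCell, n int) []shiftCell {
	out := make([]shiftCell, len(vals))
	for i := range vals {
		src := i - n
		if src < 0 || src >= len(vals) {
			out[i] = shiftCell{isNull: true}
			continue
		}
		out[i] = vals[src]
	}
	return out
}

func main() {
	vals := []shiftCell{{v: 1}, {v: 2}, {v: 3}}
	fmt.Println(shift(vals, 1))  // first row becomes null
	fmt.Println(shift(vals, -1)) // last row becomes null
}
```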

func (*Series) Shuffle

func (s *Series) Shuffle(seed int64) *Series

Shuffle randomizes the row order of the Series. Returns a new Series.
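
A seed makes a shuffle reproducible, which is useful in tests. The sketch below shows the idea with math/rand from the standard library; shuffled is a hypothetical helper, and how tada uses the seed internally is an assumption.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffled returns a reordered copy of vals. The same seed always yields
// the same order, and the input slice is left untouched.
func shuffled(vals []float64, seed int64) []float64 {
	out := make([]float64, len(vals))
	copy(out, vals)
	r := rand.New(rand.NewSource(seed))
	r.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

func main() {
	vals := []float64{1, 2, 3, 4, 5}
	fmt.Println(shuffled(vals, 42))
	fmt.Println(shuffled(vals, 42)) // identical to the line above
}
```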

func (*Series) Sort

func (s *Series) Sort(by ...Sorter) *Series

Sort sorts the values by zero or more Sorter specifications. If no Sorter is supplied, sorts by Series values (as float64) in ascending order. If a Sorter is supplied without a Name or with a name matching the Series name, sorts by Series values. If no DType is supplied in a Sorter, sorts as float64. DType is only used for the process of sorting. Once it has been sorted, data retains its original type. Returns a new Series.

func (*Series) StdDev

func (s *Series) StdDev() float64

StdDev coerces the Series values to float64 and calculates the standard deviation.

func (*Series) String

func (s *Series) String() string

func (*Series) Struct

func (s *Series) Struct(structPointer interface{}, options ...WriteOption) error

Struct writes the values of the Series containers into structPointer. Returns an error if s does not contain, from left-to-right, the same container names and types as the exported fields that appear, from top-to-bottom, in structPointer. Exported struct fields must be types that are supported by NewDataFrame(). If a "tada" tag is present with the value "isNull", this field must be [][]bool. The null status of each value container in the Series, from left-to-right, will be written into this field in equal-length slices. If s contains additional containers beyond those in structPointer, those are ignored.

func (*Series) Subset

func (s *Series) Subset(index []int) *Series

Subset returns only the rows specified at the index positions, in the order specified. Returns a new Series.

func (*Series) SubsetLabels

func (s *Series) SubsetLabels(index []int) *Series

SubsetLabels includes only the columns of labels specified at the index positions, in the order specified. Returns a new Series.

func (*Series) Subtract

func (s *Series) Subtract(other *Series, ignoreNulls bool) *Series

Subtract coerces other and s to float64 values, aligns other with s, and subtracts the aligned values of other from s, using the labels in s as an anchor. If ignoreNulls is true, then missing or null values are treated as 0. Otherwise, if a row in s does not align with any row in other, or if a row does align but either value is null, then the resulting value is null.

func (*Series) Sum

func (s *Series) Sum() float64

Sum coerces the Series values to float64 and sums them.

func (*Series) SwapLabels

func (s *Series) SwapLabels(i, j string) *Series

SwapLabels swaps the label levels with names i and j. Returns a new Series.

func (*Series) Tail

func (s *Series) Tail(n int) *Series

Tail returns the last n rows of the Series. If n is greater than the length of the Series, returns the entire Series. In either case, returns a new Series.

func (*Series) Type

func (s *Series) Type() reflect.Type

Type returns the slice type of the underlying Series values.

func (*Series) Unique

func (s *Series) Unique(includeLabels bool) *Series

Unique returns the first appearance of all non-null values in the Series. If includeLabels is true, a row is considered unique only if its combination of labels and values is unique. Returns a new Series.
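
The effect of includeLabels can be sketched with a first-seen filter. The uniqueRow type and uniqueRows helper below are hypothetical, each row carries a single label for simplicity, and values are compared as strings, which is an assumption.

```go
package main

import "fmt"

// uniqueRow is a hypothetical stand-in for a Series row with one label level.
type uniqueRow struct {
	label  string
	value  string
	isNull bool
}

// uniqueRows keeps the first appearance of each non-null value. When
// includeLabels is true, the label is part of the uniqueness key.
func uniqueRows(rows []uniqueRow, includeLabels bool) []uniqueRow {
	seen := make(map[string]bool)
	var out []uniqueRow
	for _, r := range rows {
		if r.isNull {
			continue
		}
		key := r.value
		if includeLabels {
			key = r.label + "\x00" + r.value // separator avoids accidental collisions
		}
		if seen[key] {
			continue
		}
		seen[key] = true
		out = append(out, r)
	}
	return out
}

func main() {
	rows := []uniqueRow{
		{label: "a", value: "1"},
		{label: "b", value: "1"},
		{label: "a", value: "1"},
		{label: "c", isNull: true},
	}
	fmt.Println(len(uniqueRows(rows, false))) // 1
	fmt.Println(len(uniqueRows(rows, true)))  // 2
}
```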

func (*Series) ValueCounts

func (s *Series) ValueCounts() map[string]int

ValueCounts counts the number of appearances of each stringified value in the Series.
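
Because values are stringified before counting, values of different types that print the same way fall under the same key. A minimal standard-library sketch of that behavior follows; valueCounts is a hypothetical helper, and using fmt.Sprint as the stringifier is an assumption.

```go
package main

import "fmt"

// valueCounts tallies values by their string form.
func valueCounts(vals []interface{}) map[string]int {
	counts := make(map[string]int)
	for _, v := range vals {
		counts[fmt.Sprint(v)]++
	}
	return counts
}

func main() {
	counts := valueCounts([]interface{}{1, "1", 1.0, "a"})
	fmt.Println(counts["1"], counts["a"]) // 3 1
}
```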

func (*Series) Where

func (s *Series) Where(filters map[string]FilterFn, ifTrue, ifFalse interface{}) (*Series, error)

Where iterates over the rows in s and evaluates whether each one satisfies filters, which is a map of container names (either the Series name or label name) and tada.FilterFn structs. If yes, returns ifTrue at that row position. If not, returns ifFalse at that row position. Values are coerced from their original type to the selected field type for filtering, but after filtering retain their original type.

Returns an unnamed Series with a copy of the labels from the original Series and null status based on the supplied values. If an unsupported value type is supplied as either ifTrue or ifFalse, returns an error.

func (*Series) WithLabels

func (s *Series) WithLabels(name string, input interface{}) *Series

WithLabels resolves as follows:

If a scalar string is supplied as input and a label level exists that matches name: rename the level to match input. In this case, name must already exist.

If a slice is supplied as input and a label level exists that matches name: replace the values at this level to match input. If a slice is supplied as input and a label level does not exist that matches name: append a new level named name and values matching input. If input is a slice, it must be the same length as the underlying Series.

In all cases, returns a new Series.

func (*Series) WithValues

func (s *Series) WithValues(input interface{}) *Series

WithValues replaces the Series values with input. input must be a supported slice type of the same length as the original Series. Returns a new Series.

func (*Series) WriteCSV

func (s *Series) WriteCSV(w io.Writer, options ...WriteOption) error

WriteCSV converts a Series to CSV with rows as the major dimension, and writes the output to w. Null values are replaced with "(null)".

type SeriesIterator

type SeriesIterator struct {
	// contains filtered or unexported fields
}

A SeriesIterator iterates over the rows in a Series.

func (*SeriesIterator) Next

func (iter *SeriesIterator) Next() bool

Next advances to next row. Returns false at end of iteration.

func (*SeriesIterator) Row

func (iter *SeriesIterator) Row() map[string]Element

Row returns the current row in the Series as map[string]Element. The map keys are the names of containers (including label levels). The name of the Series values column is the same as the name of the Series itself. The value in each map is an Element containing an interface value and a boolean denoting if the value is null. If multiple columns have the same header, only the Elements of the left-most column are returned.

type SeriesMutator

type SeriesMutator struct {
	// contains filtered or unexported fields
}

A SeriesMutator is used to change Series values in place.

func (*SeriesMutator) Append

func (s *SeriesMutator) Append(other *Series) error

Append adds the other labels and values as new rows to the Series. If the types of any container do not match, all the values in that container are coerced to string. Modifies the underlying Series in place.

func (*SeriesMutator) Apply

func (s *SeriesMutator) Apply(lambda ApplyFn) error

Apply applies lambda, an anonymous function, to every row in a container. A row's null status can be changed in place within the anonymous function. Modifies the underlying Series in place.

func (*SeriesMutator) DropLabels

func (s *SeriesMutator) DropLabels(name string) error

DropLabels removes the first label level matching name. Modifies the underlying Series in place.

func (*SeriesMutator) DropNull

func (s *SeriesMutator) DropNull()

DropNull removes all the rows with null values. Modifies the underlying Series in place.

func (*SeriesMutator) DropRow

func (s *SeriesMutator) DropRow(index int) error

DropRow removes the row at the specified index. Modifies the underlying Series in place.

func (*SeriesMutator) FillNull

func (s *SeriesMutator) FillNull(how NullFiller)

FillNull fills all the null values and makes them not-null. Modifies the underlying Series.

func (*SeriesMutator) Filter

func (s *SeriesMutator) Filter(filters map[string]FilterFn) error

Filter retains only the rows that satisfy all of the filters, which is a map of container names (either the Series name or label name) and anonymous functions. Filter may be applied to the Series values by supplying either the Series name or an empty string ("") as a key.

Rows with null values never satisfy a filter. If no filter is provided, the function does nothing. For equality filtering on one or more containers, consider FilterByValue. Modifies the underlying Series in place.

func (*SeriesMutator) FilterByValue

func (s *SeriesMutator) FilterByValue(filters map[string]interface{}) error

FilterByValue retains only the rows in the Series satisfying all filters, which is a map of container names (either the Series name or label name) to interface{} values. A filter is satisfied for a given row value if the stringified value in that container at that row matches the stringified interface{} value. FilterByValue may be applied to the Series values by supplying either the Series name or an empty string ("") as a key. Modifies the underlying Series in place.

func (*SeriesMutator) Relabel

func (s *SeriesMutator) Relabel()

Relabel resets the Series labels to default labels (e.g., []int from 0 to s.Len()-1, with *0 as name). Modifies the underlying Series in place.

func (*SeriesMutator) Resample

func (s *SeriesMutator) Resample(by Resampler)

Resample coerces the Series values to time.Time and truncates them by the logic supplied in tada.Resampler. If slice type is civil.Date or civil.Time before resampling, it will be returned as civil.Date or civil.Time after resampling.

Modifies the underlying Series in place.

func (*SeriesMutator) SetRows

func (s *SeriesMutator) SetRows(lambda ApplyFn, rows []int) error

SetRows applies lambda, an anonymous function, to set the values at the specified row positions. The new values must be the same type as the existing values. Modifies the underlying Series in place.

func (*SeriesMutator) Shift

func (s *SeriesMutator) Shift(n int)

Shift replaces the value in row i with the value in row i - n, or null if that index is out of range. Modifies the underlying Series in place.

func (*SeriesMutator) Shuffle

func (s *SeriesMutator) Shuffle(seed int64)

Shuffle randomizes the row order of the Series. Modifies the underlying Series.

func (*SeriesMutator) Sort

func (s *SeriesMutator) Sort(by ...Sorter) error

Sort sorts the values by zero or more Sorter specifications. If no Sorter is supplied, sorts by Series values (as float64) in ascending order. If a Sorter is supplied without a Name or with a name matching the Series name, sorts by Series values. If no DType is supplied in a Sorter, sorts as float64. Modifies the underlying Series in place.

func (*SeriesMutator) Subset

func (s *SeriesMutator) Subset(index []int) error

Subset retains only the rows specified at the index positions, in the order specified. Modifies the underlying Series in place.

func (*SeriesMutator) SubsetLabels

func (s *SeriesMutator) SubsetLabels(index []int) error

SubsetLabels includes only the columns of labels specified at the index positions, in the order specified. Modifies the underlying Series in place.

func (*SeriesMutator) SwapLabels

func (s *SeriesMutator) SwapLabels(i, j string) error

SwapLabels swaps the label levels with names i and j. Modifies the underlying Series in place.

func (*SeriesMutator) WithLabels

func (s *SeriesMutator) WithLabels(name string, input interface{}) error

WithLabels resolves as follows:

If a scalar string is supplied as input and a label level exists that matches name: rename the level to match input. In this case, name must already exist.

If a slice is supplied as input and a label level exists that matches name: replace the values at this level to match input. If a slice is supplied as input and a label level does not exist that matches name: append a new level named name and values matching input. If input is a slice, it must be the same length as the underlying Series.

In all cases, modifies the underlying Series in place.

func (*SeriesMutator) WithValues

func (s *SeriesMutator) WithValues(input interface{}) error

WithValues replaces the Series values with input. input must be a supported slice type of the same length as the original Series. Modifies the underlying Series.

type Sorter

type Sorter struct {
	Name       string
	Descending bool
	DType      DType
}

A Sorter supplies details to the Sort() function. `Name` specifies the container (either label or column name) to sort. If `Descending` is true, values are sorted in descending order. `DType` specifies the data type to which values will be coerced before they are sorted (default: float64). Null values are always sorted to the bottom.
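
The rule that null values sort to the bottom regardless of direction can be sketched with sort.SliceStable from the standard library. The sortCell type and sortNullsLast helper are hypothetical, and the sketch covers only float64 values rather than the full set of DType options.

```go
package main

import (
	"fmt"
	"sort"
)

// sortCell is a hypothetical stand-in for one value plus its null flag.
type sortCell struct {
	v      float64
	isNull bool
}

// sortNullsLast returns a sorted copy of vals. Nulls always end up at the
// bottom, whether the sort is ascending or descending.
func sortNullsLast(vals []sortCell, descending bool) []sortCell {
	out := make([]sortCell, len(vals))
	copy(out, vals)
	sort.SliceStable(out, func(i, j int) bool {
		a, b := out[i], out[j]
		if a.isNull != b.isNull {
			return !a.isNull // a non-null row always precedes a null row
		}
		if a.isNull {
			return false // two nulls keep their original relative order
		}
		if descending {
			return a.v > b.v
		}
		return a.v < b.v
	})
	return out
}

func main() {
	vals := []sortCell{{v: 3}, {isNull: true}, {v: 1}}
	fmt.Println(sortNullsLast(vals, false))
	fmt.Println(sortNullsLast(vals, true))
}
```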

type StructTransposer

type StructTransposer [][]interface{}

A StructTransposer is a row-oriented representation of a DataFrame that can be randomly shuffled or transposed into a column-oriented struct representation of a DataFrame. It is useful for intuitive row-oriented testing.

func (StructTransposer) Shuffle

func (st StructTransposer) Shuffle(seed int64)

Shuffle randomly shuffles the row order in Rows, using a randomizer seeded with seed.

func (StructTransposer) Transpose

func (st StructTransposer) Transpose(structPointer interface{}) error

Transpose reads the values of an untyped, row-oriented struct representation of a DataFrame into a typed, column-oriented struct representation of a DataFrame. If all non-null values in a column have the same type, then the column will be a slice of that type. If any of the non-null values in a column have different types, then the column will be []interface{}. If all values are considered null by tada, then the column will be a slice of the type in the first row (when all values are null and the first row is nil, the column will be []interface{}). If an error is returned, values are still written to structPointer up until the point the error occurred.

type WriteOption

type WriteOption func(*writeConfig)

A WriteOption configures a write function. Available write options: WriteOptionExcludeLabels, WriteOptionDelimiter.
