Documentation ¶
Overview ¶
Package tada (TAble DAta) enables test-driven data pipelines.
tada combines concepts from pandas, spreadsheets, R, Apache Spark, and SQL. Its most common use cases are cleaning, aggregating, transforming, and analyzing data. Some notable features of tada:
* flexible constructor that supports most primitive data types
* seamlessly handles null data and type conversions
* robust datetime support
* advanced filtering, lookups and merging, grouping, sorting, and pivoting
* multi-level labels and columns
* complete test coverage
* interoperable with existing pandas dataframes via Apache Arrow
The key data types are Series, DataFrames, and groupings of each. A Series is analogous to one column of a spreadsheet, and a DataFrame is analogous to a whole spreadsheet. Printing either data type will render an ASCII table.
Both Series and DataFrames have one or more "label levels". On printing, these appear as the leftmost columns in a table, and typically have values that help identify ("label") specific rows. They are analogous to the "index" concept in pandas.
For more detail and implementation notes, see https://docs.google.com/document/d/18DvZzd6Tg6Bz0SX0fY2SrXOjE8d9xDhU6bDEnaIc_rM/
Index ¶
- func DisableWarnings()
- func EnableWarnings()
- func EqualDataFrames(a, b *DataFrame) bool
- func EqualSeries(a, b *Series) bool
- func GetOptionDefaultNullStrings() []string
- func JoinOptionHow(how string) func(*joinConfig)
- func JoinOptionLeftOn(keys []string) func(*joinConfig)
- func JoinOptionRightOn(keys []string) func(*joinConfig)
- func MakeMultiLevelLabels(labels []interface{}) ([]interface{}, error)
- func PrettyDiff(got, want interface{}) (bool, *tablediff.Differences, error)
- func PrintOptionMaxCellWidth(n int)
- func PrintOptionMaxColumns(n int)
- func PrintOptionMaxRows(n int)
- func PrintOptionMergeRepeats(set bool)
- func PrintOptionWrapLines(set bool)
- func ReadOptionDelimiter(sep rune) func(*readConfig)
- func ReadOptionHeaders(n int) func(*readConfig)
- func ReadOptionLabels(n int) func(*readConfig)
- func ReadOptionSwitchDims() func(*readConfig)
- func SetOptionAddTimeFormat(format string)
- func SetOptionDefaultSeparator(sep string)
- func SetOptionNaNStatus(set bool)
- func SetOptionNullStrings(list []string)
- func WriteMockCSV(w io.Writer, n int, r io.Reader, options ...ReadOption) error
- func WriteOptionDelimiter(sep rune) func(*writeConfig)
- func WriteOptionExcludeLabels() func(*writeConfig)
- type ApplyFn
- type Binner
- type DType
- type DataFrame
- func ConcatSeries(series ...*Series) (*DataFrame, error)
- func NewDataFrame(slices []interface{}, labels ...interface{}) *DataFrame
- func ReadCSV(r io.Reader, options ...ReadOption) (*DataFrame, error)
- func ReadCSVFromRecords(records [][]string, options ...ReadOption) (ret *DataFrame, err error)
- func ReadInterfaceRecords(records [][]interface{}, options ...ReadOption) (ret *DataFrame, err error)
- func ReadMatrix(mat Matrix) *DataFrame
- func ReadStruct(strct interface{}, options ...ReadOption) (*DataFrame, error)
- func ReadStructSlice(slice interface{}) (*DataFrame, error)
- func (df *DataFrame) Append(other *DataFrame) *DataFrame
- func (df *DataFrame) Apply(lambdas map[string]ApplyFn) *DataFrame
- func (df *DataFrame) At(row, column int) *Element
- func (df *DataFrame) CSVRecords(options ...WriteOption) [][]string
- func (df *DataFrame) Cast(containerAsType map[string]DType)
- func (df *DataFrame) Col(name string) *Series
- func (df *DataFrame) Cols(names ...string) *DataFrame
- func (df *DataFrame) Copy() *DataFrame
- func (df *DataFrame) Count() *Series
- func (df *DataFrame) DeduplicateNames() *DataFrame
- func (df *DataFrame) DropCol(name string) *DataFrame
- func (df *DataFrame) DropLabels(name string) *DataFrame
- func (df *DataFrame) DropNull(subset ...string) *DataFrame
- func (df *DataFrame) DropRow(index int) *DataFrame
- func (df *DataFrame) EqualsCSV(includeLabels bool, want io.Reader, wantOptions ...ReadOption) (bool, *tablediff.Differences, error)
- func (df *DataFrame) Err() error
- func (df *DataFrame) FillNull(how map[string]NullFiller) *DataFrame
- func (df *DataFrame) Filter(filters map[string]FilterFn) *DataFrame
- func (df *DataFrame) FilterByValue(filters map[string]interface{}) *DataFrame
- func (df *DataFrame) FilterCols(lambda func(string) bool, level int) *DataFrame
- func (df *DataFrame) FilterIndex(container string, filterFn FilterFn) []int
- func (df *DataFrame) GetLabels() []interface{}
- func (df *DataFrame) GroupBy(names ...string) *GroupedDataFrame
- func (df *DataFrame) HasCols(colNames ...string) error
- func (df *DataFrame) HasLabels(labelNames ...string) error
- func (df *DataFrame) HasType(sliceType string) (labelIndex, columnIndex []int)
- func (df *DataFrame) Head(n int) *DataFrame
- func (df *DataFrame) InPlace() *DataFrameMutator
- func (df *DataFrame) IndexOfContainer(name string, columns bool) int
- func (df *DataFrame) InterfaceRecords(options ...WriteOption) [][]interface{}
- func (df *DataFrame) IsNull(subset ...string) *DataFrame
- func (df *DataFrame) Iterator() *DataFrameIterator
- func (df *DataFrame) LabelsAsSeries(name string) *Series
- func (df *DataFrame) Len() int
- func (df *DataFrame) ListColNames() []string
- func (df *DataFrame) ListColNamesAtLevel(level int) []string
- func (df *DataFrame) ListLabelNames() []string
- func (df *DataFrame) Lookup(other *DataFrame, options ...JoinOption) (*DataFrame, error)
- func (df *DataFrame) Max() *Series
- func (df *DataFrame) Mean() *Series
- func (df *DataFrame) Median() *Series
- func (df *DataFrame) Merge(other *DataFrame, options ...JoinOption) (*DataFrame, error)
- func (df *DataFrame) Min() *Series
- func (df *DataFrame) NUnique() *Series
- func (df *DataFrame) Name() string
- func (df *DataFrame) NameOfCol(n int) string
- func (df *DataFrame) NameOfLabel(n int) string
- func (df *DataFrame) NumColumns() int
- func (df *DataFrame) NumLevels() int
- func (df *DataFrame) PivotTable(labels, columns, values, aggFunc string) (*DataFrame, error)
- func (df *DataFrame) PromoteToColLevel(name string) *DataFrame
- func (df *DataFrame) Range(first, last int) *DataFrame
- func (df *DataFrame) Reduce(name string, lambda ReduceFn) (*Series, error)
- func (df *DataFrame) Relabel() *DataFrame
- func (df *DataFrame) ReorderCols(colNames []string) *DataFrame
- func (df *DataFrame) ReorderLabels(levelNames []string) *DataFrame
- func (df *DataFrame) Resample(how map[string]Resampler) *DataFrame
- func (df *DataFrame) ResetLabels(index ...string) *DataFrame
- func (df *DataFrame) Series() *Series
- func (df *DataFrame) SetAsLabels(colNames ...string) *DataFrame
- func (df *DataFrame) SetColNames(colNames []string) *DataFrame
- func (df *DataFrame) SetLabelNames(levelNames []string) *DataFrame
- func (df *DataFrame) SetName(name string) *DataFrame
- func (df *DataFrame) SetNulls(n int, nulls []bool) error
- func (df *DataFrame) SetRows(lambda ApplyFn, container string, rows []int) *DataFrame
- func (df *DataFrame) Shuffle(seed int64) *DataFrame
- func (df *DataFrame) Sort(by ...Sorter) *DataFrame
- func (df *DataFrame) StdDev() *Series
- func (df *DataFrame) String() string
- func (df *DataFrame) Struct(structPointer interface{}, options ...WriteOption) error
- func (df *DataFrame) Subset(index []int) *DataFrame
- func (df *DataFrame) SubsetCols(index []int) *DataFrame
- func (df *DataFrame) SubsetLabels(index []int) *DataFrame
- func (df *DataFrame) Sum() *Series
- func (df *DataFrame) SumCols(name string, colNames ...string) (*Series, error)
- func (df *DataFrame) SwapLabels(i, j string) *DataFrame
- func (df *DataFrame) Tail(n int) *DataFrame
- func (df *DataFrame) Transpose() *DataFrame
- func (df *DataFrame) Where(filters map[string]FilterFn, ifTrue, ifFalse interface{}) (*Series, error)
- func (df *DataFrame) WithCol(name string, input interface{}) *DataFrame
- func (df *DataFrame) WithLabels(name string, input interface{}) *DataFrame
- func (df *DataFrame) WriteCSV(w io.Writer, options ...WriteOption) error
- type DataFrameIterator
- type DataFrameMutator
- func (df *DataFrameMutator) Append(other *DataFrame) error
- func (df *DataFrameMutator) Apply(lambdas map[string]ApplyFn) error
- func (df *DataFrameMutator) DeduplicateNames()
- func (df *DataFrameMutator) DropCol(name string) error
- func (df *DataFrameMutator) DropLabels(name string) error
- func (df *DataFrameMutator) DropNull(subset ...string) error
- func (df *DataFrameMutator) DropRow(index int) error
- func (df *DataFrameMutator) FillNull(how map[string]NullFiller) error
- func (df *DataFrameMutator) Filter(filters map[string]FilterFn) error
- func (df *DataFrameMutator) FilterByValue(filters map[string]interface{}) error
- func (df *DataFrameMutator) FilterCols(lambda func(string) bool, level int) error
- func (df *DataFrameMutator) IsNull(subset ...string) error
- func (df *DataFrameMutator) Range(first, last int) error
- func (df *DataFrameMutator) Relabel()
- func (df *DataFrameMutator) ReorderCols(colNames []string) error
- func (df *DataFrameMutator) ReorderLabels(levelNames []string) error
- func (df *DataFrameMutator) Resample(how map[string]Resampler) error
- func (df *DataFrameMutator) ResetLabels(labelLevels ...string) error
- func (df *DataFrameMutator) SetAsLabels(colNames ...string)
- func (df *DataFrameMutator) SetRows(lambda ApplyFn, container string, rows []int) error
- func (df *DataFrameMutator) Shuffle(seed int64)
- func (df *DataFrameMutator) Sort(by ...Sorter) error
- func (df *DataFrameMutator) Subset(index []int) error
- func (df *DataFrameMutator) SubsetCols(index []int) error
- func (df *DataFrameMutator) SubsetLabels(index []int) error
- func (df *DataFrameMutator) SwapLabels(i, j string) error
- func (df *DataFrameMutator) WithCol(name string, input interface{}) error
- func (df *DataFrameMutator) WithLabels(name string, input interface{}) error
- type Element
- type FilterFn
- type GroupedDataFrame
- func (g *GroupedDataFrame) Apply(cols []string, lambda ApplyFn) *GroupedDataFrame
- func (g *GroupedDataFrame) Col(colName string) *GroupedSeries
- func (g *GroupedDataFrame) Count(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) DataFrame() *DataFrame
- func (g *GroupedDataFrame) Earliest(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Err() error
- func (g *GroupedDataFrame) First(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) GetGroup(group string) *DataFrame
- func (g *GroupedDataFrame) GetLabels() []interface{}
- func (g *GroupedDataFrame) HavingCount(lambda func(int) bool) *GroupedDataFrame
- func (g *GroupedDataFrame) Iterator() *GroupedDataFrameIterator
- func (g *GroupedDataFrame) Last(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Latest(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Len() int
- func (g *GroupedDataFrame) ListGroups() []string
- func (g *GroupedDataFrame) Max(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Mean(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Median(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Min(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) NUnique(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Nth(index int, colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Reduce(name string, cols []string, lambda ReduceFn) *DataFrame
- func (g *GroupedDataFrame) StdDev(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) String() string
- func (g *GroupedDataFrame) Sum(colNames ...string) *DataFrame
- type GroupedDataFrameIterator
- type GroupedSeries
- func (g *GroupedSeries) Align() *GroupedSeries
- func (g *GroupedSeries) Apply(lambda ApplyFn) *GroupedSeries
- func (g *GroupedSeries) Count() *Series
- func (g *GroupedSeries) Earliest() *Series
- func (g *GroupedSeries) Err() error
- func (g *GroupedSeries) First() *Series
- func (g *GroupedSeries) GetGroup(group string) *Series
- func (g *GroupedSeries) GetLabels() []interface{}
- func (g *GroupedSeries) HavingCount(lambda func(int) bool) *GroupedSeries
- func (g *GroupedSeries) Iterator() *GroupedSeriesIterator
- func (g *GroupedSeries) Last() *Series
- func (g *GroupedSeries) Latest() *Series
- func (g *GroupedSeries) Len() int
- func (g *GroupedSeries) ListGroups() []string
- func (g *GroupedSeries) Max() *Series
- func (g *GroupedSeries) Mean() *Series
- func (g *GroupedSeries) Median() *Series
- func (g *GroupedSeries) Min() *Series
- func (g *GroupedSeries) NUnique() *Series
- func (g *GroupedSeries) Nth(n int) *Series
- func (g *GroupedSeries) Reduce(name string, lambda ReduceFn) *Series
- func (g *GroupedSeries) Series() *Series
- func (g *GroupedSeries) StdDev() *Series
- func (g *GroupedSeries) String() string
- func (g *GroupedSeries) Sum() *Series
- type GroupedSeriesIterator
- type JoinOption
- type Matrix
- type NullFiller
- type ReadOption
- type ReduceFn
- type Resampler
- type Series
- func (s *Series) Add(other *Series, ignoreNulls bool) *Series
- func (s *Series) Append(other *Series) *Series
- func (s *Series) Apply(lambda ApplyFn) *Series
- func (s *Series) At(index int) *Element
- func (s *Series) Bin(bins []float64, config *Binner) (*Series, error)
- func (s *Series) CSV(options ...WriteOption) ([][]string, error)
- func (s *Series) Cast(containerAsType map[string]DType)
- func (s *Series) Copy() *Series
- func (s *Series) Count() int
- func (s *Series) CumSum() *Series
- func (s *Series) DataFrame() *DataFrame
- func (s *Series) Divide(other *Series, ignoreNulls bool) *Series
- func (s *Series) DropLabels(name string) *Series
- func (s *Series) DropNull() *Series
- func (s *Series) DropRow(index int) *Series
- func (s *Series) Earliest() time.Time
- func (s *Series) EqualsCSV(includeLabels bool, want io.Reader, wantOptions ...ReadOption) (bool, *tablediff.Differences, error)
- func (s *Series) Err() error
- func (s *Series) FillNull(how NullFiller) *Series
- func (s *Series) Filter(filters map[string]FilterFn) *Series
- func (s *Series) FilterByValue(filters map[string]interface{}) *Series
- func (s *Series) FilterIndex(container string, filterFn FilterFn) []int
- func (s *Series) GetLabels() []interface{}
- func (s *Series) GetNulls() []bool
- func (s *Series) GetValues() interface{}
- func (s *Series) GetValuesAsFloat64() []float64
- func (s *Series) GetValuesAsString() []string
- func (s *Series) GetValuesAsTime() []time.Time
- func (s *Series) GroupBy(names ...string) *GroupedSeries
- func (s *Series) HasLabels(labelNames ...string) error
- func (s *Series) Head(n int) *Series
- func (s *Series) InPlace() *SeriesMutator
- func (s *Series) IndexOfLabel(name string) int
- func (s *Series) IsNull() *Series
- func (s *Series) Iterator() *SeriesIterator
- func (s *Series) LabelsAsSeries(name string) *Series
- func (s *Series) Latest() time.Time
- func (s *Series) Len() int
- func (s *Series) ListLabelNames() []string
- func (s *Series) Lookup(other *Series, options ...JoinOption) (*Series, error)
- func (s *Series) Max() float64
- func (s *Series) Mean() float64
- func (s *Series) Median() float64
- func (s *Series) Merge(other *Series, options ...JoinOption) (*DataFrame, error)
- func (s *Series) Min() float64
- func (s *Series) Multiply(other *Series, ignoreNulls bool) *Series
- func (s *Series) NUnique() int
- func (s *Series) Name() string
- func (s *Series) NameOfLabel(n int) string
- func (s *Series) Percentile() *Series
- func (s *Series) PercentileBin(bins []float64, config *Binner) (*Series, error)
- func (s *Series) Range(first, last int) *Series
- func (s *Series) Rank() *Series
- func (s *Series) Reduce(lambda ReduceFn) (value interface{}, isNull bool)
- func (s *Series) Relabel() *Series
- func (s *Series) Resample(by Resampler) *Series
- func (s *Series) RollingDuration(d time.Duration) *GroupedSeries
- func (s *Series) RollingN(n int) *GroupedSeries
- func (s *Series) SetLabelNames(levelNames []string) *Series
- func (s *Series) SetName(name string) *Series
- func (s *Series) SetRows(lambda ApplyFn, rows []int) *Series
- func (s *Series) Shift(n int) *Series
- func (s *Series) Shuffle(seed int64) *Series
- func (s *Series) Sort(by ...Sorter) *Series
- func (s *Series) StdDev() float64
- func (s *Series) String() string
- func (s *Series) Struct(structPointer interface{}, options ...WriteOption) error
- func (s *Series) Subset(index []int) *Series
- func (s *Series) SubsetLabels(index []int) *Series
- func (s *Series) Subtract(other *Series, ignoreNulls bool) *Series
- func (s *Series) Sum() float64
- func (s *Series) SwapLabels(i, j string) *Series
- func (s *Series) Tail(n int) *Series
- func (s *Series) Type() reflect.Type
- func (s *Series) Unique(includeLabels bool) *Series
- func (s *Series) ValueCounts() map[string]int
- func (s *Series) Where(filters map[string]FilterFn, ifTrue, ifFalse interface{}) (*Series, error)
- func (s *Series) WithLabels(name string, input interface{}) *Series
- func (s *Series) WithValues(input interface{}) *Series
- func (s *Series) WriteCSV(w io.Writer, options ...WriteOption) error
- type SeriesIterator
- type SeriesMutator
- func (s *SeriesMutator) Append(other *Series) error
- func (s *SeriesMutator) Apply(lambda ApplyFn) error
- func (s *SeriesMutator) DropLabels(name string) error
- func (s *SeriesMutator) DropNull()
- func (s *SeriesMutator) DropRow(index int) error
- func (s *SeriesMutator) FillNull(how NullFiller)
- func (s *SeriesMutator) Filter(filters map[string]FilterFn) error
- func (s *SeriesMutator) FilterByValue(filters map[string]interface{}) error
- func (s *SeriesMutator) Relabel()
- func (s *SeriesMutator) Resample(by Resampler)
- func (s *SeriesMutator) SetRows(lambda ApplyFn, rows []int) error
- func (s *SeriesMutator) Shift(n int)
- func (s *SeriesMutator) Shuffle(seed int64)
- func (s *SeriesMutator) Sort(by ...Sorter) error
- func (s *SeriesMutator) Subset(index []int) error
- func (s *SeriesMutator) SubsetLabels(index []int) error
- func (s *SeriesMutator) SwapLabels(i, j string) error
- func (s *SeriesMutator) WithLabels(name string, input interface{}) error
- func (s *SeriesMutator) WithValues(input interface{}) error
- type Sorter
- type StructTransposer
- type WriteOption
Examples ¶
- DataFrame
- DataFrame.Filter
- DataFrame.GroupBy
- DataFrame.SetColNames
- DataFrame.SetLabelNames
- DataFrame.SetLabelNames (Multiple)
- DataFrame.Sort
- DataFrame.Struct
- DataFrame.Struct (WithNulls)
- DataFrame.Where
- DataFrame.WithCol (Append)
- DataFrame.WithCol (Overwrite)
- DataFrame.WithCol (Rename)
- DataFrameMutator.WithCol (Rename)
- GroupedSeries.Align (Mean)
- GroupedSeries.Apply
- GroupedSeries.Apply (Align)
- GroupedSeries.HavingCount (Sum)
- GroupedSeries.Mean
- GroupedSeries.Reduce
- PrintOptionMaxCellWidth
- PrintOptionMaxColumns
- PrintOptionMaxRows
- ReadCSV
- ReadCSV (Delimiter)
- ReadCSV (MultipleHeaders)
- ReadCSV (MultipleHeadersWithLabels)
- ReadCSV (NoHeaders)
- ReadCSV (WithLabels)
- ReadCSVFromRecords
- ReadCSVFromRecords (ColsAsMajorDimension)
- Series
- Series (NestedSlice)
- Series (SetNaNStatus)
- Series (SetSentinelNulls)
- Series (Zscore)
- Series.Apply (Float64)
- Series.Bin
- Series.Bin (AndMore)
- Series.Bin (CustomLabels)
- Series.Cast (Date)
- Series.Cast (Time)
- Series.GroupBy
- Series.GroupBy (CompoundGroup)
- Series.Lookup
- Series.Lookup (WithOptions)
- Series.Merge
- Series.Merge (WithOptions)
- Series.PercentileBin
- Series.PercentileBin (CustomLabels)
- Series.Resample (ByHalfHour)
- Series.Resample (ByHour)
- Series.Resample (ByMonth)
- Series.Resample (ByWeek)
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func DisableWarnings ¶
func DisableWarnings()
DisableWarnings prevents tada from writing warning messages to the default log writer.
func EnableWarnings ¶
func EnableWarnings()
EnableWarnings allows tada to write warning messages to the default log writer.
func EqualDataFrames ¶
func EqualDataFrames(a, b *DataFrame) bool
EqualDataFrames reports whether two DataFrames are identical.
func EqualSeries ¶
func EqualSeries(a, b *Series) bool
EqualSeries reports whether two Series are identical.
func GetOptionDefaultNullStrings ¶ added in v0.6.0
func GetOptionDefaultNullStrings() []string
GetOptionDefaultNullStrings returns the default list of strings that tada considers null.
func JoinOptionHow ¶ added in v0.6.0
func JoinOptionHow(how string) func(*joinConfig)
JoinOptionHow specifies how to join two Series or DataFrames. Supported options: left (i.e., left join), right, and inner (default: left).
func JoinOptionLeftOn ¶ added in v0.6.0
func JoinOptionLeftOn(keys []string) func(*joinConfig)
JoinOptionLeftOn specifies the key(s) to use to join the left Series/DataFrame. Keys must be existing container names (either label level or column names). Default: no keys are specified, so shared label names are used automatically as keys.
func JoinOptionRightOn ¶ added in v0.6.0
func JoinOptionRightOn(keys []string) func(*joinConfig)
JoinOptionRightOn specifies the key(s) to use to join the right Series/DataFrame. Keys must be existing container names (either label level or column names). Default: no keys are specified, so shared label names are used automatically as keys.
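The three join options above each return a func(*joinConfig), which is the functional-options pattern. The sketch below shows how such options could be applied on top of the documented defaults (left join, no explicit keys). It does not import tada: the joinConfig fields, the joinOption type, and the helpers optionHow, optionLeftOn, optionRightOn, and resolve are all hypothetical names chosen for illustration, since the real config type is unexported.

```go
package main

import "fmt"

// joinConfig stands in for tada's unexported config type.
// The field names are assumptions made for this sketch.
type joinConfig struct {
	how     string
	leftOn  []string
	rightOn []string
}

// joinOption is a function that modifies a joinConfig.
type joinOption func(*joinConfig)

// optionHow sets the join type ("left", "right", or "inner").
func optionHow(how string) joinOption {
	return func(c *joinConfig) { c.how = how }
}

// optionLeftOn sets the join keys for the left table.
func optionLeftOn(keys []string) joinOption {
	return func(c *joinConfig) { c.leftOn = keys }
}

// optionRightOn sets the join keys for the right table.
func optionRightOn(keys []string) joinOption {
	return func(c *joinConfig) { c.rightOn = keys }
}

// resolve starts from the documented defaults (left join, no explicit keys)
// and applies each option in the order supplied.
func resolve(options ...joinOption) joinConfig {
	cfg := joinConfig{how: "left"}
	for _, opt := range options {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := resolve(optionHow("inner"), optionLeftOn([]string{"id"}), optionRightOn([]string{"user_id"}))
	fmt.Printf("%+v\n", cfg)
}
```

Because each option only touches its own field, callers can supply any subset of options in any order, and omitted options keep their defaults.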
func MakeMultiLevelLabels ¶
func MakeMultiLevelLabels(labels []interface{}) ([]interface{}, error)
MakeMultiLevelLabels expects labels to be a slice of slices. It returns a product of these slices by repeating each label value n times, where n is the number of unique label values in the other slices.
For example, [["foo", "bar"], [1, 2, 3]] returns [["foo", "foo", "foo", "bar", "bar", "bar"], [1, 2, 3, 1, 2, 3]]
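The product described above can be sketched with the standard library alone. This is an illustration of the idea, not tada's implementation: productLabels is a hypothetical helper, it handles only string labels for brevity, and it assumes the values within each input slice are already unique.

```go
package main

import "fmt"

// productLabels expands each level so that, taken row by row, the levels
// enumerate every combination of label values. Earlier levels vary slowest.
func productLabels(levels [][]string) [][]string {
	total := 1
	for _, level := range levels {
		total *= len(level)
	}
	out := make([][]string, len(levels))
	if total == 0 {
		// At least one level is empty, so there are no combinations.
		return out
	}
	repeat := total // consecutive rows that share one value at the current level
	for i, level := range levels {
		repeat /= len(level)
		out[i] = make([]string, 0, total)
		// Cycle through the level until the column is full.
		for len(out[i]) < total {
			for _, v := range level {
				for r := 0; r < repeat; r++ {
					out[i] = append(out[i], v)
				}
			}
		}
	}
	return out
}

func main() {
	labels := productLabels([][]string{{"foo", "bar"}, {"1", "2", "3"}})
	fmt.Println(labels[0]) // [foo foo foo bar bar bar]
	fmt.Println(labels[1]) // [1 2 3 1 2 3]
}
```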
func PrettyDiff ¶ added in v0.8.6
func PrettyDiff(got, want interface{}) (bool, *tablediff.Differences, error)
PrettyDiff reads two structs into DataFrames, prints each as a stringified csv table, and returns whether they are equal. If not, returns the differences between the two.
func PrintOptionMaxCellWidth ¶ added in v0.7.6
func PrintOptionMaxCellWidth(n int)
PrintOptionMaxCellWidth changes the max rune width of any cell displayed when printing a Series or DataFrame to n (default: 30).
Example ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]string{"corgilius", "barrius", "foo"},
	}).SetColNames([]string{"waldonius"})
	tada.PrintOptionMaxCellWidth(5)
	fmt.Println(df)
	tada.PrintOptionMaxCellWidth(30)
}

Output:
+---++-------+
| - || wa... |
|---||-------|
| 0 || co... |
| 1 || ba... |
| 2 || foo   |
+---++-------+
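The truncation visible in the output above (a five-rune limit turns "waldonius" into "wa...") can be sketched as follows. truncateCell is a hypothetical helper written for illustration, not a tada function, and it assumes the width is at least 3 so that the ellipsis itself fits.

```go
package main

import "fmt"

// truncateCell shortens s to at most width runes. When s is too long, the
// tail is replaced with "..." so the result is exactly width runes wide.
// Assumes width >= 3.
func truncateCell(s string, width int) string {
	runes := []rune(s) // count runes, not bytes, so multi-byte text is safe
	if len(runes) <= width {
		return s
	}
	return string(runes[:width-3]) + "..."
}

func main() {
	for _, s := range []string{"waldonius", "corgilius", "barrius", "foo"} {
		fmt.Println(truncateCell(s, 5))
	}
}
```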
func PrintOptionMaxColumns ¶ added in v0.1.0
func PrintOptionMaxColumns(n int)
PrintOptionMaxColumns changes the max number of columns displayed when printing a Series or DataFrame to n (default: 20).
Example ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2},
		[]float64{3, 4},
		[]float64{5, 6},
		[]float64{3, 4},
		[]float64{5, 6},
	}).SetColNames([]string{"A", "B", "C", "D", "E"})
	tada.PrintOptionMaxColumns(2)
	fmt.Println(df)
	tada.PrintOptionMaxColumns(20)
}

Output:
+---++---+-----+---+
| - || A | ... | E |
|---||---|-----|---|
| 0 || 1 | ... | 5 |
| 1 || 2 |     | 6 |
+---++---+-----+---+
func PrintOptionMaxRows ¶ added in v0.1.0
func PrintOptionMaxRows(n int)
PrintOptionMaxRows changes the max number of rows displayed when printing a Series or DataFrame to n (default: 50).
Example ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2, 3, 4, 5, 6, 7, 8}}).SetColNames([]string{"A"})
	tada.PrintOptionMaxRows(6)
	fmt.Println(df)
	tada.PrintOptionMaxRows(50)
}

Output:
+-----++-----+
| -   || A   |
|-----||-----|
| 0   || 1   |
| 1   || 2   |
| 2   || 3   |
| ... || ... |
| 5   || 6   |
| 6   || 7   |
| 7   || 8   |
+-----++-----+
func PrintOptionMergeRepeats ¶ added in v0.1.0
func PrintOptionMergeRepeats(set bool)
PrintOptionMergeRepeats (if true) instructs the String() function to merge repeated non-header values when printing a Series or DataFrame (default: true).
func PrintOptionWrapLines ¶ added in v0.1.0
func PrintOptionWrapLines(set bool)
PrintOptionWrapLines (if true) instructs the String() function to wrap overly-wide rows onto new lines instead of truncating them when printing a Series or DataFrame (default: truncate).
func ReadOptionDelimiter ¶ added in v0.1.0
func ReadOptionDelimiter(sep rune) func(*readConfig)
ReadOptionDelimiter configures ReadCSV to use sep as the field delimiter (default: ",").
func ReadOptionHeaders ¶ added in v0.1.0
func ReadOptionHeaders(n int) func(*readConfig)
ReadOptionHeaders configures a read function to expect n rows to be column headers (default: 1).
func ReadOptionLabels ¶ added in v0.1.0
func ReadOptionLabels(n int) func(*readConfig)
ReadOptionLabels configures a read function to expect the first n columns to be label levels (default: 0).
func ReadOptionSwitchDims ¶ added in v0.1.0
func ReadOptionSwitchDims() func(*readConfig)
ReadOptionSwitchDims configures a read function to expect columns to be the major dimension of csv data (default: expects rows to be the major dimension). For example, when reading this data:

[["foo", "bar"], ["baz", "qux"]]

default                    ReadOptionSwitchDims()
(major dimension: rows)    (major dimension: columns)
foo bar                    foo baz
baz qux                    bar qux
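Switching the major dimension amounts to transposing the records. The sketch below shows that transformation on the data from the example above. switchDims is a hypothetical helper, not tada's implementation, and it assumes every inner slice has the same length.

```go
package main

import "fmt"

// switchDims transposes records so that what was read as the i-th row
// becomes the i-th column. Assumes records is rectangular.
func switchDims(records [][]string) [][]string {
	if len(records) == 0 {
		return nil
	}
	out := make([][]string, len(records[0]))
	for j := range out {
		out[j] = make([]string, len(records))
		for i := range records {
			out[j][i] = records[i][j]
		}
	}
	return out
}

func main() {
	records := [][]string{{"foo", "bar"}, {"baz", "qux"}}
	for _, row := range switchDims(records) {
		fmt.Println(row)
	}
}
```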
func SetOptionAddTimeFormat ¶
func SetOptionAddTimeFormat(format string)
SetOptionAddTimeFormat adds format to the list of time formats that can be parsed when converting values from string to time.Time.
func SetOptionDefaultSeparator ¶ added in v0.1.0
func SetOptionDefaultSeparator(sep string)
SetOptionDefaultSeparator changes the separator used in group names and multi-level column names to sep (default: "|").
func SetOptionNaNStatus ¶ added in v0.6.0
func SetOptionNaNStatus(set bool)
SetOptionNaNStatus sets whether math.NaN() is considered a null value or not (default: true).
func SetOptionNullStrings ¶ added in v0.6.0
func SetOptionNullStrings(list []string)
SetOptionNullStrings replaces the default list of strings that tada considers null with list.
func WriteMockCSV ¶
func WriteMockCSV(w io.Writer, n int, r io.Reader, options ...ReadOption) error
WriteMockCSV reads r (configured by options) and writes n mock rows to w, with column names and types inferred from the data in r. Regardless of the major dimension of r, the major dimension of the output is rows. Available options: ReadOptionHeaders, ReadOptionLabels, ReadOptionSwitchDims.
Default if no options are supplied: 1 header row, no labels, rows as major dimension
func WriteOptionDelimiter ¶ added in v0.4.0
func WriteOptionDelimiter(sep rune) func(*writeConfig)
WriteOptionDelimiter configures a write function to use sep as the field delimiter (default: ",").
func WriteOptionExcludeLabels ¶ added in v0.4.0
func WriteOptionExcludeLabels() func(*writeConfig)
WriteOptionExcludeLabels excludes the label levels from the output.
Types ¶
type ApplyFn ¶
type ApplyFn func(slice interface{}, isNull []bool) (equalLengthSlice interface{})
An ApplyFn is an anonymous function supplied to an Apply function to convert one slice to another. The function input will be a slice, and it must return a slice of equal length (though the type may be different). isNull contains the null status of every row in the input slice. The null status of a row may be changed by setting that row's isNull element within the function body.
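To make the contract above concrete, here is a sketch of a function with the ApplyFn shape. It runs without tada: applyFn is a local type that copies the documented signature, and halveNonNegative is a hypothetical example function. It returns a slice of equal length and marks a row as null by setting that row's isNull element.

```go
package main

import "fmt"

// applyFn copies the documented ApplyFn signature so this sketch compiles
// without importing tada.
type applyFn func(slice interface{}, isNull []bool) interface{}

// halveNonNegative halves every value in a []float64 and marks negative
// inputs as null by setting the matching isNull element in place.
// Inputs of any other type are returned unchanged.
func halveNonNegative(slice interface{}, isNull []bool) interface{} {
	vals, ok := slice.([]float64)
	if !ok {
		return slice
	}
	out := make([]float64, len(vals)) // same length as the input, as required
	for i, v := range vals {
		if v < 0 {
			isNull[i] = true
			continue
		}
		out[i] = v / 2
	}
	return out
}

func main() {
	var fn applyFn = halveNonNegative
	isNull := make([]bool, 3)
	fmt.Println(fn([]float64{2, -1, 8}, isNull), isNull)
}
```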
type Binner ¶ added in v0.4.9
Binner supplies logic for the Bin() function. If `AndLess` is true, a bin is added that ranges between negative infinity and the first bin value. If `AndMore` is true, a bin is added that ranges between the last bin value and positive infinity. If `Labels` is not nil, then category names correspond to labels, and the number of labels must be one less than the number of bin values. Otherwise, category names are auto-generated from the range of the bin intervals.
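The effect of AndLess and AndMore on the set of bins can be sketched as below. This is not tada's code: binIndex is a hypothetical helper, and the choice to exclude each bin's lower edge and include its upper edge is an assumption made only for the sketch (the text above does not say which edges are inclusive).

```go
package main

import (
	"fmt"
	"math"
)

// binIndex returns the position of the bin that contains v, or -1 if v falls
// outside every bin. andLess prepends a bin starting at negative infinity and
// andMore appends a bin ending at positive infinity.
// Assumption: a bin covers (lower, upper], i.e. lower edge excluded.
func binIndex(v float64, bins []float64, andLess, andMore bool) int {
	edges := make([]float64, 0, len(bins)+2)
	if andLess {
		edges = append(edges, math.Inf(-1))
	}
	edges = append(edges, bins...)
	if andMore {
		edges = append(edges, math.Inf(1))
	}
	// n edges define n-1 bins.
	for i := 0; i+1 < len(edges); i++ {
		if v > edges[i] && v <= edges[i+1] {
			return i
		}
	}
	return -1
}

func main() {
	bins := []float64{0, 10, 20}
	labels := []string{"low", "high"} // one fewer label than bin values
	for _, v := range []float64{5, 15, 25} {
		if i := binIndex(v, bins, false, false); i >= 0 {
			fmt.Println(v, labels[i])
		} else {
			fmt.Println(v, "no bin")
		}
	}
}
```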
type DataFrame ¶
type DataFrame struct {
// contains filtered or unexported fields
}
A DataFrame is one or more columns of data with one or more levels of aligned labels. A DataFrame is analogous to a spreadsheet.
Example ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2},
		[]string{"baz", "qux"}},
	).SetName("foo")
	fmt.Println(df)
}

Output:
+---++---+-----+
| - || 0 | 1   |
|---||---|-----|
| 0 || 1 | baz |
| 1 || 2 | qux |
+---++---+-----+
name: foo
func ConcatSeries ¶
func ConcatSeries(series ...*Series) (*DataFrame, error)
ConcatSeries merges multiple Series from left to right, one after the other, via left joins on shared keys. For advanced cases, use df.Lookup() (with join options) + df.WithCol().
func NewDataFrame ¶
func NewDataFrame(slices []interface{}, labels ...interface{}) *DataFrame
NewDataFrame creates a new DataFrame from slices (akin to column values) and optional labels. Each element of slices, and each label, must be one of the supported slice types.
If no labels are supplied, a default label level is inserted ([]int incrementing from 0). Columns are named sequentially (e.g., 0, 1, etc) by default. Default column names are displayed on printing. Label levels are named *n (e.g., *0, *1, etc) by default. Default label names are hidden on printing.
Supported slice types: all variants of []float, []int, & []uint, []string, []bool, []time.Time, []interface{}, and 2-dimensional variants of each (e.g., [][]string, [][]float64).
func ReadCSV ¶
func ReadCSV(r io.Reader, options ...ReadOption) (*DataFrame, error)
ReadCSV reads csv records in r into a DataFrame (configured by options). Rows must be the major dimension of r. For advanced cases, use the standard csv library NewReader().ReadAll() + tada.ReadCSVFromRecords(). Available options: ReadOptionHeaders, ReadOptionLabels, ReadOptionDelimiter.
Default if no options are supplied: 1 header row; no labels; field delimiter is ","
If no labels are supplied, a default label level is inserted ([]int incrementing from 0). If no headers are supplied, a default level of sequential column names (e.g., 0, 1, etc) is used. Default column names are displayed on printing. Label levels are named *i (e.g., *0, *1, etc) by default when first created. Default label names are hidden on printing.
Example ¶
package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/tada"
)

func main() {
	data := "foo, bar\n baz, qux\n corge, fred"
	df, _ := tada.ReadCSV(strings.NewReader(data))
	fmt.Println(df)
}

Output:
+---++-------+------+
| - || foo   | bar  |
|---||-------|------|
| 0 || baz   | qux  |
| 1 || corge | fred |
+---++-------+------+
Example (Delimiter) ¶
package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/tada"
)

func main() {
	data := `foo|bar
baz|qux
corge|fred`
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionDelimiter('|'))
	fmt.Println(df)
}

Output:
+---++-------+------+
| - || foo   | bar  |
|---||-------|------|
| 0 || baz   | qux  |
| 1 || corge | fred |
+---++-------+------+
Example (MultipleHeaders) ¶
package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/tada"
)

func main() {
	data := "foo, bar\n baz, qux\n corge, fred"
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionHeaders(2))
	fmt.Println(df)
}

Output:
+---++-------+------+
|   || foo   | bar  |
| - || baz   | qux  |
|---||-------|------|
| 0 || corge | fred |
+---++-------+------+
Example (MultipleHeadersWithLabels) ¶
package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/tada"
)

func main() {
	data := ", foo, bar\n labels, baz, qux\n 1, corge, fred"
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionHeaders(2), tada.ReadOptionLabels(1))
	fmt.Println(df)
}

Output:
+--------++-------+------+
|        || foo   | bar  |
| labels || baz   | qux  |
|--------||-------|------|
| 1      || corge | fred |
+--------++-------+------+
Example (NoHeaders) ¶
package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/tada"
)

func main() {
	data := "foo, bar\n baz, qux\n corge, fred"
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionHeaders(0))
	fmt.Println(df)
}

Output:
+---++-------+------+
| - || 0     | 1    |
|---||-------|------|
| 0 || foo   | bar  |
| 1 || baz   | qux  |
| 2 || corge | fred |
+---++-------+------+
Example (WithLabels) ¶
package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/tada"
)

func main() {
	// each record sits on its own line of the raw string
	data := `foo, bar
baz, qux
corge, fred`
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionLabels(1))
	fmt.Println(df)
}
Output: +-------++------+ | foo || bar | |-------||------| | baz || qux | | corge || fred | +-------++------+
func ReadCSVFromRecords ¶ added in v0.4.0
func ReadCSVFromRecords(records [][]string, options ...ReadOption) (ret *DataFrame, err error)
ReadCSVFromRecords reads records into a DataFrame (configured by options). Often used with encoding/csv.NewReader().ReadAll(). All columns will be read as []string. Available options: ReadOptionHeaders, ReadOptionLabels, ReadOptionSwitchDims.
Default if no options are supplied: 1 header row; no labels; rows as major dimension
If no labels are supplied, a default label level is inserted ([]int incrementing from 0). If no headers are supplied, a default level of sequential column names (e.g., 0, 1, etc) is used. Default column names are displayed on printing. Label levels are named *i (e.g., *0, *1, etc) by default when first created. Default label names are hidden on printing.
Example ¶
package main import ( "fmt" "github.com/ptiger10/tada" ) func main() { data := [][]string{ {"foo", "bar"}, {"baz", "qux"}, {"corge", "fred"}, } df, _ := tada.ReadCSVFromRecords(data) fmt.Println(df) }
Output: +---++-------+------+ | - || foo | bar | |---||-------|------| | 0 || baz | qux | | 1 || corge | fred | +---++-------+------+
Example (ColsAsMajorDimension) ¶
package main import ( "fmt" "github.com/ptiger10/tada" ) func main() { data := [][]string{ {"foo", "bar"}, {"baz", "qux"}, {"corge", "fred"}, } df, _ := tada.ReadCSVFromRecords(data, tada.ReadOptionSwitchDims()) fmt.Println(df) }
Output: +---++-----+-----+-------+ | - || foo | baz | corge | |---||-----|-----|-------| | 0 || bar | qux | fred | +---++-----+-----+-------+
func ReadInterfaceRecords ¶ added in v0.7.0
func ReadInterfaceRecords(records [][]interface{}, options ...ReadOption) (ret *DataFrame, err error)
ReadInterfaceRecords reads records into a DataFrame (configured by options). All columns will be read as []interface{}. Available options: ReadOptionHeaders, ReadOptionLabels, ReadOptionSwitchDims.
Default if no options are supplied: 1 header row; no labels; rows as major dimension
If no labels are supplied, a default label level is inserted ([]int incrementing from 0). If no headers are supplied, a default level of sequential column names (e.g., 0, 1, etc) is used. Default column names are displayed on printing. Label levels are named *i (e.g., *0, *1, etc) by default when first created. Default label names are hidden on printing.
func ReadMatrix ¶
ReadMatrix reads data satisfying the gonum Matrix interface into a DataFrame. Panics if any slices in the matrix are shorter than the first slice.
func ReadStruct ¶
func ReadStruct(strct interface{}, options ...ReadOption) (*DataFrame, error)
ReadStruct reads the exported fields in strct into a DataFrame. strct must be a struct or pointer to a struct. If any exported field in strct is nil, returns an error.
If a "tada" tag is present with the value "isNull", this field must be [][]bool with one equal-length slice for each exported field. These values will set the null status for each of the resulting value containers in the DataFrame, from left-to-right. If a "tada" tag has any other value, the resulting value container will have the same name as the tag value. Otherwise, the value container will have the same name as the exported field.
func ReadStructSlice ¶ added in v0.5.1
ReadStructSlice reads a slice of structs into a DataFrame with field names converted to column names, field values converted to column values, and default labels. The structs must all be of the same type.
A default label level named *0 is inserted ([]int incrementing from 0). Default label names are hidden on printing.
func (*DataFrame) Append ¶
Append adds the other labels and values as new rows to the DataFrame. If the types of any container do not match, all the values in that container are coerced to string. Returns a new DataFrame.
func (*DataFrame) Apply ¶
Apply applies an anonymous function to every row in a container based on lambdas, which is a map of container names (either column or label names) to anonymous functions. A row's null status can be set in-place within the anonymous function by accessing the []bool argument. Returns a new DataFrame.
func (*DataFrame) At ¶
At returns the Element at the row and column index positions. If row or column is out of range, returns nil.
func (*DataFrame) CSVRecords ¶ added in v0.6.0
func (df *DataFrame) CSVRecords(options ...WriteOption) [][]string
CSVRecords writes a DataFrame to a [][]string with rows as the major dimension. Null values are replaced with "(null)".
func (*DataFrame) Cast ¶
Cast coerces the underlying container values (column or label level) to []float64, []string, []time.Time (aka timezone-aware DateTime), []civil.Date, or []civil.Time and caches the []byte values of the container (if inexpensive). Use Cast to improve performance when calling multiple operations on values.
func (*DataFrame) Col ¶
Col finds the first column with matching name and returns as a Series. Similar to SelectLabels(), but selects column values instead of label values.
func (*DataFrame) Copy ¶
Copy returns a new DataFrame with values identical to the original but no shared objects (i.e., all internals are newly allocated).
func (*DataFrame) DeduplicateNames ¶
DeduplicateNames deduplicates the names of containers (label levels and columns) from left-to-right by appending _n to duplicate names, where n is equal to the number of times that name has already appeared. Returns a new DataFrame.
func (*DataFrame) DropLabels ¶
DropLabels drops the first label level matching name. Returns a new DataFrame.
func (*DataFrame) DropNull ¶
DropNull removes rows with a null value in any column. If subset is supplied, removes any rows with null values in any of the specified columns. Returns a new DataFrame.
func (*DataFrame) DropRow ¶
DropRow removes the row at the specified index. Returns a new DataFrame.
func (*DataFrame) EqualsCSV ¶
func (df *DataFrame) EqualsCSV(includeLabels bool, want io.Reader, wantOptions ...ReadOption) (bool, *tablediff.Differences, error)
EqualsCSV reads want (configured by wantOptions) into a dataframe, converts both df and want into [][]string records, and evaluates whether the stringified values match. If they do not match, returns a tablediff.Differences object that can be printed to isolate their differences.
If includeLabels is true, then df's labels are included as columns.
func (*DataFrame) FillNull ¶
func (df *DataFrame) FillNull(how map[string]NullFiller) *DataFrame
FillNull fills null values and makes them non-null based on how, a map of container names (either column or label names) and tada.NullFiller structs. For each container name in the map, the first field selected (i.e., not left blank) in its NullFiller struct is the strategy used to replace null values in that container. FillForward fills null values with the most recent non-null value in the container. FillBackward fills null values with the next non-null value in the container. FillZero fills null values with the zero value for that container type. FillFloat converts the container values to float64 and fills null values with the value supplied. If no field is selected, the container values are converted to float64 and all null values are filled with 0. Returns a new DataFrame.
func (*DataFrame) Filter ¶
Filter returns a new DataFrame with only rows that satisfy all of the filters, which is a map of container names (either column name or label name) and anonymous functions.
Rows with null values never satisfy a filter. If no filter is provided, Filter does nothing. For equality filtering on one or more containers, consider FilterByValue. Returns a new DataFrame.
Example ¶
package main import ( "fmt" "time" "github.com/ptiger10/tada" ) func main() { dt1 := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC) dt2 := dt1.AddDate(0, 0, 1) df := tada.NewDataFrame([]interface{}{ []float64{1, 2, 3}, []time.Time{dt1, dt2, dt1}}, ). SetColNames([]string{"foo", "bar"}) fmt.Println(df) gt1 := func(val interface{}) bool { return val.(float64) > 1 } beforeDate := func(val interface{}) bool { return val.(time.Time).Before(dt2) } ret := df.Filter(map[string]tada.FilterFn{ "foo": gt1, "bar": beforeDate, }) fmt.Println(ret) }
Output: +---++-----+----------------------+ | - || foo | bar | |---||-----|----------------------| | 0 || 1 | 2020-01-01T00:00:00Z | | 1 || 2 | 2020-01-02T00:00:00Z | | 2 || 3 | 2020-01-01T00:00:00Z | +---++-----+----------------------+ +---++-----+----------------------+ | - || foo | bar | |---||-----|----------------------| | 2 || 3 | 2020-01-01T00:00:00Z | +---++-----+----------------------+
func (*DataFrame) FilterByValue ¶ added in v0.3.5
FilterByValue returns the rows in the DataFrame satisfying all filters, which is a map of container names (either column or label names) to interface{} values. A filter is satisfied for a given row if the stringified value in that container at that row matches the stringified interface{} value. Returns a new DataFrame.
func (*DataFrame) FilterCols ¶
FilterCols returns the columns with names that satisfy lambda at the supplied column level. level should be 0 unless df has multiple column levels.
func (*DataFrame) FilterIndex ¶ added in v0.7.6
FilterIndex returns the index positions of the rows in container that satisfy filterFn. A filter that matches no rows returns an empty []int. An out-of-range container returns nil.
func (*DataFrame) GetLabels ¶ added in v0.3.5
func (df *DataFrame) GetLabels() []interface{}
GetLabels returns label levels as interface{} slices within an []interface{} that may be supplied as the optional labels argument to NewSeries() or NewDataFrame(). NB: If supplying this output to either of these constructors, be sure to use the spread operator (...), or else the labels will not be read as separate levels.
func (*DataFrame) GroupBy ¶
func (df *DataFrame) GroupBy(names ...string) *GroupedDataFrame
GroupBy groups the DataFrame rows that share the same stringified value in the container(s) (columns or labels) specified by names. If an error occurs, writes the error to the GroupedDataFrame.
Example ¶
package main import ( "fmt" "github.com/ptiger10/tada" ) func main() { df := tada.NewDataFrame([]interface{}{ []float64{1, 2, 3, 4}, }, []string{"foo", "bar", "foo", "bar"}). SetColNames([]string{"baz"}) g := df.GroupBy() fmt.Println(g) }
Output: +-----++-----+ | - || baz | |-----||-----| | foo || 1 | | || 3 | | bar || 2 | | || 4 | +-----++-----+
func (*DataFrame) HasCols ¶
HasCols returns an error if the DataFrame does not contain all of the colNames supplied.
func (*DataFrame) HasLabels ¶ added in v0.5.0
HasLabels returns an error if the DataFrame does not contain all of the labelNames supplied.
func (*DataFrame) HasType ¶ added in v0.3.8
HasType returns the index positions of all label and column containers containing a slice of values where reflect.Type.String() == sliceType. Container index positions may then be supplied to df.SubsetLabels() or df.SubsetCols().
For example, to search for datetime labels: labels, _ := df.HasType("[]time.Time")
To search for float64 columns: _, cols := df.HasType("[]float64")
func (*DataFrame) Head ¶
Head returns the first n rows of the DataFrame. If n is greater than the length of the DataFrame, returns the entire DataFrame. In either case, returns a new DataFrame.
func (*DataFrame) InPlace ¶
func (df *DataFrame) InPlace() *DataFrameMutator
InPlace returns a DataFrameMutator, which contains most of the same methods as DataFrame but never returns a new DataFrame. If you want to save memory and improve performance and do not need to preserve the original DataFrame, consider using InPlace().
func (*DataFrame) IndexOfContainer ¶
IndexOfContainer returns the index position of the first container with a name matching name (case-sensitive). If name does not match any container, -1 is returned. If columns is true, only column names will be searched. If columns is false, only label level names will be searched.
func (*DataFrame) InterfaceRecords ¶ added in v0.7.6
func (df *DataFrame) InterfaceRecords(options ...WriteOption) [][]interface{}
InterfaceRecords writes a DataFrame to a [][]interface{} with columns as the major dimension. Null values are replaced with "(null)".
func (*DataFrame) IsNull ¶ added in v0.7.6
IsNull returns all the rows with any null values. If subset is supplied, returns only the rows with a null value in any of the specified columns. Returns a new DataFrame.
func (*DataFrame) Iterator ¶ added in v0.2.0
func (df *DataFrame) Iterator() *DataFrameIterator
Iterator returns an iterator which may be used to access the values in each row as map[string]Element.
func (*DataFrame) LabelsAsSeries ¶ added in v0.6.2
LabelsAsSeries finds the first label level with matching name and returns the values as a Series. Similar to Col(), but selects label values instead of column values. The labels in the Series are shared with the labels in the DataFrame. If label level name is default (prefixed with *), the prefix is removed.
func (*DataFrame) ListColNames ¶ added in v0.2.0
ListColNames returns the names of all the columns in the DataFrame, in order. If df has multiple column levels, each column name is a single string with level values separated by "|" (may be changed with SetOptionDefaultSeparator). To return the names at a specific level, use ListColNamesAtLevel().
func (*DataFrame) ListColNamesAtLevel ¶ added in v0.2.0
ListColNamesAtLevel returns the names of all the columns in the DataFrame, in order, at the supplied column level. If level is out of range, returns a nil slice.
func (*DataFrame) ListLabelNames ¶
ListLabelNames returns the names of all the label levels in the DataFrame, in order.
func (*DataFrame) Lookup ¶
func (df *DataFrame) Lookup(other *DataFrame, options ...JoinOption) (*DataFrame, error)
Lookup performs the lookup portion of a join of other onto df. Performs a left join unless a different join type is specified as an option. If left and right keys are supplied as options, those are used as lookup keys. Otherwise, the join will automatically use shared label names or return an error if none exist.
Lookup identifies the row alignment between df and other and returns the aligned values. Rows are aligned when: 1) one or more containers (either column or label level) in other share the same name as one or more containers in df, and 2) the stringified values in the other containers match the values in the df containers. For the following dataframes:
    df         other
    FOO BAR    FOO QUX
    bar 0      baz corge
    baz 1      qux waldo
Row 1 in df is "aligned" with row 0 in other, because those are the rows in which both share the same value ("baz") in a container with the same name ("FOO"). The result of a lookup will be:
    FOO BAR
    bar (null)
    baz corge
Returns a new DataFrame.
func (*DataFrame) Max ¶
Max coerces the values in each column to float64 and returns the maximum non-null value in each column.
func (*DataFrame) Mean ¶
Mean coerces the values in each column to float64 and calculates the mean of each column.
func (*DataFrame) Median ¶
Median coerces the values in each column to float64 and calculates the median of each column.
func (*DataFrame) Merge ¶
func (df *DataFrame) Merge(other *DataFrame, options ...JoinOption) (*DataFrame, error)
Merge joins other onto df. Performs a left join unless a different join type is specified as an option. If left and right keys are supplied as options, those are used as lookup keys. Otherwise, the join will automatically use shared label names or return an error if none exist.
Merge identifies the row alignment between df and other and appends aligned values as new columns on df. Rows are aligned when 1) one or more containers (either column or label level) in other share the same name as one or more containers in df, and 2) the stringified values in the other containers match the values in the df containers. For the following dataframes:
    df         other
    FOO BAR    FOO QUX
    bar 0      baz corge
    baz 1      qux waldo
Row 1 in df is "aligned" with row 0 in other, because those are the rows in which both share the same value ("baz") in a container with the same name ("FOO"). After merging, the result will be:
    df
    FOO BAR QUX
    bar 0   null
    baz 1   corge
Finally, all container names (columns and label names) are deduplicated after the merge so that they are unique. Returns a new DataFrame.
func (*DataFrame) Min ¶
Min coerces the values in each column to float64 and returns the minimum non-null value in each column.
func (*DataFrame) NameOfCol ¶ added in v0.2.0
NameOfCol returns the name of the column at index position n. If n is out of range, returns "-out of range-".
func (*DataFrame) NameOfLabel ¶
NameOfLabel returns the name of the label level at index position n. If n is out of range, returns "-out of range-".
func (*DataFrame) NumColumns ¶ added in v0.3.6
NumColumns returns the number of columns in the DataFrame.
func (*DataFrame) NumLevels ¶ added in v0.3.6
NumLevels returns the number of label levels in the DataFrame.
func (*DataFrame) PivotTable ¶
PivotTable creates a spreadsheet-style pivot table as a DataFrame by grouping rows using the unique values in labels, reducing the values in values using an aggFunc aggregation function, then promoting the unique values in columns to be new columns. labels, columns, and values should all refer to existing container names (either columns or labels). Supported aggFuncs: sum, mean, median, stdDev, count, min, max.
func (*DataFrame) PromoteToColLevel ¶
PromoteToColLevel pivots an existing container (either column or label names) into a new column level. If promoting would use either the last column or index level, it returns an error. Each unique value in the stacked column is stacked above each existing column. Promotion can add new columns and remove label rows with duplicate values.
func (*DataFrame) Range ¶
Range returns the rows of the DataFrame starting at first and ending immediately prior to last (left-inclusive, right-exclusive). If either first or last is greater than the length of the DataFrame, an error is returned. Returns a new DataFrame.
func (*DataFrame) Reduce ¶ added in v0.7.6
Reduce uses lambda to reduce all columns to a Series named name with column names as labels and reduced values as row values. The type of the new Series is a slice with the same type as the first value outputted by the anonymous function.
func (*DataFrame) Relabel ¶
Relabel resets the DataFrame labels to default labels (e.g., []int from 0 to df.Len()-1, with *0 as name). Returns a new DataFrame.
func (*DataFrame) ReorderCols ¶ added in v0.6.8
ReorderCols reorders the columns to be in the same order as specified by colNames. If a column is not specified, it is excluded from the resulting DataFrame. Returns a new DataFrame.
func (*DataFrame) ReorderLabels ¶ added in v0.6.8
ReorderLabels reorders the label levels to be in the same order as specified by levelNames. If a level is not specified, it is excluded from the resulting DataFrame. Returns a new DataFrame.
func (*DataFrame) Resample ¶ added in v0.2.6
Resample coerces values to time.Time and truncates them by the logic supplied in how, which is a map of container names (either column or label names) to tada.Resampler structs. For each container name in the map, the first By field selected (i.e., not left blank) in its Resampler struct provides the resampling logic for that container. If slice type is civil.Date or civil.Time before resampling, it will be returned as civil.Date or civil.Time after resampling.
Returns a new DataFrame.
func (*DataFrame) ResetLabels ¶
ResetLabels appends the label level(s) at the supplied index levels as columns and drops the level. If no index levels are supplied, all label levels are appended as columns and dropped as levels, and replaced by a default label column. Returns a new DataFrame.
func (*DataFrame) Series ¶ added in v0.5.3
Series converts a single-columned DataFrame to a Series that shares the same underlying values and labels.
func (*DataFrame) SetAsLabels ¶ added in v0.3.6
SetAsLabels appends the column(s) supplied as colNames as label levels and drops the column(s). The number of colNames supplied must be less than the number of columns in the DataFrame. Returns a new DataFrame.
func (*DataFrame) SetColNames ¶
SetColNames sets the names of all the columns in the DataFrame and returns the entire DataFrame. If an error occurs, it is written to the DataFrame.
Example ¶
package main import ( "fmt" "github.com/ptiger10/tada" ) func main() { df := tada.NewDataFrame([]interface{}{ []float64{1, 2}, []string{"baz", "qux"}}, ). SetColNames([]string{"foo", "bar"}) fmt.Println(df) }
Output: +---++-----+-----+ | - || foo | bar | |---||-----|-----| | 0 || 1 | baz | | 1 || 2 | qux | +---++-----+-----+
func (*DataFrame) SetLabelNames ¶
SetLabelNames sets the names of all the label levels in the DataFrame and returns the entire DataFrame. If an error occurs, it is written to the DataFrame.
Example ¶
package main import ( "fmt" "github.com/ptiger10/tada" ) func main() { df := tada.NewDataFrame([]interface{}{[]float64{1, 2}}). SetLabelNames([]string{"baz"}) fmt.Println(df) }
Output: +-----++---+ | baz || 0 | |-----||---| | 0 || 1 | | 1 || 2 | +-----++---+
Example (Multiple) ¶
package main import ( "fmt" "github.com/ptiger10/tada" ) func main() { df := tada.NewDataFrame( []interface{}{[]float64{1, 2}}, []int{0, 1}, []string{"foo", "bar"}, ). SetColNames([]string{"A"}). SetLabelNames([]string{"baz", "qux"}) fmt.Println(df) }
Output: +-----+-----++---+ | baz | qux || A | |-----|-----||---| | 0 | foo || 1 | | 1 | bar || 2 | +-----+-----++---+
func (*DataFrame) SetNulls ¶ added in v0.3.6
SetNulls overwrites the underlying boolean slice that records whether each value is null or not for the container at position n (either labels or columns).
func (*DataFrame) SetRows ¶ added in v0.7.6
SetRows applies lambda within container (either label or column name) to set the values at the specified row positions. The new values must be the same type as the existing values. Returns a new DataFrame.
func (*DataFrame) Shuffle ¶ added in v0.6.11
Shuffle randomizes the row order of the DataFrame. Returns a new DataFrame.
func (*DataFrame) Sort ¶
Sort sorts the values by zero or more Sorter specifications. If no Sorter is supplied, does not sort. If no DType is supplied for a Sorter, sorts as float64. DType is only used for the process of sorting. Once it has been sorted, data retains its original type. Returns a new DataFrame.
Example ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{2, 2, 1},
		[]string{"b", "c", "a"}},
	).
		SetColNames([]string{"foo", "bar"})
	fmt.Println(df)

	// first sort by foo in ascending order, then sort by bar in descending order
	ret := df.Sort(
		// Float64 is the default sorting DType, and ascending is the default ordering
		tada.Sorter{Name: "foo"},
		tada.Sorter{Name: "bar", DType: tada.String, Descending: true},
	)
	fmt.Println(ret)
}

Output:

+---++-----+-----+
| - || foo | bar |
|---||-----|-----|
| 0 ||   2 |   b |
| 1 ||     |   c |
| 2 ||   1 |   a |
+---++-----+-----+

+---++-----+-----+
| - || foo | bar |
|---||-----|-----|
| 2 ||   1 |   a |
| 1 ||   2 |   c |
| 0 ||     |   b |
+---++-----+-----+
func (*DataFrame) StdDev ¶ added in v0.5.3
StdDev coerces the values in each column to float64 and calculates the standard deviation of each column.
func (*DataFrame) String ¶
String prints the DataFrame in table form, with the number of rows constrained by optionMaxRows, and the number of columns constrained by optionMaxColumns, which may be configured with PrintOptionMaxRows(n) and PrintOptionMaxColumns(n), respectively. By default, repeated values are merged together, but this behavior may be disabled with PrintOptionMergeRepeats(false). By default, overly-wide non-header cells are truncated, but this behavior may be changed to wrapping with PrintOptionWrapLines(true).
func (*DataFrame) Struct ¶ added in v0.5.3
func (df *DataFrame) Struct(structPointer interface{}, options ...WriteOption) error
Struct writes the values of the df containers into structPointer. Returns an error if df does not contain, from left-to-right, the same container names and types as the exported fields that appear, from top-to-bottom, in structPointer. Exported struct fields must be types that are supported by NewDataFrame(). If a "tada" tag is present with the value "isNull", this field must be [][]bool. The null status of each value container in the DataFrame, from left-to-right, will be written into this field in equal-lengthed slices. If df contains additional containers beyond those in structPointer, those are ignored.
Example ¶
package main import ( "fmt" "github.com/ptiger10/tada" ) func main() { df := tada.NewDataFrame( []interface{}{ []float64{1, 2}, }, []string{"baz", "qux"}, ).SetLabelNames([]string{"foo"}). SetColNames([]string{"bar"}) type output struct { Foo []string `tada:"foo"` Bar []float64 `tada:"bar"` } var out output df.Struct(&out) fmt.Printf("%#v", out) }
Output: tada_test.output{Foo:[]string{"baz", "qux"}, Bar:[]float64{1, 2}}
Example (WithNulls) ¶
package main import ( "fmt" "github.com/ptiger10/tada" ) func main() { df := tada.NewDataFrame( []interface{}{ []float64{1, 2}, }, []string{"", "qux"}, ).SetLabelNames([]string{"foo"}). SetColNames([]string{"bar"}) type output struct { Foo []string `tada:"foo"` Bar []float64 `tada:"bar"` Nulls [][]bool `tada:"isNull"` } var out output df.Struct(&out) fmt.Printf("%#v", out) }
Output: tada_test.output{Foo:[]string{"", "qux"}, Bar:[]float64{1, 2}, Nulls:[][]bool{[]bool{true, false}, []bool{false, false}}}
func (*DataFrame) Subset ¶
Subset returns only the rows specified at the index positions, in the order specified. Returns a new DataFrame.
func (*DataFrame) SubsetCols ¶
SubsetCols returns only the columns specified at the index positions, in the order specified. Returns a new DataFrame.
func (*DataFrame) SubsetLabels ¶
SubsetLabels returns only the labels specified at the index positions, in the order specified. Returns a new DataFrame.
func (*DataFrame) SumCols ¶ added in v0.5.1
SumCols finds each column matching a supplied colName, coerces its values to float64, and adds them row-wise. The resulting Series is named name. If any column has a null value for a given row, that row is considered null.
func (*DataFrame) SwapLabels ¶
SwapLabels swaps the label levels with names i and j. Returns a new DataFrame.
func (*DataFrame) Tail ¶
Tail returns the last n rows of the DataFrame. If n is greater than the length of the DataFrame, returns the entire DataFrame. In either case, returns a new DataFrame.
func (*DataFrame) Transpose ¶
Transpose transposes rows into columns. Row values become column values, column names become labels, labels become column names (and multi-level labels become multi-level columns) and label level names swap with column level names. For example a DataFrame with 2 rows and 1 column has 2 columns and 1 row after transposition. Because rows can contain heterogenous types, every column is coerced to []interface{}.
func (*DataFrame) Where ¶ added in v0.3.0
func (df *DataFrame) Where(filters map[string]FilterFn, ifTrue, ifFalse interface{}) (*Series, error)
Where iterates over the rows in df and evaluates whether each one satisfies filters, which is a map of container names (either column or label names) to tada.FilterFn functions. If yes, returns ifTrue at that row position. If not, returns ifFalse at that row position. Values are coerced from their original type to the selected field type for filtering, but after filtering retain their original type.
Returns an unnamed Series with a copy of the labels from the original DataFrame and null status based on the supplied values. If an unsupported value type is supplied as either ifTrue or ifFalse, returns an error.
Example ¶
package main import ( "fmt" "github.com/ptiger10/tada" ) func main() { df := tada.NewDataFrame([]interface{}{ []int{1, 2}}, ). SetColNames([]string{"foo"}) fmt.Println(df) gt1 := func(val interface{}) bool { return val.(int) > 1 } ret, _ := df.Where(map[string]tada.FilterFn{"foo": gt1}, true, false) fmt.Println(ret) }
Output: +---++-----+ | - || foo | |---||-----| | 0 || 1 | | 1 || 2 | +---++-----+ +---++-------+ | - || | |---||-------| | 0 || false | | 1 || true | +---++-------+
func (*DataFrame) WithCol ¶
WithCol resolves as follows:
If a scalar string is supplied as input and a column exists that matches name: rename the column to match input. In this case, name must already exist.
If a slice is supplied as input and a column exists that matches name: replace the values at this column to match input. If a slice is supplied as input and a column does not exist that matches name: append a new column named name and values matching input. If input is a slice, it must be the same length as the underlying DataFrame.
In all cases, returns a new DataFrame.
Example (Append) ¶
package main import ( "fmt" "github.com/ptiger10/tada" ) func main() { df := tada.NewDataFrame([]interface{}{ []float64{1, 2}}, ). SetColNames([]string{"foo"}) fmt.Println(df) ret := df.WithCol("bar", []bool{false, true}) fmt.Println(ret) }
Output: +---++-----+ | - || foo | |---||-----| | 0 || 1 | | 1 || 2 | +---++-----+ +---++-----+-------+ | - || foo | bar | |---||-----|-------| | 0 || 1 | false | | 1 || 2 | true | +---++-----+-------+
Example (Overwrite) ¶
package main import ( "fmt" "github.com/ptiger10/tada" ) func main() { df := tada.NewDataFrame([]interface{}{ []float64{1, 2}}, ). SetColNames([]string{"foo"}) fmt.Println(df) ret := df.WithCol("foo", []string{"baz", "qux"}) fmt.Println(ret) }
Output: +---++-----+ | - || foo | |---||-----| | 0 || 1 | | 1 || 2 | +---++-----+ +---++-----+ | - || foo | |---||-----| | 0 || baz | | 1 || qux | +---++-----+
Example (Rename) ¶
package main import ( "fmt" "github.com/ptiger10/tada" ) func main() { df := tada.NewDataFrame([]interface{}{ []float64{1, 2}}, ). SetColNames([]string{"foo"}) fmt.Println(df) ret := df.WithCol("foo", "qux") fmt.Println(ret) }
Output: +---++-----+ | - || foo | |---||-----| | 0 || 1 | | 1 || 2 | +---++-----+ +---++-----+ | - || qux | |---||-----| | 0 || 1 | | 1 || 2 | +---++-----+
func (*DataFrame) WithLabels ¶
WithLabels resolves as follows:
If a scalar string is supplied as input and a label level exists that matches name: rename the level to match input. In this case, name must already exist.
If a slice is supplied as input and a label level exists that matches name: replace the values at this level to match input. If a slice is supplied as input and a label level does not exist that matches name: append a new level named name and values matching input. If input is a slice, it must be the same length as the underlying DataFrame.
In all cases, returns a new DataFrame.
type DataFrameIterator ¶ added in v0.2.0
type DataFrameIterator struct {
// contains filtered or unexported fields
}
A DataFrameIterator iterates over the rows in a DataFrame.
func (*DataFrameIterator) Next ¶ added in v0.2.0
func (iter *DataFrameIterator) Next() bool
Next advances to next row. Returns false at end of iteration.
func (*DataFrameIterator) Row ¶ added in v0.2.0
func (iter *DataFrameIterator) Row() map[string]Element
Row returns the current row in the DataFrame as a map. The map keys are the names of containers (including label levels). Each map value is an Element containing an interface value and a boolean denoting if the value is null. If multiple columns have the same header, only the Element of the left-most column is returned.
type DataFrameMutator ¶
type DataFrameMutator struct {
// contains filtered or unexported fields
}
A DataFrameMutator is used to change DataFrame values in place.
func (*DataFrameMutator) Append ¶
func (df *DataFrameMutator) Append(other *DataFrame) error
Append adds the other labels and values as new rows to the DataFrame. If the types of any container do not match, all the values in that container are coerced to string. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) Apply ¶
func (df *DataFrameMutator) Apply(lambdas map[string]ApplyFn) error
Apply applies an anonymous function to every row in a container based on lambdas, which is a map of container names (either column or label names) to anonymous functions. A row's null status can be changed in-place within the anonymous function. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) DeduplicateNames ¶
func (df *DataFrameMutator) DeduplicateNames()
DeduplicateNames deduplicates the names of containers (label levels and columns) from left-to-right by appending _n to duplicate names, where n is equal to the number of times that name has already appeared. Modifies the underlying DataFrame in place.
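The naming rule can be sketched on a plain slice of names. dedupeNames below is a hypothetical helper written for illustration, not a tada function, and it does not guard against a generated name (such as foo_1) colliding with a name that already exists.

```go
package main

import "fmt"

// dedupeNames scans names left to right. A repeated name gets the suffix _n,
// where n is the number of times that name has already appeared.
func dedupeNames(names []string) []string {
	seen := make(map[string]int, len(names))
	out := make([]string, len(names))
	for i, name := range names {
		if n := seen[name]; n > 0 {
			out[i] = fmt.Sprintf("%s_%d", name, n)
		} else {
			out[i] = name
		}
		seen[name]++
	}
	return out
}

func main() {
	fmt.Println(dedupeNames([]string{"foo", "foo", "bar", "foo"}))
	// prints [foo foo_1 bar foo_2]
}
```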
func (*DataFrameMutator) DropCol ¶
func (df *DataFrameMutator) DropCol(name string) error
DropCol drops the first column matching name. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) DropLabels ¶
func (df *DataFrameMutator) DropLabels(name string) error
DropLabels drops the first label level matching name. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) DropNull ¶
func (df *DataFrameMutator) DropNull(subset ...string) error
DropNull removes rows with a null value in any column. If subset is supplied, removes only rows with a null value in any of the specified columns. Modifies the underlying DataFrame in place.
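The effect of subset can be sketched with per-column null flags. dropNullRows is a hypothetical helper, not part of tada; it returns the positions of the rows that would survive.

```go
package main

import "fmt"

// dropNullRows returns the positions of rows that have no null in any of the
// considered columns. nulls maps a column name to its per-row null flags.
// With an empty subset, every column is considered.
func dropNullRows(nulls map[string][]bool, n int, subset ...string) []int {
	if len(subset) == 0 {
		for name := range nulls {
			subset = append(subset, name)
		}
	}
	var keep []int
	for row := 0; row < n; row++ {
		hasNull := false
		for _, name := range subset {
			if col, ok := nulls[name]; ok && col[row] {
				hasNull = true
				break
			}
		}
		if !hasNull {
			keep = append(keep, row)
		}
	}
	return keep
}

func main() {
	nulls := map[string][]bool{
		"foo": {false, true, false},
		"bar": {false, false, true},
	}
	fmt.Println(dropNullRows(nulls, 3))        // all columns considered: [0]
	fmt.Println(dropNullRows(nulls, 3, "foo")) // only foo considered: [0 2]
}
```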
func (*DataFrameMutator) DropRow ¶
func (df *DataFrameMutator) DropRow(index int) error
DropRow removes the row at the specified index. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) FillNull ¶
func (df *DataFrameMutator) FillNull(how map[string]NullFiller) error
FillNull fills null values and makes them non-null based on how, which is a map of container names (either column or label names) to NullFillers. For each container name supplied, the first field selected (i.e., not left blank) in the NullFiller is the strategy used to replace null values. FillForward fills null values with the most recent non-null value in the container. FillBackward fills null values with the next non-null value in the container. FillZero fills null values with the zero value for that container type. FillFloat converts the container values to float64 and fills null values with the value supplied. If no field is selected, the container values are converted to float64 and all null values are filled with 0. Modifies the underlying DataFrame in place.
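The forward and backward strategies can be sketched on a plain slice of values with aligned null flags. fillForward and fillBackward are hypothetical names, not tada functions. Leaving a null untouched when there is no earlier (or later) non-null value is an assumption of this sketch; the library's handling of that edge case is not described here.

```go
package main

import "fmt"

// fillForward replaces each null with the most recent non-null value.
// Leading nulls have nothing to copy from, so this sketch leaves them null.
func fillForward(vals []float64, isNull []bool) {
	have := false
	var last float64
	for i := range vals {
		if !isNull[i] {
			last, have = vals[i], true
			continue
		}
		if have {
			vals[i], isNull[i] = last, false
		}
	}
}

// fillBackward replaces each null with the next non-null value.
// Trailing nulls have nothing to copy from, so this sketch leaves them null.
func fillBackward(vals []float64, isNull []bool) {
	have := false
	var next float64
	for i := len(vals) - 1; i >= 0; i-- {
		if !isNull[i] {
			next, have = vals[i], true
			continue
		}
		if have {
			vals[i], isNull[i] = next, false
		}
	}
}

func main() {
	vals := []float64{0, 1, 0, 0, 4}
	isNull := []bool{true, false, true, true, false}
	fillForward(vals, isNull)
	fmt.Println(vals, isNull) // [0 1 1 1 4] [true false false false false]
}
```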
func (*DataFrameMutator) Filter ¶
func (df *DataFrameMutator) Filter(filters map[string]FilterFn) error
Filter retains only the rows that satisfy all of the filters, which is a map of container names (either column or label names) to anonymous functions.
Rows with null values never satisfy a filter. If no filter is provided, the function does nothing. For equality filtering on one or more containers, consider FilterByValue. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) FilterByValue ¶ added in v0.3.5
func (df *DataFrameMutator) FilterByValue(filters map[string]interface{}) error
FilterByValue retains only the rows in the DataFrame that satisfy all filters, which is a map of container names (either column or label names) to interface{} values. A filter is satisfied for a given row if the stringified value in that container at that row matches the stringified interface{} value. Modifies the underlying DataFrame in place.
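Comparing stringified values means a filter value does not have to share the column's type. The sketch below uses fmt.Sprint as the stringifier, which is an assumption: tada's own conversion rules may differ. matchesByValue and filterRows are hypothetical helpers.

```go
package main

import "fmt"

// matchesByValue reports whether a cell satisfies an equality filter by
// comparing string forms. fmt.Sprint is an assumed stand-in for the
// library's stringification.
func matchesByValue(cell, filter interface{}) bool {
	return fmt.Sprint(cell) == fmt.Sprint(filter)
}

// filterRows returns the positions of rows where every filtered column matches.
// A filter on a column that does not exist matches nothing.
func filterRows(cols map[string][]interface{}, n int, filters map[string]interface{}) []int {
	var keep []int
	for row := 0; row < n; row++ {
		ok := true
		for name, want := range filters {
			col, exists := cols[name]
			if !exists || !matchesByValue(col[row], want) {
				ok = false
				break
			}
		}
		if ok {
			keep = append(keep, row)
		}
	}
	return keep
}

func main() {
	cols := map[string][]interface{}{
		"foo": {1, 2, 1},
		"bar": {"a", "b", "c"},
	}
	// The string "1" matches the int 1 because both stringify to "1".
	fmt.Println(filterRows(cols, 3, map[string]interface{}{"foo": "1"})) // [0 2]
}
```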
func (*DataFrameMutator) FilterCols ¶ added in v0.2.0
func (df *DataFrameMutator) FilterCols(lambda func(string) bool, level int) error
FilterCols retains only the columns with names that satisfy lambda at the supplied column level. level should be 0 unless df has multiple column levels. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) IsNull ¶ added in v0.7.6
func (df *DataFrameMutator) IsNull(subset ...string) error
IsNull retains only the rows with any null values. If subset is supplied, only the specified columns are checked for null values. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) Range ¶ added in v0.7.6
func (df *DataFrameMutator) Range(first, last int) error
Range returns the rows of the DataFrame starting at first and ending immediately prior to last (left-inclusive, right-exclusive). If first or last is out of range, an error is returned. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) Relabel ¶
func (df *DataFrameMutator) Relabel()
Relabel resets the DataFrame labels to default labels (e.g., []int from 0 to df.Len()-1, with *0 as name). Modifies the underlying DataFrame in place.
func (*DataFrameMutator) ReorderCols ¶ added in v0.6.8
func (df *DataFrameMutator) ReorderCols(colNames []string) error
ReorderCols reorders the columns to be in the same order as specified by colNames. If a column is not specified, it is excluded from the resulting DataFrame. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) ReorderLabels ¶ added in v0.6.8
func (df *DataFrameMutator) ReorderLabels(levelNames []string) error
ReorderLabels reorders the label levels to be in the same order as specified by levelNames. If a level is not specified, it is excluded from the resulting DataFrame. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) Resample ¶ added in v0.2.6
func (df *DataFrameMutator) Resample(how map[string]Resampler) error
Resample coerces values to time.Time and truncates them by the logic supplied in how, which is a map of container names (either column or label names) to tada.Resampler structs. For each container name in the map, the first By field selected (i.e., not left blank) in its Resampler struct provides the resampling logic for that container. If the slice type is civil.Date or civil.Time before resampling, it will be returned as civil.Date or civil.Time after resampling.
Modifies the underlying DataFrame in place.
func (*DataFrameMutator) ResetLabels ¶
func (df *DataFrameMutator) ResetLabels(labelLevels ...string) error
ResetLabels appends the label level(s) supplied in labelLevels as columns and drops the level(s). If no label levels are supplied, all label levels are appended as columns and dropped as levels, and replaced by a default label level. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) SetAsLabels ¶ added in v0.3.6
func (df *DataFrameMutator) SetAsLabels(colNames ...string)
SetAsLabels appends the column(s) supplied as colNames as label levels and drops the column(s). The number of colNames supplied must be less than the number of columns in the DataFrame. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) SetRows ¶ added in v0.7.6
func (df *DataFrameMutator) SetRows(lambda ApplyFn, container string, rows []int) error
SetRows applies lambda within container (either label or column name) to set the values at the specified row positions. The new values must be the same type as the existing values. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) Shuffle ¶ added in v0.6.11
func (df *DataFrameMutator) Shuffle(seed int64)
Shuffle randomizes the row order of the DataFrame. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) Sort ¶
func (df *DataFrameMutator) Sort(by ...Sorter) error
Sort sorts the values by zero or more Sorter specifications. If no Sorter is supplied, does not sort. If no DType is supplied for a Sorter, sorts as float64. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) Subset ¶
func (df *DataFrameMutator) Subset(index []int) error
Subset returns only the rows specified at the index positions, in the order specified. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) SubsetCols ¶
func (df *DataFrameMutator) SubsetCols(index []int) error
SubsetCols returns only the columns specified at the index positions, in the order specified. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) SubsetLabels ¶
func (df *DataFrameMutator) SubsetLabels(index []int) error
SubsetLabels returns only the labels specified at the index positions, in the order specified. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) SwapLabels ¶
func (df *DataFrameMutator) SwapLabels(i, j string) error
SwapLabels swaps the label levels with names i and j. Modifies the underlying DataFrame in place.
func (*DataFrameMutator) WithCol ¶
func (df *DataFrameMutator) WithCol(name string, input interface{}) error
WithCol resolves as follows:
If a scalar string is supplied as input and a column exists that matches name: rename the column to match input. In this case, name must already exist.
If a slice is supplied as input and a column exists that matches name: replace the values at this column to match input. If a slice is supplied as input and a column does not exist that matches name: append a new column named name and values matching input. If input is a slice, it must be the same length as the underlying DataFrame.
In all cases, modifies the underlying DataFrame in place.
Example (Rename) ¶
    package main

    import (
        "fmt"

        "github.com/ptiger10/tada"
    )

    func main() {
        df := tada.NewDataFrame([]interface{}{
            []float64{1, 2}},
        ).
            SetColNames([]string{"foo"})
        fmt.Println(df)

        df.InPlace().WithCol("foo", "qux")
        fmt.Println(df)
    }
Output:
    +---++-----+
    | - || foo |
    |---||-----|
    | 0 ||   1 |
    | 1 ||   2 |
    +---++-----+

    +---++-----+
    | - || qux |
    |---||-----|
    | 0 ||   1 |
    | 1 ||   2 |
    +---++-----+
func (*DataFrameMutator) WithLabels ¶
func (df *DataFrameMutator) WithLabels(name string, input interface{}) error
WithLabels resolves as follows:
If a scalar string is supplied as input and a label level exists that matches name: rename the level to match input. In this case, name must already exist.
If a slice is supplied as input and a label level exists that matches name: replace the values at this level to match input. If a slice is supplied as input and a label level does not exist that matches name: append a new level named name and values matching input. If input is a slice, it must be the same length as the underlying DataFrame.
In all cases, modifies the underlying DataFrame in place.
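WithCol and WithLabels share the same three-way resolution: rename, replace, or append. The sketch below walks through it on a toy type. frame, find, and withCol are hypothetical names, and the error messages are invented for illustration; this is not tada's implementation.

```go
package main

import (
	"errors"
	"fmt"
	"reflect"
)

// frame is a toy stand-in for a DataFrame: ordered names plus one slice each.
type frame struct {
	names []string
	cols  []interface{} // each entry is a slice of length rows
	rows  int
}

// find returns the position of the first container matching name, or -1.
func (f *frame) find(name string) int {
	for i, n := range f.names {
		if n == name {
			return i
		}
	}
	return -1
}

// withCol resolves input: a string renames, a slice replaces or appends.
func (f *frame) withCol(name string, input interface{}) error {
	i := f.find(name)
	if newName, ok := input.(string); ok {
		if i < 0 {
			return errors.New("cannot rename: name does not exist")
		}
		f.names[i] = newName
		return nil
	}
	v := reflect.ValueOf(input)
	if v.Kind() != reflect.Slice {
		return errors.New("input must be a string or a slice")
	}
	if v.Len() != f.rows {
		return errors.New("slice length must match the number of rows")
	}
	if i >= 0 {
		f.cols[i] = input // replace the existing values
		return nil
	}
	f.names = append(f.names, name) // append a new container
	f.cols = append(f.cols, input)
	return nil
}

func main() {
	f := &frame{names: []string{"foo"}, cols: []interface{}{[]float64{1, 2}}, rows: 2}

	err := f.withCol("foo", "qux") // rename
	fmt.Println(err, f.names)      // <nil> [qux]

	err = f.withCol("bar", []string{"a", "b"}) // append
	fmt.Println(err, f.names)                  // <nil> [qux bar]

	err = f.withCol("baz", []int{1}) // wrong length
	fmt.Println(err != nil)          // true
}
```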
type Element ¶
type Element struct {
    Val    interface{}
    IsNull bool
}
An Element is one {value, null status} pair in either a Series or DataFrame.
type FilterFn ¶
type FilterFn func(value interface{}) bool
A FilterFn is an anonymous function supplied to a Filter or Where function. The function will be called on every value in the container.
type GroupedDataFrame ¶
type GroupedDataFrame struct {
// contains filtered or unexported fields
}
A GroupedDataFrame is a collection of row positions sharing the same group key. A GroupedDataFrame has a reference to an underlying DataFrame, which is used for reduce operations.
func (*GroupedDataFrame) Apply ¶ added in v0.7.0
func (g *GroupedDataFrame) Apply(cols []string, lambda ApplyFn) *GroupedDataFrame
Apply applies lambda to every group. Each lambda input will be a slice of grouped values (including values considered null) from a single column. Each lambda output must be a slice that is the same length as the input. A row's null status can be set in-place within the anonymous function by accessing the []bool argument.
func (*GroupedDataFrame) Col ¶
func (g *GroupedDataFrame) Col(colName string) *GroupedSeries
Col isolates the Series at colName, which may be either a label level or column in the underlying DataFrame. Returns a GroupedSeries with the same groups and labels as in the GroupedDataFrame.
func (*GroupedDataFrame) Count ¶
func (g *GroupedDataFrame) Count(colNames ...string) *DataFrame
Count returns the number of non-null values in each group for the columns in colNames.
func (*GroupedDataFrame) DataFrame ¶ added in v0.4.10
func (g *GroupedDataFrame) DataFrame() *DataFrame
DataFrame returns the GroupedDataFrame as a DataFrame, with group names as label levels, in order of appearance in the original DataFrame, and values grouped together by group name. Columns used as label levels are dropped.
func (*GroupedDataFrame) Earliest ¶
func (g *GroupedDataFrame) Earliest(colNames ...string) *DataFrame
Earliest coerces the column values in colNames to time.Time and calculates the earliest timestamp of each group.
func (*GroupedDataFrame) Err ¶
func (g *GroupedDataFrame) Err() error
Err returns the underlying error, if any.
func (*GroupedDataFrame) First ¶
func (g *GroupedDataFrame) First(colNames ...string) *DataFrame
First returns the first row within each group for the columns in colNames.
func (*GroupedDataFrame) GetGroup ¶
func (g *GroupedDataFrame) GetGroup(group string) *DataFrame
GetGroup returns the grouped rows sharing the same group key as a new DataFrame.
func (*GroupedDataFrame) GetLabels ¶ added in v0.4.3
func (g *GroupedDataFrame) GetLabels() []interface{}
GetLabels returns the grouped label levels as interface{} slices within an []interface{} that may be supplied as the optional labels argument to NewSeries() or NewDataFrame().
func (*GroupedDataFrame) HavingCount ¶
func (g *GroupedDataFrame) HavingCount(lambda func(int) bool) *GroupedDataFrame
HavingCount removes any groups from g that do not satisfy the boolean function supplied in lambda. For each group, the input into lambda is the total number of values in the group (null or not-null).
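The filtering step can be sketched on a plain slice of group keys. havingCount below is a hypothetical helper, not the library method; it counts every row in a group and keeps the keys whose count satisfies the predicate, in order of first appearance.

```go
package main

import "fmt"

// havingCount keeps only the group keys whose total row count satisfies keep.
// Every row counts toward its group, whether or not its value is null.
func havingCount(keys []string, keep func(int) bool) []string {
	var order []string
	counts := make(map[string]int)
	for _, k := range keys {
		if counts[k] == 0 {
			order = append(order, k) // first time this key appears
		}
		counts[k]++
	}
	var out []string
	for _, k := range order {
		if keep(counts[k]) {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{"0", "1", "1", "1"}
	atLeast2 := func(n int) bool { return n >= 2 }
	fmt.Println(havingCount(keys, atLeast2)) // [1]
}
```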
func (*GroupedDataFrame) Iterator ¶ added in v0.2.0
func (g *GroupedDataFrame) Iterator() *GroupedDataFrameIterator
Iterator returns an iterator which may be used to access each group of rows as a new DataFrame, in the order in which the groups originally appeared.
func (*GroupedDataFrame) Last ¶
func (g *GroupedDataFrame) Last(colNames ...string) *DataFrame
Last returns the last row within each group for the columns in colNames.
func (*GroupedDataFrame) Latest ¶
func (g *GroupedDataFrame) Latest(colNames ...string) *DataFrame
Latest coerces the column values in colNames to time.Time and calculates the latest timestamp of each group.
func (*GroupedDataFrame) Len ¶
func (g *GroupedDataFrame) Len() int
Len returns the number of group labels.
func (*GroupedDataFrame) ListGroups ¶
func (g *GroupedDataFrame) ListGroups() []string
ListGroups returns a list of group keys in the order in which they originally appeared.
func (*GroupedDataFrame) Max ¶
func (g *GroupedDataFrame) Max(colNames ...string) *DataFrame
Max coerces the column values in colNames to float64 and calculates the maximum of each group.
func (*GroupedDataFrame) Mean ¶
func (g *GroupedDataFrame) Mean(colNames ...string) *DataFrame
Mean coerces the column values in colNames to float64 and calculates the mean of each group.
func (*GroupedDataFrame) Median ¶
func (g *GroupedDataFrame) Median(colNames ...string) *DataFrame
Median coerces the column values in colNames to float64 and calculates the median of each group.
func (*GroupedDataFrame) Min ¶
func (g *GroupedDataFrame) Min(colNames ...string) *DataFrame
Min coerces the column values in colNames to float64 and calculates the minimum of each group.
func (*GroupedDataFrame) NUnique ¶
func (g *GroupedDataFrame) NUnique(colNames ...string) *DataFrame
NUnique returns the number of unique, non-null values in each group for the columns in colNames.
func (*GroupedDataFrame) Nth ¶
func (g *GroupedDataFrame) Nth(index int, colNames ...string) *DataFrame
Nth returns the row at position index (if it exists) within each group for the columns in colNames.
func (*GroupedDataFrame) Reduce ¶
func (g *GroupedDataFrame) Reduce(name string, cols []string, lambda ReduceFn) *DataFrame
Reduce iterates over the groups in the GroupedDataFrame and reduces each group of values into a single value using the function supplied in lambda. Reduce returns a new DataFrame named "name_originalDataFrameName" with columns named "name_originalColumnName" where each reduced group is represented by a single row.
The columns in the new DataFrame will be slices of reduced values with the same type as the ReduceFn output. If lambda returns float64 values, for example, Reduce will iterate over the grouped values in each column, reduce each group's slice to a single float64 value, then concatenate these reduced values into new []float64 columns and return them in a new DataFrame.
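The group-then-reduce flow can be sketched for a single column. The reduceFn type below copies the function shape used in this package's Reduce example; reduceGroups and sumNonNull are hypothetical helpers, and the sketch is not the library's implementation.

```go
package main

import "fmt"

// reduceFn receives one group's values and null statuses and returns a single
// value plus that value's null status.
type reduceFn func(slice interface{}, isNull []bool) (value interface{}, null bool)

// reduceGroups splits vals by key (keeping keys in order of first appearance),
// reduces each group with fn, and returns one result per group.
func reduceGroups(keys []string, vals []float64, nulls []bool, fn reduceFn) ([]string, []interface{}, []bool) {
	var order []string
	rows := make(map[string][]int)
	for i, k := range keys {
		if _, ok := rows[k]; !ok {
			order = append(order, k)
		}
		rows[k] = append(rows[k], i)
	}
	out := make([]interface{}, len(order))
	outNull := make([]bool, len(order))
	for g, k := range order {
		groupVals := make([]float64, len(rows[k]))
		groupNull := make([]bool, len(rows[k]))
		for j, i := range rows[k] {
			groupVals[j], groupNull[j] = vals[i], nulls[i]
		}
		out[g], outNull[g] = fn(groupVals, groupNull)
	}
	return order, out, outNull
}

// sumNonNull adds the non-null values; an all-null group reduces to null.
func sumNonNull(slice interface{}, isNull []bool) (interface{}, bool) {
	vals, ok := slice.([]float64)
	if !ok {
		return nil, true
	}
	sum, n := 0.0, 0
	for i, v := range vals {
		if !isNull[i] {
			sum += v
			n++
		}
	}
	return sum, n == 0
}

func main() {
	keys := []string{"a", "b", "a", "b"}
	vals := []float64{1, 2, 3, 0}
	nulls := []bool{false, true, false, true}
	order, out, outNull := reduceGroups(keys, vals, nulls, sumNonNull)
	fmt.Println(order, out, outNull) // [a b] [4 0] [false true]
}
```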
func (*GroupedDataFrame) StdDev ¶ added in v0.5.3
func (g *GroupedDataFrame) StdDev(colNames ...string) *DataFrame
StdDev coerces the column values in colNames to float64 and calculates the standard deviation of each group.
func (*GroupedDataFrame) String ¶ added in v0.4.10
func (g *GroupedDataFrame) String() string
func (*GroupedDataFrame) Sum ¶
func (g *GroupedDataFrame) Sum(colNames ...string) *DataFrame
Sum coerces the column values in colNames to float64 and calculates the sum of each group.
type GroupedDataFrameIterator ¶ added in v0.2.0
type GroupedDataFrameIterator struct {
// contains filtered or unexported fields
}
GroupedDataFrameIterator iterates over all DataFrames in the group.
func (*GroupedDataFrameIterator) DataFrame ¶ added in v0.2.0
func (g *GroupedDataFrameIterator) DataFrame() *DataFrame
DataFrame returns the current grouped DataFrame.
func (*GroupedDataFrameIterator) Next ¶ added in v0.2.0
func (g *GroupedDataFrameIterator) Next() bool
Next advances to next grouped DataFrame. Returns false at end of iteration.
type GroupedSeries ¶
type GroupedSeries struct {
// contains filtered or unexported fields
}
A GroupedSeries is a collection of row positions sharing the same group key. A GroupedSeries has a reference to an underlying Series, which is used for reduce operations.
func (*GroupedSeries) Align ¶
func (g *GroupedSeries) Align() *GroupedSeries
Align changes subsequent reduce operations for this group to return a Series aligned with the original Series labels (the default behavior is to return a Series with one label per group). If the original Series is:

    FOO
    baz  0
    baz  1
    bar  2
    bar  4

and it is grouped by the "FOO" label, then the default g.Sum() reducer would return:

    FOO
    baz  1
    bar  6

After g.Align(), the g.Sum() reducer would return:

    FOO
    baz  1
    baz  1
    bar  6
    bar  6
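The difference between the default and aligned results can be reproduced with plain slices. groupSums and alignedSums are hypothetical helpers for illustration only; the data matches the description above.

```go
package main

import "fmt"

// groupSums sums vals per label and returns the group keys in order of first
// appearance along with each group's sum (the default, one row per group).
func groupSums(labels []string, vals []float64) ([]string, map[string]float64) {
	var order []string
	sums := make(map[string]float64)
	for i, label := range labels {
		if _, seen := sums[label]; !seen {
			order = append(order, label)
		}
		sums[label] += vals[i]
	}
	return order, sums
}

// alignedSums broadcasts each group's sum back onto every original row.
func alignedSums(labels []string, vals []float64) []float64 {
	_, sums := groupSums(labels, vals)
	out := make([]float64, len(labels))
	for i, label := range labels {
		out[i] = sums[label]
	}
	return out
}

func main() {
	labels := []string{"baz", "baz", "bar", "bar"}
	vals := []float64{0, 1, 2, 4}

	order, sums := groupSums(labels, vals)
	for _, key := range order {
		fmt.Println(key, sums[key]) // baz 1, then bar 6
	}
	fmt.Println(alignedSums(labels, vals)) // [1 1 6 6]
}
```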
Example (Mean) ¶
    package main

    import (
        "fmt"

        "github.com/ptiger10/tada"
    )

    func main() {
        s := tada.NewSeries([]float64{1, 2, 3, 4}, []int{0, 1, 0, 1}).
            SetName("foo").
            SetLabelNames([]string{"baz"})
        fmt.Println(s)

        // here, s.GroupBy("baz") is equivalent to s.GroupBy()
        g := s.GroupBy("baz")
        fmt.Println(g.Align().Mean())
    }
Output:
    +-----++-----+
    | baz || foo |
    |-----||-----|
    |   0 ||   1 |
    |   1 ||   2 |
    |   0 ||   3 |
    |   1 ||   4 |
    +-----++-----+

    +-----++----------+
    | baz || mean_foo |
    |-----||----------|
    |   0 ||        2 |
    |   1 ||        3 |
    |   0 ||        2 |
    |   1 ||        3 |
    +-----++----------+
func (*GroupedSeries) Apply ¶ added in v0.7.0
func (g *GroupedSeries) Apply(lambda ApplyFn) *GroupedSeries
Apply applies lambda to every group. Each lambda input will be a slice of grouped values (including values considered null). Each lambda output must be a slice that is the same length as the input. A row's null status can be set in-place within the anonymous function by accessing the []bool argument.
Example ¶
    package main

    import (
        "fmt"

        "github.com/ptiger10/tada"
    )

    func main() {
        s := tada.NewSeries([]float64{1, 2, 3, 4}, []string{"bar", "bar", "foo", "bar"}, []int{0, 1, 2, 3}).
            SetName("foobar").
            SetLabelNames([]string{"baz", "qux"})
        fmt.Println(s)

        g := s.GroupBy("baz")
        // if group has at least 3 items, multiply by 2. otherwise set as null.
        modifyBigGroup := func(slice interface{}, isNull []bool) interface{} {
            vals, _ := slice.([]float64) // in normal usage, check the type assertion and handle an error
            ret := make([]float64, len(vals))
            if len(vals) >= 3 {
                for i := range ret {
                    ret[i] = vals[i] * 2
                }
            } else {
                for i := range ret {
                    isNull[i] = true
                }
            }
            return ret
        }
        fmt.Println(g.Apply(modifyBigGroup).Series())
    }
Output:
    +-----+-----++--------+
    | baz | qux || foobar |
    |-----|-----||--------|
    | bar |   0 ||      1 |
    |     |   1 ||      2 |
    | foo |   2 ||      3 |
    | bar |   3 ||      4 |
    +-----+-----++--------+

    +-----++--------+
    | baz || foobar |
    |-----||--------|
    | bar ||      2 |
    |     ||      4 |
    |     ||      8 |
    | foo || (null) |
    +-----++--------+
Example (Align) ¶
    package main

    import (
        "fmt"

        "github.com/ptiger10/tada"
    )

    func main() {
        s := tada.NewSeries([]float64{1, 2, 3, 4}, []string{"bar", "bar", "foo", "bar"}, []int{0, 1, 2, 3}).
            SetName("foobar").
            SetLabelNames([]string{"baz", "qux"})
        fmt.Println(s)

        g := s.GroupBy("baz")
        // if group has at least 3 items, multiply by 2. otherwise set as null.
        modifyBigGroup := func(slice interface{}, isNull []bool) interface{} {
            vals, _ := slice.([]float64) // in normal usage, check the type assertion and handle an error
            ret := make([]float64, len(vals))
            if len(vals) >= 3 {
                for i := range ret {
                    ret[i] = vals[i] * 2
                }
            } else {
                for i := range ret {
                    isNull[i] = true
                }
            }
            return ret
        }
        g.Align()
        fmt.Println(g.Apply(modifyBigGroup).Series())
    }
Output:
    +-----+-----++--------+
    | baz | qux || foobar |
    |-----|-----||--------|
    | bar |   0 ||      1 |
    |     |   1 ||      2 |
    | foo |   2 ||      3 |
    | bar |   3 ||      4 |
    +-----+-----++--------+

    +-----+-----++--------+
    | baz | qux || foobar |
    |-----|-----||--------|
    | bar |   0 ||      2 |
    |     |   1 ||      4 |
    | foo |   2 || (null) |
    | bar |   3 ||      8 |
    +-----+-----++--------+
func (*GroupedSeries) Count ¶
func (g *GroupedSeries) Count() *Series
Count returns the number of non-null values in each group.
func (*GroupedSeries) Earliest ¶
func (g *GroupedSeries) Earliest() *Series
Earliest coerces the Series values to time.Time and calculates the earliest timestamp in each group.
func (*GroupedSeries) Err ¶
func (g *GroupedSeries) Err() error
Err returns the underlying error, if any.
func (*GroupedSeries) First ¶
func (g *GroupedSeries) First() *Series
First returns the first row in each group.
func (*GroupedSeries) GetGroup ¶
func (g *GroupedSeries) GetGroup(group string) *Series
GetGroup returns the grouped rows sharing the same group key as a new Series.
func (*GroupedSeries) GetLabels ¶ added in v0.4.3
func (g *GroupedSeries) GetLabels() []interface{}
GetLabels returns the grouped label levels as interface{} slices within an []interface{} that may be supplied as the optional labels argument to NewSeries() or NewDataFrame().
func (*GroupedSeries) HavingCount ¶
func (g *GroupedSeries) HavingCount(lambda func(int) bool) *GroupedSeries
HavingCount removes any groups from g that do not satisfy the boolean function supplied in lambda. For each group, the input into lambda is the total number of values in the group (null or not-null).
Example (Sum) ¶
    package main

    import (
        "fmt"

        "github.com/ptiger10/tada"
    )

    func main() {
        s := tada.NewSeries([]float64{1, 2, 3, 4}, []int{0, 1, 1, 1}).
            SetName("foo").
            SetLabelNames([]string{"baz"})
        fmt.Println(s)

        countOf3 := func(n int) bool {
            return n == 3
        }
        g := s.GroupBy("baz")
        fmt.Println(g.HavingCount(countOf3).Sum())
    }
Output:
    +-----++-----+
    | baz || foo |
    |-----||-----|
    |   0 ||   1 |
    |   1 ||   2 |
    |     ||   3 |
    |     ||   4 |
    +-----++-----+

    +-----++---------+
    | baz || sum_foo |
    |-----||---------|
    |   1 ||       9 |
    +-----++---------+
func (*GroupedSeries) Iterator ¶ added in v0.2.0
func (g *GroupedSeries) Iterator() *GroupedSeriesIterator
Iterator returns an iterator which may be used to access each group of rows as a new Series, in the order in which the groups originally appeared.
func (*GroupedSeries) Last ¶
func (g *GroupedSeries) Last() *Series
Last returns the last row in each group.
func (*GroupedSeries) Latest ¶
func (g *GroupedSeries) Latest() *Series
Latest coerces the Series values to time.Time and calculates the latest timestamp in each group.
func (*GroupedSeries) Len ¶
func (g *GroupedSeries) Len() int
Len returns the number of group labels.
func (*GroupedSeries) ListGroups ¶
func (g *GroupedSeries) ListGroups() []string
ListGroups returns a list of group keys in the order in which they originally appeared.
func (*GroupedSeries) Max ¶
func (g *GroupedSeries) Max() *Series
Max coerces values to float64 and calculates the maximum of each group.
func (*GroupedSeries) Mean ¶
func (g *GroupedSeries) Mean() *Series
Mean coerces values to float64 and calculates the mean of each group.
Example ¶
    package main

    import (
        "fmt"

        "github.com/ptiger10/tada"
    )

    func main() {
        s := tada.NewSeries([]float64{1, 2, 3, 4}, []int{0, 1, 0, 1}).
            SetName("foo").
            SetLabelNames([]string{"baz"})
        fmt.Println(s)

        // here, s.GroupBy("baz") is equivalent to s.GroupBy()
        g := s.GroupBy("baz")
        fmt.Println(g.Mean())
    }
Output:
    +-----++-----+
    | baz || foo |
    |-----||-----|
    |   0 ||   1 |
    |   1 ||   2 |
    |   0 ||   3 |
    |   1 ||   4 |
    +-----++-----+

    +-----++----------+
    | baz || mean_foo |
    |-----||----------|
    |   0 ||        2 |
    |   1 ||        3 |
    +-----++----------+
func (*GroupedSeries) Median ¶
func (g *GroupedSeries) Median() *Series
Median coerces values to float64 and calculates the median of each group.
func (*GroupedSeries) Min ¶
func (g *GroupedSeries) Min() *Series
Min coerces values to float64 and calculates the minimum of each group.
func (*GroupedSeries) NUnique ¶
func (g *GroupedSeries) NUnique() *Series
NUnique returns the number of unique values in each group.
func (*GroupedSeries) Nth ¶
func (g *GroupedSeries) Nth(n int) *Series
Nth returns the row at position n (if it exists) within each group.
func (*GroupedSeries) Reduce ¶
func (g *GroupedSeries) Reduce(name string, lambda ReduceFn) *Series
Reduce iterates over the groups in the GroupedSeries and reduces each group of values into a single value using the function supplied in lambda. Reduce returns a new Series named "name_originalSeriesName" where each reduced group is represented by a single row.
The new Series will be a slice of reduced values with the same type as the ReduceFn output. If lambda returns float64 values, for example, Reduce will iterate over all the grouped values, reduce each group's slice to a single float64 value, then concatenate these reduced values into a new []float64 and return it in a new Series.
Example ¶
    package main

    import (
        "fmt"
        "math"

        "github.com/ptiger10/tada"
    )

    func main() {
        s := tada.NewSeries([]float64{1, 2, 3, 4, 5, 6}, []int{0, 0, 0, 1, 1, 1}).
            SetName("foo").
            SetLabelNames([]string{"baz"})
        fmt.Println(s)

        g := s.GroupBy("baz")
        maxOdd := func(slice interface{}, isNull []bool) (value interface{}, null bool) {
            vals := slice.([]float64)
            max := math.Inf(-1)
            for i := range vals {
                if !isNull[i] && int(vals[i])%2 == 1 && vals[i] > max {
                    max = vals[i]
                }
            }
            return max, false
        }
        fmt.Println(g.Reduce("max_odd", maxOdd))
    }
Output:
    +-----++-----+
    | baz || foo |
    |-----||-----|
    |   0 ||   1 |
    |     ||   2 |
    |     ||   3 |
    |   1 ||   4 |
    |     ||   5 |
    |     ||   6 |
    +-----++-----+

    +-----++-------------+
    | baz || max_odd_foo |
    |-----||-------------|
    |   0 ||           3 |
    |   1 ||           5 |
    +-----++-------------+
func (*GroupedSeries) Series ¶ added in v0.4.10
func (g *GroupedSeries) Series() *Series
Series returns the GroupedSeries as a Series, with group names as label levels, in order of appearance in the original Series, and values grouped together by group name.
func (*GroupedSeries) StdDev ¶ added in v0.5.3
func (g *GroupedSeries) StdDev() *Series
StdDev coerces values to float64 and calculates the standard deviation of each group.
func (*GroupedSeries) String ¶
func (g *GroupedSeries) String() string
func (*GroupedSeries) Sum ¶
func (g *GroupedSeries) Sum() *Series
Sum coerces values to float64 and calculates the sum of each group.
type GroupedSeriesIterator ¶ added in v0.2.0
type GroupedSeriesIterator struct {
// contains filtered or unexported fields
}
GroupedSeriesIterator iterates over all Series in the group.
func (*GroupedSeriesIterator) Next ¶ added in v0.2.0
func (g *GroupedSeriesIterator) Next() bool
Next advances to next grouped Series. Returns false at end of iteration.
func (*GroupedSeriesIterator) Series ¶ added in v0.2.0
func (g *GroupedSeriesIterator) Series() *Series
Series returns the current grouped Series.
type JoinOption ¶ added in v0.6.0
type JoinOption func(*joinConfig)
A JoinOption configures a lookup or merge function. Available join options: JoinOptionHow, JoinOptionLeftOn, and JoinOptionRightOn.
type NullFiller ¶
NullFiller fills every row that has a null value and changes the row's status to not-null. If multiple fields are provided, resolves in the following order: 1) `FillForward` - fills with the last valid value, 2) `FillBackward` - fills with the next valid value, 3) `FillZero` - fills with the zero value of the slice type, 4) `FillFloat` - coerces to float64 and fills with the value provided.
type ReadOption ¶ added in v0.4.0
type ReadOption func(*readConfig)
A ReadOption configures a read function. Available read options: ReadOptionHeaders, ReadOptionLabels, ReadOptionDelimiter, and ReadOptionSwitchDims.
type ReduceFn ¶ added in v0.7.0
A ReduceFn is an anonymous function supplied to a Reduce function to reduce a slice of values to one value and one null status per group. isNull contains the null status of every value in the group.
type Resampler ¶
type Resampler struct {
    ByYear      bool
    ByMonth     bool
    ByDay       bool
    ByWeek      bool
    StartOfWeek time.Weekday
    ByDuration  time.Duration
    Location    *time.Location
}
Resampler supplies logic for the Resample() function. Only the first `By` field that is selected (i.e., not left blank) is used; any others are ignored (if `ByWeek` is selected, it may be modified by `StartOfWeek`). `ByYear` truncates the timestamp by year. `ByMonth` truncates the timestamp by month. `ByDay` truncates the timestamp by day. `ByWeek` returns the first day of the most recent week (starting on `StartOfWeek`) relative to the timestamp. Otherwise, truncates the timestamp by `ByDuration`. If `Location` is not provided, time.UTC is used as the default location.
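The truncation rules can be sketched with the standard time package alone. truncateToMonth and truncateToWeek are hypothetical helpers, and sample is an arbitrary timestamp chosen for illustration; none of them are part of tada, and the library's exact results (for example, around time zones) may differ.

```go
package main

import (
	"fmt"
	"time"
)

// sample is Wednesday, 15 January 2020, 13:45 UTC.
var sample = time.Date(2020, time.January, 15, 13, 45, 0, 0, time.UTC)

// truncateToMonth returns midnight on the first day of t's month.
func truncateToMonth(t time.Time) time.Time {
	return time.Date(t.Year(), t.Month(), 1, 0, 0, 0, 0, t.Location())
}

// truncateToWeek returns midnight on the most recent startOfWeek on or before t.
func truncateToWeek(t time.Time, startOfWeek time.Weekday) time.Time {
	day := time.Date(t.Year(), t.Month(), t.Day(), 0, 0, 0, 0, t.Location())
	// Number of days to step back to reach startOfWeek (0 through 6).
	back := (int(day.Weekday()) - int(startOfWeek) + 7) % 7
	return day.AddDate(0, 0, -back)
}

func main() {
	const layout = "2006-01-02"
	fmt.Println(truncateToMonth(sample).Format(layout))             // 2020-01-01
	fmt.Println(truncateToWeek(sample, time.Monday).Format(layout)) // 2020-01-13
	fmt.Println(truncateToWeek(sample, time.Sunday).Format(layout)) // 2020-01-12
	// Truncating by a duration uses the standard library directly.
	fmt.Println(sample.Truncate(time.Hour).Format(time.RFC3339)) // 2020-01-15T13:00:00Z
}
```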
type Series ¶
type Series struct {
// contains filtered or unexported fields
}
A Series is a single column of data with one or more levels of aligned labels.
Example ¶
    package main

    import (
        "fmt"

        "github.com/ptiger10/tada"
    )

    func main() {
        s := tada.NewSeries([]float64{1, 2}).SetName("foo")
        fmt.Println(s)
    }
Output:
    +---++-----+
    | - || foo |
    |---||-----|
    | 0 ||   1 |
    | 1 ||   2 |
    +---++-----+
Example (NestedSlice) ¶
    package main

    import (
        "fmt"

        "github.com/ptiger10/tada"
    )

    func main() {
        s := tada.NewSeries([][]string{{"foo", "bar"}, {"baz"}, {}}).
            SetName("a")
        fmt.Println(s)
    }
Output:
    +---++-----------+
    | - ||     a     |
    |---||-----------|
    | 0 || [foo bar] |
    | 1 || [baz]     |
    | 2 || (null)    |
    +---++-----------+
Example (SetNaNStatus) ¶
    package main

    import (
        "fmt"
        "math"

        "github.com/ptiger10/tada"
    )

    func main() {
        s := tada.NewSeries([]float64{0, math.NaN()})
        fmt.Println("isNull:", s.GetNulls())

        tada.SetOptionNaNStatus(false)
        s = tada.NewSeries([]float64{0, math.NaN()})
        fmt.Println("isNull:", s.GetNulls())
        tada.SetOptionNaNStatus(true)
    }
Output:
    isNull: [false true]
    isNull: [false false]
Example (SetSentinelNulls) ¶
    package main

    import (
        "fmt"

        "github.com/ptiger10/tada"
    )

    func main() {
        s := tada.NewSeries([]string{"foo", "", "(null)"})
        fmt.Println("default sentinel null values\n isNull:", s.GetNulls())

        tada.SetOptionNullStrings(nil)
        s = tada.NewSeries([]string{"foo", "", "(null)"})
        fmt.Println("remove defaults\n isNull:", s.GetNulls())
        tada.SetOptionNullStrings(tada.GetOptionDefaultNullStrings())
    }
Output:
    default sentinel null values
     isNull: [false true true]
    remove defaults
     isNull: [false false false]
Example (Zscore) ¶
    package main

    import (
        "fmt"
        "math"

        "github.com/ptiger10/tada"
    )

    func main() {
        s := tada.NewSeries([]float64{1, 2, 3, 4, 5}).SetName("foo")
        fmt.Println(s)

        vals := s.GetValuesAsFloat64()
        ret := make([]float64, s.Len())
        mean := s.Mean()
        std := s.StdDev()
        for i := range vals {
            val := (vals[i] - mean) / std
            ret[i] = math.Round((val * 100)) / 100 // round to 2 decimal points
        }
        df := s.DataFrame().WithCol("zscore_foo", ret)
        fmt.Println(df)
    }
Output:
    +---++-----+
    | - || foo |
    |---||-----|
    | 0 ||   1 |
    | 1 ||   2 |
    | 2 ||   3 |
    | 3 ||   4 |
    | 4 ||   5 |
    +---++-----+

    +---++-----+------------+
    | - || foo | zscore_foo |
    |---||-----|------------|
    | 0 ||   1 |      -1.41 |
    | 1 ||   2 |      -0.71 |
    | 2 ||   3 |          0 |
    | 3 ||   4 |       0.71 |
    | 4 ||   5 |       1.41 |
    +---++-----+------------+
func NewSeries ¶
func NewSeries(slice interface{}, labels ...interface{}) *Series
NewSeries constructs a Series from a slice of values and optional label slices. The values slice and all label slices must be supported slice types.
If no labels are supplied, a default label level is inserted ([]int incrementing from 0). Series values are named 0 by default. The default values name is displayed on printing. Label levels are named *n (e.g., *0, *1, etc) by default. Default label names are hidden on printing.
Supported slice types: all variants of []float, []int, & []uint, []string, []bool, []time.Time, []interface{}, and 2-dimensional variants of each (e.g., [][]string, [][]float64).
func (*Series) Add ¶
Add coerces other and s to float64 values, aligns other with s, and adds the values in aligned rows, using the labels in s as an anchor. If ignoreNulls is true, then missing or null values are treated as 0. Otherwise, if a row in s does not align with any row in other, or if row does align but either value is null, then the resulting value is null.
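The alignment and null rules described above can be sketched in plain Go, without tada. The names cell and addAligned are hypothetical, labels are modeled as a single string level, and labels in other are assumed to be unique. This is an illustration of the documented semantics, not tada's implementation.

```go
package main

import "fmt"

// cell is a hypothetical stand-in for one Series value and its null status.
type cell struct {
	val    float64
	isNull bool
}

// addAligned adds other to s row by row, using the labels of s as the anchor.
// A label in s with no match in other counts as a missing (null) value.
func addAligned(labels []string, s []cell, other map[string]cell, ignoreNulls bool) []cell {
	out := make([]cell, len(s))
	for i, label := range labels {
		a := s[i]
		b, found := other[label]
		if !found {
			b = cell{isNull: true}
		}
		if ignoreNulls {
			// Missing or null values are treated as 0.
			if a.isNull {
				a = cell{}
			}
			if b.isNull {
				b = cell{}
			}
		}
		if a.isNull || b.isNull {
			out[i] = cell{isNull: true}
			continue
		}
		out[i] = cell{val: a.val + b.val}
	}
	return out
}

func main() {
	labels := []string{"a", "b"}
	s := []cell{{val: 1}, {val: 2}}
	other := map[string]cell{"a": {val: 10}}

	fmt.Println(addAligned(labels, s, other, false)) // row "b" has no partner, so it is null
	fmt.Println(addAligned(labels, s, other, true))  // row "b" is treated as 2 + 0
}
```

The same shape applies to Subtract, Multiply, and Divide by swapping the arithmetic on the final line of the loop.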
func (*Series) Append ¶
Append adds the other labels and values as new rows to the Series. If the types of any container do not match, all the values in that container are coerced to string. Returns a new Series.
func (*Series) Apply ¶
Apply applies lambda, an anonymous function, to every row in the Series. A row's null status can be set in place within lambda by modifying its []bool argument. Returns a new Series.
Example (Float64) ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3}).SetName("foo")
	fmt.Println(s)

	times2 := func(slice interface{}, isNull []bool) interface{} {
		vals := slice.([]float64)
		ret := make([]float64, len(vals))
		for i := range ret {
			ret[i] = vals[i] * 2
		}
		return ret
	}
	fmt.Println(s.Apply(times2))
}
Output:
+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
| 2 ||   3 |
+---++-----+
+---++-----+
| - || foo |
|---||-----|
| 0 ||   2 |
| 1 ||   4 |
| 2 ||   6 |
+---++-----+
func (*Series) At ¶
At returns the Element at the index position. If index is out of range, returns nil.
func (*Series) Bin ¶ added in v0.4.9
Bin coerces the Series values to float64 and categorizes each row based on which bin interval it falls within. bins should be a slice of sequential edges that form intervals (left exclusive, right inclusive). For example, [1, 3, 5] represents the intervals 1-3 (excluding 1, including 3), and 3-5 (excluding 3, including 5). If these bins were supplied for a Series with values [3, 4], the returned Series would have values ["1-3", "3-5"]. Null values are not categorized. For default behavior, supply nil as config.
To bin values below or above the bin intervals, or to supply custom labels, supply a tada.Binner as config. If custom labels are supplied, the length must be 1 less than the total number of bin edges. Otherwise, bin labels are auto-generated from the bin intervals.
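The interval rule (left exclusive, right inclusive) can be sketched in plain Go as follows. The function binLabel is hypothetical and only covers the default case with auto-generated labels; it does not model tada.Binner options.

```go
package main

import "fmt"

// binLabel returns the auto-generated label of the interval that v falls in.
// Intervals are left exclusive and right inclusive. ok is false when v falls
// outside every interval, which corresponds to an uncategorized (null) row.
func binLabel(v float64, edges []float64) (label string, ok bool) {
	for i := 0; i+1 < len(edges); i++ {
		lo, hi := edges[i], edges[i+1]
		if v > lo && v <= hi {
			return fmt.Sprintf("%v-%v", lo, hi), true
		}
	}
	return "", false
}

func main() {
	edges := []float64{1, 3, 5}
	for _, v := range []float64{1, 3, 4, 6} {
		label, ok := binLabel(v, edges)
		fmt.Println(v, label, ok)
	}
}
```

With edges [1, 3, 5], the value 3 lands in "1-3" and 4 lands in "3-5", while 1 and 6 fall outside every interval.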
Example ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 3, 5}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.Bin([]float64{0, 2, 4}, nil)
	fmt.Println(binned)
}
Output:
+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   3 |
| 2 ||   5 |
+---++-----+
+---++--------+
| - ||  foo   |
|---||--------|
| 0 ||    0-2 |
| 1 ||    2-4 |
| 2 || (null) |
+---++--------+
Example (AndMore) ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 3, 5}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.Bin([]float64{0, 2, 4}, &tada.Binner{AndMore: true})
	fmt.Println(binned)
}
Output:
+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   3 |
| 2 ||   5 |
+---++-----+
+---++-----+
| - || foo |
|---||-----|
| 0 || 0-2 |
| 1 || 2-4 |
| 2 ||  >4 |
+---++-----+
Example (CustomLabels) ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 3}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.Bin([]float64{0, 2, 4}, &tada.Binner{Labels: []string{"low", "high"}})
	fmt.Println(binned)
}
Output:
+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   3 |
+---++-----+
+---++------+
| - || foo  |
|---||------|
| 0 ||  low |
| 1 || high |
+---++------+
func (*Series) CSV ¶ added in v0.5.3
func (s *Series) CSV(options ...WriteOption) ([][]string, error)
CSV converts a Series to a DataFrame and returns as [][]string.
func (*Series) Cast ¶
Cast casts the underlying container values (either label levels or Series values) to []float64, []string, []time.Time (aka timezone-aware DateTime), []civil.Date, or []civil.Time. To apply to Series values, supply an empty string name ("") or the Series name. Use Cast to improve performance when calling multiple operations on values.
Example (Date) ¶
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{
		time.Date(2020, 1, 15, 12, 15, 0, 0, time.UTC),
	}).SetName("foo")
	fmt.Println(s)

	s.Cast(map[string]tada.DType{"foo": tada.Date})
	fmt.Println(s)
}
Output:
+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:15:00Z |
+---++----------------------+
+---++------------+
| - ||    foo     |
|---||------------|
| 0 || 2020-01-15 |
+---++------------+
Example (Time) ¶
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{
		time.Date(2020, 1, 15, 12, 15, 0, 0, time.UTC),
	}).SetName("foo")
	fmt.Println(s)

	s.Cast(map[string]tada.DType{"foo": tada.Time})
	fmt.Println(s)
}
Output:
+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:15:00Z |
+---++----------------------+
+---++----------+
| - ||   foo    |
|---||----------|
| 0 || 12:15:00 |
+---++----------+
func (*Series) Copy ¶
Copy returns a deep copy of a Series with no shared references to the original.
func (*Series) CumSum ¶
CumSum coerces the Series values to float64 and returns the cumulative sum at each row position.
func (*Series) Divide ¶
Divide coerces other and s to float64 values, aligns other with s, and divides the aligned values of s by other, using the labels in s as an anchor. Dividing by 0 always returns a null value. If ignoreNulls is true, then missing or null values are treated as 0. Otherwise, if a row in s does not align with any row in other, or if a row does align but either value is null, then the resulting value is null.
func (*Series) DropLabels ¶
DropLabels removes the first label level matching name. Returns a new Series.
func (*Series) Earliest ¶
Earliest coerces the Series values to time.Time and calculates the earliest timestamp.
func (*Series) EqualsCSV ¶
func (s *Series) EqualsCSV(includeLabels bool, want io.Reader, wantOptions ...ReadOption) (bool, *tablediff.Differences, error)
EqualsCSV reads want (configured by wantOptions) into a dataframe, converts both s and want into [][]string records, and evaluates whether the stringified values match. If they do not match, returns a tablediff.Differences object that can be printed to isolate their differences.
If includeLabels is true, then s's labels are included as columns.
func (*Series) FillNull ¶
func (s *Series) FillNull(how NullFiller) *Series
FillNull fills all the null values and makes them not-null. Returns a new Series.
func (*Series) Filter ¶
Filter returns a new Series with only rows that satisfy all of the filters, which is a map of container names (either the Series name or label name) and anonymous functions. Filter may be applied to the Series values by supplying either the Series name or an empty string ("") as a key.
Rows with null values never satisfy a filter. If no filter is provided, the function does nothing. For equality filtering on one or more containers, consider FilterByValue. Returns a new Series.
func (*Series) FilterByValue ¶ added in v0.3.5
FilterByValue returns the rows in the Series satisfying all filters, which is a map of container names (either the Series name or label name) to interface{} values. A filter is satisfied for a given row value if the stringified value in that container at that row matches the stringified interface{} value. FilterByValue may be applied to the Series values by supplying either the Series name or an empty string ("") as a key. Returns a new Series.
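Matching on stringified values means that values of different types can satisfy the same filter. The plain-Go sketch below illustrates this with a hypothetical helper, matchesByValue, and assumes fmt.Sprint as the stringification; tada's own conversion rules may differ in detail.

```go
package main

import "fmt"

// matchesByValue returns the index positions whose stringified value equals
// the stringified filter value.
func matchesByValue(values []interface{}, want interface{}) []int {
	target := fmt.Sprint(want)
	idx := []int{}
	for i, v := range values {
		if fmt.Sprint(v) == target {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	values := []interface{}{1, "1", 2.0, 1.0}
	// The int 1, the string "1", and the float 1.0 all stringify to "1".
	fmt.Println(matchesByValue(values, 1))
}
```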
func (*Series) FilterIndex ¶ added in v0.4.1
FilterIndex returns the index positions of the rows in container (either the Series name or label name) that satisfy filterFn. A filter that matches no rows returns an empty []int. An out of range container returns nil. FilterIndex may be applied to the Series values by supplying either the Series name or an empty string ("") as a key.
func (*Series) GetLabels ¶ added in v0.3.5
func (s *Series) GetLabels() []interface{}
GetLabels returns label levels as interface{} slices within an []interface{} that may be supplied as the optional labels argument to NewSeries() or NewDataFrame().
func (*Series) GetValues ¶
func (s *Series) GetValues() interface{}
GetValues returns a copy of the underlying Series data as an interface.
func (*Series) GetValuesAsFloat64 ¶ added in v0.6.0
GetValuesAsFloat64 coerces the Series values into []float64.
func (*Series) GetValuesAsString ¶ added in v0.6.0
GetValuesAsString coerces the Series values into []string.
func (*Series) GetValuesAsTime ¶ added in v0.6.0
GetValuesAsTime coerces the Series values into []time.Time.
func (*Series) GroupBy ¶
func (s *Series) GroupBy(names ...string) *GroupedSeries
GroupBy groups the Series rows that share the same stringified value in the container(s) (columns or labels) specified by names. If error occurs, writes error to GroupedSeries.
Example ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []string{"foo", "bar", "foo", "bar"})
	g := s.GroupBy()
	fmt.Println(g)
}
Output:
+-----++---+
|  -  || 0 |
|-----||---|
| foo || 1 |
|     || 3 |
| bar || 2 |
|     || 4 |
+-----++---+
Example (CompoundGroup) ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4},
		[]string{"foo", "baz", "foo", "baz"}, []string{"bar", "qux", "bar", "qux"})
	g := s.GroupBy()
	fmt.Println(g)
	// +-----+-----++---+
	// |  -  |  -  || 0 |
	// |-----|-----||---|
	// | foo | bar || 1 |
	// |     |     || 3 |
	// | baz | qux || 2 |
	// |     |     || 4 |
	// +-----+-----++---+
}
Output:
func (*Series) HasLabels ¶ added in v0.5.0
HasLabels returns an error if the Series does not contain all of the labelNames supplied.
func (*Series) Head ¶
Head returns the first n rows of the Series. If n is greater than the length of the Series, returns the entire Series. In either case, returns a new Series.
func (*Series) InPlace ¶
func (s *Series) InPlace() *SeriesMutator
InPlace returns a SeriesMutator, which contains most of the same methods as Series but never returns a new Series. If you want to save memory and improve performance and do not need to preserve the original Series, consider using InPlace().
func (*Series) IndexOfLabel ¶
IndexOfLabel returns the index position of the first label level with a name matching name (case-sensitive). If name does not match any container, -1 is returned.
func (*Series) IsNull ¶ added in v0.8.1
IsNull returns all the rows with null values. Returns a new Series.
func (*Series) Iterator ¶ added in v0.2.0
func (s *Series) Iterator() *SeriesIterator
Iterator returns an iterator which may be used to access the values in each row as map[string]Element.
func (*Series) LabelsAsSeries ¶ added in v0.5.3
LabelsAsSeries finds the first level with matching name and returns as a Series with all existing label levels (including itself). If label level name is default (prefixed with *), removes the prefix. Returns a new Series with shared labels.
func (*Series) Latest ¶
Latest coerces the Series values to time.Time and calculates the latest timestamp.
func (*Series) ListLabelNames ¶
ListLabelNames returns the name and position of all the label levels in the Series
func (*Series) Lookup ¶
func (s *Series) Lookup(other *Series, options ...JoinOption) (*Series, error)
Lookup performs the lookup portion of a join of other onto s. Performs a left join unless a different join type is specified as an option. If left and right keys are supplied as options, those are used as lookup keys. Otherwise, the join will automatically use shared label names or return an error if none exist.
Lookup identifies the row alignment between s and other and returns the aligned values. Rows are aligned when: 1) one or more containers (either column or label level) in other share the same name as one or more containers in s, and 2) the stringified values in the other containers match the values in the s containers. For the following dataframes:
s               other
FOO   BAR       FOO   QUX
bar   0         baz   corge
baz   1         qux   waldo
Row 1 in s is "aligned" with row 0 in other, because those are the rows in which both share the same value ("baz") in a container with the same name ("foo"). The result of a lookup will be:
FOO   BAR
bar   null
baz   corge
Returns a new Series.
Example ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}, []int{0, 1}).
		SetName("foo").
		SetLabelNames([]string{"a"})
	fmt.Println("--original Series--")
	fmt.Println(s)

	s2 := tada.NewSeries([]float64{4, 5}, []int{0, 10}).
		SetLabelNames([]string{"a"})
	fmt.Println("--Series to lookup--")
	fmt.Println(s2)

	fmt.Println("--result--")
	lookup, _ := s.Lookup(s2)
	fmt.Println(lookup)
}
Output:
--original Series--
+---++-----+
| a || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+
--Series to lookup--
+----++---+
| a  || 0 |
|----||---|
|  0 || 4 |
| 10 || 5 |
+----++---+
--result--
+---++--------+
| a ||  foo   |
|---||--------|
| 0 ||      4 |
| 1 || (null) |
+---++--------+
Example (WithOptions) ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}, []string{"foo", "bar"}, []int{0, 1}).
		SetLabelNames([]string{"a", "b"})
	fmt.Println("--original Series--")
	fmt.Println(s)

	s2 := tada.NewSeries([]float64{4, 5}, []int{0, 10}, []string{"baz", "bar"}).
		SetLabelNames([]string{"a", "b"})
	fmt.Println("--Series to lookup--")
	fmt.Println(s2)

	fmt.Println("--result--")
	lookup, _ := s.Lookup(
		s2,
		tada.JoinOptionHow("inner"),
		tada.JoinOptionLeftOn([]string{"a"}),
		tada.JoinOptionRightOn([]string{"b"}),
	)
	fmt.Println(lookup)
}
Output:
--original Series--
+-----+---++---+
|  a  | b || 0 |
|-----|---||---|
| foo | 0 || 1 |
| bar | 1 || 2 |
+-----+---++---+
--Series to lookup--
+----+-----++---+
| a  |  b  || 0 |
|----|-----||---|
|  0 | baz || 4 |
| 10 | bar || 5 |
+----+-----++---+
--result--
+-----+---++---+
|  a  | b || 0 |
|-----|---||---|
| bar | 1 || 5 |
+-----+---++---+
func (*Series) Merge ¶
func (s *Series) Merge(other *Series, options ...JoinOption) (*DataFrame, error)
Merge joins other onto s. Performs a left join unless a different join type is specified as an option. If left and right keys are supplied as options, those are used as lookup keys. Otherwise, the join will automatically use shared label names or return an error if none exist.
Merge identifies the row alignment between s and other and appends aligned values as new columns on s. Rows are aligned when: 1) one or more containers (either column or label level) in other share the same name as one or more containers in s, and 2) the stringified values in the other containers match the values in the s containers. For the following dataframes:
s               other
FOO   BAR       FOO   QUX
bar   0         baz   corge
baz   1         qux   waldo
Row 1 in s is "aligned" with row 0 in other, because those are the rows in which both share the same value ("baz") in a container with the same name ("foo"). After merging, the result will be:
s
FOO   BAR   QUX
bar   0     null
baz   1     corge
Finally, all container names (either the Series name or label name) are deduplicated after the merge so that they are unique. Returns a new DataFrame.
Example ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}, []int{0, 1}).SetName("foo")
	fmt.Println("--original Series--")
	fmt.Println(s)

	s2 := tada.NewSeries([]float64{4, 5}, []int{0, 10}).SetName("bar")
	fmt.Println("--Series to merge--")
	fmt.Println(s2)

	fmt.Println("--result--")
	merged, _ := s.Merge(s2)
	fmt.Println(merged)
}
Output:
--original Series--
+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+
--Series to merge--
+----++-----+
| -  || bar |
|----||-----|
|  0 ||   4 |
| 10 ||   5 |
+----++-----+
--result--
+---++-----+--------+
| - || foo |  bar   |
|---||-----|--------|
| 0 ||   1 |      4 |
| 1 ||   2 | (null) |
+---++-----+--------+
Example (WithOptions) ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}, []string{"foo", "bar"}, []int{0, 1}).
		SetLabelNames([]string{"a", "b"})
	fmt.Println("--original Series--")
	fmt.Println(s)

	s2 := tada.NewSeries([]float64{4, 5}, []int{0, 10}, []string{"baz", "bar"}).
		SetLabelNames([]string{"a", "b"})
	fmt.Println("--Series to lookup--")
	fmt.Println(s2)

	fmt.Println("--result--")
	merged, _ := s.Merge(s2,
		tada.JoinOptionHow("inner"),
		tada.JoinOptionLeftOn([]string{"a"}),
		tada.JoinOptionRightOn([]string{"b"}),
	)
	fmt.Println(merged)
}
Output:
--original Series--
+-----+---++---+
|  a  | b || 0 |
|-----|---||---|
| foo | 0 || 1 |
| bar | 1 || 2 |
+-----+---++---+
--Series to lookup--
+----+-----++---+
| a  |  b  || 0 |
|----|-----||---|
|  0 | baz || 4 |
| 10 | bar || 5 |
+----+-----++---+
--result--
+-----+---++---+-----+
|  a  | b || 0 | 0_1 |
|-----|---||---|-----|
| bar | 1 || 2 |   5 |
+-----+---++---+-----+
func (*Series) Multiply ¶
Multiply coerces other and s to float64 values, aligns other with s, and multiplies the values in aligned rows, using the labels in s as an anchor. If ignoreNulls is true, then missing or null values are treated as 0. Otherwise, if a row in s does not align with any row in other, or if row does align but either value is null, then the resulting value is null.
func (*Series) NameOfLabel ¶
NameOfLabel returns the name of the label level at index position n. If n is out of range, returns "-out of range-"
func (*Series) Percentile ¶
Percentile coerces the Series values to float64 and returns the percentile rank of each value. Uses the "exclusive" definition: a value's percentile is the % of all non-null values in the Series (including itself) that are below it.
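The "exclusive" definition can be sketched in plain Go as below. The helper percentileExclusive is hypothetical, NaN stands in for a null value, and the result is expressed as a fraction between 0 and 1 (the scale used by the PercentileBin edges); whether tada reports the same scale is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"math"
)

// percentileExclusive returns, for each value, the fraction of all non-null
// values that are strictly below it. NaN marks a null in this sketch, and
// null inputs produce NaN outputs.
func percentileExclusive(vals []float64) []float64 {
	nonNull := 0
	for _, v := range vals {
		if !math.IsNaN(v) {
			nonNull++
		}
	}
	out := make([]float64, len(vals))
	for i, v := range vals {
		if math.IsNaN(v) {
			out[i] = math.NaN()
			continue
		}
		below := 0
		for _, w := range vals {
			if !math.IsNaN(w) && w < v {
				below++
			}
		}
		out[i] = float64(below) / float64(nonNull)
	}
	return out
}

func main() {
	fmt.Println(percentileExclusive([]float64{1, 2, 3, 4}))
}
```

For the values [1, 2, 3, 4] this yields 0, 0.25, 0.5, and 0.75, so no value ever reaches 1 under this definition.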
func (*Series) PercentileBin ¶ added in v0.4.9
PercentileBin coerces the Series values to float64 and categorizes each value based on which percentile bin interval it falls within. Uses the "exclusive" definition: a value's percentile is the % of all non-null values in the Series (including itself) that are below it. bins should be a slice of sequential percentile edges (between 0 and 1) that form intervals (left inclusive, right exclusive). NB: left inclusive, right exclusive is the opposite of the interval inclusion rules for the Bin() function. For example, [0, .5, 1] represents the percentile intervals 0-50% (including 0%, excluding 50%) and 50%-100% (including 50%, excluding 100%). If these bins were supplied for a Series with values [1, 1000], the returned Series would have values [0-0.5, 0.5-1], because 1 is in the bottom 50% of values and 1000 is in the top 50% of values. Null values are not categorized. For default behavior, supply nil as config.
To bin values below or above the bin intervals, or to supply custom labels, supply a tada.Binner as config. If custom labels are supplied, the length must be 1 less than the total number of bin edges. Otherwise, bin labels are auto-generated from the bin intervals.
Example ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.PercentileBin([]float64{0, .5, 1}, nil)
	fmt.Println(binned)
}
Output:
+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
| 2 ||   3 |
| 3 ||   4 |
+---++-----+
+---++-------+
| - ||  foo  |
|---||-------|
| 0 || 0-0.5 |
| 1 ||       |
| 2 || 0.5-1 |
| 3 ||       |
+---++-------+
Example (CustomLabels) ¶
package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.PercentileBin([]float64{0, .5, 1},
		&tada.Binner{Labels: []string{"Bottom 50%", "Top 50%"}})
	fmt.Println(binned)
}
Output:
+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
| 2 ||   3 |
| 3 ||   4 |
+---++-----+
+---++------------+
| - ||    foo     |
|---||------------|
| 0 || Bottom 50% |
| 1 ||            |
| 2 ||    Top 50% |
| 3 ||            |
+---++------------+
func (*Series) Range ¶
Range returns the rows of the Series starting at first and ending immediately prior to last (left-inclusive, right-exclusive). If either first or last is out of range, a Series error is returned. In all cases, returns a new Series.
func (*Series) Rank ¶
Rank coerces the Series values to float64 and returns the rank of each (in ascending order - where 1 is the rank of the lowest value). Rows with the same value share the same rank.
func (*Series) Reduce ¶ added in v0.7.6
Reduce reduces all Series values to a single value and null status using lambda.
func (*Series) Relabel ¶
Relabel resets the Series labels to default labels (e.g., []int from 0 to s.Len()-1, with *0 as name). Returns a new Series.
func (*Series) Resample ¶
Resample coerces the Series values to time.Time and truncates them by the logic supplied in tada.Resampler. If slice type is civil.Date or civil.Time before resampling, it will be returned as civil.Date or civil.Time after resampling.
Returns a new Series.
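Truncating by a fixed duration maps directly onto time.Truncate, but truncating to a calendar week needs a start-of-week rule. The plain-Go sketch below shows one way to express both; startOfWeek and sample are hypothetical names, and the logic is an assumption about what week truncation means, not tada's implementation.

```go
package main

import (
	"fmt"
	"time"
)

// sample is a Wednesday, matching the timestamp used in the examples for this method.
var sample = time.Date(2020, time.January, 15, 12, 30, 0, 0, time.UTC)

// startOfWeek truncates t to midnight on the most recent startDay
// (t's own date if t already falls on startDay).
func startOfWeek(t time.Time, startDay time.Weekday) time.Time {
	daysBack := (int(t.Weekday()) - int(startDay) + 7) % 7
	y, m, d := t.Date()
	return time.Date(y, m, d-daysBack, 0, 0, 0, 0, t.Location())
}

func main() {
	// Fixed-duration truncation, as with a 30-minute resampling interval.
	fmt.Println(sample.Truncate(30 * time.Minute).Format(time.RFC3339))
	// Calendar-week truncation with the week starting on Sunday.
	fmt.Println(startOfWeek(sample, time.Sunday).Format(time.RFC3339))
}
```

With a Sunday start, the sample timestamp truncates to 2020-01-12T00:00:00Z, which agrees with the ByWeek example output.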
Example (ByHalfHour) ¶
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{
		time.Date(2020, 1, 15, 12, 15, 0, 0, time.UTC),
		time.Date(2020, 1, 15, 12, 45, 0, 0, time.UTC),
	}).SetName("foo")
	fmt.Println(s)

	byHalfHour := tada.Resampler{ByDuration: 30 * time.Minute}
	fmt.Println(s.Resample(byHalfHour))
}
Output:
+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:15:00Z |
| 1 || 2020-01-15T12:45:00Z |
+---++----------------------+
+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:00:00Z |
| 1 || 2020-01-15T12:30:00Z |
+---++----------------------+
Example (ByHour) ¶
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{time.Date(2020, 1, 15, 12, 30, 0, 0, time.UTC)}).SetName("foo")
	fmt.Println(s)

	byHour := tada.Resampler{ByDuration: time.Hour}
	fmt.Println(s.Resample(byHour))
}
Output:
+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:30:00Z |
+---++----------------------+
+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:00:00Z |
+---++----------------------+
Example (ByMonth) ¶
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{time.Date(2020, 1, 15, 12, 30, 0, 0, time.UTC)}).SetName("foo")
	fmt.Println(s)

	byMonth := tada.Resampler{ByMonth: true}
	fmt.Println(s.Resample(byMonth))
}
Output:
+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:30:00Z |
+---++----------------------+
+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-01T00:00:00Z |
+---++----------------------+
Example (ByWeek) ¶
package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{time.Date(2020, 1, 15, 12, 30, 0, 0, time.UTC)}).SetName("foo")
	fmt.Println(s)

	byWeek := tada.Resampler{ByWeek: true, StartOfWeek: time.Sunday}
	fmt.Println(s.Resample(byWeek))
}
Output:
+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:30:00Z |
+---++----------------------+
+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-12T00:00:00Z |
+---++----------------------+
func (*Series) RollingDuration ¶
func (s *Series) RollingDuration(d time.Duration) *GroupedSeries
RollingDuration iterates over each row in Series, coerces the values to time.Time, and groups each set of subsequent rows that are within d of the current row.
func (*Series) RollingN ¶
func (s *Series) RollingN(n int) *GroupedSeries
RollingN iterates over each row in Series and groups each set of n subsequent rows after the current row.
func (*Series) SetLabelNames ¶
SetLabelNames sets the names of all the label levels in the Series and returns the entire Series. If an error occurs, it is written to the Series.
func (*Series) SetName ¶
SetName modifies the name of a Series in place and returns the original Series.
func (*Series) SetRows ¶ added in v0.7.6
SetRows applies lambda, an anonymous function, to set the values at the specified row positions. The new values must be the same type as the existing values. Returns a new Series.
func (*Series) Shift ¶
Shift replaces the value in row i with the value in row i - n, or null if that index is out of range. Returns a new Series.
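The indexing rule can be sketched in plain Go as follows. The helper shift is hypothetical and models nulls with a parallel []bool; a positive n moves values toward later rows and a negative n moves them toward earlier rows.

```go
package main

import "fmt"

// shift returns a copy of vals in which row i holds the value from row i-n.
// Rows whose source index is out of range are marked null.
func shift(vals []float64, n int) (shifted []float64, isNull []bool) {
	shifted = make([]float64, len(vals))
	isNull = make([]bool, len(vals))
	for i := range vals {
		src := i - n
		if src < 0 || src >= len(vals) {
			isNull[i] = true
			continue
		}
		shifted[i] = vals[src]
	}
	return shifted, isNull
}

func main() {
	vals := []float64{1, 2, 3}
	fmt.Println(shift(vals, 1))  // first row becomes null
	fmt.Println(shift(vals, -1)) // last row becomes null
}
```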
func (*Series) Shuffle ¶ added in v0.6.11
Shuffle randomizes the row order of the Series. Returns a new Series.
func (*Series) Sort ¶
Sort sorts the values by zero or more Sorter specifications. If no Sorter is supplied, sorts by Series values (as float64) in ascending order. If a Sorter is supplied without a Name or with a name matching the Series name, sorts by Series values. If no DType is supplied in a Sorter, sorts as float64. DType is only used for the process of sorting. Once it has been sorted, data retains its original type. Returns a new Series.
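The point that a DType governs only the comparison, while the data keeps its original type, can be sketched in plain Go. The helper sortAsFloat64 is hypothetical; placing unparseable values last is a choice made for this sketch and is not documented tada behavior.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"strconv"
)

// sortAsFloat64 returns a copy of vals ordered by their float64 interpretation.
// The returned values keep their original string form.
func sortAsFloat64(vals []string) []string {
	keys := make([]float64, len(vals))
	order := make([]int, len(vals))
	for i, v := range vals {
		f, err := strconv.ParseFloat(v, 64)
		if err != nil {
			f = math.Inf(1) // sketch-only rule: unparseable values sort last
		}
		keys[i] = f
		order[i] = i
	}
	sort.SliceStable(order, func(a, b int) bool {
		return keys[order[a]] < keys[order[b]]
	})
	out := make([]string, len(vals))
	for i, idx := range order {
		out[i] = vals[idx]
	}
	return out
}

func main() {
	// Sorted as strings this would be [10 100 9]; sorted as float64 it is [9 10 100].
	fmt.Println(sortAsFloat64([]string{"10", "9", "100"}))
}
```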
func (*Series) StdDev ¶ added in v0.5.3
StdDev coerces the Series values to float64 and calculates the standard deviation.
func (*Series) Struct ¶ added in v0.6.0
func (s *Series) Struct(structPointer interface{}, options ...WriteOption) error
Struct writes the values of the containers in s into structPointer. Returns an error if s does not contain, from left-to-right, the same container names and types as the exported fields that appear, from top-to-bottom, in structPointer. Exported struct fields must be types that are supported by NewDataFrame(). If a "tada" tag is present with the value "isNull", this field must be [][]bool. The null status of each value container, from left-to-right, will be written into this field in equal-length slices. If s contains additional containers beyond those in structPointer, those are ignored.
func (*Series) Subset ¶
Subset returns only the rows specified at the index positions, in the order specified. Returns a new Series.
func (*Series) SubsetLabels ¶
SubsetLabels includes only the columns of labels specified at the index positions, in the order specified. Returns a new Series.
func (*Series) Subtract ¶
Subtract coerces other and s to float64 values, aligns other with s, and subtracts the aligned values of other from s, using the labels in s as an anchor. If ignoreNulls is true, then missing or null values are treated as 0. Otherwise, if a row in s does not align with any row in other, or if row does align but either value is null, then the resulting value is null.
func (*Series) SwapLabels ¶
SwapLabels swaps the label levels with names i and j. Returns a new Series.
func (*Series) Tail ¶
Tail returns the last n rows of the Series. If n is greater than the length of the Series, returns the entire Series. In either case, returns a new Series.
func (*Series) Unique ¶
Unique returns the first appearance of all non-null values in the Series. If includeLabels is true, a row is considered unique only if its combination of labels and values is unique. Returns a new Series.
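A minimal plain-Go sketch of "first appearance of all non-null values" follows. The helper firstAppearances is hypothetical, works on string values only, and does not model the includeLabels option.

```go
package main

import "fmt"

// firstAppearances returns the index positions of the first occurrence of
// each distinct non-null value, in row order.
func firstAppearances(vals []string, isNull []bool) []int {
	seen := make(map[string]bool)
	idx := []int{}
	for i, v := range vals {
		if isNull[i] || seen[v] {
			continue
		}
		seen[v] = true
		idx = append(idx, i)
	}
	return idx
}

func main() {
	vals := []string{"a", "b", "a", "", "b"}
	isNull := []bool{false, false, false, true, false}
	fmt.Println(firstAppearances(vals, isNull))
}
```

Only rows 0 and 1 survive here: rows 2 and 4 repeat earlier values and row 3 is null.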
func (*Series) ValueCounts ¶
ValueCounts counts the number of appearances of each stringified value in the Series.
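Counting stringified values can be sketched in plain Go as below. The helper valueCounts is hypothetical and assumes fmt.Sprint as the stringification; the shape of tada's actual return value is not specified here, so a plain map is used.

```go
package main

import "fmt"

// valueCounts counts how many times each stringified value appears.
func valueCounts(vals []interface{}) map[string]int {
	counts := make(map[string]int)
	for _, v := range vals {
		counts[fmt.Sprint(v)]++
	}
	return counts
}

func main() {
	// The int 1 and the string "1" are counted together.
	fmt.Println(valueCounts([]interface{}{1, "1", 2.5, true, 2.5}))
}
```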
func (*Series) Where ¶
Where iterates over the rows in s and evaluates whether each one satisfies filters, which is a map of container names (either the Series name or label name) and tada.FilterFn structs. If yes, returns ifTrue at that row position. If not, returns ifFalse at that row position. Values are coerced from their original type to the selected field type for filtering, but after filtering retains their original type.
Returns an unnamed Series with a copy of the labels from the original Series and null status based on the supplied values. If an unsupported value type is supplied as either ifTrue or ifFalse, returns an error.
func (*Series) WithLabels ¶
WithLabels resolves as follows:
If a scalar string is supplied as input and a label level exists that matches name: rename the level to match input. In this case, name must already exist.
If a slice is supplied as input and a label level exists that matches name: replace the values at this level to match input. If a slice is supplied as input and a label level does not exist that matches name: append a new level named name and values matching input. If input is a slice, it must be the same length as the underlying Series.
In all cases, returns a new Series.
func (*Series) WithValues ¶
WithValues replaces the Series values with input. input must be a supported slice type of the same length as the original Series. Returns a new Series.
type SeriesIterator ¶ added in v0.2.0
type SeriesIterator struct {
// contains filtered or unexported fields
}
A SeriesIterator iterates over the rows in a Series.
func (*SeriesIterator) Next ¶ added in v0.2.0
func (iter *SeriesIterator) Next() bool
Next advances to next row. Returns false at end of iteration.
func (*SeriesIterator) Row ¶ added in v0.2.0
func (iter *SeriesIterator) Row() map[string]Element
Row returns the current row in the Series as map[string]Element. The map keys are the names of containers (including label levels). The name of the Series values column is the same as the name of the Series itself. The value in each map is an Element containing an interface value and a boolean denoting if the value is null. If multiple columns have the same header, only the Elements of the left-most column are returned.
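The Next/Row pattern and the left-most-wins rule for repeated names can be sketched in plain Go. The types element and rowIterator below are hypothetical stand-ins written for illustration; they are not tada's Element or SeriesIterator.

```go
package main

import "fmt"

// element is a hypothetical stand-in for a value plus its null status.
type element struct {
	val    interface{}
	isNull bool
}

// rowIterator is a hypothetical iterator over equal-length, named containers.
type rowIterator struct {
	current int
	names   []string
	cols    [][]element
}

func newRowIterator(names []string, cols [][]element) *rowIterator {
	return &rowIterator{current: -1, names: names, cols: cols}
}

// Next advances to the next row and reports false at the end of iteration.
func (it *rowIterator) Next() bool {
	it.current++
	return len(it.cols) > 0 && it.current < len(it.cols[0])
}

// Row returns the current row keyed by container name. When names repeat,
// only the left-most container is kept.
func (it *rowIterator) Row() map[string]element {
	row := make(map[string]element, len(it.names))
	for i, name := range it.names {
		if _, exists := row[name]; exists {
			continue
		}
		row[name] = it.cols[i][it.current]
	}
	return row
}

func main() {
	it := newRowIterator(
		[]string{"*0", "foo", "foo"},
		[][]element{
			{{val: 0}, {val: 1}},
			{{val: "a"}, {val: "b"}},
			{{val: "x"}, {isNull: true}},
		},
	)
	for it.Next() {
		fmt.Println(it.Row())
	}
}
```

Each printed row has two keys, "*0" and "foo", and the "foo" entry always comes from the second container rather than the third.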
type SeriesMutator ¶
type SeriesMutator struct {
// contains filtered or unexported fields
}
A SeriesMutator is used to change Series values in place.
func (*SeriesMutator) Append ¶
func (s *SeriesMutator) Append(other *Series) error
Append adds the other labels and values as new rows to the Series. If the types of any container do not match, all the values in that container are coerced to string. Modifies the underlying Series in place.
func (*SeriesMutator) Apply ¶
func (s *SeriesMutator) Apply(lambda ApplyFn) error
Apply applies lambda, an anonymous function, to every row in the Series. A row's null status can be changed in place within lambda. Modifies the underlying Series in place.
func (*SeriesMutator) DropLabels ¶
func (s *SeriesMutator) DropLabels(name string) error
DropLabels removes the first label level matching name. Modifies the underlying Series in place.
func (*SeriesMutator) DropNull ¶
func (s *SeriesMutator) DropNull()
DropNull removes all the rows with null values, keeping only the rows with non-null values. Modifies the underlying Series in place.
func (*SeriesMutator) DropRow ¶
func (s *SeriesMutator) DropRow(index int) error
DropRow removes the row at the specified index. Modifies the underlying Series in place.
func (*SeriesMutator) FillNull ¶
func (s *SeriesMutator) FillNull(how NullFiller)
FillNull fills all the null values and makes them not-null. Modifies the underlying Series.
func (*SeriesMutator) Filter ¶
func (s *SeriesMutator) Filter(filters map[string]FilterFn) error
Filter keeps only the rows that satisfy all of the filters, which is a map of container names (either the Series name or label name) and anonymous functions. Filter may be applied to the Series values by supplying either the Series name or an empty string ("") as a key.
Rows with null values never satisfy a filter. If no filter is provided, the function does nothing. For equality filtering on one or more containers, consider FilterByValue. Modifies the underlying Series in place.
func (*SeriesMutator) FilterByValue ¶ added in v0.3.5
func (s *SeriesMutator) FilterByValue(filters map[string]interface{}) error
FilterByValue keeps only the rows in the Series that satisfy all filters, which is a map of container names (either the Series name or a label name) to interface{} values. A filter is satisfied for a given row if the stringified value in that container at that row matches the stringified interface{} value. FilterByValue may be applied to the Series values by supplying either the Series name or an empty string ("") as a key. Modifies the underlying Series in place.
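The stringified comparison described above can be sketched without tada. This is a minimal illustration, not tada's implementation: filterByValue and the map-per-row layout are hypothetical, and fmt.Sprint is assumed as the stringification step (tada's own conversion may differ).

```go
package main

import "fmt"

// filterByValue is a hypothetical helper (not part of tada). Each row maps a
// container name to a value. It returns the positions of rows whose
// stringified values match every stringified filter value.
// Assumption: fmt.Sprint stands in for tada's string conversion.
func filterByValue(rows []map[string]interface{}, filters map[string]interface{}) []int {
	var keep []int
	for i, row := range rows {
		ok := true
		for name, want := range filters {
			got, exists := row[name]
			if !exists || fmt.Sprint(got) != fmt.Sprint(want) {
				ok = false
				break
			}
		}
		if ok {
			keep = append(keep, i)
		}
	}
	return keep
}

func main() {
	rows := []map[string]interface{}{
		{"qty": 1},   // int
		{"qty": "1"}, // string with the same stringified form
		{"qty": 2},
	}
	fmt.Println(filterByValue(rows, map[string]interface{}{"qty": 1})) // [0 1]
}
```

Under this sketch, the int 1 and the string "1" match each other because only their string forms are compared.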
func (*SeriesMutator) Relabel ¶
func (s *SeriesMutator) Relabel()
Relabel resets the Series labels to default labels (e.g., []int from 0 to one less than the length of the Series, with *0 as name). Modifies the underlying Series in place.
func (*SeriesMutator) Resample ¶
func (s *SeriesMutator) Resample(by Resampler)
Resample coerces the Series values to time.Time and truncates them by the logic supplied in tada.Resampler. If slice type is civil.Date or civil.Time before resampling, it will be returned as civil.Date or civil.Time after resampling.
Modifies the underlying Series in place.
func (*SeriesMutator) SetRows ¶ added in v0.7.6
func (s *SeriesMutator) SetRows(lambda ApplyFn, rows []int) error
SetRows applies lambda, an anonymous function, to set the values at the specified row positions. The new values must be the same type as the existing values. Modifies the underlying Series in place.
func (*SeriesMutator) Shift ¶
func (s *SeriesMutator) Shift(n int)
Shift replaces the value in row i with the value in row i - n, or null if that index is out of range. Modifies the underlying Series in place.
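The row i / row i - n rule can be illustrated with a small standalone sketch. The shift helper and its separate null mask are hypothetical; they model the documented behavior on a plain slice and are not tada's internals.

```go
package main

import "fmt"

// shift is a hypothetical helper (not part of tada). Row i of the result
// takes the value from row i-n of the input; rows whose source position is
// out of range are marked null and left at the zero value.
func shift(vals []float64, n int) (out []float64, isNull []bool) {
	out = make([]float64, len(vals))
	isNull = make([]bool, len(vals))
	for i := range vals {
		src := i - n
		if src < 0 || src >= len(vals) {
			isNull[i] = true
			continue
		}
		out[i] = vals[src]
	}
	return out, isNull
}

func main() {
	vals, nulls := shift([]float64{1, 2, 3, 4}, 1)
	fmt.Println(vals, nulls) // [0 1 2 3] [true false false false]
}
```

A positive n moves values toward later rows and leaves nulls at the top; a negative n does the opposite.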
func (*SeriesMutator) Shuffle ¶ added in v0.6.11
func (s *SeriesMutator) Shuffle(seed int64)
Shuffle randomizes the row order of the Series, using a randomizer seeded with seed. Modifies the underlying Series in place.
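A seeded shuffle is reproducible: the same seed yields the same row order. The sketch below shows this with math/rand; shuffledOrder is a hypothetical helper, and it is an assumption that tada's randomizer behaves the same way for a given seed.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffledOrder is a hypothetical helper (not part of tada). It returns a
// permutation of the row positions 0..n-1 produced by a randomizer seeded
// with seed, so repeated calls with the same seed return the same order.
func shuffledOrder(n int, seed int64) []int {
	order := make([]int, n)
	for i := range order {
		order[i] = i
	}
	r := rand.New(rand.NewSource(seed))
	r.Shuffle(n, func(i, j int) { order[i], order[j] = order[j], order[i] })
	return order
}

func main() {
	a := shuffledOrder(5, 42)
	b := shuffledOrder(5, 42)
	fmt.Println(a, b) // both orders are identical because the seed is the same
}
```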
func (*SeriesMutator) Sort ¶
func (s *SeriesMutator) Sort(by ...Sorter) error
Sort sorts the values by zero or more Sorter specifications. If no Sorter is supplied, sorts by Series values (as float64) in ascending order. If a Sorter is supplied without a Name or with a name matching the Series name, sorts by Series values. If no DType is supplied in a Sorter, sorts as float64. Modifies the underlying Series in place.
func (*SeriesMutator) Subset ¶
func (s *SeriesMutator) Subset(index []int) error
Subset keeps only the rows at the specified index positions, in the order specified. Modifies the underlying Series in place.
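Selecting rows by position, in the caller's order, can be sketched as follows. The subset helper is hypothetical, and returning an error for an out-of-range position is an assumption about how such a function would reasonably behave, not a statement about tada's error handling.

```go
package main

import "fmt"

// subset is a hypothetical helper (not part of tada). It returns the values
// at the given positions, in the order the positions are listed.
// Assumption: an out-of-range position is reported as an error.
func subset(vals []string, index []int) ([]string, error) {
	out := make([]string, 0, len(index))
	for _, pos := range index {
		if pos < 0 || pos >= len(vals) {
			return nil, fmt.Errorf("index %d out of range [0, %d)", pos, len(vals))
		}
		out = append(out, vals[pos])
	}
	return out, nil
}

func main() {
	rows, err := subset([]string{"a", "b", "c"}, []int{2, 0})
	fmt.Println(rows, err) // [c a] <nil>
}
```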
func (*SeriesMutator) SubsetLabels ¶
func (s *SeriesMutator) SubsetLabels(index []int) error
SubsetLabels includes only the columns of labels specified at the index positions, in the order specified. Modifies the underlying Series in place.
func (*SeriesMutator) SwapLabels ¶
func (s *SeriesMutator) SwapLabels(i, j string) error
SwapLabels swaps the label levels with names i and j. Modifies the underlying Series in place.
func (*SeriesMutator) WithLabels ¶
func (s *SeriesMutator) WithLabels(name string, input interface{}) error
WithLabels resolves as follows:
If a scalar string is supplied as input and a label level exists that matches name: rename the level to match input. In this case, name must already exist.
If a slice is supplied as input and a label level exists that matches name: replace the values at this level to match input.
If a slice is supplied as input and a label level does not exist that matches name: append a new level named name and values matching input.
If input is a slice, it must be the same length as the underlying Series.
In all cases, modifies the underlying Series in place.
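The resolution rules above amount to a dispatch on the type of input and on whether name matches an existing level. The following sketch models that dispatch on a toy structure; the series and labelLevel types are hypothetical, and only []string is handled here, whereas tada accepts other supported slice types.

```go
package main

import "fmt"

// labelLevel and series are hypothetical stand-ins for illustration only.
type labelLevel struct {
	name   string
	values []string
}

type series struct {
	n      int // number of rows
	labels []labelLevel
}

// withLabels mirrors the documented rules: a string renames an existing
// level; a slice replaces an existing level's values or appends a new level.
func (s *series) withLabels(name string, input interface{}) error {
	pos := -1
	for i, l := range s.labels {
		if l.name == name {
			pos = i
			break
		}
	}
	switch v := input.(type) {
	case string:
		if pos < 0 {
			return fmt.Errorf("cannot rename: no label level named %q", name)
		}
		s.labels[pos].name = v
	case []string:
		if len(v) != s.n {
			return fmt.Errorf("input has length %d, want %d", len(v), s.n)
		}
		if pos >= 0 {
			s.labels[pos].values = v
		} else {
			s.labels = append(s.labels, labelLevel{name, v})
		}
	default:
		return fmt.Errorf("unsupported input type %T", input)
	}
	return nil
}

func main() {
	s := &series{n: 2, labels: []labelLevel{{"id", []string{"a", "b"}}}}
	fmt.Println(s.withLabels("id", "key"))                  // rename an existing level
	fmt.Println(s.withLabels("key", []string{"c", "d"}))    // replace its values
	fmt.Println(s.withLabels("region", []string{"x", "y"})) // append a new level
	fmt.Println(len(s.labels))                              // 2
}
```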
func (*SeriesMutator) WithValues ¶
func (s *SeriesMutator) WithValues(input interface{}) error
WithValues replaces the Series values with input. input must be a supported slice type of the same length as the original Series. Modifies the underlying Series.
type Sorter ¶
A Sorter supplies details to the Sort() function. `Name` specifies the container (either label or column name) to sort. If `Descending` is true, values are sorted in descending order. `DType` specifies the data type to which values will be coerced before they are sorted (default: float64). Null values are always sorted to the bottom.
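The rule that null values always sort to the bottom, regardless of direction, can be sketched with the standard sort package. sortOrder is a hypothetical helper operating on float64 values with a separate null mask; the use of a stable sort is a choice made for this sketch, not a documented property of tada.

```go
package main

import (
	"fmt"
	"sort"
)

// sortOrder is a hypothetical helper (not part of tada). It returns the row
// positions in sorted order. Non-null rows are ordered by value, ascending
// or descending; null rows always come last, in their original order.
func sortOrder(vals []float64, isNull []bool, descending bool) []int {
	idx := make([]int, len(vals))
	for i := range idx {
		idx[i] = i
	}
	sort.SliceStable(idx, func(a, b int) bool {
		i, j := idx[a], idx[b]
		if isNull[i] != isNull[j] {
			return !isNull[i] // a non-null row precedes a null row
		}
		if isNull[i] {
			return false // both null: keep original relative order
		}
		if descending {
			return vals[i] > vals[j]
		}
		return vals[i] < vals[j]
	})
	return idx
}

func main() {
	vals := []float64{3, 0, 1, 2}
	nulls := []bool{false, true, false, false} // row 1 is null
	fmt.Println(sortOrder(vals, nulls, false)) // [2 3 0 1]
	fmt.Println(sortOrder(vals, nulls, true))  // [0 3 2 1]
}
```

In both directions the null row (position 1) ends up last.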
type StructTransposer ¶ added in v0.6.4
type StructTransposer [][]interface{}
A StructTransposer is a row-oriented representation of a DataFrame that can be randomly shuffled or transposed into a column-oriented struct representation of a DataFrame. It is useful for intuitive row-oriented testing.
func (StructTransposer) Shuffle ¶ added in v0.6.5
func (st StructTransposer) Shuffle(seed int64)
Shuffle randomly shuffles the row order in Rows, using a randomizer seeded with seed.
func (StructTransposer) Transpose ¶ added in v0.6.4
func (st StructTransposer) Transpose(structPointer interface{}) error
Transpose reads the values of an untyped, row-oriented struct representation of a DataFrame into a typed, column-oriented struct representation of a DataFrame. If all non-null values in a column have the same type, then the column will be a slice of that type. If any of the non-null values in a column have different types, then the column will be []interface{}. If all values are considered null by tada, then the column will be a slice of the type in the first row (when all values are null and the first row is nil, the column will be []interface{}). If an error is returned, values are still written to structPointer up until the point the error occurred.
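The column-type rules above can be sketched as a small inference function. This is a simplified illustration, not tada's code: columnType and isNull are hypothetical, and treating only nil and the empty string as null is an assumption standing in for tada's actual null rules.

```go
package main

import (
	"fmt"
	"reflect"
)

// isNull is a hypothetical stand-in for tada's null rules.
// Assumption: only nil and the empty string count as null here.
func isNull(v interface{}) bool {
	if v == nil {
		return true
	}
	s, ok := v.(string)
	return ok && s == ""
}

// columnType is a hypothetical helper that names the slice type a column
// would receive under the documented rules.
func columnType(col []interface{}) string {
	var common reflect.Type
	for _, v := range col {
		if isNull(v) {
			continue
		}
		t := reflect.TypeOf(v)
		if common == nil {
			common = t
			continue
		}
		if t != common {
			return "[]interface{}" // non-null values have different types
		}
	}
	if common != nil {
		return "[]" + common.String()
	}
	// All values are null: fall back to the type of the first row, if any.
	if len(col) > 0 && col[0] != nil {
		return "[]" + reflect.TypeOf(col[0]).String()
	}
	return "[]interface{}"
}

func main() {
	fmt.Println(columnType([]interface{}{1, nil, 2})) // []int
	fmt.Println(columnType([]interface{}{1, "a"}))    // []interface{}
	fmt.Println(columnType([]interface{}{"", ""}))    // []string
	fmt.Println(columnType([]interface{}{nil, ""}))   // []interface{}
}
```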
type WriteOption ¶ added in v0.4.0
type WriteOption func(*writeConfig)
A WriteOption configures a write function. Available write options: WriteOptionExcludeLabels, WriteOptionDelimiter.