tada

Version: v0.8.8 | Published: Apr 14, 2020 | License: Apache-2.0 | Imports: 20 | Imported by: 0

Repository

github.com/ptiger10/tada

README ¶

tada

tada (TAble DAta) is a package that enables test-driven data pipelines in pure Go.

DISCLAIMER: still under development. API subject to breaking changes until v1.

If you still want to use this regardless of the disclaimer, congratulations, you are an alpha tester! Please DM your feedback to me on the Gophers Slack (Dave Fort) or create an issue.

tada combines concepts from pandas, spreadsheets, R, Apache Spark, and SQL. Its most common use cases are cleaning, aggregating, transforming, and analyzing data.

Some notable features of tada:

- flexible constructor that supports most primitive data types
- seamlessly handles null data and type conversions
- robust datetime support
- advanced filtering, lookups and merging, grouping, sorting, and pivoting
- multi-level labels and columns
- complete test coverage
- interoperable with existing pandas dataframes via Apache Arrow
- comparable to pandas performance on key operations

The key data types are Series, DataFrames, and groupings of each. A Series is analogous to one column of a spreadsheet, and a DataFrame is analogous to a whole spreadsheet. Printing either data type will render an ASCII table.

Both Series and DataFrames have one or more "label levels". On printing, these appear as the leftmost columns in a table, and typically have values that help identify ("label") specific rows. They are analogous to the "index" concept in pandas.

For more detail and implementation notes, see https://docs.google.com/document/d/18DvZzd6Tg6Bz0SX0fY2SrXOjE8d9xDhU6bDEnaIc_rM/

Logo: @egonelbre, licensed under CC0

Example

You start with a CSV. Like most real-world data, it is messy. This one is missing a score in the first row. Scores must range between 0 and 10, so the scores of -100 and 1000 in the second and third rows must also be erroneous:

var data = `name, score
            joe doe,
            john doe, -100
            jane doe, 1000
            john doe, 5
            jane doe, 8
            john doe, 7
            jane doe, 10`

You want to write and validate a function that discards erroneous data, groups by the name column, and returns the mean of the groups.

First you write a test. You can test in two ways:

Comparing to stringified csv (compares stringified values, regardless of type)

func TestDataPipeline(t *testing.T) {
	want := `name, mean_score
           jane doe, 9
           john doe, 6`

	df, _ := tada.ReadCSV(strings.NewReader(data))
	ret := sampleDataPipeline(df)
	eq, diffs, _ := ret.EqualsCSV(true, strings.NewReader(want))
	if !eq {
		t.Errorf("sampleDataPipeline(): got %v, want %v, has diffs: \n%v", ret, want, diffs)
	}
}

Comparing to struct (compares typed values)

func Test_sampleDataPipelineTyped(t *testing.T) {
	type output struct {
		Name      []string  `tada:"name"`
		MeanScore []float64 `tada:"mean_score"`
	}
	want := output{
		Name:      []string{"jane doe", "john doe"},
		MeanScore: []float64{9, 6},
	}

	df, _ := tada.ReadCSV(strings.NewReader(data))

	out := sampleDataPipeline(df)
	var got output
	out.Struct(&got)
	if !reflect.DeepEqual(got, want) {
		t.Errorf("sampleDataPipelineTyped(): got %v, want %v", got, want)
	}
}

Then you write the data pipeline:

func sampleDataPipeline(df *tada.DataFrame) *tada.DataFrame {
	err := df.HasCols("name", "score")
	if err != nil {
		log.Fatal(err)
	}
	df.InPlace().DropNull()
	df.Cast(map[string]tada.DType{"score": tada.Float64})
	validScore := func(v interface{}) bool { return v.(float64) >= 0 && v.(float64) <= 10 }
	df.InPlace().Filter(map[string]tada.FilterFn{"score": validScore})
	df.InPlace().Sort(tada.Sorter{Name: "name", DType: tada.String})
	
	ret := df.GroupBy("name").Mean("score")
	if ret.Err() != nil {
		log.Fatal(ret.Err())
	}
	return ret
}
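
As a cross-check on the expected values in the tests, the same cleaning and aggregation can be sketched with only the standard library. This is not how tada works internally; sampleCSV and meanValidScores are hypothetical names introduced for illustration, and the CSV is written without the indentation used above.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// sampleCSV mirrors the example data, without leading indentation.
const sampleCSV = `name,score
joe doe,
john doe,-100
jane doe,1000
john doe,5
jane doe,8
john doe,7
jane doe,10`

// meanValidScores drops rows whose score is missing or outside 0 to 10
// and returns the mean of the remaining scores for each name.
func meanValidScores(data string) (map[string]float64, error) {
	records, err := csv.NewReader(strings.NewReader(data)).ReadAll()
	if err != nil {
		return nil, err
	}
	sums := map[string]float64{}
	counts := map[string]int{}
	for _, rec := range records[1:] { // skip the header row
		score, err := strconv.ParseFloat(strings.TrimSpace(rec[1]), 64)
		if err != nil || score < 0 || score > 10 {
			continue // null or erroneous score
		}
		name := strings.TrimSpace(rec[0])
		sums[name] += score
		counts[name]++
	}
	means := make(map[string]float64, len(sums))
	for name, sum := range sums {
		means[name] = sum / float64(counts[name])
	}
	return means, nil
}

func main() {
	means, err := meanValidScores(sampleCSV)
	if err != nil {
		panic(err)
	}
	names := make([]string, 0, len(means))
	for name := range means {
		names = append(names, name)
	}
	sort.Strings(names) // print groups in a stable order
	for _, name := range names {
		fmt.Printf("%s: %v\n", name, means[name])
	}
}
```

With the sample data, the valid scores are 8 and 10 for jane doe and 5 and 7 for john doe, so the group means are 9 and 6.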

Usage

Constructor:

Series

s := tada.NewSeries([]float64{1, 2, 3})

with one level of labels

s := tada.NewSeries([]float64{1, 2, 3}, []string{"foo", "bar", "baz"})

DataFrame

df := tada.NewDataFrame([]interface{}{
  []string{"a"}, 
  []float64{100},
}).SetColNames([]string{"foo", "bar"})

Reading from CSV

f, err := os.Open("foo.csv")
if err != nil {
	log.Fatal(err)
}
defer f.Close()
df, err := tada.ReadCSV(f)
if err != nil {
	log.Fatal(err)
}

Performance Tuning

- Modify a Series or DataFrame in place (without returning a new copy) by first calling InPlace().
- If you expect to use a column as numeric, string, or time.Time values multiple times, Cast() it to tada.Float64, tada.String, or tada.DateTime, respectively.

Inter-process communication (IPC)

Apache Arrow
- Read from existing Pandas dataframes using the Apache Arrow specification.
- Because the go/arrow library is still not v1.0, convenience functions and patterns are versioned in a separate repo.

Documentation ¶

Overview ¶

Package tada (TAble DAta) enables test-driven data pipelines.

tada combines concepts from pandas, spreadsheets, R, Apache Spark, and SQL. Its most common use cases are cleaning, aggregating, transforming, and analyzing data. Some notable features of tada:

* flexible constructor that supports most primitive data types

* seamlessly handles null data and type conversions

* robust datetime support

* advanced filtering, lookups and merging, grouping, sorting, and pivoting

* multi-level labels and columns

* complete test coverage

* interoperable with existing pandas dataframes via Apache Arrow

The key data types are Series, DataFrames, and groupings of each. A Series is analogous to one column of a spreadsheet, and a DataFrame is analogous to a whole spreadsheet. Printing either data type will render an ASCII table.

Both Series and DataFrames have one or more "label levels". On printing, these appear as the leftmost columns in a table, and typically have values that help identify ("label") specific rows. They are analogous to the "index" concept in pandas.

For more detail and implementation notes, see https://docs.google.com/document/d/18DvZzd6Tg6Bz0SX0fY2SrXOjE8d9xDhU6bDEnaIc_rM/

Index ¶

func DisableWarnings()
func EnableWarnings()
func EqualDataFrames(a, b *DataFrame) bool
func EqualSeries(a, b *Series) bool
func GetOptionDefaultNullStrings() []string
func JoinOptionHow(how string) func(*joinConfig)
func JoinOptionLeftOn(keys []string) func(*joinConfig)
func JoinOptionRightOn(keys []string) func(*joinConfig)
func MakeMultiLevelLabels(labels []interface{}) ([]interface{}, error)
func PrettyDiff(got, want interface{}) (bool, *tablediff.Differences, error)
func PrintOptionMaxCellWidth(n int)
func PrintOptionMaxColumns(n int)
func PrintOptionMaxRows(n int)
func PrintOptionMergeRepeats(set bool)
func PrintOptionWrapLines(set bool)
func ReadOptionDelimiter(sep rune) func(*readConfig)
func ReadOptionHeaders(n int) func(*readConfig)
func ReadOptionLabels(n int) func(*readConfig)
func ReadOptionSwitchDims() func(*readConfig)
func SetOptionAddTimeFormat(format string)
func SetOptionDefaultSeparator(sep string)
func SetOptionNaNStatus(set bool)
func SetOptionNullStrings(list []string)
func WriteMockCSV(w io.Writer, n int, r io.Reader, options ...ReadOption) error
func WriteOptionDelimiter(sep rune) func(*writeConfig)
func WriteOptionExcludeLabels() func(*writeConfig)
type ApplyFn
type Binner
type DType
type DataFrame
- func ConcatSeries(series ...*Series) (*DataFrame, error)
- func NewDataFrame(slices []interface{}, labels ...interface{}) *DataFrame
- func ReadCSV(r io.Reader, options ...ReadOption) (*DataFrame, error)
- func ReadCSVFromRecords(records [][]string, options ...ReadOption) (ret *DataFrame, err error)
- func ReadInterfaceRecords(records [][]interface{}, options ...ReadOption) (ret *DataFrame, err error)
- func ReadMatrix(mat Matrix) *DataFrame
- func ReadStruct(strct interface{}, options ...ReadOption) (*DataFrame, error)
- func ReadStructSlice(slice interface{}) (*DataFrame, error)
- func (df *DataFrame) Append(other *DataFrame) *DataFrame
- func (df *DataFrame) Apply(lambdas map[string]ApplyFn) *DataFrame
- func (df *DataFrame) At(row, column int) *Element
- func (df *DataFrame) CSVRecords(options ...WriteOption) [][]string
- func (df *DataFrame) Cast(containerAsType map[string]DType)
- func (df *DataFrame) Col(name string) *Series
- func (df *DataFrame) Cols(names ...string) *DataFrame
- func (df *DataFrame) Copy() *DataFrame
- func (df *DataFrame) Count() *Series
- func (df *DataFrame) DeduplicateNames() *DataFrame
- func (df *DataFrame) DropCol(name string) *DataFrame
- func (df *DataFrame) DropLabels(name string) *DataFrame
- func (df *DataFrame) DropNull(subset ...string) *DataFrame
- func (df *DataFrame) DropRow(index int) *DataFrame
- func (df *DataFrame) EqualsCSV(includeLabels bool, want io.Reader, wantOptions ...ReadOption) (bool, *tablediff.Differences, error)
- func (df *DataFrame) Err() error
- func (df *DataFrame) FillNull(how map[string]NullFiller) *DataFrame
- func (df *DataFrame) Filter(filters map[string]FilterFn) *DataFrame
- func (df *DataFrame) FilterByValue(filters map[string]interface{}) *DataFrame
- func (df *DataFrame) FilterCols(lambda func(string) bool, level int) *DataFrame
- func (df *DataFrame) FilterIndex(container string, filterFn FilterFn) []int
- func (df *DataFrame) GetLabels() []interface{}
- func (df *DataFrame) GroupBy(names ...string) *GroupedDataFrame
- func (df *DataFrame) HasCols(colNames ...string) error
- func (df *DataFrame) HasLabels(labelNames ...string) error
- func (df *DataFrame) HasType(sliceType string) (labelIndex, columnIndex []int)
- func (df *DataFrame) Head(n int) *DataFrame
- func (df *DataFrame) InPlace() *DataFrameMutator
- func (df *DataFrame) IndexOfContainer(name string, columns bool) int
- func (df *DataFrame) InterfaceRecords(options ...WriteOption) [][]interface{}
- func (df *DataFrame) IsNull(subset ...string) *DataFrame
- func (df *DataFrame) Iterator() *DataFrameIterator
- func (df *DataFrame) LabelsAsSeries(name string) *Series
- func (df *DataFrame) Len() int
- func (df *DataFrame) ListColNames() []string
- func (df *DataFrame) ListColNamesAtLevel(level int) []string
- func (df *DataFrame) ListLabelNames() []string
- func (df *DataFrame) Lookup(other *DataFrame, options ...JoinOption) (*DataFrame, error)
- func (df *DataFrame) Max() *Series
- func (df *DataFrame) Mean() *Series
- func (df *DataFrame) Median() *Series
- func (df *DataFrame) Merge(other *DataFrame, options ...JoinOption) (*DataFrame, error)
- func (df *DataFrame) Min() *Series
- func (df *DataFrame) NUnique() *Series
- func (df *DataFrame) Name() string
- func (df *DataFrame) NameOfCol(n int) string
- func (df *DataFrame) NameOfLabel(n int) string
- func (df *DataFrame) NumColumns() int
- func (df *DataFrame) NumLevels() int
- func (df *DataFrame) PivotTable(labels, columns, values, aggFunc string) (*DataFrame, error)
- func (df *DataFrame) PromoteToColLevel(name string) *DataFrame
- func (df *DataFrame) Range(first, last int) *DataFrame
- func (df *DataFrame) Reduce(name string, lambda ReduceFn) (*Series, error)
- func (df *DataFrame) Relabel() *DataFrame
- func (df *DataFrame) ReorderCols(colNames []string) *DataFrame
- func (df *DataFrame) ReorderLabels(levelNames []string) *DataFrame
- func (df *DataFrame) Resample(how map[string]Resampler) *DataFrame
- func (df *DataFrame) ResetLabels(index ...string) *DataFrame
- func (df *DataFrame) Series() *Series
- func (df *DataFrame) SetAsLabels(colNames ...string) *DataFrame
- func (df *DataFrame) SetColNames(colNames []string) *DataFrame
- func (df *DataFrame) SetLabelNames(levelNames []string) *DataFrame
- func (df *DataFrame) SetName(name string) *DataFrame
- func (df *DataFrame) SetNulls(n int, nulls []bool) error
- func (df *DataFrame) SetRows(lambda ApplyFn, container string, rows []int) *DataFrame
- func (df *DataFrame) Shuffle(seed int64) *DataFrame
- func (df *DataFrame) Sort(by ...Sorter) *DataFrame
- func (df *DataFrame) StdDev() *Series
- func (df *DataFrame) String() string
- func (df *DataFrame) Struct(structPointer interface{}, options ...WriteOption) error
- func (df *DataFrame) Subset(index []int) *DataFrame
- func (df *DataFrame) SubsetCols(index []int) *DataFrame
- func (df *DataFrame) SubsetLabels(index []int) *DataFrame
- func (df *DataFrame) Sum() *Series
- func (df *DataFrame) SumCols(name string, colNames ...string) (*Series, error)
- func (df *DataFrame) SwapLabels(i, j string) *DataFrame
- func (df *DataFrame) Tail(n int) *DataFrame
- func (df *DataFrame) Transpose() *DataFrame
- func (df *DataFrame) Where(filters map[string]FilterFn, ifTrue, ifFalse interface{}) (*Series, error)
- func (df *DataFrame) WithCol(name string, input interface{}) *DataFrame
- func (df *DataFrame) WithLabels(name string, input interface{}) *DataFrame
- func (df *DataFrame) WriteCSV(w io.Writer, options ...WriteOption) error
type DataFrameIterator
- func (iter *DataFrameIterator) Next() bool
- func (iter *DataFrameIterator) Row() map[string]Element
type DataFrameMutator
- func (df *DataFrameMutator) Append(other *DataFrame) error
- func (df *DataFrameMutator) Apply(lambdas map[string]ApplyFn) error
- func (df *DataFrameMutator) DeduplicateNames()
- func (df *DataFrameMutator) DropCol(name string) error
- func (df *DataFrameMutator) DropLabels(name string) error
- func (df *DataFrameMutator) DropNull(subset ...string) error
- func (df *DataFrameMutator) DropRow(index int) error
- func (df *DataFrameMutator) FillNull(how map[string]NullFiller) error
- func (df *DataFrameMutator) Filter(filters map[string]FilterFn) error
- func (df *DataFrameMutator) FilterByValue(filters map[string]interface{}) error
- func (df *DataFrameMutator) FilterCols(lambda func(string) bool, level int) error
- func (df *DataFrameMutator) IsNull(subset ...string) error
- func (df *DataFrameMutator) Range(first, last int) error
- func (df *DataFrameMutator) Relabel()
- func (df *DataFrameMutator) ReorderCols(colNames []string) error
- func (df *DataFrameMutator) ReorderLabels(levelNames []string) error
- func (df *DataFrameMutator) Resample(how map[string]Resampler) error
- func (df *DataFrameMutator) ResetLabels(labelLevels ...string) error
- func (df *DataFrameMutator) SetAsLabels(colNames ...string)
- func (df *DataFrameMutator) SetRows(lambda ApplyFn, container string, rows []int) error
- func (df *DataFrameMutator) Shuffle(seed int64)
- func (df *DataFrameMutator) Sort(by ...Sorter) error
- func (df *DataFrameMutator) Subset(index []int) error
- func (df *DataFrameMutator) SubsetCols(index []int) error
- func (df *DataFrameMutator) SubsetLabels(index []int) error
- func (df *DataFrameMutator) SwapLabels(i, j string) error
- func (df *DataFrameMutator) WithCol(name string, input interface{}) error
- func (df *DataFrameMutator) WithLabels(name string, input interface{}) error
type Element
type FilterFn
type GroupedDataFrame
- func (g *GroupedDataFrame) Apply(cols []string, lambda ApplyFn) *GroupedDataFrame
- func (g *GroupedDataFrame) Col(colName string) *GroupedSeries
- func (g *GroupedDataFrame) Count(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) DataFrame() *DataFrame
- func (g *GroupedDataFrame) Earliest(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Err() error
- func (g *GroupedDataFrame) First(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) GetGroup(group string) *DataFrame
- func (g *GroupedDataFrame) GetLabels() []interface{}
- func (g *GroupedDataFrame) HavingCount(lambda func(int) bool) *GroupedDataFrame
- func (g *GroupedDataFrame) Iterator() *GroupedDataFrameIterator
- func (g *GroupedDataFrame) Last(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Latest(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Len() int
- func (g *GroupedDataFrame) ListGroups() []string
- func (g *GroupedDataFrame) Max(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Mean(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Median(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Min(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) NUnique(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Nth(index int, colNames ...string) *DataFrame
- func (g *GroupedDataFrame) Reduce(name string, cols []string, lambda ReduceFn) *DataFrame
- func (g *GroupedDataFrame) StdDev(colNames ...string) *DataFrame
- func (g *GroupedDataFrame) String() string
- func (g *GroupedDataFrame) Sum(colNames ...string) *DataFrame
type GroupedDataFrameIterator
- func (g *GroupedDataFrameIterator) DataFrame() *DataFrame
- func (g *GroupedDataFrameIterator) Next() bool
type GroupedSeries
- func (g *GroupedSeries) Align() *GroupedSeries
- func (g *GroupedSeries) Apply(lambda ApplyFn) *GroupedSeries
- func (g *GroupedSeries) Count() *Series
- func (g *GroupedSeries) Earliest() *Series
- func (g *GroupedSeries) Err() error
- func (g *GroupedSeries) First() *Series
- func (g *GroupedSeries) GetGroup(group string) *Series
- func (g *GroupedSeries) GetLabels() []interface{}
- func (g *GroupedSeries) HavingCount(lambda func(int) bool) *GroupedSeries
- func (g *GroupedSeries) Iterator() *GroupedSeriesIterator
- func (g *GroupedSeries) Last() *Series
- func (g *GroupedSeries) Latest() *Series
- func (g *GroupedSeries) Len() int
- func (g *GroupedSeries) ListGroups() []string
- func (g *GroupedSeries) Max() *Series
- func (g *GroupedSeries) Mean() *Series
- func (g *GroupedSeries) Median() *Series
- func (g *GroupedSeries) Min() *Series
- func (g *GroupedSeries) NUnique() *Series
- func (g *GroupedSeries) Nth(n int) *Series
- func (g *GroupedSeries) Reduce(name string, lambda ReduceFn) *Series
- func (g *GroupedSeries) Series() *Series
- func (g *GroupedSeries) StdDev() *Series
- func (g *GroupedSeries) String() string
- func (g *GroupedSeries) Sum() *Series
type GroupedSeriesIterator
- func (g *GroupedSeriesIterator) Next() bool
- func (g *GroupedSeriesIterator) Series() *Series
type JoinOption
type Matrix
type NullFiller
type ReadOption
type ReduceFn
type Resampler
type Series
- func NewSeries(slice interface{}, labels ...interface{}) *Series
- func (s *Series) Add(other *Series, ignoreNulls bool) *Series
- func (s *Series) Append(other *Series) *Series
- func (s *Series) Apply(lambda ApplyFn) *Series
- func (s *Series) At(index int) *Element
- func (s *Series) Bin(bins []float64, config *Binner) (*Series, error)
- func (s *Series) CSV(options ...WriteOption) ([][]string, error)
- func (s *Series) Cast(containerAsType map[string]DType)
- func (s *Series) Copy() *Series
- func (s *Series) Count() int
- func (s *Series) CumSum() *Series
- func (s *Series) DataFrame() *DataFrame
- func (s *Series) Divide(other *Series, ignoreNulls bool) *Series
- func (s *Series) DropLabels(name string) *Series
- func (s *Series) DropNull() *Series
- func (s *Series) DropRow(index int) *Series
- func (s *Series) Earliest() time.Time
- func (s *Series) EqualsCSV(includeLabels bool, want io.Reader, wantOptions ...ReadOption) (bool, *tablediff.Differences, error)
- func (s *Series) Err() error
- func (s *Series) FillNull(how NullFiller) *Series
- func (s *Series) Filter(filters map[string]FilterFn) *Series
- func (s *Series) FilterByValue(filters map[string]interface{}) *Series
- func (s *Series) FilterIndex(container string, filterFn FilterFn) []int
- func (s *Series) GetLabels() []interface{}
- func (s *Series) GetNulls() []bool
- func (s *Series) GetValues() interface{}
- func (s *Series) GetValuesAsFloat64() []float64
- func (s *Series) GetValuesAsString() []string
- func (s *Series) GetValuesAsTime() []time.Time
- func (s *Series) GroupBy(names ...string) *GroupedSeries
- func (s *Series) HasLabels(labelNames ...string) error
- func (s *Series) Head(n int) *Series
- func (s *Series) InPlace() *SeriesMutator
- func (s *Series) IndexOfLabel(name string) int
- func (s *Series) IsNull() *Series
- func (s *Series) Iterator() *SeriesIterator
- func (s *Series) LabelsAsSeries(name string) *Series
- func (s *Series) Latest() time.Time
- func (s *Series) Len() int
- func (s *Series) ListLabelNames() []string
- func (s *Series) Lookup(other *Series, options ...JoinOption) (*Series, error)
- func (s *Series) Max() float64
- func (s *Series) Mean() float64
- func (s *Series) Median() float64
- func (s *Series) Merge(other *Series, options ...JoinOption) (*DataFrame, error)
- func (s *Series) Min() float64
- func (s *Series) Multiply(other *Series, ignoreNulls bool) *Series
- func (s *Series) NUnique() int
- func (s *Series) Name() string
- func (s *Series) NameOfLabel(n int) string
- func (s *Series) Percentile() *Series
- func (s *Series) PercentileBin(bins []float64, config *Binner) (*Series, error)
- func (s *Series) Range(first, last int) *Series
- func (s *Series) Rank() *Series
- func (s *Series) Reduce(lambda ReduceFn) (value interface{}, isNull bool)
- func (s *Series) Relabel() *Series
- func (s *Series) Resample(by Resampler) *Series
- func (s *Series) RollingDuration(d time.Duration) *GroupedSeries
- func (s *Series) RollingN(n int) *GroupedSeries
- func (s *Series) SetLabelNames(levelNames []string) *Series
- func (s *Series) SetName(name string) *Series
- func (s *Series) SetRows(lambda ApplyFn, rows []int) *Series
- func (s *Series) Shift(n int) *Series
- func (s *Series) Shuffle(seed int64) *Series
- func (s *Series) Sort(by ...Sorter) *Series
- func (s *Series) StdDev() float64
- func (s *Series) String() string
- func (s *Series) Struct(structPointer interface{}, options ...WriteOption) error
- func (s *Series) Subset(index []int) *Series
- func (s *Series) SubsetLabels(index []int) *Series
- func (s *Series) Subtract(other *Series, ignoreNulls bool) *Series
- func (s *Series) Sum() float64
- func (s *Series) SwapLabels(i, j string) *Series
- func (s *Series) Tail(n int) *Series
- func (s *Series) Type() reflect.Type
- func (s *Series) Unique(includeLabels bool) *Series
- func (s *Series) ValueCounts() map[string]int
- func (s *Series) Where(filters map[string]FilterFn, ifTrue, ifFalse interface{}) (*Series, error)
- func (s *Series) WithLabels(name string, input interface{}) *Series
- func (s *Series) WithValues(input interface{}) *Series
- func (s *Series) WriteCSV(w io.Writer, options ...WriteOption) error
type SeriesIterator
- func (iter *SeriesIterator) Next() bool
- func (iter *SeriesIterator) Row() map[string]Element
type SeriesMutator
- func (s *SeriesMutator) Append(other *Series) error
- func (s *SeriesMutator) Apply(lambda ApplyFn) error
- func (s *SeriesMutator) DropLabels(name string) error
- func (s *SeriesMutator) DropNull()
- func (s *SeriesMutator) DropRow(index int) error
- func (s *SeriesMutator) FillNull(how NullFiller)
- func (s *SeriesMutator) Filter(filters map[string]FilterFn) error
- func (s *SeriesMutator) FilterByValue(filters map[string]interface{}) error
- func (s *SeriesMutator) Relabel()
- func (s *SeriesMutator) Resample(by Resampler)
- func (s *SeriesMutator) SetRows(lambda ApplyFn, rows []int) error
- func (s *SeriesMutator) Shift(n int)
- func (s *SeriesMutator) Shuffle(seed int64)
- func (s *SeriesMutator) Sort(by ...Sorter) error
- func (s *SeriesMutator) Subset(index []int) error
- func (s *SeriesMutator) SubsetLabels(index []int) error
- func (s *SeriesMutator) SwapLabels(i, j string) error
- func (s *SeriesMutator) WithLabels(name string, input interface{}) error
- func (s *SeriesMutator) WithValues(input interface{}) error
type Sorter
type StructTransposer
- func (st StructTransposer) Shuffle(seed int64)
- func (st StructTransposer) Transpose(structPointer interface{}) error
type WriteOption

Examples ¶

DataFrame
DataFrame.Filter
DataFrame.GroupBy
DataFrame.SetColNames
DataFrame.SetLabelNames
DataFrame.SetLabelNames (Multiple)
DataFrame.Sort
DataFrame.Struct
DataFrame.Struct (WithNulls)
DataFrame.Where
DataFrame.WithCol (Append)
DataFrame.WithCol (Overwrite)
DataFrame.WithCol (Rename)
DataFrameMutator.WithCol (Rename)
GroupedSeries.Align (Mean)
GroupedSeries.Apply
GroupedSeries.Apply (Align)
GroupedSeries.HavingCount (Sum)
GroupedSeries.Mean
GroupedSeries.Reduce
PrintOptionMaxCellWidth
PrintOptionMaxColumns
PrintOptionMaxRows
ReadCSV
ReadCSV (Delimiter)
ReadCSV (MultipleHeaders)
ReadCSV (MultipleHeadersWithLabels)
ReadCSV (NoHeaders)
ReadCSV (WithLabels)
ReadCSVFromRecords
ReadCSVFromRecords (ColsAsMajorDimension)
Series
Series (NestedSlice)
Series (SetNaNStatus)
Series (SetSentinelNulls)
Series (Zscore)
Series.Apply (Float64)
Series.Bin
Series.Bin (AndMore)
Series.Bin (CustomLabels)
Series.Cast (Date)
Series.Cast (Time)
Series.GroupBy
Series.GroupBy (CompoundGroup)
Series.Lookup
Series.Lookup (WithOptions)
Series.Merge
Series.Merge (WithOptions)
Series.PercentileBin
Series.PercentileBin (CustomLabels)
Series.Resample (ByHalfHour)
Series.Resample (ByHour)
Series.Resample (ByMonth)
Series.Resample (ByWeek)

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func DisableWarnings ¶

func DisableWarnings()

DisableWarnings prevents tada from writing warning messages to the default log writer.

func EnableWarnings ¶

func EnableWarnings()

EnableWarnings allows tada to write warning messages to the default log writer.

func EqualDataFrames ¶

func EqualDataFrames(a, b *DataFrame) bool

EqualDataFrames returns whether two dataframes are identical or not.

func EqualSeries ¶

func EqualSeries(a, b *Series) bool

EqualSeries returns whether two Series are identical or not.

func GetOptionDefaultNullStrings ¶ added in v0.6.0

func GetOptionDefaultNullStrings() []string

GetOptionDefaultNullStrings returns the default list of strings that tada considers null.

func JoinOptionHow ¶ added in v0.6.0

func JoinOptionHow(how string) func(*joinConfig)

JoinOptionHow specifies how to join two Series or DataFrames. Supported options: left (i.e. left join), right, inner (default: left).

func JoinOptionLeftOn ¶ added in v0.6.0

func JoinOptionLeftOn(keys []string) func(*joinConfig)

JoinOptionLeftOn specifies the key(s) to use to join the left Series/DataFrame. Keys must be existing container names (either label level or column names). Default: no keys are specified, so shared label names are used automatically as keys.

func JoinOptionRightOn ¶ added in v0.6.0

func JoinOptionRightOn(keys []string) func(*joinConfig)

JoinOptionRightOn specifies the key(s) to use to join the right Series/DataFrame. Keys must be existing container names (either label level or column names). Default: no keys are specified, so shared label names are used automatically as keys.

func MakeMultiLevelLabels ¶

func MakeMultiLevelLabels(labels []interface{}) ([]interface{}, error)

MakeMultiLevelLabels expects labels to be a slice of slices. It returns a product of these slices by repeating each label value n times, where n is the number of unique label values in the other slices.

For example, [["foo", "bar"], [1, 2, 3]] returns [["foo", "foo", "foo", "bar", "bar", "bar"], [1, 2, 3, 1, 2, 3]]
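
The product described above can be sketched with the standard library. This is an illustration of the expansion rule only, not tada's implementation: productLabels is a hypothetical helper, and it works on string slices rather than the []interface{} that MakeMultiLevelLabels accepts.

```go
package main

import "fmt"

// productLabels returns one expanded slice per input level so that, read
// row by row, the levels enumerate every combination of the input values.
// Earlier levels vary slowest, as in the documented example.
func productLabels(levels [][]string) [][]string {
	total := 1
	for _, level := range levels {
		total *= len(level)
	}
	out := make([][]string, len(levels))
	block := total // consecutive rows that share one value at the current level
	for i, level := range levels {
		out[i] = make([]string, 0, total)
		if total == 0 {
			continue // at least one level is empty, so there are no rows
		}
		block /= len(level)
		for len(out[i]) < total {
			for _, v := range level {
				for k := 0; k < block; k++ {
					out[i] = append(out[i], v)
				}
			}
		}
	}
	return out
}

func main() {
	labels := productLabels([][]string{{"foo", "bar"}, {"1", "2", "3"}})
	fmt.Println(labels[0]) // [foo foo foo bar bar bar]
	fmt.Println(labels[1]) // [1 2 3 1 2 3]
}
```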

func PrettyDiff ¶ added in v0.8.6

func PrettyDiff(got, want interface{}) (bool, *tablediff.Differences, error)

PrettyDiff reads two structs into DataFrames, prints each as a stringified csv table, and returns whether they are equal. If not, returns the differences between the two.

func PrintOptionMaxCellWidth ¶ added in v0.7.6

func PrintOptionMaxCellWidth(n int)

PrintOptionMaxCellWidth changes the max rune width of any cell displayed when printing a Series or DataFrame to n (default: 30).

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]string{"corgilius", "barrius", "foo"},
	}).SetColNames([]string{"waldonius"})
	tada.PrintOptionMaxCellWidth(5)
	fmt.Println(df)
	tada.PrintOptionMaxCellWidth(30)
}

Output:

+---++-------+
| - || wa... |
|---||-------|
| 0 || co... |
| 1 || ba... |
| 2 ||   foo |
+---++-------+
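
The output above is consistent with a simple rule: a cell wider than n runes keeps its first n-3 runes followed by "...". A minimal sketch of that rule, assuming this is roughly what happens (truncateCell is a hypothetical helper, not part of tada):

```go
package main

import "fmt"

// truncateCell shortens s to at most max runes, replacing the tail with
// "..." when s does not fit. Illustrative rule only, not tada's code.
func truncateCell(s string, max int) string {
	const ellipsis = "..."
	if max <= 0 {
		return ""
	}
	runes := []rune(s)
	if len(runes) <= max {
		return s
	}
	if max <= len(ellipsis) {
		return ellipsis[:max]
	}
	return string(runes[:max-len(ellipsis)]) + ellipsis
}

func main() {
	for _, cell := range []string{"waldonius", "corgilius", "barrius", "foo"} {
		fmt.Printf("%q -> %q\n", cell, truncateCell(cell, 5))
	}
}
```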

func PrintOptionMaxColumns ¶ added in v0.1.0

func PrintOptionMaxColumns(n int)

PrintOptionMaxColumns changes the max number of columns displayed when printing a Series or DataFrame to n (default: 20).

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}, []float64{3, 4}, []float64{5, 6},
		[]float64{3, 4}, []float64{5, 6},
	}).SetColNames([]string{"A", "B", "C", "D", "E"})
	tada.PrintOptionMaxColumns(2)
	fmt.Println(df)
	tada.PrintOptionMaxColumns(20)
}

Output:

+---++---+-----+---+
| - || A | ... | E |
|---||---|-----|---|
| 0 || 1 | ... | 5 |
| 1 || 2 |     | 6 |
+---++---+-----+---+

func PrintOptionMaxRows ¶ added in v0.1.0

func PrintOptionMaxRows(n int)

PrintOptionMaxRows changes the max number of rows displayed when printing a Series or DataFrame to n (default: 50).

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2, 3, 4, 5, 6, 7, 8}}).SetColNames([]string{"A"})
	tada.PrintOptionMaxRows(6)
	fmt.Println(df)
	tada.PrintOptionMaxRows(50)
}

Output:

+-----++-----+
|  -  ||  A  |
|-----||-----|
|   0 ||   1 |
|   1 ||   2 |
|   2 ||   3 |
| ... || ... |
|   5 ||   6 |
|   6 ||   7 |
|   7 ||   8 |
+-----++-----+

func PrintOptionMergeRepeats ¶ added in v0.1.0

func PrintOptionMergeRepeats(set bool)

PrintOptionMergeRepeats (if true) instructs the String() function to merge repeated non-header values when printing a Series or DataFrame (default: true).

func PrintOptionWrapLines ¶ added in v0.1.0

func PrintOptionWrapLines(set bool)

PrintOptionWrapLines (if true) instructs the String() function to wrap overly-wide rows onto new lines instead of truncating them when printing a Series or DataFrame (default: truncate).

func ReadOptionDelimiter ¶ added in v0.1.0

func ReadOptionDelimiter(sep rune) func(*readConfig)

ReadOptionDelimiter configures a read function to use sep as the field delimiter in ReadCSV (default: ",").

func ReadOptionHeaders ¶ added in v0.1.0

func ReadOptionHeaders(n int) func(*readConfig)

ReadOptionHeaders configures a read function to expect n rows to be column headers (default: 1).

func ReadOptionLabels ¶ added in v0.1.0

func ReadOptionLabels(n int) func(*readConfig)

ReadOptionLabels configures a read function to expect the first n columns to be label levels (default: 0).
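
The ReadOption functions each return a func(*readConfig), which is the shape used by the functional options pattern. The sketch below shows that pattern in isolation. Since readConfig is unexported and its fields are not documented here, readSettings and all of the with* helpers are hypothetical stand-ins; only the defaults (comma delimiter, 1 header row, 0 label levels) come from the descriptions above.

```go
package main

import "fmt"

// readSettings is a hypothetical stand-in for tada's unexported readConfig.
type readSettings struct {
	delimiter  rune
	headerRows int
	labelCols  int
}

// option mutates a readSettings value. It has the same shape as the
// func(*readConfig) values returned by the ReadOption functions.
type option func(*readSettings)

func withDelimiter(sep rune) option {
	return func(c *readSettings) { c.delimiter = sep }
}

func withHeaders(n int) option {
	return func(c *readSettings) { c.headerRows = n }
}

func withLabels(n int) option {
	return func(c *readSettings) { c.labelCols = n }
}

// newReadSettings starts from the documented defaults and applies
// each option in the order supplied.
func newReadSettings(opts ...option) readSettings {
	c := readSettings{delimiter: ',', headerRows: 1, labelCols: 0}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

func main() {
	c := newReadSettings(withDelimiter('|'), withLabels(1))
	fmt.Printf("delimiter=%q headers=%d labels=%d\n", c.delimiter, c.headerRows, c.labelCols)
}
```

A caller that passes no options gets the defaults, and each option overrides only the field it names, which is why the read functions can accept a variadic options ...ReadOption parameter.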

func ReadOptionSwitchDims ¶ added in v0.1.0

func ReadOptionSwitchDims() func(*readConfig)

ReadOptionSwitchDims configures a read function to expect columns to be the major dimension of csv data (default: expects rows to be the major dimension). For example, when reading this data:

[["foo", "bar"], ["baz", "qux"]]

default                      ReadOptionSwitchDims()
(major dimension: rows)      (major dimension: columns)

foo bar                      foo baz
baz qux                      bar qux
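
Reading column-major data amounts to transposing the records before treating them as rows. A minimal standard-library sketch of that step for the [["foo", "bar"], ["baz", "qux"]] input follows; switchDims is a hypothetical helper, not the function tada uses.

```go
package main

import "fmt"

// switchDims transposes records so that a column-major input becomes
// row-major. It assumes every inner slice has the same length.
func switchDims(records [][]string) [][]string {
	if len(records) == 0 {
		return nil
	}
	out := make([][]string, len(records[0]))
	for j := range out {
		out[j] = make([]string, len(records))
		for i := range records {
			out[j][i] = records[i][j]
		}
	}
	return out
}

func main() {
	records := [][]string{{"foo", "bar"}, {"baz", "qux"}}
	fmt.Println(records)             // rows as major dimension: [[foo bar] [baz qux]]
	fmt.Println(switchDims(records)) // columns as major dimension: [[foo baz] [bar qux]]
}
```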

func SetOptionAddTimeFormat ¶

func SetOptionAddTimeFormat(format string)

SetOptionAddTimeFormat adds format to the list of time formats that can be parsed when converting values from string to time.Time.

func SetOptionDefaultSeparator ¶ added in v0.1.0

func SetOptionDefaultSeparator(sep string)

SetOptionDefaultSeparator changes the separator used in group names and multi-level column names to sep (default: "|").

func SetOptionNaNStatus ¶ added in v0.6.0

func SetOptionNaNStatus(set bool)

SetOptionNaNStatus sets whether math.NaN() is considered a null value or not (default: true).

func SetOptionNullStrings ¶ added in v0.6.0

func SetOptionNullStrings(list []string)

SetOptionNullStrings replaces the default list of strings that tada considers null with list.

func WriteMockCSV ¶

func WriteMockCSV(w io.Writer, n int, r io.Reader, options ...ReadOption) error

WriteMockCSV reads r (configured by options) and writes n mock rows to w, with column names and types inferred from the data in r. Regardless of the major dimension of r, the major dimension of the output is rows. Available options: ReadOptionHeaders, ReadOptionLabels, ReadOptionSwitchDims.

Default if no options are supplied: 1 header row, no labels, rows as major dimension

func WriteOptionDelimiter ¶ added in v0.4.0

func WriteOptionDelimiter(sep rune) func(*writeConfig)

WriteOptionDelimiter configures a write function to use sep as a field delimiter for use in write functions (default: ",").

func WriteOptionExcludeLabels ¶ added in v0.4.0

func WriteOptionExcludeLabels() func(*writeConfig)

WriteOptionExcludeLabels excludes the label levels from the output.

Types ¶

type ApplyFn ¶

type ApplyFn func(slice interface{}, isNull []bool) (equalLengthSlice interface{})

An ApplyFn is an anonymous function supplied to an Apply function to convert one slice to another. The function input will be a slice, and it must return a slice of equal length (though the type may be different). isNull contains the null status of every row in the input slice. The null status of a row may be changed by setting that row's isNull element within the function body.
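As a rough illustration of the contract described above, the sketch below declares a local type with the same signature as ApplyFn (so it compiles without importing tada) and a function that changes the slice type and sets null status in place. The describe function and its fallback for unexpected input types are assumptions of this sketch.

```go
package main

import "fmt"

// applyFn has the same shape as the ApplyFn type documented above; it is
// redeclared locally so that this sketch compiles without importing tada.
type applyFn func(slice interface{}, isNull []bool) interface{}

// describe converts a []float64 into a []string of equal length and marks
// negative inputs as null by writing to isNull in place.
var describe applyFn = func(slice interface{}, isNull []bool) interface{} {
	vals, ok := slice.([]float64)
	if !ok {
		// Unexpected input type: return it unchanged (an assumption of
		// this sketch, not documented tada behavior).
		return slice
	}
	out := make([]string, len(vals))
	for i, v := range vals {
		if v < 0 {
			isNull[i] = true
			continue
		}
		out[i] = fmt.Sprintf("%.1f", v)
	}
	return out
}

func main() {
	isNull := []bool{false, false, false}
	out := describe([]float64{1, -2, 3.5}, isNull)
	fmt.Println(out.([]string)[0], isNull) // 1.0 [false true false]
}
```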

type Binner ¶ added in v0.4.9

type Binner struct {
	AndLess bool
	AndMore bool
	Labels  []string
}

Binner supplies logic for the Bin() function. If `AndLess` is true, a bin is added that ranges between negative infinity and the first bin value. If `AndMore` is true, a bin is added that ranges between the last bin value and positive infinity. If `Labels` is not nil, then category names correspond to labels, and the number of labels must be one less than the number of bin values. Otherwise, category names are auto-generated from the range of the bin intervals.

type DType ¶

type DType int

DType is a DataType that may be used in Sort() or Cast().

const (
	// Float64 -> float64
	Float64 DType = iota
	// String -> string
	String
	// DateTime -> time.Time
	DateTime // always tz-aware
	// Time -> civil.Time
	Time
	// Date -> civil.Date
	Date
)

type DataFrame ¶

type DataFrame struct {
	// contains filtered or unexported fields
}

A DataFrame is one or more columns of data with one or more levels of aligned labels. A DataFrame is analogous to a spreadsheet.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}, []string{"baz", "qux"}},
	).SetName("foo")
	fmt.Println(df)
}

Output:

+---++---+-----+
| - || 0 |  1  |
|---||---|-----|
| 0 || 1 | baz |
| 1 || 2 | qux |
+---++---+-----+
name: foo

func ConcatSeries ¶

func ConcatSeries(series ...*Series) (*DataFrame, error)

ConcatSeries merges multiple Series from left-to-right, one after the other, via left joins on shared keys. For advanced cases, use df.LookupAdvanced() + df.WithCol().

func NewDataFrame ¶

func NewDataFrame(slices []interface{}, labels ...interface{}) *DataFrame

NewDataFrame creates a new DataFrame with slices (akin to column values) and optional labels. Every element of slices must be a supported slice type, and each label must be a supported slice type.

If no labels are supplied, a default label level is inserted ([]int incrementing from 0). Columns are named sequentially (e.g., 0, 1, etc) by default. Default column names are displayed on printing. Label levels are named *n (e.g., *0, *1, etc) by default. Default label names are hidden on printing.

Supported slice types: all variants of []float, []int, & []uint, []string, []bool, []time.Time, []interface{}, and 2-dimensional variants of each (e.g., [][]string, [][]float64).

func ReadCSV ¶

func ReadCSV(r io.Reader, options ...ReadOption) (*DataFrame, error)

ReadCSV reads csv records in r into a Dataframe (configured by options). Rows must be the major dimension of r. For advanced cases, use the standard csv library NewReader().ReadAll() + tada.ReadCSVFromRecords(). Available options: ReadOptionHeaders, ReadOptionLabels, ReadOptionDelimiter.

Default if no options are supplied: 1 header row; no labels; field delimiter is ","

If no labels are supplied, a default label level is inserted ([]int incrementing from 0). If no headers are supplied, a default level of sequential column names (e.g., 0, 1, etc) is used. Default column names are displayed on printing. Label levels are named *i (e.g., *0, *1, etc) by default when first created. Default label names are hidden on printing.

Example ¶

package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/tada"
)

func main() {
	data := "foo, bar\n baz, qux\n corge, fred"
	df, _ := tada.ReadCSV(strings.NewReader(data))
	fmt.Println(df)
}

Output:

+---++-------+------+
| - ||  foo  | bar  |
|---||-------|------|
| 0 ||   baz |  qux |
| 1 || corge | fred |
+---++-------+------+

Example (Delimiter) ¶

package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/tada"
)

func main() {
	data := `foo|bar
	baz|qux
	corge|fred`
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionDelimiter('|'))
	fmt.Println(df)
}

Output:

+---++-------+------+
| - ||  foo  | bar  |
|---||-------|------|
| 0 ||   baz |  qux |
| 1 || corge | fred |
+---++-------+------+

Example (MultipleHeaders) ¶

package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/tada"
)

func main() {
	data := "foo, bar\n baz, qux\n corge, fred"
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionHeaders(2))
	fmt.Println(df)
}

Output:

+---++-------+------+
|   ||  foo  | bar  |
| - ||  baz  | qux  |
|---||-------|------|
| 0 || corge | fred |
+---++-------+------+

Example (MultipleHeadersWithLabels) ¶

package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/tada"
)

func main() {
	data := ", foo, bar\n labels, baz, qux\n 1, corge, fred"
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionHeaders(2), tada.ReadOptionLabels(1))
	fmt.Println(df)
}

Output:

+--------++-------+------+
|        ||  foo  | bar  |
| labels ||  baz  | qux  |
|--------||-------|------|
|      1 || corge | fred |
+--------++-------+------+

Example (NoHeaders) ¶

package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/tada"
)

func main() {
	data := "foo, bar\n baz, qux\n corge, fred"
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionHeaders(0))
	fmt.Println(df)
}

Output:

+---++-------+------+
| - ||   0   |  1   |
|---||-------|------|
| 0 ||   foo |  bar |
| 1 ||   baz |  qux |
| 2 || corge | fred |
+---++-------+------+

Example (WithLabels) ¶

package main

import (
	"fmt"
	"strings"

	"github.com/ptiger10/tada"
)

func main() {
	data := `foo, bar
	baz, qux
	corge, fred`
	df, _ := tada.ReadCSV(strings.NewReader(data), tada.ReadOptionLabels(1))
	fmt.Println(df)
}

Output:

+-------++------+
|  foo  || bar  |
|-------||------|
|   baz ||  qux |
| corge || fred |
+-------++------+

func ReadCSVFromRecords ¶ added in v0.4.0

func ReadCSVFromRecords(records [][]string, options ...ReadOption) (ret *DataFrame, err error)

ReadCSVFromRecords reads records into a DataFrame (configured by options). Often used with encoding/csv.NewReader().ReadAll(). All columns will be read as []string. Available options: ReadOptionHeaders, ReadOptionLabels, ReadOptionSwitchDims.

Default if no options are supplied: 1 header row; no labels; rows as major dimension

If no labels are supplied, a default label level is inserted ([]int incrementing from 0). If no headers are supplied, a default level of sequential column names (e.g., 0, 1, etc) is used. Default column names are displayed on printing. Label levels are named *i (e.g., *0, *1, etc) by default when first created. Default label names are hidden on printing.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	data := [][]string{
		{"foo", "bar"},
		{"baz", "qux"},
		{"corge", "fred"},
	}
	df, _ := tada.ReadCSVFromRecords(data)
	fmt.Println(df)
}

Output:

+---++-------+------+
| - ||  foo  | bar  |
|---||-------|------|
| 0 ||   baz |  qux |
| 1 || corge | fred |
+---++-------+------+

Example (ColsAsMajorDimension) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	data := [][]string{
		{"foo", "bar"},
		{"baz", "qux"},
		{"corge", "fred"},
	}
	df, _ := tada.ReadCSVFromRecords(data, tada.ReadOptionSwitchDims())
	fmt.Println(df)
}

Output:

+---++-----+-----+-------+
| - || foo | baz | corge |
|---||-----|-----|-------|
| 0 || bar | qux |  fred |
+---++-----+-----+-------+

func ReadInterfaceRecords ¶ added in v0.7.0

func ReadInterfaceRecords(records [][]interface{}, options ...ReadOption) (ret *DataFrame, err error)

ReadInterfaceRecords reads records into a DataFrame (configured by options). All columns will be read as []interface{}. Available options: ReadOptionHeaders, ReadOptionLabels, ReadOptionSwitchDims.

Default if no options are supplied: 1 header row; no labels; rows as major dimension

If no labels are supplied, a default label level is inserted ([]int incrementing from 0). If no headers are supplied, a default level of sequential column names (e.g., 0, 1, etc) is used. Default column names are displayed on printing. Label levels are named *i (e.g., *0, *1, etc) by default when first created. Default label names are hidden on printing.

func ReadMatrix ¶

func ReadMatrix(mat Matrix) *DataFrame

ReadMatrix reads data satisfying the gonum Matrix interface into a DataFrame. Panics if any slices in the matrix are shorter than the first slice.

func ReadStruct ¶

func ReadStruct(strct interface{}, options ...ReadOption) (*DataFrame, error)

ReadStruct reads the exported fields in strct into a DataFrame. strct must be a struct or pointer to a struct. If any exported field in strct is nil, returns an error.

If a "tada" tag is present with the value "isNull", this field must be [][]bool with one equal-length slice for each exported field. These values will set the null status for each of the resulting value containers in the DataFrame, from left-to-right. If a "tada" tag has any other value, the resulting value container will have the same name as the tag value. Otherwise, the value container will have the same name as the exported field.
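The naming rule for struct fields can be illustrated with the standard reflect package. In this sketch the record type, its field names, and the containerNames helper are all hypothetical; only the "tada" tag key and the "isNull" value come from the description above, and the code is not tada's implementation.

```go
package main

import (
	"fmt"
	"reflect"
)

// record is a hypothetical input struct. The "tada" tag key comes from the
// description above; the field names and values are made up.
type record struct {
	Foo    []float64
	Bar    []string `tada:"baz"`
	Nulls  [][]bool `tada:"isNull"`
	hidden []int    // unexported, so it is skipped
}

// containerNames returns the name each exported field would receive under
// the naming rule described above, skipping the field tagged "isNull".
// strct must be a struct value (not a pointer) in this sketch.
func containerNames(strct interface{}) []string {
	t := reflect.TypeOf(strct)
	var names []string
	for i := 0; i < t.NumField(); i++ {
		f := t.Field(i)
		if f.PkgPath != "" { // a non-empty PkgPath means the field is unexported
			continue
		}
		tag, ok := f.Tag.Lookup("tada")
		switch {
		case ok && tag == "isNull":
			continue
		case ok:
			names = append(names, tag)
		default:
			names = append(names, f.Name)
		}
	}
	return names
}

func main() {
	fmt.Println(containerNames(record{})) // [Foo baz]
}
```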

func ReadStructSlice ¶ added in v0.5.1

func ReadStructSlice(slice interface{}) (*DataFrame, error)

ReadStructSlice reads a slice of structs into a DataFrame with field names converted to column names, field values converted to column values, and default labels. The structs must all be of the same type.

A default label level named *0 is inserted ([]int incrementing from 0). Default label names are hidden on printing.

func (*DataFrame) Append ¶

func (df *DataFrame) Append(other *DataFrame) *DataFrame

Append adds the other labels and values as new rows to the DataFrame. If the types of any container do not match, all the values in that container are coerced to string. Returns a new DataFrame.

func (*DataFrame) Apply ¶

func (df *DataFrame) Apply(lambdas map[string]ApplyFn) *DataFrame

Apply applies an anonymous function to every row in a container based on lambdas, which is a map of container names (either column or label names) to anonymous functions. A row's null status can be set in-place within the anonymous function by accessing the []bool argument. Returns a new DataFrame.

func (*DataFrame) At ¶

func (df *DataFrame) At(row, column int) *Element

At returns the Element at the row and column index positions. If row or column is out of range, returns nil.

func (*DataFrame) CSVRecords ¶ added in v0.6.0

func (df *DataFrame) CSVRecords(options ...WriteOption) [][]string

CSVRecords writes a DataFrame to a [][]string with rows as the major dimension. Null values are replaced with "(null)".

func (*DataFrame) Cast ¶

func (df *DataFrame) Cast(containerAsType map[string]DType)

Cast coerces the underlying container values (column or label level) to []float64, []string, []time.Time (aka timezone-aware DateTime), []civil.Date, or []civil.Time and caches the []byte values of the container (if inexpensive). Use Cast to improve performance when calling multiple operations on values.

func (*DataFrame) Col ¶

func (df *DataFrame) Col(name string) *Series

Col finds the first column with matching name and returns as a Series. Similar to SelectLabels(), but selects column values instead of label values.

func (*DataFrame) Cols ¶

func (df *DataFrame) Cols(names ...string) *DataFrame

Cols returns all columns with matching names.

func (*DataFrame) Copy ¶

func (df *DataFrame) Copy() *DataFrame

Copy returns a new DataFrame with identical values as the original but no shared objects (i.e., all internals are newly allocated).

func (*DataFrame) Count ¶

func (df *DataFrame) Count() *Series

Count counts the number of non-null values in each column.

func (*DataFrame) DeduplicateNames ¶

func (df *DataFrame) DeduplicateNames() *DataFrame

DeduplicateNames deduplicates the names of containers (label levels and columns) from left-to-right by appending _n to duplicate names, where n is equal to the number of times that name has already appeared. Returns a new DataFrame.
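The renaming rule can be sketched in a few lines of standard-library Go. The dedupe helper below is hypothetical and follows the rule exactly as worded above; it does not try to handle a suffixed name colliding with an existing name, and how tada treats that case is not stated here.

```go
package main

import (
	"fmt"
	"strconv"
)

// dedupe appends _n to each repeated name, where n is the number of times
// that name has already appeared to its left.
func dedupe(names []string) []string {
	seen := make(map[string]int, len(names))
	out := make([]string, len(names))
	for i, name := range names {
		if n := seen[name]; n > 0 {
			out[i] = name + "_" + strconv.Itoa(n)
		} else {
			out[i] = name
		}
		seen[name]++
	}
	return out
}

func main() {
	fmt.Println(dedupe([]string{"foo", "foo", "bar", "foo"})) // [foo foo_1 bar foo_2]
}
```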

func (*DataFrame) DropCol ¶

func (df *DataFrame) DropCol(name string) *DataFrame

DropCol drops the first column matching name. Returns a new DataFrame.

func (*DataFrame) DropLabels ¶

func (df *DataFrame) DropLabels(name string) *DataFrame

DropLabels drops the first label level matching name. Returns a new DataFrame.

func (*DataFrame) DropNull ¶

func (df *DataFrame) DropNull(subset ...string) *DataFrame

DropNull removes rows with a null value in any column. If subset is supplied, removes any rows with null values in any of the specified columns. Returns a new DataFrame.

func (*DataFrame) DropRow ¶

func (df *DataFrame) DropRow(index int) *DataFrame

DropRow removes the row at the specified index. Returns a new DataFrame.

func (*DataFrame) EqualsCSV ¶

func (df *DataFrame) EqualsCSV(includeLabels bool, want io.Reader, wantOptions ...ReadOption) (bool, *tablediff.Differences, error)

EqualsCSV reads want (configured by wantOptions) into a dataframe, converts both df and want into [][]string records, and evaluates whether the stringified values match. If they do not match, returns a tablediff.Differences object that can be printed to isolate their differences.

If includeLabels is true, then df's labels are included as columns.

func (*DataFrame) Err ¶

func (df *DataFrame) Err() error

Err returns the most recent error attached to the DataFrame, if any.

func (*DataFrame) FillNull ¶

func (df *DataFrame) FillNull(how map[string]NullFiller) *DataFrame

FillNull fills null values and makes them non-null based on how, a map of container names (either column or label names) and tada.NullFiller structs. For each container name in the map, the first field selected (i.e., not left blank) in its NullFiller struct is the strategy used to replace null values in that container. FillForward fills null values with the most recent non-null value in the container. FillBackward fills null values with the next non-null value in the container. FillZero fills null values with the zero value for that container type. FillFloat converts the container values to float64 and fills null values with the value supplied. If no field is selected, the container values are converted to float64 and all null values are filled with 0. Returns a new DataFrame.
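To make the forward and backward strategies concrete, here is a minimal sketch over a values slice and a parallel null-status slice. The helper names are hypothetical, and leaving leading (or trailing) nulls unfilled when there is nothing to copy from is an assumption of this sketch rather than documented tada behavior.

```go
package main

import "fmt"

// fillForward replaces each null value with the most recent non-null value.
// Leading nulls have nothing to copy from and stay null (an assumption of
// this sketch). It returns new slices and leaves the inputs untouched.
func fillForward(vals []float64, isNull []bool) ([]float64, []bool) {
	outVals := append([]float64(nil), vals...)
	outNull := append([]bool(nil), isNull...)
	for i := 1; i < len(outVals); i++ {
		if outNull[i] && !outNull[i-1] {
			outVals[i] = outVals[i-1]
			outNull[i] = false
		}
	}
	return outVals, outNull
}

// fillBackward replaces each null value with the next non-null value.
// Trailing nulls stay null, mirroring fillForward.
func fillBackward(vals []float64, isNull []bool) ([]float64, []bool) {
	outVals := append([]float64(nil), vals...)
	outNull := append([]bool(nil), isNull...)
	for i := len(outVals) - 2; i >= 0; i-- {
		if outNull[i] && !outNull[i+1] {
			outVals[i] = outVals[i+1]
			outNull[i] = false
		}
	}
	return outVals, outNull
}

func main() {
	vals := []float64{0, 1, 0, 0, 4}
	isNull := []bool{true, false, true, true, false}
	fmt.Println(fillForward(vals, isNull))  // [0 1 1 1 4] [true false false false false]
	fmt.Println(fillBackward(vals, isNull)) // [1 1 4 4 4] [false false false false false]
}
```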

func (*DataFrame) Filter ¶

func (df *DataFrame) Filter(filters map[string]FilterFn) *DataFrame

Filter returns a new DataFrame with only rows that satisfy all of the filters, which is a map of container names (either column name or label name) and anonymous functions.

Rows with null values never satisfy a filter. If no filter is provided, the function does nothing. For equality filtering on one or more containers, consider FilterByValue. Returns a new DataFrame.

Example ¶

package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	dt1 := time.Date(2020, 1, 1, 0, 0, 0, 0, time.UTC)
	dt2 := dt1.AddDate(0, 0, 1)
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2, 3}, []time.Time{dt1, dt2, dt1}},
	).
		SetColNames([]string{"foo", "bar"})
	fmt.Println(df)

	gt1 := func(val interface{}) bool { return val.(float64) > 1 }
	beforeDate := func(val interface{}) bool { return val.(time.Time).Before(dt2) }
	ret := df.Filter(map[string]tada.FilterFn{
		"foo": gt1,
		"bar": beforeDate,
	})
	fmt.Println(ret)
}

Output:

+---++-----+----------------------+
| - || foo |         bar          |
|---||-----|----------------------|
| 0 ||   1 | 2020-01-01T00:00:00Z |
| 1 ||   2 | 2020-01-02T00:00:00Z |
| 2 ||   3 | 2020-01-01T00:00:00Z |
+---++-----+----------------------+

+---++-----+----------------------+
| - || foo |         bar          |
|---||-----|----------------------|
| 2 ||   3 | 2020-01-01T00:00:00Z |
+---++-----+----------------------+

func (*DataFrame) FilterByValue ¶ added in v0.3.5

func (df *DataFrame) FilterByValue(filters map[string]interface{}) *DataFrame

FilterByValue returns the rows in the DataFrame satisfying all filters, which is a map of container names (either column or label names) to interface{} values. A filter is satisfied for a given row value if the stringified value in that container at that row matches the stringified interface{} value. Returns a new DataFrame.
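Matching on stringified values means that values of different Go types can satisfy the same filter. The sketch below shows the idea with fmt.Sprint as the stringifier; that choice, and the matchIndexes helper itself, are assumptions, and tada's own formatting of values may differ.

```go
package main

import "fmt"

// matchIndexes returns the positions in vals whose stringified form equals
// the stringified form of want. fmt.Sprint is this sketch's choice of
// stringifier; tada's own formatting may differ.
func matchIndexes(vals []interface{}, want interface{}) []int {
	target := fmt.Sprint(want)
	var idx []int
	for i, v := range vals {
		if fmt.Sprint(v) == target {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	vals := []interface{}{1.0, "1", 2, "foo"}
	fmt.Println(matchIndexes(vals, 1)) // [0 1]
}
```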

func (*DataFrame) FilterCols ¶

func (df *DataFrame) FilterCols(lambda func(string) bool, level int) *DataFrame

FilterCols returns the columns with names that satisfy lambda at the supplied column level. level should be 0 unless df has multiple column levels.

func (*DataFrame) FilterIndex ¶ added in v0.7.6

func (df *DataFrame) FilterIndex(container string, filterFn FilterFn) []int

FilterIndex returns the index positions of the rows in container that satisfy filterFn. A filter that matches no rows returns an empty []int. An out of range container returns nil.

func (*DataFrame) GetLabels ¶ added in v0.3.5

func (df *DataFrame) GetLabels() []interface{}

GetLabels returns label levels as interface{} slices within an []interface{} that may be supplied as the optional labels argument to NewSeries() or NewDataFrame(). NB: If supplying this output to either of these constructors, be sure to use the spread operator (...), or else the labels will not be read as separate levels.
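The reason the trailing ... matters follows from how Go handles variadic parameters. The sketch below uses a hypothetical countLevels function in place of a real constructor, so that it runs without importing tada.

```go
package main

import "fmt"

// countLevels is a hypothetical stand-in for a constructor that, like
// NewDataFrame, accepts optional label levels as a variadic
// ...interface{} parameter.
func countLevels(labels ...interface{}) int {
	return len(labels)
}

func main() {
	// Two label levels, in the []interface{} shape that GetLabels returns.
	levels := []interface{}{[]int{0, 1}, []string{"foo", "bar"}}

	fmt.Println(countLevels(levels))    // 1: the whole slice is read as one level
	fmt.Println(countLevels(levels...)) // 2: each element is a separate level
}
```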

func (*DataFrame) GroupBy ¶

func (df *DataFrame) GroupBy(names ...string) *GroupedDataFrame

GroupBy groups the DataFrame rows that share the same stringified value in the container(s) (columns or labels) specified by names. If an error occurs, writes the error to the GroupedDataFrame.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2, 3, 4},
	},
		[]string{"foo", "bar", "foo", "bar"}).
		SetColNames([]string{"baz"})
	g := df.GroupBy()
	fmt.Println(g)
}

Output:

+-----++-----+
|  -  || baz |
|-----||-----|
| foo ||   1 |
|     ||   3 |
| bar ||   2 |
|     ||   4 |
+-----++-----+

func (*DataFrame) HasCols ¶

func (df *DataFrame) HasCols(colNames ...string) error

HasCols returns an error if the DataFrame does not contain all of the colNames supplied.

func (*DataFrame) HasLabels ¶ added in v0.5.0

func (df *DataFrame) HasLabels(labelNames ...string) error

HasLabels returns an error if the DataFrame does not contain all of the labelNames supplied.

func (*DataFrame) HasType ¶ added in v0.3.8

func (df *DataFrame) HasType(sliceType string) (labelIndex, columnIndex []int)

HasType returns the index positions of all label and column containers containing a slice of values where reflect.Type.String() == sliceType. Container index positions may then be supplied to df.SubsetLabels() or df.SubsetCols().

For example, to search for datetime labels: labels, _ := df.HasType("[]time.Time")

To search for float64 columns: _, cols := df.HasType("[]float64")
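The sliceType strings above are whatever reflect.Type.String() reports for a slice. A small standard-library sketch of that comparison follows; the indexesOfType helper is hypothetical and is not tada's implementation.

```go
package main

import (
	"fmt"
	"reflect"
	"time"
)

// indexesOfType returns the positions of the containers whose slice type,
// as reported by reflect.Type.String(), equals sliceType.
func indexesOfType(containers []interface{}, sliceType string) []int {
	var idx []int
	for i, c := range containers {
		if reflect.TypeOf(c).String() == sliceType {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	cols := []interface{}{
		[]float64{1, 2},
		[]string{"a", "b"},
		[]time.Time{{}, {}},
		[]float64{3, 4},
	}
	fmt.Println(indexesOfType(cols, "[]float64"))   // [0 3]
	fmt.Println(indexesOfType(cols, "[]time.Time")) // [2]
}
```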

func (*DataFrame) Head ¶

func (df *DataFrame) Head(n int) *DataFrame

Head returns the first n rows of the DataFrame. If n is greater than the length of the DataFrame, returns the entire DataFrame. In either case, returns a new DataFrame.

func (*DataFrame) InPlace ¶

func (df *DataFrame) InPlace() *DataFrameMutator

InPlace returns a DataFrameMutator, which contains most of the same methods as DataFrame but never returns a new DataFrame. If you want to save memory and improve performance and do not need to preserve the original DataFrame, consider using InPlace().

func (*DataFrame) IndexOfContainer ¶

func (df *DataFrame) IndexOfContainer(name string, columns bool) int

IndexOfContainer returns the index position of the first container with a name matching name (case-sensitive). If name does not match any container, -1 is returned. If columns is true, only column names will be searched. If columns is false, only label level names will be searched.

func (*DataFrame) InterfaceRecords ¶ added in v0.7.6

func (df *DataFrame) InterfaceRecords(options ...WriteOption) [][]interface{}

InterfaceRecords writes a DataFrame to a [][]interface{} with columns as the major dimension. Null values are replaced with "(null)".

func (*DataFrame) IsNull ¶ added in v0.7.6

func (df *DataFrame) IsNull(subset ...string) *DataFrame

IsNull returns all the rows with any null values. If subset is supplied, returns all the rows with all non-null values in the specified columns. Returns a new DataFrame.

func (*DataFrame) Iterator ¶ added in v0.2.0

func (df *DataFrame) Iterator() *DataFrameIterator

Iterator returns an iterator which may be used to access the values in each row as map[string]Element.

func (*DataFrame) LabelsAsSeries ¶ added in v0.6.2

func (df *DataFrame) LabelsAsSeries(name string) *Series

LabelsAsSeries finds the first label level with matching name and returns the values as a Series. Similar to Col(), but selects label values instead of column values. The labels in the Series are shared with the labels in the DataFrame. If label level name is default (prefixed with *), the prefix is removed.

func (*DataFrame) Len ¶

func (df *DataFrame) Len() int

Len returns the number of rows in each column of the DataFrame.

func (*DataFrame) ListColNames ¶ added in v0.2.0

func (df *DataFrame) ListColNames() []string

ListColNames returns the name of all the columns in the DataFrame, in order. If df has multiple column levels, each column name is a single string with level values separated by "|" (may be changed with SetOptionDefaultSeparator). To return the names at a specific level, use ListColNamesAtLevel().

func (*DataFrame) ListColNamesAtLevel ¶ added in v0.2.0

func (df *DataFrame) ListColNamesAtLevel(level int) []string

ListColNamesAtLevel returns the name of all the columns in the DataFrame, in order, at the supplied column level. If level is out of range, returns a nil slice.

func (*DataFrame) ListLabelNames ¶

func (df *DataFrame) ListLabelNames() []string

ListLabelNames returns the name of all the label levels in the DataFrame, in order.

func (*DataFrame) Lookup ¶

func (df *DataFrame) Lookup(other *DataFrame, options ...JoinOption) (*DataFrame, error)

Lookup performs the lookup portion of a join of other onto df. Performs a left join unless a different join type is specified as an option. If left and right keys are supplied as options, those are used as lookup keys. Otherwise, the join will automatically use shared label names or return an error if none exist.

Lookup identifies the row alignment between df and other and returns the aligned values. Rows are aligned when: 1) one or more containers (either column or label level) in other share the same name as one or more containers in df, and 2) the stringified values in the other containers match the values in the df containers. For the following dataframes:

df              other
FOO BAR         FOO QUX
bar 0           baz corge
baz 1           qux waldo

Row 1 in df is "aligned" with row 0 in other, because those are the rows in which both share the same value ("baz") in a container with the same name ("foo"). The result of a lookup will be:

FOO BAR
bar (null)
baz corge

Returns a new DataFrame.
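The row alignment described for Lookup can be sketched with a map from key to row position. The lookup helper below is hypothetical, works on a single string key container, and keeps the first occurrence of a duplicate key; that tie-breaking rule is an assumption, not documented tada behavior.

```go
package main

import "fmt"

// lookup returns, for each key in left, the value paired with the first
// matching key in rightKeys. Keys are compared as strings; unmatched rows
// are flagged in the returned isNull slice.
func lookup(left, rightKeys, rightVals []string) (vals []string, isNull []bool) {
	index := make(map[string]int, len(rightKeys))
	for i, k := range rightKeys {
		if _, seen := index[k]; !seen {
			index[k] = i // keep the first occurrence of each key
		}
	}
	vals = make([]string, len(left))
	isNull = make([]bool, len(left))
	for i, k := range left {
		if j, ok := index[k]; ok {
			vals[i] = rightVals[j]
		} else {
			isNull[i] = true
		}
	}
	return vals, isNull
}

func main() {
	foo := []string{"bar", "baz"}          // df's FOO container
	otherFoo := []string{"baz", "qux"}     // other's FOO container
	otherQux := []string{"corge", "waldo"} // other's QUX container

	vals, isNull := lookup(foo, otherFoo, otherQux)
	for i := range vals {
		if isNull[i] {
			fmt.Println(foo[i], "(null)")
			continue
		}
		fmt.Println(foo[i], vals[i])
	}
	// Prints:
	// bar (null)
	// baz corge
}
```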

func (*DataFrame) Max ¶

func (df *DataFrame) Max() *Series

Max coerces the values in each column to float64 and returns the maximum non-null value in each column.

func (*DataFrame) Mean ¶

func (df *DataFrame) Mean() *Series

Mean coerces the values in each column to float64 and calculates the mean of each column.

func (*DataFrame) Median ¶

func (df *DataFrame) Median() *Series

Median coerces the values in each column to float64 and calculates the median of each column.

func (*DataFrame) Merge ¶

func (df *DataFrame) Merge(other *DataFrame, options ...JoinOption) (*DataFrame, error)

Merge joins other onto df. Performs a left join unless a different join type is specified as an option. If left and right keys are supplied as options, those are used as lookup keys. Otherwise, the join will automatically use shared label names or return an error if none exist.

Merge identifies the row alignment between df and other and appends aligned values as new columns on df. Rows are aligned when 1) one or more containers (either column or label level) in other share the same name as one or more containers in df, and 2) the stringified values in the other containers match the values in the df containers. For the following dataframes:

df              other
FOO BAR         FOO QUX
bar 0           baz corge
baz 1           qux waldo

Row 1 in df is "aligned" with row 0 in other, because those are the rows in which both share the same value ("baz") in a container with the same name ("foo"). After merging, the result will be:

df
FOO BAR QUX
bar 0   null
baz 1   corge

Finally, all container names (columns and label names) are deduplicated after the merge so that they are unique. Returns a new DataFrame.

func (*DataFrame) Min ¶

func (df *DataFrame) Min() *Series

Min coerces the values in each column to float64 and returns the minimum non-null value in each column.

func (*DataFrame) NUnique ¶

func (df *DataFrame) NUnique() *Series

NUnique counts the number of unique non-null values in each column.

func (*DataFrame) Name ¶

func (df *DataFrame) Name() string

Name returns the name of the DataFrame.

func (*DataFrame) NameOfCol ¶ added in v0.2.0

func (df *DataFrame) NameOfCol(n int) string

NameOfCol returns the name of the column at index position n. If n is out of range, returns "-out of range-"

func (*DataFrame) NameOfLabel ¶

func (df *DataFrame) NameOfLabel(n int) string

NameOfLabel returns the name of the label level at index position n. If n is out of range, returns "-out of range-"

func (*DataFrame) NumColumns ¶ added in v0.3.6

func (df *DataFrame) NumColumns() int

NumColumns returns the number of columns in the DataFrame.

func (*DataFrame) NumLevels ¶ added in v0.3.6

func (df *DataFrame) NumLevels() int

NumLevels returns the number of label levels in the DataFrame.

func (*DataFrame) PivotTable ¶

func (df *DataFrame) PivotTable(labels, columns, values, aggFunc string) (*DataFrame, error)

PivotTable creates a spreadsheet-style pivot table as a DataFrame by grouping rows using the unique values in labels, reducing the values in values using an aggFunc aggregation function, then promoting the unique values in columns to be new columns. labels, columns, and values should all refer to existing container names (either columns or labels). Supported aggFuncs: sum, mean, median, stdDev, count, min, max.
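The group, reduce, and promote steps can be pictured with nested maps. The sketch below handles only the sum aggregation over a hypothetical long-format row type; the type, the helper, and the sample data are all made up for illustration and do not reflect tada's internals.

```go
package main

import "fmt"

// row is a hypothetical long-format record: one label, one column key, and
// one value to aggregate.
type row struct {
	label, column string
	value         float64
}

// pivotSum groups rows by label, promotes each unique column key to its own
// column, and sums the values that land in the same cell.
func pivotSum(rows []row) map[string]map[string]float64 {
	table := make(map[string]map[string]float64)
	for _, r := range rows {
		if table[r.label] == nil {
			table[r.label] = make(map[string]float64)
		}
		table[r.label][r.column] += r.value
	}
	return table
}

func main() {
	rows := []row{
		{"north", "2019", 1},
		{"north", "2020", 2},
		{"south", "2019", 3},
		{"north", "2019", 4},
	}
	table := pivotSum(rows)
	fmt.Println(table["north"]["2019"], table["north"]["2020"], table["south"]["2019"]) // 5 2 3
}
```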

func (*DataFrame) PromoteToColLevel ¶

func (df *DataFrame) PromoteToColLevel(name string) *DataFrame

PromoteToColLevel pivots an existing container (either column or label names) into a new column level. If promoting would use either the last column or index level, it returns an error. Each unique value in the stacked column is stacked above each existing column. Promotion can add new columns and remove label rows with duplicate values.

func (*DataFrame) Range ¶

func (df *DataFrame) Range(first, last int) *DataFrame

Range returns the rows of the DataFrame starting at first and ending immediately prior to last (left-inclusive, right-exclusive). If either first or last is greater than the length of the DataFrame, an error is returned. Returns a new DataFrame.

func (*DataFrame) Reduce ¶ added in v0.7.6

func (df *DataFrame) Reduce(name string, lambda ReduceFn) (*Series, error)

Reduce uses lambda to reduce all columns to a Series named name with column names as labels and reduced values as row values. The type of the new Series is a slice with the same type as the first value outputted by the anonymous function.

func (*DataFrame) Relabel ¶

func (df *DataFrame) Relabel() *DataFrame

Relabel resets the DataFrame labels to default labels (e.g., []int from 0 to df.Len()-1, with *0 as name). Returns a new DataFrame.

func (*DataFrame) ReorderCols ¶ added in v0.6.8

func (df *DataFrame) ReorderCols(colNames []string) *DataFrame

ReorderCols reorders the columns to be in the same order as specified by colNames. If a column is not specified, it is excluded from the resulting DataFrame. Returns a new DataFrame.

func (*DataFrame) ReorderLabels ¶ added in v0.6.8

func (df *DataFrame) ReorderLabels(levelNames []string) *DataFrame

ReorderLabels reorders the label levels to be in the same order as specified by levelNames. If a level is not specified, it is excluded from the resulting DataFrame. Returns a new DataFrame.

func (*DataFrame) Resample ¶ added in v0.2.6

func (df *DataFrame) Resample(how map[string]Resampler) *DataFrame

Resample coerces values to time.Time and truncates them by the logic supplied in how, which is a map of container names (either column or label names) to tada.Resampler structs. For each container name in the map, the first By field selected (i.e., not left blank) in its Resampler struct provides the resampling logic for that container. If slice type is civil.Date or civil.Time before resampling, it will be returned as civil.Date or civil.Time after resampling.

Returns a new DataFrame.

func (*DataFrame) ResetLabels ¶

func (df *DataFrame) ResetLabels(index ...string) *DataFrame

ResetLabels appends the label level(s) at the supplied index levels as columns and drops the level. If no index levels are supplied, all label levels are appended as columns and dropped as levels, and replaced by a default label column. Returns a new DataFrame.

func (*DataFrame) Series ¶ added in v0.5.3

func (df *DataFrame) Series() *Series

Series converts a single-columned DataFrame to a Series that shares the same underlying values and labels.

func (*DataFrame) SetAsLabels ¶ added in v0.3.6

func (df *DataFrame) SetAsLabels(colNames ...string) *DataFrame

SetAsLabels appends the column(s) supplied as colNames as label levels and drops the column(s). The number of colNames supplied must be less than the number of columns in the DataFrame. Returns a new DataFrame.

func (*DataFrame) SetColNames ¶

func (df *DataFrame) SetColNames(colNames []string) *DataFrame

SetColNames sets the names of all the columns in the DataFrame and returns the entire DataFrame. If an error is returned, it is written to the DataFrame.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}, []string{"baz", "qux"}},
	).
		SetColNames([]string{"foo", "bar"})
	fmt.Println(df)
}

Output:

+---++-----+-----+
| - || foo | bar |
|---||-----|-----|
| 0 ||   1 | baz |
| 1 ||   2 | qux |
+---++-----+-----+

func (*DataFrame) SetLabelNames ¶

func (df *DataFrame) SetLabelNames(levelNames []string) *DataFrame

SetLabelNames sets the names of all the label levels in the DataFrame and returns the entire DataFrame. If an error is returned, it is written to the DataFrame.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{[]float64{1, 2}}).
		SetLabelNames([]string{"baz"})
	fmt.Println(df)
}

Output:

+-----++---+
| baz || 0 |
|-----||---|
|   0 || 1 |
|   1 || 2 |
+-----++---+

Example (Multiple) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame(
		[]interface{}{[]float64{1, 2}},
		[]int{0, 1}, []string{"foo", "bar"},
	).
		SetColNames([]string{"A"}).
		SetLabelNames([]string{"baz", "qux"})
	fmt.Println(df)
}

Output:

+-----+-----++---+
| baz | qux || A |
|-----|-----||---|
|   0 | foo || 1 |
|   1 | bar || 2 |
+-----+-----++---+

func (*DataFrame) SetName ¶

func (df *DataFrame) SetName(name string) *DataFrame

SetName sets the name of a DataFrame and returns the entire DataFrame.

func (*DataFrame) SetNulls ¶ added in v0.3.6

func (df *DataFrame) SetNulls(n int, nulls []bool) error

SetNulls overwrites the underlying boolean slice that records whether each value is null or not for the container at position n (either labels or columns).

func (*DataFrame) SetRows ¶ added in v0.7.6

func (df *DataFrame) SetRows(lambda ApplyFn, container string, rows []int) *DataFrame

SetRows applies lambda within container (either label or column name) to set the values at the specified row positions. The new values must be the same type as the existing values. Returns a new DataFrame.

func (*DataFrame) Shuffle ¶ added in v0.6.11

func (df *DataFrame) Shuffle(seed int64) *DataFrame

Shuffle randomizes the row order of the DataFrame. Returns a new DataFrame.

func (*DataFrame) Sort ¶

func (df *DataFrame) Sort(by ...Sorter) *DataFrame

Sort sorts the values by zero or more Sorter specifications. If no Sorter is supplied, does not sort. If no DType is supplied for a Sorter, sorts as float64. DType is only used for the process of sorting. Once it has been sorted, data retains its original type. Returns a new DataFrame.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{2, 2, 1}, []string{"b", "c", "a"}},
	).
		SetColNames([]string{"foo", "bar"})
	fmt.Println(df)

	// first sort by foo in ascending order, then sort by bar in descending order
	ret := df.Sort(
		// Float64 is the default sorting DType, and ascending is the default ordering
		tada.Sorter{Name: "foo"},
		tada.Sorter{Name: "bar", DType: tada.String, Descending: true},
	)
	fmt.Println(ret)
}

Output:

+---++-----+-----+
| - || foo | bar |
|---||-----|-----|
| 0 ||   2 |   b |
| 1 ||     |   c |
| 2 ||   1 |   a |
+---++-----+-----+

+---++-----+-----+
| - || foo | bar |
|---||-----|-----|
| 2 ||   1 |   a |
| 1 ||   2 |   c |
| 0 ||     |   b |
+---++-----+-----+

func (*DataFrame) StdDev ¶ added in v0.5.3

func (df *DataFrame) StdDev() *Series

StdDev coerces the values in each column to float64 and calculates the standard deviation of each column.

func (*DataFrame) String ¶

func (df *DataFrame) String() string

String prints the DataFrame in table form, with the number of rows constrained by optionMaxRows, and the number of columns constrained by optionMaxColumns, which may be configured with PrintOptionMaxRows(n) and PrintOptionMaxColumns(n), respectively. By default, repeated values are merged together, but this behavior may be disabled with PrintOptionAutoMerge(false). By default, overly-wide non-header cells are truncated, but this behavior may be changed to wrapping with PrintOptionWrapLines(true).

func (*DataFrame) Struct ¶ added in v0.5.3

func (df *DataFrame) Struct(structPointer interface{}, options ...WriteOption) error

Struct writes the values of the df containers into structPointer. Returns an error if df does not contain, from left-to-right, the same container names and types as the exported fields that appear, from top-to-bottom, in structPointer. Exported struct fields must be types that are supported by NewDataFrame(). If a "tada" tag is present with the value "isNull", this field must be [][]bool. The null status of each value container in the DataFrame, from left-to-right, will be written into this field in equal-length slices. If df contains additional containers beyond those in structPointer, those are ignored.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame(
		[]interface{}{
			[]float64{1, 2},
		},
		[]string{"baz", "qux"},
	).SetLabelNames([]string{"foo"}).
		SetColNames([]string{"bar"})
	type output struct {
		Foo []string  `tada:"foo"`
		Bar []float64 `tada:"bar"`
	}
	var out output
	df.Struct(&out)
	fmt.Printf("%#v", out)
}

Output:

tada_test.output{Foo:[]string{"baz", "qux"}, Bar:[]float64{1, 2}}

Example (WithNulls) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame(
		[]interface{}{
			[]float64{1, 2},
		},
		[]string{"", "qux"},
	).SetLabelNames([]string{"foo"}).
		SetColNames([]string{"bar"})
	type output struct {
		Foo   []string  `tada:"foo"`
		Bar   []float64 `tada:"bar"`
		Nulls [][]bool  `tada:"isNull"`
	}
	var out output
	df.Struct(&out)
	fmt.Printf("%#v", out)
}

Output:

tada_test.output{Foo:[]string{"", "qux"}, Bar:[]float64{1, 2}, Nulls:[][]bool{[]bool{true, false}, []bool{false, false}}}

func (*DataFrame) Subset ¶

func (df *DataFrame) Subset(index []int) *DataFrame

Subset returns only the rows specified at the index positions, in the order specified. Returns a new DataFrame.

func (*DataFrame) SubsetCols ¶

func (df *DataFrame) SubsetCols(index []int) *DataFrame

SubsetCols returns only the columns specified at the index positions, in the order specified. Returns a new DataFrame.

func (*DataFrame) SubsetLabels ¶

func (df *DataFrame) SubsetLabels(index []int) *DataFrame

SubsetLabels returns only the labels specified at the index positions, in the order specified. Returns a new DataFrame.

func (*DataFrame) Sum ¶

func (df *DataFrame) Sum() *Series

Sum coerces the values in each column to float64 and sums each column.

func (*DataFrame) SumCols ¶ added in v0.5.1

func (df *DataFrame) SumCols(name string, colNames ...string) (*Series, error)

SumCols finds each column matching a supplied colName, coerces its values to float64, and adds them row-wise. The resulting Series is named name. If any column has a null value for a given row, that row is considered null.

func (*DataFrame) SwapLabels ¶

func (df *DataFrame) SwapLabels(i, j string) *DataFrame

SwapLabels swaps the label levels with names i and j. Returns a new DataFrame.

func (*DataFrame) Tail ¶

func (df *DataFrame) Tail(n int) *DataFrame

Tail returns the last n rows of the DataFrame. If n is greater than the length of the DataFrame, returns the entire DataFrame. In either case, returns a new DataFrame.

func (*DataFrame) Transpose ¶

func (df *DataFrame) Transpose() *DataFrame

Transpose transposes rows into columns. Row values become column values, column names become labels, labels become column names (and multi-level labels become multi-level columns) and label level names swap with column level names. For example, a DataFrame with 2 rows and 1 column has 2 columns and 1 row after transposition. Because rows can contain heterogeneous types, every column is coerced to []interface{}.

func (*DataFrame) Where ¶ added in v0.3.0

func (df *DataFrame) Where(filters map[string]FilterFn, ifTrue, ifFalse interface{}) (*Series, error)

Where iterates over the rows in df and evaluates whether each one satisfies filters, which is a map of container names (either column or label names) to tada.FilterFn functions. If yes, returns ifTrue at that row position. If not, returns ifFalse at that row position. Values are coerced from their original type to the selected field type for filtering, but after filtering they retain their original type.

Returns an unnamed Series with a copy of the labels from the original DataFrame and null status based on the supplied values. If an unsupported value type is supplied as either ifTrue or ifFalse, returns an error.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]int{1, 2}},
	).
		SetColNames([]string{"foo"})
	fmt.Println(df)

	gt1 := func(val interface{}) bool { return val.(int) > 1 }
	ret, _ := df.Where(map[string]tada.FilterFn{"foo": gt1}, true, false)
	fmt.Println(ret)
}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

+---++-------+
| - ||       |
|---||-------|
| 0 || false |
| 1 ||  true |
+---++-------+

func (*DataFrame) WithCol ¶

func (df *DataFrame) WithCol(name string, input interface{}) *DataFrame

WithCol resolves as follows:

If a scalar string is supplied as input and a column exists that matches name: rename the column to match input. In this case, name must already exist.

If a slice is supplied as input and a column exists that matches name: replace the values at this column to match input. If a slice is supplied as input and a column does not exist that matches name: append a new column named name and values matching input. If input is a slice, it must be the same length as the underlying DataFrame.

In all cases, returns a new DataFrame.

Example (Append) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}},
	).
		SetColNames([]string{"foo"})
	fmt.Println(df)

	ret := df.WithCol("bar", []bool{false, true})
	fmt.Println(ret)
}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

+---++-----+-------+
| - || foo |  bar  |
|---||-----|-------|
| 0 ||   1 | false |
| 1 ||   2 |  true |
+---++-----+-------+

Example (Overwrite) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}},
	).
		SetColNames([]string{"foo"})
	fmt.Println(df)

	ret := df.WithCol("foo", []string{"baz", "qux"})
	fmt.Println(ret)
}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

+---++-----+
| - || foo |
|---||-----|
| 0 || baz |
| 1 || qux |
+---++-----+

Example (Rename) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}},
	).
		SetColNames([]string{"foo"})
	fmt.Println(df)

	ret := df.WithCol("foo", "qux")
	fmt.Println(ret)
}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

+---++-----+
| - || qux |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

func (*DataFrame) WithLabels ¶

func (df *DataFrame) WithLabels(name string, input interface{}) *DataFrame

WithLabels resolves as follows:

If a scalar string is supplied as input and a label level exists that matches name: rename the level to match input. In this case, name must already exist.

If a slice is supplied as input and a label level exists that matches name: replace the values at this level to match input. If a slice is supplied as input and a label level does not exist that matches name: append a new level named name and values matching input. If input is a slice, it must be the same length as the underlying DataFrame.

In all cases, returns a new DataFrame.

func (*DataFrame) WriteCSV ¶ added in v0.4.0

func (df *DataFrame) WriteCSV(w io.Writer, options ...WriteOption) error

WriteCSV converts a DataFrame to a csv with rows as the major dimension, and writes the output to w. Null values are replaced with "(null)".

type DataFrameIterator ¶ added in v0.2.0

type DataFrameIterator struct {
	// contains filtered or unexported fields
}

A DataFrameIterator iterates over the rows in a DataFrame.

func (*DataFrameIterator) Next ¶ added in v0.2.0

func (iter *DataFrameIterator) Next() bool

Next advances to the next row. Returns false at the end of iteration.

func (*DataFrameIterator) Row ¶ added in v0.2.0

func (iter *DataFrameIterator) Row() map[string]Element

Row returns the current row in the DataFrame as a map. The map keys are the names of containers (including label levels). The value in each map is an Element containing an interface value and a boolean denoting if the value is null. If multiple columns have the same header, only the Element of the left-most column is returned.

type DataFrameMutator ¶

type DataFrameMutator struct {
	// contains filtered or unexported fields
}

A DataFrameMutator is used to change DataFrame values in place.

func (*DataFrameMutator) Append ¶

func (df *DataFrameMutator) Append(other *DataFrame) error

Append adds the other labels and values as new rows to the DataFrame. If the types of any container do not match, all the values in that container are coerced to string. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) Apply ¶

func (df *DataFrameMutator) Apply(lambdas map[string]ApplyFn) error

Apply applies an anonymous function to every row in a container based on lambdas, which is a map of container names (either column or label names) to anonymous functions. A row's null status can be changed in-place within the anonymous function. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) DeduplicateNames ¶

func (df *DataFrameMutator) DeduplicateNames()

DeduplicateNames deduplicates the names of containers (label levels and columns) from left-to-right by appending _n to duplicate names, where n is equal to the number of times that name has already appeared. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) DropCol ¶

func (df *DataFrameMutator) DropCol(name string) error

DropCol drops the first column matching name. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) DropLabels ¶

func (df *DataFrameMutator) DropLabels(name string) error

DropLabels drops the first label level matching name. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) DropNull ¶

func (df *DataFrameMutator) DropNull(subset ...string) error

DropNull removes rows with a null value in any column. If subset is supplied, removes any rows with null values in any of the specified columns. Modifies the underlying DataFrame.

func (*DataFrameMutator) DropRow ¶

func (df *DataFrameMutator) DropRow(index int) error

DropRow removes the row at the specified index. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) FillNull ¶

func (df *DataFrameMutator) FillNull(how map[string]NullFiller) error

FillNull fills null values and makes them non-null based on how. How is a map of container names (either column or label names) and NullFillers. For each container name supplied, the first field selected (i.e., not left blank) in the NullFiller is the strategy used to replace null values. FillForward fills null values with the most recent non-null value in the container. FillBackward fills null values with the next non-null value in the container. FillZero fills null values with the zero value for that container type. FillFloat converts the container values to float64 and fills null values with the value supplied. If no field is selected, the container values are converted to float64 and all null values are filled with 0. Modifies the underlying DataFrame.

func (*DataFrameMutator) Filter ¶

func (df *DataFrameMutator) Filter(filters map[string]FilterFn) error

Filter keeps only the rows that satisfy all of the filters, which is a map of container names (either column name or label name) to anonymous functions.

Rows with null values never satisfy a filter. If no filter is provided, function does nothing. For equality filtering on one or more containers, consider FilterByValue. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) FilterByValue ¶ added in v0.3.5

func (df *DataFrameMutator) FilterByValue(filters map[string]interface{}) error

FilterByValue returns the rows in the DataFrame satisfying all filters, which is a map of container names (either column or label names) to interface{} values. A filter is satisfied for a given row value if the stringified value in that container at that row matches the stringified interface{} value. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) FilterCols ¶ added in v0.2.0

func (df *DataFrameMutator) FilterCols(lambda func(string) bool, level int) error

FilterCols returns the columns with names that satisfy lambda at the supplied column level. level should be 0 unless df has multiple column levels.

func (*DataFrameMutator) IsNull ¶ added in v0.7.6

func (df *DataFrameMutator) IsNull(subset ...string) error

IsNull returns all the rows with any null values. If subset is supplied, returns all the rows with all non-null values in the specified columns. Modifies the underlying DataFrame.

func (*DataFrameMutator) Range ¶ added in v0.7.6

func (df *DataFrameMutator) Range(first, last int) error

Range returns the rows of the DataFrame starting at first and ending immediately prior to last (left-inclusive, right-exclusive). If first or last is out of range, an error is returned. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) Relabel ¶

func (df *DataFrameMutator) Relabel()

Relabel resets the DataFrame labels to default labels (e.g., []int from 0 to df.Len()-1, with *0 as name). Modifies the underlying DataFrame in place.

func (*DataFrameMutator) ReorderCols ¶ added in v0.6.8

func (df *DataFrameMutator) ReorderCols(colNames []string) error

ReorderCols reorders the columns to be in the same order as specified by colNames. If a column is not specified, it is excluded from the resulting DataFrame. Modifies the underlying DataFrame.

func (*DataFrameMutator) ReorderLabels ¶ added in v0.6.8

func (df *DataFrameMutator) ReorderLabels(levelNames []string) error

ReorderLabels reorders the label levels to be in the same order as specified by levelNames. If a level is not specified, it is excluded from the resulting DataFrame. Modifies the underlying DataFrame.

func (*DataFrameMutator) Resample ¶ added in v0.2.6

func (df *DataFrameMutator) Resample(how map[string]Resampler) error

Resample coerces values to time.Time and truncates them by the logic supplied in how, which is a map of container names (either column or label names) to tada.Resampler structs. For each container name in the map, the first By field selected (i.e., not left blank) in its Resampler struct provides the resampling logic for that container. If the slice type is civil.Date or civil.Time before resampling, it will be returned as civil.Date or civil.Time after resampling.

Modifies the underlying DataFrame in place.

func (*DataFrameMutator) ResetLabels ¶

func (df *DataFrameMutator) ResetLabels(labelLevels ...string) error

ResetLabels appends the label level(s) at the supplied index levels as columns and drops the level(s). If no index levels are supplied, all label levels are appended as columns and dropped as levels, and replaced by a default label column. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) SetAsLabels ¶ added in v0.3.6

func (df *DataFrameMutator) SetAsLabels(colNames ...string)

SetAsLabels appends the column(s) supplied as colNames as label levels and drops the column(s). The number of colNames supplied must be less than the number of columns in the DataFrame. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) SetRows ¶ added in v0.7.6

func (df *DataFrameMutator) SetRows(lambda ApplyFn, container string, rows []int) error

SetRows applies lambda within container (either label or column name) to set the values at the specified row positions. The new values must be the same type as the existing values. Modifies the underlying DataFrame.

func (*DataFrameMutator) Shuffle ¶ added in v0.6.11

func (df *DataFrameMutator) Shuffle(seed int64)

Shuffle randomizes the row order of the DataFrame. Modifies the underlying DataFrame.

func (*DataFrameMutator) Sort ¶

func (df *DataFrameMutator) Sort(by ...Sorter) error

Sort sorts the values by zero or more Sorter specifications. If no Sorter is supplied, does not sort. If no DType is supplied for a Sorter, sorts as float64. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) Subset ¶

func (df *DataFrameMutator) Subset(index []int) error

Subset returns only the rows specified at the index positions, in the order specified. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) SubsetCols ¶

func (df *DataFrameMutator) SubsetCols(index []int) error

SubsetCols returns only the columns specified at the index positions, in the order specified. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) SubsetLabels ¶

func (df *DataFrameMutator) SubsetLabels(index []int) error

SubsetLabels returns only the labels specified at the index positions, in the order specified. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) SwapLabels ¶

func (df *DataFrameMutator) SwapLabels(i, j string) error

SwapLabels swaps the label levels with names i and j. Modifies the underlying DataFrame in place.

func (*DataFrameMutator) WithCol ¶

func (df *DataFrameMutator) WithCol(name string, input interface{}) error

WithCol resolves as follows:

If a scalar string is supplied as input and a column exists that matches name: rename the column to match input. In this case, name must already exist.

If a slice is supplied as input and a column exists that matches name: replace the values at this column to match input. If a slice is supplied as input and a column does not exist that matches name: append a new column named name and values matching input. If input is a slice, it must be the same length as the underlying DataFrame.

In all cases, modifies the underlying DataFrame in place.

Example (Rename) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	df := tada.NewDataFrame([]interface{}{
		[]float64{1, 2}},
	).
		SetColNames([]string{"foo"})
	fmt.Println(df)

	df.InPlace().WithCol("foo", "qux")
	fmt.Println(df)
}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

+---++-----+
| - || qux |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

func (*DataFrameMutator) WithLabels ¶

func (df *DataFrameMutator) WithLabels(name string, input interface{}) error

WithLabels resolves as follows:

If a scalar string is supplied as input and a label level exists that matches name: rename the level to match input. In this case, name must already exist.

If a slice is supplied as input and a label level exists that matches name: replace the values at this level to match input. If a slice is supplied as input and a label level does not exist that matches name: append a new level named name and values matching input. If input is a slice, it must be the same length as the underlying DataFrame.

In all cases, modifies the underlying DataFrame in place.

type Element ¶

type Element struct {
	Val    interface{}
	IsNull bool
}

An Element is one {value, null status} pair in either a Series or DataFrame.

type FilterFn ¶

type FilterFn func(value interface{}) bool

A FilterFn is an anonymous function supplied to a Filter or Where function. The function will be called on every val in the container.

type GroupedDataFrame ¶

type GroupedDataFrame struct {
	// contains filtered or unexported fields
}

A GroupedDataFrame is a collection of row positions sharing the same group key. A GroupedDataFrame has a reference to an underlying DataFrame, which is used for reduce operations.

func (*GroupedDataFrame) Apply ¶ added in v0.7.0

func (g *GroupedDataFrame) Apply(cols []string, lambda ApplyFn) *GroupedDataFrame

Apply applies lambda to every group. Each lambda input will be a slice of grouped values (including values considered null) from a single column. Each lambda output must be a slice that is the same length as the input. A row's null status can be set in-place within the anonymous function by accessing the []bool argument.

func (*GroupedDataFrame) Col ¶

func (g *GroupedDataFrame) Col(colName string) *GroupedSeries

Col isolates the Series at colName, which may be either a label level or column in the underlying DataFrame. Returns a GroupedSeries with the same groups and labels as in the GroupedDataFrame.

func (*GroupedDataFrame) Count ¶

func (g *GroupedDataFrame) Count(colNames ...string) *DataFrame

Count returns the number of non-null values in each group for the columns in colNames.

func (*GroupedDataFrame) DataFrame ¶ added in v0.4.10

func (g *GroupedDataFrame) DataFrame() *DataFrame

DataFrame returns the GroupedDataFrame as a DataFrame, with group names as label levels, in order of appearance in the original DataFrame, and values grouped together by group name. Columns used as label levels are dropped.

func (*GroupedDataFrame) Earliest ¶

func (g *GroupedDataFrame) Earliest(colNames ...string) *DataFrame

Earliest coerces the column values in colNames to time.Time and calculates the earliest timestamp of each group.

func (*GroupedDataFrame) Err ¶

func (g *GroupedDataFrame) Err() error

Err returns the underlying error, if any.

func (*GroupedDataFrame) First ¶

func (g *GroupedDataFrame) First(colNames ...string) *DataFrame

First returns the first row within each group for the columns in colNames.

func (*GroupedDataFrame) GetGroup ¶

func (g *GroupedDataFrame) GetGroup(group string) *DataFrame

GetGroup returns the grouped rows sharing the same group key as a new DataFrame.

func (*GroupedDataFrame) GetLabels ¶ added in v0.4.3

func (g *GroupedDataFrame) GetLabels() []interface{}

GetLabels returns the grouped label levels as interface{} slices within an []interface{} that may be supplied as the optional labels argument to NewSeries() or NewDataFrame().

func (*GroupedDataFrame) HavingCount ¶

func (g *GroupedDataFrame) HavingCount(lambda func(int) bool) *GroupedDataFrame

HavingCount removes any groups from g that do not satisfy the boolean function supplied in lambda. For each group, the input into lambda is the total number of values in the group (null or not-null).

func (*GroupedDataFrame) Iterator ¶ added in v0.2.0

func (g *GroupedDataFrame) Iterator() *GroupedDataFrameIterator

Iterator returns an iterator which may be used to access each group of rows as a new DataFrame, in the order in which the groups originally appeared.

func (*GroupedDataFrame) Last ¶

func (g *GroupedDataFrame) Last(colNames ...string) *DataFrame

Last returns the last row within each group for the columns in colNames.

func (*GroupedDataFrame) Latest ¶

func (g *GroupedDataFrame) Latest(colNames ...string) *DataFrame

Latest coerces the column values in colNames to time.Time and calculates the latest timestamp of each group.

func (*GroupedDataFrame) Len ¶

func (g *GroupedDataFrame) Len() int

Len returns the number of group labels.

func (*GroupedDataFrame) ListGroups ¶

func (g *GroupedDataFrame) ListGroups() []string

ListGroups returns a list of group keys in the order in which they originally appeared.

func (*GroupedDataFrame) Max ¶

func (g *GroupedDataFrame) Max(colNames ...string) *DataFrame

Max coerces the column values in colNames to float64 and calculates the maximum of each group.

func (*GroupedDataFrame) Mean ¶

func (g *GroupedDataFrame) Mean(colNames ...string) *DataFrame

Mean coerces the column values in colNames to float64 and calculates the mean of each group.

func (*GroupedDataFrame) Median ¶

func (g *GroupedDataFrame) Median(colNames ...string) *DataFrame

Median coerces the column values in colNames to float64 and calculates the median of each group.

func (*GroupedDataFrame) Min ¶

func (g *GroupedDataFrame) Min(colNames ...string) *DataFrame

Min coerces the column values in colNames to float64 and calculates the minimum of each group.

func (*GroupedDataFrame) NUnique ¶

func (g *GroupedDataFrame) NUnique(colNames ...string) *DataFrame

NUnique returns the number of unique, non-null values in each group for the columns in colNames.

func (*GroupedDataFrame) Nth ¶

func (g *GroupedDataFrame) Nth(index int, colNames ...string) *DataFrame

Nth returns the row at position index (if it exists) within each group for the columns in colNames.

func (*GroupedDataFrame) Reduce ¶

func (g *GroupedDataFrame) Reduce(name string, cols []string, lambda ReduceFn) *DataFrame

Reduce iterates over the groups in the GroupedDataFrame and reduces each group of values into a single value using the function supplied in lambda. Reduce returns a new DataFrame named "name_originalDataFrameName" with columns named "name_originalColumnName" where each reduced group is represented by a single row.

The columns in the new DataFrame will be slices of reduced values with the same type as the GroupReduceFn output. With GroupReduceFn.Float64, for example, Reduce will iterate over all the grouped values in each column, coerce each group to []float64, reduce each groupedSlice to a single float64 value, then concatenate these reduced values into new []float64 columns and return in a new DataFrame.

func (*GroupedDataFrame) StdDev ¶ added in v0.5.3

func (g *GroupedDataFrame) StdDev(colNames ...string) *DataFrame

StdDev coerces the column values in colNames to float64 and calculates the standard deviation of each group.

func (*GroupedDataFrame) String ¶ added in v0.4.10

func (g *GroupedDataFrame) String() string

func (*GroupedDataFrame) Sum ¶

func (g *GroupedDataFrame) Sum(colNames ...string) *DataFrame

Sum coerces the column values in colNames to float64 and calculates the sum of each group.

type GroupedDataFrameIterator ¶ added in v0.2.0

type GroupedDataFrameIterator struct {
	// contains filtered or unexported fields
}

GroupedDataFrameIterator iterates over all DataFrames in the group.

func (*GroupedDataFrameIterator) DataFrame ¶ added in v0.2.0

func (g *GroupedDataFrameIterator) DataFrame() *DataFrame

DataFrame returns the current grouped DataFrame.

func (*GroupedDataFrameIterator) Next ¶ added in v0.2.0

func (g *GroupedDataFrameIterator) Next() bool

Next advances to the next grouped DataFrame. Returns false at the end of iteration.

type GroupedSeries ¶

type GroupedSeries struct {
	// contains filtered or unexported fields
}

A GroupedSeries is a collection of row positions sharing the same group key. A GroupedSeries has a reference to an underlying Series, which is used for reduce operations.

func (*GroupedSeries) Align ¶

func (g *GroupedSeries) Align() *GroupedSeries

Align changes subsequent reduce operations for this group to return a Series aligned with the original Series labels (the default behavior is to return a Series with one label per group). If the original Series is:

     FOO
baz    0
baz    1
bar    2
bar    4

and it is grouped by the "foo" label, then the default g.Sum() reducer would return:

     FOO
baz    1
bar    6

After g.Align(), the g.Sum() reducer would return:

     FOO
baz    1
baz    1
bar    6
bar    6

Example (Mean) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []int{0, 1, 0, 1}).
		SetName("foo").
		SetLabelNames([]string{"baz"})
	fmt.Println(s)

	// here, s.GroupBy("baz") is equivalent to s.GroupBy()
	g := s.GroupBy("baz")
	fmt.Println(g.Align().Mean())

}

Output:

+-----++-----+
| baz || foo |
|-----||-----|
|   0 ||   1 |
|   1 ||   2 |
|   0 ||   3 |
|   1 ||   4 |
+-----++-----+

+-----++----------+
| baz || mean_foo |
|-----||----------|
|   0 ||        2 |
|   1 ||        3 |
|   0 ||        2 |
|   1 ||        3 |
+-----++----------+

func (*GroupedSeries) Apply ¶ added in v0.7.0

func (g *GroupedSeries) Apply(lambda ApplyFn) *GroupedSeries

Apply applies lambda to every group. Each lambda input will be a slice of grouped values (including values considered null). Each lambda output must be a slice that is the same length as the input. A row's null status can be set in-place within the anonymous function by accessing the []bool argument.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []string{"bar", "bar", "foo", "bar"}, []int{0, 1, 2, 3}).
		SetName("foobar").
		SetLabelNames([]string{"baz", "qux"})
	fmt.Println(s)

	g := s.GroupBy("baz")
	// if group has at least 3 items, multiply by 2. otherwise set as null.
	modifyBigGroup := func(slice interface{}, isNull []bool) interface{} {
		vals, _ := slice.([]float64) // in normal usage, check the type assertion and handle an error
		ret := make([]float64, len(vals))
		if len(vals) >= 3 {
			for i := range ret {
				ret[i] = vals[i] * 2
			}
		} else {
			for i := range ret {
				isNull[i] = true
			}
		}
		return ret
	}
	fmt.Println(g.Apply(modifyBigGroup).Series())

}

Output:

+-----+-----++--------+
| baz | qux || foobar |
|-----|-----||--------|
| bar |   0 ||      1 |
|     |   1 ||      2 |
| foo |   2 ||      3 |
| bar |   3 ||      4 |
+-----+-----++--------+

+-----++--------+
| baz || foobar |
|-----||--------|
| bar ||      2 |
|     ||      4 |
|     ||      8 |
| foo || (null) |
+-----++--------+

Example (Align) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []string{"bar", "bar", "foo", "bar"}, []int{0, 1, 2, 3}).
		SetName("foobar").
		SetLabelNames([]string{"baz", "qux"})
	fmt.Println(s)

	g := s.GroupBy("baz")
	// if group has at least 3 items, multiply by 2. otherwise set as null.
	modifyBigGroup := func(slice interface{}, isNull []bool) interface{} {
		vals, _ := slice.([]float64) // in normal usage, check the type assertion and handle an error
		ret := make([]float64, len(vals))
		if len(vals) >= 3 {
			for i := range ret {
				ret[i] = vals[i] * 2
			}
		} else {
			for i := range ret {
				isNull[i] = true
			}
		}
		return ret
	}
	g.Align()
	fmt.Println(g.Apply(modifyBigGroup).Series())

}

Output:

+-----+-----++--------+
| baz | qux || foobar |
|-----|-----||--------|
| bar |   0 ||      1 |
|     |   1 ||      2 |
| foo |   2 ||      3 |
| bar |   3 ||      4 |
+-----+-----++--------+

+-----+-----++--------+
| baz | qux || foobar |
|-----|-----||--------|
| bar |   0 ||      2 |
|     |   1 ||      4 |
| foo |   2 || (null) |
| bar |   3 ||      8 |
+-----+-----++--------+

func (*GroupedSeries) Count ¶

func (g *GroupedSeries) Count() *Series

Count returns the number of non-null values in each group.

func (*GroupedSeries) Earliest ¶

func (g *GroupedSeries) Earliest() *Series

Earliest coerces the Series values to time.Time and calculates the earliest timestamp in each group.

func (*GroupedSeries) Err ¶

func (g *GroupedSeries) Err() error

Err returns the underlying error, if any.

func (*GroupedSeries) First ¶

func (g *GroupedSeries) First() *Series

First returns the first row in each group.

func (*GroupedSeries) GetGroup ¶

func (g *GroupedSeries) GetGroup(group string) *Series

GetGroup returns the grouped rows sharing the same group key as a new Series.

func (*GroupedSeries) GetLabels ¶ added in v0.4.3

func (g *GroupedSeries) GetLabels() []interface{}

GetLabels returns the grouped label levels as slices within an []interface{} that may be supplied as the optional labels argument to NewSeries() or NewDataFrame().

func (*GroupedSeries) HavingCount ¶

func (g *GroupedSeries) HavingCount(lambda func(int) bool) *GroupedSeries

HavingCount removes any groups from g that do not satisfy the boolean function supplied in lambda. For each group, the input into lambda is the total number of values in the group (null or not-null).

Example (Sum) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []int{0, 1, 1, 1}).
		SetName("foo").
		SetLabelNames([]string{"baz"})
	fmt.Println(s)

	countOf3 := func(n int) bool { return n == 3 }
	g := s.GroupBy("baz")
	fmt.Println(g.HavingCount(countOf3).Sum())

}

Output:

+-----++-----+
| baz || foo |
|-----||-----|
|   0 ||   1 |
|   1 ||   2 |
|     ||   3 |
|     ||   4 |
+-----++-----+

+-----++---------+
| baz || sum_foo |
|-----||---------|
|   1 ||       9 |
+-----++---------+

func (*GroupedSeries) Iterator ¶ added in v0.2.0

func (g *GroupedSeries) Iterator() *GroupedSeriesIterator

Iterator returns an iterator which may be used to access each group of rows as a new Series, in the order in which the groups originally appeared.

func (*GroupedSeries) Last ¶

func (g *GroupedSeries) Last() *Series

Last returns the last row in each group.

func (*GroupedSeries) Latest ¶

func (g *GroupedSeries) Latest() *Series

Latest coerces the Series values to time.Time and calculates the latest timestamp in each group.

func (*GroupedSeries) Len ¶

func (g *GroupedSeries) Len() int

Len returns the number of group labels.

func (*GroupedSeries) ListGroups ¶

func (g *GroupedSeries) ListGroups() []string

ListGroups returns a list of group keys in the order in which they originally appeared.

func (*GroupedSeries) Max ¶

func (g *GroupedSeries) Max() *Series

Max coerces values to float64 and calculates the maximum of each group.

func (*GroupedSeries) Mean ¶

func (g *GroupedSeries) Mean() *Series

Mean coerces values to float64 and calculates the mean of each group.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []int{0, 1, 0, 1}).
		SetName("foo").
		SetLabelNames([]string{"baz"})
	fmt.Println(s)

	// here, s.GroupBy("baz") is equivalent to s.GroupBy()
	g := s.GroupBy("baz")
	fmt.Println(g.Mean())

}

Output:

+-----++-----+
| baz || foo |
|-----||-----|
|   0 ||   1 |
|   1 ||   2 |
|   0 ||   3 |
|   1 ||   4 |
+-----++-----+

+-----++----------+
| baz || mean_foo |
|-----||----------|
|   0 ||        2 |
|   1 ||        3 |
+-----++----------+

func (*GroupedSeries) Median ¶

func (g *GroupedSeries) Median() *Series

Median coerces values to float64 and calculates the median of each group.

func (*GroupedSeries) Min ¶

func (g *GroupedSeries) Min() *Series

Min coerces values to float64 and calculates the minimum of each group.

func (*GroupedSeries) NUnique ¶

func (g *GroupedSeries) NUnique() *Series

NUnique returns the number of unique values in each group.

func (*GroupedSeries) Nth ¶

func (g *GroupedSeries) Nth(n int) *Series

Nth returns the row at position n (if it exists) within each group.

func (*GroupedSeries) Reduce ¶

func (g *GroupedSeries) Reduce(name string, lambda ReduceFn) *Series

Reduce iterates over the groups in the GroupedSeries and reduces each group of values into a single value using the function supplied in lambda. Reduce returns a new Series named "name_originalColName" where each reduced group is represented by a single row.

The new Series will be a slice of reduced values with the same type as the ReduceFn output. If lambda returns a float64 for every group, for example, Reduce will iterate over all the grouped values, reduce each group to a single float64 value, then concatenate these reduced values into a new []float64 and return it in a new Series.

Example ¶

package main

import (
	"fmt"
	"math"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4, 5, 6}, []int{0, 0, 0, 1, 1, 1}).
		SetName("foo").
		SetLabelNames([]string{"baz"})
	fmt.Println(s)

	g := s.GroupBy("baz")
	maxOdd := func(slice interface{}, isNull []bool) (value interface{}, null bool) {
		vals := slice.([]float64)
		max := math.Inf(-1)
		for i := range vals {
			if !isNull[i] && int(vals[i])%2 == 1 && vals[i] > max {
				max = vals[i]
			}
		}
		return max, false
	}
	fmt.Println(g.Reduce("max_odd", maxOdd))

}

Output:

+-----++-----+
| baz || foo |
|-----||-----|
|   0 ||   1 |
|     ||   2 |
|     ||   3 |
|   1 ||   4 |
|     ||   5 |
|     ||   6 |
+-----++-----+

+-----++-------------+
| baz || max_odd_foo |
|-----||-------------|
|   0 ||           3 |
|   1 ||           5 |
+-----++-------------+

func (*GroupedSeries) Series ¶ added in v0.4.10

func (g *GroupedSeries) Series() *Series

Series returns the GroupedSeries as a Series, with group names as label levels, in order of appearance in the original Series, and values grouped together by group name.

func (*GroupedSeries) StdDev ¶ added in v0.5.3

func (g *GroupedSeries) StdDev() *Series

StdDev coerces values to float64 and calculates the standard deviation of each group.

func (*GroupedSeries) String ¶

func (g *GroupedSeries) String() string

func (*GroupedSeries) Sum ¶

func (g *GroupedSeries) Sum() *Series

Sum coerces values to float64 and calculates the sum of each group.

type GroupedSeriesIterator ¶ added in v0.2.0

type GroupedSeriesIterator struct {
	// contains filtered or unexported fields
}

GroupedSeriesIterator iterates over all Series in the group.

func (*GroupedSeriesIterator) Next ¶ added in v0.2.0

func (g *GroupedSeriesIterator) Next() bool

Next advances to next grouped Series. Returns false at end of iteration.

func (*GroupedSeriesIterator) Series ¶ added in v0.2.0

func (g *GroupedSeriesIterator) Series() *Series

Series returns the current grouped Series.
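
The Next/Series pair follows the usual Go iterator shape: call Next in a loop condition and read the current item inside the loop body. The sketch below shows that shape with a hypothetical groupIter type over plain slices; it is an illustration of the pattern, not the GroupedSeriesIterator implementation.

```go
package main

import "fmt"

// groupIter is a hypothetical stand-in for an iterator over groups.
// Each group here is just a slice of values.
type groupIter struct {
	groups [][]float64
	pos    int // index of the current group; -1 before the first Next call
}

func newGroupIter(groups [][]float64) *groupIter {
	return &groupIter{groups: groups, pos: -1}
}

// Next advances to the next group and reports whether one exists.
func (it *groupIter) Next() bool {
	if it.pos+1 >= len(it.groups) {
		return false
	}
	it.pos++
	return true
}

// Group returns the current group. Call it only after Next has returned true.
func (it *groupIter) Group() []float64 {
	return it.groups[it.pos]
}

func main() {
	it := newGroupIter([][]float64{{1, 3}, {2, 4}})
	for it.Next() {
		fmt.Println(it.Group())
	}
	// prints:
	// [1 3]
	// [2 4]
}
```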

type JoinOption ¶ added in v0.6.0

type JoinOption func(*joinConfig)

A JoinOption configures a lookup or merge function. Available join options: JoinOptionHow, JoinOptionLeftOn, and JoinOptionRightOn.

type Matrix ¶

type Matrix interface {
	Dims() (r, c int)
	At(i, j int) float64
}

Matrix is an interface which is compatible with gonum's mat.Matrix interface.

type NullFiller ¶

type NullFiller struct {
	FillForward  bool
	FillBackward bool
	FillZero     bool
	FillFloat    float64
}

NullFiller fills every row that has a null value and changes the row status to not-null. If multiple fields are provided, they resolve in the following order: 1) `FillForward` - fills with the last valid value, 2) `FillBackward` - fills with the next valid value, 3) `FillZero` - fills with the zero type of the slice, 4) `FillFloat` - coerces to float64 and fills with the value provided.
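
As an illustration of the forward-fill idea, here is a minimal standard-library sketch over a value slice and a parallel null mask. fillForward is a hypothetical helper, not part of tada, and leaving leading nulls untouched is an assumption of this sketch rather than documented library behavior.

```go
package main

import "fmt"

// fillForward replaces each null value with the last preceding non-null
// value and marks it not-null. Leading nulls have nothing to copy from and
// stay null in this sketch (an assumption, not documented tada behavior).
func fillForward(vals []float64, isNull []bool) ([]float64, []bool) {
	// copy the inputs so the caller's slices are not modified
	outVals := append([]float64(nil), vals...)
	outNull := append([]bool(nil), isNull...)

	haveLast := false
	var last float64
	for i := range outVals {
		if !outNull[i] {
			last, haveLast = outVals[i], true
			continue
		}
		if haveLast {
			outVals[i], outNull[i] = last, false
		}
	}
	return outVals, outNull
}

func main() {
	vals := []float64{0, 1, 0, 0, 4}
	isNull := []bool{true, false, true, true, false}
	filled, nulls := fillForward(vals, isNull)
	fmt.Println(filled, nulls) // [0 1 1 1 4] [true false false false false]
}
```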

type ReadOption ¶ added in v0.4.0

type ReadOption func(*readConfig)

A ReadOption configures a read function. Available read options: ReadOptionHeaders, ReadOptionLabels, ReadOptionDelimiter, and ReadOptionSwitchDims.

type ReduceFn ¶ added in v0.7.0

type ReduceFn func(slice interface{}, isNull []bool) (value interface{}, null bool)

A ReduceFn is an anonymous function supplied to a Reduce function to reduce a slice of values to one value and one null status per group. isNull contains the null status of every value in the group.
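
A function with this signature might look like the sketch below. To keep it compilable without the tada module, the ReduceFn type is redeclared locally from the signature shown above; sumNonNull is a hypothetical reducer, and it assumes isNull has the same length as the value slice.

```go
package main

import "fmt"

// ReduceFn is redeclared from the documented signature so that this sketch
// compiles without importing tada.
type ReduceFn func(slice interface{}, isNull []bool) (value interface{}, null bool)

// sumNonNull adds the non-null float64 values in a group. It reports the
// result as null when the input is not []float64 or when every value is null.
func sumNonNull(slice interface{}, isNull []bool) (interface{}, bool) {
	vals, ok := slice.([]float64)
	if !ok {
		return 0.0, true
	}
	var sum float64
	seen := false
	for i, v := range vals {
		if isNull[i] {
			continue
		}
		sum += v
		seen = true
	}
	return sum, !seen
}

func main() {
	var fn ReduceFn = sumNonNull

	v, null := fn([]float64{1, 2, 3}, []bool{false, true, false})
	fmt.Println(v, null) // 4 false

	v, null = fn([]float64{1}, []bool{true})
	fmt.Println(v, null) // 0 true
}
```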

type Resampler ¶

type Resampler struct {
	ByYear      bool
	ByMonth     bool
	ByDay       bool
	ByWeek      bool
	StartOfWeek time.Weekday
	ByDuration  time.Duration
	Location    *time.Location
}

Resampler supplies logic for the Resample() function. Only the first `By` field that is selected (i.e., not left as its zero value) is used; any others are ignored (if `ByWeek` is selected, it may be modified by `StartOfWeek`). `ByYear` truncates the timestamp by year. `ByMonth` truncates the timestamp by month. `ByDay` truncates the timestamp by day. `ByWeek` returns the first day of the most recent week (starting on `StartOfWeek`) relative to timestamp. Otherwise, truncates the timestamp `ByDuration`. If `Location` is not provided, time.UTC is used as the default location.

type Series ¶

type Series struct {
	// contains filtered or unexported fields
}

A Series is a single column of data with one or more levels of aligned labels.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}).SetName("foo")
	fmt.Println(s)
}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

Example (NestedSlice) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([][]string{{"foo", "bar"}, {"baz"}, {}}).
		SetName("a")
	fmt.Println(s)
}

Output:

+---++-----------+
| - ||     a     |
|---||-----------|
| 0 || [foo bar] |
| 1 ||     [baz] |
| 2 ||    (null) |
+---++-----------+

Example (SetNaNStatus) ¶

package main

import (
	"fmt"
	"math"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{0, math.NaN()})
	fmt.Println("isNull:", s.GetNulls())

	tada.SetOptionNaNStatus(false)
	s = tada.NewSeries([]float64{0, math.NaN()})
	fmt.Println("isNull:", s.GetNulls())

	tada.SetOptionNaNStatus(true)
}

Output:

isNull: [false true]
isNull: [false false]

Example (SetSentinelNulls) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]string{"foo", "", "(null)"})
	fmt.Println("default sentinel null values\n isNull:", s.GetNulls())

	tada.SetOptionNullStrings(nil)
	s = tada.NewSeries([]string{"foo", "", "(null)"})
	fmt.Println("remove defaults\n isNull:", s.GetNulls())

	tada.SetOptionNullStrings(tada.GetOptionDefaultNullStrings())
}

Output:

default sentinel null values
 isNull: [false true true]
remove defaults
 isNull: [false false false]

Example (Zscore) ¶

package main

import (
	"fmt"
	"math"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4, 5}).SetName("foo")
	fmt.Println(s)

	vals := s.GetValuesAsFloat64()
	ret := make([]float64, s.Len())
	mean := s.Mean()
	std := s.StdDev()
	for i := range vals {
		val := (vals[i] - mean) / std
		ret[i] = math.Round((val * 100)) / 100 // round to 2 decimal points
	}
	df := s.DataFrame().WithCol("zscore_foo", ret)
	fmt.Println(df)
}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
| 2 ||   3 |
| 3 ||   4 |
| 4 ||   5 |
+---++-----+

+---++-----+------------+
| - || foo | zscore_foo |
|---||-----|------------|
| 0 ||   1 |      -1.41 |
| 1 ||   2 |      -0.71 |
| 2 ||   3 |          0 |
| 3 ||   4 |       0.71 |
| 4 ||   5 |       1.41 |
+---++-----+------------+

func NewSeries ¶

func NewSeries(slice interface{}, labels ...interface{}) *Series

NewSeries constructs a Series from a slice of values and optional label slices. The slice and all labels must be supported slice types.

If no labels are supplied, a default label level is inserted ([]int incrementing from 0). Series values are named 0 by default. The default values name is displayed on printing. Label levels are named *n (e.g., *0, *1, etc) by default. Default label names are hidden on printing.

Supported slice types: all variants of []float, []int, & []uint, []string, []bool, []time.Time, []interface{}, and 2-dimensional variants of each (e.g., [][]string, [][]float64).

func (*Series) Add ¶

func (s *Series) Add(other *Series, ignoreNulls bool) *Series

Add coerces other and s to float64 values, aligns other with s, and adds the values in aligned rows, using the labels in s as an anchor. If ignoreNulls is true, then missing or null values are treated as 0. Otherwise, if a row in s does not align with any row in other, or if a row does align but either value is null, then the resulting value is null.
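
The alignment and null rules can be illustrated with a small standard-library sketch. The row type and addAligned function are hypothetical and written only for this illustration; the sketch assumes a single label level and unique labels in other, and it is not tada's implementation.

```go
package main

import "fmt"

// row is a hypothetical labeled value used only for this sketch.
type row struct {
	label string
	val   float64
	null  bool
}

// addAligned adds other onto s, matching rows by label and keeping the
// labels and order of s. Labels in other are assumed to be unique.
func addAligned(s, other []row, ignoreNulls bool) []row {
	byLabel := make(map[string]row, len(other))
	for _, r := range other {
		byLabel[r.label] = r
	}
	out := make([]row, len(s))
	for i, left := range s {
		right, found := byLabel[left.label]
		missing := !found || right.null
		switch {
		case ignoreNulls:
			// missing or null values on either side count as 0
			sum := 0.0
			if !left.null {
				sum += left.val
			}
			if !missing {
				sum += right.val
			}
			out[i] = row{label: left.label, val: sum}
		case left.null || missing:
			out[i] = row{label: left.label, null: true}
		default:
			out[i] = row{label: left.label, val: left.val + right.val}
		}
	}
	return out
}

func main() {
	s := []row{{label: "a", val: 1}, {label: "b", val: 2}}
	other := []row{{label: "a", val: 10}, {label: "c", val: 30}}

	for _, r := range addAligned(s, other, false) {
		fmt.Println(r.label, r.val, r.null) // a 11 false, then b 0 true
	}
	for _, r := range addAligned(s, other, true) {
		fmt.Println(r.label, r.val, r.null) // a 11 false, then b 2 false
	}
}
```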

func (*Series) Append ¶

func (s *Series) Append(other *Series) *Series

Append adds the other labels and values as new rows to the Series. If the types of any container do not match, all the values in that container are coerced to string. Returns a new Series.

func (*Series) Apply ¶

func (s *Series) Apply(lambda ApplyFn) *Series

Apply applies lambda, an anonymous function, to every row in the Series values. A row's null status can be set in-place within the anonymous function by accessing the []bool argument. Returns a new Series.

Example (Float64) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3}).SetName("foo")
	fmt.Println(s)

	times2 := func(slice interface{}, isNull []bool) interface{} {
		vals := slice.([]float64)
		ret := make([]float64, len(vals))
		for i := range ret {
			ret[i] = vals[i] * 2
		}
		return ret
	}
	fmt.Println(s.Apply(times2))

}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
| 2 ||   3 |
+---++-----+

+---++-----+
| - || foo |
|---||-----|
| 0 ||   2 |
| 1 ||   4 |
| 2 ||   6 |
+---++-----+

func (*Series) At ¶

func (s *Series) At(index int) *Element

At returns the Element at the index position. If index is out of range, returns nil.

func (*Series) Bin ¶ added in v0.4.9

func (s *Series) Bin(bins []float64, config *Binner) (*Series, error)

Bin coerces the Series values to float64 and categorizes each row based on which bin interval it falls within. bins should be a slice of sequential edges that form intervals (left exclusive, right inclusive). For example, [1, 3, 5] represents the intervals 1-3 (excluding 1, including 3), and 3-5 (excluding 3, including 5). If these bins were supplied for a Series with values [3, 4], the returned Series would have values ["1-3", "3-5"]. Null values are not categorized. For default behavior, supply nil as config.

To bin values below or above the bin intervals, or to supply custom labels, supply a tada.Binner as config. If custom labels are supplied, the length must be 1 less than the total number of bin edges. Otherwise, bin labels are auto-generated from the bin intervals.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 3, 5}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.Bin([]float64{0, 2, 4}, nil)
	fmt.Println(binned)
}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   3 |
| 2 ||   5 |
+---++-----+

+---++--------+
| - ||  foo   |
|---||--------|
| 0 ||    0-2 |
| 1 ||    2-4 |
| 2 || (null) |
+---++--------+

Example (AndMore) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 3, 5}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.Bin([]float64{0, 2, 4}, &tada.Binner{AndMore: true})
	fmt.Println(binned)
}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   3 |
| 2 ||   5 |
+---++-----+

+---++-----+
| - || foo |
|---||-----|
| 0 || 0-2 |
| 1 || 2-4 |
| 2 ||  >4 |
+---++-----+

Example (CustomLabels) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 3}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.Bin([]float64{0, 2, 4}, &tada.Binner{Labels: []string{"low", "high"}})
	fmt.Println(binned)
}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   3 |
+---++-----+

+---++------+
| - || foo  |
|---||------|
| 0 ||  low |
| 1 || high |
+---++------+

func (*Series) CSV ¶ added in v0.5.3

func (s *Series) CSV(options ...WriteOption) ([][]string, error)

CSV converts a Series to a DataFrame and returns as [][]string.

func (*Series) Cast ¶

func (s *Series) Cast(containerAsType map[string]DType)

Cast casts the underlying container values (either label levels or Series values) to []float64, []string, []time.Time (aka timezone-aware DateTime), []civil.Date, or []civil.Time. To apply to Series values, supply empty string name ("") or the Series name. Use Cast to improve performance when calling multiple operations on values.

Example (Date) ¶

package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{
		time.Date(2020, 1, 15, 12, 15, 0, 0, time.UTC),
	}).SetName("foo")
	fmt.Println(s)

	s.Cast(map[string]tada.DType{"foo": tada.Date})
	fmt.Println(s)
}

Output:

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:15:00Z |
+---++----------------------+

+---++------------+
| - ||    foo     |
|---||------------|
| 0 || 2020-01-15 |
+---++------------+

Example (Time) ¶

package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{
		time.Date(2020, 1, 15, 12, 15, 0, 0, time.UTC),
	}).SetName("foo")
	fmt.Println(s)

	s.Cast(map[string]tada.DType{"foo": tada.Time})
	fmt.Println(s)
}

Output:

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:15:00Z |
+---++----------------------+

+---++----------+
| - ||   foo    |
|---||----------|
| 0 || 12:15:00 |
+---++----------+

func (*Series) Copy ¶

func (s *Series) Copy() *Series

Copy returns a deep copy of a Series with no shared references to the original.

func (*Series) Count ¶

func (s *Series) Count() int

Count counts the number of non-null Series values.

func (*Series) CumSum ¶

func (s *Series) CumSum() *Series

CumSum coerces the Series values to float64 and returns the cumulative sum at each row position.

func (*Series) DataFrame ¶ added in v0.5.3

func (s *Series) DataFrame() *DataFrame

DataFrame converts a Series to a 1-column DataFrame.

func (*Series) Divide ¶

func (s *Series) Divide(other *Series, ignoreNulls bool) *Series

Divide coerces other and s to float64 values, aligns other with s, and divides the aligned values of s by other, using the labels in s as an anchor. Dividing by 0 always returns a null value. If ignoreNulls is true, then missing or null values are treated as 0. Otherwise, if a row in s does not align with any row in other, or if a row does align but either value is null, then the resulting value is null.

func (*Series) DropLabels ¶

func (s *Series) DropLabels(name string) *Series

DropLabels removes the first label level matching name. Returns a new Series.

func (*Series) DropNull ¶

func (s *Series) DropNull() *Series

DropNull returns all the rows with non-null values. Returns a new Series.

func (*Series) DropRow ¶

func (s *Series) DropRow(index int) *Series

DropRow removes the row at the specified index. Returns a new Series.

func (*Series) Earliest ¶

func (s *Series) Earliest() time.Time

Earliest coerces the Series values to time.Time and calculates the earliest timestamp.

func (*Series) EqualsCSV ¶

func (s *Series) EqualsCSV(includeLabels bool, want io.Reader, wantOptions ...ReadOption) (bool, *tablediff.Differences, error)

EqualsCSV reads want (configured by wantOptions) into a dataframe, converts both s and want into [][]string records, and evaluates whether the stringified values match. If they do not match, returns a tablediff.Differences object that can be printed to isolate their differences.

If includeLabels is true, then s's labels are included as columns.

func (*Series) Err ¶

func (s *Series) Err() error

Err returns the most recent error attached to the Series, if any.

func (*Series) FillNull ¶

func (s *Series) FillNull(how NullFiller) *Series

FillNull fills all the null values and makes them not-null. Returns a new Series.

func (*Series) Filter ¶

func (s *Series) Filter(filters map[string]FilterFn) *Series

Filter returns a new Series with only rows that satisfy all of the filters, which is a map of container names (either the Series name or label name) and anonymous functions. Filter may be applied to the Series values by supplying either the Series name or an empty string ("") as a key.

Rows with null values never satisfy a filter. If no filter is provided, the function does nothing. For equality filtering on one or more containers, consider FilterByValue.

func (*Series) FilterByValue ¶ added in v0.3.5

func (s *Series) FilterByValue(filters map[string]interface{}) *Series

FilterByValue returns the rows in the Series satisfying all filters, which is a map of container names (either the Series name or label name) to interface{} values. A filter is satisfied for a given row value if the stringified value in that container at that row matches the stringified interface{} value. FilterByValue may be applied to the Series values by supplying either the Series name or an empty string ("") as a key. Returns a new Series.
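
Matching on stringified values means that values of different types can satisfy the same filter. The sketch below illustrates the idea with fmt.Sprint; matchIndexes is a hypothetical helper, and using fmt.Sprint as the stringifier is an assumption, so tada's own string conversion may differ for some types.

```go
package main

import "fmt"

// matchIndexes returns the positions whose stringified value equals the
// stringified target. fmt.Sprint stands in for the library's stringifier.
func matchIndexes(vals []interface{}, target interface{}) []int {
	want := fmt.Sprint(target)
	idx := []int{}
	for i, v := range vals {
		if fmt.Sprint(v) == want {
			idx = append(idx, i)
		}
	}
	return idx
}

func main() {
	// a float64, a string, and two ints
	vals := []interface{}{1.0, "1", 2, 1}
	fmt.Println(matchIndexes(vals, 1)) // [0 1 3]
}
```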

func (*Series) FilterIndex ¶ added in v0.4.1

func (s *Series) FilterIndex(container string, filterFn FilterFn) []int

FilterIndex returns the index positions of the rows in container (either the Series name or label name) that satisfy filterFn. A filter that matches no rows returns empty []int. An out of range container returns nil. FilterIndex may be applied to the Series values by supplying either the Series name or an empty string ("") as a key.

func (*Series) GetLabels ¶ added in v0.3.5

func (s *Series) GetLabels() []interface{}

GetLabels returns label levels as interface{} slices within an []interface{} that may be supplied as the optional labels argument to NewSeries() or NewDataFrame().

func (*Series) GetNulls ¶ added in v0.3.6

func (s *Series) GetNulls() []bool

GetNulls returns whether each value is null or not.

func (*Series) GetValues ¶

func (s *Series) GetValues() interface{}

GetValues returns a copy of the underlying Series data as an interface.

func (*Series) GetValuesAsFloat64 ¶ added in v0.6.0

func (s *Series) GetValuesAsFloat64() []float64

GetValuesAsFloat64 coerces the Series values into []float64.

func (*Series) GetValuesAsString ¶ added in v0.6.0

func (s *Series) GetValuesAsString() []string

GetValuesAsString coerces the Series values into []string.

func (*Series) GetValuesAsTime ¶ added in v0.6.0

func (s *Series) GetValuesAsTime() []time.Time

GetValuesAsTime coerces the Series values into []time.Time.

func (*Series) GroupBy ¶

func (s *Series) GroupBy(names ...string) *GroupedSeries

GroupBy groups the Series rows that share the same stringified value in the container(s) (columns or labels) specified by names. If error occurs, writes error to GroupedSeries.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []string{"foo", "bar", "foo", "bar"})
	g := s.GroupBy()
	fmt.Println(g)
}

Output:

+-----++---+
|  -  || 0 |
|-----||---|
| foo || 1 |
|     || 3 |
| bar || 2 |
|     || 4 |
+-----++---+

Example (CompoundGroup) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}, []string{"foo", "baz", "foo", "baz"}, []string{"bar", "qux", "bar", "qux"})
	g := s.GroupBy()
	fmt.Println(g)
	// +-----+-----++---+
	// |  -  |  -  || 0 |
	// |-----|-----||---|
	// | foo | bar || 1 |
	// |     |     || 3 |
	// | baz | qux || 2 |
	// |     |     || 4 |
	// +-----+-----++---+
}

Output:

func (*Series) HasLabels ¶ added in v0.5.0

func (s *Series) HasLabels(labelNames ...string) error

HasLabels returns an error if the Series does not contain all of the labelNames supplied.

func (*Series) Head ¶

func (s *Series) Head(n int) *Series

Head returns the first n rows of the Series. If n is greater than the length of the Series, returns the entire Series. In either case, returns a new Series.

func (*Series) InPlace ¶

func (s *Series) InPlace() *SeriesMutator

InPlace returns a SeriesMutator, which contains most of the same methods as Series but never returns a new Series. If you want to save memory and improve performance and do not need to preserve the original Series, consider using InPlace().

func (*Series) IndexOfLabel ¶

func (s *Series) IndexOfLabel(name string) int

IndexOfLabel returns the index position of the first label level with a name matching name (case-sensitive). If name does not match any container, -1 is returned.

func (*Series) IsNull ¶ added in v0.8.1

func (s *Series) IsNull() *Series

IsNull returns all the rows with null values. Returns a new Series.

func (*Series) Iterator ¶ added in v0.2.0

func (s *Series) Iterator() *SeriesIterator

Iterator returns an iterator which may be used to access the values in each row as map[string]Element.

func (*Series) LabelsAsSeries ¶ added in v0.5.3

func (s *Series) LabelsAsSeries(name string) *Series

LabelsAsSeries finds the first level with matching name and returns as a Series with all existing label levels (including itself). If label level name is default (prefixed with *), removes the prefix. Returns a new Series with shared labels.

func (*Series) Latest ¶

func (s *Series) Latest() time.Time

Latest coerces the Series values to time.Time and calculates the latest timestamp.

func (*Series) Len ¶

func (s *Series) Len() int

Len returns the number of rows in the Series.

func (*Series) ListLabelNames ¶

func (s *Series) ListLabelNames() []string

ListLabelNames returns the names of all the label levels in the Series, in order of position.

func (*Series) Lookup ¶

func (s *Series) Lookup(other *Series, options ...JoinOption) (*Series, error)

Lookup performs the lookup portion of a join of other onto s. Performs a left join unless a different join type is specified as an option. If left and right keys are supplied as options, those are used as lookup keys. Otherwise, the join will automatically use shared label names or return an error if none exist.

Lookup identifies the row alignment between s and other and returns the aligned values. Rows are aligned when: 1) one or more containers (either column or label level) in other share the same name as one or more containers in s, and 2) the stringified values in the other containers match the values in the s containers. For the following Series:

s          other
FOO BAR    FOO QUX
bar 0      baz corge
baz 1      qux waldo

Row 1 in s is "aligned" with row 0 in other, because those are the rows in which both share the same value ("baz") in a container with the same name ("foo"). The result of a lookup will be:

FOO BAR
bar null
baz corge

Returns a new Series.

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}, []int{0, 1}).SetName("foo").SetLabelNames([]string{"a"})
	fmt.Println("--original Series--")
	fmt.Println(s)

	s2 := tada.NewSeries([]float64{4, 5}, []int{0, 10}).SetLabelNames([]string{"a"})
	fmt.Println("--Series to lookup--")
	fmt.Println(s2)

	fmt.Println("--result--")
	lookup, _ := s.Lookup(s2)
	fmt.Println(lookup)
}

Output:

--original Series--
+---++-----+
| a || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

--Series to lookup--
+----++---+
| a  || 0 |
|----||---|
|  0 || 4 |
| 10 || 5 |
+----++---+

--result--
+---++--------+
| a ||  foo   |
|---||--------|
| 0 ||      4 |
| 1 || (null) |
+---++--------+

Example (WithOptions) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}, []string{"foo", "bar"}, []int{0, 1}).SetLabelNames([]string{"a", "b"})
	fmt.Println("--original Series--")
	fmt.Println(s)

	s2 := tada.NewSeries([]float64{4, 5}, []int{0, 10}, []string{"baz", "bar"}).SetLabelNames([]string{"a", "b"})
	fmt.Println("--Series to lookup--")
	fmt.Println(s2)

	fmt.Println("--result--")
	lookup, _ := s.Lookup(
		s2,
		tada.JoinOptionHow("inner"),
		tada.JoinOptionLeftOn([]string{"a"}),
		tada.JoinOptionRightOn([]string{"b"}),
	)
	fmt.Println(lookup)
}

Output:

--original Series--
+-----+---++---+
|  a  | b || 0 |
|-----|---||---|
| foo | 0 || 1 |
| bar | 1 || 2 |
+-----+---++---+

--Series to lookup--
+----+-----++---+
| a  |  b  || 0 |
|----|-----||---|
|  0 | baz || 4 |
| 10 | bar || 5 |
+----+-----++---+

--result--
+-----+---++---+
|  a  | b || 0 |
|-----|---||---|
| bar | 1 || 5 |
+-----+---++---+

func (*Series) Max ¶

func (s *Series) Max() float64

Max coerces the Series values to float64 and calculates the maximum.

func (*Series) Mean ¶

func (s *Series) Mean() float64

Mean coerces the Series values to float64 and calculates the mean.

func (*Series) Median ¶

func (s *Series) Median() float64

Median coerces the Series values to float64 and calculates the median.

func (*Series) Merge ¶

func (s *Series) Merge(other *Series, options ...JoinOption) (*DataFrame, error)

Merge joins other onto s. Performs a left join unless a different join type is specified as an option. If left and right keys are supplied as options, those are used as lookup keys. Otherwise, the join will automatically use shared label names or return an error if none exist.

Merge identifies the row alignment between s and other and appends aligned values as new columns on s. Rows are aligned when: 1) one or more containers (either column or label level) in other share the same name as one or more containers in s, and 2) the stringified values in the other containers match the values in the s containers. For the following dataframes:

    s                other
    FOO  BAR         FOO  QUX
    bar  0           baz  corge
    baz  1           qux  waldo

Row 1 in s is "aligned" with row 0 in other, because those are the rows in which both share the same value ("baz") in a container with the same name ("FOO"). After merging, the result will be:

    s
    FOO  BAR  QUX
    bar  0    null
    baz  1    corge

Finally, all container names (either the Series name or label name) are deduplicated after the merge so that they are unique. Returns a new DataFrame.
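The alignment rule described above can be sketched without tada, using only the standard library. This is a minimal illustration of a left join on stringified labels, not the library's implementation; the row and merged types and the leftJoin function are hypothetical names, and keeping only the first matching right-hand row is an assumption.

```go
package main

import "fmt"

// row is a hypothetical stand-in for one Series row: a label and a value.
type row struct {
	Label interface{}
	Value float64
}

// merged is one row of the joined result. RightNull marks left rows
// that had no matching row on the right.
type merged struct {
	Label     interface{}
	Left      float64
	Right     float64
	RightNull bool
}

// leftJoin aligns right onto left by comparing stringified labels.
// Every left row is kept; unmatched rows get a null right value.
func leftJoin(left, right []row) []merged {
	// Index the right rows by stringified label (assumption: first match wins).
	index := make(map[string]float64, len(right))
	for _, r := range right {
		key := fmt.Sprint(r.Label)
		if _, seen := index[key]; !seen {
			index[key] = r.Value
		}
	}
	out := make([]merged, 0, len(left))
	for _, l := range left {
		v, ok := index[fmt.Sprint(l.Label)]
		out = append(out, merged{Label: l.Label, Left: l.Value, Right: v, RightNull: !ok})
	}
	return out
}

func main() {
	left := []row{{Label: 0, Value: 1}, {Label: 1, Value: 2}}
	right := []row{{Label: 0, Value: 4}, {Label: 10, Value: 5}}
	for _, m := range leftJoin(left, right) {
		if m.RightNull {
			fmt.Printf("%v %v (null)\n", m.Label, m.Left)
			continue
		}
		fmt.Printf("%v %v %v\n", m.Label, m.Left, m.Right)
	}
}
```

The sample data mirrors the default Merge example: label 0 matches and label 1 has no counterpart, so its right-hand value is null.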

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}, []int{0, 1}).SetName("foo")
	fmt.Println("--original Series--")
	fmt.Println(s)

	s2 := tada.NewSeries([]float64{4, 5}, []int{0, 10}).SetName("bar")
	fmt.Println("--Series to merge--")
	fmt.Println(s2)

	fmt.Println("--result--")
	merged, _ := s.Merge(s2)
	fmt.Println(merged)
}

Output:

--original Series--
+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
+---++-----+

--Series to merge--
+----++-----+
| -  || bar |
|----||-----|
|  0 ||   4 |
| 10 ||   5 |
+----++-----+

--result--
+---++-----+--------+
| - || foo |  bar   |
|---||-----|--------|
| 0 ||   1 |      4 |
| 1 ||   2 | (null) |
+---++-----+--------+

Example (WithOptions) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2}, []string{"foo", "bar"}, []int{0, 1}).SetLabelNames([]string{"a", "b"})
	fmt.Println("--original Series--")
	fmt.Println(s)

	s2 := tada.NewSeries([]float64{4, 5}, []int{0, 10}, []string{"baz", "bar"}).SetLabelNames([]string{"a", "b"})
	fmt.Println("--Series to lookup--")
	fmt.Println(s2)

	fmt.Println("--result--")
	merged, _ := s.Merge(s2,
		tada.JoinOptionHow("inner"),
		tada.JoinOptionLeftOn([]string{"a"}),
		tada.JoinOptionRightOn([]string{"b"}),
	)
	fmt.Println(merged)
}

Output:

--original Series--
+-----+---++---+
|  a  | b || 0 |
|-----|---||---|
| foo | 0 || 1 |
| bar | 1 || 2 |
+-----+---++---+

--Series to lookup--
+----+-----++---+
| a  |  b  || 0 |
|----|-----||---|
|  0 | baz || 4 |
| 10 | bar || 5 |
+----+-----++---+

--result--
+-----+---++---+-----+
|  a  | b || 0 | 0_1 |
|-----|---||---|-----|
| bar | 1 || 2 |   5 |
+-----+---++---+-----+

func (*Series) Min ¶

func (s *Series) Min() float64

Min coerces the Series values to float64 and calculates the minimum.

func (*Series) Multiply ¶

func (s *Series) Multiply(other *Series, ignoreNulls bool) *Series

Multiply coerces other and s to float64 values, aligns other with s, and multiplies the values in aligned rows, using the labels in s as an anchor. If ignoreNulls is true, then missing or null values are treated as 0. Otherwise, if a row in s does not align with any row in other, or if a row does align but either value is null, then the resulting value is null.
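The null handling described above can be sketched with plain slices. This is an illustration under stated assumptions rather than tada's code: the labeled type and multiply function are hypothetical, each row has a single string label, and the first matching row in other is used.

```go
package main

import "fmt"

// labeled is a hypothetical single-label row with a null flag.
type labeled struct {
	Label  string
	Value  float64
	IsNull bool
}

// multiply aligns other with s by label and multiplies aligned values.
// With ignoreNulls, missing or null values count as 0, as the text states;
// otherwise any missing or null operand makes the result null.
func multiply(s, other []labeled, ignoreNulls bool) []labeled {
	lookup := make(map[string]labeled, len(other))
	for _, o := range other {
		if _, seen := lookup[o.Label]; !seen {
			lookup[o.Label] = o
		}
	}
	out := make([]labeled, len(s))
	for i, row := range s {
		match, found := lookup[row.Label]
		otherNull := !found || match.IsNull
		if !ignoreNulls && (row.IsNull || otherNull) {
			out[i] = labeled{Label: row.Label, IsNull: true}
			continue
		}
		a, b := row.Value, match.Value
		if row.IsNull {
			a = 0
		}
		if otherNull {
			b = 0
		}
		out[i] = labeled{Label: row.Label, Value: a * b}
	}
	return out
}

func main() {
	s := []labeled{{Label: "a", Value: 2}, {Label: "b", Value: 3}}
	other := []labeled{{Label: "a", Value: 5}}
	fmt.Println(multiply(s, other, false)) // [{a 10 false} {b 0 true}]
	fmt.Println(multiply(s, other, true))  // [{a 10 false} {b 0 false}]
}
```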

func (*Series) NUnique ¶

func (s *Series) NUnique() int

NUnique counts the number of unique, non-null Series values.

func (*Series) Name ¶

func (s *Series) Name() string

Name returns the name of the Series.

func (*Series) NameOfLabel ¶

func (s *Series) NameOfLabel(n int) string

NameOfLabel returns the name of the label level at index position n. If n is out of range, returns "-out of range-".

func (*Series) Percentile ¶

func (s *Series) Percentile() *Series

Percentile coerces the Series values to float64 and returns the percentile rank of each value. Uses the "exclusive" definition: a value's percentile is the % of all non-null values in the Series (including itself) that are below it.
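The "exclusive" definition can be made concrete with a small standalone sketch. It assumes the input holds only non-null values; percentileRanks is a hypothetical helper, not part of tada.

```go
package main

import "fmt"

// percentileRanks returns, for each value, the fraction of all values
// (the value itself counts toward the total) that are strictly below it.
func percentileRanks(values []float64) []float64 {
	ranks := make([]float64, len(values))
	for i, v := range values {
		below := 0
		for _, other := range values {
			if other < v {
				below++
			}
		}
		ranks[i] = float64(below) / float64(len(values))
	}
	return ranks
}

func main() {
	fmt.Println(percentileRanks([]float64{1, 2, 3, 4})) // [0 0.25 0.5 0.75]
}
```

Under this definition the lowest value always has rank 0 and no value reaches 1.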

func (*Series) PercentileBin ¶ added in v0.4.9

func (s *Series) PercentileBin(bins []float64, config *Binner) (*Series, error)

PercentileBin coerces the Series values to float64 and categorizes each value based on which percentile bin interval it falls within. Uses the "exclusive" definition: a value's percentile is the % of all non-null values in the Series (including itself) that are below it. bins should be a slice of sequential percentile edges (between 0 and 1) that form intervals (left inclusive, right exclusive). NB: left inclusive, right exclusive is the opposite of the interval inclusion rules for the Bin() function. For example, [0, .5, 1] represents the percentile intervals 0-50% (including 0%, excluding 50%) and 50%-100% (including 50%, excluding 100%). If these bins were supplied for a Series with values [1, 1000], the returned Series would have values [0-0.5, 0.5-1], because 1 is in the bottom 50% of values and 1000 is in the top 50% of values. Null values are not categorized. For default behavior, supply nil as config.

To bin values below or above the bin intervals, or to supply custom labels, supply a tada.Binner as config. If custom labels are supplied, the length must be 1 less than the total number of bin edges. Otherwise, bin labels are auto-generated from the bin intervals.
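As a rough sketch of the interval rule (left inclusive, right exclusive), the helper below maps a percentile rank to an auto-generated-style label. The binLabel function and its label format are assumptions modeled on the example output, not tada's implementation.

```go
package main

import (
	"fmt"
	"strconv"
)

// binLabel returns a label for the interval [edges[i], edges[i+1]) that
// contains pct, or "" if pct falls outside every interval.
func binLabel(pct float64, edges []float64) string {
	for i := 0; i+1 < len(edges); i++ {
		if pct >= edges[i] && pct < edges[i+1] {
			lo := strconv.FormatFloat(edges[i], 'f', -1, 64)
			hi := strconv.FormatFloat(edges[i+1], 'f', -1, 64)
			return lo + "-" + hi
		}
	}
	return ""
}

func main() {
	edges := []float64{0, .5, 1}
	// Exclusive percentile ranks of the values [1, 2, 3, 4].
	for _, pct := range []float64{0, 0.25, 0.5, 0.75} {
		fmt.Println(pct, binLabel(pct, edges))
	}
}
```

Because the right edge is exclusive, a rank of exactly 0.5 lands in the 0.5-1 bin rather than the 0-0.5 bin.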

Example ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.PercentileBin([]float64{0, .5, 1}, nil)
	fmt.Println(binned)
}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
| 2 ||   3 |
| 3 ||   4 |
+---++-----+

+---++-------+
| - ||  foo  |
|---||-------|
| 0 || 0-0.5 |
| 1 ||       |
| 2 || 0.5-1 |
| 3 ||       |
+---++-------+

Example (CustomLabels) ¶

package main

import (
	"fmt"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]float64{1, 2, 3, 4}).SetName("foo")
	fmt.Println(s)

	binned, _ := s.PercentileBin([]float64{0, .5, 1}, &tada.Binner{Labels: []string{"Bottom 50%", "Top 50%"}})
	fmt.Println(binned)
}

Output:

+---++-----+
| - || foo |
|---||-----|
| 0 ||   1 |
| 1 ||   2 |
| 2 ||   3 |
| 3 ||   4 |
+---++-----+

+---++------------+
| - ||    foo     |
|---||------------|
| 0 || Bottom 50% |
| 1 ||            |
| 2 ||    Top 50% |
| 3 ||            |
+---++------------+

func (*Series) Range ¶

func (s *Series) Range(first, last int) *Series

Range returns the rows of the Series starting at first and ending immediately prior to last (left-inclusive, right-exclusive). If either first or last is out of range, a Series error is returned. In all cases, returns a new Series.
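The left-inclusive, right-exclusive convention matches Go slice expressions. The sketch below shows the idea on a plain slice; rowRange is a hypothetical helper, and treating first > last as an error is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// rowRange returns a copy of values[first:last], or an error if either
// bound falls outside the slice or the bounds are inverted.
func rowRange(values []float64, first, last int) ([]float64, error) {
	if first < 0 || last > len(values) || first > last {
		return nil, errors.New("range out of bounds")
	}
	out := make([]float64, last-first)
	copy(out, values[first:last])
	return out, nil
}

func main() {
	values := []float64{10, 20, 30, 40}
	rows, err := rowRange(values, 1, 3)
	fmt.Println(rows, err) // [20 30] <nil>

	_, err = rowRange(values, 2, 9)
	fmt.Println(err) // range out of bounds
}
```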

func (*Series) Rank ¶

func (s *Series) Rank() *Series

Rank coerces the Series values to float64 and returns the rank of each (in ascending order - where 1 is the rank of the lowest value). Rows with the same value share the same rank.
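One way to read "rows with the same value share the same rank" is dense ranking, where ties share a rank and the next distinct value takes the next integer. The sketch below assumes that reading; the text does not say whether ranks after a tie are skipped, and denseRank is a hypothetical helper that assumes no NaN values.

```go
package main

import (
	"fmt"
	"sort"
)

// denseRank assigns rank 1 to the lowest value; equal values share a rank
// and the next distinct value receives the next rank.
func denseRank(values []float64) []int {
	seen := make(map[float64]bool)
	unique := make([]float64, 0, len(values))
	for _, v := range values {
		if !seen[v] {
			seen[v] = true
			unique = append(unique, v)
		}
	}
	sort.Float64s(unique)

	rankOf := make(map[float64]int, len(unique))
	for i, v := range unique {
		rankOf[v] = i + 1
	}
	ranks := make([]int, len(values))
	for i, v := range values {
		ranks[i] = rankOf[v]
	}
	return ranks
}

func main() {
	fmt.Println(denseRank([]float64{30, 10, 10, 20})) // [3 1 1 2]
}
```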

func (*Series) Reduce ¶ added in v0.7.6

func (s *Series) Reduce(lambda ReduceFn) (value interface{}, isNull bool)

Reduce reduces all Series values to a single value and null status using lambda.

func (*Series) Relabel ¶

func (s *Series) Relabel() *Series

Relabel resets the Series labels to default labels (e.g., []int from 0 to one less than the number of rows in the Series, with *0 as name). Returns a new Series.

func (*Series) Resample ¶

func (s *Series) Resample(by Resampler) *Series

Resample coerces the Series values to time.Time and truncates them by the logic supplied in tada.Resampler. If slice type is civil.Date or civil.Time before resampling, it will be returned as civil.Date or civil.Time after resampling.

Returns a new Series.
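The truncation logic can be approximated with the standard time package alone. The following is a sketch of one plausible approach, not the library's code: truncateToMonth, truncateToWeek, and sample are hypothetical names, and duration-based truncation is shown with time.Time.Truncate.

```go
package main

import (
	"fmt"
	"time"
)

// sample is an arbitrary timestamp used for illustration.
var sample = time.Date(2020, 1, 15, 12, 45, 0, 0, time.UTC)

// truncateToMonth returns midnight on the first day of t's month.
func truncateToMonth(t time.Time) time.Time {
	return time.Date(t.Year(), t.Month(), 1, 0, 0, 0, 0, t.Location())
}

// truncateToWeek returns midnight on the most recent startOfWeek
// that falls on or before t.
func truncateToWeek(t time.Time, startOfWeek time.Weekday) time.Time {
	daysBack := (int(t.Weekday()) - int(startOfWeek) + 7) % 7
	return time.Date(t.Year(), t.Month(), t.Day()-daysBack, 0, 0, 0, 0, t.Location())
}

func main() {
	fmt.Println(sample.Truncate(30 * time.Minute).Format(time.RFC3339))   // 2020-01-15T12:30:00Z
	fmt.Println(truncateToMonth(sample).Format(time.RFC3339))             // 2020-01-01T00:00:00Z
	fmt.Println(truncateToWeek(sample, time.Sunday).Format(time.RFC3339)) // 2020-01-12T00:00:00Z
}
```

time.Date normalizes out-of-range day values, so subtracting days across a month boundary is safe here.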

Example (ByHalfHour) ¶

package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{
		time.Date(2020, 1, 15, 12, 15, 0, 0, time.UTC),
		time.Date(2020, 1, 15, 12, 45, 0, 0, time.UTC),
	}).SetName("foo")
	fmt.Println(s)

	byHalfHour := tada.Resampler{ByDuration: 30 * time.Minute}
	fmt.Println(s.Resample(byHalfHour))
}

Output:

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:15:00Z |
| 1 || 2020-01-15T12:45:00Z |
+---++----------------------+

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:00:00Z |
| 1 || 2020-01-15T12:30:00Z |
+---++----------------------+

Example (ByHour) ¶

package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{time.Date(2020, 1, 15, 12, 30, 0, 0, time.UTC)}).SetName("foo")
	fmt.Println(s)

	byHour := tada.Resampler{ByDuration: time.Hour}
	fmt.Println(s.Resample(byHour))
}

Output:

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:30:00Z |
+---++----------------------+

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:00:00Z |
+---++----------------------+

Example (ByMonth) ¶

package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{time.Date(2020, 1, 15, 12, 30, 0, 0, time.UTC)}).SetName("foo")
	fmt.Println(s)

	byMonth := tada.Resampler{ByMonth: true}
	fmt.Println(s.Resample(byMonth))
}

Output:

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:30:00Z |
+---++----------------------+

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-01T00:00:00Z |
+---++----------------------+

Example (ByWeek) ¶

package main

import (
	"fmt"
	"time"

	"github.com/ptiger10/tada"
)

func main() {
	s := tada.NewSeries([]time.Time{time.Date(2020, 1, 15, 12, 30, 0, 0, time.UTC)}).SetName("foo")
	fmt.Println(s)

	byWeek := tada.Resampler{ByWeek: true, StartOfWeek: time.Sunday}
	fmt.Println(s.Resample(byWeek))
}

Output:

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-15T12:30:00Z |
+---++----------------------+

+---++----------------------+
| - ||         foo          |
|---||----------------------|
| 0 || 2020-01-12T00:00:00Z |
+---++----------------------+

func (*Series) RollingDuration ¶

func (s *Series) RollingDuration(d time.Duration) *GroupedSeries

RollingDuration iterates over each row in Series, coerces the values to time.Time, and groups each set of subsequent rows that are within d of the current row.

func (*Series) RollingN ¶

func (s *Series) RollingN(n int) *GroupedSeries

RollingN iterates over each row in Series and groups each set of n subsequent rows after the current row.

func (*Series) SetLabelNames ¶

func (s *Series) SetLabelNames(levelNames []string) *Series

SetLabelNames sets the names of all the label levels in the Series and returns the entire Series. If an error occurs, it is written to the Series.

func (*Series) SetName ¶

func (s *Series) SetName(name string) *Series

SetName modifies the name of a Series in place and returns the original Series.

func (*Series) SetRows ¶ added in v0.7.6

func (s *Series) SetRows(lambda ApplyFn, rows []int) *Series

SetRows applies lambda, an anonymous function, to set the values at the specified row positions. The new values must be the same type as the existing values. Returns a new Series.

func (*Series) Shift ¶

func (s *Series) Shift(n int) *Series

Shift replaces the value in row i with the value in row i - n, or null if that index is out of range. Returns a new Series.
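The index arithmetic is easy to misread, so here is a minimal standalone sketch of the rule on a plain slice. The element type and shift function are hypothetical stand-ins and do not use tada's types.

```go
package main

import "fmt"

// element is a hypothetical value with a null flag.
type element struct {
	Value  float64
	IsNull bool
}

// shift fills row i with the value from row i-n, or a null element when
// i-n is out of range. Positive n moves values down; negative n moves them up.
func shift(values []float64, n int) []element {
	out := make([]element, len(values))
	for i := range values {
		src := i - n
		if src < 0 || src >= len(values) {
			out[i] = element{IsNull: true}
			continue
		}
		out[i] = element{Value: values[src]}
	}
	return out
}

func main() {
	fmt.Println(shift([]float64{1, 2, 3}, 1))  // [{0 true} {1 false} {2 false}]
	fmt.Println(shift([]float64{1, 2, 3}, -1)) // [{2 false} {3 false} {0 true}]
}
```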

func (*Series) Shuffle ¶ added in v0.6.11

func (s *Series) Shuffle(seed int64) *Series

Shuffle randomizes the row order of the Series. Returns a new Series.
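A seed parameter usually means the shuffle is reproducible: the same seed yields the same order. The sketch below shows that pattern with math/rand; it is an assumption about how a seeded shuffle might be built, and shuffled is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math/rand"
)

// shuffled returns a reordered copy of values, using a random source
// seeded with seed so that repeated calls give the same order.
func shuffled(values []float64, seed int64) []float64 {
	out := make([]float64, len(values))
	copy(out, values)
	r := rand.New(rand.NewSource(seed))
	r.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

func main() {
	values := []float64{1, 2, 3, 4, 5}
	fmt.Println(shuffled(values, 42))
	fmt.Println(shuffled(values, 42)) // same order as the line above
	fmt.Println(values)               // the input is left untouched
}
```

Fixing the seed in tests keeps shuffled fixtures deterministic.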

func (*Series) Sort ¶

func (s *Series) Sort(by ...Sorter) *Series

Sort sorts the values by zero or more Sorter specifications. If no Sorter is supplied, sorts by Series values (as float64) in ascending order. If a Sorter is supplied without a Name or with a name matching the Series name, sorts by Series values. If no DType is supplied in a Sorter, sorts as float64. DType is only used for the process of sorting. Once it has been sorted, data retains its original type. Returns a new Series.

func (*Series) StdDev ¶ added in v0.5.3

func (s *Series) StdDev() float64

StdDev coerces the Series values to float64 and calculates the standard deviation.

func (*Series) String ¶

func (s *Series) String() string

func (*Series) Struct ¶ added in v0.6.0

func (s *Series) Struct(structPointer interface{}, options ...WriteOption) error

Struct writes the values of the df containers into structPointer. Returns an error if df does not contain, from left-to-right, the same container names and types as the exported fields that appear, from top-to-bottom, in structPointer. Exported struct fields must be types that are supported by NewDataFrame(). If a "tada" tag is present with the value "isNull", this field must be [][]bool. The null status of each value container in the DataFrame, from left-to-right, will be written into this field in equal-length slices. If df contains additional containers beyond those in structPointer, those are ignored.

func (*Series) Subset ¶

func (s *Series) Subset(index []int) *Series

Subset returns only the rows specified at the index positions, in the order specified. Returns a new Series.

func (*Series) SubsetLabels ¶

func (s *Series) SubsetLabels(index []int) *Series

SubsetLabels includes only the columns of labels specified at the index positions, in the order specified. Returns a new Series.

func (*Series) Subtract ¶

func (s *Series) Subtract(other *Series, ignoreNulls bool) *Series

Subtract coerces other and s to float64 values, aligns other with s, and subtracts the aligned values of other from s, using the labels in s as an anchor. If ignoreNulls is true, then missing or null values are treated as 0. Otherwise, if a row in s does not align with any row in other, or if a row does align but either value is null, then the resulting value is null.

func (*Series) Sum ¶

func (s *Series) Sum() float64

Sum coerces the Series values to float64 and sums them.

func (*Series) SwapLabels ¶

func (s *Series) SwapLabels(i, j string) *Series

SwapLabels swaps the label levels with names i and j. Returns a new Series.

func (*Series) Tail ¶

func (s *Series) Tail(n int) *Series

Tail returns the last n rows of the Series. If n is greater than the length of the Series, returns the entire Series. In either case, returns a new Series.

func (*Series) Type ¶ added in v0.3.8

func (s *Series) Type() reflect.Type

Type returns the slice type of the underlying Series values.

func (*Series) Unique ¶

func (s *Series) Unique(includeLabels bool) *Series

Unique returns the first appearance of all non-null values in the Series. If includeLabels is true, a row is considered unique only if its combination of labels and values is unique. Returns a new Series.
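A minimal sketch of "first appearance of all non-null values", assuming includeLabels is false and values are already stringified. firstAppearances is a hypothetical helper that returns row positions rather than a Series.

```go
package main

import "fmt"

// firstAppearances returns the row positions of the first occurrence of
// each distinct non-null value, in original order. isNull must have the
// same length as values.
func firstAppearances(values []string, isNull []bool) []int {
	seen := make(map[string]bool)
	var positions []int
	for i, v := range values {
		if isNull[i] || seen[v] {
			continue
		}
		seen[v] = true
		positions = append(positions, i)
	}
	return positions
}

func main() {
	values := []string{"a", "b", "a", "", "b", "c"}
	isNull := []bool{false, false, false, true, false, false}
	fmt.Println(firstAppearances(values, isNull)) // [0 1 5]
}
```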

func (*Series) ValueCounts ¶

func (s *Series) ValueCounts() map[string]int

ValueCounts counts the number of appearances of each stringified value in the Series.
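Counting stringified values means that values of different types can collapse into one key. The sketch below illustrates this with fmt.Sprint; the exact stringification tada uses is not stated here, so treat the formatting as an assumption, and valueCounts as a hypothetical helper.

```go
package main

import "fmt"

// valueCounts tallies how often each stringified value appears.
func valueCounts(values []interface{}) map[string]int {
	counts := make(map[string]int)
	for _, v := range values {
		counts[fmt.Sprint(v)]++
	}
	return counts
}

func main() {
	// The int 1 and the string "1" share the key "1".
	fmt.Println(valueCounts([]interface{}{1, "1", 2.0, "a"})) // map[1:2 2:1 a:1]
}
```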

func (*Series) Where ¶

func (s *Series) Where(filters map[string]FilterFn, ifTrue, ifFalse interface{}) (*Series, error)

Where iterates over the rows in s and evaluates whether each one satisfies filters, which is a map of container names (either the Series name or label name) and tada.FilterFn structs. If yes, returns ifTrue at that row position. If not, returns ifFalse at that row position. Values are coerced from their original type to the selected field type for filtering, but after filtering retain their original type.

Returns an unnamed Series with a copy of the labels from the original Series and null status based on the supplied values. If an unsupported value type is supplied as either ifTrue or ifFalse, returns an error.
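The per-row selection between ifTrue and ifFalse can be sketched on a plain slice. This simplification replaces the filters map and tada.FilterFn with a single hypothetical predicate function, so it shows only the selection step.

```go
package main

import "fmt"

// where returns ifTrue for each value that satisfies pred and ifFalse
// for every other value, preserving row order.
func where(values []float64, pred func(float64) bool, ifTrue, ifFalse string) []string {
	out := make([]string, len(values))
	for i, v := range values {
		if pred(v) {
			out[i] = ifTrue
		} else {
			out[i] = ifFalse
		}
	}
	return out
}

func main() {
	isLarge := func(v float64) bool { return v > 2 }
	fmt.Println(where([]float64{1, 2, 3, 4}, isLarge, "big", "small")) // [small small big big]
}
```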

func (*Series) WithLabels ¶

func (s *Series) WithLabels(name string, input interface{}) *Series

WithLabels resolves as follows:

If a scalar string is supplied as input and a label level exists that matches name: rename the level to match input. In this case, name must already exist.

If a slice is supplied as input and a label level exists that matches name: replace the values at this level to match input. If a slice is supplied as input and a label level does not exist that matches name: append a new level named name and values matching input. If input is a slice, it must be the same length as the underlying Series.

In all cases, returns a new Series.

func (*Series) WithValues ¶

func (s *Series) WithValues(input interface{}) *Series

WithValues replaces the Series values with input. input must be a supported slice type of the same length as the original Series. Returns a new Series.

func (*Series) WriteCSV ¶ added in v0.6.0

func (s *Series) WriteCSV(w io.Writer, options ...WriteOption) error

WriteCSV converts a Series to CSV with rows as the major dimension, and writes the output to w. Null values are replaced with "(null)".

type SeriesIterator ¶ added in v0.2.0

type SeriesIterator struct {
	// contains filtered or unexported fields
}

A SeriesIterator iterates over the rows in a Series.

func (*SeriesIterator) Next ¶ added in v0.2.0

func (iter *SeriesIterator) Next() bool

Next advances to the next row. Returns false at end of iteration.

func (*SeriesIterator) Row ¶ added in v0.2.0

func (iter *SeriesIterator) Row() map[string]Element

Row returns the current row in the Series as map[string]Element. The map keys are the names of containers (including label levels). The name of the Series values column is the same as the name of the Series itself. The value in each map is an Element containing an interface value and a boolean denoting if the value is null. If multiple columns have the same header, only the Elements of the left-most column are returned.

type SeriesMutator ¶

type SeriesMutator struct {
	// contains filtered or unexported fields
}

A SeriesMutator is used to change Series values in place.

func (*SeriesMutator) Append ¶

func (s *SeriesMutator) Append(other *Series) error

Append adds the other labels and values as new rows to the Series. If the types of any container do not match, all the values in that container are coerced to string. Modifies the underlying Series in place.

func (*SeriesMutator) Apply ¶

func (s *SeriesMutator) Apply(lambda ApplyFn) error

Apply applies lambda, an anonymous function, to every row in a container. A row's null status can be changed in-place within the anonymous function. Modifies the underlying Series in place.

func (*SeriesMutator) DropLabels ¶

func (s *SeriesMutator) DropLabels(name string) error

DropLabels removes the first label level matching name. Modifies the underlying Series in place.

func (*SeriesMutator) DropNull ¶

func (s *SeriesMutator) DropNull()

DropNull removes all rows with null values, keeping only the rows with non-null values. Modifies the underlying Series in place.

func (*SeriesMutator) DropRow ¶

func (s *SeriesMutator) DropRow(index int) error

DropRow removes the row at the specified index. Modifies the underlying Series in place.

func (*SeriesMutator) FillNull ¶

func (s *SeriesMutator) FillNull(how NullFiller)

FillNull fills all the null values and makes them not-null. Modifies the underlying Series.

func (*SeriesMutator) Filter ¶

func (s *SeriesMutator) Filter(filters map[string]FilterFn) error

Filter retains only the rows that satisfy all of the filters, which is a map of container names (either the Series name or label name) and anonymous functions. Filter may be applied to the Series values by supplying either the Series name or an empty string ("") as a key.

Rows with null values never satisfy a filter. If no filter is provided, the function does nothing. For equality filtering on one or more containers, consider FilterByValue. Modifies the underlying Series in place.

func (*SeriesMutator) FilterByValue ¶ added in v0.3.5

func (s *SeriesMutator) FilterByValue(filters map[string]interface{}) error

FilterByValue retains the rows in the Series satisfying all filters, which is a map of container names (either the Series name or label name) to interface{} values. A filter is satisfied for a given row value if the stringified value in that container at that row matches the stringified interface{} value. FilterByValue may be applied to the Series values by supplying either the Series name or an empty string ("") as a key. Modifies the underlying Series in place.

func (*SeriesMutator) Relabel ¶

func (s *SeriesMutator) Relabel()

Relabel resets the Series labels to default labels (e.g., []int from 0 to one less than the number of rows in the Series, with *0 as name). Modifies the underlying Series in place.

func (*SeriesMutator) Resample ¶

func (s *SeriesMutator) Resample(by Resampler)

Resample coerces the Series values to time.Time and truncates them by the logic supplied in tada.Resampler. If slice type is civil.Date or civil.Time before resampling, it will be returned as civil.Date or civil.Time after resampling.

Modifies the underlying Series in place.

func (*SeriesMutator) SetRows ¶ added in v0.7.6

func (s *SeriesMutator) SetRows(lambda ApplyFn, rows []int) error

SetRows applies lambda, an anonymous function, to set the values at the specified row positions. The new values must be the same type as the existing values. Modifies the underlying Series in place.

func (*SeriesMutator) Shift ¶

func (s *SeriesMutator) Shift(n int)

Shift replaces the value in row i with the value in row i - n, or null if that index is out of range. Modifies the underlying Series in place.

func (*SeriesMutator) Shuffle ¶ added in v0.6.11

func (s *SeriesMutator) Shuffle(seed int64)

Shuffle randomizes the row order of the Series. Modifies the underlying Series.

func (*SeriesMutator) Sort ¶

func (s *SeriesMutator) Sort(by ...Sorter) error

Sort sorts the values by zero or more Sorter specifications. If no Sorter is supplied, sorts by Series values (as float64) in ascending order. If a Sorter is supplied without a Name or with a name matching the Series name, sorts by Series values. If no DType is supplied in a Sorter, sorts as float64. Modifies the underlying Series in place.

func (*SeriesMutator) Subset ¶

func (s *SeriesMutator) Subset(index []int) error

Subset retains only the rows specified at the index positions, in the order specified. Modifies the underlying Series in place.

func (*SeriesMutator) SubsetLabels ¶

func (s *SeriesMutator) SubsetLabels(index []int) error

SubsetLabels includes only the columns of labels specified at the index positions, in the order specified. Modifies the underlying Series in place.

func (*SeriesMutator) SwapLabels ¶

func (s *SeriesMutator) SwapLabels(i, j string) error

SwapLabels swaps the label levels with names i and j. Modifies the underlying Series in place.

func (*SeriesMutator) WithLabels ¶

func (s *SeriesMutator) WithLabels(name string, input interface{}) error

WithLabels resolves as follows:

If a scalar string is supplied as input and a label level exists that matches name: rename the level to match input. In this case, name must already exist.

If a slice is supplied as input and a label level exists that matches name: replace the values at this level to match input. If a slice is supplied as input and a label level does not exist that matches name: append a new level named name and values matching input. If input is a slice, it must be the same length as the underlying Series.

In all cases, modifies the underlying Series in place.

func (*SeriesMutator) WithValues ¶

func (s *SeriesMutator) WithValues(input interface{}) error

WithValues replaces the Series values with input. input must be a supported slice type of the same length as the original Series. Modifies the underlying Series.

type Sorter ¶

type Sorter struct {
	Name       string
	Descending bool
	DType      DType
}

A Sorter supplies details to the Sort() function. `Name` specifies the container (either label or column name) to sort. If `Descending` is true, values are sorted in descending order. `DType` specifies the data type to which values will be coerced before they are sorted (default: float64). Null values are always sorted to the bottom.
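The rule that null values always sort to the bottom, regardless of direction, can be sketched with sort.SliceStable. The entry type and sortEntries function are hypothetical and operate on plain float64 values rather than tada containers.

```go
package main

import (
	"fmt"
	"sort"
)

// entry is a hypothetical value with a null flag.
type entry struct {
	Value  float64
	IsNull bool
}

// sortEntries returns a sorted copy of entries. Nulls go last in both
// ascending and descending order.
func sortEntries(entries []entry, descending bool) []entry {
	out := make([]entry, len(entries))
	copy(out, entries)
	sort.SliceStable(out, func(i, j int) bool {
		a, b := out[i], out[j]
		if a.IsNull || b.IsNull {
			// A non-null value precedes a null one; two nulls keep their order.
			return !a.IsNull && b.IsNull
		}
		if descending {
			return a.Value > b.Value
		}
		return a.Value < b.Value
	})
	return out
}

func main() {
	entries := []entry{{Value: 2}, {IsNull: true}, {Value: 3}, {Value: 1}}
	fmt.Println(sortEntries(entries, false)) // [{1 false} {2 false} {3 false} {0 true}]
	fmt.Println(sortEntries(entries, true))  // [{3 false} {2 false} {1 false} {0 true}]
}
```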

type StructTransposer ¶ added in v0.6.4

type StructTransposer [][]interface{}

A StructTransposer is a row-oriented representation of a DataFrame that can be randomly shuffled or transposed into a column-oriented struct representation of a DataFrame. It is useful for intuitive row-oriented testing.

func (StructTransposer) Shuffle ¶ added in v0.6.5

func (st StructTransposer) Shuffle(seed int64)

Shuffle randomly shuffles the row order in Rows, using a randomizer seeded with seed.

func (StructTransposer) Transpose ¶ added in v0.6.4

func (st StructTransposer) Transpose(structPointer interface{}) error

Transpose reads the values of an untyped, row-oriented struct representation of a DataFrame into a typed, column-oriented struct representation of a DataFrame. If all non-null values in a column have the same type, then the column will be a slice of that type. If any of the non-null values in a column have different types, then the column will be []interface{}. If all values are considered null by tada, then the column will be a slice of the type in the first row (when all values are null and the first row is nil, the column will be []interface{}). If an error is returned, values are still written to structPointer up until the point the error occurred.
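The core row-to-column step can be sketched on untyped data. This simplified version returns [][]interface{} columns and leaves out the type inference and struct writing described above; transpose is a hypothetical helper, and rejecting ragged rows is an assumption.

```go
package main

import (
	"errors"
	"fmt"
)

// transpose converts row-oriented data into column-oriented data.
// All rows must have the same number of fields.
func transpose(rows [][]interface{}) ([][]interface{}, error) {
	if len(rows) == 0 {
		return nil, nil
	}
	width := len(rows[0])
	cols := make([][]interface{}, width)
	for _, r := range rows {
		if len(r) != width {
			return nil, errors.New("rows have different lengths")
		}
		for j, v := range r {
			cols[j] = append(cols[j], v)
		}
	}
	return cols, nil
}

func main() {
	rows := [][]interface{}{
		{"foo", 1},
		{"bar", 2},
	}
	cols, err := transpose(rows)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(cols) // [[foo bar] [1 2]]
}
```

Writing test fixtures row by row and transposing them keeps each record readable on one line.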

type WriteOption ¶ added in v0.4.0

type WriteOption func(*writeConfig)

A WriteOption configures a write function. Available write options: WriteOptionExcludeLabels, WriteOptionDelimiter.
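The signature func(*writeConfig) follows the functional options pattern. The sketch below shows how such options are typically defined and applied; the field names, the comma default, and the withDelimiter and withExcludedLabels constructors are hypothetical and are not tada's actual definitions.

```go
package main

import "fmt"

// writeConfig is a hypothetical configuration struct.
type writeConfig struct {
	delimiter     rune
	excludeLabels bool
}

// writeOption mutates a writeConfig, mirroring the shape of WriteOption.
type writeOption func(*writeConfig)

// withDelimiter sets the field delimiter.
func withDelimiter(d rune) writeOption {
	return func(c *writeConfig) { c.delimiter = d }
}

// withExcludedLabels omits label levels from the output.
func withExcludedLabels() writeOption {
	return func(c *writeConfig) { c.excludeLabels = true }
}

// newWriteConfig starts from defaults and applies each option in order.
func newWriteConfig(options ...writeOption) writeConfig {
	cfg := writeConfig{delimiter: ','} // assumed default
	for _, opt := range options {
		opt(&cfg)
	}
	return cfg
}

func main() {
	cfg := newWriteConfig(withDelimiter('|'), withExcludedLabels())
	fmt.Printf("%c %v\n", cfg.delimiter, cfg.excludeLabels) // | true
}
```

Callers that pass no options get the defaults, which is why variadic option parameters suit functions such as WriteCSV and Struct.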
