gota: github.com/kniren/gota/dataframe

package dataframe

import "github.com/kniren/gota/dataframe"

Package dataframe provides an implementation of data frames and methods to subset, join, mutate, set, arrange, summarize, etc.

Package Files

dataframe.go

type DataFrame

type DataFrame struct {
    Err error
    // contains filtered or unexported fields
}

DataFrame is a data structure designed for operating on table-like data (such as Excel files, CSV files, SQL table results...) where every column has to keep type integrity. As a general rule of thumb, variables are stored in columns, and every row of a DataFrame represents an observation for each variable.

In the real world, data is very messy and sometimes there are non-measurements or missing data. For this reason, DataFrame supports NaN elements and allows the most common data cleaning and munging operations, such as subsetting, filtering, type transformations, etc. In addition, this library provides the functions necessary to concatenate DataFrames (by rows or columns), different join operations (Inner, Outer, Left, Right, Cross), and the ability to read and write different formats (CSV/JSON).

func LoadMaps

func LoadMaps(maps []map[string]interface{}, options ...LoadOption) DataFrame

LoadMaps creates a new DataFrame based on the given maps. This function assumes that every map in the slice represents a row of observations.

Code:

df := dataframe.LoadMaps(
    []map[string]interface{}{
        map[string]interface{}{
            "A": "a",
            "B": 1,
            "C": true,
            "D": 0,
        },
        map[string]interface{}{
            "A": "b",
            "B": 2,
            "C": true,
            "D": 0.5,
        },
    },
)
fmt.Println(df)

func LoadMatrix

func LoadMatrix(mat Matrix) DataFrame

LoadMatrix loads the given Matrix as a DataFrame. TODO: Add LoadOptions.

func LoadRecords

func LoadRecords(records [][]string, options ...LoadOption) DataFrame

LoadRecords creates a new DataFrame based on the given records.

Code:

df := dataframe.LoadRecords(
    [][]string{
        []string{"A", "B", "C", "D"},
        []string{"a", "4", "5.1", "true"},
        []string{"k", "5", "7.0", "true"},
        []string{"k", "4", "6.0", "true"},
        []string{"a", "2", "7.1", "false"},
    },
)
fmt.Println(df)

Code:

df := dataframe.LoadRecords(
    [][]string{
        []string{"A", "B", "C", "D"},
        []string{"a", "4", "5.1", "true"},
        []string{"k", "5", "7.0", "true"},
        []string{"k", "4", "6.0", "true"},
        []string{"a", "2", "7.1", "false"},
    },
    dataframe.DetectTypes(false),
    dataframe.DefaultType(series.Float),
    dataframe.WithTypes(map[string]series.Type{
        "A": series.String,
        "D": series.Bool,
    }),
)
fmt.Println(df)

func LoadStructs

func LoadStructs(i interface{}, options ...LoadOption) DataFrame

LoadStructs creates a new DataFrame from arbitrary struct slices.

LoadStructs will ignore unexported fields inside a struct. Note also that, unless otherwise specified, the column names will correspond to the names of the fields.

You can configure each field with the `dataframe:"name[,type]"` struct tag. If the name on the tag is the empty string `""` the field name will be used instead. If the name is `"-"` the field will be ignored.

Examples:

// field will be ignored
field int

// Field will be ignored
Field int `dataframe:"-"`

// Field will be parsed with column name Field and type int
Field int

// Field will be parsed with column name `field_column` and type int.
Field int `dataframe:"field_column"`

// Field will be parsed with column name `field` and type string.
Field int `dataframe:"field,string"`

// Field will be parsed with column name `Field` and type string.
Field int `dataframe:",string"`

If the struct tags and the given LoadOptions contradict each other, the latter take precedence over the former.

Code:

type User struct {
    Name     string
    Age      int
    Accuracy float64
}
users := []User{
    User{"Aram", 17, 0.2},
    User{"Juan", 18, 0.8},
    User{"Ana", 22, 0.5},
}
df := dataframe.LoadStructs(users)
fmt.Println(df)

func New

func New(se ...series.Series) DataFrame

New is the generic DataFrame constructor

Code:

df := dataframe.New(
    series.New([]string{"b", "a"}, series.String, "COL.1"),
    series.New([]int{1, 2}, series.Int, "COL.2"),
    series.New([]float64{3.0, 4.0}, series.Float, "COL.3"),
)
fmt.Println(df)

func ReadCSV

func ReadCSV(r io.Reader, options ...LoadOption) DataFrame

ReadCSV reads a CSV file from an io.Reader and builds a DataFrame with the resulting records.

Code:

csvStr := `
Country,Date,Age,Amount,Id
"United States",2012-02-01,50,112.1,01234
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,17,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United Kingdom",2012-02-01,NA,18.2,12345
"United States",2012-02-01,32,321.31,54320
"United States",2012-02-01,32,321.31,54320
Spain,2012-02-01,66,555.42,00241
`
df := dataframe.ReadCSV(strings.NewReader(csvStr))
fmt.Println(df)

func ReadJSON

func ReadJSON(r io.Reader, options ...LoadOption) DataFrame

ReadJSON reads a JSON array from an io.Reader and builds a DataFrame with the resulting records.

Code:

jsonStr := `[{"COL.2":1,"COL.3":3},{"COL.1":5,"COL.2":2,"COL.3":2},{"COL.1":6,"COL.2":3,"COL.3":1}]`
df := dataframe.ReadJSON(strings.NewReader(jsonStr))
fmt.Println(df)

func (DataFrame) Arrange

func (df DataFrame) Arrange(order ...Order) DataFrame

Arrange sorts the rows of a DataFrame according to the given Order values.

Code:

df := dataframe.LoadRecords(
    [][]string{
        []string{"A", "B", "C", "D"},
        []string{"a", "4", "5.1", "true"},
        []string{"b", "4", "6.0", "true"},
        []string{"c", "3", "6.0", "false"},
        []string{"a", "2", "7.1", "false"},
    },
)
sorted := df.Arrange(
    dataframe.Sort("A"),
    dataframe.RevSort("B"),
)
fmt.Println(sorted)

func (DataFrame) CBind

func (df DataFrame) CBind(dfb DataFrame) DataFrame

CBind combines the columns of this DataFrame and the dfb DataFrame.

func (DataFrame) Capply

func (df DataFrame) Capply(f func(series.Series) series.Series) DataFrame

Capply applies the given function to the columns of a DataFrame

func (DataFrame) Col

func (df DataFrame) Col(colname string) series.Series

Col returns a copy of the Series with the given column name contained in the DataFrame.

func (DataFrame) Copy

func (df DataFrame) Copy() DataFrame

Copy returns a copy of the DataFrame

func (DataFrame) CrossJoin

func (df DataFrame) CrossJoin(b DataFrame) DataFrame

CrossJoin returns a DataFrame containing the cross join of two DataFrames.

func (DataFrame) Describe

func (df DataFrame) Describe() DataFrame

Describe returns a DataFrame with the summary statistics for each column of the DataFrame.

Code:

df := dataframe.LoadRecords(
    [][]string{
        []string{"A", "B", "C", "D"},
        []string{"a", "4", "5.1", "true"},
        []string{"b", "4", "6.0", "true"},
        []string{"c", "3", "6.0", "false"},
        []string{"a", "2", "7.1", "false"},
    },
)
fmt.Println(df.Describe())

func (DataFrame) Dims

func (df DataFrame) Dims() (int, int)

Dims retrieves the dimensions of a DataFrame.

func (DataFrame) Drop

func (df DataFrame) Drop(indexes SelectIndexes) DataFrame

Drop removes the given columns from the DataFrame.

func (DataFrame) Elem

func (df DataFrame) Elem(r, c int) series.Element

Elem returns the element on row `r` and column `c`. Will panic if the index is out of bounds.

func (DataFrame) Filter

func (df DataFrame) Filter(filters ...F) DataFrame

Filter will filter the rows of a DataFrame based on the given filters. All filters passed as arguments to a single Filter call are aggregated with an OR operation, whereas chained Filter calls act as an AND operation with respect to each other.

Code:

df := dataframe.LoadRecords(
    [][]string{
        []string{"A", "B", "C", "D"},
        []string{"a", "4", "5.1", "true"},
        []string{"k", "5", "7.0", "true"},
        []string{"k", "4", "6.0", "true"},
        []string{"a", "2", "7.1", "false"},
    },
)
fil := df.Filter(
    dataframe.F{
        Colname:    "A",
        Comparator: series.Eq,
        Comparando: "a",
    },
    dataframe.F{
        Colname:    "B",
        Comparator: series.Greater,
        Comparando: 4,
    },
)
fil2 := fil.Filter(
    dataframe.F{
        Colname:    "D",
        Comparator: series.Eq,
        Comparando: true,
    },
)
fmt.Println(fil)
fmt.Println(fil2)

func (DataFrame) InnerJoin

func (df DataFrame) InnerJoin(b DataFrame, keys ...string) DataFrame

InnerJoin returns a DataFrame containing the inner join of two DataFrames.

Code:

df := dataframe.LoadRecords(
    [][]string{
        []string{"A", "B", "C", "D"},
        []string{"a", "4", "5.1", "true"},
        []string{"k", "5", "7.0", "true"},
        []string{"k", "4", "6.0", "true"},
        []string{"a", "2", "7.1", "false"},
    },
)
df2 := dataframe.LoadRecords(
    [][]string{
        []string{"A", "F", "D"},
        []string{"1", "1", "true"},
        []string{"4", "2", "false"},
        []string{"2", "8", "false"},
        []string{"5", "9", "false"},
    },
)
join := df.InnerJoin(df2, "D")
fmt.Println(join)

func (DataFrame) LeftJoin

func (df DataFrame) LeftJoin(b DataFrame, keys ...string) DataFrame

LeftJoin returns a DataFrame containing the left join of two DataFrames.

func (DataFrame) Maps

func (df DataFrame) Maps() []map[string]interface{}

Maps returns the slice-of-maps representation of a DataFrame.

func (DataFrame) Mutate

func (df DataFrame) Mutate(s series.Series) DataFrame

Mutate changes a column of the DataFrame with the given Series or adds it as a new column if the column name does not exist.

Code:

df := dataframe.LoadRecords(
    [][]string{
        []string{"A", "B", "C", "D"},
        []string{"a", "4", "5.1", "true"},
        []string{"k", "5", "7.0", "true"},
        []string{"k", "4", "6.0", "true"},
        []string{"a", "2", "7.1", "false"},
    },
)
// Change column C with a new one
mut := df.Mutate(
    series.New([]string{"a", "b", "c", "d"}, series.String, "C"),
)
// Add a new column E
mut2 := df.Mutate(
    series.New([]string{"a", "b", "c", "d"}, series.String, "E"),
)
fmt.Println(mut)
fmt.Println(mut2)

func (DataFrame) Names

func (df DataFrame) Names() []string

Names returns the names of the columns of a DataFrame.

func (DataFrame) Ncol

func (df DataFrame) Ncol() int

Ncol returns the number of columns on a DataFrame.

func (DataFrame) Nrow

func (df DataFrame) Nrow() int

Nrow returns the number of rows on a DataFrame.

func (DataFrame) OuterJoin

func (df DataFrame) OuterJoin(b DataFrame, keys ...string) DataFrame

OuterJoin returns a DataFrame containing the outer join of two DataFrames.

func (DataFrame) RBind

func (df DataFrame) RBind(dfb DataFrame) DataFrame

RBind matches the column names of two DataFrames and returns combined rows from both of them.

func (DataFrame) Rapply

func (df DataFrame) Rapply(f func(series.Series) series.Series) DataFrame

Rapply applies the given function to the rows of a DataFrame. Prior to applying the function the elements of each row are cast to a Series of a specific type. In order of priority: String -> Float -> Int -> Bool. This casting also takes place after the function application to equalize the type of the columns.

func (DataFrame) Records

func (df DataFrame) Records() [][]string

Records returns the string record representation of a DataFrame.

func (DataFrame) Rename

func (df DataFrame) Rename(newname, oldname string) DataFrame

Rename changes the name of one of the columns of a DataFrame

func (DataFrame) RightJoin

func (df DataFrame) RightJoin(b DataFrame, keys ...string) DataFrame

RightJoin returns a DataFrame containing the right join of two DataFrames.

func (DataFrame) Select

func (df DataFrame) Select(indexes SelectIndexes) DataFrame

Select returns a DataFrame containing only the given columns.

Code:

df := dataframe.LoadRecords(
    [][]string{
        []string{"A", "B", "C", "D"},
        []string{"a", "4", "5.1", "true"},
        []string{"k", "5", "7.0", "true"},
        []string{"k", "4", "6.0", "true"},
        []string{"a", "2", "7.1", "false"},
    },
)
sel1 := df.Select([]int{0, 2})
sel2 := df.Select([]string{"A", "C"})
fmt.Println(sel1)
fmt.Println(sel2)

func (DataFrame) Set

func (df DataFrame) Set(indexes series.Indexes, newvalues DataFrame) DataFrame

Set will update the values of a DataFrame for all rows selected via indexes.

Code:

df := dataframe.LoadRecords(
    [][]string{
        []string{"A", "B", "C", "D"},
        []string{"a", "4", "5.1", "true"},
        []string{"k", "5", "7.0", "true"},
        []string{"k", "4", "6.0", "true"},
        []string{"a", "2", "7.1", "false"},
    },
)
df2 := df.Set(
    series.Ints([]int{0, 2}),
    dataframe.LoadRecords(
        [][]string{
            []string{"A", "B", "C", "D"},
            []string{"b", "4", "6.0", "true"},
            []string{"c", "3", "6.0", "false"},
        },
    ),
)
fmt.Println(df2)

func (DataFrame) SetNames

func (df DataFrame) SetNames(colnames ...string) error

SetNames changes the column names of a DataFrame to the ones passed as an argument.

func (DataFrame) String

func (df DataFrame) String() (str string)

String implements the Stringer interface for DataFrame

func (DataFrame) Subset

func (df DataFrame) Subset(indexes series.Indexes) DataFrame

Subset returns a subset of the rows of the original DataFrame based on the Series subsetting indexes.

Code:

df := dataframe.LoadRecords(
    [][]string{
        []string{"A", "B", "C", "D"},
        []string{"a", "4", "5.1", "true"},
        []string{"k", "5", "7.0", "true"},
        []string{"k", "4", "6.0", "true"},
        []string{"a", "2", "7.1", "false"},
    },
)
sub := df.Subset([]int{0, 2})
fmt.Println(sub)

func (DataFrame) Types

func (df DataFrame) Types() []series.Type

Types returns the types of the columns on a DataFrame.

func (DataFrame) WriteCSV

func (df DataFrame) WriteCSV(w io.Writer, options ...WriteOption) error

WriteCSV writes the DataFrame to the given io.Writer as a CSV file.

func (DataFrame) WriteJSON

func (df DataFrame) WriteJSON(w io.Writer) error

WriteJSON writes the DataFrame to the given io.Writer as a JSON array.

type F

type F struct {
    Colname    string
    Comparator series.Comparator
    Comparando interface{}
}

F is the filtering structure

type LoadOption

type LoadOption func(*loadOptions)

LoadOption is the type used to configure the load of elements

func DefaultType

func DefaultType(t series.Type) LoadOption

DefaultType sets the defaultType option for loadOptions.

func DetectTypes

func DetectTypes(b bool) LoadOption

DetectTypes sets the detectTypes option for loadOptions.

func HasHeader

func HasHeader(b bool) LoadOption

HasHeader sets the hasHeader option for loadOptions.

func NaNValues

func NaNValues(nanValues []string) LoadOption

NaNValues sets the nanValues option for loadOptions.

func Names

func Names(names ...string) LoadOption

Names sets the names option for loadOptions.

func WithComments

func WithComments(b rune) LoadOption

WithComments sets the rune that marks a CSV comment line, so that such lines are detected and removed.

func WithDelimiter

func WithDelimiter(b rune) LoadOption

WithDelimiter sets a CSV delimiter other than ',', for example '\t'.

func WithTypes

func WithTypes(coltypes map[string]series.Type) LoadOption

WithTypes sets the types option for loadOptions.

type Matrix

type Matrix interface {
    Dims() (r, c int)
    At(i, j int) float64
}

Matrix is an interface which is compatible with gonum's mat.Matrix interface

type Order

type Order struct {
    Colname string
    Reverse bool
}

Order is the ordering structure

func RevSort

func RevSort(colname string) Order

RevSort returns an ordering structure for reverse column sorting.

func Sort

func Sort(colname string) Order

Sort returns an ordering structure for regular column sorting.

type SelectIndexes

type SelectIndexes interface{}

SelectIndexes are the supported indexes used for the DataFrame.Select method. Currently supported are:

int              // Matches the given index number
[]int            // Matches all given index numbers
[]bool           // Matches all columns marked as true
string           // Matches the column with the matching column name
[]string         // Matches all columns with the matching column names
Series [Int]     // Same as []int
Series [Bool]    // Same as []bool
Series [String]  // Same as []string

type WriteOption

type WriteOption func(*writeOptions)

WriteOption is the type used to configure the writing of elements

func WriteHeader

func WriteHeader(b bool) WriteOption

WriteHeader sets the writeHeader option for writeOptions.

Package dataframe imports 10 packages and is imported by 9 packages. Updated 2020-02-28.