datatable

package module
v0.3.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Apr 25, 2022 License: Apache-2.0 Imports: 14 Imported by: 0

README

datatable

Circle CI Go Report Card GoDoc GitHub stars GitHub license

datasweet-logo

datatable is a Go package to manipulate tabular data, like an excel spreadsheet. datatable is inspired by the pandas python package and the data.frame R structure. Although it's production ready, be aware that we're still working on API improvements

Installation

go get github.com/datasweet/datatable

Features

  • Create custom Series (ie custom columns). Currently available, serie.Int, serie.String, serie.Time, serie.Float64.
  • Apply expressions
  • Selects (head, tail, subset)
  • Sorting
  • InnerJoin, LeftJoin, RightJoin, OuterJoin, Concats
  • Aggregate
  • Import from CSV
  • Export to map, slice
Creating a DataTable
package main

import (
	"fmt"

	"github.com/datasweet/datatable"
)

func main() {
	dt := datatable.New("test")
	dt.AddColumn("champ", datatable.String, datatable.Values("Malzahar", "Xerath", "Teemo"))
	dt.AddColumn("champion", datatable.String, datatable.Expr("upper(`champ`)"))
	dt.AddColumn("win", datatable.Int, datatable.Values(10, 20, 666))
	dt.AddColumn("loose", datatable.Int, datatable.Values(6, 5, 666))
	dt.AddColumn("winRate", datatable.Float64, datatable.Expr("`win` * 100 / (`win` + `loose`)"))
	dt.AddColumn("winRate %", datatable.String, datatable.Expr(" `winRate` ~ \" %\""))
	dt.AddColumn("sum", datatable.Float64, datatable.Expr("sum(`win`)"))

	fmt.Println(dt)
}

/*
CHAMP <NULLSTRING>      CHAMPION <NULLSTRING>   WIN <NULLINT>   LOOSE <NULLINT> WINRATE <NULLFLOAT64>   WINRATE % <NULLSTRING>  SUM <NULLFLOAT64> 
Malzahar                MALZAHAR                10              6               62.5                    62.5 %                  696              
Xerath                  XERATH                  20              5               80                      80 %                    696              
Teemo                   TEEMO                   666             666             50                      50 %                    696    
*/
Reading a CSV and aggregate
package main

import (
	"fmt"
	"log"
	"os"
	"time"

	"github.com/datasweet/datatable"
	"github.com/datasweet/datatable/import/csv"
)

func main() {
	dt, err := csv.Import("csv", "phone_data.csv",
		csv.HasHeader(true),
		csv.AcceptDate("02/01/06 15:04"),
		csv.AcceptDate("2006-01"),
	)
	if err != nil {
		log.Fatalf("reading csv: %v", err)
	}

	dt.Print(os.Stdout, datatable.PrintMaxRows(24))

	dt2, err := dt.Aggregate(datatable.AggregateBy{Type: datatable.Count, Field: "index"})
	if err != nil {
		log.Fatalf("aggregate COUNT('index'): %v", err)
	}
	fmt.Println(dt2)

	groups, err := dt.GroupBy(datatable.GroupBy{
		Name: "year",
		Type: datatable.Int,
		Keyer: func(row datatable.Row) (interface{}, bool) {
			if d, ok := row["date"]; ok {
				if tm, ok := d.(time.Time); ok {
					return tm.Year(), true
				}
			}
			return nil, false
		},
	})
	if err != nil {
		log.Fatalf("GROUP BY 'year': %v", err)
	}
	dt3, err := groups.Aggregate(
		datatable.AggregateBy{Type: datatable.Sum, Field: "duration"},
		datatable.AggregateBy{Type: datatable.CountDistinct, Field: "network"},
	)
	if err != nil {
		log.Fatalf("Aggregate SUM('duration'), COUNT_DISTINCT('network') GROUP BY 'year': %v", err)
	}
	fmt.Println(dt3)
}
Creating a custom serie

To create a custom serie you must provide:

  • a caster function, to cast a generic value to your serie value. The signature must be func(i interface{}) T
  • a comparator, to compare your serie value. The signature must be func(a, b T) int

Example with a NullInt

// IntN is an alis to create the custom Serie to manage IntN
func IntN(v ...interface{}) Serie {
	s, _ := New(NullInt{}, asNullInt, compareNullInt)
	if len(v) > 0 {
		s.Append(v...)
	}
	return s
}

type NullInt struct {
	Int   int
	Valid bool
}

// Interface() to render the current struct as a value.
// If not provided, the serie.All() or serie.Get() wills returns the embedded value
// IE: NullInt{}
func (i NullInt) Interface() interface{} {
	if i.Valid {
		return i.Int
	}
	return nil
}

// asNullInt is our caster function
func asNullInt(i interface{}) NullInt {
	var ni NullInt
	if i == nil {
		return ni
	}

	if v, ok := i.(NullInt); ok {
		return v
	}

	if v, err := cast.ToIntE(i); err == nil {
		ni.Int = v
		ni.Valid = true
	}
	return ni
}

// compareNullInt is our comparator function
// used to sort
func compareNullInt(a, b NullInt) int {
	if !b.Valid {
		if !a.Valid {
			return Eq
		}
		return Gt
	}
	if !a.Valid {
		return Lt
  }
  if a.Int == b.Int {
		return Eq
	}
	if a.Int < b.Int {
		return Lt
	}
	return Gt
}

Who are we ?

We are Datasweet, a french startup providing full service (big) data solutions.

Questions ? problems ? suggestions ?

If you find a bug or want to request a feature, please create a GitHub Issue.

Contributors


Cléo Rebert

License

This software is licensed under the Apache License, version 2 ("ALv2"), quoted below.

Copyright 2017-2020 Datasweet <http://www.datasweet.fr>

Licensed under the Apache License, Version 2.0 (the "License"); you may not
use this file except in compliance with the License. You may obtain a copy of
the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS, WITHOUT
WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the
License for the specific language governing permissions and limitations under
the License.

Documentation

Index

Constants

This section is empty.

Variables

View Source
var (
	ErrOpenFile           = errors.New("open file")
	ErrCantReadHeaders    = errors.New("can't read headers")
	ErrReadingLine        = errors.New("could not read line")
	ErrNilDatas           = errors.New("nil datas")
	ErrWrongNumberOfTypes = errors.New("expected different number of types")
	ErrAddingColumn       = errors.New("could not add column with given type")
)

Errors in import/csv

View Source
var (
	ErrNoGroupBy      = errors.New("no groupby")
	ErrNoGroups       = errors.New("no groups")
	ErrNilDatatable   = errors.New("nil datatable")
	ErrColumnNotFound = errors.New("column not found")
	ErrUnknownAgg     = errors.New("unknown agg")
	ErrCantAddColumn  = errors.New("can't add column")
)

Errors in aggregate.go

View Source
var (
	ErrEmptyName         = errors.New("empty name")
	ErrNilFactory        = errors.New("nil factory")
	ErrTypeAlreadyExists = errors.New("type already exists")
	ErrUnknownColumnType = errors.New("unknown column type")
)

Errors in column.go

View Source
var (
	ErrNilOutputDatatable  = errors.New("nil output datatable")
	ErrNoOutput            = errors.New("no output")
	ErrNilTable            = errors.New("table is nil")
	ErrNotEnoughDatatables = errors.New("not enough datatables")
	ErrNoOnClauses         = errors.New("no on clauses")
	ErrOnClauseIsNil       = errors.New("on clause is nil")
	ErrUnknownMode         = errors.New("unknown mode")
)

Errors in join.go

View Source
var (
	ErrNilColumn           = errors.New("nil column")
	ErrNilColumnName       = errors.New("nil column name")
	ErrNilColumnType       = errors.New("nil column type")
	ErrColumnAlreadyExists = errors.New("column already exists")
	ErrFormulaeSyntax      = errors.New("formulae syntax")
	ErrNilSerie            = errors.New("nil serie")
	ErrCreateSerie         = errors.New("create serie")
)

Errors in mutate_column.go

View Source
var (
	ErrLengthMismatch = errors.New("length mismatch")
	ErrUpdateRow      = errors.New("update row")
)

Errors in mutate_rows.go

View Source
var (
	ErrEvaluateExprSizeMismatch = errors.New("size mismatch")
)

Errors in eval_expr

View Source
var (
	ErrNoTables = errors.New("no tables")
)

Errors in concat.go

Functions

func RegisterColumnType added in v0.3.0

func RegisterColumnType(name ColumnType, serier ColumnSerier) error

RegisterColumnType to extends the known type

Types

type AggregateBy added in v0.3.0

type AggregateBy struct {
	Type  AggregationType
	Field string
	As    string
}

AggregateBy defines the aggregation

type AggregationType added in v0.3.0

type AggregationType uint8

AggregationType defines the avalaible aggregation

const (
	Avg AggregationType = iota
	Count
	CountDistinct
	Cusum
	Max
	Min
	Median
	Stddev
	Sum
	Variance
)

func (AggregationType) String added in v0.3.0

func (a AggregationType) String() string

type Column

type Column interface {
	Name() string
	Type() ColumnType
	UnderlyingType() reflect.Type
	IsVisible() bool
	IsComputed() bool
}

Column describes a column in our datatable

type ColumnOption added in v0.3.0

type ColumnOption func(opts *ColumnOptions)

ColumnOption sets column options

func ColumnHidden added in v0.3.0

func ColumnHidden(v bool) ColumnOption

ColumnHidden sets the visibility

func Expr added in v0.3.0

func Expr(v string) ColumnOption

Expr sets the expr for the column <!> Incompatible with ColumnValues

func TimeFormats added in v0.3.0

func TimeFormats(v ...string) ColumnOption

TimeFormats sets the valid time formats. <!> Only for Time Column

func Values added in v0.3.0

func Values(v ...interface{}) ColumnOption

Values fills the column with the values <!> Incompatible with ColumnExpr

type ColumnOptions added in v0.3.0

type ColumnOptions struct {
	Hidden      bool
	Expr        string
	Values      []interface{}
	TimeFormats []string
}

ColumnOptions describes options to be apply on a column

type ColumnSerier added in v0.3.0

type ColumnSerier func(ColumnOptions) serie.Serie

ColumnSerier to create a serie from column options

type ColumnType added in v0.3.0

type ColumnType string

ColumnType defines the valid column type in datatable

const (
	Bool   ColumnType = "bool"
	String ColumnType = "string"
	Int    ColumnType = "int"
	// Int8     ColumnType = "int8"
	// Int16    ColumnType = "int16"
	Int32 ColumnType = "int32"
	Int64 ColumnType = "int64"
	// Uint  ColumnType = "uint"
	// Uint8     ColumnType = "uint8"
	// Uint16    ColumnType = "uint16"
	// Uint32    ColumnType = "uint32"
	// Uint64    ColumnType = "uint64"
	Float32 ColumnType = "float32"
	Float64 ColumnType = "float64"
	Time    ColumnType = "time"
	Raw     ColumnType = "raw"
)

func ColumnTypes added in v0.3.0

func ColumnTypes() []ColumnType

ColumnTypes to list all column type

type DataTable

type DataTable struct {
	// contains filtered or unexported fields
}

DataTable is our main struct

func Concat

func Concat(tables []*DataTable) (*DataTable, error)

Concat datatables

func InnerJoin

func InnerJoin(tables []*DataTable, on []JoinOn) (*DataTable, error)

InnerJoin selects records that have matching values in both tables. tables[0] is used as reference datatable.

func LeftJoin

func LeftJoin(tables []*DataTable, on []JoinOn) (*DataTable, error)

LeftJoin the tables. tables[0] is used as reference datatable.

func New

func New(name string) *DataTable

New creates a new datatable

func OuterJoin

func OuterJoin(tables []*DataTable, on []JoinOn) (*DataTable, error)

OuterJoin the tables. tables[0] is used as reference datatable.

func RightJoin

func RightJoin(tables []*DataTable, on []JoinOn) (*DataTable, error)

RightJoin the tables. tables[0] is used as reference datatable.

func (*DataTable) AddColumn

func (t *DataTable) AddColumn(name string, ctyp ColumnType, opt ...ColumnOption) error

AddColumn to datatable with a serie of T

func (*DataTable) Aggregate added in v0.3.0

func (dt *DataTable) Aggregate(by ...AggregateBy) (*DataTable, error)

Aggregate aggregates some field

func (*DataTable) Append

func (t *DataTable) Append(row ...Row)

Append rows to the table

func (*DataTable) AppendRow

func (t *DataTable) AppendRow(v ...interface{}) error

AppendRow creates a new row and append cells to this row

func (*DataTable) Column

func (t *DataTable) Column(name string) Column

Column gets the column with name returns nil if not found

func (*DataTable) ColumnIndex

func (t *DataTable) ColumnIndex(name string) int

ColumnIndex gets the index of the column with name returns -1 if not found

func (*DataTable) Columns

func (t *DataTable) Columns() []string

Columns returns the visible column names in datatable

func (*DataTable) Concat

func (left *DataTable) Concat(table ...*DataTable) (*DataTable, error)

Concat datatables

func (*DataTable) Copy

func (t *DataTable) Copy() *DataTable

Copy the datatable

func (*DataTable) EmptyCopy

func (t *DataTable) EmptyCopy() *DataTable

EmptyCopy copies the structure of datatable (no values)

func (*DataTable) GroupBy added in v0.3.0

func (dt *DataTable) GroupBy(by ...GroupBy) (*Groups, error)

GroupBy splits our datatable by group

func (*DataTable) Grow

func (t *DataTable) Grow(size int)

Grow the table by size

func (*DataTable) Head

func (t *DataTable) Head(size int) *DataTable

Head selects {size} first rows

func (*DataTable) HiddenColumns

func (t *DataTable) HiddenColumns() []string

HiddenColumns returns the hidden column names in datatable

func (*DataTable) HideAll added in v0.3.0

func (t *DataTable) HideAll()

HideAll to hides all column a hidden column will not be exported

func (*DataTable) HideColumn

func (t *DataTable) HideColumn(name string)

HideColumn hides a column a hidden column will not be exported

func (*DataTable) InnerJoin

func (left *DataTable) InnerJoin(right *DataTable, on []JoinOn) (*DataTable, error)

InnerJoin selects records that have matching values in both tables. left datatable is used as reference datatable. <!> InnerJoin transforms an expr column to a raw column

func (*DataTable) LeftJoin

func (left *DataTable) LeftJoin(right *DataTable, on []JoinOn) (*DataTable, error)

LeftJoin returns all records from the left table (table1), and the matched records from the right table (table2). The result is NULL from the right side, if there is no match. <!> LeftJoin transforms an expr column to a raw column

func (*DataTable) Name

func (t *DataTable) Name() string

Name returns the datatable's name

func (*DataTable) NewRow

func (t *DataTable) NewRow() Row

NewRow create a new row

func (*DataTable) NumCols

func (t *DataTable) NumCols() int

NumCols returns the number of visible columns in datatable

func (*DataTable) NumRows

func (t *DataTable) NumRows() int

NumRows returns the number of rows in datatable

func (*DataTable) OuterJoin

func (left *DataTable) OuterJoin(right *DataTable, on []JoinOn) (*DataTable, error)

OuterJoin returns all records when there is a match in either left or right table <!> OuterJoin transforms an expr column to a raw column

func (*DataTable) Print

func (t *DataTable) Print(writer io.Writer, opt ...PrintOption)

Print the tables with options

func (*DataTable) Records

func (t *DataTable) Records() [][]string

Records returns the rows in datatable as string Computes all expressions.

func (*DataTable) Rename

func (t *DataTable) Rename(name string)

Rename the datatable

func (*DataTable) RenameColumn

func (t *DataTable) RenameColumn(old, name string) error

RenameColumn to rename a column

func (*DataTable) RightJoin

func (left *DataTable) RightJoin(right *DataTable, on []JoinOn) (*DataTable, error)

RightJoin returns all records from the right table (table2), and the matched records from the left table (table1). The result is NULL from the left side, when there is no match. <!> RightJoin transforms an expr column to a raw column

func (*DataTable) Row

func (t *DataTable) Row(at int, opt ...ExportOption) Row

Row gets the row at index

func (*DataTable) Rows

func (t *DataTable) Rows(opt ...ExportOption) []Row

Rows returns the rows in datatable Computes all expressions.

func (*DataTable) ShowAll added in v0.3.0

func (t *DataTable) ShowAll()

ShowAll to show all column a shown column will be exported

func (*DataTable) ShowColumn

func (t *DataTable) ShowColumn(name string)

ShowColumn shows a column a shown column will be exported

func (*DataTable) Sort

func (t *DataTable) Sort(by ...SortBy) *DataTable

Sort the table

func (*DataTable) String

func (t *DataTable) String() string

func (*DataTable) Subset

func (t *DataTable) Subset(at, size int) *DataTable

Subset selects rows at index with size

func (*DataTable) SwapColumn

func (t *DataTable) SwapColumn(a, b string) error

SwapColumn to swap 2 columns

func (*DataTable) SwapRow

func (t *DataTable) SwapRow(i, j int)

SwapRow in table

func (*DataTable) Tail

func (t *DataTable) Tail(size int) *DataTable

Tail selects {size} last rows

func (*DataTable) ToMap

func (t *DataTable) ToMap(opt ...ExportOption) []map[string]interface{}

ToMap to export the datatable to a json-like struct

func (*DataTable) ToSchema

func (t *DataTable) ToSchema(opt ...ExportOption) *Schema

ToSchema to export the datatable to a schema struct

func (*DataTable) ToTable

func (t *DataTable) ToTable(opt ...ExportOption) [][]interface{}

ToTable to export the datatable to a csv-like struct

func (*DataTable) Update

func (t *DataTable) Update(at int, row Row) error

Update the row at index

func (*DataTable) Where added in v0.3.0

func (t *DataTable) Where(predicate func(row Row) bool) *DataTable

Where filters the datatable based on a predicate

type ExportOption added in v0.3.1

type ExportOption func(*ExportOptions)

func ExportHidden added in v0.3.1

func ExportHidden(v bool) ExportOption

ExportHidden to show a column when exporting (default false)

type ExportOptions added in v0.3.1

type ExportOptions struct {
	WithHiddenCols bool
}

ExportOptions to add options for exporting (like showing hidden columns)

type GroupBy added in v0.3.0

type GroupBy struct {
	Name  string
	Type  ColumnType
	Keyer func(row Row) (interface{}, bool)
}

GroupBy defines the group by configuration Name is the name of the output column Type is the type of the output column Keyer is our main function to aggregate

type Groups added in v0.3.0

type Groups struct {
	// contains filtered or unexported fields
}

Groups

func (*Groups) Aggregate added in v0.3.0

func (g *Groups) Aggregate(aggs ...AggregateBy) (*DataTable, error)

Aggregate our groups

type JoinOn

type JoinOn struct {
	Table string
	Field string
}

func On

func On(fields ...string) []JoinOn

On creates a "join on" expression ie, as SQL, SELECT * FROM A INNER JOIN B ON B.id = A.user_id Syntax: "[table].[field]", "field"

func Using

func Using(fields ...string) []JoinOn

Using creates a "join using" expression ie, as SQL, SELECT * FROM A INNER JOIN B USING 'field'

type PrintOption

type PrintOption func(opts *PrintOptions)

func PrintColumnName

func PrintColumnName(v bool) PrintOption

func PrintColumnType

func PrintColumnType(v bool) PrintOption

func PrintMaxRows

func PrintMaxRows(v int) PrintOption

func PrintRowNumber

func PrintRowNumber(v bool) PrintOption

type PrintOptions

type PrintOptions struct {
	ColumnName bool
	ColumnType bool
	RowNumber  bool
	MaxRows    int
}

PrintOptions to control the printer

type Row

type Row map[string]interface{}

Row contains a row relative to columns

func (Row) Get

func (r Row) Get(k string) interface{}

Get cell

func (Row) Hash

func (r Row) Hash() uint64

Hash computes the hash code from this datarow can be used to filter the table (distinct rows)

func (Row) Set

func (r Row) Set(k string, v interface{}) Row

Set cell

type Schema

type Schema struct {
	Name    string          `json:"name"`
	Columns []SchemaColumn  `json:"cols"`
	Rows    [][]interface{} `json:"rows"`
}

Schema describes a datatable

type SchemaColumn

type SchemaColumn struct {
	Name string `json:"name"`
	Type string `json:"type"`
}

type SortBy added in v0.3.0

type SortBy struct {
	Column string
	Desc   bool
	// contains filtered or unexported fields
}

SortBy defines a sort to be applied

Directories

Path Synopsis
import
csv

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL