po

v0.0.0-...-f67889f
Published: Sep 9, 2018 License: MIT Imports: 8 Imported by: 0

README

Po - Pandas for Go

Po is a data science library inspired by the Python pandas library and named after Po from Kung Fu Panda. Po lets the user perform data munging and wrangling by modeling the data as a DataFrame. Much like the DataFrame in pandas, a DataFrame in Po is composed of 0..n Series. Unlike the pandas implementation, however, all values in a Po Series are stored as strings. The rationale behind this design was to minimize the end user's learning curve by mimicking the pandas API as closely as possible. Without the aid of generics, and under Go's strict enforcement of types, binding data to strings using Go's native slice and map types seemed like the best way to achieve that emulation. Accordingly, a Series in Po is a []string and a DataFrame is a map[string]Series.

Despite this seemingly limited binding, Po still allows users to perform analytical operations on DataFrames and Series. This is achieved by converting the Series in question to other types defined in Po, namely IntSlice and FloatSlice. Series, IntSlice, and FloatSlice all have methods for converting from one to another, so that the user can apply the operations appropriate to each data type. Future iterations will expand on these defined types to include types such as boolean, time.Time, map, interface, rune, and possibly others.
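The string-to-number conversion described above can be sketched with the standard library alone. This is a minimal illustration, not Po's implementation: the local Series type only mirrors the documented []string definition, and toInts and sum are hypothetical helpers rather than methods of Po's IntSlice.

```go
package main

import (
	"fmt"
	"strconv"
)

// Series mirrors the documented po.Series type ([]string).
type Series []string

// toInts is a hypothetical helper, not part of po. It parses every
// value as a base-10 int and stops at the first value that fails.
func toInts(s Series) ([]int, error) {
	out := make([]int, 0, len(s))
	for i, v := range s {
		n, err := strconv.Atoi(v)
		if err != nil {
			return nil, fmt.Errorf("value %d (%q): %w", i, v, err)
		}
		out = append(out, n)
	}
	return out, nil
}

// sum adds up the parsed values.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	ints, err := toInts(Series{"8", "7", "6"})
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Println(sum(ints)) // 21
}
```

Returning an error for non-numeric values is a design choice of this sketch; how Po itself treats values that do not parse is not stated in this documentation.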

Install

go get github.com/asidlo/po

Examples

DataFrame Example

s1 := po.Series{"1", "2", "3"}
s2 := po.Series{"8", "7", "6"}
df3 := po.NewDataFrame([]po.Series{s1, s2}, []string{"a", "b", "c"})
x, y := df3.Dims()
fmt.Printf("Normal DataFrame example: (%d x %d)\n%s\n", x, y, df3)

Series Example

s1 := po.Series{
  "This", "Is", "1", "Example", "Of", "A", "Series",
}
fmt.Println(s1)

More examples can be found in the examples directory.

Documentation

Overview

Package po implements a pandas-like library for Go. Po provides Series and DataFrame data structures for data munging and preparation. Inspired by https://github.com/pandas-dev/pandas and https://github.com/kniren/gota.

Constants

const (
	// HeadSize is the default return length of the Head()
	// for both a Series and a DataFrame
	HeadSize = 5

	// RandStringLen is the default length for the randomly generated
	// column names, if and when they are needed.
	RandStringLen = 5
)

Variables

This section is empty.

Functions

func IntGenerator

func IntGenerator(start int, end int, step int, exclude []int) []int

IntGenerator generates a slice of integers that can then be used to apply different functions on DataFrames or Series, such as Subset() or Pick().

func WriteCsv

func WriteCsv(w io.Writer, df DataFrame) error

WriteCsv writes a dataframe to the given io.Writer in csv format.

Types

type DataFrame

type DataFrame map[string]Series

DataFrame is a map data structure containing Series values. It represents a generic table in which the keys are the individual column names and the rows correspond to the original input Series. Column names can be set when the DataFrame is instantiated, either via a map literal or via the NewDataFrame() function.

func GenerateDataFrame

func GenerateDataFrame(n int) DataFrame

GenerateDataFrame generates a DataFrame with size n of randomized user profile data. The data is generated using the github.com/Pallinder/go-randomdata package

func NewDataFrame

func NewDataFrame(ss []Series, Columns []string) DataFrame

NewDataFrame returns a new DataFrame whose rows correspond to the provided ss and whose column names correspond to the provided Columns. If no ss is provided, an empty DataFrame is created using any provided column names. If len(Columns) < len(s) for any s in ss, column names are auto-generated for the remaining entries. If len(Columns) > len(s) for any s in ss, that s is extended with empty string values for each remaining column.
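The row-to-column layout and the padding rules can be sketched as follows. This is an illustrative reimplementation under stated assumptions, not Po's code: fromRows is a hypothetical name, the local types only mirror the documented definitions, and missing column names are filled with a deterministic "colN" label where Po would generate random names.

```go
package main

import "fmt"

// Local mirrors of the documented po types.
type Series []string
type DataFrame map[string]Series

// fromRows is a hypothetical reimplementation for illustration only.
// Missing column names become "colN" (po generates random names instead),
// and short rows are padded with empty strings.
func fromRows(rows []Series, cols []string) DataFrame {
	// The table is as wide as the longest row or the column list.
	width := len(cols)
	for _, r := range rows {
		if len(r) > width {
			width = len(r)
		}
	}
	names := append([]string{}, cols...)
	for i := len(names); i < width; i++ {
		names = append(names, fmt.Sprintf("col%d", i))
	}
	df := make(DataFrame, width)
	for c, name := range names {
		col := make(Series, len(rows))
		for r, row := range rows {
			if c < len(row) {
				col[r] = row[c]
			} // otherwise keep the zero value ""
		}
		df[name] = col
	}
	return df
}

func main() {
	df := fromRows(
		[]Series{{"1", "2", "3"}, {"8", "7"}},
		[]string{"a", "b"},
	)
	fmt.Printf("%q %q %q\n", df["a"], df["b"], df["col2"])
	// ["1" "8"] ["2" "7"] ["3" ""]
}
```

Each input Series is one row, so the first values of all rows end up together in the first column.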

func ReadCsv

func ReadCsv(r io.Reader) (DataFrame, error)

ReadCsv reads in csv data and returns a DataFrame with randomly generated column names for each input column.

func (DataFrame) Columns

func (df DataFrame) Columns() []string

Columns returns all of the column names, sorted so that their order is stable. The sorting is done using the sort.Strings() function from the standard library.

func (DataFrame) Copy

func (df DataFrame) Copy() DataFrame

Copy returns a copy of a dataframe.

func (DataFrame) Dims

func (df DataFrame) Dims() (int, int)

Dims returns the number of rows, number of columns in a DataFrame

func (DataFrame) DropColumns

func (df DataFrame) DropColumns(c ...string) DataFrame

DropColumns removes columns from the DataFrame.

func (DataFrame) Head

func (df DataFrame) Head(i ...int) DataFrame

Head returns the first i entries for each column in a DataFrame. If more than one int is passed, only the first is used; the variadic parameter exists only to make the argument optional. If no i is passed, Head returns HeadSize rows or all rows of the DataFrame, whichever is smaller. If a negative value is passed, its absolute value is used.

func (DataFrame) Pick

func (df DataFrame) Pick(i ...int) DataFrame

Pick returns a subset DataFrame composed only of the rows at the specified indices.

func (DataFrame) Rename

func (df DataFrame) Rename(c map[string]string) DataFrame

Rename renames the columns in the dataframe that correspond to the provided keys in the map parameter.
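A minimal sketch of renaming keys in a map-backed table follows. The rename function is hypothetical and assumes that columns missing from the map keep their names and that the result is a new map sharing the original Series values. How Po handles two columns mapped to the same new name is not documented here, and this sketch does not address it.

```go
package main

import "fmt"

// Local mirrors of the documented po types.
type Series []string
type DataFrame map[string]Series

// rename is a hypothetical sketch: it builds a new map in which every
// column listed in c is stored under its new name. The Series values
// themselves are shared with the input, not copied.
func rename(df DataFrame, c map[string]string) DataFrame {
	out := make(DataFrame, len(df))
	for name, col := range df {
		if newName, ok := c[name]; ok {
			name = newName
		}
		out[name] = col
	}
	return out
}

func main() {
	df := DataFrame{"a": {"1"}, "b": {"2"}}
	out := rename(df, map[string]string{"a": "id"})
	fmt.Println(len(out), out["id"], out["b"]) // 2 [1] [2]
}
```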

func (DataFrame) Select

func (df DataFrame) Select(c ...string) DataFrame

Select returns a subset of the original DataFrame containing only the columns with the given names.

func (DataFrame) Shape

func (df DataFrame) Shape() (int, int)

Shape returns the number of rows, number of columns in a DataFrame. Same as po.DataFrame.Dims()

func (DataFrame) String

func (df DataFrame) String() string

String returns the string representation of the DataFrame. Columns are ordered via sort.Strings() method. It uses the olekukonko/tablewriter library to render the table.

func (DataFrame) Subset

func (df DataFrame) Subset(start int, end int, step int, exclude []int) DataFrame

Subset returns a subset of the original DataFrame. It grabs entries from each column by their indices, starting from the start int to the end int using the specified step size and excluding any indices specified.

func (DataFrame) Transpose

func (df DataFrame) Transpose() DataFrame

Transpose returns a transposed DataFrame of the original DataFrame. The transposed column names become a string of the former row index.

type Series

type Series []string

Series is a generic data structure that contains a slice of strings. Strings were chosen because I wanted the API to be simple, easy to use, and as close to the pandas API as possible so that the learning curve would be small. This lets the user put any type of data into a Series, so long as it is written as a string (surrounded by ""). A Series can be converted to other types in order to perform mathematical operations that require non-string types.

func NewSeries

func NewSeries(s ...string) Series

NewSeries is a variadic function that returns a Series comprised of the provided strings

func (Series) Head

func (s Series) Head(i ...int) Series

Head returns the first i entries in a Series. If more than one int is passed, only the first is used; the variadic parameter exists only to make the argument optional. If no i is passed, Head returns HeadSize entries or len(Series) entries, whichever is smaller. If a negative value is passed, its absolute value is used.

func (Series) Pick

func (s Series) Pick(i ...int) Series

Pick returns a subset Series composed only of the entries at the specified indices.

func (Series) String

func (s Series) String() string

String returns the string representation of the Series. It uses the olekukonko/tablewriter library to render the table.

func (Series) Subset

func (s Series) Subset(start int, end int, step int, exclude []int) Series

Subset returns a subset of the original Series. It grabs entries by their indices, starting from the start int to the end int using the specified step size and excluding any indices specified.
