po

package module

v0.0.0-...-f67889f | Published: Sep 9, 2018 | License: MIT | Imports: 8 | Imported by: 0

Repository

github.com/asidlo/po

README ¶

Po - Pandas for Go

Po is a data science library inspired by the Python pandas library and named after Po from Kung Fu Panda. Po lets the user perform data munging and wrangling by modeling the data as a DataFrame. Much like the DataFrame in pandas, the DataFrame construct in Po is composed of 0..n Series. Unlike the pandas implementation, however, all values in a Po Series are stored as strings. The rationale behind this design was to minimize the end user's learning curve by mimicking the pandas API as closely as possible. Without the aid of overly complicated generics, and under Go's beautifully strict enforcement of types, binding data to strings using Go's native slice and map types seemed like the best approach to achieve that emulation. Accordingly, a Series in Po is a []string and a DataFrame is a map[string]Series.

Despite this seemingly limited binding, Po still allows users to perform analytical operations on DataFrames and Series. This is achieved by converting the Series in question to other types defined in Po, namely IntSlice and FloatSlice. Series, IntSlice, and FloatSlice all have methods for converting from one to the other, so that the user can apply the operations suited to the respective data type. Future iterations will expand on these defined types to include types such as boolean, time.Time, map, interface, rune, and possibly others.
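The string-to-number conversion described above can be sketched with the standard library alone. This is an illustration of the idea, not Po's API: the README does not name the conversion methods, so `toInts` and `sum` below are hypothetical helpers, and the local `Series` type only mirrors the documented definition.

```go
package main

import (
	"fmt"
	"strconv"
)

// Series mirrors the documented definition: a slice of strings.
type Series []string

// toInts is a hypothetical helper (not part of Po). It parses every
// entry as a base-10 integer and stops at the first value that fails.
func toInts(s Series) ([]int, error) {
	out := make([]int, 0, len(s))
	for i, v := range s {
		n, err := strconv.Atoi(v)
		if err != nil {
			return nil, fmt.Errorf("entry %d (%q): %w", i, v, err)
		}
		out = append(out, n)
	}
	return out, nil
}

// sum adds up the converted values.
func sum(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x
	}
	return total
}

func main() {
	ints, err := toInts(Series{"8", "7", "6"})
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Println(sum(ints)) // prints 21
}
```

Returning an error on the first unparsable entry is one possible design; the library itself may handle bad values differently.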

Install

go get github.com/asidlo/po

Examples

DataFrame Example

s1 := po.Series{"1", "2", "3"}
s2 := po.Series{"8", "7", "6"}
df3 := po.NewDataFrame([]po.Series{s1, s2}, []string{"a", "b", "c"})
x, y := df3.Dims() // number of rows, number of columns
fmt.Printf("Normal DataFrame example: (%d x %d)\n%s\n", x, y, df3)

Series Example

s1 := po.Series{
  "This", "Is", "1", "Example", "Of", "A", "Series",
}
fmt.Println(s1)

More examples can be found in the examples directory.

Alternatives

github.com/kniren/gota

Documentation ¶

Overview ¶

Package po implements a pandas-like library for Go. Po provides Series and DataFrame data structures for data munging and preparation. Inspired by https://github.com/pandas-dev/pandas and https://github.com/kniren/gota.

Index ¶

Constants
func IntGenerator(start int, end int, step int, exclude []int) []int
func WriteCsv(w io.Writer, df DataFrame) error
type DataFrame
type Series
- func NewSeries(s ...string) Series

Constants ¶

const (
	// HeadSize is the default return length of the Head()
	// for both a Series and a DataFrame
	HeadSize = 5

	// RandStringLen is the default length for the randomly generated
	// column names, if and when they are needed.
	RandStringLen = 5
)

Variables ¶

This section is empty.

Functions ¶

func IntGenerator ¶

func IntGenerator(start int, end int, step int, exclude []int) []int

IntGenerator generates a slice of integers that can then be used to apply different functions on DataFrames or Series, such as Subset() or Pick().
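As a rough sketch of what such an index generator might look like, the function below builds a stepped range and drops excluded values. It is not the library's implementation: `intRange` is a hypothetical name, and the half-open range `[start, end)` and the positive-step requirement are assumptions, since the documentation does not state how IntGenerator treats its bounds.

```go
package main

import "fmt"

// intRange is a hypothetical stand-in for IntGenerator. It assumes a
// half-open range [start, end) and a positive step; the real function
// may treat the bounds differently.
func intRange(start, end, step int, exclude []int) []int {
	if step <= 0 {
		return nil
	}
	// Put the excluded values in a set for constant-time lookups.
	skip := make(map[int]bool, len(exclude))
	for _, e := range exclude {
		skip[e] = true
	}
	var out []int
	for i := start; i < end; i += step {
		if !skip[i] {
			out = append(out, i)
		}
	}
	return out
}

func main() {
	fmt.Println(intRange(0, 10, 2, []int{4})) // prints [0 2 6 8]
}
```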

func WriteCsv ¶

func WriteCsv(w io.Writer, df DataFrame) error

WriteCsv writes a dataframe to the given io.Writer in csv format.
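One way to serialize a column-oriented map to CSV with the standard `encoding/csv` package is sketched below. The helper names `writeColumns` and `render` are hypothetical, and the header row of sorted column names and the padding of short columns are assumptions; the documentation does not say what WriteCsv emits.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"sort"
	"strings"
)

// writeColumns is a hypothetical helper, not Po's WriteCsv. It writes a
// header of sorted column names followed by one record per row, padding
// short columns with empty strings.
func writeColumns(w io.Writer, cols map[string][]string) error {
	names := make([]string, 0, len(cols))
	rows := 0
	for name, col := range cols {
		names = append(names, name)
		if len(col) > rows {
			rows = len(col)
		}
	}
	sort.Strings(names) // map iteration order is random, so fix it

	cw := csv.NewWriter(w)
	if err := cw.Write(names); err != nil {
		return err
	}
	for r := 0; r < rows; r++ {
		rec := make([]string, len(names))
		for c, name := range names {
			if r < len(cols[name]) {
				rec[c] = cols[name][r]
			}
		}
		if err := cw.Write(rec); err != nil {
			return err
		}
	}
	cw.Flush()
	return cw.Error()
}

// render returns the CSV text as a string.
func render(cols map[string][]string) (string, error) {
	var sb strings.Builder
	err := writeColumns(&sb, cols)
	return sb.String(), err
}

func main() {
	out, err := render(map[string][]string{"b": {"8", "7"}, "a": {"1", "2"}})
	if err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Print(out) // prints "a,b", "1,8", "2,7" on separate lines
}
```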

Types ¶

type DataFrame ¶

type DataFrame map[string]Series

DataFrame is a map data structure containing Series values. It is intended to represent a generic table where the keys correspond to the individual column names and the rows correspond to the original input series. Column names can be supplied when instantiating a DataFrame with a literal, or via the NewDataFrame() function.

func GenerateDataFrame ¶

func GenerateDataFrame(n int) DataFrame

GenerateDataFrame generates a DataFrame of size n containing randomized user profile data. The data is generated using the github.com/Pallinder/go-randomdata package.

func NewDataFrame ¶

func NewDataFrame(ss []Series, Columns []string) DataFrame

NewDataFrame returns a new DataFrame with rows corresponding to the provided ss and column names corresponding to the provided Columns. If no ss is provided, an empty DataFrame is created using any provided column names. If fewer column names are provided than the length of a given Series in ss, names are auto-generated for the remaining columns. If more column names are provided than the length of a given Series in ss, that Series is extended with an empty string for each remaining column.
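The naming and padding rules above can be illustrated with a small reconstruction. This is a sketch under assumptions, not Po's code: `fromRows` is a hypothetical function, and it invents predictable names such as "col2" where Po would generate random ones.

```go
package main

import "fmt"

// fromRows is a hypothetical reconstruction, not Po's NewDataFrame. It
// turns row slices into named columns, inventing names such as "col2"
// when too few are given and padding short rows with empty strings.
func fromRows(rows [][]string, names []string) map[string][]string {
	// The table is as wide as the longest row or the name list.
	width := len(names)
	for _, r := range rows {
		if len(r) > width {
			width = len(r)
		}
	}
	df := make(map[string][]string, width)
	for c := 0; c < width; c++ {
		name := fmt.Sprintf("col%d", c)
		if c < len(names) {
			name = names[c]
		}
		col := make([]string, len(rows)) // zero value "" acts as padding
		for r, row := range rows {
			if c < len(row) {
				col[r] = row[c]
			}
		}
		df[name] = col
	}
	return df
}

func main() {
	df := fromRows([][]string{{"1", "2", "3"}, {"8", "7"}}, []string{"a", "b"})
	fmt.Printf("%q %q %q\n", df["a"], df["b"], df["col2"])
	// prints ["1" "8"] ["2" "7"] ["3" ""]
}
```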

func ReadCsv ¶

func ReadCsv(r io.Reader) (DataFrame, error)

ReadCsv reads in csv data and returns a DataFrame with randomly generated column names for each input column.

func (DataFrame) Columns ¶

func (df DataFrame) Columns() []string

Columns returns all of the column names, sorted so that their order is stable. The sorting is done using the sort.Strings() function from the standard library.

func (DataFrame) Copy ¶

func (df DataFrame) Copy() DataFrame

Copy returns a copy of a dataframe.

func (DataFrame) Dims ¶

func (df DataFrame) Dims() (int, int)

Dims returns the number of rows and the number of columns in a DataFrame.

func (DataFrame) DropColumns ¶

func (df DataFrame) DropColumns(c ...string) DataFrame

DropColumns removes columns from the DataFrame.

func (DataFrame) Head ¶

func (df DataFrame) Head(i ...int) DataFrame

Head returns the first i entries of each column in a DataFrame. The argument is variadic so that it can be omitted; if several ints are passed, only the first is used. If no i is passed, Head returns HeadSize rows or all rows of the DataFrame, whichever is smaller. If a negative value is passed, its absolute value is used.

func (DataFrame) Pick ¶

func (df DataFrame) Pick(i ...int) DataFrame

Pick returns a subset DataFrame containing only the rows at the specified indices.
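Row selection on a column-oriented table amounts to indexing every column with the same list of positions. The sketch below shows that idea with local types that mirror the documented definitions; `pickRows` is a hypothetical function, and skipping out-of-range indices is an assumption about behavior the documentation does not describe.

```go
package main

import "fmt"

// Local types mirroring the documented definitions.
type Series []string
type DataFrame map[string]Series

// pickRows is a hypothetical sketch of row selection, not Po's Pick.
// Indices that fall outside a column are skipped (an assumption).
func pickRows(df DataFrame, idx ...int) DataFrame {
	out := make(DataFrame, len(df))
	for name, col := range df {
		picked := make(Series, 0, len(idx))
		for _, i := range idx {
			if i >= 0 && i < len(col) {
				picked = append(picked, col[i])
			}
		}
		out[name] = picked
	}
	return out
}

func main() {
	df := DataFrame{"a": {"1", "2", "3"}, "b": {"8", "7", "6"}}
	sub := pickRows(df, 2, 0)
	fmt.Println(sub["a"], sub["b"]) // prints [3 1] [6 8]
}
```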

func (DataFrame) Rename ¶

func (df DataFrame) Rename(c map[string]string) DataFrame

Rename renames the columns in the dataframe that correspond to the provided keys in the map parameter.
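Because a DataFrame is a map keyed by column name, renaming comes down to re-keying. The following sketch shows one way to do that without modifying the input; `rename` is a hypothetical free function rather than Po's method, and it shares the column slices with the original instead of copying them.

```go
package main

import "fmt"

// Local types mirroring the documented definitions.
type Series []string
type DataFrame map[string]Series

// rename is a hypothetical sketch, not Po's Rename. Columns whose name
// appears as a key in names are stored under the mapped name; all other
// columns keep their name. Column data is shared, not copied.
func rename(df DataFrame, names map[string]string) DataFrame {
	out := make(DataFrame, len(df))
	for old, col := range df {
		if renamed, ok := names[old]; ok {
			out[renamed] = col
		} else {
			out[old] = col
		}
	}
	return out
}

func main() {
	df := DataFrame{"a": {"1", "2"}, "b": {"8", "7"}}
	out := rename(df, map[string]string{"a": "id"})
	fmt.Println(out["id"], out["b"]) // prints [1 2] [8 7]
}
```

Note that if a new name collides with an existing column, one of the two is silently overwritten in this sketch.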

func (DataFrame) Select ¶

func (df DataFrame) Select(c ...string) DataFrame

Select returns a subset of the original DataFrame containing only the columns with the given names.

func (DataFrame) Shape ¶

func (df DataFrame) Shape() (int, int)

Shape returns the number of rows and the number of columns in a DataFrame. Same as po.DataFrame.Dims().

func (DataFrame) String ¶

func (df DataFrame) String() string

String returns the string representation of the DataFrame. Columns are ordered via sort.Strings() method. It uses the olekukonko/tablewriter library to render the table.

func (DataFrame) Subset ¶

func (df DataFrame) Subset(start int, end int, step int, exclude []int) DataFrame

Subset returns a subset of the original DataFrame. It grabs entries from each column by their indices, starting from the start int to the end int using the specified step size and excluding any indices specified.

func (DataFrame) Transpose ¶

func (df DataFrame) Transpose() DataFrame

Transpose returns a transposed DataFrame of the original DataFrame. The transposed column names become a string of the former row index.

type Series ¶

type Series []string

Series is a generic data structure that contains a slice of strings. Strings were chosen as the element type to keep the API simple, easy to use, and as close to the pandas API as possible, so that the learning curve would be small. This allows the user to put any type of data they want into a Series, so long as it is written as a string (surrounded by ""). A Series can be converted to other types in order to perform mathematical operations that require non-string values.

func NewSeries ¶

func NewSeries(s ...string) Series

NewSeries is a variadic function that returns a Series comprised of the provided strings

func (Series) Head ¶

func (s Series) Head(i ...int) Series

Head returns the first i entries in a Series. The argument is variadic so that it can be omitted; if several ints are passed, only the first is used. If no i is passed, Head returns HeadSize entries or len(Series) entries, whichever is smaller. If a negative value is passed, its absolute value is used.
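The optional-argument rules above can be expressed compactly with a variadic parameter. The sketch below is an illustration rather than Po's source: `head` and `headSize` are local, hypothetical names, and clamping an explicit i to the slice length is an assumption, since the documentation only describes clamping for the default case.

```go
package main

import "fmt"

// headSize mirrors the documented HeadSize default of 5.
const headSize = 5

// head is a hypothetical sketch of the documented rules: use the first
// variadic argument if present, take its absolute value, and never
// return more entries than the slice holds.
func head(s []string, i ...int) []string {
	n := headSize
	if len(i) > 0 {
		n = i[0]
		if n < 0 {
			n = -n
		}
	}
	if n > len(s) {
		n = len(s)
	}
	return s[:n]
}

func main() {
	s := []string{"a", "b", "c", "d", "e", "f", "g"}
	fmt.Println(head(s))        // prints [a b c d e]
	fmt.Println(head(s, -2, 9)) // prints [a b]
}
```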

func (Series) Pick ¶

func (s Series) Pick(i ...int) Series

Pick returns a subset Series containing only the entries at the specified indices.

func (Series) String ¶

func (s Series) String() string

String returns the string representation of the Series. It uses the olekukonko/tablewriter library to render the table.

func (Series) Subset ¶

func (s Series) Subset(start int, end int, step int, exclude []int) Series

Subset returns a subset of the original Series. It grabs entries by their indices, starting from the start int to the end int using the specified step size and excluding any indices specified.

Source Files ¶

po.go

Directories ¶

Path	Synopsis
examples
dataframe/accessing
dataframe/create
dataframe/csv
dataframe/describe
dataframe/generator
dataframe/subsets
dataframe/transform
dataframe/view
experimental
series/create
series/generators
series/subsets
series/view