jsonbase

Version: v0.0.0-...-2f13916
Published: Aug 9, 2021 License: MIT Imports: 15 Imported by: 0

README

jsonbase

todo - needs linting

A minimalistic, declarative data manipulation library, developed from 27th Sep onwards. The primary reason for writing it was to learn more about data engineering & data science, but you might find it useful too. I aim to make it as easy to use as possible, and plan to extend it and make it more useful over time.

Advantages: Faster than pandas for most of the things this tool can do (benchmark not uploaded yet). Intuitive, simple docs. Does a few extra things, e.g. concurrent KNN and ANN prediction.

Disadvantages: Only does a tiny slice of what pandas is capable of - but of the things it can do, it does well!

Docs: https://pkg.go.dev/github.com/tbal999/jsonbase

Below is an example of it being used as a simple console application.

[Image: Demo]

What can you do with it?

Import flat files, 1D or 2D slices.

  • You can import csv files for example, or alternatively you could import a SQL query.

Use Buffer

  • Most of the queries get passed to a buffer table, which you print out via 'Print()'.
  • In between transformations, you can save the buffer as a new table, and do further transformations. It's like a temporary table in SQL.
  • You could join one table to another, save this output, then join the output to another table etc etc.

Sum

  • You can sum the values in a column of integers.

Count

  • You can find the individual count of each unique item in a column.

Regex

  • You can use regular expressions on columns to find either matches or not-matches.

Order

  • You can order specific columns of text or integers in either ascending or descending order.

Row

  • You can grab items (after ordering them) at a specific row, e.g. the second-oldest instance of each unique item.

Unpivot

  • You can unpivot data.

Normalize

  • You can normalize a set of data.

Add Index

  • You can add an index column to whatever is currently in the buffer.

Left Join

  • You can perform a left join on two tables, matching on one column.

Replace strings

  • You can use regex to find items in a column, and then replace them with new strings.
  • E.g. find all items with unnecessary whitespace, and then delete that whitespace.

Functions

  • You can apply functions directly to integer columns and choose how many decimal places you want back.

Conditionals

  • You can apply conditional functions to integers in integer columns, and find either matches or not-matches.

Date to days

  • You can convert dates (in many different formats!) to days from delta.

Column iteration

  • You can grab the columns of a table and then pass a single column query through all columns.
  • For example you could remove unnecessary whitespace from every single column.
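The column iteration idea above can be sketched in plain Go. This is only an illustration, not jsonbase's code: the helpers mapColumn and trimAll are hypothetical names, and the table is assumed to be a [][]string whose first row is the header.

```go
package main

import (
	"fmt"
	"strings"
)

// mapColumn is a hypothetical helper (not part of jsonbase).
// It applies fn to every data cell in column col, leaving the header row alone.
func mapColumn(rows [][]string, col int, fn func(string) string) {
	if len(rows) == 0 {
		return
	}
	for _, r := range rows[1:] {
		if col < len(r) {
			r[col] = fn(r[col])
		}
	}
}

// trimAll passes the same single-column operation through every column.
func trimAll(rows [][]string) {
	if len(rows) == 0 {
		return
	}
	for col := range rows[0] {
		mapColumn(rows, col, strings.TrimSpace)
	}
}

func main() {
	rows := [][]string{
		{"name", "city"},
		{" ann ", "leeds  "},
		{"bob", "  york"},
	}
	trimAll(rows)
	fmt.Printf("%q\n", rows[1:])
}
```

With the library itself, the equivalent would presumably be a loop over the result of Columns, calling a single-column query such as RegexReplace on each name.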

ML / stats

  • You can describe a dataset on a specific column via 'Describe()' (gives you average / standard deviation for each item in a column)
  • You can plot scatterplots (making use of termui library)
  • You can use the KNN machine learning algorithm - a brute-force implementation that uses concurrency to make it much faster (try low numbers first!).
  • For larger datasets you can use an artificial neural network via NNtrain and NNpredict.

Using a combination of the above alongside the buffer, you can perform a lot of data analysis tasks. Once you've put together a pipeline that you want to automate, you can then build it and deploy it as an application.

Documentation

Constants

This section is empty.

Variables

var Temptable [][]string

Temptable is the internal buffer that stores the result of any query - printable via the 'Print' function. Exported just in case you want to access it directly.

Functions

This section is empty.

Types

type Database

type Database struct {
	Table []table.Table
	Net   models.Network
}

Database is a struct that stores all tables and one neural network (if you want to save and use ANN later on)

func (Database) AddIndex

func (d Database) AddIndex(start int)

AddIndex adds an index column to whatever is currently stored in Temptable. The integer is where the index starts from, i.e. 0 = starts from 0.

func (Database) AddString

func (d Database) AddString(columnname, addstring string)

AddString adds a column containing the given string. Passes to Buffer.

func (Database) Calculation2DInt

func (d Database) Calculation2DInt(tablename, column1name, column2name string, function table.Gl2func, decimals int)

Calculation2DInt lets you do calculations on the rows of two numerical columns. You can pass a function of type func(x, y float64) float64. Affects table directly & adds another column (x name + _2D_ + y name). decimals is how many decimal places the results will have.
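As a rough sketch of what a two-column calculation looks like, the following standalone Go applies a func(x, y float64) float64 row by row and formats the result to a chosen number of decimals. The helper apply2D is hypothetical and is not how jsonbase implements Calculation2DInt; it only shows the idea.

```go
package main

import (
	"fmt"
	"strconv"
)

// apply2D is a hypothetical helper (not part of jsonbase).
// It parses two string columns as float64, applies fn to each pair of
// values, and formats every result with the given number of decimals.
func apply2D(xs, ys []string, fn func(x, y float64) float64, decimals int) ([]string, error) {
	if len(xs) != len(ys) {
		return nil, fmt.Errorf("column lengths differ: %d vs %d", len(xs), len(ys))
	}
	out := make([]string, len(xs))
	for i := range xs {
		x, err := strconv.ParseFloat(xs[i], 64)
		if err != nil {
			return nil, err
		}
		y, err := strconv.ParseFloat(ys[i], 64)
		if err != nil {
			return nil, err
		}
		out[i] = strconv.FormatFloat(fn(x, y), 'f', decimals, 64)
	}
	return out, nil
}

func main() {
	price := []string{"10", "4"}
	qty := []string{"3", "2.5"}
	total, err := apply2D(price, qty, func(x, y float64) float64 { return x * y }, 2)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(total) // [30.00 10.00]
}
```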

func (Database) CalculationInt

func (d Database) CalculationInt(tablename, columnname string, function table.Glfunc, decimals int)

CalculationInt lets you do calculations on the rows of a numerical column. You can pass a function of type func(column float64) float64. Affects table directly & adds another column (column name + _1D). decimals is how many decimal places the results will have.

func (Database) Clear

func (d Database) Clear()

Clear deletes all data currently in temptable buffer

func (Database) Columns

func (d Database) Columns(table string) []string

Columns returns a string array of the columns in a specific table. Use it when you want to make adjustments to each column via a loop.

func (Database) Conditional2DInt

func (d Database) Conditional2DInt(tablename, column1name, column2name string, function table.Cn2func, decimals int)

Conditional2DInt lets you do conditionals on two integer columns at once. You can pass a function of type func(x, y float64) bool. Passes to Buffer. Rows for which the function returns true will be returned.

func (Database) Conditional2DText

func (d Database) Conditional2DText(tablename, column1name, column2name string, function table.Txt2func)

Conditional2DText lets you do conditionals on two string columns at once. You can pass a function of type func(x, y string) bool. Passes to Buffer. Rows for which the function returns true will be returned.

func (Database) ConditionalInt

func (d Database) ConditionalInt(tablename, column string, function table.Cnfunc, decimals int)

ConditionalInt lets you do conditionals on an integer column. You can pass a function of type func(x float64) bool. Passes to Buffer. Rows for which the function returns true will be returned.
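The conditional functions boil down to filtering rows with a predicate. A minimal sketch in plain Go, assuming a [][]string table whose first row is the header (filterRows is a hypothetical name, not a jsonbase function):

```go
package main

import (
	"fmt"
	"strconv"
)

// filterRows is a hypothetical helper (not jsonbase's implementation).
// rows[0] is assumed to be the header and is always kept.
// A data row is kept when pred returns true for the number in column col;
// rows whose value cannot be parsed as a number are skipped.
func filterRows(rows [][]string, col int, pred func(x float64) bool) [][]string {
	if len(rows) == 0 {
		return nil
	}
	out := [][]string{rows[0]}
	for _, r := range rows[1:] {
		if col >= len(r) {
			continue
		}
		v, err := strconv.ParseFloat(r[col], 64)
		if err != nil {
			continue
		}
		if pred(v) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := [][]string{
		{"item", "qty"},
		{"a", "5"},
		{"b", "12"},
		{"c", "40"},
	}
	big := filterRows(rows, 1, func(x float64) bool { return x > 10 })
	fmt.Println(big) // [[item qty] [b 12] [c 40]]
}
```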

func (Database) Count

func (d Database) Count(tablename, columnname string)

Count lets you count the number of instances of each unique row item in column 'columnname' of table 'tablename'. Passes to Buffer.
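Counting unique items in a column is essentially a map from value to count. The sketch below shows that idea in standalone Go; countColumn is a hypothetical helper, and the header-in-first-row layout is an assumption rather than something the docs state.

```go
package main

import (
	"fmt"
	"sort"
)

// countColumn is a hypothetical helper (not jsonbase's implementation).
// rows[0] is assumed to be the header row; column is looked up by name.
func countColumn(rows [][]string, column string) (map[string]int, error) {
	if len(rows) == 0 {
		return nil, fmt.Errorf("empty table")
	}
	col := -1
	for i, name := range rows[0] {
		if name == column {
			col = i
			break
		}
	}
	if col < 0 {
		return nil, fmt.Errorf("column %q not found", column)
	}
	counts := make(map[string]int)
	for _, r := range rows[1:] {
		if col < len(r) {
			counts[r[col]]++
		}
	}
	return counts, nil
}

func main() {
	rows := [][]string{
		{"name", "city"},
		{"ann", "leeds"},
		{"bob", "york"},
		{"cat", "leeds"},
	}
	counts, err := countColumn(rows, "city")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Sort the keys so the output order is stable.
	keys := make([]string, 0, len(counts))
	for k := range counts {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, counts[k])
	}
}
```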

func (Database) DateToDays

func (d Database) DateToDays(tablename, columnname, parse string, delta float64)

DateToDays converts all dates in a column to days from today and then adds a new column called 'DateToDays'. Directly affects table. Just choose the table and column; parse is the layout of the dates. For example, if they are in SQL layout 2006-01-02T15:04:05-0700 then you want to pass in '2006-01-02T15:04:05-0700'.
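The parse argument looks like a Go reference-time layout, as used by time.Parse. The sketch below shows how such a layout turns a date string into a number of days; daysBetween is a hypothetical helper that takes an explicit reference date so the result is reproducible (the library is described as measuring from today).

```go
package main

import (
	"fmt"
	"time"
)

// daysBetween is a hypothetical helper (not part of jsonbase).
// It parses value and reference with the same layout and returns the
// number of days from value to reference (negative if value is later).
func daysBetween(value, reference, layout string) (float64, error) {
	t, err := time.Parse(layout, value)
	if err != nil {
		return 0, err
	}
	ref, err := time.Parse(layout, reference)
	if err != nil {
		return 0, err
	}
	return ref.Sub(t).Hours() / 24, nil
}

func main() {
	const layout = "2006-01-02T15:04:05-0700"
	days, err := daysBetween("2021-08-02T00:00:00+0000", "2021-08-09T12:00:00+0000", layout)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%.1f days\n", days) // 7.5 days
}
```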

func (Database) Describe

func (d Database) Describe(table, dependentcolumn string)

Describe - pass a table & a dependent column to describe table around that column.

func (*Database) ExportAsCSV

func (d *Database) ExportAsCSV(filename string)

ExportAsCSV lets you export temptable buffer as a CSV file.

func (Database) GrabTable

func (d Database) GrabTable(tablename string)

GrabTable lets you grab a table by name and then places the table columns/rows in the Temptable buffer. Passes to Buffer

func (*Database) Import1DString

func (d *Database) Import1DString(input []string, tablename, delimiter string, header bool)

Import1DString lets you import 1D string arrays. input is the data to import, tablename is the name given to the resulting table, and delimiter is the delimiter that the string array is delimited by. header is true or false depending on whether the data includes column headers; if not, auto-generated columns will be added.

func (*Database) Import2DString

func (d *Database) Import2DString(input [][]string, tablename, delimiter string, header bool)

Import2DString lets you import 2D string arrays. input is the data to import, tablename is the name given to the resulting table, and delimiter is the delimiter that the string array is delimited by. header is true or false depending on whether the data includes column headers; if not, auto-generated columns will be added.

func (*Database) ImportFile

func (d *Database) ImportFile(filename, tablename, delimiter string, header bool)

ImportFile lets you import delimited flat files. filename is the name of the file, delimiter is the delimiter that the file is delimited by. Set 'header' to true if the file has a header row, otherwise set it to false. 'header' only applies to rune-delimited files (e.g. comma-delimited); it doesn't matter what you set it to if the file is delimited by '\n'.

func (*Database) ImportString

func (d *Database) ImportString(input string, tablename, delimiter string, header bool)

ImportString lets you import strings delimited by EOL. input is the string to import, tablename is the name given to the resulting table, and delimiter is the delimiter that each line is delimited by. header is true or false depending on whether the data includes column headers; if not, auto-generated columns will be added.

func (Database) Join

func (d Database) Join(table1, column1, table2, column2 string)

Join is: table1 left join table2 on table1.column1 = table2.column2. Passes to Buffer.
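For readers less familiar with SQL, a left join on one column can be sketched in Go as follows. This is an illustration under assumptions, not jsonbase's implementation: leftJoin and concat are hypothetical helpers, both tables are rectangular [][]string values with a header in the first row, and unmatched left rows are padded with empty strings.

```go
package main

import "fmt"

// concat returns a new row made of a followed by b.
func concat(a, b []string) []string {
	row := make([]string, 0, len(a)+len(b))
	row = append(row, a...)
	return append(row, b...)
}

// leftJoin is a hypothetical helper (not jsonbase's implementation).
// Every left row is kept; matching right rows are appended to it, and
// left rows without a match are padded with empty cells.
func leftJoin(left [][]string, leftCol int, right [][]string, rightCol int) [][]string {
	if len(left) == 0 || len(right) == 0 {
		return left
	}
	// Index the right-hand rows by their join key.
	index := make(map[string][][]string)
	for _, r := range right[1:] {
		index[r[rightCol]] = append(index[r[rightCol]], r)
	}
	width := len(right[0])
	out := [][]string{concat(left[0], right[0])}
	for _, l := range left[1:] {
		matches := index[l[leftCol]]
		if len(matches) == 0 {
			out = append(out, concat(l, make([]string, width)))
			continue
		}
		for _, m := range matches {
			out = append(out, concat(l, m))
		}
	}
	return out
}

func main() {
	people := [][]string{{"id", "name"}, {"1", "ann"}, {"2", "bob"}}
	cities := [][]string{{"id", "city"}, {"1", "leeds"}}
	for _, row := range leftJoin(people, 0, cities, 0) {
		fmt.Printf("%q\n", row)
	}
}
```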

func (Database) KNNclass

func (d Database) KNNclass(trainingtable, testtable, identifiercolumn string, knumber, threads int, trainingmode bool)

KNNclass is a classifier that uses a training table to predict the output on a test table, using the identifier column. Passes to Buffer.
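To give a feel for brute-force KNN with concurrency, here is a small standalone sketch. It is not jsonbase's code: the sample type, classify and classifyAll are hypothetical, Euclidean distance and majority voting are assumptions, and the workers argument only loosely mirrors the threads parameter above.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"sync"
)

// sample is a hypothetical training row: numeric features plus a label.
type sample struct {
	features []float64
	label    string
}

// classify returns the majority label among the k training samples
// nearest to x. All feature slices are assumed to have the same length.
func classify(train []sample, x []float64, k int) string {
	type neighbour struct {
		dist  float64
		label string
	}
	ns := make([]neighbour, len(train))
	for i, s := range train {
		var sum float64
		for j := range x {
			d := s.features[j] - x[j]
			sum += d * d
		}
		ns[i] = neighbour{math.Sqrt(sum), s.label}
	}
	sort.Slice(ns, func(a, b int) bool { return ns[a].dist < ns[b].dist })
	if k > len(ns) {
		k = len(ns)
	}
	votes := map[string]int{}
	best, bestVotes := "", 0
	for _, n := range ns[:k] {
		votes[n.label]++
		if votes[n.label] > bestVotes {
			best, bestVotes = n.label, votes[n.label]
		}
	}
	return best
}

// classifyAll spreads the test points over a fixed number of goroutines.
func classifyAll(train []sample, tests [][]float64, k, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	out := make([]string, len(tests))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				out[i] = classify(train, tests[i], k) // each index is written once
			}
		}()
	}
	for i := range tests {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return out
}

func main() {
	train := []sample{
		{[]float64{0, 0}, "a"}, {[]float64{0, 1}, "a"},
		{[]float64{5, 5}, "b"}, {[]float64{6, 5}, "b"},
	}
	fmt.Println(classifyAll(train, [][]float64{{0.2, 0.1}, {5.5, 5}}, 3, 2)) // [a b]
}
```

Every test point is compared against every training row, which is why trying low numbers first is sensible advice for a brute-force approach.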

func (Database) KNNreg

func (d Database) KNNreg(trainingtable, testtable, identifiercolumn string, knumber, threads int)

KNNreg performs regression, using a training table to predict numerical output on a test table, using the identifier column. Passes to Buffer.

func (*Database) LoadDBase

func (d *Database) LoadDBase(filename string)

LoadDBase lets you load a database that you have previously saved

func (Database) NNpredict

func (d Database) NNpredict(trainingtable string)

NNpredict uses a trained neural network to predict on another dataset. Pass in the name of the table.

func (*Database) NNtrain

func (d *Database) NNtrain(trainingtable, identifiercolumn string, hidden, epochs int, learningrate float64)

NNtrain trains a neural network using a training table. You need to provide the training table and identifier column, as well as the number of hidden weights, epochs and the learning rate. There is no need to provide the number of inputs and outputs - these are calculated automatically to save trouble.

func (Database) Normalize

func (d Database) Normalize(tablename string)

Normalize - normalises the data in a table. Affects table directly
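The docs do not say which normalization is used. As one common possibility, min-max scaling maps each value in a column into the range 0 to 1. The sketch below shows that technique only; minMax is a hypothetical helper and may differ from what Normalize actually does.

```go
package main

import "fmt"

// minMax is a hypothetical helper showing min-max scaling, which is an
// assumption here and not necessarily what jsonbase's Normalize does.
// Each value v becomes (v - min) / (max - min).
func minMax(values []float64) []float64 {
	if len(values) == 0 {
		return nil
	}
	lo, hi := values[0], values[0]
	for _, v := range values {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	out := make([]float64, len(values))
	if hi == lo {
		return out // constant column: every value maps to 0
	}
	for i, v := range values {
		out[i] = (v - lo) / (hi - lo)
	}
	return out
}

func main() {
	fmt.Println(minMax([]float64{2, 4, 6})) // [0 0.5 1]
}
```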

func (Database) Order

func (d Database) Order(tablename, columnname string, order bool)

Order re-orders an unordered set of data by one column of integers. Directly affects table. Pass true for ASC, false for DESC.

func (Database) Plot

func (d Database) Plot(table, namecolumn string)

Plot - pass a table and a column to generate a plot of all fields against the column items. Max sample size is 155 - if one item has more than 155 samples it will display only a sample of the dataset.

func (Database) Print

func (d Database) Print(clear bool, howmany int)

Print prints out the Temptable. clear determines whether the table is cleared after printing. howmany is how many rows you want printed (0 for all rows).

func (Database) Regex

func (d Database) Regex(tablename, columnname, regexquery string, boolean bool)

Regex lets you grab the rows of a table where the items in a column match a regular expression that you pass in. The boolean selects whether you get the rows that do match (true) or the rows that don't (false). Passes to Buffer.

func (Database) RegexReplace

func (d Database) RegexReplace(tablename, columnname, regexquery, oldstring, newstring string)

RegexReplace lets you replace substrings with a new string in rows that match a regex. Affects table directly.
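The match-then-replace idea can be sketched with the standard regexp and strings packages. replaceWhereMatch is a hypothetical helper, and treating oldstring as a literal substring (rather than a pattern) is an assumption about how RegexReplace behaves.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// replaceWhereMatch is a hypothetical helper (not jsonbase's implementation).
// For every value that matches pattern, all occurrences of the literal
// substring old are replaced with replacement; other values are unchanged.
func replaceWhereMatch(values []string, pattern, old, replacement string) ([]string, error) {
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, err
	}
	out := make([]string, len(values))
	for i, v := range values {
		if re.MatchString(v) {
			out[i] = strings.ReplaceAll(v, old, replacement)
		} else {
			out[i] = v
		}
	}
	return out, nil
}

func main() {
	cities := []string{"New  York", "Leeds", "San  Jose"}
	cleaned, err := replaceWhereMatch(cities, `\s{2,}`, "  ", " ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", cleaned) // ["New York" "Leeds" "San Jose"]
}
```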

func (Database) Row

func (d Database) Row(tablename, columnname string, rownumber int)

Row lets you grab a specific row number of items that have already been ordered. Passes to Buffer.

func (*Database) SaveBuffer

func (d *Database) SaveBuffer(name string, howmany int, clear bool)

SaveBuffer lets you save the current Temptable as a jsonbase table. name is the name of the table, howmany is how many rows you want to save, and clear is whether you want to clear the buffer after you've saved it.

func (Database) SaveDBase

func (d Database) SaveDBase(filename string)

SaveDBase lets you save a database that you are currently working with

func (Database) Select

func (d Database) Select(columns []string)

Select lets you trim columns in temptable buffer to specific columns. You must pass in a 1D string array of column headers. Passes to Buffer

func (*Database) Shuffle

func (d *Database) Shuffle(tablename string)

Shuffle lets you shuffle the data in a table randomly. Affects table directly.

func (*Database) Split

func (d *Database) Split(tablename, trainingname, testname string, ratio int)

Split splits a set of data into two new tables (training / testing) at a certain ratio, e.g. 2 will be 50/50.

func (Database) Sum

func (d Database) Sum(tablename, columnname string)

Sum lets you sum the values in a column of integers. Passes to Buffer.

func (Database) Timer

func (d Database) Timer(start time.Time)

Timer lets you track how long queries take (try defer d.Timer(time.Now()), where d is your Database).
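The defer trick works because the arguments of a deferred call are evaluated immediately, while the call itself runs when the surrounding function returns. A standalone sketch of the pattern (timer and work are hypothetical names, not jsonbase functions):

```go
package main

import (
	"fmt"
	"time"
)

// timer prints how long has passed since start.
func timer(start time.Time, label string) {
	fmt.Printf("%s took %s\n", label, time.Since(start))
}

// work stands in for any query you want to time.
func work() int {
	// time.Now() is evaluated here, on entry; timer itself runs on return.
	defer timer(time.Now(), "work")
	sum := 0
	for i := 0; i < 1000; i++ {
		sum += i
	}
	return sum
}

func main() {
	fmt.Println(work())
}
```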

func (Database) Transpose

func (d Database) Transpose()

Transpose flips the Temptable so columns are rows and rows are columns.
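Since Temptable is a [][]string, transposing it means swapping row and column indices. A minimal sketch, assuming a rectangular table (transpose here is a hypothetical standalone function, not the method above):

```go
package main

import "fmt"

// transpose returns a new table in which rows become columns.
// The input is assumed to be rectangular (every row has the same length).
func transpose(rows [][]string) [][]string {
	if len(rows) == 0 {
		return nil
	}
	out := make([][]string, len(rows[0]))
	for c := range out {
		out[c] = make([]string, len(rows))
		for r := range rows {
			out[c][r] = rows[r][c]
		}
	}
	return out
}

func main() {
	rows := [][]string{{"a", "b"}, {"1", "2"}, {"3", "4"}}
	fmt.Println(transpose(rows)) // [[a 1 3] [b 2 4]]
}
```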

func (Database) Unpivot

func (d Database) Unpivot(tablename, pivotcolumn string)

Unpivot lets you unpivot data in table. Passes to Buffer
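The docs do not spell out the output shape. One common reading of unpivot is: keep the pivot column as an identifier and turn every other column into (column, value) rows. The sketch below shows that interpretation only; the unpivot function and the "column"/"value" header names are assumptions, not jsonbase's behaviour.

```go
package main

import "fmt"

// unpivot is a hypothetical helper showing one interpretation of unpivoting.
// rows[0] is assumed to be the header. For each data row, every column other
// than pivot becomes its own output row: (pivot value, column name, value).
func unpivot(rows [][]string, pivot int) [][]string {
	if len(rows) == 0 || pivot >= len(rows[0]) {
		return nil
	}
	out := [][]string{{rows[0][pivot], "column", "value"}}
	for _, r := range rows[1:] {
		for c, v := range r {
			if c == pivot || c >= len(rows[0]) {
				continue
			}
			out = append(out, []string{r[pivot], rows[0][c], v})
		}
	}
	return out
}

func main() {
	rows := [][]string{
		{"shop", "q1", "q2"},
		{"north", "10", "12"},
		{"south", "7", "9"},
	}
	for _, row := range unpivot(rows, 0) {
		fmt.Println(row)
	}
}
```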
