encoder

package
v0.6.11
Warning: This package is not in the latest version of its module.
Published: Jan 30, 2022 License: BSD-3-Clause, GPL-3.0 Imports: 6 Imported by: 4

Documentation

Constants

This section is empty.

Variables

This section is empty.

Functions

func GetIndex added in v0.6.6

func GetIndex(arr []string, val string) float64

GetIndex returns the index of val in the string slice arr, as a float64.
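As an illustration of this kind of lookup, the following is a minimal sketch and not the package's implementation. The name indexOf is hypothetical, and returning -1 for a missing value is an assumption, since the documentation does not say what GetIndex does in that case.

```go
package main

import "fmt"

// indexOf is a hypothetical stand-in for GetIndex: it scans arr linearly and
// returns the position of val as a float64. Returning -1 when val is absent
// is an assumption of this sketch, not documented behavior.
func indexOf(arr []string, val string) float64 {
	for i, s := range arr {
		if s == val {
			return float64(i)
		}
	}
	return -1
}

func main() {
	protocols := []string{"tcp", "udp", "icmp"}
	fmt.Println(indexOf(protocols, "udp"))
	fmt.Println(indexOf(protocols, "sctp"))
}
```

Returning a float64 keeps the result in the same numeric type as the other encoded features, which matches the signature shown above.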

func LoadValueEncoders added in v0.6.6

func LoadValueEncoders()

LoadValueEncoders loads all value encoders from disk.

func MinMax added in v0.6.6

func MinMax(value float64, sum *ColumnSummary) (result string)

MinMax applies min-max normalization to value, using the minimum and maximum tracked in sum, and returns the result as a string.
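Min-max normalization conventionally rescales a value into the range [0, 1] with (value - min) / (max - min). The sketch below shows that formula under stated assumptions: the summary type is a trimmed-down, hypothetical stand-in for ColumnSummary, and both the "0" result for a zero-width range and the string formatting are choices made here, not documented behavior of the package.

```go
package main

import (
	"fmt"
	"strconv"
)

// summary is a hypothetical, reduced stand-in for ColumnSummary.
type summary struct {
	Min, Max float64
}

// minMax rescales value into [0, 1] using (value - min) / (max - min).
// Returning "0" when max equals min is an assumption that avoids a
// division by zero.
func minMax(value float64, sum *summary) string {
	span := sum.Max - sum.Min
	if span == 0 {
		return "0"
	}
	return strconv.FormatFloat((value-sum.Min)/span, 'f', -1, 64)
}

func main() {
	sum := &summary{Min: 10, Max: 20}
	fmt.Println(minMax(15, sum))
}
```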

func MinMaxIntArr added in v0.6.6

func MinMaxIntArr(array []float64) (float64, float64)

MinMaxIntArr returns the highest and the lowest numbers from a float64 array.
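A single pass over the slice is enough to find both extremes. The following sketch uses the hypothetical name minMaxOf; the order of the two return values (lowest first) and the 0, 0 result for an empty slice are assumptions, since the documentation above specifies neither.

```go
package main

import "fmt"

// minMaxOf returns the lowest and highest values in a single pass.
// The (lo, hi) return order and the zero result for an empty slice
// are assumptions of this sketch.
func minMaxOf(values []float64) (lo, hi float64) {
	if len(values) == 0 {
		return 0, 0
	}
	lo, hi = values[0], values[0]
	for _, v := range values[1:] {
		if v < lo {
			lo = v
		}
		if v > hi {
			hi = v
		}
	}
	return lo, hi
}

func main() {
	lo, hi := minMaxOf([]float64{3, -1.5, 8, 2})
	fmt.Println(lo, hi)
}
```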

func SetConfig added in v0.6.6

func SetConfig(c *Config)

SetConfig will set the config for all registered encoders.

func StoreValueEncoders added in v0.6.6

func StoreValueEncoders()

StoreValueEncoders stores all value encoders on disk.

func ZScore added in v0.6.6

func ZScore(value float64, sum *ColumnSummary) (result string)

ZScore applies z-score normalization to value, using the mean and standard deviation tracked in sum, and returns the result as a string.
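The standard z-score is (value - mean) / std, which expresses a value as its distance from the mean in standard deviations. The sketch below illustrates that formula only. The summary type is a hypothetical stand-in for ColumnSummary, and the "0" result for a zero standard deviation is an assumption made here.

```go
package main

import (
	"fmt"
	"strconv"
)

// summary is a hypothetical, reduced stand-in for ColumnSummary.
type summary struct {
	Mean, Std float64
}

// zScore standardizes value as (value - mean) / std.
// Returning "0" when std is zero is an assumption that avoids a
// division by zero.
func zScore(value float64, sum *summary) string {
	if sum.Std == 0 {
		return "0"
	}
	return strconv.FormatFloat((value-sum.Mean)/sum.Std, 'f', -1, 64)
}

func main() {
	sum := &summary{Mean: 10, Std: 4}
	fmt.Println(zScore(12, sum))
}
```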

Types

type ColumnSummary added in v0.6.6

type ColumnSummary struct {
	Version string `json:"version"`
	Col     string `json:"col"`

	// Data type of the column, e.g. string or numeric
	Typ ColumnType `json:"typ"`

	// Map of strings mapped to their index
	// tracked as float64 to avoid additional type casts
	UniqueStrings map[string]float64 `json:"uniqueStrings"`

	// Current string index
	// tracked as float64 to avoid additional type casts
	Index float64

	// standard deviation and mean
	Std  float64 `json:"std"`
	Mean float64 `json:"mean"`

	// min, max
	Min float64 `json:"min"`
	Max float64 `json:"max"`

	sync.Mutex
}

ColumnSummary collects statistical information about a column in the dataset.
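To illustrate how the UniqueStrings map, the Index counter, and the embedded mutex could work together, here is a minimal sketch. The summary type and the indexFor method are hypothetical, and starting indices at 0 is an assumption; the package may assign indices differently.

```go
package main

import (
	"fmt"
	"sync"
)

// summary is a hypothetical, reduced stand-in for ColumnSummary.
type summary struct {
	sync.Mutex
	UniqueStrings map[string]float64
	Index         float64
}

// indexFor returns the index already assigned to val, or assigns the next
// free index if val has not been seen before. The lock makes the
// read-modify-write safe for concurrent callers.
func (s *summary) indexFor(val string) float64 {
	s.Lock()
	defer s.Unlock()
	if idx, ok := s.UniqueStrings[val]; ok {
		return idx
	}
	idx := s.Index
	s.UniqueStrings[val] = idx
	s.Index++
	return idx
}

func main() {
	s := &summary{UniqueStrings: make(map[string]float64)}
	fmt.Println(s.indexFor("GET"), s.indexFor("POST"), s.indexFor("GET"))
}
```

Storing the index as a float64, as the field comments describe, means the value can be passed straight to a normalization step without a conversion.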

type ColumnType added in v0.6.6

type ColumnType int

ColumnType is the data type of the column.

const (
	// TypeString is a data type for text columns
	TypeString ColumnType = iota

	// TypeNumeric is a data type for numeric columns
	TypeNumeric
)

func (ColumnType) String added in v0.6.6

func (c ColumnType) String() string
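The String method carries no description above. A typical implementation of fmt.Stringer for an iota-based enum looks like the sketch below. The lowercase type and constant names are local to this example, and the returned strings ("string", "numeric", "unknown") are assumptions, not the package's actual output.

```go
package main

import "fmt"

// columnType mirrors the shape of ColumnType for illustration only.
type columnType int

const (
	typeString columnType = iota
	typeNumeric
)

// String implements fmt.Stringer. The exact strings are assumptions.
func (c columnType) String() string {
	switch c {
	case typeString:
		return "string"
	case typeNumeric:
		return "numeric"
	default:
		return "unknown"
	}
}

func main() {
	// fmt uses the String method automatically when printing the values.
	fmt.Println(typeString, typeNumeric)
}
```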

type Config

type Config struct {

	// use zscore for normalization
	ZScore bool

	// use minmax for normalization
	MinMax bool

	// normalize the categorical values after encoding them to numeric format
	NormalizeCategoricals bool
}

Config holds configuration parameters.
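The flags suggest that an encoder picks one normalization scheme per value. The following sketch shows one way such a dispatch could look. The config and stats types and the normalize function are hypothetical, and giving ZScore precedence when both flags are set is an assumption; the documentation does not state how the package resolves that case.

```go
package main

import (
	"fmt"
	"strconv"
)

// config and stats are hypothetical stand-ins for Config and ColumnSummary.
type config struct {
	ZScore, MinMax bool
}

type stats struct {
	Min, Max, Mean, Std float64
}

// normalize applies the scheme selected in c and formats the result.
// ZScore taking precedence over MinMax is an assumption of this sketch.
// With neither flag set, the value is returned unchanged.
func normalize(v float64, c config, s stats) string {
	switch {
	case c.ZScore:
		if s.Std != 0 {
			v = (v - s.Mean) / s.Std
		}
	case c.MinMax:
		if s.Max != s.Min {
			v = (v - s.Min) / (s.Max - s.Min)
		}
	}
	return strconv.FormatFloat(v, 'f', -1, 64)
}

func main() {
	s := stats{Min: 10, Max: 20, Mean: 10, Std: 4}
	fmt.Println(normalize(15, config{MinMax: true}, s))
	fmt.Println(normalize(12, config{ZScore: true}, s))
}
```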

type ValueEncoder added in v0.6.6

type ValueEncoder struct {
	sync.Mutex
	// contains filtered or unexported fields
}

ValueEncoder handles online encoding of incoming data and keeps the required state for each feature.

func NewValueEncoder added in v0.6.6

func NewValueEncoder() *ValueEncoder

NewValueEncoder returns a new ValueEncoder instance.

func (*ValueEncoder) Bool added in v0.6.6

func (m *ValueEncoder) Bool(b bool) string

Bool handles encoding of boolean values to numeric format.
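A common convention for turning booleans into numeric features is to map true to 1 and false to 0. The sketch below assumes that convention; the function name encodeBool is hypothetical, and the documentation does not confirm which values Bool actually emits.

```go
package main

import "fmt"

// encodeBool maps true to "1" and false to "0".
// This mapping is an assumption, not documented behavior of Bool.
func encodeBool(b bool) string {
	if b {
		return "1"
	}
	return "0"
}

func main() {
	fmt.Println(encodeBool(true), encodeBool(false))
}
```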

func (*ValueEncoder) Float64 added in v0.6.6

func (m *ValueEncoder) Float64(field string, val float64) string

Float64 handles encoding of 64-bit float values according to the ValueEncoder configuration.

func (*ValueEncoder) GetSummary added in v0.6.6

func (m *ValueEncoder) GetSummary(colType ColumnType, field string) *ColumnSummary

GetSummary returns the summary for the given column type and field name. It will create a new one if none is being tracked yet.

func (*ValueEncoder) Int added in v0.6.6

func (m *ValueEncoder) Int(field string, val int) string

Int handles encoding of integer values according to the ValueEncoder configuration.

func (*ValueEncoder) Int32 added in v0.6.6

func (m *ValueEncoder) Int32(field string, val int32) string

Int32 handles encoding of 32-bit integer values according to the ValueEncoder configuration.

func (*ValueEncoder) Int64 added in v0.6.6

func (m *ValueEncoder) Int64(field string, val int64) string

Int64 handles encoding of 64-bit integer values according to the ValueEncoder configuration.

func (*ValueEncoder) String added in v0.6.6

func (m *ValueEncoder) String(field string, val string) string

String handles encoding of categorical values according to the ValueEncoder configuration.

func (*ValueEncoder) Uint32 added in v0.6.6

func (m *ValueEncoder) Uint32(field string, val uint32) string

Uint32 handles encoding of unsigned 32-bit integer values according to the ValueEncoder configuration.

func (*ValueEncoder) Uint64 added in v0.6.6

func (m *ValueEncoder) Uint64(field string, val uint64) string

Uint64 handles encoding of unsigned 64-bit integer values according to the ValueEncoder configuration.
