Documentation ¶
Index ¶
- func HashOneHot(buf []byte, size int) []float64
- func HashOneHot32(buf []byte, size int) []float32
- func SimpleOneHot(value int, size int) []float64
- func StringSplitMultiHot(str string, sep string, size int) []float64
- type CountVectorizer
- type Identity
- type KBinsDiscretizer
- type MaxAbsScaler
- type MinMaxScaler
- type OneHotEncoder
- type OrdinalEncoder
- type QuantileScaler
- type SampleNormalizerL1
- type SampleNormalizerL2
- type StandardScaler
- type StructTransformer
- type TFIDFVectorizer
Constants ¶
This section is empty.
Variables ¶
This section is empty.
Functions ¶
func HashOneHot ¶
func HashOneHot32 ¶ added in v0.3.0
func SimpleOneHot ¶
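The functions above are not documented in detail here, so below is a minimal sketch of the two encodings their signatures suggest: hashing-trick one-hot and plain index-based one-hot. The package's actual hash function is not shown on this page, so FNV-1a below is an assumption; all names are illustrative.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// hashOneHot sketches hashing-trick one-hot encoding: hash the input
// bytes and set that bucket to 1. FNV-1a is an assumption here; the
// package may use a different hash.
func hashOneHot(buf []byte, size int) []float64 {
	out := make([]float64, size)
	h := fnv.New64a()
	h.Write(buf)
	out[h.Sum64()%uint64(size)] = 1
	return out
}

// simpleOneHot sets index `value` to 1 in a vector of length `size`;
// the bounds check is a choice made for this sketch.
func simpleOneHot(value, size int) []float64 {
	out := make([]float64, size)
	if value >= 0 && value < size {
		out[value] = 1
	}
	return out
}

func main() {
	fmt.Println(simpleOneHot(2, 4)) // → [0 0 1 0]
	fmt.Println(hashOneHot([]byte("cat"), 8))
}
```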
Types ¶
type CountVectorizer ¶
type CountVectorizer struct {
	Mapping   map[string]uint // word to index
	Separator string          // default space
}
CountVectorizer performs bag of words encoding of text.
Separator should not be part of any word; it is the caller's responsibility to ensure this. Words that contain the separator as a substring will be omitted.
Mapping should contain all values from 0 to N-1, where N is len(Mapping); it is the caller's responsibility to ensure this. If some index is N or higher, or lower than 0, the code will panic. If some index is not set, that index will be skipped. If some index is set twice, that index will hold the sum of both words' counts.
func (*CountVectorizer) FeatureNames ¶
func (t *CountVectorizer) FeatureNames() []string
FeatureNames returns slice with produced feature names
func (*CountVectorizer) Fit ¶
func (t *CountVectorizer) Fit(vals []string)
Fit assigns a number from 0 to N-1 to each word in the input, where N is the number of distinct words.
func (*CountVectorizer) NumFeatures ¶
func (t *CountVectorizer) NumFeatures() int
NumFeatures returns the number of features produced for a single input field.
func (*CountVectorizer) Transform ¶
func (t *CountVectorizer) Transform(v string) []float64
Transform counts how many times each word appeared in input
func (*CountVectorizer) TransformInplace ¶
func (t *CountVectorizer) TransformInplace(dest []float64, v string)
TransformInplace counts how many times each word appeared in the input; inplace version. It is the caller's responsibility to zero out the destination. Uses a zero-allocation algorithm based on `strings.Split`, utilizing the fact that a string is a slice of bytes. Works fine with UTF-8.
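As an illustration of the counting described above (a sketch, not the package's actual implementation), Transform behaves roughly like this, where `mapping` and `sep` stand in for the struct fields:

```go
package main

import (
	"fmt"
	"strings"
)

// countTransform sketches CountVectorizer.Transform: split the input on
// the separator and count occurrences of each mapped word; unmapped
// words are ignored.
func countTransform(mapping map[string]uint, sep, v string) []float64 {
	out := make([]float64, len(mapping))
	for _, w := range strings.Split(v, sep) {
		if idx, ok := mapping[w]; ok {
			out[idx]++
		}
	}
	return out
}

func main() {
	mapping := map[string]uint{"a": 0, "b": 1}
	fmt.Println(countTransform(mapping, " ", "a b a c")) // → [2 1]
}
```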
type Identity ¶
type Identity struct{}
Identity is a transformer that returns unmodified input value
type KBinsDiscretizer ¶
type KBinsDiscretizer struct {
QuantileScaler
}
KBinsDiscretizer based on quantile strategy
func (*KBinsDiscretizer) Fit ¶
func (t *KBinsDiscretizer) Fit(vals []float64)
Fit fits quantile scaler
func (*KBinsDiscretizer) Transform ¶
func (t *KBinsDiscretizer) Transform(v float64) float64
Transform finds index of matched quantile for input
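A minimal sketch of the quantile-bin lookup described above; the exact boundary handling in the package may differ, and all names here are illustrative:

```go
package main

import "fmt"

// binIndex sketches KBinsDiscretizer.Transform: return the index of the
// first fitted quantile boundary that v does not exceed. Values above
// the last boundary fall into the last bin.
func binIndex(quantiles []float64, v float64) float64 {
	for i, q := range quantiles {
		if v <= q {
			return float64(i)
		}
	}
	return float64(len(quantiles) - 1)
}

func main() {
	quantiles := []float64{1, 2, 3} // fitted boundaries
	fmt.Println(binIndex(quantiles, 1.5)) // → 1
	fmt.Println(binIndex(quantiles, 10))  // → 2
}
```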
type MaxAbsScaler ¶
type MaxAbsScaler struct {
Max float64
}
MaxAbsScaler transforms value into -1 to +1 range linearly
func (*MaxAbsScaler) Fit ¶
func (t *MaxAbsScaler) Fit(vals []float64)
Fit finds the maximum absolute value.
func (*MaxAbsScaler) Transform ¶
func (t *MaxAbsScaler) Transform(v float64) float64
Transform scales value into -1 to +1 range
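The linear scaling described above reduces to a single division by the fitted maximum; a minimal illustrative sketch (the zero guard is a choice made for this sketch):

```go
package main

import "fmt"

// maxAbsTransform sketches MaxAbsScaler.Transform: dividing by the
// largest absolute value seen during Fit maps values into [-1, +1].
func maxAbsTransform(v, max float64) float64 {
	if max == 0 {
		return 0 // guard against an unfitted scaler
	}
	return v / max
}

func main() {
	fmt.Println(maxAbsTransform(-5, 10)) // → -0.5
}
```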
type MinMaxScaler ¶
MinMaxScaler is a transformer that rescales value into range between min and max
func (*MinMaxScaler) Fit ¶
func (t *MinMaxScaler) Fit(vals []float64)
Fit finds the min and max values in the input.
func (*MinMaxScaler) Transform ¶
func (t *MinMaxScaler) Transform(v float64) float64
Transform scales value from 0 to 1 linearly
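The rescaling formula is (v - min) / (max - min); a minimal illustrative sketch with hypothetical names:

```go
package main

import "fmt"

// minMaxTransform sketches MinMaxScaler.Transform: linearly rescale v
// from the fitted [min, max] range into [0, 1].
func minMaxTransform(v, min, max float64) float64 {
	if max == min {
		return 0 // degenerate range
	}
	return (v - min) / (max - min)
}

func main() {
	fmt.Println(minMaxTransform(5, 0, 10)) // → 0.5
}
```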
type OneHotEncoder ¶
OneHotEncoder encodes string value to corresponding index
Mapping should contain all values from 0 to N-1, where N is len(Mapping); it is the caller's responsibility to ensure this. If some index is N or higher, or lower than 0, the code will panic. If some index is not set, that index will be skipped. If some index is set twice, that index will take the effect of either of the words.
func (*OneHotEncoder) FeatureNames ¶
func (t *OneHotEncoder) FeatureNames() []string
FeatureNames returns names of each produced value.
func (*OneHotEncoder) Fit ¶
func (t *OneHotEncoder) Fit(vs []string)
Fit assigns each value from the input a number based on its order of occurrence in the input data. Empty strings in the input are ignored.
func (*OneHotEncoder) NumFeatures ¶
func (t *OneHotEncoder) NumFeatures() int
NumFeatures returns the number of features a single field is expanded into.
func (*OneHotEncoder) Transform ¶
func (t *OneHotEncoder) Transform(v string) []float64
Transform assigns 1 to value that is found
func (*OneHotEncoder) TransformInplace ¶
func (t *OneHotEncoder) TransformInplace(dest []float64, v string)
TransformInplace assigns 1 to the value that is found, inplace. It is the caller's responsibility to reset the destination to 0.
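A hedged sketch of the behavior described above: set the element at the value's mapped index to 1, with unknown values producing an all-zero vector (names are illustrative, not the package's API):

```go
package main

import "fmt"

// oneHotTransform sketches OneHotEncoder.Transform: set the vector
// element at the value's mapped index to 1; unknown values yield all
// zeros.
func oneHotTransform(mapping map[string]uint, v string) []float64 {
	out := make([]float64, len(mapping))
	if idx, ok := mapping[v]; ok {
		out[idx] = 1
	}
	return out
}

func main() {
	mapping := map[string]uint{"red": 0, "blue": 1}
	fmt.Println(oneHotTransform(mapping, "blue")) // → [0 1]
}
```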
type OrdinalEncoder ¶
OrdinalEncoder returns 0 for string that is not found, or else a number for that string
Mapping should contain all values from 0 to N-1, where N is len(Mapping); it is the caller's responsibility to ensure this. If some index is N or higher, or lower than 0, the code will panic. If some index is not set, that index will be skipped. If some index is set twice, that index will take the effect of either of the words.
func (*OrdinalEncoder) Fit ¶
func (t *OrdinalEncoder) Fit(vals []string)
Fit assigns each word a value from 1 to N. Empty strings in the input are ignored.
func (*OrdinalEncoder) Transform ¶
func (t *OrdinalEncoder) Transform(v string) float64
Transform returns the number assigned to the input; if the input is not found, it returns the zero value, which is 0.
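Since Fit assigns values starting at 1, zero can safely mean "not found". A minimal illustrative sketch (the package's Mapping field may use a different value type):

```go
package main

import "fmt"

// ordinalTransform sketches OrdinalEncoder.Transform: look up the number
// assigned during Fit; missing keys return the map's zero value, 0.
func ordinalTransform(mapping map[string]float64, v string) float64 {
	return mapping[v]
}

func main() {
	mapping := map[string]float64{"low": 1, "high": 2}
	fmt.Println(ordinalTransform(mapping, "high"))    // → 2
	fmt.Println(ordinalTransform(mapping, "unknown")) // → 0
}
```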
type QuantileScaler ¶
type QuantileScaler struct {
Quantiles []float64
}
QuantileScaler transforms any distribution to a uniform distribution. This is done by mapping values to the quantiles they belong to.
func (*QuantileScaler) Fit ¶
func (t *QuantileScaler) Fit(vals []float64)
Fit sets quantile parameters based on the input. The number of quantiles is specified by the size of the Quantiles slice; if it is empty or nil, 100 is used as the default. If the input is smaller than the number of quantiles, the input length is used instead.
func (*QuantileScaler) Transform ¶
func (t *QuantileScaler) Transform(v float64) float64
Transform changes distribution into uniform one from 0 to 1
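One plausible reading of "mapping values to quantiles": locate which fitted quantile v falls into and return its position as a fraction of the total, so outputs land in [0, 1]. This is a sketch under that assumption; the package's exact boundary and interpolation behavior may differ.

```go
package main

import "fmt"

// quantileTransform sketches QuantileScaler.Transform: find the first
// fitted quantile that v does not exceed and return its position as a
// fraction in (0, 1]; values above all quantiles map to 1.
func quantileTransform(quantiles []float64, v float64) float64 {
	n := len(quantiles)
	for i, q := range quantiles {
		if v <= q {
			return float64(i+1) / float64(n)
		}
	}
	return 1
}

func main() {
	quantiles := []float64{1, 2, 3, 4} // fitted during Fit
	fmt.Println(quantileTransform(quantiles, 2.5)) // → 0.75
}
```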
type SampleNormalizerL1 ¶
type SampleNormalizerL1 struct{}
SampleNormalizerL1 transforms features for single sample to have norm L1=1
func (*SampleNormalizerL1) Fit ¶
func (t *SampleNormalizerL1) Fit(_ []float64)
Fit does nothing; it is kept only to satisfy the common interface.
func (*SampleNormalizerL1) Transform ¶
func (t *SampleNormalizerL1) Transform(vs []float64) []float64
Transform returns L1 normalized vector
func (*SampleNormalizerL1) TransformInplace ¶
func (t *SampleNormalizerL1) TransformInplace(dest []float64, vs []float64)
TransformInplace returns L1 normalized vector, inplace
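A minimal sketch of L1 normalization as described above (illustrative, not the package's code): divide each component by the sum of absolute values.

```go
package main

import (
	"fmt"
	"math"
)

// normalizeL1 sketches SampleNormalizerL1.Transform: divide each element
// by the sum of absolute values so the result has L1 norm 1.
func normalizeL1(vs []float64) []float64 {
	sum := 0.0
	for _, v := range vs {
		sum += math.Abs(v)
	}
	out := make([]float64, len(vs))
	if sum == 0 {
		return out // all-zero input stays all-zero
	}
	for i, v := range vs {
		out[i] = v / sum
	}
	return out
}

func main() {
	fmt.Println(normalizeL1([]float64{1, 3})) // → [0.25 0.75]
}
```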
type SampleNormalizerL2 ¶
type SampleNormalizerL2 struct{}
SampleNormalizerL2 transforms features for single sample to have norm L2=1
func (*SampleNormalizerL2) Fit ¶
func (t *SampleNormalizerL2) Fit(_ []float64)
Fit does nothing; it is kept only to satisfy the common interface.
func (*SampleNormalizerL2) Transform ¶
func (t *SampleNormalizerL2) Transform(vs []float64) []float64
Transform returns L2 normalized vector
func (*SampleNormalizerL2) TransformInplace ¶
func (t *SampleNormalizerL2) TransformInplace(dest []float64, vs []float64)
TransformInplace returns L2 normalized vector, inplace
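The L2 counterpart of the sketch for SampleNormalizerL1, again illustrative rather than the package's code: divide each component by the Euclidean norm.

```go
package main

import (
	"fmt"
	"math"
)

// normalizeL2 sketches SampleNormalizerL2.Transform: divide each element
// by the Euclidean norm so the result has L2 norm 1.
func normalizeL2(vs []float64) []float64 {
	var ss float64
	for _, v := range vs {
		ss += v * v
	}
	norm := math.Sqrt(ss)
	out := make([]float64, len(vs))
	if norm == 0 {
		return out // all-zero input stays all-zero
	}
	for i, v := range vs {
		out[i] = v / norm
	}
	return out
}

func main() {
	fmt.Println(normalizeL2([]float64{3, 4})) // → [0.6 0.8]
}
```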
type StandardScaler ¶
StandardScaler transforms feature into normal standard distribution.
func (*StandardScaler) Fit ¶
func (t *StandardScaler) Fit(vals []float64)
Fit computes mean and standard deviation
func (*StandardScaler) Transform ¶
func (t *StandardScaler) Transform(v float64) float64
Transform centralizes and scales based on standard deviation and mean
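The standardization formula is (v - mean) / std; a minimal illustrative sketch (the zero guard is a choice made for this sketch):

```go
package main

import "fmt"

// standardTransform sketches StandardScaler.Transform: subtract the mean
// and divide by the standard deviation, both computed during Fit.
func standardTransform(v, mean, std float64) float64 {
	if std == 0 {
		return 0 // guard against a constant feature
	}
	return (v - mean) / std
}

func main() {
	fmt.Println(standardTransform(12, 10, 2)) // → 1
}
```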
type StructTransformer ¶
type StructTransformer struct {
Transformers []interface{}
}
StructTransformer uses reflection to encode struct into feature vector. It uses struct tags to create feature transformers for each field. Since it is using reflection, there is a slight overhead for large structs, which can be seen in benchmarks. For better performance, use codegen version for your struct, refer to README of this repo.
func (*StructTransformer) Fit ¶
func (s *StructTransformer) Fit(_ []interface{})
Fit will fit all field transformers
func (*StructTransformer) Transform ¶
func (s *StructTransformer) Transform(v interface{}) []float64
Transform applies all field transformers
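To make the reflection step concrete, here is a hedged sketch of reading struct tags the way StructTransformer's approach implies. The struct, the tag key "feature", and the tag values are all hypothetical, chosen only for illustration; they are not this package's actual tag format.

```go
package main

import (
	"fmt"
	"reflect"
)

// Employee is a hypothetical struct; the tag key "feature" is an
// illustrative stand-in for whatever tag the package actually reads.
type Employee struct {
	Age   int    `feature:"minmax"`
	Grade string `feature:"onehot"`
}

// listFeatureTags sketches the reflection step StructTransformer relies
// on: iterate struct fields and read their tags to decide which
// transformer applies to each field.
func listFeatureTags(v interface{}) []string {
	t := reflect.TypeOf(v)
	tags := make([]string, 0, t.NumField())
	for i := 0; i < t.NumField(); i++ {
		tags = append(tags, t.Field(i).Tag.Get("feature"))
	}
	return tags
}

func main() {
	fmt.Println(listFeatureTags(Employee{})) // → [minmax onehot]
}
```

This per-call reflection walk is what causes the overhead the doc mentions, and what the codegen version avoids by emitting the field accesses directly.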
type TFIDFVectorizer ¶
type TFIDFVectorizer struct {
	CountVectorizer
	DocCount     []uint // number of documents in which the i-th word from CountVectorizer appeared
	NumDocuments int
	Normalizer   SampleNormalizerL2
}
TFIDFVectorizer performs tf-idf vectorization on top of count vectorization. Based on: https://scikit-learn.org/stable/modules/feature_extraction.html Using non-smooth version, adding 1 to log instead of denominator in idf.
DocCount should have length equal to len(CountVectorizer.Mapping). It is the caller's responsibility to ensure this.
func (*TFIDFVectorizer) FeatureNames ¶
func (t *TFIDFVectorizer) FeatureNames() []string
FeatureNames returns slice with produced feature names.
func (*TFIDFVectorizer) Fit ¶
func (t *TFIDFVectorizer) Fit(vals []string)
Fit fits CountVectorizer and extra information for tf-idf computation
func (*TFIDFVectorizer) NumFeatures ¶
func (t *TFIDFVectorizer) NumFeatures() int
NumFeatures returns number of features for single field
func (*TFIDFVectorizer) Transform ¶
func (t *TFIDFVectorizer) Transform(v string) []float64
Transform performs tf-idf computation
func (*TFIDFVectorizer) TransformInplace ¶
func (t *TFIDFVectorizer) TransformInplace(dest []float64, v string)
TransformInplace performs tf-idf computation, inplace. It is responsibility of caller to zero-out destination.
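Putting the pieces above together, the per-term weight is tf * (log(N/df) + 1), the non-smooth idf the doc describes, followed by L2 normalization. A sketch with illustrative names, not the package's API:

```go
package main

import (
	"fmt"
	"math"
)

// tfidfTransform sketches the computation TFIDFVectorizer describes:
// weight each term count by idf = log(N/df) + 1, then L2-normalize.
// counts is the CountVectorizer output, docCount the per-word document
// frequencies, numDocs the number of fitted documents.
func tfidfTransform(counts []float64, docCount []uint, numDocs int) []float64 {
	out := make([]float64, len(counts))
	var ss float64
	for i, tf := range counts {
		if docCount[i] == 0 {
			continue // word never seen during Fit
		}
		out[i] = tf * (math.Log(float64(numDocs)/float64(docCount[i])) + 1)
		ss += out[i] * out[i]
	}
	if norm := math.Sqrt(ss); norm > 0 {
		for i := range out {
			out[i] /= norm
		}
	}
	return out
}

func main() {
	fmt.Println(tfidfTransform([]float64{2, 1, 0}, []uint{1, 2, 0}, 2))
}
```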
Source Files ¶
Directories ¶
Path | Synopsis
---|---
| Package preprocessing includes scaling, centering, normalization, binarization and imputation methods.