Documentation ¶
Index ¶
- Variables
- func GeneralizedVariance(rows, columns int, data []float64) float64
- type Clustering
- func NewClustering(data [][]float64, minimumClusterSize int) (*Clustering, error)
- func (c *Clustering) Assign(data [][]float64) (*Clustering, error)
- func (c *Clustering) NearestNeighbor() *Clustering
- func (c *Clustering) OutlierClustering() *Clustering
- func (c *Clustering) OutlierDetection() *Clustering
- func (c *Clustering) Run(distanceFunc DistanceFunc, score string, mst bool) error
- func (c *Clustering) Subsample(n int) *Clustering
- func (c *Clustering) Verbose() *Clustering
- func (c *Clustering) Voronoi() *Clustering
- type DistanceFunc
- type Outlier
- type Outliers
Variables ¶
var (
	// VarianceScore will select an optimal clustering
	// that minimizes the generalized variance across each cluster.
	VarianceScore = "variance_score"
	// StabilityScore will select an optimal clustering that
	// maximizes the stability across all clusters.
	StabilityScore = "stability_score"
)
var (
	// ErrMCS ...
	ErrMCS = errors.New("minimum cluster size is too small")
	// ErrDataLen ...
	ErrDataLen = errors.New("length of data is less than minimum cluster size")
	// ErrRowLength ...
	ErrRowLength = errors.New("row is incorrect length")
)
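The sentinel errors above are meant to be compared against, so a short sketch of how a caller might branch on them may help. Everything here is a local stand-in: `errMCS`, `errDataLen`, `errRowLength` and `validate` are hypothetical names that only mirror the messages above, and the "too small" threshold of 2 is an assumption, not something this page states.

```go
package main

import (
	"errors"
	"fmt"
)

// Local stand-ins that mirror the package's sentinel error messages.
var (
	errMCS       = errors.New("minimum cluster size is too small")
	errDataLen   = errors.New("length of data is less than minimum cluster size")
	errRowLength = errors.New("row is incorrect length")
)

// validate is a hypothetical input check, not the package's own code.
func validate(data [][]float64, minimumClusterSize int) error {
	if minimumClusterSize < 2 { // assumption: the real threshold is not documented here
		return errMCS
	}
	if len(data) < minimumClusterSize {
		return errDataLen
	}
	for i, row := range data {
		if len(row) != len(data[0]) {
			// Wrap the sentinel so the caller keeps the row index.
			return fmt.Errorf("row %d: %w", i, errRowLength)
		}
	}
	return nil
}

func main() {
	err := validate([][]float64{{1, 2}, {3, 4}, {5}}, 2)
	switch {
	case errors.Is(err, errRowLength):
		fmt.Println("ragged input:", err)
	case err != nil:
		fmt.Println("invalid input:", err)
	default:
		fmt.Println("ok")
	}
}
```

`errors.Is` matches both a bare sentinel and a wrapped one, so it is a safe choice when it is unknown whether the library wraps its errors.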
var EuclideanDistance = func(v1, v2 []float64) float64 {
	acc := 0.0
	for i, v := range v1 {
		acc += math.Pow((v - v2[i]), 2)
	}
	return math.Pow(acc, 0.5)
}
EuclideanDistance ...
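As a rough illustration of what a distance function of this shape computes, the sketch below re-implements the Euclidean distance with `math.Sqrt` and adds a Manhattan variant with the same signature. The names `euclidean` and `manhattan` are hypothetical, and the idea that any function with this signature can be passed to `Run` as a `DistanceFunc` is an assumption, since the definition of `DistanceFunc` is not shown here.

```go
package main

import (
	"fmt"
	"math"
)

// euclidean returns the straight-line (L2) distance between two points.
// It assumes v1 and v2 have the same length, as the original does.
func euclidean(v1, v2 []float64) float64 {
	acc := 0.0
	for i, v := range v1 {
		d := v - v2[i]
		acc += d * d
	}
	return math.Sqrt(acc)
}

// manhattan is a hypothetical alternative metric with the same shape.
func manhattan(v1, v2 []float64) float64 {
	acc := 0.0
	for i, v := range v1 {
		acc += math.Abs(v - v2[i])
	}
	return acc
}

func main() {
	a, b := []float64{0, 0}, []float64{3, 4}
	fmt.Println(euclidean(a, b)) // prints 5
	fmt.Println(manhattan(a, b)) // prints 7
}
```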
Functions ¶
func GeneralizedVariance ¶
func GeneralizedVariance(rows, columns int, data []float64) float64
GeneralizedVariance will return the determinant of the covariance matrix of the supplied data. The supplied data is a flat list holding 'rows' observations, each of length 'columns'.
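To make "determinant of the covariance matrix" concrete, here is a minimal, standard-library-only sketch. It is not the package's implementation: the helper names are hypothetical, and it assumes a row-major layout, at least two rows, and sample (n-1) normalization, none of which this page confirms.

```go
package main

import (
	"fmt"
	"math"
)

// covariance returns the columns x columns covariance matrix of data,
// read as `rows` consecutive observations of length `columns`.
// Assumptions: row-major layout, rows >= 2, (n-1) normalization.
func covariance(rows, columns int, data []float64) [][]float64 {
	means := make([]float64, columns)
	for r := 0; r < rows; r++ {
		for c := 0; c < columns; c++ {
			means[c] += data[r*columns+c]
		}
	}
	for c := range means {
		means[c] /= float64(rows)
	}
	cov := make([][]float64, columns)
	for i := range cov {
		cov[i] = make([]float64, columns)
		for j := range cov[i] {
			for r := 0; r < rows; r++ {
				cov[i][j] += (data[r*columns+i] - means[i]) * (data[r*columns+j] - means[j])
			}
			cov[i][j] /= float64(rows - 1)
		}
	}
	return cov
}

// determinant reduces m to upper-triangular form with partial pivoting
// and multiplies the pivots. It modifies m in place.
func determinant(m [][]float64) float64 {
	n := len(m)
	det := 1.0
	for k := 0; k < n; k++ {
		p := k
		for i := k + 1; i < n; i++ {
			if math.Abs(m[i][k]) > math.Abs(m[p][k]) {
				p = i
			}
		}
		if m[p][k] == 0 {
			return 0 // singular matrix
		}
		if p != k {
			m[p], m[k] = m[k], m[p]
			det = -det // a row swap flips the sign
		}
		det *= m[k][k]
		for i := k + 1; i < n; i++ {
			f := m[i][k] / m[k][k]
			for j := k; j < n; j++ {
				m[i][j] -= f * m[k][j]
			}
		}
	}
	return det
}

// generalizedVariance mirrors the documented signature.
func generalizedVariance(rows, columns int, data []float64) float64 {
	return determinant(covariance(rows, columns, data))
}

func main() {
	// Five observations of two uncorrelated variables, each with variance 1.
	data := []float64{0, 0, 2, 0, 0, 2, 2, 2, 1, 1}
	fmt.Println(generalizedVariance(5, 2, data)) // prints 1
}
```

A value near zero indicates that the observations are almost linearly dependent, for example when all points lie on a line.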
Types ¶
type Clustering ¶
type Clustering struct {
	Clusters clusters
	// contains filtered or unexported fields
}
Clustering holds all final results of a clustering run.
func NewClustering ¶
func NewClustering(data [][]float64, minimumClusterSize int) (*Clustering, error)
NewClustering creates (a pointer to) a new clustering struct. This function does not automatically start the clustering process. The `Run` method needs to be called to do that. Make sure to apply all options *before* calling `Run`.
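The "options first, then `Run`" rule follows from the chained-setter style used by the option methods. The toy below sketches that pattern only; `job` and its methods are hypothetical stand-ins and do not reproduce the package's `Clustering` type or its internals.

```go
package main

import "fmt"

// job is a hypothetical stand-in for a clustering object; it is not
// the package's Clustering type.
type job struct {
	verbose bool
	voronoi bool
}

func newJob() *job {
	return &job{}
}

// Verbose sets a flag and returns the receiver so calls can chain.
func (j *job) Verbose() *job {
	j.verbose = true
	return j
}

// Voronoi sets a flag and returns the receiver so calls can chain.
func (j *job) Voronoi() *job {
	j.voronoi = true
	return j
}

// Run reads the flags as they are at call time, so options set
// afterwards have no effect on this run.
func (j *job) Run() string {
	return fmt.Sprintf("verbose=%t voronoi=%t", j.verbose, j.voronoi)
}

func main() {
	fmt.Println(newJob().Verbose().Voronoi().Run()) // prints verbose=true voronoi=true

	late := newJob()
	out := late.Run()
	late.Voronoi()   // too late for the run above
	fmt.Println(out) // prints verbose=false voronoi=false
}
```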
func (*Clustering) Assign ¶
func (c *Clustering) Assign(data [][]float64) (*Clustering, error)
Assign will assign a list of data points to the clusters of an existing clustering. If the original clustering had the OutlierDetection option enabled, it will perform outlier detection based on the existing outliers. The results are returned as a new clustering object that contains only the indexes from the supplied data. All clusters returned have the same ID as they had in the original clustering. This method can be useful if a sample was used for the initial clustering and the data points outside of the sample need to be assigned to a cluster as well.
func (*Clustering) NearestNeighbor ¶
func (c *Clustering) NearestNeighbor() *Clustering
NearestNeighbor specifies that nearest-neighbor distances should be used for outlier detection and for Voronoi clustering instead of centroid-based distances. NearestNeighbor will find the closest assigned data point to an unassigned data point and consider the unassigned data point to be of that same cluster (as an outlier and/or a point).
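The nearest-neighbor rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the package's code: `distance` and `nearestLabel` are hypothetical helpers, Euclidean distance is assumed, and `assigned` is assumed to be non-empty.

```go
package main

import (
	"fmt"
	"math"
)

// distance is the Euclidean distance between two equal-length points.
func distance(a, b []float64) float64 {
	acc := 0.0
	for i := range a {
		d := a[i] - b[i]
		acc += d * d
	}
	return math.Sqrt(acc)
}

// nearestLabel returns the cluster label of the assigned point closest
// to p. labels[i] is the cluster of assigned[i].
func nearestLabel(p []float64, assigned [][]float64, labels []int) int {
	best, bestDist := 0, math.Inf(1)
	for i, q := range assigned {
		if d := distance(p, q); d < bestDist {
			best, bestDist = i, d
		}
	}
	return labels[best]
}

func main() {
	// Cluster 0 is elongated; cluster 1 is compact.
	assigned := [][]float64{{0, 0}, {8, 0}, {10, 0}, {12, 0}}
	labels := []int{0, 0, 1, 1}
	fmt.Println(nearestLabel([]float64{8.8, 0}, assigned, labels)) // prints 0
}
```

In this example the point (8.8, 0) joins cluster 0 because its member (8, 0) is only 0.8 away, even though the centroid of cluster 1, at (11, 0), is closer than the centroid of cluster 0, at (4, 0).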
func (*Clustering) OutlierClustering ¶
func (c *Clustering) OutlierClustering() *Clustering
OutlierClustering is an option to group the outliers of a cluster into a new cluster if there are at least a minimum-cluster-size number of them. This option will automatically perform outlier detection on the clustering as well.
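The size rule described above can be sketched as a simple filter. The function `promote` and the representation of outliers as slices of point indexes are hypothetical choices made for this illustration; the package's own data structures are not shown on this page.

```go
package main

import "fmt"

// promote returns one new cluster for every group of outliers that
// holds at least minimumClusterSize point indexes. Smaller groups are
// left out and remain plain outliers of their original cluster.
func promote(outliers [][]int, minimumClusterSize int) [][]int {
	var promoted [][]int
	for _, group := range outliers {
		if len(group) >= minimumClusterSize {
			promoted = append(promoted, group)
		}
	}
	return promoted
}

func main() {
	// outliers[i] lists the outlier point indexes of cluster i.
	outliers := [][]int{{3, 7, 9}, {12}}
	fmt.Println(promote(outliers, 3)) // prints [[3 7 9]]
}
```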
func (*Clustering) OutlierDetection ¶
func (c *Clustering) OutlierDetection() *Clustering
OutlierDetection will track all unassigned points as outliers of their nearest cluster. It provides a `NormalizedDistance` value for each outlier which can be interpreted as the probability of the point being an outlier (relative to all other outliers).
func (*Clustering) Run ¶
func (c *Clustering) Run(distanceFunc DistanceFunc, score string, mst bool) error
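Run will run the clustering using the supplied distance function and score. The score is expected to be one of the values declared under Variables (VarianceScore or StabilityScore).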
func (*Clustering) Subsample ¶
func (c *Clustering) Subsample(n int) *Clustering
Subsample will take the first 'n' data points and perform clustering on those. 'n' should be between 0 and the total data size. After the clusters have been found, Voronoi clustering will be performed for all points that are not in the subsample.
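A minimal sketch of the split this option implies: the first 'n' points are clustered directly and the remainder is assigned afterwards. The helper `split` and its error message are hypothetical; the package's own handling of an out-of-range 'n' is not documented here.

```go
package main

import (
	"errors"
	"fmt"
)

// split separates the first n points (to be clustered directly) from
// the rest (to be assigned to the resulting clusters afterwards).
func split(data [][]float64, n int) (sample, rest [][]float64, err error) {
	if n < 0 || n > len(data) {
		return nil, nil, errors.New("n must be between 0 and len(data)")
	}
	return data[:n], data[n:], nil
}

func main() {
	data := [][]float64{{0}, {1}, {2}, {3}, {4}}
	sample, rest, err := split(data, 3)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(len(sample), len(rest)) // prints 3 2
}
```

Because the subsample is the first 'n' points rather than a random draw, the order of the input matters; shuffling sorted data beforehand is one way to avoid a biased sample.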
func (*Clustering) Verbose ¶
func (c *Clustering) Verbose() *Clustering
Verbose will set verbosity to true for the clustering process, so that the internals of a clustering run are logged to stdout.
func (*Clustering) Voronoi ¶
func (c *Clustering) Voronoi() *Clustering
Voronoi will enable Voronoi clustering: after density clustering is performed, all points not assigned to a cluster will be placed into their nearest cluster (by centroid distance).
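Centroid-based assignment can be sketched as follows. This is an illustration, not the package's implementation: `distance`, `centroid` and `nearestCluster` are hypothetical helpers, Euclidean distance is assumed, and every cluster is assumed to have at least one member.

```go
package main

import (
	"fmt"
	"math"
)

// distance is the Euclidean distance between two equal-length points.
func distance(a, b []float64) float64 {
	acc := 0.0
	for i := range a {
		d := a[i] - b[i]
		acc += d * d
	}
	return math.Sqrt(acc)
}

// centroid returns the coordinate-wise mean of a non-empty point set.
func centroid(points [][]float64) []float64 {
	c := make([]float64, len(points[0]))
	for _, p := range points {
		for i, v := range p {
			c[i] += v
		}
	}
	for i := range c {
		c[i] /= float64(len(points))
	}
	return c
}

// nearestCluster returns the index of the cluster whose centroid is
// closest to p. clusters[i] holds the member points of cluster i.
func nearestCluster(p []float64, clusters [][][]float64) int {
	best, bestDist := 0, math.Inf(1)
	for id, members := range clusters {
		if d := distance(p, centroid(members)); d < bestDist {
			best, bestDist = id, d
		}
	}
	return best
}

func main() {
	clusters := [][][]float64{
		{{0, 0}, {8, 0}},   // centroid (4, 0)
		{{10, 0}, {12, 0}}, // centroid (11, 0)
	}
	fmt.Println(nearestCluster([]float64{8.8, 0}, clusters)) // prints 1
}
```

Here the point (8.8, 0) goes to cluster 1 because its centroid is 2.2 away, against 4.8 for cluster 0, even though the single closest point, (8, 0), belongs to cluster 0. Elongated clusters are where the centroid rule and a nearest-neighbor rule can disagree.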