Documentation ¶
Overview ¶
Package goxmeans implements a library for the xmeans algorithm. See Dan Pelleg and Andrew Moore - X-means: Extending K-means with Efficient Estimation of the Number of Clusters.
D = the input set of points.
R = |D|, the number of points in a model.
M = number of dimensions, assuming spherical Gaussians.
The algorithm consists of two operations repeated until completion.
1. Improve parameters
2. Improve structure
3. If K > Kmax then stop and return a slice of Models with BIC scores, else goto 1.
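The control flow of these three steps can be sketched as follows. This is a minimal illustration, not the package's implementation: the model type and the improveParams and improveStructure callbacks are hypothetical placeholders, and the early exit when no cluster splits is an assumption added so the sketch always terminates.

```go
package main

import "fmt"

// model is a hypothetical stand-in for the package's Model type.
type model struct {
	k   int
	bic float64
}

// xmeansLoop sketches the outer loop only. improveParams and improveStructure
// are caller-supplied placeholders, not goxmeans functions.
func xmeansLoop(k, kmax int, improveParams func(k int) model, improveStructure func(m model) int) []model {
	var models []model
	for k <= kmax { // step 3: stop once K exceeds Kmax
		m := improveParams(k) // step 1: fit the centroids for this k
		models = append(models, m)
		next := improveStructure(m) // step 2: decide how many centroids the next model has
		if next <= k {
			break // assumption: nothing was split, so k would never grow
		}
		k = next
	}
	return models
}

func main() {
	// Toy placeholders: the "fit" just records k, and every cluster splits in two.
	fit := func(k int) model { return model{k: k, bic: float64(k)} }
	split := func(m model) int { return m.k * 2 }
	for _, m := range xmeansLoop(1, 5, fit, split) {
		fmt.Printf("k=%d bic=%.1f\n", m.k, m.bic)
	}
}
```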
Functions ¶
Types ¶
type CentroidChooser ¶
type CentroidChooser interface {
ChooseCentroids(mat *matrix.DenseMatrix, k int) *matrix.DenseMatrix
}
CentroidChooser is the interface that wraps the ChooseCentroids method.
ChooseCentroids returns a matrix of k coordinates in M dimensions.
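To illustrate the shape of this interface, here is a minimal sketch of a custom chooser. It is written against a hypothetical stand-in type, dense (a slice of rows), rather than *matrix.DenseMatrix, so that it runs with the standard library alone; firstK is an invented example and is not part of goxmeans.

```go
package main

import "fmt"

// dense is a hypothetical stand-in for *matrix.DenseMatrix: one row per point.
type dense [][]float64

// centroidChooser mirrors the shape of CentroidChooser using the stand-in type.
type centroidChooser interface {
	ChooseCentroids(mat dense, k int) dense
}

// firstK is an illustrative chooser that copies the first k rows.
type firstK struct{}

func (firstK) ChooseCentroids(mat dense, k int) dense {
	if k > len(mat) {
		k = len(mat)
	}
	if k < 0 {
		k = 0
	}
	out := make(dense, k)
	for i := range out {
		// Copy each row so callers can move centroids without touching the data.
		out[i] = append([]float64(nil), mat[i]...)
	}
	return out
}

func main() {
	var cc centroidChooser = firstK{}
	data := dense{{1, 2}, {3, 4}, {5, 6}}
	fmt.Println(cc.ChooseCentroids(data, 2))
}
```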
type CentroidPoint ¶
type CentroidPoint struct {
// contains filtered or unexported fields
}
CentroidPoint stores the row number in the centroids matrix and the distance squared between the centroid and the point.
type DataCentroids ¶
type DataCentroids struct{}
DataCentroids picks k distinct points from the dataset as initial centroids.
func (DataCentroids) ChooseCentroids ¶
func (c DataCentroids) ChooseCentroids(mat *matrix.DenseMatrix, k int) *matrix.DenseMatrix
ChooseCentroids picks k distinct points from the dataset. If k is greater than the number of points in the matrix, k is set to the number of points.
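The behaviour described above, including the clamping of k, can be sketched as follows. This is an assumption-laden illustration: chooseFromData and its seed parameter are hypothetical, rows are plain slices instead of a *matrix.DenseMatrix, and "distinct" is taken to mean distinct row indices.

```go
package main

import (
	"fmt"
	"math/rand"
)

// chooseFromData returns copies of k rows picked at random without
// replacement. The name and the seed parameter are hypothetical.
func chooseFromData(points [][]float64, k int, seed int64) [][]float64 {
	if k > len(points) {
		k = len(points) // more centroids requested than points available
	}
	if k < 0 {
		k = 0
	}
	rng := rand.New(rand.NewSource(seed))
	centroids := make([][]float64, 0, k)
	// A random permutation of the row indices never repeats an index.
	for _, idx := range rng.Perm(len(points))[:k] {
		centroids = append(centroids, append([]float64(nil), points[idx]...))
	}
	return centroids
}

func main() {
	points := [][]float64{{0, 0}, {1, 1}, {2, 2}}
	fmt.Println(len(chooseFromData(points, 5, 1))) // k is clamped, prints 3
}
```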
type EllipseCentroids ¶
type EllipseCentroids struct {
Frac float64 // must be between 0 and 1; the fraction of a fully inscribing ellipse to use
}
EllipseCentroids lays out the initial centroids evenly along an ellipse inscribed and centered within the boundaries of the dataset. It is only defined for M=2.
- Frac: This must be a float between 0 and 1. It determines the scale of the inscribing ellipse relative to the dataset, so Frac==1.0 produces an ellipse that spans the entire dataset, while Frac==0.5 produces an ellipse spanning half the dataset.
func (EllipseCentroids) ChooseCentroids ¶
func (c EllipseCentroids) ChooseCentroids(mat *matrix.DenseMatrix, k int) *matrix.DenseMatrix
ChooseCentroids lays out the initial centroids evenly along an ellipse inscribed and centered within the boundaries of the dataset. It is only defined for M=2.
- Frac: This must be a float between 0 and 1. It determines the scale of the inscribing ellipse relative to the dataset, so Frac==1.0 produces an ellipse that spans the entire dataset, while Frac==0.5 produces an ellipse spanning half the dataset.
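One way to picture this layout is the sketch below. It assumes that "boundaries of the dataset" means the axis-aligned bounding box and that "evenly" means equal steps in angle; neither detail is confirmed by the documentation, and ellipseCentroids is a hypothetical helper working on plain 2-D arrays.

```go
package main

import (
	"fmt"
	"math"
)

// ellipseCentroids places k points on an ellipse centered in the bounding box
// of the data, with its axes scaled by frac. Assumption: equal angular steps.
func ellipseCentroids(points [][2]float64, k int, frac float64) [][2]float64 {
	if len(points) == 0 || k <= 0 {
		return nil
	}
	minX, maxX := points[0][0], points[0][0]
	minY, maxY := points[0][1], points[0][1]
	for _, p := range points[1:] {
		minX, maxX = math.Min(minX, p[0]), math.Max(maxX, p[0])
		minY, maxY = math.Min(minY, p[1]), math.Max(maxY, p[1])
	}
	cx, cy := (minX+maxX)/2, (minY+maxY)/2
	a := frac * (maxX - minX) / 2 // semi-axis along x
	b := frac * (maxY - minY) / 2 // semi-axis along y
	out := make([][2]float64, k)
	for i := range out {
		theta := 2 * math.Pi * float64(i) / float64(k)
		out[i] = [2]float64{cx + a*math.Cos(theta), cy + b*math.Sin(theta)}
	}
	return out
}

func main() {
	data := [][2]float64{{0, 0}, {4, 2}, {1, 1}}
	for _, c := range ellipseCentroids(data, 4, 0.5) {
		fmt.Printf("(%.2f, %.2f)\n", c[0], c[1])
	}
}
```

With frac set to 1.0 the ellipse touches the bounding box on all four sides; with 0.5 both semi-axes are halved.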
type EuclidDist ¶
type EuclidDist vectorDistance
func (EuclidDist) CalcDist ¶
func (ed EuclidDist) CalcDist(p, q *matrix.DenseMatrix) float64
CalcDist finds the Euclidean distance between points.
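As a reference for the computation itself, here is a minimal sketch of Euclidean distance on plain slices. The euclid function is hypothetical and assumes both points have the same number of coordinates; the package's CalcDist takes *matrix.DenseMatrix arguments instead.

```go
package main

import (
	"fmt"
	"math"
)

// euclid returns the square root of the summed squared coordinate differences.
// It assumes len(p) == len(q).
func euclid(p, q []float64) float64 {
	sum := 0.0
	for i := range p {
		d := p[i] - q[i]
		sum += d * d
	}
	return math.Sqrt(sum)
}

func main() {
	fmt.Println(euclid([]float64{0, 0}, []float64{3, 4})) // prints 5
}
```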
type ManhattanDist ¶
type ManhattanDist struct{}
func (ManhattanDist) CalcDist ¶
func (md ManhattanDist) CalcDist(a, b *matrix.DenseMatrix) float64
CalcDist finds the Manhattan distance, which is the sum of the absolute differences of the coordinates. Also known as rectilinear distance, city block distance, or taxicab distance.
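The same definition, written out as a minimal sketch on plain slices. The manhattan function is hypothetical and assumes both points have the same number of coordinates.

```go
package main

import (
	"fmt"
	"math"
)

// manhattan sums the absolute coordinate differences.
// It assumes len(a) == len(b).
func manhattan(a, b []float64) float64 {
	sum := 0.0
	for i := range a {
		sum += math.Abs(a[i] - b[i])
	}
	return sum
}

func main() {
	fmt.Println(manhattan([]float64{1, 2}, []float64{4, -2})) // prints 7
}
```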
type Model ¶
type Model struct {
Bic float64
Clusters []cluster
}
Model is a statistical model with a BIC score and a collection of clusters.
func Xmeans ¶
func Xmeans(datapoints, centroids *matrix.DenseMatrix, k, kmax int, cc, bisectcc CentroidChooser, measurer VectorMeasurer) ([]Model, map[string]error)
Xmeans runs k-means for k from its lower bound to its upper bound on a data set. Once the k centroids have converged, each cluster is bisected and the BIC of the original cluster (the parent, a model with one centroid) is compared to that of the bisected model, which consists of two centroids. Whichever model has the greater BIC is committed to the set of clusters for this larger model k.
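The parent-versus-children comparison can be sketched as below. This is one reading of the spherical-Gaussian BIC from the cited Pelleg and Moore paper, not a copy of what goxmeans computes: the cluster type, bic, and keepSplit are hypothetical, and the code assumes non-empty clusters, more points than centroids, and a non-zero pooled variance.

```go
package main

import (
	"fmt"
	"math"
)

// cluster is a hypothetical stand-in: a centroid plus the points assigned to it.
type cluster struct {
	centroid []float64
	points   [][]float64
}

// bic scores a set of clusters under identical spherical Gaussians.
// Assumptions: every cluster is non-empty, R > K, and the variance is non-zero.
func bic(clusters []cluster) float64 {
	k := float64(len(clusters))
	r, m, sumSq := 0.0, 0.0, 0.0
	for _, c := range clusters {
		r += float64(len(c.points))
		m = float64(len(c.centroid))
		for _, p := range c.points {
			for d := range p {
				diff := p[d] - c.centroid[d]
				sumSq += diff * diff
			}
		}
	}
	variance := sumSq / (r - k) // pooled variance estimate

	logLik := 0.0
	for _, c := range clusters {
		rn := float64(len(c.points))
		logLik += -rn/2*math.Log(2*math.Pi) -
			rn*m/2*math.Log(variance) -
			(rn-k)/2 +
			rn*math.Log(rn) - rn*math.Log(r)
	}
	params := k * (m + 1) // (K-1) class probabilities + M*K coordinates + 1 variance
	return logLik - params/2*math.Log(r)
}

// keepSplit reports whether the bisected model scores higher than its parent.
func keepSplit(parent cluster, children []cluster) bool {
	return bic(children) > bic([]cluster{parent})
}

func main() {
	left := [][]float64{{0, 0}, {0, 1}, {1, 0}, {1, 1}}
	right := [][]float64{{10, 10}, {10, 11}, {11, 10}, {11, 11}}
	all := append(append([][]float64{}, left...), right...)
	parent := cluster{centroid: []float64{5.5, 5.5}, points: all}
	children := []cluster{
		{centroid: []float64{0.5, 0.5}, points: left},
		{centroid: []float64{10.5, 10.5}, points: right},
	}
	fmt.Println(keepSplit(parent, children))
}
```

With two well-separated groups like these, the two-centroid model scores higher, so the split would be kept.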
func (Model) Numcentroids ¶
type PairPointCentroidJob ¶
type PairPointCentroidJob struct {
// contains filtered or unexported fields
}
PairPointCentroidJob stores the elements that define the job that pairs a point with a centroid.
func (PairPointCentroidJob) PairPointCentroid ¶
func (job PairPointCentroidJob) PairPointCentroid()
PairPointCentroid pairs a point with the closest centroid.
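The pairing step amounts to a nearest-centroid search. The sketch below is a hypothetical stand-alone version: it uses squared Euclidean distance on plain slices, whereas the package presumably measures distance through a VectorMeasurer, and it returns the centroid row index and the squared distance, the two values that CentroidPoint is documented to store.

```go
package main

import "fmt"

// nearestCentroid returns the row index of the closest centroid and the
// squared Euclidean distance to it. It returns -1 when there are no centroids.
func nearestCentroid(point []float64, centroids [][]float64) (int, float64) {
	best, bestSq := -1, 0.0
	for i, c := range centroids {
		sq := 0.0
		for d := range point {
			diff := point[d] - c[d]
			sq += diff * diff
		}
		if best == -1 || sq < bestSq {
			best, bestSq = i, sq
		}
	}
	return best, bestSq
}

func main() {
	centroids := [][]float64{{0, 0}, {5, 5}, {1, 2}}
	idx, sq := nearestCentroid([]float64{1, 1}, centroids)
	fmt.Println(idx, sq)
}
```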
type PairPointCentroidResult ¶
type PairPointCentroidResult struct {
// contains filtered or unexported fields
}
PairPointCentroidResult stores the results of pairing a point with a centroid.
type VectorMeasurer ¶
type VectorMeasurer interface {
CalcDist(a, b *matrix.DenseMatrix) (dist float64)
}
VectorMeasurer finds the distance between the points in the columns.