dbscan

package module

v0.0.0-...-f30c2f0 Latest Latest Go to latest Published: Jul 21, 2015 License: BSD-3-Clause Imports: 6 Imported by: 1

Details

Valid go.mod file

The Go module system was introduced in Go 1.11 and is the official dependency management solution for Go.
Redistributable license

Redistributable licenses place minimal restrictions on how software can be used, modified, and redistributed.
Tagged version

Modules with tagged versions give importers more predictable builds.
Stable version

When a project reaches major version v1 it is considered stable.
Learn more about best practices

Repository

bitbucket.org/sjbog/go-dbscan

Links

Open Source Insights

README ¶

DBSCAN (go package)

DBSCAN (Density-based spatial clustering) clustering optimized for multicore processing.

Idea

If the distance of two points in any dimension is more than eps, than the total distance is more than eps

Calculate variance for each dimension (in parallel), find & store dimension with max variance
Sort by the dimension with max variance
Build a neighborhood map in parallel (Euclidean distance)

Slide through sorted data, from lowest to highest. Sliding performed until neighbor_value <= curr_value + eps
Use array / indices to store neighbors lists; ConcurrentLinkedQueue holds density reachable points.

Use DFS (Depth First Search) to find clusters

Note: Unless go1.5 (or newer), need to set GOMAXPROCS=XX to utilize all cores

Example explaining how it works

Consider the following points:

Name => { X, Y }
"0" => { 2, 4 }
"1" => { 7, 3 }
"2" => { 3, 5 }
"3" => { 5, 3 }
"4" => { 7, 4 }
"5" => { 6, 8 }
"6" => { 6, 5 }
"7" => { 8, 4 }
"8" => { 2, 5 }
"9" => { 3, 7 }

Let eps = 2.0, minPts = 2:

  Y|
   |
  8|                       x <- Noise
   |
  7|           x
   |           |
  6|           | <- Points are density connected (from [3,5])
   |           |
  5|       x---x           x
   |         /               \
  4|       x                   x---x
   |                           |
  3|                   x-------x
   |                        /\
  2|                   Points are density reachable
   |
  1|
___|___________________________________
  0|   1   2   3   4   5   6   7   8  X

If we'll consider only X dimension, points sorted by this dimension will look like:

X  0  1  2  3  4  5  6  7  8  9
P        o  o     o  o  o  o

"0": 2.0, "8": 2.0, "2": 3.0, "9": 3.0, "3": 5.0, "5": 6.0, "6": 6.0, "1": 7.0, "4": 7.0, "7": 8.0
or
"0": [2.0, 4.0], "8": [2.0, 5.0], "2": [3.0, 5.0], "9": [3.0, 7.0], "3": [5.0, 3.0], "5": [6.0, 8.0], "6": [6.0, 5.0], "1": [7.0, 3.0], "4": [7.0, 4.0], "7": [8.0, 4.0]

We build neighborhood map (array containing concurrent non-blocking queues of density connected points' indices), by considering forward points, which are reachable by eps:

                       0    1    2    3    4    5    6    7    8    9
Sorted data points: [ "0", "8", "2", "9", "3", "5", "6", "1", "4", "7" ]

Point "0": 2.0, sees 3 points in range dimensionValue + eps - "8": 2.0, "2": 3.0, "9": 3.0
(e.g. 2.0 + 2.0 =4 - every point with X dimension value above 4 is definitely not reachable)
Then we check if they are reachable with Euclidean distance and add to neighbors.
Point "9": [3.0, 7.0] isn't density reachable to point "0": [2.0, 4.0]

Neighborhood map:
0: [ 1, 2 ]
1: [ 0 ]
2: [ 0 ]

Point "8": 2.0, sees "2": 3.0, "9": 3.0; but point "9": [3.0, 7.0] isn't density reachable
Neighborhood map:
0: [ 1, 2 ]
1: [ 0, 2 ]
2: [ 0, 1 ]

Point "2": 3.0, sees "9": 3.0, "3": 5.0; but point "3": [5.0, 3.0] isn't density reachable
Neighborhood map:
0: [ 1, 2 ]
1: [ 0, 2 ]
2: [ 0, 1, 3 ]
3: [ 2 ]

Note that we slide only forward, thus also add back-references to self index. Concurrent non-blocking queue allows to add in parallel (order doesn't matter).

Finally neighborhood map:

0: [0, 1, 2]
1: [0, 1, 2]
2: [0, 2, 3, 1]
3: [3, 2]
4: [4, 7]
5: [5]
6: [6, 8]
7: [7, 8, 9, 4]
8: [6, 8, 9, 7]
9: [8, 9, 7]

Now its easy to traverse and find clusters with Depth First Search (or BFS)

Cluster size: 4
 Point: 0, [2.0, 4.0]
 Point: 2, [3.0, 5.0]
 Point: 8, [2.0, 5.0]
 Point: 9, [3.0, 7.0]
Cluster size: 5
 Point: 3, [5.0, 3.0]
 Point: 1, [7.0, 3.0]
 Point: 7, [8.0, 4.0]
 Point: 4, [7.0, 4.0]
 Point: 6, [6.0, 5.0]

Documentation ¶

Overview ¶

DBSCAN (Density-based spatial clustering) clustering optimized for multicore processing.

Usage example:

var clusterer = NewDBSCANClusterer( 2.0, 2 )

var data = []ClusterablePoint{
		&NamedPoint{"0", []float64{2, 4}},
		&NamedPoint{"1", []float64{7, 3}},
		&NamedPoint{"2", []float64{3, 5}},
		&NamedPoint{"3", []float64{5, 3}},
		&NamedPoint{"4", []float64{7, 4}},
	}

clusterer.MinPts = 2
clusterer.SetEps( 2.0 )

// Automatic discovery of dimension with max variance
clusterer.AutoSelectDimension = false
// Set dimension manually
clusterer.SortDimensionIndex = 1

var result  [][]ClusterablePoint = clusterer.Cluster(data)

Index ¶

func Variance(data []ClusterablePoint, dimension int) float64
type ClusterablePoint
- func NamedPointToClusterablePoint(in []*NamedPoint) (out []ClusterablePoint)
type ClusterablePointSlice
type Clusterer
type ConcurrentQueue_InsertOnly
- func NewConcurrentQueue_InsertOnly() *ConcurrentQueue_InsertOnly
- func (self *ConcurrentQueue_InsertOnly) Add(value uint)
- func (self *ConcurrentQueue_InsertOnly) Slice() []uint
type DBSCANClusterer
- func NewDBSCANClusterer(eps float64, minPts int) *DBSCANClusterer
type NamedPoint
- func NewNamedPoint(name string, point []float64) *NamedPoint

Constants ¶

This section is empty.

Variables ¶

This section is empty.

Functions ¶

func Variance ¶

func Variance(data []ClusterablePoint, dimension int) float64

Types ¶

type ClusterablePoint ¶

type ClusterablePoint interface {
	GetPoint() []float64
	String() string
}

func NamedPointToClusterablePoint ¶

func NamedPointToClusterablePoint(in []*NamedPoint) (out []ClusterablePoint)

type ClusterablePointSlice ¶

type ClusterablePointSlice struct {
	Data          []ClusterablePoint
	SortDimension int
}

Slice attaches the methods of Interface to []float64, sorting in increasing order.

func (ClusterablePointSlice) Len ¶

func (self ClusterablePointSlice) Len() int

func (ClusterablePointSlice) Less ¶

func (self ClusterablePointSlice) Less(i, j int) bool

func (ClusterablePointSlice) Sort ¶

func (self ClusterablePointSlice) Sort()

Sort is a convenience method.

func (ClusterablePointSlice) Swap ¶

func (self ClusterablePointSlice) Swap(i, j int)

type Clusterer ¶

type Clusterer interface {
	Cluster([]ClusterablePoint) [][]ClusterablePoint
}

type ConcurrentQueue_InsertOnly ¶

type ConcurrentQueue_InsertOnly struct {
	Head unsafe.Pointer
	Size uint64
}

func NewConcurrentQueue_InsertOnly ¶

func NewConcurrentQueue_InsertOnly() *ConcurrentQueue_InsertOnly

func (*ConcurrentQueue_InsertOnly) Add ¶

func (self *ConcurrentQueue_InsertOnly) Add(value uint)

func (*ConcurrentQueue_InsertOnly) Slice ¶

func (self *ConcurrentQueue_InsertOnly) Slice() []uint

type DBSCANClusterer ¶

type DBSCANClusterer struct {
	MinPts, SortDimensionIndex int
	AutoSelectDimension        bool
	// contains filtered or unexported fields
}

func NewDBSCANClusterer ¶

func NewDBSCANClusterer(eps float64, minPts int) *DBSCANClusterer

func (*DBSCANClusterer) BuildNeighborhoodMap ¶

func (this *DBSCANClusterer) BuildNeighborhoodMap(data []ClusterablePoint) []*ConcurrentQueue_InsertOnly

func (*DBSCANClusterer) CalcDistance ¶

func (this *DBSCANClusterer) CalcDistance(aPoint, bPoint []float64) float64

func (*DBSCANClusterer) Cluster ¶

func (this *DBSCANClusterer) Cluster(data []ClusterablePoint) [][]ClusterablePoint

* step 1: sort data by a dimension step 2: slide through sorted data (in parallel), and compute all points in range of eps (everything above eps is definitely isn't directly reachable) step 3: build neighborhood map & proceed DFS *

func (*DBSCANClusterer) GetEps ¶

func (this *DBSCANClusterer) GetEps() float64

func (*DBSCANClusterer) PredictDimensionByMaxVariance ¶

func (this *DBSCANClusterer) PredictDimensionByMaxVariance(data []ClusterablePoint) int

*

Calculate variance for each dimension (in parallel), returns dimension index with max variance

func (*DBSCANClusterer) SetEps ¶

func (this *DBSCANClusterer) SetEps(eps float64)

type NamedPoint ¶

type NamedPoint struct {
	Name  string
	Point []float64
}

func NewNamedPoint ¶

func NewNamedPoint(name string, point []float64) *NamedPoint

func (*NamedPoint) Copy ¶

func (self *NamedPoint) Copy() *NamedPoint

func (*NamedPoint) GetPoint ¶

func (self *NamedPoint) GetPoint() []float64

func (*NamedPoint) String ¶

func (self *NamedPoint) String() string

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL