Documentation ¶
Overview ¶
Package analysis assembles an analysis from individual stages.
Index ¶
- Constants
- Variables
- func LinkStages(stages []*Stage, options *Options) (*expr.Pipeline, chan<- []byte, interface{})
- type Analysis
- type Count
- type Field
- type Fields
- type Histogram
- type HourHistogram
- type Interval
- type Intervals
- type LengthExtremes
- type Options
- type PipelineFactory
- type Processor
- type Stage
- type Type
- type Types
- type ValueExtremes
- type ValueFreq
- type ValueFreqSlice
- type WeekdayHistogram
Constants ¶
const ArrayItemMark = "[]"
ArrayItemMark represents an array item in a full field name
const NameSeparator = "."
NameSeparator separates the names of nested fields
Variables ¶
var (
	BsonId                  string
	BsonFieldType           string
	BsonCount               string
	BsonCountUnique         string
	BsonValueExtremes       string
	BsonMinValue            string
	BsonMaxValue            string
	BsonAvgValue            string
	BsonLengthExtremes      string
	BsonMinLength           string
	BsonMaxLength           string
	BsonAvgLength           string
	BsonTopNValues          string
	BsonBottomNValues       string
	BsonValueFreqValue      string
	BsonValueFreqCount      string
	BsonValueHistogram      string
	BsonLengthHistogram     string
	BsonWeekdayHistogram    string
	BsonHourHistogram       string
	BsonHistogramStart      string
	BsonHistogramEnd        string
	BsonHistogramRange      string
	BsonHistogramStep       string
	BsonHistogramNumOfSteps string
	BsonHistogramIntervals  string
	BsonIntervalValue       string
	BsonIntervalCount       string
	JsonHistogramStart      string
	JsonHistogramEnd        string
	JsonHistogramRange      string
	JsonHistogramStep       string
	JsonHistogramNumOfSteps string
	JsonHistogramIntervals  string
	YamlHistogramStart      string
	YamlHistogramEnd        string
	YamlHistogramRange      string
	YamlHistogramStep       string
	YamlHistogramNumOfSteps string
	YamlHistogramIntervals  string
)
Abbreviations for the aggregation pipeline.
var AggregationMinVersion = []int{3, 5, 6}
AggregationMinVersion is the minimal MongoDB version that allows analysis using the aggregation framework
var RandomSampleMinVersion = []int{3, 2, 0}
RandomSampleMinVersion is the minimal MongoDB version that allows analysis using random samples
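Client code typically compares the server's version against these variables before choosing a stage implementation. A minimal sketch of such a lexicographic comparison (versionAtLeast is a hypothetical helper, not part of this package):

```go
package main

import "fmt"

// versionAtLeast reports whether version v satisfies the minimum min,
// comparing the slices lexicographically (major, minor, patch).
// Missing components of v are treated as zero.
func versionAtLeast(v, min []int) bool {
	for i := 0; i < len(min); i++ {
		var x int
		if i < len(v) {
			x = v[i]
		}
		if x != min[i] {
			return x > min[i]
		}
	}
	return true
}

func main() {
	aggregationMinVersion := []int{3, 5, 6} // mirrors analysis.AggregationMinVersion
	serverVersion := []int{4, 0, 2}
	fmt.Println(versionAtLeast(serverVersion, aggregationMinVersion)) // true
}
```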
var TypesSort = map[string]int{
"null": 1,
"undefined": 2,
"bool": 3,
"int": 4,
"long": 5,
"double": 6,
"decimal": 7,
"objectId": 8,
"dbPointer": 9,
"symbol": 10,
"string": 11,
"regex": 12,
"javascript": 13,
"javascriptWithScope": 14,
"binData": 15,
"date": 16,
"timestamp": 17,
"minKey": 18,
"maxKey": 19,
"array": 20,
"object": 21,
}
TypesSort defines the sort order of types in the final results.
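For illustration, a slice of type names can be ordered with this map and sort.Slice. The map below is a trimmed local copy of a few TypesSort entries, not the package variable itself:

```go
package main

import (
	"fmt"
	"sort"
)

// typesSort is a local excerpt of the ordering behind analysis.TypesSort.
var typesSort = map[string]int{
	"null": 1, "bool": 3, "int": 4, "double": 6,
	"string": 11, "date": 16, "array": 20, "object": 21,
}

// sortByType sorts type names in place by their rank in typesSort.
func sortByType(types []string) []string {
	sort.Slice(types, func(i, j int) bool {
		return typesSort[types[i]] < typesSort[types[j]]
	})
	return types
}

func main() {
	fmt.Println(sortByType([]string{"string", "int", "object", "null"})) // [null int string object]
}
```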
Functions ¶
func LinkStages ¶
func LinkStages(stages []*Stage, options *Options) (*expr.Pipeline, chan<- []byte, interface{})
Types ¶
type Analysis ¶
type Analysis struct {
// contains filtered or unexported fields
}
Analysis consists of the options, the target collection, and the four consecutive stages.
func (*Analysis) Run ¶
func (a *Analysis) Run() interface{}
Run the analysis on the selected collection.
func (*Analysis) SetCollection ¶
func (a *Analysis) SetCollection(c *mgo.Collection)
SetCollection sets the target collection.
func (*Analysis) SetExpandStage ¶
SetExpandStage sets the expand stage of the analysis. The expand stage expands documents into individual values (result: [name, type] => value). In this stage the fields lose their link to the original document, which allows all values of a given field to be analyzed in the next stages.
func (*Analysis) SetGroupStage ¶
SetGroupStage sets the group stage of the analysis. The group stage groups the values from the expand stage under the same name and type (result: [name, type] => value aggregation). This stage computes all statistics over the data.
func (*Analysis) SetMergeStage ¶
SetMergeStage sets the merge stage of the analysis. The merge stage merges different types of the same field (result: [name] => types aggregation).
func (*Analysis) SetSampleStage ¶
SetSampleStage sets the sample stage of the analysis. The task of the sample stage is to select the desired sample of documents from the collection and pass it to the next stages of the analysis.
type Field ¶
type Field struct {
	Name  string `json:"name" yaml:"name" bson:"n"`
	Level uint   `json:"level" yaml:"level"`
	Count uint64 `json:"count" yaml:"count" bson:"c"`
	Types Types  `json:"types" yaml:"types" bson:"T"`
}
Field - analysis results for one document field.
type Histogram ¶
type Histogram struct {
	Start         interface{} `json:"start" yaml:"start" bson:"sta"`          // minimal value rounded down with Step
	End           interface{} `json:"end" yaml:"end" bson:"end"`              // maximal value rounded up with Step
	Range         float64     `json:"range" yaml:"range" bson:"r"`            // (end-start) converted to float64
	Step          float64     `json:"step" yaml:"step" bson:"s"`              // size of one interval, rounded to 1, 0.5, 0.25, 0.2, 0.1, 0.05, 0.025, ...
	NumberOfSteps uint        `json:"numOfSteps" yaml:"numOfSteps" bson:"ns"` // total number of steps
	Intervals     Intervals   `json:"intervals" yaml:"intervals" bson:"it"`   // values, interval => count
}
Histogram of values of one specific type of a specific document field.
func (Histogram) MarshalJSON ¶
MarshalJSON - converts the intervals to []Count. The key is an interval, and empty intervals are added.
func (Histogram) MarshalYAML ¶
MarshalYAML - converts the intervals to []Count. The key is an interval, and empty intervals are added.
type HourHistogram ¶
type HourHistogram [24]Count
HourHistogram (0 - 23).
func (HourHistogram) MarshalYAML ¶
func (hh HourHistogram) MarshalYAML() (interface{}, error)
MarshalYAML - converts the array to a slice; for an unknown reason, yaml.Marshal cannot handle arrays.
type Interval ¶
type Interval struct {
	Interval uint  `bson:"i"` // Interval in histogram. From 0 to (NumberOfSteps - 1)
	Count    Count `bson:"c"` // Number of items belonging to the interval
}
Interval and count of values that belong to it.
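Given a Histogram's Start and Step, the interval index of a value is the integer part of (value - start) / step. A small sketch of that relationship (intervalIndex is an illustrative helper, not part of the package):

```go
package main

import "fmt"

// intervalIndex computes which histogram interval a value falls into,
// given the histogram's start and step. This mirrors the layout described
// by the Histogram and Interval types; it is not the package's own code.
func intervalIndex(value, start, step float64) uint {
	return uint((value - start) / step)
}

func main() {
	// A histogram from 0 to 100 with step 10 has 10 intervals, indexed 0-9.
	fmt.Println(intervalIndex(42, 0, 10))   // 4
	fmt.Println(intervalIndex(99.5, 0, 10)) // 9
}
```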
type LengthExtremes ¶
type LengthExtremes struct {
	Min uint    `json:"min" yaml:"min" bson:"il"`
	Max uint    `json:"max" yaml:"max" bson:"al"`
	Avg float64 `json:"avg,omitempty" yaml:"avg,omitempty" bson:"gl"`
}
LengthExtremes - Min, Max, Avg length.
type Options ¶
type Options struct {
	Location    *time.Location // time location for calculations with dates
	Concurrency int            // number of parallel processes for local calculations
	BufferSize  int            // buffer size between phases
	BatchSize   int            // number of documents in one batch from the database
}
Options for all stages of analysis.
type PipelineFactory ¶
PipelineFactory generates a pipeline according to the analysis options.
type Processor ¶
type Processor func(inputCh interface{}, options *Options) interface{}
A Processor function receives a channel from the previous stage as its input. The return value is an output channel that is fed to the next stage. The input of the first stage is raw (binary) data from the database. The output of the last stage carries the final results.
type Stage ¶
type Stage struct {
	PipelineFactory PipelineFactory
	Processor       Processor
}
A Stage can be represented by a pipeline that runs in the database or by a processor function that runs locally. If both are used, the pipeline results are passed to the input of the processor function.
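The channel-based chaining described above can be sketched with concrete channel types in place of the package's interface{} signatures (processor, link, and double below are illustrative, not part of the package):

```go
package main

import "fmt"

// processor mirrors the shape of analysis.Processor, but with concrete
// channel types for readability: it consumes an input channel and
// returns an output channel that feeds the next stage.
type processor func(in <-chan int) <-chan int

// link wires processors together so each stage's output becomes the
// next stage's input, roughly the role LinkStages plays in the package.
func link(in <-chan int, stages ...processor) <-chan int {
	ch := in
	for _, p := range stages {
		ch = p(ch)
	}
	return ch
}

// double is a toy stage that multiplies every value by two.
func double(in <-chan int) <-chan int {
	out := make(chan int)
	go func() {
		defer close(out)
		for v := range in {
			out <- v * 2
		}
	}()
	return out
}

func main() {
	in := make(chan int)
	go func() {
		for i := 1; i <= 3; i++ {
			in <- i
		}
		close(in)
	}()
	for v := range link(in, double, double) {
		fmt.Println(v) // 4, 8, 12
	}
}
```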
type Type ¶
type Type struct {
	Name             string            `json:"type" yaml:"type" bson:"t"`
	Count            uint64            `json:"count" yaml:"count" bson:"c"`
	CountUnique      uint64            `json:"unique,omitempty" yaml:"unique,omitempty" bson:"cu,omitempty"`
	ValueExtremes    *ValueExtremes    `json:"value,omitempty" yaml:"value,omitempty" bson:"ve,omitempty"`
	LengthExtremes   *LengthExtremes   `json:"length,omitempty" yaml:"length,omitempty" bson:"le,omitempty"`
	TopNValues       ValueFreqSlice    `json:"top,omitempty" yaml:"top,omitempty" bson:"tv,omitempty"`
	BottomNValues    ValueFreqSlice    `json:"bottom,omitempty" yaml:"bottom,omitempty" bson:"bv,omitempty"`
	ValueHistogram   *Histogram        `json:"valueHistogram,omitempty" yaml:"valueHistogram,omitempty" bson:"vH,omitempty"`
	LengthHistogram  *Histogram        `json:"lengthHistogram,omitempty" yaml:"lengthHistogram,omitempty" bson:"lH,omitempty"`
	WeekdayHistogram *WeekdayHistogram `json:"weekdayHistogram,omitempty" yaml:"weekdayHistogram,omitempty" bson:"wH,omitempty"`
	HourHistogram    *HourHistogram    `json:"hourHistogram,omitempty" yaml:"hourHistogram,omitempty" bson:"hH,omitempty"`
}
Type - analysis results for one type of one document field.
type ValueExtremes ¶
type ValueExtremes struct {
	Min interface{} `json:"min" yaml:"min" bson:"i"`
	Max interface{} `json:"max" yaml:"max" bson:"a"`
	Avg interface{} `json:"avg,omitempty" yaml:"avg,omitempty" bson:"g"`
}
ValueExtremes - Min, Max, Avg value.
type ValueFreq ¶
type ValueFreq struct {
	Value interface{} `json:"value" yaml:"value" bson:"v"`
	Count Count       `json:"count" yaml:"count" bson:"c"`
}
ValueFreq - frequency of one value occurrence.
type ValueFreqSlice ¶
type ValueFreqSlice []ValueFreq
ValueFreqSlice - frequencies of value occurrences.
type WeekdayHistogram ¶
type WeekdayHistogram [7]Count
WeekdayHistogram (0=Sunday ... 6=Saturday).
func (WeekdayHistogram) MarshalYAML ¶
func (wh WeekdayHistogram) MarshalYAML() (interface{}, error)
MarshalYAML - converts the array to a slice; for an unknown reason, yaml.Marshal cannot handle arrays.
Directories ¶
Path | Synopsis
---|---
stages |
01sample | Package sample represents the sample stage of analysis.
01sample/sampleInDB | Package sampleInDB is the implementation of the sampling stage.
01sample/tests | Package sampleTests contains common tests for the sample stage.
02expand | Package expand represents the expand stage of analysis.
02expand/expandInDBCommon | Package expandInDBCommon contains common functions for the expandInDBDepth and expandInDBSeq packages.
02expand/expandInDBDepth | Package expandInDBDepth is the implementation of the expand stage that runs in the database.
02expand/expandInDBSeq | Package expandInDBSeq is the implementation of the expand stage that runs in the database.
02expand/expandLocally | Package expandLocally is the implementation of the expand stage that runs locally.
02expand/tests | Package expandTests contains common tests for the expand stage.
03group | Package group represents the group stage of analysis.
03group/groupInDB | Package groupInDB is the implementation of the group stage that runs in the database.
03group/groupLocally | Package groupLocally is the implementation of the group stage that runs locally.
03group/tests | Package groupTests contains common tests for the group stage.
04merge | Package merge represents the merge stage of analysis.
04merge/mergeInDB | Package mergeInDB is the implementation of the merge stage that runs in the database.
04merge/mergeLocally | Package mergeLocally is the implementation of the merge stage that runs locally.
04merge/tests | Package mergeTests contains common tests for the merge stage.