analysis

Version: v0.0.1

Note: This package is not in the latest version of its module.
Published: May 27, 2017 License: GPL-3.0 Imports: 8 Imported by: 0

Documentation

Overview

Package analysis assembles an analysis from individual stages.

Constants

const ArrayItemMark = "[]"

ArrayItemMark represents an array item in a full field name.

const NameSeparator = "."

NameSeparator separates the names of nested fields.

Variables

var (
	BsonId                  string
	BsonFieldType           string
	BsonCount               string
	BsonCountUnique         string
	BsonValueExtremes       string
	BsonMinValue            string
	BsonMaxValue            string
	BsonAvgValue            string
	BsonLengthExtremes      string
	BsonMinLength           string
	BsonMaxLength           string
	BsonAvgLength           string
	BsonTopNValues          string
	BsonBottomNValues       string
	BsonValueFreqValue      string
	BsonValueFreqCount      string
	BsonValueHistogram      string
	BsonLengthHistogram     string
	BsonWeekdayHistogram    string
	BsonHourHistogram       string
	BsonHistogramStart      string
	BsonHistogramEnd        string
	BsonHistogramRange      string
	BsonHistogramStep       string
	BsonHistogramNumOfSteps string
	BsonHistogramIntervals  string
	BsonIntervalValue       string
	BsonIntervalCount       string

	JsonHistogramStart      string
	JsonHistogramEnd        string
	JsonHistogramRange      string
	JsonHistogramStep       string
	JsonHistogramNumOfSteps string
	JsonHistogramIntervals  string

	YamlHistogramStart      string
	YamlHistogramEnd        string
	YamlHistogramRange      string
	YamlHistogramStep       string
	YamlHistogramNumOfSteps string
	YamlHistogramIntervals  string
)

Abbreviations used in the aggregation pipeline.

var AggregationMinVersion = []int{3, 5, 6}

AggregationMinVersion is the minimal MongoDB version that allows analysis using the aggregation framework.

var RandomSampleMinVersion = []int{3, 2, 0}

RandomSampleMinVersion is the minimal MongoDB version that allows analysis using random samples.
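To illustrate how these version slices might be used, here is a minimal sketch that compares a server version against them component by component. The helper versionAtLeast is hypothetical and not part of this package (mgo's BuildInfo.VersionAtLeast serves a similar purpose).

```go
package main

import "fmt"

// Minimal versions as documented for the analysis package.
var (
	AggregationMinVersion  = []int{3, 5, 6}
	RandomSampleMinVersion = []int{3, 2, 0}
)

// versionAtLeast reports whether version >= minVersion, comparing components
// from left to right; missing components count as 0.
// Hypothetical helper, not part of the analysis package.
func versionAtLeast(version, minVersion []int) bool {
	n := len(version)
	if len(minVersion) > n {
		n = len(minVersion)
	}
	for i := 0; i < n; i++ {
		v, m := 0, 0
		if i < len(version) {
			v = version[i]
		}
		if i < len(minVersion) {
			m = minVersion[i]
		}
		if v != m {
			return v > m
		}
	}
	return true
}

func main() {
	// Server version, e.g. taken from mgo's BuildInfo.VersionArray.
	server := []int{3, 4, 10}
	fmt.Println("aggregation framework:", versionAtLeast(server, AggregationMinVersion))
	fmt.Println("random samples:", versionAtLeast(server, RandomSampleMinVersion))
}
```

With a 3.4.10 server, random samples are available but the aggregation-based analysis is not.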

var TypesSort = map[string]int{
	"null":                1,
	"undefined":           2,
	"bool":                3,
	"int":                 4,
	"long":                5,
	"double":              6,
	"decimal":             7,
	"objectId":            8,
	"dbPointer":           9,
	"symbol":              10,
	"string":              11,
	"regex":               12,
	"javascript":          13,
	"javascriptWithScope": 14,
	"binData":             15,
	"date":                16,
	"timestamp":           17,
	"minKey":              18,
	"maxKey":              19,
	"array":               20,
	"object":              21,
}

TypesSort defines the sort order of types in the final results.
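A minimal sketch of ordering type names by this table, assuming lower ranks come first. The helper sortTypeNames is hypothetical, and placing names that are missing from the table last is this sketch's own choice.

```go
package main

import (
	"fmt"
	"sort"
)

// TypesSort copied from the package documentation.
var TypesSort = map[string]int{
	"null": 1, "undefined": 2, "bool": 3, "int": 4, "long": 5, "double": 6,
	"decimal": 7, "objectId": 8, "dbPointer": 9, "symbol": 10, "string": 11,
	"regex": 12, "javascript": 13, "javascriptWithScope": 14, "binData": 15,
	"date": 16, "timestamp": 17, "minKey": 18, "maxKey": 19, "array": 20,
	"object": 21,
}

// sortTypeNames orders type names by their TypesSort rank. Names that are
// missing from TypesSort go last, alphabetically (an assumption of this sketch).
func sortTypeNames(names []string) {
	sort.Slice(names, func(i, j int) bool {
		ri, okI := TypesSort[names[i]]
		rj, okJ := TypesSort[names[j]]
		switch {
		case okI && okJ:
			return ri < rj
		case okI != okJ:
			return okI // known types before unknown ones
		default:
			return names[i] < names[j]
		}
	})
}

func main() {
	names := []string{"object", "string", "custom", "null", "int"}
	sortTypeNames(names)
	fmt.Println(names) // [null int string object custom]
}
```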

Functions

func LinkStages

func LinkStages(stages []*Stage, options *Options) (*expr.Pipeline, chan<- []byte, interface{})

LinkStages links individual stages together.

Types

type Analysis

type Analysis struct {
	// contains filtered or unexported fields
}

Analysis consists of the options, the target collection, and four consecutive stages.

func NewAnalysis

func NewAnalysis(options *Options) Analysis

NewAnalysis creates a new Analysis with the given options.

func (*Analysis) Run

func (a *Analysis) Run() interface{}

Run runs the analysis on the selected collection.

func (*Analysis) SetCollection

func (a *Analysis) SetCollection(c *mgo.Collection)

SetCollection sets the target collection.

func (*Analysis) SetExpandStage

func (a *Analysis) SetExpandStage(stage *Stage)

SetExpandStage sets the expand stage of the analysis. The expand stage expands documents into individual values (result: [name, type] => value). In this stage, individual fields lose their link to the original document, which allows the next stages to analyze all values of a field together.
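The following sketch shows what expanding one document might look like for documents decoded into Go maps. Everything here is hypothetical: ExpandedValue, typeName, and expand are illustrative names, only a few types are mapped, and naming array items as parent.[] (joining ArrayItemMark with NameSeparator) is an assumption about how full field names are built.

```go
package main

import (
	"fmt"
	"sort"
)

const (
	ArrayItemMark = "[]"
	NameSeparator = "."
)

// ExpandedValue is one output item of the expand stage: [name, type] => value.
// Hypothetical type for this sketch.
type ExpandedValue struct {
	Name  string
	Type  string
	Value interface{}
}

// typeName maps decoded Go values to type names used in TypesSort.
func typeName(v interface{}) string {
	switch v.(type) {
	case nil:
		return "null"
	case bool:
		return "bool"
	case int:
		return "int"
	case int64:
		return "long"
	case float64:
		return "double"
	case string:
		return "string"
	case []interface{}:
		return "array"
	case map[string]interface{}:
		return "object"
	default:
		return "unknown"
	}
}

// sortedKeys returns map keys in sorted order so the output is deterministic.
func sortedKeys(m map[string]interface{}) []string {
	keys := make([]string, 0, len(m))
	for k := range m {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

// expand emits a value for the field itself and recurses into objects and arrays.
func expand(name string, v interface{}, out []ExpandedValue) []ExpandedValue {
	out = append(out, ExpandedValue{Name: name, Type: typeName(v), Value: v})
	switch t := v.(type) {
	case map[string]interface{}:
		for _, k := range sortedKeys(t) {
			out = expand(name+NameSeparator+k, t[k], out)
		}
	case []interface{}:
		for _, item := range t {
			// Assumption: an array item is named parent + "." + "[]".
			out = expand(name+NameSeparator+ArrayItemMark, item, out)
		}
	}
	return out
}

// expandDocument expands all top-level fields of one document.
func expandDocument(doc map[string]interface{}) []ExpandedValue {
	var out []ExpandedValue
	for _, k := range sortedKeys(doc) {
		out = expand(k, doc[k], out)
	}
	return out
}

func main() {
	doc := map[string]interface{}{
		"name":    "Ann",
		"tags":    []interface{}{"a", 1.5},
		"address": map[string]interface{}{"city": "Brno"},
	}
	for _, v := range expandDocument(doc) {
		fmt.Printf("%-14s %-7s %v\n", v.Name, v.Type, v.Value)
	}
}
```

Note how the two items of tags end up under the same name with different types, which is what lets later stages group all values of a field regardless of the document they came from.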

func (*Analysis) SetGroupStage

func (a *Analysis) SetGroupStage(stage *Stage)

SetGroupStage sets the group stage of the analysis. The group stage groups the values from the expand stage under the same name and type (result: [name, type] => value aggregation). This stage computes all statistics over the data.
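A minimal sketch of grouping expanded values by [name, type] and computing a few statistics (count, unique count, minimum, maximum, average). ExpandedValue, GroupStats, and group are hypothetical; the real stage may compute these in the database instead, and using fmt.Sprint as the uniqueness key is an assumption of this sketch.

```go
package main

import "fmt"

// ExpandedValue is one item produced by the expand stage (hypothetical type).
type ExpandedValue struct {
	Name  string
	Type  string
	Value interface{}
}

// groupKey identifies one [name, type] group.
type groupKey struct{ Name, Type string }

// GroupStats holds a few of the statistics a group stage might compute.
type GroupStats struct {
	Count, CountUnique uint64
	Min, Max           float64 // only filled for float64 values
	sum                float64
	numeric            uint64
	unique             map[string]struct{}
}

// Avg returns the average of the numeric values, or 0 if there are none.
func (g *GroupStats) Avg() float64 {
	if g.numeric == 0 {
		return 0
	}
	return g.sum / float64(g.numeric)
}

// group aggregates values under the same name and type.
func group(values []ExpandedValue) map[groupKey]*GroupStats {
	groups := make(map[groupKey]*GroupStats)
	for _, v := range values {
		k := groupKey{v.Name, v.Type}
		g, ok := groups[k]
		if !ok {
			g = &GroupStats{unique: make(map[string]struct{})}
			groups[k] = g
		}
		g.Count++
		// fmt.Sprint gives a string key even for values that are not hashable.
		g.unique[fmt.Sprint(v.Value)] = struct{}{}
		g.CountUnique = uint64(len(g.unique))
		if f, ok := v.Value.(float64); ok {
			if g.numeric == 0 || f < g.Min {
				g.Min = f
			}
			if g.numeric == 0 || f > g.Max {
				g.Max = f
			}
			g.sum += f
			g.numeric++
		}
	}
	return groups
}

func main() {
	values := []ExpandedValue{
		{"price", "double", 3.0}, {"price", "double", 5.0},
		{"price", "double", 5.0}, {"price", "double", 10.0},
		{"name", "string", "a"}, {"name", "string", "b"},
	}
	for k, g := range group(values) {
		fmt.Printf("%s/%s: count=%d unique=%d min=%v max=%v avg=%v\n",
			k.Name, k.Type, g.Count, g.CountUnique, g.Min, g.Max, g.Avg())
	}
}
```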

func (*Analysis) SetMergeStage

func (a *Analysis) SetMergeStage(stage *Stage)

SetMergeStage sets the merge stage of the analysis. The merge stage merges the different types of the same field (result: [name] => types aggregation).
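A sketch of the merge step, assuming that a field's count is the sum of the counts of its types. GroupedType and merge are hypothetical; Type and Field are trimmed copies of the documented structs.

```go
package main

import (
	"fmt"
	"sort"
)

// Type and Field are trimmed copies of the documented result types.
type Type struct {
	Name  string
	Count uint64
}

type Field struct {
	Name  string
	Count uint64
	Types []*Type
}

// GroupedType is one group-stage result: [name, type] => aggregation (hypothetical).
type GroupedType struct {
	Field string
	Type  Type
}

// merge collects all types of the same field; the field count is the sum of
// its type counts (an assumption of this sketch).
func merge(groups []GroupedType) []*Field {
	byName := make(map[string]*Field)
	for _, g := range groups {
		f, ok := byName[g.Field]
		if !ok {
			f = &Field{Name: g.Field}
			byName[g.Field] = f
		}
		t := g.Type // copy, so each *Type points to its own value
		f.Types = append(f.Types, &t)
		f.Count += t.Count
	}
	fields := make([]*Field, 0, len(byName))
	for _, f := range byName {
		fields = append(fields, f)
	}
	sort.Slice(fields, func(i, j int) bool { return fields[i].Name < fields[j].Name })
	return fields
}

func main() {
	fields := merge([]GroupedType{
		{"age", Type{"int", 7}},
		{"age", Type{"string", 2}},
		{"name", Type{"string", 9}},
	})
	for _, f := range fields {
		fmt.Printf("%s: count=%d types=%d\n", f.Name, f.Count, len(f.Types))
	}
}
```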

func (*Analysis) SetSampleStage

func (a *Analysis) SetSampleStage(stage *Stage)

SetSampleStage sets the sample stage of the analysis. The sample stage selects the desired sample of documents from the collection and passes them to the next stages of the analysis.

type Count

type Count uint

Count of intervals or values.

type Field

type Field struct {
	Name  string `json:"name"   yaml:"name"    bson:"n"`
	Level uint   `json:"level"  yaml:"level"`
	Count uint64 `json:"count"  yaml:"count"   bson:"c"`
	Types Types  `json:"types"  yaml:"types"   bson:"T"`
}

Field holds the analysis results for one document field.

type Fields

type Fields []*Field

Fields is the result of the analysis.

func (Fields) Len

func (r Fields) Len() int

func (Fields) Less

func (r Fields) Less(i, j int) bool

func (Fields) Swap

func (r Fields) Swap(i, j int)

type Histogram

type Histogram struct {
	Start         interface{} `json:"start"      yaml:"start"       bson:"sta"` // minimal value rounded down with Step
	End           interface{} `json:"end"        yaml:"end"         bson:"end"` // maximal value rounded up with Step
	Range         float64     `json:"range"      yaml:"range"       bson:"r"`   // (end-start) converted to float64
	Step          float64     `json:"step"       yaml:"step"        bson:"s"`   // size of one interval, rounded to 1, 0.5, 0.25, 0.2, 0.1, 0.05, 0.025, ...
	NumberOfSteps uint        `json:"numOfSteps" yaml:"numOfSteps"  bson:"ns"`  // total number of steps
	Intervals     Intervals   `json:"intervals"  yaml:"intervals"   bson:"it"`  // values, interval => count
}

Histogram of the values of one specific type of one document field.
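The Step comment suggests steps of the form 1, 2, 2.5 or 5 times a power of ten. The sketch below shows one way to derive Start, End, Step, and NumberOfSteps from the minimum and maximum values; the function histogramBounds and its maxSteps parameter are assumptions, not the package's algorithm.

```go
package main

import (
	"fmt"
	"math"
)

// histogramBounds picks a "nice" step of the form {1, 2, 2.5, 5} * 10^k,
// rounds lo down and hi up to multiples of the step, and returns Start, End,
// Step and NumberOfSteps. Larger steps are tried until the number of steps
// fits into maxSteps. Hypothetical helper; maxSteps is an assumed parameter.
func histogramBounds(lo, hi float64, maxSteps int) (start, end, step float64, numSteps uint) {
	if maxSteps < 1 {
		maxSteps = 1
	}
	raw := (hi - lo) / float64(maxSteps)
	if raw <= 0 {
		raw = 1 // all values are equal: fall back to a unit step
	}
	base := math.Pow(10, math.Floor(math.Log10(raw)))
	if base*10 <= raw {
		base *= 10 // guard against Log10 rounding just below an integer
	}
	for _, m := range []float64{1, 2, 2.5, 5, 10, 20, 25, 50} {
		step = m * base
		if step < raw {
			continue
		}
		start = math.Floor(lo/step) * step
		end = math.Ceil(hi/step) * step
		if end == start {
			end = start + step // keep at least one interval
		}
		numSteps = uint(math.Round((end - start) / step))
		if numSteps <= uint(maxSteps) {
			return
		}
	}
	return // last candidate if none fit (only for very small maxSteps)
}

func main() {
	start, end, step, n := histogramBounds(0.5, 10.5, 10)
	fmt.Println("start:", start, "end:", end, "step:", step, "steps:", n, "range:", end-start)
}
```

Rounding Start down and End up can add one interval on each side, which is why a step that divides the raw range exactly (1 for 0.5 to 10.5) may still be rejected in favor of the next nice step.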

func (Histogram) MarshalJSON

func (h Histogram) MarshalJSON() ([]byte, error)

MarshalJSON converts the intervals to a []Count in which the index is the interval and empty intervals are included.

func (Histogram) MarshalYAML

func (h Histogram) MarshalYAML() (interface{}, error)

MarshalYAML converts the intervals to a []Count in which the index is the interval and empty intervals are included.

func (*Histogram) SetBSON

func (h *Histogram) SetBSON(raw bson.Raw) error

SetBSON handles specific types, such as bson.Decimal, ...

type HourHistogram

type HourHistogram [24]Count

HourHistogram counts values by hour of the day (0-23).

func (HourHistogram) MarshalYAML

func (hh HourHistogram) MarshalYAML() (interface{}, error)

MarshalYAML converts the array to a slice, because yaml.Marshal, for an unknown reason, cannot handle arrays.

func (*HourHistogram) SetBSON

func (hh *HourHistogram) SetBSON(raw bson.Raw) error

SetBSON parses the histogram into an array.

type Interval

type Interval struct {
	Interval uint  `bson:"i"` // Interval in histogram. From 0 to (NumberOfSteps - 1)
	Count    Count `bson:"c"` // Number of items belonging to the interval
}

Interval and count of values that belong to it.

func (*Interval) SetBSON

func (i *Interval) SetBSON(raw bson.Raw) error

SetBSON handles specific types, such as bson.Decimal, ...

type Intervals

type Intervals []*Interval

Intervals in histogram.

func (Intervals) Len

func (hv Intervals) Len() int

func (Intervals) Less

func (hv Intervals) Less(i, j int) bool

func (Intervals) Swap

func (hv Intervals) Swap(i, j int)

type LengthExtremes

type LengthExtremes struct {
	Min uint    `json:"min"           yaml:"min"             bson:"il"`
	Max uint    `json:"max"           yaml:"max"             bson:"al"`
	Avg float64 `json:"avg,omitempty" yaml:"avg,omitempty"   bson:"gl"`
}

LengthExtremes holds the minimum, maximum, and average length.

type Options

type Options struct {
	Location    *time.Location // time location for calculations with dates
	Concurrency int            // number of parallel processes for local calculations
	BufferSize  int            // buffer size between phases
	BatchSize   int            // number of documents in one batch from the database
}

Options for all stages of analysis.

type PipelineFactory

type PipelineFactory func(analysisOptions *Options) *expr.Pipeline

PipelineFactory generates a pipeline according to the analysis options.

type Processor

type Processor func(inputCh interface{}, options *Options) interface{}

A Processor function receives the channel from the previous stage as its input and returns an output channel that is fed to the next stage. The input of the first stage is raw (binary) data from the database; the output of the last stage holds the final results.
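A minimal sketch of chaining processors with this signature. The helpers chain and mapInts are hypothetical, ints stand in for the raw database data, and only BufferSize is kept from Options; this is an illustration of the channel hand-off, not the package's LinkStages.

```go
package main

import "fmt"

// Trimmed copy of the documented Options (only BufferSize is kept).
type Options struct {
	BufferSize int // buffer size between phases
}

type Processor func(inputCh interface{}, options *Options) interface{}

// chain feeds the output channel of each processor into the next one and
// returns the output of the last processor. Hypothetical helper.
func chain(input interface{}, options *Options, processors ...Processor) interface{} {
	ch := input
	for _, p := range processors {
		ch = p(ch, options)
	}
	return ch
}

// mapInts builds a Processor that applies f to every int it receives.
func mapInts(f func(int) int) Processor {
	return func(inputCh interface{}, options *Options) interface{} {
		in := inputCh.(<-chan int)
		out := make(chan int, options.BufferSize)
		go func() {
			defer close(out) // closing tells the next stage that input is exhausted
			for v := range in {
				out <- f(v)
			}
		}()
		return (<-chan int)(out)
	}
}

func runExample() []int {
	src := make(chan int)
	go func() {
		defer close(src)
		for _, v := range []int{1, 2, 3} {
			src <- v
		}
	}()
	double := mapInts(func(v int) int { return v * 2 })
	addOne := mapInts(func(v int) int { return v + 1 })
	out := chain((<-chan int)(src), &Options{BufferSize: 4}, double, addOne).(<-chan int)
	var results []int
	for v := range out {
		results = append(results, v)
	}
	return results
}

func main() {
	fmt.Println(runExample()) // [3 5 7]
}
```

Because every processor runs in its own goroutine and communicates only through buffered channels, all stages can work concurrently on different items.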

type Stage

type Stage struct {
	PipelineFactory PipelineFactory
	Processor       Processor
}

Stage can be represented by a pipeline that runs in the database or by a processor function that runs locally. If both are used, then the pipeline results are passed to the input of the processor function.

type Type

type Type struct {
	Name             string            `json:"type"                       yaml:"type"                        bson:"t"`
	Count            uint64            `json:"count"                      yaml:"count"                       bson:"c"`
	CountUnique      uint64            `json:"unique,omitempty"           yaml:"unique,omitempty"            bson:"cu,omitempty"`
	ValueExtremes    *ValueExtremes    `json:"value,omitempty"            yaml:"value,omitempty"             bson:"ve,omitempty"`
	LengthExtremes   *LengthExtremes   `json:"length,omitempty"           yaml:"length,omitempty"            bson:"le,omitempty"`
	TopNValues       ValueFreqSlice    `json:"top,omitempty"              yaml:"top,omitempty"               bson:"tv,omitempty"`
	BottomNValues    ValueFreqSlice    `json:"bottom,omitempty"           yaml:"bottom,omitempty"            bson:"bv,omitempty"`
	ValueHistogram   *Histogram        `json:"valueHistogram,omitempty"   yaml:"valueHistogram,omitempty"    bson:"vH,omitempty"`
	LengthHistogram  *Histogram        `json:"lengthHistogram,omitempty"  yaml:"lengthHistogram,omitempty"   bson:"lH,omitempty"`
	WeekdayHistogram *WeekdayHistogram `json:"weekdayHistogram,omitempty" yaml:"weekdayHistogram,omitempty"  bson:"wH,omitempty"`
	HourHistogram    *HourHistogram    `json:"hourHistogram,omitempty"    yaml:"hourHistogram,omitempty"     bson:"hH,omitempty"`
}

Type holds the analysis results for one type of one document field.

type Types

type Types []*Type

Types of one document field.

func (Types) Len

func (s Types) Len() int

func (Types) Less

func (s Types) Less(i, j int) bool

func (Types) Swap

func (s Types) Swap(i, j int)

type ValueExtremes

type ValueExtremes struct {
	Min interface{} `json:"min"           yaml:"min"             bson:"i"`
	Max interface{} `json:"max"           yaml:"max"             bson:"a"`
	Avg interface{} `json:"avg,omitempty" yaml:"avg,omitempty"   bson:"g"`
}

ValueExtremes holds the minimum, maximum, and average value.

type ValueFreq

type ValueFreq struct {
	Value interface{} `json:"value"        yaml:"value"           bson:"v"`
	Count Count       `json:"count"        yaml:"count"           bson:"c"`
}

ValueFreq is the number of occurrences of one value.

type ValueFreqSlice

type ValueFreqSlice []ValueFreq

ValueFreqSlice lists the occurrence frequencies of values.
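A sketch of how the top and bottom N frequencies (TopNValues, BottomNValues) could be computed for string values. The functions topBottomN and sortFreq are hypothetical, and breaking ties by value is an assumption made to keep the output deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Trimmed copies of the documented types.
type Count uint

type ValueFreq struct {
	Value interface{}
	Count Count
}

type ValueFreqSlice []ValueFreq

// sortFreq orders by count (descending or ascending), then by value.
func sortFreq(s ValueFreqSlice, descending bool) {
	sort.Slice(s, func(i, j int) bool {
		if s[i].Count != s[j].Count {
			if descending {
				return s[i].Count > s[j].Count
			}
			return s[i].Count < s[j].Count
		}
		return s[i].Value.(string) < s[j].Value.(string)
	})
}

// topBottomN returns the n most and the n least frequent values.
func topBottomN(values []string, n int) (top, bottom ValueFreqSlice) {
	counts := make(map[string]Count)
	for _, v := range values {
		counts[v]++
	}
	all := make(ValueFreqSlice, 0, len(counts))
	for v, c := range counts {
		all = append(all, ValueFreq{Value: v, Count: c})
	}
	if n < 0 {
		n = 0
	}
	if n > len(all) {
		n = len(all)
	}
	sortFreq(all, true)
	top = append(ValueFreqSlice(nil), all[:n]...) // copy before re-sorting
	sortFreq(all, false)
	bottom = append(ValueFreqSlice(nil), all[:n]...)
	return top, bottom
}

func main() {
	top, bottom := topBottomN([]string{"a", "b", "a", "c", "a", "b", "d"}, 2)
	fmt.Println("top:", top)
	fmt.Println("bottom:", bottom)
}
```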

type WeekdayHistogram

type WeekdayHistogram [7]Count

WeekdayHistogram counts values by day of the week (0=Sunday ... 6=Saturday).

func (WeekdayHistogram) MarshalYAML

func (wh WeekdayHistogram) MarshalYAML() (interface{}, error)

MarshalYAML converts the array to a slice, because yaml.Marshal, for an unknown reason, cannot handle arrays.

func (*WeekdayHistogram) SetBSON

func (wh *WeekdayHistogram) SetBSON(raw bson.Raw) error

SetBSON parses the histogram into an array.

Directories

Path Synopsis
stages
01sample
Package sample represents the sample stage of analysis.
01sample/sampleInDB
Package sampleInDB is the implementation of the sampling stage.
01sample/tests
Package sampleTests contains common tests for the sample stage.
02expand
Package expand represents the expand stage of analysis.
02expand/expandInDBCommon
Package expandInDBCommon contains common functions for the expandInDBDepth and expandInDBSeq packages.
02expand/expandInDBDepth
Package expandInDBDepth is the implementation of the expand stage that runs in the database.
02expand/expandInDBSeq
Package expandInDBSeq is the implementation of the expand stage that runs in the database.
02expand/expandLocally
Package expandLocally is the implementation of the expand stage that runs locally.
02expand/tests
Package expandTests contains common tests for the expand stage.
03group
Package group represents the group stage of analysis.
03group/groupInDB
Package groupInDB is the implementation of the group stage that runs in the database.
03group/groupLocally
Package groupLocally is the implementation of the group stage that runs locally.
03group/tests
Package groupTests contains common tests for the group stage.
04merge
Package merge represents the merge stage of analysis.
04merge/mergeInDB
Package mergeInDB is the implementation of the merge stage that runs in the database.
04merge/mergeLocally
Package mergeLocally is the implementation of the merge stage that runs locally.
04merge/tests
Package mergeTests contains common tests for the merge stage.
