pdk

package module
v0.8.0 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Feb 14, 2018 License: BSD-3-Clause Imports: 25 Imported by: 23

README

pdk

Pilosa Dev Kit - implementation tooling and use case examples are here!

Documentation is here: https://www.pilosa.com/docs/pdk/

Requirements

Installing

We assume you are on a UNIX-like operating system. Otherwise adapt the following instructions for your platform. We further assume that you have a Go development environment set up. You should have $GOPATH/bin on your $PATH for access to installed binaries.

  • go get github.com/pilosa/pdk
  • cd $GOPATH/src/github.com/pilosa/pdk
  • make install

Taxi usecase

To get started immediately, run this:

pdk taxi

This will create and fill an index called taxi, using the short url list in usecase/taxi/urls-short.txt.

If you want to try out the full data set, run this:

pdk taxi -i taxi-big -f usecase/taxi/greenAndYellowUrls.txt

There are a number of other options you can tweak to affect the speed and memory usage of the import (or point it to a remote pilosa instance). Use pdk taxi -help to see all the options.

Note that this url file represents 1+ billion columns of data - depending on your hardware this will probably take well over 3 hours, and consume quite a bit of memory (and CPU). You can make a file with fewer URLs if you just want to get a sample.

After importing, you can try a few example queries at https://github.com/alanbernstein/pilosa-notebooks/blob/master/taxi-use-case.ipynb .

Net usecase

To get started immediately, run this:

pdk net -i en0

which will capture traffic on the interface en0 (see available interfaces with ifconfig).

SSB

The Star Schema Benchmark is a benchmark based on TPC-H but tweaked for a somewhat difference use case. It has been implemented by some big data projects such as https://hortonworks.com/blog/sub-second-analytics-hive-druid/ .

To execute the star schema benchmark with Pilosa, you must.

  1. Generate the SSB data at a particular scale factor.
  2. Import the data into Pilosa.
  3. Run the demo-ssb application for convenience which has all of the SSB queries pre-written.
Generating SSB data

Use https://github.com/electrum/ssb-dbgen.git to generate the raw SSB data. This can be a bit finicky to work with - hit up @tgruben for tips (or maybe he'll update this section 😉.

When generating the data, you have to select a particular scale factor - the size of the generated data will be about 600MB * SF(scale factor), so SF=100 will generate about 60GB of data.

Import data into Pilosa.

Use pdk ssb to import the data into Pilosa. You must specify the directory containing the .tbl files generated in the first step as well as the location of your pilosa cluster. There are a few other options which you can tweak which may help import performance. See pdk ssb -h for more information.

Run demo-ssb

This repo https://github.com/pilosa/demo-ssb.git contains a small Go program which packages up the different queries which comprise the benchmark. Running demo-ssb starts a web server which executes queries against pilosa on your behalf. You can simply run (e.g.) curl localhost:8000/query/1.1 to run an SSB query.

Documentation

Index

Constants

View Source
const (
	BYTE     = 1.0
	KILOBYTE = 1024 * BYTE
	MEGABYTE = 1024 * KILOBYTE
	GIGABYTE = 1024 * MEGABYTE
	TERABYTE = 1024 * GIGABYTE
)

Variables

This section is empty.

Functions

func GetFrames

func GetFrames(body []byte) ([]string, error)

GetFrames interprets body as pql queries and then tries to determine the frame of each. Some queries do not have frames, and the empty string will be returned for these.

func NewPilosaForwarder

func NewPilosaForwarder(phost string, t Translator) *pilosaForwarder

NewPilosaForwarder returns a new pilosaForwarder which forwards all requests to `phost`. It inspects pilosa responses and runs the row ids through the Translator `t` to translate them to whatever they were mapped from.

func NewPilosaProxy

func NewPilosaProxy(host string, client *http.Client) *pilosaProxy

NewPilosaProxy returns a pilosaProxy based on `host` and `client`.

func NexterStartFrom

func NexterStartFrom(s uint64) func(n *Nexter)

NexterStartFrom returns an option which makes a Nexter start from integer "s".

func StartMappingProxy

func StartMappingProxy(bind string, h http.Handler) error

StartMappingProxy listens for incoming http connections on `bind` and and uses h to handle all requests. This function does not return unless there is a problem (like http.ListenAndServe).

Types

type AttrMapper

type AttrMapper struct {
	Mapper  Mapper
	Parsers []Parser
	Fields  []int
}

AttrMapper is a struct for mapping some set of data fields to a value for sending to Pilosa as a SetColumnAttrs query

type BinaryFloatMapper

type BinaryFloatMapper struct {
	Min      float64
	Max      float64
	BitDepth int
	// contains filtered or unexported fields
}

BinaryFloatMapper is a Mapper for float types, mapping to a set of buckets representing the value in a binary sense

func (BinaryFloatMapper) ID

func (m BinaryFloatMapper) ID(fi ...interface{}) (rowIDs []int64, err error)

ID maps floats to binary bit sets

type BinaryIntMapper

type BinaryIntMapper struct {
	Min      int64
	Max      int64
	BitDepth int
	// contains filtered or unexported fields
}

BinaryIntMapper is a Mapper for int types, mapping to a set of buckets representing the value in a binary sense

func (BinaryIntMapper) ID

func (m BinaryIntMapper) ID(ii ...interface{}) (rowIDs []int64, err error)

ID maps floats to binary bit sets

type Bit

type Bit struct {
	// contains filtered or unexported fields
}

type BitMapper

type BitMapper struct {
	Frame   string
	Mapper  Mapper
	Parsers []Parser
	Fields  []int
}

BitMapper is a struct for mapping some set of data fields to a (frame, id) combination for sending to Pilosa as a SetBit query

type BoltTranslator

type BoltTranslator struct {
	Db *bolt.DB
	// contains filtered or unexported fields
}

BoltTranslator is a Translator which stores the two way val/id mapping in boltdb. It only accepts string values to map.

func NewBoltTranslator

func NewBoltTranslator(filename string, frames ...string) (bt *BoltTranslator, err error)

func (*BoltTranslator) BulkAdd

func (bt *BoltTranslator) BulkAdd(frame string, values [][]byte) error

func (*BoltTranslator) Close

func (bt *BoltTranslator) Close() error

func (*BoltTranslator) Get

func (bt *BoltTranslator) Get(frame string, id uint64) (val interface{})

Get returns the previously mapped value to the monotonic id generated from GetID. For BoltTranslator, val will always be a []byte.

func (*BoltTranslator) GetID

func (bt *BoltTranslator) GetID(frame string, val interface{}) (id uint64, err error)

GetID maps val (which must be a byte slice) to a monotonic id.

type BoolMapper

type BoolMapper struct {
}

BoolMapper is a trivial Mapper for boolean types

func (BoolMapper) ID

func (m BoolMapper) ID(bi ...interface{}) (rowIDs []int64, err error)

ID maps a bool to a rowID (identity mapper)

type BucketVLock

type BucketVLock struct {
	// contains filtered or unexported fields
}

func NewBucketVLock

func NewBucketVLock() BucketVLock

func (BucketVLock) Lock

func (b BucketVLock) Lock(val []byte)

func (BucketVLock) Unlock

func (b BucketVLock) Unlock(val []byte)

type Bytes

type Bytes uint64

func (Bytes) String

func (b Bytes) String() string

Returns a human-readable byte string of the form 10M, 12.5K, and so forth. The following units are available:

T: Terabyte
G: Gigabyte
M: Megabyte
K: Kilobyte
B: Byte

The unit that results in the smallest number greater than or equal to 1 is always chosen.

type ChanBitIterator

type ChanBitIterator chan pcli.Bit

func NewChanBitIterator

func NewChanBitIterator() ChanBitIterator

func (ChanBitIterator) NextBit

func (c ChanBitIterator) NextBit() (pcli.Bit, error)

type ChanValIterator

type ChanValIterator chan pcli.FieldValue

func NewChanValIterator

func NewChanValIterator() ChanValIterator

func (ChanValIterator) NextValue

func (c ChanValIterator) NextValue() (pcli.FieldValue, error)

type CustomMapper

type CustomMapper struct {
	Func   func(...interface{}) interface{}
	Mapper Mapper
}

CustomMapper is a Mapper that applies a function to a slice of fields, then applies a simple Mapper to the result of that, returning a rowID. This is a generic way to support mappings which span multiple fields. It is not supported by the importing config system.

func (CustomMapper) ID

func (m CustomMapper) ID(fields ...interface{}) (rowIDs []int64, err error)

ID maps a set of fields using a custom function

type DayOfMonthMapper

type DayOfMonthMapper struct {
}

func (DayOfMonthMapper) ID

func (m DayOfMonthMapper) ID(ti ...interface{}) (rowIDs []int64, err error)

ID maps a timestamp to a day of month bucket (1-31)

type DayOfWeekMapper

type DayOfWeekMapper struct {
}

DayOfWeekMapper is a Mapper for timestamps, mapping the day of week only

func (DayOfWeekMapper) ID

func (m DayOfWeekMapper) ID(ti ...interface{}) (rowIDs []int64, err error)

ID maps a timestamp to a day of week bucket

type Errors

type Errors []error

func (Errors) Error

func (errs Errors) Error() string

type FieldSpec

type FieldSpec struct {
	Name string
	Min  int
	Max  int
}

type FileFragment

type FileFragment struct {
	// contains filtered or unexported fields
}

func NewFileFragment

func NewFileFragment(f *os.File, startLoc, endLoc int64) (*FileFragment, error)

func SplitFileLines

func SplitFileLines(f *os.File, numParts int64) ([]*FileFragment, error)

func (*FileFragment) Close

func (ff *FileFragment) Close() error

func (*FileFragment) Read

func (ff *FileFragment) Read(b []byte) (n int, err error)

type FloatMapper

type FloatMapper struct {
	Buckets []float64 // slice representing bucket intervals [left0 left1 ... leftN-1 rightN-1]
	// contains filtered or unexported fields
}

FloatMapper is a Mapper for float types, mapping to arbitrary buckets

func (FloatMapper) ID

func (m FloatMapper) ID(fi ...interface{}) (rowIDs []int64, err error)

ID maps floats to arbitrary buckets

type FloatParser

type FloatParser struct {
}

FloatParser is a parser for float types

func (FloatParser) Parse

func (p FloatParser) Parse(field string) (result interface{}, err error)

Parse parses a float string to a float64 value

type FrameSpec

type FrameSpec struct {
	Name           string
	CacheType      pcli.CacheType
	CacheSize      uint
	InverseEnabled bool
	Fields         []FieldSpec
}

func NewFieldFrameSpec

func NewFieldFrameSpec(name string, min int, max int) FrameSpec

NewFieldFrameSpec creates a frame which is dedicated to a single BSI field which will have the same name as the frame

func NewRankedFrameSpec

func NewRankedFrameSpec(name string, size int) FrameSpec

type GridMapper

type GridMapper struct {
	Xmin float64
	Xmax float64
	Xres int64
	Ymin float64
	Ymax float64
	Yres int64
	// contains filtered or unexported fields
}

GridMapper is a Mapper for a 2D grid (e.g. small-scale latitude/longitude)

func (GridMapper) ID

func (m GridMapper) ID(xyi ...interface{}) (rowIDs []int64, err error)

ID maps pairs of floats to regular buckets

type GridToFloatMapper

type GridToFloatMapper struct {
	// contains filtered or unexported fields
}

func NewGridToFloatMapper

func NewGridToFloatMapper(gm GridMapper, lfm LinearFloatMapper, gridVals []float64) GridToFloatMapper

func (GridToFloatMapper) ID

func (m GridToFloatMapper) ID(vals ...interface{}) ([]int64, error)

type INexter

type INexter interface {
	Next() uint64
	Last() uint64
}

type IPParser

type IPParser struct {
}

IPParser is a parser for IP addresses

func (IPParser) Parse

func (p IPParser) Parse(field string) (result interface{}, err error)

Parse parses an IP string into a TODO

type ImportClient

type ImportClient struct {
	BufferSize int
	// contains filtered or unexported fields
}

func NewImportClient

func NewImportClient(host, index string, frames []string, bufsize int) *ImportClient

func (*ImportClient) Close

func (ic *ImportClient) Close()

func (*ImportClient) SetBit

func (ic *ImportClient) SetBit(rowID, columnID uint64, frame string)

func (*ImportClient) SetBitTimestamp

func (ic *ImportClient) SetBitTimestamp(rowID, columnID uint64, frame string, timestamp time.Time)

type Index

type Index struct {
	// contains filtered or unexported fields
}

func NewIndex

func NewIndex() *Index

func (*Index) AddBit

func (i *Index) AddBit(frame string, col uint64, row uint64)

func (*Index) AddValue

func (i *Index) AddValue(frame, field string, col uint64, val int64)

func (*Index) Client

func (i *Index) Client() *pcli.Client

func (*Index) Close

func (i *Index) Close() error

type Indexer

type Indexer interface {
	AddBit(frame string, col uint64, row uint64)
	AddValue(frame, field string, col uint64, val int64)
	Close() error
	Client() *pcli.Client
}

func SetupPilosa

func SetupPilosa(hosts []string, index string, frames []FrameSpec, batchsize uint) (Indexer, error)

type Ingester

type Ingester struct {
	ParseConcurrency int
	// contains filtered or unexported fields
}

func NewIngester

func NewIngester(source Source, parser Parrrser, mapper Mapppper, indexer Indexer) *Ingester

func (*Ingester) Run

func (n *Ingester) Run() error

type IntMapper

type IntMapper struct {
	Min int64
	Max int64
	Res int64 // number of bins
	// contains filtered or unexported fields
}

IntMapper is a Mapper for integer types, mapping each int in the range to a row

func (IntMapper) ID

func (m IntMapper) ID(ii ...interface{}) (rowIDs []int64, err error)

ID maps an int range to a rowID range

type IntParser

type IntParser struct {
}

IntParser is a parser for integer types

func (IntParser) Parse

func (p IntParser) Parse(field string) (result interface{}, err error)

Parse parses an integer string to an int64 value

type KeyMapper

type KeyMapper interface {
	MapRequest(body []byte) ([]byte, error)
	MapResult(frame string, res interface{}) (interface{}, error)
}

KeyMapper describes the functionality for mapping the keys contained in requests and responses.

type LevelTranslator

type LevelTranslator struct {
	// contains filtered or unexported fields
}

LevelTranslator is a Translator which stores the two way val/id mapping in leveldb.

func NewLevelTranslator

func NewLevelTranslator(dirname string, frames ...string) (lt *LevelTranslator, err error)

func (*LevelTranslator) Close

func (lt *LevelTranslator) Close() error

func (*LevelTranslator) Get

func (lt *LevelTranslator) Get(frame string, id uint64) (val interface{})

func (*LevelTranslator) GetID

func (lt *LevelTranslator) GetID(frame string, val interface{}) (id uint64, err error)

type LinearFloatMapper

type LinearFloatMapper struct {
	Min   float64
	Max   float64
	Res   float64
	Scale string // linear, logarithmic
	// contains filtered or unexported fields
}

LinearFloatMapper is a Mapper for float types, mapping to regularly spaced buckets TODO: consider defining this in terms of a linear mapping ID = floor(a*value + b)

func (LinearFloatMapper) ID

func (m LinearFloatMapper) ID(fi ...interface{}) (rowIDs []int64, err error)

ID maps floats to regularly spaced buckets

type Mapper

type Mapper interface {
	ID(...interface{}) ([]int64, error)
}

Mapper represents a single method for mapping a specific data type to a slice of row IDs. A data type might be composed of multiple fields (e.g. a 2D point). A data type might use multiple mappers.

type Mapppper

type Mapppper interface {
	Map(parsedRecord interface{}) (PilosaRecord, error)
}

Mapper is the interface for taking parsed records from the Parser and figuring out what bits and values to set in Pilosa. Mappers usually have a Translator and a Nexter for converting arbitrary values to monotonic integer ids and generating column ids respectively.

type MonthMapper

type MonthMapper struct {
}

MonthMapper is a Mapper for timestamps, mapping the month only

func (MonthMapper) ID

func (m MonthMapper) ID(ti ...interface{}) (rowIDs []int64, err error)

ID maps a timestamp to a month bucket (1-12)

type Nexter

type Nexter struct {
	// contains filtered or unexported fields
}

Nexter is a threadsafe monotonic unique id generator

func NewNexter

func NewNexter(opts ...NexterOption) *Nexter

NewNexter creates a new id generator starting at 0 - can be modified by options.

func (*Nexter) Last

func (n *Nexter) Last() (lastID uint64)

Last returns the most recently generated id

func (*Nexter) Next

func (n *Nexter) Next() (nextID uint64)

Next generates a new id and returns it

type NexterOption

type NexterOption func(n *Nexter)

NexterOption can be passed to NewNexter to modify the Nexter's behavior.

type Parrrser

type Parrrser interface {
	Parse(record []byte) (interface{}, error)
}

Parser is the interface for turning raw records from Source into Go objects. Implementations of Parser should be thread safe. The current naming is a temporary workaround until the previous Parser interface can be deprecated.

type Parser

type Parser interface {
	Parse(string) (interface{}, error)
}

Parser represents a single method for parsing a string field to a value

type PilosaImporter

type PilosaImporter interface {
	SetBit(rowID, columnID uint64, frame string)
	SetBitTimestamp(rowID, columnID uint64, frame string, timestamp time.Time)
	Close()
}

type PilosaKeyMapper

type PilosaKeyMapper struct {
	// contains filtered or unexported fields
}

PilosaKeyMapper implements the KeyMapper interface.

func NewPilosaKeyMapper

func NewPilosaKeyMapper(t Translator) *PilosaKeyMapper

func (*PilosaKeyMapper) MapRequest

func (p *PilosaKeyMapper) MapRequest(body []byte) ([]byte, error)

func (*PilosaKeyMapper) MapResult

func (p *PilosaKeyMapper) MapResult(frame string, res interface{}) (mappedRes interface{}, err error)

MapResult converts the result of a single top level query (one element of QueryResponse.Results) to its mapped counterpart.

type PilosaRecord

type PilosaRecord struct {
	Col  uint64
	Bits []SetBit
	Vals []Val
}

PilosaRecord represents a number of set bits and values in a single Column in Pilosa.

type Point

type Point struct {
	X float64
	Y float64
}

Point is a point in a 2D space

type Proxy

type Proxy interface {
	ProxyRequest(orig *http.Request, origbody []byte) (*http.Response, error)
}

Proxy describes the functionality for proxying requests.

type Region

type Region struct {
	Vertices []Point
}

Region is a simple polygonal region of R2 space

type RegionMapper

type RegionMapper struct {
	Regions []Region
	// contains filtered or unexported fields
}

RegionMapper is a Mapper for a set of geometric regions (e.g. neighborhoods or states) TODO: generate regions by reading shapefile

type SetBit

type SetBit struct {
	Frame string
	Row   uint64
	Time  time.Time
}

SetBit represents a bit to set in Pilosa sans column id (which is held by the PilosaRecord containg the SetBit).

type SingleVLock

type SingleVLock struct {
	// contains filtered or unexported fields
}

func NewSingleVLock

func NewSingleVLock() SingleVLock

func (SingleVLock) Lock

func (s SingleVLock) Lock(val []byte)

func (SingleVLock) Unlock

func (s SingleVLock) Unlock(val []byte)

type Source

type Source interface {
	Record() ([]byte, error)
}

Source is the interface for getting raw data one record at a time. Implementations of Source should be thread safe.

type SparseIntMapper

type SparseIntMapper struct {
	Min int64
	Max int64
	Map map[int64]int64
	// contains filtered or unexported fields
}

SparseIntMapper is a Mapper for integer types, mapping only relevant ints

func (SparseIntMapper) ID

func (m SparseIntMapper) ID(ii ...interface{}) (rowIDs []int64, err error)

ID maps arbitrary ints to a rowID range

type StringContainsMapper

type StringContainsMapper struct {
	Matches []string // slice of strings to check for containment
}

StringContainsMapper is a Mapper for string types...

type StringMatchesMapper

type StringMatchesMapper struct {
	Matches []string // slice of strings to check for match
}

StringMatchesMapper is a Mapper for string types...

type StringParser

type StringParser struct {
}

StringParser is a parser for string types

func (StringParser) Parse

func (p StringParser) Parse(field string) (result interface{}, err error)

Parse is an identity parser for strings

type TimeOfDayMapper

type TimeOfDayMapper struct {
	Res int64
}

TimeOfDayMapper is a Mapper for timestamps, mapping the time component only TODO: consider putting all time buckets in same frame pros: single frame cons: would have to abandon the simple ID interface. also single frame may not be a good thing

func (TimeOfDayMapper) ID

func (m TimeOfDayMapper) ID(ti ...interface{}) (rowIDs []int64, err error)

ID maps a timestamp to a time of day bucket

type TimeParser

type TimeParser struct {
	Layout string
}

TimeParser is a parser for timestamps

func (TimeParser) Parse

func (p TimeParser) Parse(field string) (result interface{}, err error)

Parse parses a timestamp string to a time.Time value

type Translator

type Translator interface {
	// Get must be safe for concurrent access
	Get(frame string, id uint64) interface{}
	GetID(frame string, val interface{}) (uint64, error)
}

Translator describes the functionality which the proxy server requires to translate row ids to what they were mapped from.

type Val

type Val struct {
	Frame string
	Field string
	Value int64
}

Val represents a BSI value to set in a Pilosa field sans column id (which is held by the PilosaRecord containg the Val).

type ValueLocker

type ValueLocker interface {
	Lock(val []byte)
	Unlock(val []byte)
}

type YearMapper

type YearMapper struct {
	MinYear int64 // TODO? use this to eliminate empty rows for year < 2000 or whatever
}

YearMapper is a Mapper for timestamps, mapping the year only

func (YearMapper) ID

func (m YearMapper) ID(ti ...interface{}) (rowIDs []int64, err error)

ID maps a timestamp to a year bucket

Directories

Path Synopsis
cmd
pdk
usecase
ssb

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL