common

package
v0.0.2 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Sep 5, 2019 License: Apache-2.0 Imports: 18 Imported by: 27

Documentation

Index

Constants

View Source
const (
	BucketSize = 8

	// number of hash functions
	NumHashes = 1 << log2NumHashes

	RecordIDBytes = int(unsafe.Sizeof(RecordID{}))
)
View Source
const SizeOfGeoPoint = unsafe.Sizeof(GeoPointGo{})

SizeOfGeoPoint is the size of GeoPointGo in memory

View Source
const VectorPartyHeader uint32 = 0xFADEFACE

VectorPartyHeader is the magic header written into the beginning of each vector party file.

Variables

DataTypeName returns the literal name of the data type.

View Source
var NullDataValue = DataValue{}

NullDataValue is a global data value that stands a null value where the newly added columns haven't received any data.

StringToDataType maps string representation to DataType

Functions

func AdditionUpdate added in v0.0.2

func AdditionUpdate(oldValue, newValue unsafe.Pointer, dataType DataType)

func ArrayLengthCompare added in v0.0.2

func ArrayLengthCompare(v1, v2 *DataValue) int

ArrayLengthCompare compare

func ArrayValueFromArray added in v0.0.2

func ArrayValueFromArray(value []interface{}, dataType DataType) (interface{}, error)

ArrayValueFromArray convert any array to array of sepecified item data type

func ArrayValueFromString added in v0.0.2

func ArrayValueFromString(value string, dataType DataType) (interface{}, error)

ArrayValueFromString convert string to array of sepecified item data type we can support json formatted string like: string array: "[\"11\",\"12\",\"13\"]" integer array: "[11,12,13]" string array of uuid: "[\"1e88a975-3d26-4277-ace9-bea91b072977\",\"1e88a975-3d26-4277-ace9-bea91b072978\",\"1e88a975-3d26-4277-ace9-bea91b072979\"]" string array of geo-point: "[\"Point(180.0, 90.0)\",\"Point(179.0, 89.0)\",\"Point(178.0, 88.0)\"]" we can also support comma delimited string value as long as the item not contain comma "11,12,13"

func CalculateListElementBytes added in v0.0.2

func CalculateListElementBytes(dataType DataType, length int) int

CalculateListElementBytes returns the total size in bytes needs to be allocated for a list type column for a single row along with the validity vector start.

func CalculateListNilOffset added in v0.0.2

func CalculateListNilOffset(dataType DataType, length int) int

func ColumnHeaderSize

func ColumnHeaderSize(numCols int) int

ColumnHeaderSize returns the total size of the column headers.

func CompareArray added in v0.0.2

func CompareArray(dataType DataType, a, b unsafe.Pointer) int

CompareArray compare array values, the main purpose of this comparsion is for equal comparison larger/less comparision may not be accurate

func CompareBool

func CompareBool(a, b bool) int

CompareBool compares boolean value

func CompareFloat32

func CompareFloat32(a, b unsafe.Pointer) int

CompareFloat32 compares float32 value

func CompareGeoPoint added in v0.0.2

func CompareGeoPoint(a, b unsafe.Pointer) int

CompareGeoPoint compare GeoPoint Values

func CompareInt16

func CompareInt16(a, b unsafe.Pointer) int

CompareInt16 compares int16 value

func CompareInt32

func CompareInt32(a, b unsafe.Pointer) int

CompareInt32 compares int32 value

func CompareInt64

func CompareInt64(a, b unsafe.Pointer) int

CompareInt64 compares int64 value

func CompareInt8

func CompareInt8(a, b unsafe.Pointer) int

CompareInt8 compares int8 value

func CompareUUID added in v0.0.2

func CompareUUID(a, b unsafe.Pointer) int

CompareUUID compare UUID values

func CompareUint16

func CompareUint16(a, b unsafe.Pointer) int

CompareUint16 compares uint16 value

func CompareUint32

func CompareUint32(a, b unsafe.Pointer) int

CompareUint32 compares uint32 value

func CompareUint8

func CompareUint8(a, b unsafe.Pointer) int

CompareUint8 compares uint8 value

func ConvertToArrayValue added in v0.0.2

func ConvertToArrayValue(dataType DataType, value interface{}) (interface{}, error)

ConvertToArrayValue convert input to ArrayValue at best effort

func ConvertToBool

func ConvertToBool(value interface{}) (bool, bool)

ConvertToBool convert input into bool at best effort

func ConvertToFloat32

func ConvertToFloat32(value interface{}) (float32, bool)

ConvertToFloat32 convert input into float32 at best effort

func ConvertToFloat64

func ConvertToFloat64(value interface{}) (float64, bool)

ConvertToFloat64 convert input into float64 at best effort

func ConvertToGeoPoint

func ConvertToGeoPoint(value interface{}) ([2]float32, bool)

ConvertToGeoPoint convert input into uuid type ([2]float32) at best effort

func ConvertToInt16

func ConvertToInt16(value interface{}) (int16, bool)

ConvertToInt16 convert input into int16 at best effort

func ConvertToInt32

func ConvertToInt32(value interface{}) (int32, bool)

ConvertToInt32 convert input into int32 at best effort

func ConvertToInt64

func ConvertToInt64(value interface{}) (int64, bool)

ConvertToInt64 convert input into int64 at best effort

func ConvertToInt8

func ConvertToInt8(value interface{}) (int8, bool)

ConvertToInt8 convert input into int8 at best effort

func ConvertToUUID

func ConvertToUUID(value interface{}) ([2]uint64, bool)

ConvertToUUID convert input into uuid type ([2]uint64) at best effort

func ConvertToUint16

func ConvertToUint16(value interface{}) (uint16, bool)

ConvertToUint16 convert input into uint16 at best effort

func ConvertToUint32

func ConvertToUint32(value interface{}) (uint32, bool)

ConvertToUint32 convert input into uint32 at best effort

func ConvertToUint64

func ConvertToUint64(value interface{}) (uint64, bool)

ConvertToUint64 convert input into uint64 at best effort

func ConvertToUint8

func ConvertToUint8(value interface{}) (uint8, bool)

ConvertToUint8 convert input into uint8 at best effort

func ConvertValueForType added in v0.0.2

func ConvertValueForType(dataType DataType, value interface{}) (interface{}, error)

ConvertValueForType converts data value based on data type

func DataTypeBits

func DataTypeBits(dataType DataType) int

DataTypeBits returns the number of bits of a data type.

func DataTypeBytes

func DataTypeBytes(dataType DataType) int

DataTypeBytes returns how many bytes a value of the data type occupies.

func GeoPointFromString

func GeoPointFromString(str string) (point [2]float32, err error)

GeoPointFromString convert string to geopoint we support wkt format, eg. Point(lng,lat) Inside AresDB system we store lat,lng format

func GetBool added in v0.0.2

func GetBool(baseAddr uintptr, i int) bool

GetBool reads the value as bool at ith position from baseAddr.

func GetPrimaryKeyBytes added in v0.0.2

func GetPrimaryKeyBytes(primaryKeyValues []DataValue, keyLength int) ([]byte, error)

GetPrimaryKeyBytes returns primary key bytes for a given row.

func GetValue added in v0.0.2

func GetValue(baseAddr uintptr, i int, dataType DataType) unsafe.Pointer

GetValue reads value as dataType at ith position from baseAddr

func IsArrayType added in v0.0.2

func IsArrayType(dataType DataType) bool

IsArrayType determins where a data type is Array

func IsEnumType added in v0.0.2

func IsEnumType(dataType DataType) bool

IsEnumType determines whether a data type is enum type

func IsGoType

func IsGoType(dataType DataType) bool

IsGoType determines whether a data type is golang type

func IsNumeric

func IsNumeric(dataType DataType) bool

IsNumeric determines whether a data type is numeric

func MarshalPrimaryKey added in v0.0.2

func MarshalPrimaryKey(pk PrimaryKey) ([]byte, error)

MarshalPrimaryKey marshals a PrimaryKey into json. We cannot define MarshalJson for PrimaryKey since pointer cannot be a receiver.

func MinMaxUpdate added in v0.0.2

func MinMaxUpdate(oldValue, newValue unsafe.Pointer, dataType DataType, cmpFunc CompareFunc, expectedRes int)

MinMaxUpdate update the old value if compareRes == expectedRes

func SetBool added in v0.0.2

func SetBool(baseAddr uintptr, i int, val bool)

SetBool sets the value as bool at ith position from baseAddr.

func SetValue added in v0.0.2

func SetValue(baseAddr uintptr, i int, val unsafe.Pointer, dataType DataType)

SetValue sets the value as dataType at ith position from baseAddr.

func VectorPartyEquals added in v0.0.2

func VectorPartyEquals(v1 VectorParty, v2 VectorParty) bool

VectorPartyEquals covers nil VectorParty compare

Types

type ArchiveVectorParty

type ArchiveVectorParty interface {
	VectorParty

	// Get cumulative count on specified offset
	GetCount(offset int) uint32
	// set cumulative count on specified offset
	SetCount(offset int, count uint32)

	// Pin archive vector party for use
	Pin()
	// Release pin
	Release()
	// WaitForUsers Wait/Check whether all users finished
	// batch lock needs to be held before calling if blocking wait
	// eg.
	// 	batch.Lock()
	// 	vp.WaitForUsers(true)
	// 	batch.Unlock()
	WaitForUsers(blocking bool) (usersDone bool)

	// CopyOnWrite copies vector party on write/update
	CopyOnWrite(batchSize int) ArchiveVectorParty
	// LoadFromDisk start loading vector party from disk,
	// this is a non-blocking operation
	LoadFromDisk(hostMemManager HostMemoryManager, diskStore diskstore.DiskStore, table string, shardID int, columnID, batchID int, batchVersion uint32, seqNum uint32)
	// WaitForDiskLoad waits for vector party disk load to finish
	WaitForDiskLoad()
	// Prune prunes vector party based on column mode to clean memory if possible
	Prune()

	// Slice vector party using specified value within [lowerBoundRow, upperBoundRow)
	SliceByValue(lowerBoundRow, upperBoundRow int, value unsafe.Pointer) (startRow int, endRow int, startIndex int, endIndex int)
	// Slice vector party to get [startIndex, endIndex) based on [lowerBoundRow, upperBoundRow)
	SliceIndex(lowerBoundRow, upperBoundRow int) (startIndex, endIndex int)
}

ArchiveVectorParty represents vector party in archive store

type ArrayValue added in v0.0.2

type ArrayValue struct {
	// item data type
	DataType DataType
	// item list
	Items []interface{}
}

Array value representation in Go for UpsertBatch

func NewArrayValue added in v0.0.2

func NewArrayValue(dataType DataType) *ArrayValue

NewArrayValue create a new ArrayValue instance

func (*ArrayValue) AddItem added in v0.0.2

func (av *ArrayValue) AddItem(item interface{})

AddItem add new item into array

func (*ArrayValue) GetLength added in v0.0.2

func (av *ArrayValue) GetLength() int

GetLength return item numbers for the array value

func (*ArrayValue) GetSerBytes added in v0.0.2

func (av *ArrayValue) GetSerBytes() int

GetSerBytes return the bytes will be used in upsertbatch serialized format

func (*ArrayValue) Write added in v0.0.2

func (av *ArrayValue) Write(writer *utils.BufferWriter) error

Write serialize data into writer Serialized Array data format: number of items: 4 bytes item values: per item bytes * number of items, align to byte item validity: 1 bit * number of items final align to 8 bytes

type ArrayValueReader added in v0.0.2

type ArrayValueReader struct {
	// contains filtered or unexported fields
}

ArrayValueReader is an aux class to reader item data from bytes buffer

func NewArrayValueReader added in v0.0.2

func NewArrayValueReader(dataType DataType, value unsafe.Pointer) *ArrayValueReader

NewArrayValueReader is to create ArrayValueReader to read from upsertbatch, which includes the item number

func (*ArrayValueReader) Get added in v0.0.2

func (reader *ArrayValueReader) Get(index int) unsafe.Pointer

Get returns the buffer pointer for the index-th item

func (*ArrayValueReader) GetBool added in v0.0.2

func (reader *ArrayValueReader) GetBool(index int) bool

GetBool returns bool value for Bool item type at index

func (*ArrayValueReader) GetBytes added in v0.0.2

func (reader *ArrayValueReader) GetBytes() int

GetBytes returns the bytes counts this value occopies

func (*ArrayValueReader) GetLength added in v0.0.2

func (reader *ArrayValueReader) GetLength() int

GetLength return item numbers inside the array

func (*ArrayValueReader) IsValid added in v0.0.2

func (reader *ArrayValueReader) IsValid(index int) bool

IsValid check if the item in index-th place is valid or not

type Batch added in v0.0.2

type Batch struct {
	// Batch mutex is locked in reader mode by queries during the entire transfer
	// to ensure row level consistent read. It is locked in writer mode only for
	// updates from ingestion, and for modifications to the columns slice itself
	// (e.g., adding new columns). Appends will update LastReadBatchID and
	// NumRecordsInLastWriteBatch to make newly added records visible only at the last
	// step, therefore the batch does not need to be locked for appends.
	// For sorted bathes this is also locked in writer mode for initiating loading
	// from disk (vector party creation, Loader/Users initialization), and for
	// vector party detaching during eviction.
	*sync.RWMutex
	// For live batches, index out of bound and nil VectorParty indicates
	// mode 0 for the corresponding VectorParty.
	// For archive batches, index out of bound and nil VectorParty indicates that
	// the corresponding VectorParty has not been loaded into memory from disk.
	Columns []VectorParty
}

Batch represents a sorted or live batch.

func (*Batch) Dump added in v0.0.2

func (b *Batch) Dump(file *os.File)

func (*Batch) Equals added in v0.0.2

func (b *Batch) Equals(other *Batch) bool

Equals check whether two batches are the same. Notes both batches should have all its columns loaded into memory before comparison. Therefore this function should be only called for unit test purpose.

func (*Batch) GetDataValue added in v0.0.2

func (b *Batch) GetDataValue(row, columnID int) DataValue

GetDataValue read value from underlying columns.

func (*Batch) GetDataValueWithDefault added in v0.0.2

func (b *Batch) GetDataValueWithDefault(row, columnID int, defaultValue DataValue) DataValue

GetDataValueWithDefault read value from underlying columns and if it's missing, it will return passed value instead.

func (*Batch) GetVectorParty added in v0.0.2

func (b *Batch) GetVectorParty(columnID int) VectorParty

GetVectorParty returns the VectorParty for the specified column from the batch. It requires the batch to be locked for reading.

func (*Batch) SafeDestruct added in v0.0.2

func (b *Batch) SafeDestruct()

SafeDestruct destructs all vector parties of this batch.

type BatchReader added in v0.0.2

type BatchReader interface {
	GetDataValue(row, columnID int) DataValue
	GetDataValueWithDefault(row, columnID int, defaultValue DataValue) DataValue
}

BatchReader defines the interface to retrieve a DataValue given a row index and column index.

type BootStrapToken added in v0.0.2

type BootStrapToken interface {
	// Call AcquireToken to reserve usage token before any data purge operation
	// when return result is true, then you can proceed to the purge operation and later call ReleaseToken to release the token
	// when return result is false, then some bootstrap work is going on, no purge operation is permitted
	AcquireToken(table string, shard uint32) bool
	// Call ReleaseToken wheneven you call AcquireToken with true return value to release token
	ReleaseToken(table string, shard uint32)
}

BootStrapToken used to Acqure/Release token during data purge operations

type CVectorParty

type CVectorParty interface {
	//Judge column mode
	JudgeMode() ColumnMode
	// Get column mode
	GetMode() ColumnMode
}

CVectorParty is vector party that is backed by c

type ColumnMemoryUsage

type ColumnMemoryUsage struct {
	Preloaded    uint `json:"preloaded"`
	NonPreloaded uint `json:"nonPreloaded"`
	Live         uint `json:"live"`
}

ColumnMemoryUsage contains column memory usage

type ColumnMode

type ColumnMode int

ColumnMode represents how many vectors a vector party may have. For live batch, it should always be 0,1 or 2. For sorted column of archive batch, it will be mode 0 or 3. For other columns of archive batch, it can be any of these four modes.

const (
	// AllValuesDefault (mode 0)
	AllValuesDefault ColumnMode = iota
	// AllValuesPresent (mode 1)
	AllValuesPresent
	// HasNullVector (mode 2)
	HasNullVector
	// HasCountVector (mode 3)
	HasCountVector
	// MaxColumnMode represents the upper limit of column modes
	MaxColumnMode
)

type ColumnUpdateMode

type ColumnUpdateMode int

ColumnUpdateMode represents how to update data from UpsertBatch

const (
	// UpdateOverwriteNotNull (default) will overwrite existing value if new value is NOT null, otherwise just skip
	UpdateOverwriteNotNull ColumnUpdateMode = iota
	// UpdateForceOverwrite will simply overwrite existing value even when new data is null
	UpdateForceOverwrite
	// UpdateWithAddition will add the existing value with new value if new value is not null, existing null value will be treated as 0 in Funculation
	UpdateWithAddition
	// UpdateWithMin will save the minimum of existing and new value if new value is not null, existing null value will be treated as MAX_INT in Funculation
	UpdateWithMin
	// UpdateWithMax will save the maximum of existing and new value if new value is not null, existing null value will be treated as MIN_INT in Funculation
	UpdateWithMax
	// MaxColumnUpdateMode is the current upper limit for column update modes
	MaxColumnUpdateMode
)

type CompareFunc

type CompareFunc func(a, b unsafe.Pointer) int

CompareFunc represents compare function

func GetCompareFunc

func GetCompareFunc(dataType DataType) CompareFunc

GetCompareFunc get the compare function for specific data type

type DataType

type DataType uint32

DataType is the type of value supported in AresDB.

const (
	Unknown   DataType = 0x00000000
	Bool      DataType = 0x00000001
	Int8      DataType = 0x00010008
	Uint8     DataType = 0x00020008
	Int16     DataType = 0x00030010
	Uint16    DataType = 0x00040010
	Int32     DataType = 0x00050020
	Uint32    DataType = 0x00060020
	Float32   DataType = 0x00070020
	SmallEnum DataType = 0x00080008
	BigEnum   DataType = 0x00090010
	UUID      DataType = 0x000a0080
	GeoPoint  DataType = 0x000b0040
	GeoShape  DataType = 0x000c0000
	Int64     DataType = 0x000d0040

	// array types
	ArrayBool      DataType = 0x01000001
	ArrayInt8      DataType = 0x01010008
	ArrayUint8     DataType = 0x01020008
	ArrayInt16     DataType = 0x01030010
	ArrayUint16    DataType = 0x01040010
	ArrayInt32     DataType = 0x01050020
	ArrayUint32    DataType = 0x01060020
	ArrayFloat32   DataType = 0x01070020
	ArraySmallEnum DataType = 0x01080008
	ArrayBigEnum   DataType = 0x01090010
	ArrayUUID      DataType = 0x010a0080
	ArrayGeoPoint  DataType = 0x010b0040
	ArrayInt64     DataType = 0x010d0040
)

The list of supported DataTypes. DataType & 0x0000FFFF: The width of the data type in bits, or width of the item data type for array. DataType & 0x00FF0000 >> 16: The base type of the data or array. DataType & 0x01000000 >> 24: Indicatation of arrary type, and item type is on DataType & 0x00FF0000 >> 16. DataType & 0xFE000000 >> 24: Reserved See https://github.com/uber/aresdb/wiki/redologs for more details.

func DataTypeForColumn added in v0.0.2

func DataTypeForColumn(column metaCom.Column) DataType

DataTypeForColumn returns the in memory data type for a column

func DataTypeFromString

func DataTypeFromString(str string) DataType

DataTypeFromString convert string representation of data type into DataType

func GetElementDataType added in v0.0.2

func GetElementDataType(dataType DataType) DataType

GetElementDataType retrieve item data type for Array DataType

func NewDataType

func NewDataType(value uint32) (DataType, error)

NewDataType converts an uint32 value into a DataType. It returns error if the the data type is invalid.

type DataValue

type DataValue struct {
	// Used for golang vector party
	GoVal    GoDataValue
	OtherVal unsafe.Pointer
	DataType DataType
	CmpFunc  CompareFunc
	Valid    bool

	IsBool  bool
	BoolVal bool
}

DataValue is the wrapper to encapsulate validity, bool value and other value type into a single struct to make it easier for value comparison.

func GetDataValue added in v0.0.2

func GetDataValue(col interface{}, columnIDInSchema int, columnType string) (DataValue, error)

GetDataValue returns the DataValue for the given column value.

func ValueFromString

func ValueFromString(str string, dataType DataType) (val DataValue, err error)

ValueFromString converts raw string value to actual value given input data type.

func (DataValue) Compare

func (v1 DataValue) Compare(v2 DataValue) int

Compare compares two value wrapper.

func (DataValue) ConvertToHumanReadable

func (v1 DataValue) ConvertToHumanReadable(dataType DataType) interface{}

ConvertToHumanReadable convert DataValue to meaningful golang data types

type EnumDict added in v0.0.2

type EnumDict struct {
	// Either 0x100 for small_enum, or 0x10000 for big_enum.
	Capacity    int            `json:"capacity"`
	Dict        map[string]int `json:"dict"`
	ReverseDict []string       `json:"reverseDict"`
}

EnumDict contains mapping from and to enum strings to numbers.

type EnumUpdater added in v0.0.2

type EnumUpdater interface {
	// UpdateEnum can update enum for one column
	UpdateEnum(table, column string, enumList []string) error
}

type GeoPointGo

type GeoPointGo [2]float32

GeoPointGo represents GeoPoint Golang Type

type GeoShapeGo

type GeoShapeGo struct {
	Polygons [][]GeoPointGo
}

GeoShapeGo represents GeoShape Golang Type

func ConvertToGeoShape

func ConvertToGeoShape(value interface{}) (*GeoShapeGo, bool)

ConvertToGeoShape converts the arbitrary value to GeoShapeGo

func GeoShapeFromString

func GeoShapeFromString(str string) (GeoShapeGo, error)

GeoShapeFromString convert string to geoshape Supported format POLYGON ((lng lat, lng lat, lng lat, ...), (...))

func (*GeoShapeGo) GetBytes

func (gs *GeoShapeGo) GetBytes() int

GetBytes implements GoDataValue interface

func (*GeoShapeGo) GetSerBytes

func (gs *GeoShapeGo) GetSerBytes() int

GetSerBytes implements GoDataValue interface

func (*GeoShapeGo) Read

func (gs *GeoShapeGo) Read(dataReader *utils.StreamDataReader) error

Read implements Read interface for GoDataValue

func (*GeoShapeGo) Write

func (gs *GeoShapeGo) Write(dataWriter *utils.StreamDataWriter) error

Write implements Read interface for GoDataValue

type GoDataValue

type GoDataValue interface {
	// GetBytes returns number of bytes copied in golang memory for this value
	GetBytes() int
	// GetSerBytes return the number of bytes required for serialize this value
	GetSerBytes() int
	Write(writer *utils.StreamDataWriter) error
	Read(reader *utils.StreamDataReader) error
}

GoDataValue represents a value backed in golang memory

func GetGoDataValue

func GetGoDataValue(dataType DataType) GoDataValue

GetGoDataValue return GoDataValue

type HostMemoryManager

type HostMemoryManager interface {
	ReportUnmanagedSpaceUsageChange(bytes int64)
	ReportManagedObject(table string, shard, batchID, columnID int, bytes int64)
	GetArchiveMemoryUsageByTableShard() (map[string]map[string]*ColumnMemoryUsage, error)
	TriggerEviction()
	TriggerPreload(tableName string, columnID int,
		oldPreloadingDays int, newPreloadingDays int)
	Start()
	Stop()
}

HostMemoryManager manages archive batch storage in host memory. Specifically, it keeps track of memory usage of archive batches and makes preloading and eviction decisions based on retention config.

The space available to archive batches is defined as maxMem - unmanagedMem, where unmanagedMem accounts for C allocated buffers in live batches and primary keys, which changes over time. Eviction of archive batches is configured at column level using two configs: preloadingDays and priorities.

Data eviction policy is defined as such: Always evict data not in preloading zone first; Preloading data won’t be evicted until all the non-preloading data are evicted and server is still in short of memory. For data within the same zone, eviction will happen based on column priority For data with same priority, eviction will happen based on data time, older data will be evicted first, for same old data, larger size columns will be evicted first;

HostMemoryManger will also maintain two go routines. One for preloading data and another for eviction. Calling start to start those goroutines and call stop to stop them. Stop is a blocking call.

Both TriggerPreload and TriggerEviction are asynchronous calls.

type HostVectorPartySlice

type HostVectorPartySlice struct {
	Values unsafe.Pointer
	Nulls  unsafe.Pointer
	// The length of the count vector is Length+1
	Counts       unsafe.Pointer
	Length       int
	ValueType    DataType
	DefaultValue DataValue

	ValueStartIndex int
	NullStartIndex  int
	CountStartIndex int

	ValueBytes int
	NullBytes  int
	CountBytes int
}

HostVectorPartySlice stores pointers to data for a column in host memory. And its start index and Bytes

type JobType

type JobType string

JobType now we only have archiving job type.

const (
	// ArchivingJobType is the archiving job type.
	ArchivingJobType JobType = "archiving"
	// BackfillJobType is the backfill job type.
	BackfillJobType JobType = "backfill"
	// SnapshotJobType is the snapshot job type.
	SnapshotJobType JobType = "snapshot"
	// PurgeJobType is the purge job type.
	PurgeJobType JobType = "purge"
)

type Key added in v0.0.2

type Key []byte

Key represents the key for the item

type ListVectorParty added in v0.0.2

type ListVectorParty interface {
	// GetElemCount is to get count of elements of n-th element in the VP
	GetElemCount(row int) uint32
	// GetListValue is to get the raw value of n-th element in the VP
	GetListValue(row int) (unsafe.Pointer, bool)
	// SetListValue is to set value for n-th element in the VP
	SetListValue(row int, val unsafe.Pointer, valid bool)
}

ListVectorParty is the interface for list vector party to read and write list value.

type LiveVectorParty

type LiveVectorParty interface {
	VectorParty

	// If we already know this is a bool vp, we can set bool directly without constructing a data value struct.
	SetBool(offset int, val bool, valid bool)
	// Set value via a unsafe.Pointer directly.
	SetValue(offset int, val unsafe.Pointer, valid bool)
	// Set go value directly.
	SetGoValue(offset int, val GoDataValue, valid bool)
	// Get value directly
	GetValue(offset int) (unsafe.Pointer, bool)
	// GetMinMaxValue get min and max value,
	// returns uint32 value since only valid for time column
	GetMinMaxValue() (min, max uint32)
}

LiveVectorParty represents vector party in live store

type Pinnable added in v0.0.2

type Pinnable struct {
	// Used in archive batches to allow requesters to wait until the vector party
	// is fully loaded from disk.
	Loader sync.WaitGroup
	// For archive store only. Number of users currently using this vector party.
	// This field is protected by the batch lock.
	Pins int
	// For archive store only. The condition for pins to drop down to 0.
	AllUsersDone *sync.Cond
}

Pinnable implements a vector party that support pin and release operations.

func (*Pinnable) Pin added in v0.0.2

func (vp *Pinnable) Pin()

Pin vector party for use, caller should lock archive batch before calling

func (*Pinnable) Release added in v0.0.2

func (vp *Pinnable) Release()

Release releases the vector party from the archive store so that it can be evicted or deleted.

func (*Pinnable) WaitForDiskLoad added in v0.0.2

func (vp *Pinnable) WaitForDiskLoad()

WaitForDiskLoad waits for vector party disk load to finish

func (*Pinnable) WaitForUsers added in v0.0.2

func (vp *Pinnable) WaitForUsers(blocking bool) bool

WaitForUsers wait for vector party user to finish and return true when all users are done

type PrimaryKey added in v0.0.2

type PrimaryKey interface {
	// Find looks up a value given key
	Find(key Key) (RecordID, bool)
	// FindOrInsert find or insert a key value pair into
	FindOrInsert(key Key, value RecordID, eventTime uint32) (existingFound bool, recordID RecordID, err error)
	// Update updates a key with a new recordID. Return whether key exists in the primary key or not.
	Update(key Key, value RecordID) bool
	// Delete deletes a key if it exists
	Delete(key Key)
	// Update the cutoff event time.
	UpdateEventTimeCutoff(eventTimeCutoff uint32)
	// GetEventTimeCutoff returns the cutoff event time.
	GetEventTimeCutoff() uint32
	// GetDataForTransfer locks the primary key for transferring data
	// the caller should unlock by calling  UnlockAfterTransfer when done
	LockForTransfer() PrimaryKeyData
	// UnlockAfterTransfer unlocks primary key
	UnlockAfterTransfer()
	// Destruct clean up all existing resources used by primary key
	Destruct()
	// Size returns the current number of items.
	Size() uint
	// Capacity returns how many items current primary key can hold.
	Capacity() uint
	// AllocatedBytes returns the size of primary key in bytes.
	AllocatedBytes() uint
}

PrimaryKey is an interface for primary key index

type PrimaryKeyData added in v0.0.2

type PrimaryKeyData struct {
	Data       unsafe.Pointer
	NumBytes   int
	Seeds      [NumHashes]uint32
	KeyBytes   int
	NumBuckets int
}

PrimaryKeyData holds the data for transferring to GPU for query purposes

type RecordID added in v0.0.2

type RecordID struct {
	BatchID int32  `json:"batchID"`
	Index   uint32 `json:"index"`
}

RecordID represents a record location with BatchID as the inflated vector id and offset determines the offset of record inside the vector

type SlicedVector

type SlicedVector struct {
	Values []interface{} `json:"values"`
	Counts []int         `json:"counts"`
}

SlicedVector is vector party data represented into human-readable slice format consists of a value slice and count slice, count slice consists of accumulative counts. swagger:model slicedVector

type TableSchema added in v0.0.2

type TableSchema struct {
	sync.RWMutex `json:"-"`
	// Main schema of the table. Mutable.
	Schema metaCom.Table `json:"schema"`
	// Maps from column names to their IDs. Mutable.
	ColumnIDs map[string]int `json:"columnIDs"`
	// Maps from enum column names to their case dictionaries. Mutable.
	EnumDicts map[string]EnumDict `json:"enumDicts"`
	// DataType for each column ordered by column ID. Mutable.
	ValueTypeByColumn []DataType `json:"valueTypeByColumn"`
	// Number of bytes in the primary key. Immutable.
	PrimaryKeyBytes int `json:"primaryKeyBytes"`
	// Types of each primary key column. Immutable.
	PrimaryKeyColumnTypes []DataType `json:"primaryKeyColumnTypes"`
	// Default values of each column. Mutable. Nil means default value is not set.
	DefaultValues []*DataValue `json:"-"`
}

TableSchema stores metadata of the table such as columns and primary keys. It also stores the dictionaries for enum columns.

func NewTableSchema added in v0.0.2

func NewTableSchema(table *metaCom.Table) *TableSchema

NewTableSchema creates a new table schema object from metaStore table object, this does not set enum cases.

func (*TableSchema) CreateEnumDict added in v0.0.2

func (t *TableSchema) CreateEnumDict(columnName string, enumCases []string)

createEnumDict creates the enum dictionary for the specified column with the specified initial cases, and attaches it to TableSchema object. Caller should acquire the schema lock before calling this function.

func (*TableSchema) GetArchivingSortColumns added in v0.0.2

func (t *TableSchema) GetArchivingSortColumns() []int

GetArchivingSortColumns makes a copy of the Schema.ArchivingSortColumns so callers don't have to hold a read lock to access it.

func (*TableSchema) GetColumnDeletions added in v0.0.2

func (t *TableSchema) GetColumnDeletions() []bool

GetColumnDeletions returns a boolean slice that indicates whether a column has been deleted. Callers need to hold a read lock.

func (*TableSchema) GetColumnIfNonNilDefault added in v0.0.2

func (t *TableSchema) GetColumnIfNonNilDefault() []bool

GetColumnIfNonNilDefault returns a boolean slice that indicates whether a column has non nil default value. Callers need to hold a read lock.

func (*TableSchema) GetPrimaryKeyColumns added in v0.0.2

func (t *TableSchema) GetPrimaryKeyColumns() []int

GetPrimaryKeyColumns makes a copy of the Schema.PrimaryKeyColumns so callers don't have to hold a read lock to access it.

func (*TableSchema) GetValueTypeByColumn added in v0.0.2

func (t *TableSchema) GetValueTypeByColumn() []DataType

GetValueTypeByColumn makes a copy of the ValueTypeByColumn so callers don't have to hold a read lock to access it.

func (*TableSchema) MarshalJSON added in v0.0.2

func (t *TableSchema) MarshalJSON() ([]byte, error)

MarshalJSON marshals TableSchema into json.

func (*TableSchema) SetDefaultValue added in v0.0.2

func (t *TableSchema) SetDefaultValue(columnID int)

SetDefaultValue parses the default value string if present and sets to TableSchema. Schema lock should be acquired and release by caller and enum dict should already be created/update before this function.

func (*TableSchema) SetTable added in v0.0.2

func (t *TableSchema) SetTable(table *metaCom.Table)

SetTable sets a updated table and update TableSchema, should acquire lock before calling.

type TableSchemaReader added in v0.0.2

type TableSchemaReader interface {
	// GetSchema returns schema for a table.
	GetSchema(table string) (*TableSchema, error)
	// GetSchemas returns all table schemas.
	GetSchemas() map[string]*TableSchema

	// Provide exclusive access to read/write data protected by MemStore.
	utils.RWLocker
}

type UpsertBatch added in v0.0.2

type UpsertBatch struct {
	// Number of rows in the batch, must be between 0 and 65535.
	NumRows int

	// Number of columns.
	NumColumns int

	// Arrival Time of Upsert Batch
	ArrivalTime uint32
	// contains filtered or unexported fields
}

UpsertBatch stores and indexes a serialized upsert batch of data on a particular table. It is used for both client-server data transfer and redo logging. In redo logs each batch is prepended by a 4-byte buffer size. The serialized buffer of the batch is in the following format:

[uint32] magic_number
[uint32] buffer_size

<begin of buffer>
[int32]  version_number
[int32]  num_of_rows
[uint16] num_of_columns
<reserve 14 bytes>
[uint32] arrival_time
[uint32] column_offset_0 ... [uint32] column_offset_x+1
[uint32] column_reserved_field1_0 ... [uint32] column_reserved_field1_x
[uint32] column_reserved_field2_0 ... [uint32] column_reserved_field2_x
[uint32] column_data_type_0 ... [uint32] column_data_type_x
[uint16] column_id_0 ... [uint16] column_id_x
[uint8] column_mode_0 ... [uint8] column_mode_x

(optional) [uint8] null_vector_0
(optional) [padding to 4 byte alignment uint32] offset_vector_0
[padding for 8 byte alignment] value_vector_0
...

[padding for 8 byte alignment]
<end of buffer>

Each component in the serialized buffer is byte aligned (not pointer aligned or bit aligned). All serialized numbers are written in little-endian. The struct is used for both client serialization and server deserialization. See https://github.com/uber/aresdb/wiki/redo_logs for more details.

Note: only fixed size values are supported currently.

func NewUpsertBatch added in v0.0.2

func NewUpsertBatch(buffer []byte) (*UpsertBatch, error)

NewUpsertBatch deserializes an upsert batch on the server. buffer does not contain the 4-byte buffer size.

func (*UpsertBatch) ExtractBackfillBatch added in v0.0.2

func (u *UpsertBatch) ExtractBackfillBatch(backfillRows []int) *UpsertBatch

ExtractBackfillBatch extracts given rows and stores in a new UpsertBatch The returned new UpsertBatch is not fully serialized and can only be used for structured reads.

func (*UpsertBatch) GetAlternativeBytes added in v0.0.2

func (u *UpsertBatch) GetAlternativeBytes() int

GetAlternativeBytes returns alternativeBytes

func (*UpsertBatch) GetBool added in v0.0.2

func (u *UpsertBatch) GetBool(row int, col int) (bool, bool, error)

GetBool returns the data (boolean type) stored at (row, col), and the validity of the value.

func (*UpsertBatch) GetBuffer added in v0.0.2

func (u *UpsertBatch) GetBuffer() []byte

GetBuffer returns the underline buffer used to construct the upsert batch.

func (*UpsertBatch) GetColumMode added in v0.0.2

func (u *UpsertBatch) GetColumMode(col int) ColumnMode

convenient function to get ColumnMode, assume no out of index

func (*UpsertBatch) GetColumnID added in v0.0.2

func (u *UpsertBatch) GetColumnID(col int) (int, error)

GetColumnID returns the logical id of a column.

func (*UpsertBatch) GetColumnIndex added in v0.0.2

func (u *UpsertBatch) GetColumnIndex(columnID int) (int, error)

GetColumnIndex returns the local index of a column given a logical index id.

func (*UpsertBatch) GetColumnLen added in v0.0.2

func (u *UpsertBatch) GetColumnLen() int

convenient function to get columns len

func (*UpsertBatch) GetColumnNames added in v0.0.2

func (u *UpsertBatch) GetColumnNames(schema *TableSchema) ([]string, error)

GetColumnNames reads columnNames in UpsertBatch, user should not lock schema

func (*UpsertBatch) GetColumnType added in v0.0.2

func (u *UpsertBatch) GetColumnType(col int) (DataType, error)

GetColumnType returns the data type of a column.

func (*UpsertBatch) GetColumnUpdateMode added in v0.0.2

func (u *UpsertBatch) GetColumnUpdateMode(col int) ColumnUpdateMode

convenient function to get ColumnUpdateMode, assume no out of index

func (*UpsertBatch) GetDataValue added in v0.0.2

func (u *UpsertBatch) GetDataValue(row, col int) (DataValue, error)

GetDataValue returns the DataValue for the given row and col index. It first check validity of the value, then it check whether it's a boolean column to decide whether to load bool value or other value type.

func (*UpsertBatch) GetEventColumnIndex added in v0.0.2

func (u *UpsertBatch) GetEventColumnIndex() int

GetEventColumnIndex returns the column index of event time

func (*UpsertBatch) GetPrimaryKeyBytes added in v0.0.2

func (u *UpsertBatch) GetPrimaryKeyBytes(row int, primaryKeyCols []int, keyLength int) ([]byte, error)

GetPrimaryKeyBytes returns primary key bytes for a given row. Note primaryKeyCol is not list of primary key columnIDs.

func (*UpsertBatch) GetPrimaryKeyCols added in v0.0.2

func (u *UpsertBatch) GetPrimaryKeyCols(primaryKeyColumnIDs []int) ([]int, error)

GetPrimaryKeyCols converts primary key columnIDs to cols in this upsert batch.

func (*UpsertBatch) GetValue added in v0.0.2

func (u *UpsertBatch) GetValue(row int, col int) (unsafe.Pointer, bool, error)

GetValue returns the data (fixed sized) stored at (row, col), including the pointer to the data, and the validity of the value.

func (*UpsertBatch) ReadData added in v0.0.2

func (u *UpsertBatch) ReadData(start int, length int) ([][]interface{}, error)

ReadData reads data from upsert batch and convert values to meaningful representations given data type.

func (*UpsertBatch) ReadGoValue added in v0.0.2

func (u *UpsertBatch) ReadGoValue(row, col int) GoDataValue

convenient function to get GoDataValue

type UpsertBatchBuilder

type UpsertBatchBuilder struct {
	NumRows int
	// contains filtered or unexported fields
}

UpsertBatchBuilder is the builder for constructing an UpsertBatch buffer. It allows random value write at (row, col).

func NewUpsertBatchBuilder

func NewUpsertBatchBuilder() *UpsertBatchBuilder

NewUpsertBatchBuilder creates a new builder for constructing an UpersetBatch.

func (*UpsertBatchBuilder) AddColumn

func (u *UpsertBatchBuilder) AddColumn(columnID int, dataType DataType) error

AddColumn add a new column to the builder. Initially, new columns have all values set to null.

func (*UpsertBatchBuilder) AddColumnWithUpdateMode

func (u *UpsertBatchBuilder) AddColumnWithUpdateMode(columnID int, dataType DataType, updateMode ColumnUpdateMode) error

AddColumnWithUpdateMode add a new column to the builder with update mode info. Initially, new columns have all values set to null.

func (*UpsertBatchBuilder) AddRow

func (u *UpsertBatchBuilder) AddRow()

AddRow increases the number of rows in the batch by 1. A new row with all nil values is appended to the row array.

func (*UpsertBatchBuilder) RemoveRow

func (u *UpsertBatchBuilder) RemoveRow()

RemoveRow decreases the number of rows in the batch by 1. The last row will be removed. It's a no-op if the number of rows is 0.

func (*UpsertBatchBuilder) ResetRows

func (u *UpsertBatchBuilder) ResetRows()

ResetRows reset the row count to 0.

func (*UpsertBatchBuilder) SetValue

func (u *UpsertBatchBuilder) SetValue(row int, col int, value interface{}) error

SetValue set a value to a given (row, col).

func (UpsertBatchBuilder) ToByteArray

func (u UpsertBatchBuilder) ToByteArray() ([]byte, error)

ToByteArray produces a serialized UpsertBatch in byte array.

type UpsertBatchHeader

type UpsertBatchHeader struct {
	// contains filtered or unexported fields
}

UpsertBatchHeader is a helper class used by upsert batch reader and writer to access the column header info.

func NewUpsertBatchHeader

func NewUpsertBatchHeader(buffer []byte, numCols int) UpsertBatchHeader

NewUpsertBatchHeader create upsert batch header from buffer

func (UpsertBatchHeader) ReadColumnFlag

func (u UpsertBatchHeader) ReadColumnFlag(col int) (ColumnMode, ColumnUpdateMode, error)

ReadColumnFlag returns the mode for a column.

func (UpsertBatchHeader) ReadColumnID

func (u UpsertBatchHeader) ReadColumnID(col int) (int, error)

ReadColumnID returns the logical ID for a column.

func (UpsertBatchHeader) ReadColumnOffset

func (u UpsertBatchHeader) ReadColumnOffset(col int) (int, error)

ReadColumnOffset takes col index from 0 to numCols + 1 and returns the value stored.

func (UpsertBatchHeader) ReadColumnType

func (u UpsertBatchHeader) ReadColumnType(col int) (DataType, error)

ReadColumnType returns the type for a column.

func (UpsertBatchHeader) ReadEnumDictLength added in v0.0.2

func (u UpsertBatchHeader) ReadEnumDictLength(col int) (int, error)

ReadEnumDictLength takes col index from 0 to numCols - 1 and returns the value stored.

func (*UpsertBatchHeader) WriteColumnFlag

func (u *UpsertBatchHeader) WriteColumnFlag(columnMode ColumnMode, columnUpdateMode ColumnUpdateMode, col int) error

WriteColumnFlag writes the mode of a column.

func (*UpsertBatchHeader) WriteColumnID

func (u *UpsertBatchHeader) WriteColumnID(value int, col int) error

WriteColumnID writes the id of a column.

func (*UpsertBatchHeader) WriteColumnOffset

func (u *UpsertBatchHeader) WriteColumnOffset(value int, col int) error

WriteColumnOffset writes the offset of a column. It can take col index from 0 to numCols + 1.

func (*UpsertBatchHeader) WriteColumnType

func (u *UpsertBatchHeader) WriteColumnType(value DataType, col int) error

WriteColumnType writes the type of a column.

func (*UpsertBatchHeader) WriteEnumDictLength added in v0.0.2

func (u *UpsertBatchHeader) WriteEnumDictLength(value int, col int) error

WriteEnumDictLength writes the offset of a column. It can take col index from 0 to numCols - 1.

type UpsertBatchVersion

type UpsertBatchVersion uint32

UpsertBatchVersion represents the version of upsert batch

const (
	V1 UpsertBatchVersion = 0xFEED0001
)

type ValueCountsUpdateMode

type ValueCountsUpdateMode int

ValueCountsUpdateMode represents the way we update value counts when we are writing values to vector parties.

const (
	// IgnoreCount skip setting value counts.
	IgnoreCount ValueCountsUpdateMode = iota
	// IncrementCount only increment count.
	IncrementCount
	// CheckExistingCount also check existing count.
	CheckExistingCount
)

type VectorParty

type VectorParty interface {
	//   allocate underlying storage for vector party
	Allocate(hasCount bool)

	// GetValidity get validity of given offset.
	GetValidity(offset int) bool
	// GetDataValue returns the DataValue for the specified index.
	// It first check validity of the value, then it check whether it's a
	// boolean column to decide whether to load bool value or other value
	// type. Index bound is not checked!
	GetDataValue(offset int) DataValue
	// SetDataValue writes a data value at given offset. Third parameter count should
	// only be passed for compressed columns. checkValueCount is a flag to tell whether
	// need to check value count (NonDefaultValueCount and ValidValueCount) while setting
	// the value. It should be true for archive store and false for live store. **This does
	// not set the count vector as this is not accumulated count.**
	SetDataValue(offset int, value DataValue, countsUpdateMode ValueCountsUpdateMode, counts ...uint32)
	// GetDataValueByRow returns the DataValue for the specified row. It will do binary
	// search on the count vector to find the correct offset if this is a mode 3 vector
	// party. Otherwise it will behave same as GetDataValue.
	// Caller needs to ensure row is within valid range.
	GetDataValueByRow(row int) DataValue

	GetDataType() DataType
	GetLength() int
	GetBytes() int64

	// Slice vector party into human readable SlicedVector format
	Slice(startRow, numRows int) SlicedVector

	// SafeDestruct destructs vector party memory
	SafeDestruct()

	// Write serialize vector party
	Write(writer io.Writer) error
	// Read deserialize vector party
	Read(reader io.Reader, serializer VectorPartySerializer) error
	// Check whether two vector parties are equal (used only in unit tests)
	Equals(other VectorParty) bool
	// GetNonDefaultValueCount get Number of non-default values stored
	GetNonDefaultValueCount() int
	// IsList tells whether it's a list vector party or not
	IsList() bool
	// AsList returns ListVectorParty representation of this vector party.
	// Caller should always call IsList before conversion, otherwise panic may happens
	// for incompatible vps.
	AsList() ListVectorParty
	// Dump is for testing purpose
	Dump(file *os.File)
}

VectorParty interface

type VectorPartySerializer

type VectorPartySerializer interface {
	// ReadVectorParty reads vector party from disk and set fields in passed-in vp.
	ReadVectorParty(vp VectorParty) error
	// WriteSnapshotVectorParty writes vector party to disk
	WriteVectorParty(vp VectorParty) error
	// CheckVectorPartySerializable check if the VectorParty is serializable
	CheckVectorPartySerializable(vp VectorParty) error
	// ReportVectorPartyMemoryUsage report memory usage according to underneath VectorParty property
	ReportVectorPartyMemoryUsage(bytes int64)
}

VectorPartySerializer is the interface to read/write a vector party from/to disk. Refer to https://github.com/uber/aresdb/wiki/VectorStore for more details about vector party's on disk format.

func NewVectorPartyArchiveSerializer added in v0.0.2

func NewVectorPartyArchiveSerializer(hostMemManager HostMemoryManager, diskStore diskstore.DiskStore, table string, shardID int,
	columnID int, batchID int, batchVersion uint32, seqNum uint32) VectorPartySerializer

NewVectorPartyArchiveSerializer returns a new VectorPartySerializer

func NewVectorPartySnapshotSerializer added in v0.0.2

func NewVectorPartySnapshotSerializer(hostMemeManager HostMemoryManager, diskStore diskstore.DiskStore, table string, shardID int,
	columnID, batchID int, batchVersion uint32, seqNum uint32, redoLogFile int64, offset uint32) VectorPartySerializer

NewVectorPartySnapshotSerializer returns a new VectorPartySerializer

Directories

Path Synopsis
Code generated by mockery v1.0.0.
Code generated by mockery v1.0.0.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL