kv

package
v0.0.0-...-ff78b6e
Published: Dec 6, 2023 License: Apache-2.0 Imports: 16 Imported by: 0

README

The ethdb package holds a bouquet of objects to access the DB

Words "KV" and "DB" have special meaning here:

  • KV - key-value-style API to access data: lets the developer manage transactions and stateful cursors.
  • DB - object-oriented-style API to access data: Get/Put/Delete/WalkOverTable/MultiPut, managing transactions internally.

So, the DB abstraction fits 95% of cases and leads to more maintainable code - because it looks stateless.

About "key-value-style": modern key-value databases don't provide Get/Put/Delete methods, because they are very hard-drive-unfriendly - they push developers toward random disk access, which is an order of magnitude slower than sequential reads. To enforce sequential reads, stateful cursors/iterators were introduced - they intentionally look like a file API: open_cursor/seek/write_data_from_current_position/move_to_end/step_back/step_forward/delete_key_on_current_position/append.
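
For illustration, a minimal sketch of this cursor style against the KV interfaces described below (assumes `tx` is an open transaction; "SomeTable" is an illustrative table name, not one of the package's buckets):

c, err := tx.Cursor("SomeTable") // open_cursor
if err != nil {
    return err
}
defer c.Close()
for k, v, err := c.First(); k != nil; k, v, err = c.Next() { // move_to_start / step_forward
    if err != nil {
        return err
    }
    _, _ = k, v // entries arrive in key order - sequential, disk-friendly reads
}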

Class diagram:

// This is not a call graph - it just shows classes from low-level to high-level,
// and which classes satisfy which interfaces.

+-----------------------------------+   +-----------------------------------+ 
|  github.com/erigontech/mdbx-go   |   | google.golang.org/grpc.ClientConn |
|  (app-agnostic MDBX go bindings)  |   | (app-agnostic RPC and streaming)  |
+-----------------------------------+   +-----------------------------------+
                  |                                      |
                  |                                      |
                  v                                      v
+-----------------------------------+   +-----------------------------------+
|       ethdb/kv_mdbx.go            |   |       ethdb/kv_remote.go          |                
| (tg-specific MDBX implementation) |   |   (tg-specific remote DB access)  |
+-----------------------------------+   +-----------------------------------+
                  |                                      |
                  |                                      |
                  v                                      v    
+----------------------------------------------------------------------------------------------+
|                                     ethdb/kv_interface.go                                    |
|         (Common KV interface. DB-friendly, disk-friendly, cpu-cache-friendly.                |
|           Same app code can work with a local or remote database.                            |
|           Allows experimenting with other database implementations.                          |
|         Supports context.Context for cancelation. Any operation can return an error)         |
+----------------------------------------------------------------------------------------------+

Then:
turbo/snapshotsync/block_reader.go
erigon-lib/state/aggregator_v3.go

Then:
kv_temporal.go

ethdb.AbstractKV design:

  • InMemory, ReadOnly: NewMDBX().Flags(mdbx.ReadOnly).InMem().Open()

  • MultipleDatabases, Customization: NewMDBX().Path(path).WithBucketsConfig(config).Open()

  • 1 Transaction object can be used only within 1 goroutine.

  • Only 1 write transaction can be active at a time (others will wait).

  • Unlimited read transactions can be active concurrently (not blocked by write transaction).

  • Methods db.Update, db.View - can be used to open and close short transactions (example below).

  • Methods Begin/Commit/Rollback - for long transaction.

  • It's safe to call .Rollback() after .Commit(); multiple rollbacks are also safe. Common transaction pattern:

tx, err := db.Begin(true, ethdb.RW)
if err != nil {
    return err
}
defer tx.Rollback() // important to avoid transactions leak at panic or early return

// ... code which uses database in transaction
 
err = tx.Commit()
if err != nil {
    return err
}
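
And the short-transaction pattern via db.Update/db.View mentioned above - a sketch, assuming `ctx` is a context.Context and "SomeTable", key, value are illustrative:

if err := db.Update(ctx, func(tx ethdb.RwTx) error {
    return tx.Put("SomeTable", key, value) // app must copy key/value before Put
}); err != nil {
    return err
}

if err := db.View(ctx, func(tx ethdb.Tx) error {
    v, err := tx.GetOne("SomeTable", key)
    if err != nil {
        return err
    }
    _ = v // valid only until View returns - copy if needed later
    return nil
}); err != nil {
    return err
}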
  • No internal copies/allocations. This means: 1. the app must copy keys/values before putting them into the database; 2. data read from the db is valid only during the current transaction - copy it if you plan to use it after Commit/Rollback.

  • Methods .Bucket() and .Cursor() can't return nil and can't return an error.

  • Bucket and Cursor are interfaces - meaning different classes can satisfy them: for example, the MdbxCursor and MdbxDupSortCursor classes both do. If you are not familiar with the "DupSort" concept, please read dupsort.md

  • If Cursor returns err != nil, then the returned key SHOULD be != nil (can be []byte{}, for example). Then traversal code looks like:

for k, v, err := c.First(); k != nil; k, v, err = c.Next() {
    if err != nil {
        return err
    }
    // logic
}
  • Move cursor: cursor.Seek(key)
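
A sketch of a Seek-based prefix scan (assumes `bytes` is imported; table name and `prefix` are illustrative):

c, err := tx.Cursor("SomeTable")
if err != nil {
    return err
}
defer c.Close()
for k, v, err := c.Seek(prefix); k != nil; k, v, err = c.Next() {
    if err != nil {
        return err
    }
    if !bytes.HasPrefix(k, prefix) {
        break // walked past the prefix range
    }
    _ = v // process entry
}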

ethdb.Database design:

  • Allows passing multiple implementations
  • Allows traversing tables via db.Walk

ethdb.TxDb design:

  • holds 1 long-running transaction and 1 cursor per table inside
  • method Begin DOESN'T create a new TxDb object - this means the object can be passed into other objects by pointer, and high-level app code can start/commit transactions when it needs to, without re-creating all objects which hold the TxDb pointer
  • This is the reason why the txDb.CommitAndBegin() method works: it creates a new transaction object inside, while the pointer to TxDb stays valid

How to dump/load table

Install all database tools: make db-tools

./build/bin/mdbx_dump -a <datadir>/erigon/chaindata | lz4 > dump.lz4
lz4 -d < dump.lz4 | ./build/bin/mdbx_load -an <datadir>/erigon/chaindata

How to get table checksum

./build/bin/mdbx_dump -s table_name <datadir>/erigon/chaindata | tail -n +4 | sha256sum # tail here is for excluding header 

Header example:
VERSION=3
geometry=l268435456,c268435456,u25769803776,s268435456,g268435456
mapsize=756375552
maxreaders=120
format=bytevalue
database=TBL0001
type=btree
db_pagesize=4096
duplicates=1
dupsort=1
HEADER=END

Documentation

Index

Constants

const (

	//HashedAccounts
	// key - address hash
	// value - account encoded for storage
	// Contains Storage:
	//key - address hash + incarnation + storage key hash
	//value - storage value(common.hash)
	HashedAccounts = "HashedAccount"
	HashedStorage  = "HashedStorage"
)
const (

	//key - contract code hash
	//value - contract code
	Code = "Code"

	//key - addressHash+incarnation
	//value - code hash
	ContractCode = "HashedCodeHash"

	// IncarnationMap for deleted accounts
	//key - address
	//value - incarnation of account when it was last deleted
	IncarnationMap = "IncarnationMap"

	//TEVMCode -
	//key - contract code hash
	//value - contract TEVM code
	ContractTEVMCode = "TEVMCode"
)
const (
	// DatabaseInfo is used to store information about data layout.
	DatabaseInfo = "DbInfo"

	// Naming:
	//   HeaderNumber - Ethereum-specific block number. All nodes have the same BlockNum.
	//   HeaderID - auto-increment ID. Depends on the order in which the node saw headers.
	//      Invariant: for all headers in snapshots, Number == ID. It means there is no reason to store Num/ID for these headers in the DB.
	//   Same about: TxNum/TxID, BlockNum/BlockID
	HeaderNumber    = "HeaderNumber"           // header_hash -> header_num_u64
	BadHeaderNumber = "BadHeaderNumber"        // header_hash -> header_num_u64
	HeaderCanonical = "CanonicalHeader"        // block_num_u64 -> header hash
	Headers         = "Header"                 // block_num_u64 + hash -> header (RLP)
	HeaderTD        = "HeadersTotalDifficulty" // block_num_u64 + hash -> td (RLP)

	BlockBody = "BlockBody" // block_num_u64 + hash -> block body

	// Naming:
	//  TxNum - Ethereum canonical transaction number - same across all nodes.
	//  TxnID - auto-increment ID - can be different across nodes
	//  BlockNum/BlockID - same
	//
	// EthTx - stores all transactions of Canonical/NonCanonical/Bad blocks
	// TxnID (auto-increment ID) - means nodes in the network will have different IDs for the same transactions
	// Snapshots (frozen data): use TxNum (not TxnID)
	//
	// During a ReOrg - txs are not removed/updated
	//
	// Also this table has system-txs before and after each block: if a
	// block has no system-tx - the records are absent, but TxnID keeps increasing
	//
	// In Erigon3: table MaxTxNum stores TxNum (not TxnID). History/Indices use TxNum (not TxnID).
	EthTx           = "BlockTransaction"        // tx_id_u64 -> rlp(tx)
	NonCanonicalTxs = "NonCanonicalTransaction" // tbl_sequence_u64 -> rlp(tx)
	MaxTxNum        = "MaxTxNum"                // block_number_u64 -> max_tx_num_in_block_u64

	Receipts = "Receipt"        // block_num_u64 -> canonical block receipts (non-canonical are not stored)
	Log      = "TransactionLog" // block_num_u64 + txId -> logs of transaction

	// Stores bitmap indices - in which block numbers the logs of a given 'address' or 'topic' were seen
	// [addr or topic] + [2 bytes inverted shard number] -> bitmap(blockN)
	// indices are sharded - because some bitmaps are >1Mb and processing a new incoming block
	//	 updates ~300 bitmaps - by appending a small amount of new values. That causes much bigger writes (MDBX does copy-on-write).
	//
	// if the last existing shard is small enough - merge the delta into it
	// if the serialized size of the delta > ShardLimit - break it down into multiple shards
	// shard number - is the biggest value in the bitmap
	LogTopicIndex   = "LogTopicIndex"
	LogAddressIndex = "LogAddressIndex"

	// CallTraceSet is the name of the table that contains the mapping of block number to the set (sorted) of all accounts
	// touched by call traces. It is a DupSort-ed table
	// 8-byte BE block number -> account address -> two bits (one for "from", another for "to")
	CallTraceSet = "CallTraceSet"
	// Indices for call traces - have the same format as LogTopicIndex and LogAddressIndex
	// Store bitmap indices - in which block number we saw calls from (CallFromIndex) or to (CallToIndex) some addresses
	CallFromIndex = "CallFromIndex"
	CallToIndex   = "CallToIndex"

	// Cumulative indexes for estimation of stage execution
	CumulativeGasIndex         = "CumulativeGasIndex"
	CumulativeTransactionIndex = "CumulativeTransactionIndex"

	TxLookup = "BlockTransactionLookup" // hash -> transaction/receipt lookup metadata

	ConfigTable = "Config" // config prefix for the db

	// Progress of sync stages: stageName -> stageData
	SyncStageProgress = "SyncStage"

	Clique             = "Clique"
	CliqueSeparate     = "CliqueSeparate"
	CliqueSnapshot     = "CliqueSnapshot"
	CliqueLastSnapshot = "CliqueLastSnapshot"

	// Proof-of-stake
	// Beacon chain head that is being executed at the current time
	CurrentExecutionPayload = "CurrentExecutionPayload"

	// NodeRecords stores P2P node records (ENR)
	NodeRecords = "NodeRecord"
	// Inodes stores P2P discovery service info about the nodes
	Inodes = "Inode"

	// Transaction senders - stored separately from the block bodies
	Senders = "TxSender" // block_num_u64 + blockHash -> sendersList (no serialization format, every 20 bytes is new sender)

	// headBlockKey tracks the latest known full block's hash.
	HeadBlockKey = "LastBlock"

	HeadHeaderKey = "LastHeader"

	// headBlockHash, safeBlockHash, finalizedBlockHash of the latest Engine API forkchoice
	LastForkchoice = "LastForkchoice"

	// TransitionBlockKey tracks the last proof-of-work block
	TransitionBlockKey = "TransitionBlock"

	// migrationName -> serialized SyncStageProgress and SyncStageUnwind buckets
	// it stores stage progress - to understand in which context a migration was executed
	// in case of a bug report the developer can ask for the content of this bucket
	Migrations = "Migration"

	Sequence = "Sequence" // tbl_name -> seq_u64

	Epoch        = "DevEpoch"        // block_num_u64+block_hash->transition_proof
	PendingEpoch = "DevPendingEpoch" // block_num_u64+block_hash->transition_proof

	Issuance = "Issuance" // block_num_u64->RLP(issuance+burnt[0 if < london])

	StateAccounts   = "StateAccounts"
	StateStorage    = "StateStorage"
	StateCode       = "StateCode"
	StateCommitment = "StateCommitment"

	// BOR
	BorReceipts  = "BorReceipt"
	BorFinality  = "BorFinality"
	BorTxLookup  = "BlockBorTransactionLookup" // transaction_hash -> block_num_u64
	BorSeparate  = "BorSeparate"               // persisted snapshots of the Validator Sets, with their proposer priorities
	BorEvents    = "BorEvents"                 // event_id -> event_payload
	BorEventNums = "BorEventNums"              // block_num -> event_id (first event_id in that block)
	BorSpans     = "BorSpans"                  // span_id -> span (in JSON encoding)

	// Downloader
	BittorrentCompletion = "BittorrentCompletion"
	BittorrentInfo       = "BittorrentInfo"

	// Domains/History/InvertedIndices
	// Constants have the "Tbl" prefix, to avoid collision with actual Domain names
	// These constants are rarely used in the app, but Domain/History/Idx names are widely used
	TblAccountKeys        = "AccountKeys"
	TblAccountVals        = "AccountVals"
	TblAccountHistoryKeys = "AccountHistoryKeys"
	TblAccountHistoryVals = "AccountHistoryVals"
	TblAccountIdx         = "AccountIdx"

	TblStorageKeys        = "StorageKeys"
	TblStorageVals        = "StorageVals"
	TblStorageHistoryKeys = "StorageHistoryKeys"
	TblStorageHistoryVals = "StorageHistoryVals"
	TblStorageIdx         = "StorageIdx"

	TblCodeKeys        = "CodeKeys"
	TblCodeVals        = "CodeVals"
	TblCodeHistoryKeys = "CodeHistoryKeys"
	TblCodeHistoryVals = "CodeHistoryVals"
	TblCodeIdx         = "CodeIdx"

	TblCommitmentKeys        = "CommitmentKeys"
	TblCommitmentVals        = "CommitmentVals"
	TblCommitmentHistoryKeys = "CommitmentHistoryKeys"
	TblCommitmentHistoryVals = "CommitmentHistoryVals"
	TblCommitmentIdx         = "CommitmentIdx"

	TblLogAddressKeys = "LogAddressKeys"
	TblLogAddressIdx  = "LogAddressIdx"
	TblLogTopicsKeys  = "LogTopicsKeys"
	TblLogTopicsIdx   = "LogTopicsIdx"

	TblTracesFromKeys = "TracesFromKeys"
	TblTracesFromIdx  = "TracesFromIdx"
	TblTracesToKeys   = "TracesToKeys"
	TblTracesToIdx    = "TracesToIdx"

	Snapshots = "Snapshots" // name -> hash

	//State Reconstitution
	RAccountKeys = "RAccountKeys"
	RAccountIdx  = "RAccountIdx"
	RStorageKeys = "RStorageKeys"
	RStorageIdx  = "RStorageIdx"
	RCodeKeys    = "RCodeKeys"
	RCodeIdx     = "RCodeIdx"

	PlainStateR    = "PlainStateR"    // temporary table for PlainState reconstitution
	PlainStateD    = "PlainStateD"    // temporary table for PlainState reconstitution, deletes
	CodeR          = "CodeR"          // temporary table for Code reconstitution
	CodeD          = "CodeD"          // temporary table for Code reconstitution, deletes
	PlainContractR = "PlainContractR" // temporary table for PlainContract reconstitution
	PlainContractD = "PlainContractD" // temporary table for PlainContract reconstitution, deletes

	// [slot] => [Beacon state]
	BeaconState = "BeaconState"
	// [slot] => [signature + block without execution payload]
	BeaconBlocks = "BeaconBlock"
	// [slot] => [attestation list (custom encoding)]
	Attestetations = "Attestetations"

	// [slot] => [Canonical block root]
	CanonicalBlockRoots = "CanonicalBlockRoots"
	// [Root (block root)] => Slot
	BlockRootToSlot = "BlockRootToSlot"
	// [Block Root] => [State Root]
	BlockRootToStateRoot = "BlockRootToStateRoot"
	StateRootToBlockRoot = "StateRootToBlockRoot"

	BlockRootToBlockNumber = "BlockRootToBlockNumber"
	BlockRootToBlockHash   = "BlockRootToBlockHash"

	LastBeaconSnapshot    = "LastBeaconSnapshot"
	LastBeaconSnapshotKey = "LastBeaconSnapshotKey"

	// [Block Root] => [Parent Root]
	BlockRootToParentRoot = "BlockRootToParentRoot"

	HighestFinalized = "HighestFinalized"

	// BlockRoot => Beacon Block Header
	BeaconBlockHeaders = "BeaconBlockHeaders"

	// LightClientStore => LightClientStore object
	// LightClientFinalityUpdate => latest finality update
	// LightClientOptimisticUpdate => latest optimistic update
	LightClient = "LightClient"
	// Period (one every 27 hours) => LightClientUpdate
	LightClientUpdates = "LightClientUpdates"
	// Beacon historical data
	// ValidatorIndex => [Public Key]
	ValidatorPublicKeys = "ValidatorPublickeys"
)
const (
	RecentLocalTransaction = "RecentLocalTransaction" // sequence_u64 -> tx_hash
	PoolTransaction        = "PoolTransaction"        // txHash -> sender_id_u64+tx_rlp
	PoolInfo               = "PoolInfo"               // option_key -> option_value
)
const AccountChangeSet = "AccountChangeSet"

AccountChangeSet and StorageChangeSet - for block N, they store the values of state before block N changed them (the values "after" the change are stored in PlainState). Logical format:

key - blockNum_u64 + key_in_plain_state
value - value_in_plain_state_before_blockNum_changes

Example: If block N changed account A from value X to Y, then:

AccountChangeSet has record: bigEndian(N) + A -> X
PlainState has record: A -> Y

See also: docs/programmers_guide/db_walkthrough.MD#table-history-of-accounts

As you can see, if block N changes many accounts - then all records share the repetitive prefix `bigEndian(N)`. MDBX can store such a prefix only once - via the DupSort feature (see `docs/programmers_guide/dupsort.md`). Both buckets are DupSort-ed and have the physical format: AccountChangeSet:

key - blockNum_u64
value - address + account(encoded)

StorageChangeSet:

key - blockNum_u64 + address + incarnation_u64
value - plain_storage_key + value
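
For illustration, a hypothetical reader of the pre-block-N account value through this DupSort layout - a sketch only, assuming `bytes` and `encoding/binary` imports and this package as `kv`:

func accountBeforeBlock(tx kv.Tx, addr []byte, blockN uint64) ([]byte, error) {
	c, err := tx.CursorDupSort(kv.AccountChangeSet)
	if err != nil {
		return nil, err
	}
	defer c.Close()

	k := make([]byte, 8)
	binary.BigEndian.PutUint64(k, blockN)
	// exact match on the key (block number), range match over dup values (address + account)
	v, err := c.SeekBothRange(k, addr)
	if err != nil {
		return nil, err
	}
	if v == nil || !bytes.HasPrefix(v, addr) {
		return nil, nil // block N did not change this account
	}
	return v[len(addr):], nil // account value as it was before block N
}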
const E2AccountsHistory = "AccountHistory"

AccountsHistory and StorageHistory - indices designed to serve the following 2 types of requests: 1. what is the smallest block number >= X where account A changed; 2. get the last shard of A - to append new block numbers there

Task 1 is part of the "get historical state" operation (see `core/state:GetAsOf`): If `db.Seek(A+bigEndian(X))` returns a non-last shard -

then get the block number from the shard value: Y := RoaringBitmap(shard_value).GetGte(X)
and with Y go to ChangeSets: db.Get(ChangeSets, Y+A)

If `db.Seek(A+bigEndian(X))` returns the last shard -

then we go to PlainState: db.Get(PlainState, A)

Format:

  • index is split into shards of ~2Kb - a RoaringBitmap-encoded sorted list of block numbers (this avoids performance degradation for popular accounts and looking too deep into history; 2Kb also avoids Overflow pages inside the DB)
  • if a shard is not the last - its key has an 8-byte suffix = bigEndian(max_block_num_in_this_shard)
  • if a shard is the last - its key has an 8-byte suffix = 0xFF

It allows:

  • serving task 1 with 1 db operation: db.Seek(A+bigEndian(X))
  • serving task 2 with 1 db operation: db.Get(A+0xFF)

see also: docs/programmers_guide/db_walkthrough.MD#table-change-sets

AccountsHistory:

key - address + shard_id_u64
value - roaring bitmap - list of blocks where it changed

StorageHistory

key - address + storage_key + shard_id_u64
value - roaring bitmap - list of blocks where it changed
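
For illustration, a sketch of serving task 1 under this format; the bitmap codec (github.com/RoaringBitmap/roaring/roaring64 here) and the helper name are assumptions, not the app's actual code:

func firstChangeGte(tx kv.Tx, addr []byte, x uint64) (uint64, bool, error) {
	c, err := tx.Cursor(kv.E2AccountsHistory)
	if err != nil {
		return 0, false, err
	}
	defer c.Close()

	seek := make([]byte, 0, len(addr)+8)
	seek = append(seek, addr...)
	seek = binary.BigEndian.AppendUint64(seek, x)

	k, v, err := c.Seek(seek) // serve task 1 with 1 db operation
	if err != nil {
		return 0, false, err
	}
	if k == nil || !bytes.HasPrefix(k, addr) {
		return 0, false, nil // no shard covers X - fall back to PlainState
	}
	bm := roaring64.New()
	if _, err := bm.ReadFrom(bytes.NewReader(v)); err != nil {
		return 0, false, err
	}
	it := bm.Iterator()
	it.AdvanceIfNeeded(x) // GetGte(X): smallest stored block number >= X
	if !it.HasNext() {
		return 0, false, nil
	}
	return it.Next(), true, nil
}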
const E2StorageHistory = "StorageHistory"

const PlainContractCode = "PlainCodeHash"

PlainContractCode:

key - address + incarnation
value - code hash

const PlainState = "PlainState"

PlainState logical layout:

Contains Accounts:
  key - address (unhashed)
  value - account encoded for storage
Contains Storage:
  key - address (unhashed) + incarnation + storage key (unhashed)
  value - storage value(common.hash)

Physical layout:

PlainState and HashedStorage utilise the DupSort feature of MDBX (storing multiple values under 1 key).

-------------------------------------------------------------
        key             |             value
-------------------------------------------------------------
[acc_hash]              | [acc_value]
[acc_hash]+[inc]        | [storage1_hash]+[storage1_value]
                        | [storage2_hash]+[storage2_value] // this value has no own key. it's the 2nd value of the [acc_hash]+[inc] key.
                        | [storage3_hash]+[storage3_value]
                        | ...
[acc_hash]+[old_inc]    | [storage1_hash]+[storage1_value]
                        | ...
[acc2_hash]             | [acc2_value]
...
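
For illustration, a hypothetical reader of one storage slot straight through this DupSort layout (the kv layer can also do this transparently via AutoDupSortKeysConversion; the field sizes - 20-byte address, 8-byte incarnation, 32-byte location - and the `bytes`/`encoding/binary` imports are assumptions):

func readStorage(tx kv.Tx, addr []byte, incarnation uint64, loc []byte) ([]byte, error) {
	c, err := tx.CursorDupSort(kv.PlainState)
	if err != nil {
		return nil, err
	}
	defer c.Close()

	k := make([]byte, 0, len(addr)+8)
	k = append(k, addr...)
	k = binary.BigEndian.AppendUint64(k, incarnation)

	v, err := c.SeekBothRange(k, loc) // first dup value >= loc under key k
	if err != nil {
		return nil, err
	}
	if v == nil || !bytes.HasPrefix(v, loc) {
		return nil, nil // slot is empty
	}
	return v[len(loc):], nil // storage value
}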
const ReadersLimit = 32000 // MDBX_READERS_LIMIT=32767

const StorageChangeSet = "StorageChangeSet"

const TrieOfAccounts = "TrieAccount"

TrieOfAccounts and TrieOfStorage:

hasState, groups - mark prefixes existing in the hashed_account table
hasTree - mark prefixes existing in the trie_account table (not related to branchNodes)
hasHash - mark prefixes whose hashes are saved in the current trie_account record (actually only hashes of branchNodes can be saved)
@see UnmarshalTrieNode
@see integrity.Trie

+-----------------------------------------------------------------------------------------------------+
| DB record: 0x0B, hasState: 0b1011, hasTree: 0b1001, hasHash: 0b1001, hashes: [x,x]                   |
+-----------------------------------------------------------------------------------------------------+
                |                                           |                               |
                v                                           |                               v
+---------------------------------------------+             |            +--------------------------------------+
| DB record: 0x0B00, hasState: 0b10001        |             |            | DB record: 0x0B03, hasState: 0b10010 |
| hasTree: 0, hasHash: 0b10000, hashes: [x]   |             |            | hasTree: 0, hasHash: 0, hashes: []   |
+---------------------------------------------+             |            +--------------------------------------+
        |                    |                              |                         |                  |
        v                    v                              v                         v                  v
+------------------+  +----------------------+      +---------------+       +---------------+  +---------------+
| Account:         |  | BranchNode: 0x0B0004 |      | Account:      |       | Account:      |  | Account:      |
| 0x0B0000...      |  | has no record in     |      | 0x0B01...     |       | 0x0B0301...   |  | 0x0B0304...   |
| in HashedAccount |  |    TrieAccount       |      |               |       |               |  |               |
+------------------+  +----------------------+      +---------------+       +---------------+  +---------------+
                           |                |
                           v                v
                      +---------------+  +---------------+
                      | Account:      |  | Account:      |
                      | 0x0B000400... |  | 0x0B000401... |
                      +---------------+  +---------------+

Invariants:

  • hasTree is a subset of hasState
  • hasHash is a subset of hasState
  • the first level in account_trie always exists if hasState > 0
  • the TrieStorage record of account.root (length=40) must have +1 hash - it's account.root
  • each record in the TrieAccount table must have a parent (may be not direct), and this parent must have the correct bit set in its hasTree bitmap
  • if hasState has a bit set - then the HashedAccount table must have a record according to this bit
  • each TrieAccount record must cover some state (meaning hasState is always > 0)
  • TrieAccount records with length=1 may satisfy the (hasBranch==0 && hasHash==0) condition
  • other records in TrieAccount and TrieStorage must have (hasTree!=0 || hasHash!=0)

const TrieOfStorage = "TrieStorage"
const Unlim int = -1

var Unbounded []byte = nil

const VerkleRoots = "VerkleRoots"

Mapping [block number] => [Verkle Root]

const VerkleTrie = "VerkleTrie"

Mapping [Verkle Root] => [Rlp-Encoded Verkle Node]

Variables

var (
	ErrAttemptToDeleteNonDeprecatedBucket = errors.New("only buckets from dbutils.ChaindataDeprecatedTables can be deleted")

	DbSize    = metrics.GetOrCreateGauge(`db_size`)    //nolint
	TxLimit   = metrics.GetOrCreateGauge(`tx_limit`)   //nolint
	TxSpill   = metrics.GetOrCreateGauge(`tx_spill`)   //nolint
	TxUnspill = metrics.GetOrCreateGauge(`tx_unspill`) //nolint
	TxDirty   = metrics.GetOrCreateGauge(`tx_dirty`)   //nolint

	DbCommitPreparation = metrics.GetOrCreateSummary(`db_commit_seconds{phase="preparation"}`) //nolint
	//DbGCWallClock       = metrics.GetOrCreateSummary(`db_commit_seconds{phase="gc_wall_clock"}`) //nolint
	//DbGCCpuTime         = metrics.GetOrCreateSummary(`db_commit_seconds{phase="gc_cpu_time"}`)   //nolint
	//DbCommitAudit       = metrics.GetOrCreateSummary(`db_commit_seconds{phase="audit"}`)         //nolint
	DbCommitWrite  = metrics.GetOrCreateSummary(`db_commit_seconds{phase="write"}`)  //nolint
	DbCommitSync   = metrics.GetOrCreateSummary(`db_commit_seconds{phase="sync"}`)   //nolint
	DbCommitEnding = metrics.GetOrCreateSummary(`db_commit_seconds{phase="ending"}`) //nolint
	DbCommitTotal  = metrics.GetOrCreateSummary(`db_commit_seconds{phase="total"}`)  //nolint

	DbPgopsNewly   = metrics.GetOrCreateGauge(`db_pgops{phase="newly"}`)   //nolint
	DbPgopsCow     = metrics.GetOrCreateGauge(`db_pgops{phase="cow"}`)     //nolint
	DbPgopsClone   = metrics.GetOrCreateGauge(`db_pgops{phase="clone"}`)   //nolint
	DbPgopsSplit   = metrics.GetOrCreateGauge(`db_pgops{phase="split"}`)   //nolint
	DbPgopsMerge   = metrics.GetOrCreateGauge(`db_pgops{phase="merge"}`)   //nolint
	DbPgopsSpill   = metrics.GetOrCreateGauge(`db_pgops{phase="spill"}`)   //nolint
	DbPgopsUnspill = metrics.GetOrCreateGauge(`db_pgops{phase="unspill"}`) //nolint
	DbPgopsWops    = metrics.GetOrCreateGauge(`db_pgops{phase="wops"}`)    //nolint

	GcLeafMetric     = metrics.GetOrCreateGauge(`db_gc_leaf`)     //nolint
	GcOverflowMetric = metrics.GetOrCreateGauge(`db_gc_overflow`) //nolint
	GcPagesMetric    = metrics.GetOrCreateGauge(`db_gc_pages`)    //nolint

)
var (
	//StorageModeTEVM - does not translate EVM to TEVM
	StorageModeTEVM = []byte("smTEVM")

	PruneTypeOlder  = []byte("older")
	PruneTypeBefore = []byte("before")

	PruneHistory        = []byte("pruneHistory")
	PruneHistoryType    = []byte("pruneHistoryType")
	PruneReceipts       = []byte("pruneReceipts")
	PruneReceiptsType   = []byte("pruneReceiptsType")
	PruneTxIndex        = []byte("pruneTxIndex")
	PruneTxIndexType    = []byte("pruneTxIndexType")
	PruneCallTraces     = []byte("pruneCallTraces")
	PruneCallTracesType = []byte("pruneCallTracesType")

	DBSchemaVersionKey = []byte("dbVersion")

	BittorrentPeerID            = "peerID"
	CurrentHeadersSnapshotHash  = []byte("CurrentHeadersSnapshotHash")
	CurrentHeadersSnapshotBlock = []byte("CurrentHeadersSnapshotBlock")
	CurrentBodiesSnapshotHash   = []byte("CurrentBodiesSnapshotHash")
	CurrentBodiesSnapshotBlock  = []byte("CurrentBodiesSnapshotBlock")
	PlainStateVersion           = []byte("PlainStateVersion")

	HighestFinalizedKey         = []byte("HighestFinalized")
	LightClientStore            = []byte("LightClientStore")
	LightClientFinalityUpdate   = []byte("LightClientFinalityUpdate")
	LightClientOptimisticUpdate = []byte("LightClientOptimisticUpdate")
)

Keys

var BorTablesCfg = TableCfg{
	BorReceipts:  {Flags: DupSort},
	BorFinality:  {Flags: DupSort},
	BorTxLookup:  {Flags: DupSort},
	BorEvents:    {Flags: DupSort},
	BorEventNums: {Flags: DupSort},
	BorSpans:     {Flags: DupSort},
}
var ChaindataDeprecatedTables = []string{
	Clique,
	TransitionBlockKey,
}

ChaindataDeprecatedTables - list of buckets which can be programmatically deleted - for example after migration

var ChaindataTables = []string{}/* 113 elements not displayed */

ChaindataTables - list of all buckets. The app will panic if some bucket is not in this list. This list is sorted in the `init` method. ChaindataTablesCfg can be used to find a bucket's index in the sorted version of the ChaindataTables list by name

var ChaindataTablesCfg = TableCfg{
	HashedStorage: {
		Flags:                     DupSort,
		AutoDupSortKeysConversion: true,
		DupFromLen:                72,
		DupToLen:                  40,
	},
	AccountChangeSet: {Flags: DupSort},
	StorageChangeSet: {Flags: DupSort},
	PlainState: {
		Flags:                     DupSort,
		AutoDupSortKeysConversion: true,
		DupFromLen:                60,
		DupToLen:                  28,
	},
	CallTraceSet: {Flags: DupSort},

	TblAccountKeys:           {Flags: DupSort},
	TblAccountHistoryKeys:    {Flags: DupSort},
	TblAccountHistoryVals:    {Flags: DupSort},
	TblAccountIdx:            {Flags: DupSort},
	TblStorageKeys:           {Flags: DupSort},
	TblStorageHistoryKeys:    {Flags: DupSort},
	TblStorageHistoryVals:    {Flags: DupSort},
	TblStorageIdx:            {Flags: DupSort},
	TblCodeKeys:              {Flags: DupSort},
	TblCodeHistoryKeys:       {Flags: DupSort},
	TblCodeIdx:               {Flags: DupSort},
	TblCommitmentKeys:        {Flags: DupSort},
	TblCommitmentHistoryKeys: {Flags: DupSort},
	TblCommitmentIdx:         {Flags: DupSort},
	TblLogAddressKeys:        {Flags: DupSort},
	TblLogAddressIdx:         {Flags: DupSort},
	TblLogTopicsKeys:         {Flags: DupSort},
	TblLogTopicsIdx:          {Flags: DupSort},
	TblTracesFromKeys:        {Flags: DupSort},
	TblTracesFromIdx:         {Flags: DupSort},
	TblTracesToKeys:          {Flags: DupSort},
	TblTracesToIdx:           {Flags: DupSort},
	RAccountKeys:             {Flags: DupSort},
	RAccountIdx:              {Flags: DupSort},
	RStorageKeys:             {Flags: DupSort},
	RStorageIdx:              {Flags: DupSort},
	RCodeKeys:                {Flags: DupSort},
	RCodeIdx:                 {Flags: DupSort},
}
var DBSchemaVersion = types.VersionReply{Major: 6, Minor: 1, Patch: 0}

DBSchemaVersion versions list:

5.0 - BlockTransaction table now has canonical ids (txs of non-canonical blocks moved to NonCanonicalTransaction table)
6.0 - BlockTransaction table now has system-txs before and after each block (records are absent if a block has no system-tx, but the sequence keeps increasing)
6.1 - Canonical/NonCanonical/BadBlock transactions are now stored in the same table: kv.EthTx. Added kv.BadBlockNumber table

var DownloaderTables = []string{
	BittorrentCompletion,
	BittorrentInfo,
}
var DownloaderTablesCfg = TableCfg{}

var ErrChanged = fmt.Errorf("key must not change")

var ReconTablesCfg = TableCfg{
	PlainStateD:    {Flags: DupSort},
	CodeD:          {Flags: DupSort},
	PlainContractD: {Flags: DupSort},
}
var SentryTables = []string{}

var SentryTablesCfg = TableCfg{}

var TxpoolTablesCfg = TableCfg{}

Functions

func BigChunks

func BigChunks(db RoDB, table string, from []byte, walker func(tx Tx, k, v []byte) (bool, error)) error

BigChunks - reads `table` in big chunks, restarting the read transaction every 1 minute

func DefaultPageSize

func DefaultPageSize() uint64

func EnsureNotChangedBool

func EnsureNotChangedBool(tx GetPut, bucket string, k []byte, value bool) (ok, enabled bool, err error)

EnsureNotChangedBool - used to store immutable config flags in the db. Protects from human mistakes.

func FirstKey

func FirstKey(tx Tx, table string) ([]byte, error)

FirstKey - a candidate to move to the kv.Tx interface

func GetBool

func GetBool(tx Getter, bucket string, k []byte) (enabled bool, err error)

func LastKey

func LastKey(tx Tx, table string) ([]byte, error)

LastKey - a candidate to move to the kv.Tx interface

func NextSubtree

func NextSubtree(in []byte) ([]byte, bool)

NextSubtree does []byte++. Returns false if overflow.
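
A usage sketch (assumed, not from the package docs): NextSubtree produces the exclusive upper bound for a prefix window, e.g. for Tx.Range below - Tx.Prefix is exactly this composition:

to, ok := kv.NextSubtree(prefix) // []byte++
if !ok {
	to = nil // prefix was all 0xFF - scan to the end of table
}
it, err := tx.Range("SomeTable", prefix, to)
if err != nil {
	return err
}
for it.HasNext() {
	k, v, err := it.Next()
	if err != nil {
		return err
	}
	_, _ = k, v // process entry
}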

func ReadAhead

func ReadAhead(ctx context.Context, db RoDB, progress *atomic.Bool, table string, from []byte, amount uint32) (clean func())

Types

type Bucket

type Bucket string

type BucketMigrator

type BucketMigrator interface {
	BucketMigratorRO
	DropBucket(string) error
	CreateBucket(string) error
	ExistsBucket(string) (bool, error)
	ClearBucket(string) error
}

BucketMigrator is used for bucket migration; don't use it in usual app code

type BucketMigratorRO

type BucketMigratorRO interface {
	ListBuckets() ([]string, error)
}

type Closer

type Closer interface {
	Close()
}

type CmpFunc

type CmpFunc func(k1, k2, v1, v2 []byte) int

type Cursor

type Cursor interface {
	First() ([]byte, []byte, error)               // First - position at first key/data item
	Seek(seek []byte) ([]byte, []byte, error)     // Seek - position at first key greater than or equal to specified key
	SeekExact(key []byte) ([]byte, []byte, error) // SeekExact - position at exact matching key if exists
	Next() ([]byte, []byte, error)                // Next - position at next key/value (can iterate over DupSort key/values automatically)
	Prev() ([]byte, []byte, error)                // Prev - position at previous key
	Last() ([]byte, []byte, error)                // Last - position at last key and last possible value
	Current() ([]byte, []byte, error)             // Current - return key/data at current cursor position

	Count() (uint64, error) // Count - fast way to calculate amount of keys in bucket. It counts all keys even if Prefix was set.

	Close()
}

Cursor - an interface for navigating through a database. CursorDupSort inherits this interface.

If methods (like First/Next/Seek) return an error, then the returned key SHOULD not be nil (can be []byte{}, for example). Then looping code will look like:

c, err := tx.Cursor(bucketName)
if err != nil {
    return err
}
defer c.Close()
for k, v, err := c.First(); k != nil; k, v, err = c.Next() {
    if err != nil {
        return err
    }
    // ... logic
}

type CursorDupSort

type CursorDupSort interface {
	Cursor

	// SeekBothExact -
	// the second parameter can be nil only if the searched key has no duplicates; otherwise it returns an error
	SeekBothExact(key, value []byte) ([]byte, []byte, error)
	SeekBothRange(key, value []byte) ([]byte, error) // SeekBothRange - exact match of the key, but range match of the value
	FirstDup() ([]byte, error)                       // FirstDup - position at first data item of current key
	NextDup() ([]byte, []byte, error)                // NextDup - position at next data item of current key
	NextNoDup() ([]byte, []byte, error)              // NextNoDup - position at first data item of next key
	PrevDup() ([]byte, []byte, error)
	PrevNoDup() ([]byte, []byte, error)
	LastDup() ([]byte, error) // LastDup - position at last data item of current key

	CountDuplicates() (uint64, error) // CountDuplicates - number of duplicates for the current key
}

CursorDupSort

Example:

for k, v, err = cursor.First(); k != nil; k, v, err = cursor.NextNoDup() {
	if err != nil {
		return err
	}
	for ; v != nil; _, v, err = cursor.NextDup() {
		if err != nil {
			return err
		}
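		// use k, v here - v iterates over the duplicate values of key k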

	}
}

type DBI

type DBI uint

type DBVerbosityLvl

type DBVerbosityLvl int8

type Deleter

type Deleter interface {
	// Delete removes a single entry.
	Delete(table string, k []byte) error
}

Deleter wraps the database delete operations.

type Domain

type Domain string
const (
	AccountsDomain Domain = "AccountsDomain"
	StorageDomain  Domain = "StorageDomain"
	CodeDomain     Domain = "CodeDomain"
)

type GetPut

type GetPut interface {
	Getter
	Putter
}

type Getter

type Getter interface {
	Has

	// GetOne references a readonly section of memory that must not be accessed after txn has terminated
	GetOne(table string, key []byte) (val []byte, err error)

	// ForEach iterates over entries with keys greater or equal to fromPrefix.
	// walker is called for each eligible entry.
	// If walker returns an error:
	//   - implementations of local db - stop
	//   - implementations of remote db - do not handle this error and may finish (send all entries to client) before error happen.
	ForEach(table string, fromPrefix []byte, walker func(k, v []byte) error) error
	ForPrefix(table string, prefix []byte, walker func(k, v []byte) error) error
	ForAmount(table string, prefix []byte, amount uint32, walker func(k, v []byte) error) error
}
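
A walker-style usage sketch (table name, prefix, and the error are illustrative; assumes `fmt` is imported):

if err := tx.ForPrefix("SomeTable", prefix, func(k, v []byte) error {
	if len(v) == 0 {
		return fmt.Errorf("unexpected empty value at key %x", k) // stops local-db iteration
	}
	return nil // continue walking
}); err != nil {
	return err
}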

type Has

type Has interface {
	// Has indicates whether a key exists in the database.
	Has(table string, key []byte) (bool, error)
}

type History

type History string
const (
	AccountsHistory History = "AccountsHistory"
	StorageHistory  History = "StorageHistory"
	CodeHistory     History = "CodeHistory"
)

type InvertedIdx

type InvertedIdx string
const (
	AccountsHistoryIdx InvertedIdx = "AccountsHistoryIdx"
	StorageHistoryIdx  InvertedIdx = "StorageHistoryIdx"
	CodeHistoryIdx     InvertedIdx = "CodeHistoryIdx"

	LogTopicIdx   InvertedIdx = "LogTopicIdx"
	LogAddrIdx    InvertedIdx = "LogAddrIdx"
	TracesFromIdx InvertedIdx = "TracesFromIdx"
	TracesToIdx   InvertedIdx = "TracesToIdx"
)

type Label

type Label uint8
const (
	ChainDB      Label = 0
	TxPoolDB     Label = 1
	SentryDB     Label = 2
	ConsensusDB  Label = 3
	DownloaderDB Label = 4
	InMem        Label = 5
)

func UnmarshalLabel

func UnmarshalLabel(s string) Label

func (Label) String

func (l Label) String() string

type PendingMutations

type PendingMutations interface {
	StatelessRwTx
	// Flush all in-memory data into `tx`
	Flush(ctx context.Context, tx RwTx) error
	Close()
	BatchSize() int
}

PendingMutations - in-memory storage of changes. Later they can either be flushed to the database or abandoned.

type Putter

type Putter interface {
	// Put inserts or updates a single entry.
	Put(table string, k, v []byte) error
}

Putter wraps the database write operations.

type RoDB

type RoDB interface {
	Closer
	ReadOnly() bool
	View(ctx context.Context, f func(tx Tx) error) error

	// BeginRo - creates transaction
	// 	tx may be discarded by .Rollback() method
	//
	// A transaction and its cursors must only be used by a single
	// 	thread (not goroutine), and a thread may only have a single transaction at a time.
	//  This happens automatically because this method calls runtime.LockOSThread() inside (Rollback/Commit releases it).
	//  For this reason application code must not call runtime.UnlockOSThread() - it leads to undefined behavior.
	//
	// If this `parent` is non-NULL, the new transaction
	//	will be a nested transaction, with the transaction indicated by parent
	//	as its parent. Transactions may be nested to any level. A parent
	//	transaction and its cursors may not issue any other operations than
	//	Commit and Rollback while it has active child transactions.
	BeginRo(ctx context.Context) (Tx, error)
	AllTables() TableCfg
	PageSize() uint64

	// Pointer to the underlying C environment handle, if applicable (e.g. *C.MDBX_env)
	CHandle() unsafe.Pointer
}

RoDB - Read-only version of KV.

type RwCursor

type RwCursor interface {
	Cursor

	Put(k, v []byte) error           // Put - based on order
	Append(k []byte, v []byte) error // Append - append the given key/data pair to the end of the database. This option allows fast bulk loading when keys are already known to be in the correct order.
	Delete(k []byte) error           // Delete - short version of SeekExact+DeleteCurrent or SeekBothExact+DeleteCurrent

	// DeleteCurrent - this function deletes the key/data pair to which the cursor refers.
	// This does not invalidate the cursor, so operations such as MDB_NEXT
	// can still be used on it.
	// Both MDB_NEXT and MDB_GET_CURRENT will return the same record after
	// this operation.
	DeleteCurrent() error
}
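
A bulk-load sketch using Append - keys must already be sorted ascending (`sortedEntries` is a hypothetical pre-sorted slice; `tx` is a RwTx):

c, err := tx.RwCursor("SomeTable")
if err != nil {
	return err
}
defer c.Close()
for _, e := range sortedEntries { // must be pre-sorted by key, ascending
	if err := c.Append(e.K, e.V); err != nil {
		return err
	}
}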

type RwCursorDupSort

type RwCursorDupSort interface {
	CursorDupSort
	RwCursor

	PutNoDupData(key, value []byte) error // PutNoDupData - inserts key without dupsort
	DeleteCurrentDuplicates() error       // DeleteCurrentDuplicates - deletes all of the data items for the current key
	DeleteExact(k1, k2 []byte) error      // DeleteExact - delete 1 value from given key
	AppendDup(key, value []byte) error    // AppendDup - same as Append, but for sorted dup data
}

type RwDB

type RwDB interface {
	RoDB

	Update(ctx context.Context, f func(tx RwTx) error) error
	UpdateNosync(ctx context.Context, f func(tx RwTx) error) error

	BeginRw(ctx context.Context) (RwTx, error)
	BeginRwNosync(ctx context.Context) (RwTx, error)
}

RwDB - low-level database interface; the main target is to provide a common abstraction on top of MDBX and RemoteKV.

Common pattern for short-living transactions:

if err := db.View(ctx, func(tx ethdb.Tx) error {
	// ... code which uses database in transaction
	return nil
}); err != nil {
	return err
}

Common pattern for long-living transactions:

tx, err := db.BeginRw(ctx)
if err != nil {
	return err
}
defer tx.Rollback()

// ... code which uses database in transaction

err = tx.Commit()
if err != nil {
	return err
}

type RwTx

type RwTx interface {
	Tx
	StatelessWriteTx
	BucketMigrator

	RwCursor(table string) (RwCursor, error)
	RwCursorDupSort(table string) (RwCursorDupSort, error)

	// CollectMetrics - collects all DB-related and Tx-related metrics
	// this method exists only in RwTx to avoid concurrency
	CollectMetrics()
}

RwTx

WARNING:

  • RwTx is not threadsafe and may only be used in the goroutine that created it.
  • ReadOnly transactions do not lock the goroutine to a thread; RwTx does.
  • The user can't call runtime.LockOSThread/runtime.UnlockOSThread in the same goroutine until RwTx Commit/Rollback.

type StatelessReadTx

type StatelessReadTx interface {
	Getter

	Commit() error // Commit all the operations of a transaction into the database.
	Rollback()     // Rollback - abandon all the operations of the transaction instead of saving them.

	// ReadSequence - allows creating a linear sequence of unique positive integers for each table.
	// Can be called for a read transaction to retrieve the current sequence value, and the increment must be zero.
	// Sequence changes become visible outside the current write transaction after it is committed, and discarded on abort.
	// Starts from 0.
	ReadSequence(table string) (uint64, error)
}

type StatelessRwTx

type StatelessRwTx interface {
	StatelessReadTx
	StatelessWriteTx
}

type StatelessWriteTx

type StatelessWriteTx interface {
	Putter
	Deleter

	/*
		// if need N id's:
		baseId, err := tx.IncrementSequence(bucket, N)
		if err != nil {
		   return err
		}
		for i := 0; i < N; i++ {    // if N == 0, it will work as expected
		    id := baseId + i
		    // use id
		}


		// or if need only 1 id:
		id, err := tx.IncrementSequence(bucket, 1)
		if err != nil {
		    return err
		}
		// use id
	*/
	IncrementSequence(table string, amount uint64) (uint64, error)
	Append(table string, k, v []byte) error
	AppendDup(table string, k, v []byte) error
}

type TableCfg

type TableCfg map[string]TableCfgItem

func TablesCfgByLabel

func TablesCfgByLabel(label Label) TableCfg

type TableCfgItem

type TableCfgItem struct {
	Flags TableFlags
	// AutoDupSortKeysConversion - enables some keys transformation - to change db layout without changing app code.
	// Use it wisely - it helps to do experiments with DB format faster, but better reduce amount of Magic in app.
	// If good DB format found, push app code to accept this format and then disable this property.
	AutoDupSortKeysConversion bool
	IsDeprecated              bool
	DBI                       DBI
	// DupFromLen - if user provide key of this length, then next transformation applied:
	// v = append(k[DupToLen:], v...)
	// k = k[:DupToLen]
	// And opposite at retrieval
	// Works only if AutoDupSortKeysConversion enabled
	DupFromLen int
	DupToLen   int
}

type TableFlags

type TableFlags uint
const (
	Default    TableFlags = 0x00
	ReverseKey TableFlags = 0x02
	DupSort    TableFlags = 0x04
	IntegerKey TableFlags = 0x08
	IntegerDup TableFlags = 0x20
	ReverseDup TableFlags = 0x40
)

type TemporalTx

type TemporalTx interface {
	Tx
	DomainGet(name Domain, k, k2 []byte) (v []byte, ok bool, err error)
	DomainGetAsOf(name Domain, k, k2 []byte, ts uint64) (v []byte, ok bool, err error)
	HistoryGet(name History, k []byte, ts uint64) (v []byte, ok bool, err error)

	// IndexRange - return iterator over range of inverted index for given key `k`
	// Asc semantic:  [from, to) AND from < to
	// Desc semantic: [from, to) AND from > to
	// Limit -1 means Unlimited
	// from -1, to -1 means unbounded (StartOfTable, EndOfTable)
	// Example: IndexRange("IndexName", 10, 5, order.Desc, -1)
	// Example: IndexRange("IndexName", -1, -1, order.Asc, 10)
	IndexRange(name InvertedIdx, k []byte, fromTs, toTs int, asc order.By, limit int) (timestamps iter.U64, err error)
	HistoryRange(name History, fromTs, toTs int, asc order.By, limit int) (it iter.KV, err error)
	DomainRange(name Domain, fromKey, toKey []byte, ts uint64, asc order.By, limit int) (it iter.KV, err error)
}
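
A usage sketch of IndexRange, assuming the iter.U64 iteration pattern (HasNext/Next) and the `order` package from this module; `addr` is illustrative:

timestamps, err := tx.IndexRange(kv.LogAddrIdx, addr, -1, -1, order.Desc, 10)
if err != nil {
	return err
}
for timestamps.HasNext() {
	ts, err := timestamps.Next()
	if err != nil {
		return err
	}
	_ = ts // timestamp (TxNum in Erigon3) where addr appeared in logs
}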

type Tx

type Tx interface {
	StatelessReadTx
	BucketMigratorRO

	// ID returns the identifier associated with this transaction. For a
	// read-only transaction, this corresponds to the snapshot being read;
	// concurrent readers will frequently have the same transaction ID.
	ViewID() uint64

	// Cursor - creates a cursor object on top of the given bucket. The type of cursor depends on bucket configuration.
	// If the bucket was created with the mdbx.DupSort flag, then a cursor with the CursorDupSort interface is created;
	// otherwise - an object of the Cursor interface.
	//
	// Cursor also provides a grain of magic - it can use a declarative configuration and automatically break
	// long keys into DupSort key/values. See docs for `bucket.go:TableCfgItem`
	Cursor(table string) (Cursor, error)
	CursorDupSort(table string) (CursorDupSort, error) // CursorDupSort - can be used if bucket has mdbx.DupSort flag

	DBSize() (uint64, error)

	// Range [from, to)
	// Range(from, nil) means [from, EndOfTable)
	// Range(nil, to)   means [StartOfTable, to)
	Range(table string, fromPrefix, toPrefix []byte) (iter.KV, error)
	// Stream is like Range, but for requesting huge data (Example: full table scan). Client can't stop it.
	//Stream(table string, fromPrefix, toPrefix []byte) (iter.KV, error)
	// RangeAscend - like Range [from, to) but also allow pass Limit parameters
	// Limit -1 means Unlimited
	RangeAscend(table string, fromPrefix, toPrefix []byte, limit int) (iter.KV, error)
	//StreamAscend(table string, fromPrefix, toPrefix []byte, limit int) (iter.KV, error)
	// RangeDescend - is like Range [from, to), but expecting `from` > `to`
	// example: RangeDescend("Table", "B", "A", -1)
	RangeDescend(table string, fromPrefix, toPrefix []byte, limit int) (iter.KV, error)
	//StreamDescend(table string, fromPrefix, toPrefix []byte, limit int) (iter.KV, error)
	// Prefix - is exactly Range(Table, prefix, kv.NextSubtree(prefix))
	Prefix(table string, prefix []byte) (iter.KV, error)

	// RangeDupSort - like Range but for fixed single key and iterating over range of values
	RangeDupSort(table string, key []byte, fromPrefix, toPrefix []byte, asc order.By, limit int) (iter.KV, error)

	ForEach(table string, fromPrefix []byte, walker func(k, v []byte) error) error
	ForPrefix(table string, prefix []byte, walker func(k, v []byte) error) error
	ForAmount(table string, prefix []byte, amount uint32, walker func(k, v []byte) error) error

	// Pointer to the underlying C transaction handle (e.g. *C.MDBX_txn)
	CHandle() unsafe.Pointer
	BucketSize(table string) (uint64, error)
}

Tx WARNING:

  • Tx is not threadsafe and may only be used in the goroutine that created it
  • ReadOnly transactions do not lock the goroutine to a thread; RwTx does

Directories

Path Synopsis
temporal
