Documentation ¶
Overview ¶
Package proto provides definitions of NEO messages and their marshalling to and from the wire format.
Two NEO nodes can exchange messages over an underlying network link after performing a NEO-specific handshake. A message is sent as a packet specifying the ID of the subconnection multiplexed on top of the underlying link, the carried message code, and the message data.
PktHeaderN describes the packet header structure in 'N' encoding.
Messages are represented by corresponding types that all implement Msg interface.
A message type can be looked up by message code with MsgType.
The proto package provides only message definitions and low-level primitives for their marshalling. Package lab.nexedi.com/kirr/neo/go/neo/neonet provides the actual service for message exchange over the network.
Index ¶
- Constants
- Variables
- func MsgCode(msg Msg) uint16
- func MsgType(msgCode uint16) reflect.Type
- type AbortTransaction
- type AcceptIdentification
- type AddObject
- type AddPendingNodes
- type AddTransaction
- type Address
- type AnswerBeginTransaction
- type AnswerCheckCurrentSerial
- type AnswerCheckSerialRange
- type AnswerCheckTIDRange
- type AnswerClusterState
- type AnswerFetchObjects
- type AnswerFetchTransactions
- type AnswerFinalTID
- type AnswerInformationLocked
- type AnswerLastIDs
- type AnswerLastTransaction
- type AnswerLockedTransactions
- type AnswerNewOIDs
- type AnswerNodeList
- type AnswerObject
- type AnswerObjectHistory
- type AnswerObjectUndoSerial
- type AnswerPack
- type AnswerPartitionList
- type AnswerPartitionTable
- type AnswerPrimary
- type AnswerRebaseObject
- type AnswerRebaseTransaction
- type AnswerRecovery
- type AnswerStoreObject
- type AnswerStoreTransaction
- type AnswerTIDs
- type AnswerTIDsFrom
- type AnswerTransactionFinished
- type AnswerTransactionInformation
- type AnswerTweakPartitionTable
- type AnswerUnfinishedTransactions
- type AnswerVoteTransaction
- type AskClusterState
- type AskNewOIDs
- type AskPartitionTable
- type AskTIDs
- type AskTIDsFrom
- type BeginTransaction
- type CellInfo
- type CellState
- type CheckCurrentSerial
- type CheckPartition
- type CheckReplicas
- type CheckSerialRange
- type CheckTIDRange
- type Checksum
- type CloseClient
- type ClusterState
- type Encoding
- type Error
- type ErrorCode
- type FailedVote
- type FetchObjects
- type FetchTransactions
- type FinalTID
- type FinishTransaction
- type FlushLog
- type GetObject
- type IdTime
- type InvalidateObjects
- type LastIDs
- type LastTransaction
- type LockInformation
- type LockedTransactions
- type Msg
- type NodeID
- type NodeInfo
- type NodeList
- type NodeState
- type NodeType
- type NotPrimaryMaster
- type NotifyClusterState
- type NotifyDeadlock
- type NotifyNodeInformation
- type NotifyPartitionChanges
- type NotifyReady
- type NotifyTransactionFinished
- type NotifyUnlockInformation
- type ObjectHistory
- type ObjectUndoSerial
- type PTid
- type Pack
- type PartitionCorrupted
- type PartitionList
- type Ping
- type PktHeaderN
- type Pong
- type PrimaryMaster
- type RebaseObject
- type RebaseTransaction
- type Recovery
- type Repair
- type RepairOne
- type Replicate
- type ReplicationDone
- type RequestIdentification
- type RowInfo
- type SendPartitionTable
- type SetClusterState
- type SetNodeState
- type SetNumReplicas
- type StartOperation
- type StopOperation
- type StoreObject
- type StoreTransaction
- type TransactionInformation
- type Truncate
- type TweakPartitionTable
- type UnfinishedTransactions
- type ValidateTransaction
- type VoteTransaction
Constants ¶
const (
	// The protocol version must be increased whenever upgrading a node may require
	// to upgrade other nodes.
	Version = 6
	// length of packet header in 'N'-encoding
	PktHeaderLenN = 10 // = unsafe.Sizeof(PktHeaderN{}), but latter gives typed constant (uintptr)
	// packets larger than PktMaxSize are not allowed.
	// this helps to avoid out-of-memory error on packets with corrupt message len.
	PktMaxSize = 0x4000000
	INVALID_TID zodb.Tid = 1<<64 - 1 // 0xffffffffffffffff
	INVALID_OID zodb.Oid = 1<<64 - 1
)
Variables ¶
var ErrDecodeOverflow = errors.New("decode: buffer overflow")
ErrDecodeOverflow is the error returned by neoMsgDecode when decoding hits a buffer overflow.
var IdTimeNone = IdTime(math.Inf(-1))
IdTimeNone represents None passed as identification time.
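A minimal sketch of testing for the None marker, assuming IdTime is defined over float64 (which the `math.Inf(-1)` conversion suggests); `isNone` is a hypothetical helper, not part of the package. A side effect of the -Inf encoding is that None sorts before every real identification time.

```go
package main

import (
	"fmt"
	"math"
)

// IdTime is assumed to be float64-based for this sketch.
type IdTime float64

var IdTimeNone = IdTime(math.Inf(-1))

// isNone reports whether t is the None marker (negative infinity).
func isNone(t IdTime) bool {
	return math.IsInf(float64(t), -1)
}

func main() {
	fmt.Println(isNone(IdTimeNone))   // true
	fmt.Println(isNone(IdTime(12.5))) // false
	// -Inf is less than any real time, so None sorts first.
	fmt.Println(IdTimeNone < IdTime(0)) // true
}
```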
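Functions ¶
func MsgCode ¶
func MsgCode(msg Msg) uint16
func MsgType ¶
func MsgType(msgCode uint16) reflect.Type
MsgType looks up a message type by message code.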
Types ¶
type AbortTransaction ¶
Abort a transaction. This maps to `tpc_abort`.
type AcceptIdentification ¶
type AddObject ¶
type AddObject struct {
	Oid         zodb.Oid
	Serial      zodb.Tid
	Compression bool
	Checksum    Checksum
	Data        *mem.Buf
	DataSerial  zodb.Tid
}
Send an object record to a node that does not have it.
type AddPendingNodes ¶
type AddPendingNodes struct {
NodeList []NodeID
}
Mark given pending nodes as running, for future inclusion when tweaking the partition table.
type AddTransaction ¶
type AddTransaction struct {
	Tid         zodb.Tid
	User        string
	Description string
	Extension   string
	Packed      bool
	TTid        zodb.Tid
	OidList     []zodb.Oid
}
Send metadata of a transaction to a node that does not have them.
type Address ¶
Address represents host:port network endpoint.
func AddrString ¶
AddrString converts a network address string into a NEO Address.
TODO make neo.Address just string without host:port split
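A sketch of the host:port split that AddrString performs, using the standard library's `net.SplitHostPort`. The `Address` fields (`Host`, `Port uint16`) and the `parseAddr` helper are assumptions for illustration; the real Address layout and AddrString signature are not shown here.

```go
package main

import (
	"fmt"
	"net"
	"strconv"
)

// Address mirrors the idea of a split host:port endpoint; the field names
// and types are assumptions for this sketch.
type Address struct {
	Host string
	Port uint16
}

// parseAddr is a hypothetical equivalent of converting "host:port" into an Address.
func parseAddr(hostport string) (Address, error) {
	host, portStr, err := net.SplitHostPort(hostport)
	if err != nil {
		return Address{}, err
	}
	port, err := strconv.ParseUint(portStr, 10, 16) // rejects ports > 65535
	if err != nil {
		return Address{}, err
	}
	return Address{Host: host, Port: uint16(port)}, nil
}

// String joins the parts back, adding brackets for IPv6 hosts.
func (a Address) String() string {
	return net.JoinHostPort(a.Host, strconv.FormatUint(uint64(a.Port), 10))
}

func main() {
	a, err := parseAddr("[::1]:2051")
	fmt.Println(a.Host, a.Port, err) // ::1 2051 <nil>
	fmt.Println(a)                   // [::1]:2051
	_, err = parseAddr("localhost")  // no port
	fmt.Println(err != nil)          // true
}
```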
type AnswerBeginTransaction ¶
type AnswerCheckCurrentSerial ¶
type AnswerCheckCurrentSerial struct {
	// was _answer = StoreObject._answer in py
	// XXX can we do without embedding e.g. `type AnswerCheckCurrentSerial AnswerStoreObject` ?
	AnswerStoreObject
}
type AnswerCheckSerialRange ¶
type AnswerCheckTIDRange ¶
type AnswerClusterState ¶
type AnswerClusterState struct {
State ClusterState
}
type AnswerFetchObjects ¶
type AnswerFetchTransactions ¶
type AnswerFinalTID ¶
type AnswerInformationLocked ¶
type AnswerLastTransaction ¶
type AnswerNewOIDs ¶
type AnswerNodeList ¶
type AnswerNodeList struct {
NodeList []NodeInfo
}
type AnswerObject ¶
type AnswerObjectHistory ¶
type AnswerObjectUndoSerial ¶
type AnswerPack ¶
type AnswerPack struct {
Status bool
}
type AnswerPartitionList ¶
type AnswerPartitionTable ¶
type AnswerPrimary ¶
type AnswerPrimary struct {
PrimaryNodeID NodeID
}
type AnswerRebaseObject ¶
type AnswerRebaseTransaction ¶
type AnswerStoreObject ¶
type AnswerStoreTransaction ¶
type AnswerStoreTransaction struct{}
type AnswerTIDs ¶
type AnswerTIDsFrom ¶
type AnswerVoteTransaction ¶
type AnswerVoteTransaction struct{}
type AskNewOIDs ¶
type AskNewOIDs struct {
NumOIDs uint32 // PNumber
}
Ask for new OIDs to create objects.
type AskPartitionTable ¶
type AskPartitionTable struct { }
Ask a storage node for the remaining data needed by the master to recover.
type AskTIDs ¶
type AskTIDs struct {
	First     uint64 // PIndex       [first, last) are offsets that define
	Last      uint64 // PIndex       range in tid list on remote.
	Partition uint32 // PNumber
}
Ask for TIDs between a range of offsets. The order of TIDs is descending, and the range is [first, last). This maps to `undoLog`.
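The offset semantics of AskTIDs can be sketched as slicing a newest-first TID list. This assumes plain `uint64` TIDs held in memory and omits per-partition filtering; `tidsRange` is a hypothetical helper, not the storage node's implementation.

```go
package main

import (
	"fmt"
	"sort"
)

// tidsRange returns the TIDs at offsets [first, last) of all TIDs sorted
// newest first, mirroring the AskTIDs description.
func tidsRange(all []uint64, first, last uint64) []uint64 {
	desc := append([]uint64(nil), all...) // copy so the caller's slice is untouched
	sort.Slice(desc, func(i, j int) bool { return desc[i] > desc[j] })
	n := uint64(len(desc))
	if last > n {
		last = n
	}
	if first >= last {
		return nil
	}
	return desc[first:last]
}

func main() {
	all := []uint64{10, 30, 20, 50, 40}
	fmt.Println(tidsRange(all, 0, 2))  // [50 40]
	fmt.Println(tidsRange(all, 3, 10)) // [20 10]
}
```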
type AskTIDsFrom ¶
type AskTIDsFrom struct {
	MinTID    zodb.Tid
	MaxTID    zodb.Tid
	Length    uint32 // PNumber
	Partition uint32 // PNumber
}
Ask for length TIDs starting at min_tid. The order of TIDs is ascending. Used by `iterator`.
type BeginTransaction ¶
Ask to begin a new transaction. This maps to `tpc_begin`.
type CellState ¶
type CellState int8
const (
	// Write-only cell. Last transactions are missing because storage is/was down
	// for a while, or because it is new for the partition. It usually becomes
	// UP_TO_DATE when replication is done.
	OUT_OF_DATE CellState = iota //short: O // XXX tag prefix name ?
	// Normal state: cell is writable/readable, and it isn't planned to drop it.
	UP_TO_DATE //short: U
	// Same as UP_TO_DATE, except that it will be discarded as soon as another
	// node finishes to replicate it. It means a partition is moved from 1 node
	// to another. It is also discarded immediately if out-of-date.
	FEEDING //short: F
	// A check revealed that data differs from other replicas. Cell is neither
	// readable nor writable.
	CORRUPTED //short: C
	// Not really a state: only used in network messages to tell storages to drop
	// partitions.
	DISCARDED //short: D
)
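The cell-state comments translate into two predicates. `readable` and `writable` below are hypothetical helpers written only from those comments; the package itself may not define them.

```go
package main

import "fmt"

type CellState int8

const (
	OUT_OF_DATE CellState = iota
	UP_TO_DATE
	FEEDING
	CORRUPTED
	DISCARDED
)

// readable: only cells holding all data (UP_TO_DATE, FEEDING) can serve reads.
func (s CellState) readable() bool {
	return s == UP_TO_DATE || s == FEEDING
}

// writable: OUT_OF_DATE cells are write-only; CORRUPTED cells accept nothing,
// and DISCARDED is not a real cell state.
func (s CellState) writable() bool {
	return s == OUT_OF_DATE || s == UP_TO_DATE || s == FEEDING
}

func main() {
	short := []string{"O", "U", "F", "C", "D"}
	for s := OUT_OF_DATE; s <= DISCARDED; s++ {
		fmt.Printf("%s readable=%v writable=%v\n", short[s], s.readable(), s.writable())
	}
}
```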
type CheckCurrentSerial ¶
Check if given serial is current for the given oid, and lock it so that this state is not altered until transaction ends. This maps to `checkCurrentSerialInTransaction`.
type CheckPartition ¶
type CheckPartition struct {
	Partition uint32 // PNumber
	Source    struct {
		UpstreamName string
		Address      Address
	}
	MinTID zodb.Tid
	MaxTID zodb.Tid
}
Ask a storage node to compare a partition with all other nodes. As with CheckReplicas, only metadata is checked, optionally within a specific range. A reference node can be specified.
type CheckReplicas ¶
type CheckReplicas struct {
	PartitionDict map[uint32]NodeID // partition -> source (PNumber)
	MinTID        zodb.Tid
	MaxTID        zodb.Tid
}
Ask the cluster to search for mismatches between replicas, metadata only, and optionally within a specific range. Reference nodes can be specified.
type CheckSerialRange ¶
type CheckSerialRange struct {
	Partition uint32 // PNumber
	Length    uint32 // PNumber
	MinTID    zodb.Tid
	MaxTID    zodb.Tid
	MinOID    zodb.Oid
}
Ask for some stats about a range of object history. Used to know if there are differences between a replicating node and a reference node.
type CheckTIDRange ¶
type CheckTIDRange struct {
	Partition uint32 // PNumber
	Length    uint32 // PNumber
	MinTID    zodb.Tid
	MaxTID    zodb.Tid
}
Ask for some stats about a range of transactions. Used to know if there are differences between a replicating node and a reference node.
type CloseClient ¶
type CloseClient struct { }
Tell peer that it can close the connection if it has finished with us.
type ClusterState ¶
type ClusterState int8
const (
	// The cluster is initially in the RECOVERING state, and it goes back to
	// this state whenever the partition table becomes non-operational again.
	// An election of the primary master always happens, in case of a network
	// cut between a primary master and all other nodes. The primary master:
	// - first recovers its own data by reading it from storage nodes;
	// - waits for the partition table be operational;
	// - automatically switch to ClusterVerifying if the cluster can be safely started. XXX not automatic
	ClusterRecovering ClusterState = iota
	// Transient state, used to:
	// - replay the transaction log, in case of unclean shutdown;
	// - and actually truncate the DB if the user asked to do so.
	// Then, the cluster either goes to ClusterRunning or STARTING_BACKUP state.
	ClusterVerifying
	// Normal operation. The DB is read-writable by clients.
	ClusterRunning
	// Transient state to shutdown the whole cluster.
	ClusterStopping
	// Transient state, during which the master (re)connect to the upstream
	// master.
	STARTING_BACKUP
	// Backup operation. The master is notified of new transactions thanks to
	// invalidations and orders storage nodes to fetch them from upstream.
	// Because cells are synchronized independently, the DB is often
	// inconsistent.
	BACKINGUP
	// Transient state, when the user decides to go back to RUNNING state.
	// The master stays in this state until the DB is consistent again.
	// In case of failure, the cluster will go back to backup mode.
	STOPPING_BACKUP
)
func (ClusterState) String ¶
func (i ClusterState) String() string
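`func (i ClusterState) String()` uses the receiver name that `stringer`-generated code uses, so the method is probably generated (an assumption). A hand-written equivalent is sketched below; the returned strings are simply the constant identifiers, and the real package's output may differ.

```go
package main

import (
	"fmt"
	"strconv"
)

type ClusterState int8

const (
	ClusterRecovering ClusterState = iota
	ClusterVerifying
	ClusterRunning
	ClusterStopping
	STARTING_BACKUP
	BACKINGUP
	STOPPING_BACKUP
)

// clusterStateNames is indexed by ClusterState value.
var clusterStateNames = [...]string{
	"ClusterRecovering", "ClusterVerifying", "ClusterRunning", "ClusterStopping",
	"STARTING_BACKUP", "BACKINGUP", "STOPPING_BACKUP",
}

// String returns the constant's name, or ClusterState(N) for out-of-range values.
func (i ClusterState) String() string {
	if i < 0 || int(i) >= len(clusterStateNames) {
		return "ClusterState(" + strconv.FormatInt(int64(i), 10) + ")"
	}
	return clusterStateNames[i]
}

func main() {
	fmt.Println(ClusterRunning)   // ClusterRunning
	fmt.Println(ClusterState(42)) // ClusterState(42)
}
```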
type Encoding ¶
type Encoding byte
Encoding represents a message encoding.
func (Encoding) MsgEncode ¶
MsgEncode encodes msg state into buf via encoding e.
len(buf) must be >= e.MsgEncodedLen(m).
func (Encoding) MsgEncodedLen ¶
MsgEncodedLen returns how much space is needed to encode msg payload via encoding e.
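The MsgEncodedLen/MsgEncode contract (size first, then encode into a buffer at least that long) is sketched below. The `msg` interface, its methods, and the one-byte and big-endian four-byte payload layouts are assumptions made for illustration; they are not the package's unexported API or its actual 'N' wire format.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// msg is a hypothetical stand-in for the unexported methods behind proto.Msg.
type msg interface {
	encodedLen() int   // bytes needed for the payload
	encode(buf []byte) // requires len(buf) >= encodedLen()
}

type AnswerPack struct{ Status bool }

func (m *AnswerPack) encodedLen() int { return 1 }
func (m *AnswerPack) encode(buf []byte) {
	buf[0] = 0
	if m.Status {
		buf[0] = 1
	}
}

type AskNewOIDs struct{ NumOIDs uint32 }

func (m *AskNewOIDs) encodedLen() int { return 4 }
func (m *AskNewOIDs) encode(buf []byte) {
	binary.BigEndian.PutUint32(buf, m.NumOIDs)
}

// encodeMsg follows the documented contract: size the buffer first, then encode.
func encodeMsg(m msg) []byte {
	buf := make([]byte, m.encodedLen())
	m.encode(buf)
	return buf
}

func main() {
	fmt.Printf("% x\n", encodeMsg(&AnswerPack{Status: true})) // 01
	fmt.Printf("% x\n", encodeMsg(&AskNewOIDs{NumOIDs: 258})) // 00 00 01 02
}
```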
type Error ¶
Error is a special type of message because it can be sent in reply to any other message, even one that does not usually expect a reply.
type FailedVote ¶
Report storage nodes for which vote failed. True is returned if it's still possible to finish the transaction.
type FetchObjects ¶
type FetchObjects struct {
	Partition uint32 // PNumber
	Length    uint32 // PNumber
	MinTid    zodb.Tid
	MaxTid    zodb.Tid
	MinOid    zodb.Oid
	// already known objects
	ObjKnownDict map[zodb.Tid][]zodb.Oid // serial -> []oid
}
Ask a storage node to send object records we don't have, and reply with the list of records we should not have.
type FetchTransactions ¶
type FetchTransactions struct {
	Partition    uint32 // PNumber
	Length       uint32 // PNumber
	MinTid       zodb.Tid
	MaxTid       zodb.Tid
	TxnKnownList []zodb.Tid // already known transactions
}
Ask a storage node to send all transaction data we don't have, and reply with the list of transactions we should not have.
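The reconciliation that FetchTransactions asks for is essentially a set difference in both directions. A minimal sketch, assuming plain `uint64` TIDs and ignoring Partition, Length and the MinTid/MaxTid bounds; `diffTxns` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"sort"
)

type Tid uint64 // stand-in for zodb.Tid

// diffTxns compares the TIDs a source node holds in the requested range with
// the requester's TxnKnownList. It returns the TIDs the source should send and
// the TIDs the requester should not have.
func diffTxns(source, known []Tid) (toSend, shouldNotHave []Tid) {
	inSource := make(map[Tid]bool, len(source))
	for _, t := range source {
		inSource[t] = true
	}
	inKnown := make(map[Tid]bool, len(known))
	for _, t := range known {
		inKnown[t] = true
		if !inSource[t] {
			shouldNotHave = append(shouldNotHave, t)
		}
	}
	for _, t := range source {
		if !inKnown[t] {
			toSend = append(toSend, t)
		}
	}
	sort.Slice(toSend, func(i, j int) bool { return toSend[i] < toSend[j] })
	sort.Slice(shouldNotHave, func(i, j int) bool { return shouldNotHave[i] < shouldNotHave[j] })
	return toSend, shouldNotHave
}

func main() {
	toSend, extra := diffTxns([]Tid{1, 2, 4, 5}, []Tid{2, 3, 5})
	fmt.Println(toSend, extra) // [1 4] [3]
}
```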
type FinalTID ¶
Return final tid if ttid has been committed, to recover from certain failures during tpc_finish.
type FinishTransaction ¶
type FinishTransaction struct {
	Tid         zodb.Tid // XXX this is ttid
	OIDList     []zodb.Oid
	CheckedList []zodb.Oid
}
Finish a transaction. Return the TID of the committed transaction. This maps to `tpc_finish`.
type GetObject ¶
Ask for a stored object by its OID, optionally at or before a specific tid. This maps to `load/loadBefore/loadSerial`.
type InvalidateObjects ¶
Notify about a new transaction modifying objects, invalidating client caches.
type LastIDs ¶
type LastIDs struct { }
Ask the last OID/TID so that a master can initialize its TransactionManager. Reused by `neoctl print ids`.
type LockInformation ¶
Commit a transaction. The new data is read-locked.
type LockedTransactions ¶
type LockedTransactions struct { }
Ask locked transactions to replay committed transactions that haven't been unlocked.
type Msg ¶
type Msg interface {
// contains filtered or unexported methods
}
Msg is the interface representing a NEO message.
type NodeID ¶
type NodeID int32
NodeID is a node identifier: a 4-byte signed integer.
High-order byte:
7 6 5 4 3 2 1 0
| | | | +-+-+-+-- reserved (0)
| +-+-+---------- node type
+---------------- temporary if negative
NID namespaces are required to prevent conflicts when the master generates a new NID before it knows the NIDs of existing storage nodes. So only the high-order bit is really important, and the other 31 bits could be random. Extra namespace information and the non-randomness of the 3 low-order bytes help when reading logs.
0 is invalid NodeID XXX correct?
TODO -> back to 16-bytes randomly generated node IDs
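The high-order byte layout can be decoded with shifts and masks. The helpers below (`highByte`, `temporary`, `typeBits`, `makeNodeID`) are hypothetical, and the mapping from type bits to actual node types is not given here, so the example uses an arbitrary type value.

```go
package main

import "fmt"

type NodeID int32

// highByte returns the most significant byte of the 32-bit identifier.
func (nid NodeID) highByte() uint8 { return uint8(uint32(nid) >> 24) }

// temporary reports bit 7 of the high-order byte, i.e. a negative NodeID.
func (nid NodeID) temporary() bool { return nid < 0 }

// typeBits extracts bits 6..4 of the high-order byte (the node type field).
func (nid NodeID) typeBits() uint8 { return (nid.highByte() >> 4) & 0x7 }

// makeNodeID is a hypothetical constructor: type in bits 6..4, reserved bits
// 3..0 left at zero, and the low 24 bits carrying a per-type number.
func makeNodeID(typeBits uint8, temporary bool, num uint32) NodeID {
	v := uint32(typeBits&0x7)<<28 | num&0xffffff
	if temporary {
		v |= 1 << 31
	}
	return NodeID(int32(v))
}

func main() {
	nid := makeNodeID(1, false, 5)
	fmt.Printf("%#x %d %v\n", uint32(nid), nid.typeBits(), nid.temporary()) // 0x10000005 1 false
	tmp := makeNodeID(1, true, 5)
	fmt.Println(tmp < 0, tmp.typeBits()) // true 1
}
```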
type NodeInfo ¶
type NodeInfo struct {
	Type   NodeType
	Addr   Address // serving address
	NID    NodeID
	State  NodeState
	IdTime IdTime // XXX clarify semantic where it is used
}
NodeInfo is information about a node.
type NotPrimaryMaster ¶
type NotPrimaryMaster struct {
	Primary         NodeID // XXX PSignedNull in py
	KnownMasterList []struct {
		Address
	}
}
Notify peer that I'm not the primary master. Attach any extra information to help the peer joining the cluster.
type NotifyClusterState ¶
type NotifyClusterState struct {
State ClusterState
}
Notify about a cluster state change.
type NotifyDeadlock ¶
Ask master to generate a new TTID that will be used by the client to solve a deadlock by rebasing the transaction on top of concurrent changes.
XXX -> Deadlocked?
type NotifyNodeInformation ¶
type NotifyNodeInformation struct {
	// NOTE in py this is monotonic_time() of call to broadcastNodesInformation() & friends
	IdTime   IdTime
	NodeList []NodeInfo
}
Notify information about one or more nodes.
type NotifyPartitionChanges ¶
type NotifyPartitionChanges struct {
	PTid
	NumReplicas uint32 // PNumber
	CellList    []struct {
		Offset   uint32 // PNumber XXX -> Pid
		CellInfo CellInfo
	}
}
Notify about changes in the partition table.
type NotifyTransactionFinished ¶
Notify that a transaction blocking a replication is now finished.
type NotifyUnlockInformation ¶
Notify about a successfully committed transaction. The new data can be unlocked.
XXX -> InformationUnlocked?
type ObjectHistory ¶
Ask history information for a given object. The order of serials is descending, and the range is [first, last]. This maps to `history`.
type ObjectUndoSerial ¶
Ask a storage node for the serial where an object's data is when undoing a given transaction, for a list of OIDs.
object_tid_dict has the following format:
key: oid
value: 3-tuple
- current_serial (TID): the latest serial visible to the undoing transaction.
- undo_serial (TID): where undone data is (tid at which data is before given undo).
- is_current (bool): if current_serial's data is current on storage.
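In Go, object_tid_dict maps naturally onto a map keyed by OID whose values are small structs. The `undoSerial` struct, its field names and the uint64 stand-ins for zodb.Oid and zodb.Tid are assumptions; the actual Go message definition is not shown in this section.

```go
package main

import "fmt"

// Plain uint64 stand-ins for zodb.Oid and zodb.Tid.
type Oid uint64
type Tid uint64

// undoSerial mirrors the 3-tuple value of object_tid_dict.
type undoSerial struct {
	CurrentSerial Tid  // latest serial visible to the undoing transaction
	UndoSerial    Tid  // tid at which the data is before the given undo
	IsCurrent     bool // whether CurrentSerial's data is current on storage
}

func main() {
	// Hypothetical answer for two OIDs.
	objectTidDict := map[Oid]undoSerial{
		1: {CurrentSerial: 300, UndoSerial: 100, IsCurrent: true},
		2: {CurrentSerial: 250, UndoSerial: 200, IsCurrent: false},
	}
	if v, ok := objectTidDict[1]; ok {
		fmt.Println(v.CurrentSerial, v.UndoSerial, v.IsCurrent) // 300 100 true
	}
	_, ok := objectTidDict[3]
	fmt.Println(ok) // false: OID 3 was not asked about
}
```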
type PTid ¶
type PTid uint64
PTid is Partition Table identifier.
Zero value means "invalid id" (<-> None in py.PPTID)
type PartitionCorrupted ¶
Notify that mismatches were found while checking replicas for a partition.
type PartitionList ¶
Ask information about partitions.
type PktHeaderN ¶
type PktHeaderN struct {
	ConnId  packed.BE32 // NOTE is .msgid in py
	MsgCode packed.BE16 // payload message code
	MsgLen  packed.BE32 // payload message length (excluding packet header)
}
PktHeaderN represents header of a raw packet in 'N'-encoding.
A packet contains a connection ID and a message.
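A minimal sketch of reading and writing the 10-byte 'N' header with `encoding/binary`, assuming the packed.BE32/BE16 fields are plain big-endian integers laid out in declaration order with no padding (which PktHeaderLenN = 10 is consistent with). The `pktHeader` type, the helpers and the message code value are hypothetical; the size check assumes PktMaxSize bounds header plus payload.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const (
	pktHeaderLenN = 10
	pktMaxSize    = 0x4000000
)

// pktHeader is a plain-Go mirror of PktHeaderN.
type pktHeader struct {
	ConnId  uint32
	MsgCode uint16
	MsgLen  uint32 // payload length, excluding the header
}

// encode writes h into buf, which must be at least pktHeaderLenN bytes.
func (h pktHeader) encode(buf []byte) {
	binary.BigEndian.PutUint32(buf[0:4], h.ConnId)
	binary.BigEndian.PutUint16(buf[4:6], h.MsgCode)
	binary.BigEndian.PutUint32(buf[6:10], h.MsgLen)
}

// decodeHeader parses a header and rejects oversized packets before any
// payload buffer is allocated.
func decodeHeader(buf []byte) (pktHeader, error) {
	if len(buf) < pktHeaderLenN {
		return pktHeader{}, errors.New("short header")
	}
	h := pktHeader{
		ConnId:  binary.BigEndian.Uint32(buf[0:4]),
		MsgCode: binary.BigEndian.Uint16(buf[4:6]),
		MsgLen:  binary.BigEndian.Uint32(buf[6:10]),
	}
	if pktHeaderLenN+uint64(h.MsgLen) > pktMaxSize {
		return pktHeader{}, errors.New("packet too big")
	}
	return h, nil
}

func main() {
	buf := make([]byte, pktHeaderLenN)
	pktHeader{ConnId: 1, MsgCode: 0x8001, MsgLen: 4}.encode(buf)
	fmt.Printf("% x\n", buf) // 00 00 00 01 80 01 00 00 00 04
	h, err := decodeHeader(buf)
	fmt.Println(h, err) // {1 32769 4} <nil>
}
```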
type PrimaryMaster ¶
type PrimaryMaster struct { }
Ask node identifier of the current primary master.
type RebaseObject ¶
Rebase an object change to solve a deadlock.
XXX: It is a request packet to simplify the implementation. For more efficiency, this should be turned into a notification, and the RebaseTransaction should be answered once all objects are rebased (so that the client can still wait on something).
type RebaseTransaction ¶
Rebase a transaction to solve a deadlock.
type Recovery ¶
type Recovery struct { }
Ask storage nodes for the data needed by the master to recover. Reused by `neoctl print ids`.
type Repair ¶
type Repair struct {
	NodeList []NodeID
	// contains filtered or unexported fields
}
Ask storage nodes to repair their databases.
type RepairOne ¶
type RepairOne struct {
// contains filtered or unexported fields
}
Repair is translated to this message, asking a specific storage node to repair its database.
type Replicate ¶
type Replicate struct {
	Tid          zodb.Tid
	UpstreamName string
	SourceDict   map[uint32]string // partition -> address FIXME string -> Address
}
Notify a storage node to replicate partitions up to given 'tid' and from given sources.
- upstream_name: replicate from an upstream cluster
- address: address of the source storage node, or None if there's no new data up to 'tid' for the given partition
type ReplicationDone ¶
Notify the master node that a partition has been successfully replicated from a storage to another.
type RequestIdentification ¶
type RequestIdentification struct {
	NodeType    NodeType // XXX name
	NID         NodeID
	Address     Address // where requesting node is also accepting connections
	ClusterName string
	IdTime      IdTime
	// storage
	DevPath []string // [] of devid
	NewNID  []uint32 // [] of PNumber
}
Request a node identification. This must be the first message for any connection.
type SendPartitionTable ¶
Send the full partition table to admin/client/storage nodes on connection.
type SetNumReplicas ¶
type SetNumReplicas struct {
NumReplicas uint32 // PNumber
}
Set the number of replicas.
type StartOperation ¶
type StartOperation struct {
	// XXX: Is this boolean needed ? Maybe this
	// can be deduced from cluster state.
	Backup bool
}
Tell a storage node to start operation. Before this message, it must only communicate with the primary master.
type StopOperation ¶
type StopOperation struct { }
Notify that the cluster is not operational anymore. Any operation between nodes must be aborted.
type StoreObject ¶
type StoreObject struct {
	Oid         zodb.Oid
	Serial      zodb.Tid
	Compression bool
	Checksum    Checksum
	Data        []byte // TODO -> msg.Buf, separately (for writev)
	DataSerial  zodb.Tid
	Tid         zodb.Tid
}
Ask to create/modify an object. This maps to `store`.
As for IStorage, 'serial' is ZERO_TID for new objects.
type StoreTransaction ¶
type StoreTransaction struct {
	Tid         zodb.Tid
	User        string
	Description string
	Extension   string
	OidList     []zodb.Oid
}
Ask to store a transaction. Implies vote.
type TransactionInformation ¶
Ask for transaction metadata.
type TweakPartitionTable ¶
Ask the master to balance the partition table, optionally excluding specific nodes in anticipation of removing them.
type UnfinishedTransactions ¶
type UnfinishedTransactions struct {
	RowList []struct {
		Offset uint32 // PNumber XXX -> Pid
	}
}
Ask unfinished transactions, which will be replicated when they're finished.
type ValidateTransaction ¶
Replay a committed transaction that was not unlocked.