Substrate Storage Deep Dive

Shawn Tabrizi
Software Developer @ Parity Technologies Ltd.
[email protected] | @shawntabrizi

Abstractions of Substrate Storage: High Level Overview

● Runtime Storage API
● Overlay Change Set
● Merkle Trie
● Key Value Database
Abstractions of Substrate Storage: Runtime Storage API

● sp-io can write to storage with a given key + value.
● Easy APIs generated through the decl_storage! macro.
● StorageValue, StorageMap, StorageDoubleMap, etc...
Abstractions of Substrate Storage: Overlay Change Set

● Stages changes to the underlying DB.
● Overlay changes are committed once per block.
● Two kinds of changes:
  ○ Prospective Changes - what may happen.
  ○ Committed Changes - what will happen.
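The two kinds of staged changes can be pictured with a small sketch. This is an illustration only, not Substrate's implementation: the `Overlay` type and all of its method names are hypothetical, and a value of `None` is assumed to mark a staged deletion.

```rust
use std::collections::HashMap;

/// Hypothetical overlay. A value of `None` marks a staged deletion.
#[derive(Default)]
pub struct Overlay {
    /// Changes that may still be thrown away.
    prospective: HashMap<Vec<u8>, Option<Vec<u8>>>,
    /// Changes that will reach the underlying DB.
    committed: HashMap<Vec<u8>, Option<Vec<u8>>>,
}

impl Overlay {
    /// Stage a write, or a deletion when `value` is `None`.
    pub fn set(&mut self, key: &[u8], value: Option<Vec<u8>>) {
        self.prospective.insert(key.to_vec(), value);
    }

    /// Look in the prospective changes first, then in the committed ones.
    /// An outer `None` means the overlay knows nothing about this key,
    /// so the caller would have to ask the backend.
    pub fn get(&self, key: &[u8]) -> Option<Option<&[u8]>> {
        self.prospective
            .get(key)
            .or_else(|| self.committed.get(key))
            .map(|v| v.as_deref())
    }

    /// Promote everything that "may happen" to "will happen".
    pub fn commit_prospective(&mut self) {
        self.committed.extend(self.prospective.drain());
    }

    /// Drop everything that "may happen".
    pub fn discard_prospective(&mut self) {
        self.prospective.clear();
    }

    /// Hand the committed changes over, for example once per block.
    pub fn drain_committed(&mut self) -> Vec<(Vec<u8>, Option<Vec<u8>>)> {
        self.committed.drain().collect()
    }
}
```

In this sketch, a call to `set` followed by `discard_prospective` leaves the committed changes untouched, while `commit_prospective` followed by `drain_committed` yields what would be written to the underlying DB.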
Abstractions of Substrate Storage: Merkle Trie

● a.k.a. HashDB
● paritytech/trie
● Data structure on top of KVDB
● Arbitrary Key and Value length
● Nodes are Branches or Leaves
Abstractions of Substrate Storage: Key Value Database

● a.k.a. KVDB
● Implemented with RocksDB
● Hash -> Vec<u8>

Key (Hash 256)      Value (Vec<u8>)
0x92cdf578c47...    [01]
0x31237cdb79...     [02]
0x581348337b...     [03]
Two Kinds of Keys!

● Trie key path
● KVDB key hash

Don’t worry, we will come back to this...

Substrate uses a Base-16 Patricia Merkle Trie
Merkle Tree

Root Node:      Root Hash = Hash(H0 + H1)
Branch Nodes:   Hash 0 = Hash(H0-0 + H0-1)    Hash 1 = Hash(H1-0 + H1-1)
Leaf Nodes:     Hash 0-0 = Hash(D0)    Hash 0-1 = Hash(D1)    Hash 1-0 = Hash(D2)    Hash 1-1 = Hash(D3)
Data:           Data 0    Data 1    Data 2    Data 3

● The Root Hash can be used to verify two trees are the same.
● A Merkle tree allows you to more easily prove that some data exists within the tree with a “Merkle Proof”. More about that later.

Patricia Trie

● Position in the tree defines the associated key.
● Space optimized for elements which share a prefix.

p
  ar
    ity            1. parity
    t
      icipate      2. participate
      y            3. party
  ro
    c
      ess          4. process
      ure          5. procure
    spective       6. prospective
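The space saving comes from storing a shared prefix only once. A minimal sketch of finding that prefix for the six example words follows; `shared_prefix` is a hypothetical helper and assumes ASCII input so that byte offsets are valid string boundaries.

```rust
/// Longest prefix shared by every word in `words`.
/// Assumes ASCII input, so a byte offset is always a valid `str` boundary.
pub fn shared_prefix<'a>(words: &[&'a str]) -> &'a str {
    let first = match words.first() {
        Some(w) => *w,
        None => return "",
    };
    let mut len = first.len();
    for w in &words[1..] {
        // Count how many leading bytes this word has in common with the first.
        let common = first
            .bytes()
            .zip(w.bytes())
            .take_while(|(a, b)| a == b)
            .count();
        len = len.min(common);
    }
    &first[..len]
}
```

For the example words, `shared_prefix(&["parity", "participate", "party"])` gives `"par"` and `shared_prefix(&["process", "procure", "prospective"])` gives `"pro"`, which matches the `p` / `ar` / `ro` levels of the tree.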
Beyond Binary Trees

● Branches can have more than two children.
● Everything is the same, just scaled up.

0 1 2 3 4 5 6 7 8 9 a b c d e f

A single hex character is called a “nibble”.
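Since each byte holds two hex characters, a key of bytes turns into a path of nibbles, and each nibble selects one of the 16 children of a branch. A small sketch, with hypothetical helper names:

```rust
/// Split every byte into its high and low nibble (each in 0..=15).
pub fn to_nibbles(key: &[u8]) -> Vec<u8> {
    key.iter().flat_map(|&b| [b >> 4, b & 0x0f]).collect()
}

/// Render a nibble path as lowercase hex characters.
/// Panics if a value is not a nibble (greater than 15).
pub fn nibbles_to_hex(nibbles: &[u8]) -> String {
    nibbles
        .iter()
        .map(|&n| std::char::from_digit(u32::from(n), 16).expect("nibble is below 16"))
        .collect()
}
```

For example, the two bytes `0xa7 0x11` become the four nibbles `a 7 1 1`.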
Patricia Merkle

Creation of the Patricia Merkle Trie
Let’s get visual.

What we will be working with...

Literal KVDB Table
Key               Value
0x8f35a27d9...    [BRANCH]
0x2ebcd78e8...    [LEAF 00]
0x27434bcd0...    [BRANCH w/ VAL 01]
0x802c9c18c...    [LEAF 02]
0x986d278c5...    [LEAF 03]

Types of Nodes
Prefix    Type
00        Empty
01        Leaf
10        Branch w/o value
11        Branch w value

Virtual Trie Table
Trie Key Path     Value
a 7               [BRANCH]
a 7 1 1 3 5 5     [LEAF 00]
a 7 7 d 3         [BRANCH w/ VAL 01]
a 7 7 d 3 3 7     [LEAF 02]
a 7 7 d 3 9 7     [LEAF 03]

Node Structure

Trie Node: header | key | children | value

Visual of the Substrate State Trie

Trie Key Path     Value
a 7               [BRANCH]
a 7 1 1 3 5 5     [LEAF 00]
a 7 7 d 3         [BRANCH w/ VAL 01]
a 7 7 d 3 3 7     [LEAF 02]
a 7 7 d 3 9 7     [LEAF 03]
a 7 f 9 3 6 5     [LEAF 04]

Branch: pre 10 | partial a7 | children 0 1 2 3 4 5 6 7 8 9 a b c d e f
  child 1 -> Leaf: prefix 01 | key-end 1355 | value 00
  child f -> Leaf: prefix 01 | key-end 9365 | value 04
  child 7 -> Branch: pre 11 | partial d3 | children 0 1 2 3 4 5 6 7 8 9 a b c d e f | value 01
    child 3 -> Leaf: prefix 01 | key-end 7 | value 02
    child 9 -> Leaf: prefix 01 | key-end 7 | value 03

● All nodes are present.
● Nodes with a shared path are children of a branch.
● You can then progress by looking at the children of the branch.
● This is a KVDB look up!
● You can have a branch which also contains a value!
● You reach the end when there are no more branches.

What you just saw

● Patricia provides the trie path.
● Merkle provides the recursive hashing of children nodes into the parent.

KVDB_LOOKUP(0xff1231a…) -> partial a7 | children 0 1 2 3 4 5 6 7 8 9 a b c d e f
                           (child hashes: 0xd378a45…, 0xc489b56…)
KVDB_LOOKUP(0xd378a4…)  -> prefix 01 | key-end 1355 | value 00

Trie Path: a731355

Hash([NODE]) = 0xff1231a… for the branch node.
Hash([NODE]) = 0xd378a45… for the leaf node.
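The walk above can be sketched as a loop where every step down the trie is one key-value lookup by node hash. This is a simplified model, not the paritytech/trie implementation: `Node`, `Kvdb`, `insert`, `lookup` and `sample_trie` are hypothetical names, node encoding is skipped, and the 64-bit `DefaultHasher` from the standard library stands in for the cryptographic 256-bit hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Simplified trie node. Paths and partial keys are stored as nibbles.
#[derive(Hash)]
pub enum Node {
    Leaf { key_end: Vec<u8>, value: Vec<u8> },
    Branch { partial: Vec<u8>, children: [Option<u64>; 16], value: Option<Vec<u8>> },
}

/// Stand-in for the key-value database: node hash -> node.
pub type Kvdb = HashMap<u64, Node>;

/// Store a node under the hash of its contents and return that hash.
/// A branch contains its children's hashes, so its own hash depends on them.
pub fn insert(db: &mut Kvdb, node: Node) -> u64 {
    let mut hasher = DefaultHasher::new();
    node.hash(&mut hasher);
    let key = hasher.finish();
    db.insert(key, node);
    key
}

/// Follow a nibble path from the root, doing one KVDB lookup per node.
pub fn lookup(db: &Kvdb, root: u64, path: &[u8]) -> Option<Vec<u8>> {
    let mut hash = root;
    let mut rest = path;
    loop {
        match db.get(&hash)? {
            Node::Leaf { key_end, value } => {
                return if rest == key_end.as_slice() { Some(value.clone()) } else { None };
            }
            Node::Branch { partial, children, value } => {
                // The branch's partial key must match the next nibbles of the path.
                rest = rest.strip_prefix(partial.as_slice())?;
                match rest.split_first() {
                    // Path ends here: return the branch's own value, if any.
                    None => return value.clone(),
                    // Otherwise the next nibble picks the child to look up.
                    Some((nibble, tail)) => {
                        hash = (*children.get(usize::from(*nibble))?)?;
                        rest = tail;
                    }
                }
            }
        }
    }
}

fn leaf(key_end: &[u8], value: u8) -> Node {
    Node::Leaf { key_end: key_end.to_vec(), value: vec![value] }
}

/// Build the example trie from the Virtual Trie Table and return its root hash.
pub fn sample_trie(db: &mut Kvdb) -> u64 {
    let mut inner = [None; 16];
    inner[0x3] = Some(insert(db, leaf(&[7], 0x02)));
    inner[0x9] = Some(insert(db, leaf(&[7], 0x03)));
    let mut top = [None; 16];
    top[0x1] = Some(insert(db, leaf(&[1, 3, 5, 5], 0x00)));
    let branch = Node::Branch { partial: vec![0xd, 3], children: inner, value: Some(vec![0x01]) };
    top[0x7] = Some(insert(db, branch));
    top[0xf] = Some(insert(db, leaf(&[9, 3, 6, 5], 0x04)));
    insert(db, Node::Branch { partial: vec![0xa, 7], children: top, value: None })
}
```

With the sample trie, looking up the path `a 7 7 d 3 9 7` touches three database entries (the root branch, the inner branch, and the leaf), while `a 7 7 d 3` stops at the inner branch and returns its value.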
Two Kinds of Keys!

1. Trie key path is set by you! (e.g. “:CODE”)
   ○ Arbitrary length!
2. KVDB key = Hash([Trie Node])
   ○ Trie Node
     ■ Header Info
     ■ Key Info
     ■ Possible Children
     ■ Possible Value

Trie Node: header | key | children | value

But wait… there’s more.
Child Trie

StateDB Root Node
  -> Leaf Node, Value: 0xd378a45…
    -> KVDB_LOOKUP(0xd378…)
      -> Child Trie Root Node

* Child tries can be a different format than the Substrate StateDB.
Prefix Trie

● Similar to Child Trie, but you cannot get the Root Hash.
● Probably something temporary while we fix pruning issues with child trie.

Trie Key Path     Value
a 7               [BRANCH]
a 7 1 1 3 5 5     [LEAF 00]
a 7 7 d 3         [BRANCH w/ VAL 01]
a 7 7 d 3 3 7     [LEAF 02]
a 7 7 d 3 9 7     [LEAF 03]
a 7 f 9 3 6 5     [LEAF 04]
Runtime Storage Trie Path (NEW)

All modules use a prefix trie now! (Long term, they probably become a child trie.)

● Storage Value
  ○ twox128(module) + twox128(storagename)
● linked_map and map
  ○ twox128(module) + twox128(storagename) + hasher(key)
● linked_map head
  ○ twox128(module) + twox128("HeadOf" + storagename)
● double_map
  ○ twox128(module) + twox128(storagename) + hasher(key1) + hasher(key2)
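The concatenation rules can be sketched as follows. Note the loud assumption: `hash128` below is a stand-in built on the standard library's `DefaultHasher`, NOT the real twox128, and it is also used in place of the configurable `hasher`, so the bytes it produces will not match a real chain. Only the shape of the path (which pieces are hashed and in what order) is taken from the slide; all function names are hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Stand-in 128-bit hash (NOT twox128): two seeded 64-bit hashes side by side.
pub fn hash128(data: &[u8]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for seed in 0..2u64 {
        let mut hasher = DefaultHasher::new();
        hasher.write_u64(seed);
        hasher.write(data);
        let start = seed as usize * 8;
        out[start..start + 8].copy_from_slice(&hasher.finish().to_le_bytes());
    }
    out
}

/// Storage Value: hash(module) + hash(storagename).
pub fn value_key(module: &str, storage: &str) -> Vec<u8> {
    [hash128(module.as_bytes()), hash128(storage.as_bytes())].concat()
}

/// map: hash(module) + hash(storagename) + hasher(key).
pub fn map_key(module: &str, storage: &str, key: &[u8]) -> Vec<u8> {
    let mut path = value_key(module, storage);
    path.extend_from_slice(&hash128(key));
    path
}

/// linked_map head: hash(module) + hash("HeadOf" + storagename).
pub fn linked_map_head_key(module: &str, storage: &str) -> Vec<u8> {
    value_key(module, &format!("HeadOf{}", storage))
}

/// double_map: hash(module) + hash(storagename) + hasher(key1) + hasher(key2).
pub fn double_map_key(module: &str, storage: &str, key1: &[u8], key2: &[u8]) -> Vec<u8> {
    let mut path = map_key(module, storage, key1);
    path.extend_from_slice(&hash128(key2));
    path
}
```

Every item of one module starts with the same 16 bytes, and every entry of one map starts with the same 32 bytes, which is what places them under a common prefix in the trie.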
Pruning

● For holding older block states, and cleaning them up.
● Let’s update two values in this trie.

[Figure: the state trie at Block 42, drawn as a binary tree under root R.]

We create new database entries, but keep the old ones too!

[Figure: Block 42 and Block 43, each with its own root R.]

[Figure: Block 42, Block 43 and Block 44, each with its own root R.]

Eventually, we prune the old data.

[Figure: only Block 43 and Block 44 remain.]
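One way to picture pruning is a naive mark-and-sweep over the node store: forget a block's root, then delete every node that no remaining root can reach. This is only an illustration of the idea; it is not claimed to be how Substrate's state database does it, and `Store` and its methods are hypothetical.

```rust
use std::collections::{BTreeMap, HashMap, HashSet};

/// Hypothetical node store: node hash -> hashes of its children,
/// plus the root hash recorded for each block number.
#[derive(Default)]
pub struct Store {
    nodes: HashMap<u64, Vec<u64>>,
    roots: BTreeMap<u32, u64>,
}

impl Store {
    pub fn put_node(&mut self, hash: u64, children: Vec<u64>) {
        self.nodes.insert(hash, children);
    }

    pub fn set_root(&mut self, block: u32, root: u64) {
        self.roots.insert(block, root);
    }

    pub fn node_count(&self) -> usize {
        self.nodes.len()
    }

    /// Forget `block`, then drop every node no remaining block can reach.
    pub fn prune(&mut self, block: u32) {
        self.roots.remove(&block);
        // Mark: walk down from every root that is still kept.
        let mut live = HashSet::new();
        let mut stack: Vec<u64> = self.roots.values().copied().collect();
        while let Some(hash) = stack.pop() {
            if live.insert(hash) {
                if let Some(children) = self.nodes.get(&hash) {
                    stack.extend(children.iter().copied());
                }
            }
        }
        // Sweep: keep only the marked nodes.
        self.nodes.retain(|hash, _| live.contains(hash));
    }
}
```

Nodes shared between an old and a new block survive the prune, because the newer root still reaches them; only the entries that were replaced disappear.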
Merkle Trie Complexity

Reading Data

Storage Read

● O(log n) reads.
● Not so great...

Writing Data

Storage Read, Hash Calculation, Storage Write

● O(log n) reads, hashes, and writes needed.
● Very expensive for a database.

1. Follow the trie path to the value.
   ○ O(log n) reads
2. Write the new value.
   ○ 1 write
3. Calculate new hash
   ○ 1 hash
4. Repeat (2) + (3) up the trie path
   ○ O(log n) times
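To put a number on O(log n): in a base-16 trie the logarithm is base 16. The sketch below assumes a perfectly balanced trie, which a real state trie is not, so treat the result as a rough estimate; `trie_depth` is a hypothetical helper.

```rust
/// Depth of a perfectly balanced base-16 trie holding `n` keys:
/// the smallest `d` such that 16^d >= n.
pub fn trie_depth(n: u64) -> u32 {
    let mut depth = 0;
    let mut capacity: u64 = 1;
    while capacity < n {
        // Each extra level multiplies the number of reachable keys by 16.
        capacity = capacity.saturating_mul(16);
        depth += 1;
    }
    depth
}
```

Under that assumption, one million keys need a depth of 5, so a single write costs roughly that many reads, hashes, and writes along the path instead of one of each.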
Merkle Proof

Storage Read by Full Node, Data Sent to Light Client, Computational Verification

● O(log n)
● Great for light clients!
● Low bandwidth, low computation

1. Full Node: Follow the trie path to the value.
   ○ O(log n) reads
2. Full Node: Upload data of trie nodes.
3. Light Client: Download trie node data.
4. Light Client: Verify by hashing.
   ○ O(log n) hashes
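The four steps can be sketched on the binary Merkle tree shape from the Merkle Tree slides (four data items, two levels of hashes). This is a simplification: a Substrate proof carries base-16 trie nodes rather than sibling hashes, the function names are hypothetical, and the standard library's 64-bit `DefaultHasher` stands in for a cryptographic hash. The leaf count is assumed to be a non-zero power of two.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

fn hash_bytes(data: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write(data);
    hasher.finish()
}

fn hash_pair(left: u64, right: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    hasher.write_u64(left);
    hasher.write_u64(right);
    hasher.finish()
}

/// Full node: every level of hashes, leaves first, root last.
/// Assumes `data.len()` is a non-zero power of two.
pub fn build_levels(data: &[&[u8]]) -> Vec<Vec<u64>> {
    let mut levels = vec![data.iter().map(|d| hash_bytes(d)).collect::<Vec<u64>>()];
    while levels.last().unwrap().len() > 1 {
        let next: Vec<u64> = levels
            .last()
            .unwrap()
            .chunks(2)
            .map(|pair| hash_pair(pair[0], pair[1]))
            .collect();
        levels.push(next);
    }
    levels
}

/// Full node: collect the sibling hash at each level on the way up.
pub fn prove(levels: &[Vec<u64>], mut index: usize) -> Vec<u64> {
    let mut proof = Vec::new();
    for level in &levels[..levels.len() - 1] {
        proof.push(level[index ^ 1]);
        index /= 2;
    }
    proof
}

/// Light client: rebuild the root from the data and the siblings.
/// The number of hashes equals the proof length, which is log2 of the leaf count.
pub fn verify(root: u64, data: &[u8], mut index: usize, proof: &[u64]) -> bool {
    let mut hash = hash_bytes(data);
    for sibling in proof {
        hash = if index % 2 == 0 {
            hash_pair(hash, *sibling)
        } else {
            hash_pair(*sibling, hash)
        };
        index /= 2;
    }
    hash == root
}
```

The light client only needs the root hash it already trusts, the claimed data, and one hash per level, which is where the low bandwidth and low computation come from.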
Best Practices

In general...

Your fundamental goal is to minimize the amount of storage your runtime uses. You should only store consensus critical data in your runtime storage.

Scenario: Decentralized Blog

● Runtime should be able to come to consensus about the content in a blog post…

★ Store the text on IPFS
★ Store the IPFS hash
➢ DO NOT store the text of the post in the storage!

Struct or Multiple Values?

● Direct costs
  ○ O(log n) reads to get a value
  ○ O(log n) writes to update a value
● Indirect costs
  ○ Increase number of nodes (n)
  ○ Size of the value

In general… store a struct:

★ Fewer reads/writes to update multiple values.
★ Fewer overall nodes in the trie.
★ Adding small items into large items accessed at the same time is essentially free!
➢ Less efficient for single value access.
➢ Upgrades require storage migration.
Define Your Storage

Trie Path Generation

Foo: double_map hasher($hash1) u32, $hash2(u32) => u32

You can control the hashing algorithm used. By default, these are configured to use Blake2 256.

Final Trie Path:

twox128(module) + twox128(storagename) + hasher(key1) + hasher(key2)
XXHash vs Blake2

● What hashing algorithm should I use for trie path generation?
● Blake2
  ○ Cryptographic but slow…
  ○ Use when a user can influence the input to the hash.
● XXHash (twox)
  ○ Non-cryptographic, but blazing fast…
  ○ When you (the runtime developer) control this value, this is fine!
Unbalanced Trie

● Can happen if a user can influence the trie path.
● Operations are no longer O(log n)!
Lists

● Vec: For storing a bounded number of values.
  ○ Good for when you need to change multiple values at a time (single read/write).
  ○ Enables iteration. Ex: The current validator set.
● Map: For storing an unbounded number of values.
  ○ Good for random access to data. Ex: User balances.
● Linked Map: For storing an unbounded amount of data when a UI or an offchain worker needs to iterate over all the entries.
  ○ Ex: The list of nominators and their nominations.
Abstractions of Substrate Storage

Think about all the layers when you are writing to Substrate storage.

● Runtime Storage API
● Overlay Change Set
● Merkle Trie
● Key Value Database

Questions?

[email protected]
@shawntabrizi

[Figure: the components behind the four layers, listed top to bottom as they appear on the slide.]

● Runtime
● SR-IO
● WASM, WASM EXEC
● Externalities / Overlay changes
● Backend: in-memory storage, (storage proof), trie db backend
● State cache
● State db
● Hash db (paritytech/trie)
● KVDB
● RocksDB