Substrate Storage Deep dive.

Shawn Tabrizi Software developer @ Parity Technologies Ltd. [email protected] | @shawntabrizi Abstractions of Substrate Storage

Runtime Storage API

Overlay Change High Level Overview

Merkle

Key Value Database Abstractions of Substrate Storage

● sp-io can write to storage with Runtime Storage API a given + value ● Easy APIs generated through Overlay Change Set decl_storage! macro ● StorageValue, StorageMap, Merkle Trie StorageDoubleMap, etc...

Key Value Database Abstractions of Substrate Storage

● Stages changes to the Runtime Storage API underlying DB. ● Overlay changes are Overlay Change Set committed once per block. ● Two kinds of changes: Merkle Trie ○ Prospective Changes - what may happen. ○ Committed Changes - what will Key Value Database happen. Abstractions of Substrate Storage

● a.k.a. HashDB Runtime Storage API ● paritytech/trie ● on top of KVDB Overlay Change Set ● Arbitrary Key and Value length ● Nodes are Branches or Leaves Merkle Trie

Key Value Database Abstractions of Substrate Storage

● a.k.a. KVDB Runtime Storage API ● Implemented with RocksDB ● Hash -> Vec Overlay Change Set ● Substrate: Blake2 256

Key (Hash 256) Value (Vec) Merkle Trie 0x0fd923ca5e7... [00]

0x92cdf578c47... [01]

Key Value Database 0x31237cdb79... [02]

0x581348337b... [03] Two Kinds of Keys!

● Trie key path ● KVDB key hash

Don’t worry, we will come back to this... Substrate uses a Base-16 Patricia Merkle Trie Merkle

Root Hash Root Node Hash Can be used to verify (H0 + H1) two trees are the same.

Hash 0 Hash 1 Hash(H0-0 Hash(H1-0 Branch Nodes + H0-1) + H1-1)

Hash 0-0 Hash 0-1 Hash 1-0 Hash 1-1

Hash(D0) Hash(D1) Hash(D2) Hash(D3) Leaf Nodes

Data 0 Data 1 Data 2 Data 3 Merkle Tree

Root Hash Merkle tree allows Hash (H0 + H1) you to more easily prove that some

Hash 0 Hash 1 data exists within Hash(H0-0 Hash(H1-0 + H0-1) + H1-1) the tree with a “Merkle Proof”. Hash 0-0 Hash 0-1 Hash 1-0 Hash 1-1 Hash(D0) Hash(D1) Hash(D2) Hash(D3) More about that later. Data 0 Data 1 Data 2 Data 3 Patricia Trie ● Position in the tree p defines the associated key. ro ar

c spective t ity ● Space optimized for ure ess y icipate elements which

1. parity 2. participate 3. party a prefix.

4. process 5. procure 6. prospective Beyond Binary Trees

● Branches can have more than two children. ● Everything is the same, just scaled up.

0 1 2 3 4 5 6 7 8 9 a b c d e f

A single hex character is called a “nibble”. Patricia Merkle

Creation of the Patricia Merkle Trie Let’s get visual. What we will be working with...

Literal KVDB Table Types of Nodes Virtual Trie Table Key Value Prefix Type Trie Key Path Value

0x8f35a27d9... [BRANCH] 00 Empty a 7 [BRANCH]

0x2ebcd78e8... [LEAF 00] 01 Leaf a 7 1 1 3 5 5 [LEAF 00]

[BRANCH w/ 10 Branch w/o value [BRANCH 0x27434bcd0... VAL 01] a 7 7 d 3 w/ VAL 01]

11 Branch w value [LEAF 02] 0x802c9c18c... [LEAF 02] a 7 7 d 3 3 7

[LEAF 03] 0x986d278c5... [LEAF 03] a 7 7 d 3 9 7 Node Structure Trie Node

header key children value Visual of the Substrate State Trie Trie Key Path Value a 7 [BRANCH]

a 7 1 1 3 5 5 [LEAF 00] pre partial children [BRANCH a 7 7 d 3 w/ VAL 01] 10 a7 0 1 2 3 4 5 6 7 8 9 a b c d e f a 7 7 d 3 3 7 [LEAF 02]

a 7 7 d 3 9 7 [LEAF 03] prefix key-end value prefix key-end value a 7 f 9 3 6 5 [LEAF 04] 01 1355 00 01 9365 04

pre partial children value

11 d3 0 1 2 3 4 5 6 7 8 9 a b c d e f 01 prefix key-end value prefix key-end value

01 7 02 01 7 03 Visual of the Substrate State Trie Trie Key Path Value a 7 [BRANCH]

a 7 1 1 3 5 5 [LEAF 00] pre partial children [BRANCH a 7 7 d 3 w/ VAL 01] 10 a7 0 1 2 3 4 5 6 7 8 9 a b c d e f a 7 7 d 3 3 7 [LEAF 02]

a 7 7 d 3 9 7 [LEAF 03] prefix key-end value prefix key-end value a 7 f 9 3 6 5 [LEAF 04] 01 1355 00 01 9365 04

pre partial children value 11 d3 0 1 2 3 4 5 6 7 8 9 a b c d e f 01 All nodes are present. prefix key-end value prefix key-end value

01 7 02 01 7 03 Visual of the Substrate State Trie Trie Key Path Value a 7 [BRANCH]

a 7 1 1 3 5 5 [LEAF 00] pre partial children [BRANCH a 7 7 d 3 w/ VAL 01] 10 a7 0 1 2 3 4 5 6 7 8 9 a b c d e f a 7 7 d 3 3 7 [LEAF 02]

a 7 7 d 3 9 7 [LEAF 03] prefix key-end value prefix key-end value a 7 f 9 3 6 5 [LEAF 04] 01 1355 00 01 9365 04

pre partial children value

11 d3 0 1 2 3 4 5 6 7 8 9 a b c d e f 01 Nodes with a shared path are children of a branch. prefix key-end value prefix key-end value

01 7 02 01 7 03 Visual of the Substrate State Trie Trie Key Path Value a 7 [BRANCH]

a 7 1 1 3 5 5 [LEAF 00] pre partial children [BRANCH a 7 7 d 3 w/ VAL 01] 10 a7 0 1 2 3 4 5 6 7 8 9 a b c d e f a 7 7 d 3 3 7 [LEAF 02]

a 7 7 d 3 9 7 [LEAF 03] prefix key-end value prefix key-end value a 7 f 9 3 6 5 [LEAF 04] 01 1355 00 01 9365 04

pre partial children value You can then progress by 11 d3 0 1 2 3 4 5 6 7 8 9 a b c d e f 01 looking at the children of the branch. prefix key-end value prefix key-end value

01 7 02 01 7 03 Visual of the Substrate State Trie Trie Key Path Value a 7 [BRANCH]

a 7 1 1 3 5 5 [LEAF 00] pre partial children [BRANCH a 7 7 d 3 w/ VAL 01] 10 a7 0 1 2 3 4 5 6 7 8 9 a b c d e f a 7 7 d 3 3 7 [LEAF 02]

a 7 7 d 3 9 7 [LEAF 03] prefix key-end value prefix key-end value a 7 f 9 3 6 5 [LEAF 04] 01 1355 00 01 9365 04

pre partial children value 11 d3 0 1 2 3 4 5 6 7 8 9 a b c d e f 01 This is a KVDB look up! prefix key-end value prefix key-end value

01 7 02 01 7 03 Visual of the Substrate State Trie Trie Key Path Value a 7 [BRANCH]

a 7 1 1 3 5 5 [LEAF 00] pre partial children [BRANCH a 7 7 d 3 w/ VAL 01] 10 a7 0 1 2 3 4 5 6 7 8 9 a b c d e f a 7 7 d 3 3 7 [LEAF 02]

a 7 7 d 3 9 7 [LEAF 03] prefix key-end value prefix key-end value a 7 f 9 3 6 5 [LEAF 04] 01 1355 00 01 9365 04

pre partial children value

11 d3 0 1 2 3 4 5 6 7 8 9 a b c d e f 01 You can have a branch which also contains a value! prefix key-end value prefix key-end value

01 7 02 01 7 03 Visual of the Substrate State Trie Trie Key Path Value a 7 [BRANCH]

a 7 1 1 3 5 5 [LEAF 00] pre partial children [BRANCH a 7 7 d 3 w/ VAL 01] 10 a7 0 1 2 3 4 5 6 7 8 9 a b c d e f a 7 7 d 3 3 7 [LEAF 02]

a 7 7 d 3 9 7 [LEAF 03] prefix key-end value prefix key-end value a 7 f 9 3 6 5 [LEAF 04] 01 1355 00 01 9365 04

pre partial children value You reach the end when 11 d3 0 1 2 3 4 5 6 7 8 9 a b c d e f 01 there are no more branches. prefix key-end value prefix key-end value

01 7 02 01 7 03 What you just saw KVDB_LOOKUP(0xff1231a…) -> partial children ● Patricia provides the trie path. a7 0 1 2 3 4 5 6 7 8 9 a b c d e f

0xd378a45… 0xc489b56…

KVDB_LOOKUP(0xd378a4…) ->

prefix key-end value

01 1355 00

Trie Path: a731355 What you just saw Hash([NODE]) = 0xff1231a… partial children ● Patricia provides the trie path. a7 0 1 2 3 4 5 6 7 8 9 a b c d e f

● Merkle provides the recursive 0xd378a45… 0xc489b56… hashing of children nodes into Hash([NODE]) = 0xd378a45… the parent. prefix key-end value

01 1355 00 Two Kinds of Keys!

1. Trie key path is set by you! (e.g. “:CODE”)

○ Arbitrary length! ○ Trie Node ■ Header Info ■ Key Info Trie Node ■ Possible Children header key children value ■ Possible Value 2. KVDB key = Hash([Trie Node]) But wait… there’s more. Child Trie

StateDB Root Node KVDB_LOOKUP(0xd378…)

Child Trie Root Node

Leaf Node

Value: 0xd378a45…

* Child can be a different format than the Substrate StateDB. Prefix Trie

● Similar to Child Trie, but you Trie Key Path Value

[BRANCH] cannot get the Root Hash. a 7 a 7 1 1 3 5 5 [LEAF 00] [BRANCH a 7 7 d 3 w/ VAL 01] a 7 7 d 3 3 7 [LEAF 02] ● Probably something temporary a 7 7 d 3 9 7 [LEAF 03] while we fix pruning issues a 7 f 9 3 6 5 [LEAF 04] with child trie. Runtime Storage Trie Path (NEW)

All modules use a prefix trie now! (Long term, they probably become a child trie.)

● Storage Value ○ twox128(module) + twox128(storagename) ● linked_map and map ○ twox128(module) + twox128(storagename) + hasher(key) ● linked_map head ○ twox128(module) + twox128("HeadOf" + storagename) ● double_map ○ twox128(module) + twox128(storagename) + hasher(key1) + hasher(key2) Pruning

● For holding older Block 42 block states, and R cleaning it up.

0 1

0 1 0 1 ● Let’s update two 0 1 0 1 0 1 0 1 values in this trie. 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 Pruning We create new

Block 42 Block 43 database entries,

R R but keep the old ones too! 0 1 1

0 1 0 1 1

0 1 0 1 0 1 0 1 0 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 1 Pruning

Block 42 Block 43 Block 44

R R R

0 1 1 1

0 1 0 1 1 1

0 1 0 1 0 1 0 1 0 1 1

0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 1 1 0 Pruning

Eventually, we Block 43 Block 44 prune the old data. R R

0 1 1

0 1 0 1 1

0 1 0 1 0 1 0 1 1

0 1 0 1 0 1 0 1 0 1 0 1 0 0 1 1 0 Merkle Trie Complexity Reading Data

Storage Read ● O(log n) reads. ● Not so great... Writing Data

Storage Read ● O(log n) reads, Hash Calculation hashes, and Storage Write writes needed. ● Very expensive for 1. Follow the trie path to the value. ○ O(log n) reads a database. 2. Write the new value. ○ 1 write 3. Calculate new hash ○ 1 hash 4. Repeat (2) + (3) up the trie path ○ O(log n) times Merkle Proof

Storage Read by Full Node ● O(log n) Data Sent to Light ● Great for light Computational Verification clients! ● Low bandwidth, 1. Full Node: Follow the trie path to the value. ○ O(log n) reads low computation 2. Full Node: Upload data of trie nodes.

3. Light Client: Download trie node data. 4. Light Client: Verify by hashing. ○ O(log n) hashes Best Practices

In general... Your fundamental goal is to minimize the amount of storage your runtime uses. You should only store consensus critical data in your runtime storage. Scenario: Decentralized Blog

● Runtime should be able ★ Store the text on IPFS to come to consensus ★ Store the IPFS hash about the content in a ➢ DO NOT store the text of blog post… the post in the storage! Struct or Multiple Values? In general… store a struct: ★ Less reads/writes to update ● Direct costs multiple values. ★ Less overall nodes in the trie. ○ O(log n) reads to get a value ★ Adding small items into large ○ O(log n) writes to update a value items accessed at the same time ● Indirect costs is essentially free!

○ Increase number of nodes (n) ○ Size of the value ➢ Less efficient for single value access. ➢ Upgrades requires storage migration. Define Your Storage Trie Path Generation

Foo: double_map hasher($hash1) u32, $hash2(u32) => u32

You can control the hashing algorithm used. By default, these are configured to use Blake2 256.

Final Trie Path:

twox128(module) + twox128(storagename) + hasher(key1) + hasher(key2) XXHash vs Blake2

● What hashing ● Blake2 algorithm should ○ Cryptographic but slow… ○ Use when user can influence the input I use for trie path to the hash. generation? ● XXHash (twox)

○ Non-cryptographic, but blazing fast… ○ When you (the runtime developer) controls this value, this is fine! Unbalanced Trie

● Can happen if a user can influence the trie path. ● Operations are no longer O(log n)! Lists

● Vec: For storing a bounded number of values.

○ Good for when you need to change multiple values at a time (single read/write). ○ Enables iteration. Ex: The current validator set. ● Map: For storing an unbounded number of values.

○ Good for random access to data. Ex: User balances. ● Linked Map: For storing unbounded amount of data, but UI or an offchain worker needs to iterate on all the entries.

○ Ex: The list of nominators and their nominations. Abstractions of Substrate Storage

Runtime Storage API Think about all the

Overlay Change Set layers when you are writing to Substrate Merkle Trie storage. Key Value Database Questions? [email protected] @shawntabrizi Runtime

SR-IO WASM WASM EXEC

EXTERNALites / Overlay changes Runtime Storage API BACKEND

IN-MEMORY Storage Overlay Change Set (storage proof) trie db backend

state cache

Merkle Trie State db

Hash db paritytech/trie Key Value Database KVDB

ROCKS DB