Let’s take a look at how Satoshi envisioned optimizing the storage of Bitcoin transactions. First, we need to examine the structure of Bitcoin blocks. Here’s what Satoshi has to say about it:
As usual, let’s break it down!
Satoshi starts off by stating that older transactions don’t need to be stored.
“Once the latest transaction in a coin is buried under enough blocks, the spent transactions before it can be discarded to save disk space.”
Put simply, once enough blocks have been added on top of a block, it becomes practically impossible to overwrite that block, because doing so would require more hash power than an attacker can realistically gather.
Recall that each block’s hash is computed over the hash of the previous block, the transactions in the current block, and the nonce. If we simply deleted old transactions, we could no longer reproduce the block’s hash, and the chain would break. Satoshi argues that we need some way to discard old transactions without breaking the block’s hash. It makes sense. Those transactions take up valuable disk space. So what to do?
Instead of discarding the transactions outright, Satoshi proposes we hash the transactions into a Merkle Tree.
“To facilitate this without breaking the block’s hash, transactions are hashed in a Merkle Tree, with only the root included in the block’s hash.”
A Merkle Tree, also called a hash tree, is a data structure first proposed by Ralph Merkle in 1979. It is essentially a binary tree in which every leaf holds the hash of a piece of data and every parent holds the hash of its two children. Hash trees let us commit to a large data set with a single short hash and securely verify any piece of that data when it is retrieved at a later time.
Let’s assume we want to store some data in a Merkle Tree format. We start by splitting that data into smaller chunks and hashing each chunk. Then we pair up those hashes and hash each pair to create a new hash. The process repeats itself until only one hash remains: the root hash.
This root hash sits at the top of the Merkle Tree and acts as a fingerprint of everything beneath it: change any leaf, and the root changes too. To prove that a particular value is in the tree, you only need that value plus the sibling hashes along the path from its leaf up to the root.
To build the Merkle Tree for a block, each transaction is first hashed. The hashes are then paired together, and a new hash is created for each pair (if a level has an odd number of hashes, the last one is duplicated). These hashes are paired up again to create new hashes, and so on, until eventually only a single hash value remains. This is the root hash of the block, and it is the only part of the tree that goes into the block header.
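The pairing process described above can be sketched in a few lines of Python. This is a minimal illustration, not Bitcoin’s actual implementation: the function names `double_sha256` and `merkle_root` are made up for this example, the transactions are placeholder byte strings, and real Bitcoin adds serialization and byte-order details that are left out here. It does use double SHA-256, which is the hash Bitcoin applies to transactions and blocks.

```python
import hashlib
from typing import List


def double_sha256(data: bytes) -> bytes:
    """Apply SHA-256 twice and return the 32-byte digest."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


def merkle_root(transactions: List[bytes]) -> bytes:
    """Return the Merkle root of a non-empty list of raw transactions."""
    if not transactions:
        raise ValueError("need at least one transaction")
    # Leaves: one hash per transaction.
    level = [double_sha256(tx) for tx in transactions]
    while len(level) > 1:
        # Odd number of hashes: repeat the last one so every hash has a partner.
        if len(level) % 2 == 1:
            level.append(level[-1])
        # Each parent is the hash of its two children concatenated.
        level = [
            double_sha256(level[i] + level[i + 1])
            for i in range(0, len(level), 2)
        ]
    return level[0]


# Placeholder payloads standing in for serialized transactions.
txs = [b"tx0", b"tx1", b"tx2"]
root = merkle_root(txs)
print(root.hex())
```

However many transactions go in, the result is always a single 32-byte hash, and changing or reordering any transaction produces a different root.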
Satoshi says the branches of the Merkle Tree that contain only spent transactions don’t need to be stored; the root hash in the block header is enough to keep the block’s hash intact. That saves a lot of disk space.
“Old blocks can then be compacted by stubbing off branches of the tree. The interior hashes do not need to be stored.”
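To see why stubbing off branches does not break anything, here is a small sketch with four placeholder transactions. It assumes we still care about `tx3` and want to discard the other three. The helper `h` and all variable names are hypothetical, chosen only for this example. The point is that the root can still be recomputed from `tx3` plus two stored hashes, so the discarded transactions and the hashes beneath the stub are no longer needed.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Double SHA-256, the hash Bitcoin uses for its Merkle Tree."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()


# Placeholder payloads standing in for serialized transactions.
txs = [b"tx0", b"tx1", b"tx2", b"tx3"]

# Full tree: four leaves, two interior nodes, one root.
leaves = [h(tx) for tx in txs]
hash01 = h(leaves[0] + leaves[1])
hash23 = h(leaves[2] + leaves[3])
full_root = h(hash01 + hash23)

# Pruned tree: keep tx3 itself, the leaf hash of tx2, and the stub hash01.
# tx0, tx1, tx2 and the leaf hashes of tx0 and tx1 are thrown away.
kept_tx3 = txs[3]
kept_hash2 = leaves[2]
kept_hash01 = hash01

# Rebuild the root from what was kept.
pruned_root = h(kept_hash01 + h(kept_hash2 + h(kept_tx3)))

print(pruned_root == full_root)  # True
```

The block header, which contains only the root, is unchanged by the pruning, so the block’s hash and the chain built on top of it stay valid.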
Satoshi then goes on to show some math. Yes, it looks complicated, but stick with me!
“A block header with no transactions would be about 80 bytes. If we suppose blocks are generated every 10 minutes, 80 bytes * 6 * 24 * 365 = 4.2MB per year.”
Therefore, the block headers alone would grow at a rate of about 4.2MB per year: 80 bytes per header, times 6 blocks per hour (one every 10 minutes), times 24 hours per day, times 365 days per year. Why does this matter? Remember that in a decentralized setting, there are many nodes that are cooperating to broadcast, propagate, validate, and confirm the transactions in the next block. Any full node that wants to be part of the network needs to store the blockchain on its hard drive.
If the blockchain gets too large, nodes that do not have enough disk space cannot store the blockchain on their computer. This limits the number of nodes that can participate in the network, thereby making the network less decentralized.
Satoshi’s approach of using Merkle Trees massively reduces the amount of data that nodes need to keep for old blocks, since spent transactions can be pruned while the small block headers stay intact.
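If you want to check Satoshi’s arithmetic yourself, the calculation is short enough to run directly. The constant names below are just labels for this example; the numbers are the ones from the quote.

```python
HEADER_BYTES = 80        # approximate size of one block header
MINUTES_PER_BLOCK = 10   # target time between blocks

blocks_per_hour = 60 // MINUTES_PER_BLOCK
blocks_per_year = blocks_per_hour * 24 * 365
bytes_per_year = HEADER_BYTES * blocks_per_year

print(blocks_per_year)             # 52560
print(bytes_per_year)              # 4204800
print(bytes_per_year / 1_000_000)  # 4.2048
```

That is roughly 4.2MB of headers per year, no matter how many transactions each block contains.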
Of course, the devil is in the details. If you are aware of the blockchain scaling problem, then you know that it is not as simple as just using Merkle Trees to reduce block size. We will get into the Bitcoin scaling stuff in a later lesson when we discuss Bitcoin in practice, but for now, let’s finish the rest of the Bitcoin whitepaper as Satoshi originally envisioned it.
In the next section, Satoshi explains the concept of “Simplified Payment Verification,” where nodes can still validate transactions without running a full network node. Stay tuned! 🙂