Merkle trees, also known as hash trees, are well-researched data structures used to verify data stored on and transmitted between computers.
To construct a Merkle tree, we first divide the underlying data into fixed-size pages. With the page size defined, we compute the tree as follows:
- For each page, we take a hash of the data.
- For every pair of hashes at the same level, we compute a hash over the pair and place it one level higher in the tree.
- We repeat this process level by level until only a single hash, the root, remains.
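The steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the names `build_merkle_tree` and `merkle_root`, the choice of SHA-256, the 4096-byte default page size, and the rule of pairing an unpaired last hash with itself are all assumptions made for the example.

```python
import hashlib
from typing import List


def build_merkle_tree(data: bytes, page_size: int = 4096) -> List[List[bytes]]:
    """Return the tree as a list of levels.

    levels[0] holds one hash per page; levels[-1] holds the single root hash.
    """
    # Divide the data into fixed-size pages (the last page may be shorter).
    # Assumption: an empty buffer is treated as a single empty page.
    pages = [data[i:i + page_size] for i in range(0, len(data), page_size)] or [b""]

    # Step 1: hash every page.
    level = [hashlib.sha256(page).digest() for page in pages]
    levels = [level]

    # Steps 2 and 3: hash pairs into the next level until one hash remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), 2):
            left = level[i]
            # Assumption: an unpaired last hash is paired with itself.
            right = level[i + 1] if i + 1 < len(level) else left
            parents.append(hashlib.sha256(left + right).digest())
        level = parents
        levels.append(level)
    return levels


def merkle_root(levels: List[List[bytes]]) -> bytes:
    """Return the top-level hash of a tree built by build_merkle_tree."""
    return levels[-1][0]


# Example: 32 bytes split into 8 pages of 4 bytes gives levels of 8, 4, 2 and 1 hashes.
example_levels = build_merkle_tree(bytes(range(32)), page_size=4)
print([len(level) for level in example_levels])
```

Storing the tree as a list of levels keeps the sketch short; a node-based structure with explicit child references would work equally well.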
[Figure: the process of constructing the Merkle tree.]
Comparing two byte buffers via Merkle trees is a low-cost exercise: if the two top-level nodes match, the byte buffers are identical (assuming a collision-resistant hash function). If they do not, finding the differing pages is a matter of walking down the tree, following only the paths where node hashes differ.
The comparison process can be seen below. Consider the case where the left tree represents the source and the right tree represents the destination, with a single difference in the sixth page.
For all nodes except A, C, F and 6, the hashes are identical. The comparison starts at the top of the tree. After comparing A and A', the system can tell immediately that the two buffers differ and that more comparison work is warranted. It then compares node by node as it steps down: it can skip everything below node B, since B matches, and focus on the path down to page 6. In this tiny example with 8 pages, it takes only a handful of comparisons to detect that page 6 differs. The payoff in reduced compute grows dramatically with large buffers that hold thousands of pages and receive sparse updates, as is typical for larger reference data repositories.
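The walk described above can be sketched as follows. This is one possible depth-first traversal, not the only way to do it. The helper names `build_levels` and `differing_pages` are hypothetical, SHA-256 is an assumed hash function, and the sketch assumes both buffers have the same length and page size so that the two trees have the same shape.

```python
import hashlib
from typing import List


def build_levels(data: bytes, page_size: int) -> List[List[bytes]]:
    """Build a Merkle tree as a list of levels (page hashes first, root last)."""
    pages = [data[i:i + page_size] for i in range(0, len(data), page_size)] or [b""]
    level = [hashlib.sha256(p).digest() for p in pages]
    levels = [level]
    while len(level) > 1:
        # Assumption: an unpaired last hash is paired with itself.
        level = [
            hashlib.sha256(level[i] + (level[i + 1] if i + 1 < len(level) else level[i])).digest()
            for i in range(0, len(level), 2)
        ]
        levels.append(level)
    return levels


def differing_pages(src: List[List[bytes]], dst: List[List[bytes]]) -> List[int]:
    """Return zero-based indices of pages whose hashes differ between two trees."""
    if [len(l) for l in src] != [len(l) for l in dst]:
        raise ValueError("trees must have the same shape")

    diffs = []
    stack = [(len(src) - 1, 0)]  # (level, index); start at the root
    while stack:
        depth, idx = stack.pop()
        if src[depth][idx] == dst[depth][idx]:
            continue  # matching hash: skip the whole subtree
        if depth == 0:
            diffs.append(idx)  # reached a page-level hash that differs
            continue
        # Descend into both children (the right child may not exist).
        for child in (2 * idx, 2 * idx + 1):
            if child < len(src[depth - 1]):
                stack.append((depth - 1, child))
    return sorted(diffs)


# Example: 8 pages of 4 bytes, with one byte changed inside the sixth page.
source = bytes(32)
destination = bytearray(source)
destination[21] = 0xFF  # byte 21 lies in page index 5, the sixth page
print(differing_pages(build_levels(source, 4), build_levels(bytes(destination), 4)))
```

The returned indices are zero-based, so the sixth page appears as index 5. Subtrees whose hashes match are never visited, which is where the savings come from when updates are sparse.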
Note that there are multiple ways to traverse the tree; choose the approach that best fits your requirements.
- Created 8 December 2020
- Updated 12 December 2020 with link to the original paper