Merkle Trees

Useful to verify any kind of data stored, handled and transferred in and between computers
ðŸŠĶ End of Life. This note is no longer maintained, and is scheduled for deletion.

Merkle trees, also known as Hash Trees, are well researched data structures used to verify data stored and transmitted between computers.

To construct a Merkle tree, we must first take the underlying data, and divide it into fixed size pages. With the page size defined, we then compute the Merkle tree. This is done as follows:

  • For each page, we take a hash of the data.
  • For every pair of hashes at the same level, we compute a hash and place it at a higher level in the tree.
  • We then recursively repeat this process until we are left with only a single hash.

This process of constructing the Merkle tree:

Comparing two byte buffers via Merkle trees is a low cost exercise - if the two top level nodes match, the byte buffers are identical. If they do not, it is a trivial matter of walking down the tree to different pages by simply following paths where node hashes differ.

The comparison process can be seen below. Let's consider the case where the left tree represents the source, and the right hand represents the destination tree, and we have a single difference in the sixth page.

For all nodes, with the exception of A, C, F and 6, the hashes are identical. When starting the compare process, work starts at the top of the tree. The system can instantly tell that the two buffers differ after comparing A & A' and more comparison work is warranted. It can then compare node by node as it steps down - it can quickly exclude checking anything below the B page, since it matches, and focus on the path down to 6. In this tiny example with 8 pages, it takes a handful of steps to detect that page 6 differs — the pay off in reduced compute improves dramatically with large buffers holding thousands of pages and accepting sparse updates - as is the typical case for bigger reference data repositories.

Note that there are multiple ways to traverse the tree, and you should consider the approach used based upon your requirements.



References

Merkle, R. C. (1988). "A Digital Signature Based on a Conventional Encryption Function". Advances in Cryptology — CRYPTO '87. Lecture Notes in Computer Science. 293. pp. 369–378. doi:10.1007/3-540-48184-2_32

Szydlo M. (2004) Merkle Tree Traversal in Log Space and Time. In: Cachin C., Camenisch J.L. (eds) Advances in Cryptology - EUROCRYPT 2004. EUROCRYPT 2004. Lecture Notes in Computer Science, vol 3027. Springer, Berlin, Heidelberg. doi:10.1007/978-3-540-24676-3_32


Change log

  • Created 8 December 2020
  • Updated 12 December 2020 with link to the original paper
Metadata
ðŸŒŋ
reading time
1 min read
published
2020-12-08
last updated
2020-12-12
importance
low
review policy
continuous
Topics
Data Structure
--- Views

ÂĐ 2009-2023 Shaun Laurens. All Rights Reserved.