Writing a storage engine in Rust: Writing a persistent BTree (Part 1)

As part of a recent personal journey to better understand databases and better learn Rust, I have recently took on the project of writing a simple key-value storage engine. Crazy, right? Lets get started!

B-Trees vs LSM Trees

The first thing one must think of when writing a key-value storage engine is -”How should I store my data so that I can perform reads and writes efficiently?”; The two most common approaches for storing data on disk are B-Trees and LSM Trees.

From the “BTree” wikipedia page — a tree with up to four keys in each node and five children in each internal node
From the “LSM-Tree” wikipedia page

A Persistent B-Tree

After considering the alternatives I’ve decided to write a “traditional” BTree.

| IS-ROOT 1-byte| NODE-TYPE 1-byte | PARENT OFFSET - 8 bytes |      | Number of children - 8 bytes |
| Key #0 - ? bytes | Key #2 - ? bytes | ...
| Child Offset #0 - 8 bytes| Child offset #1 - 8 bytes | ...
| IS-ROOT 1-byte| NODE-TYPE 1-byte | PARENT OFFSET - 8 bytes |      | Number of pairs - 8 bytes |
| Key #0 - ? bytes | Value #0 - ? bytes | ...
| Key #N - ? bytes | Value #N - ? bytes |
SPACE_FOR_PAIRS = PAGE_SIZE — HEADER_SIZE = 4096–18 = 4078 bytes.MAX_SIZE_OF_PAIR = SPACE_FOR_PAIRS / MAX_BRANCHING_FACTOR = 
4078 / 200 ~= 20 bytes. (78 bytes remain as junk)

A Node struct:

Lets look at a Node struct capturing the design we have thus far-

A Page struct:

This is a simple wrapper for a page of memory.

Conclusion:

In this first chapter we have designed our on disk page (i.e. node) structure and written methods to serialize nodes to pages and back. In the next chapter we will start writing the BTree.

Software Engineer; Rock Climber; Hiker; Runner; Father; Partner.