The chidb Architecture¶
This section describes the chidb architecture, summarized in the following figure:
- Backend
Contains the B-Tree module and the Pager module. The B-Tree module is responsible for managing a collection of file-based B-Trees, using the chidb file format. However, the B-Tree module does not include any I/O code. All I/O is delegated to the Pager, which provides a page-by-page access to a chidb file. The Pager may include a page cache to optimize disk access.
The specifications of the chidb file format can be found in The chidb File Format
- Core
Contains the chidb API, a SQL compiler, and a database machine (or DBM). The API is the interface that other applications must go through to use a chidb file, and allows client software to open a chidb file, execute SQL statements on that file, and close the file. When a SQL statement is submitted, it is processed by the SQL compiler. The compiler itself is divided into two modules: a SQL parser, which produces a convenient in-memory representation of a SQL query, and the code generator and optimizer, which generates code for the database machine. The database machine is a virtual machine specifically designed to operate on chidb files, and includes instructions such as “Create a new table”, “Find a record with key \(k\)”, etc.
- Accessories
Includes a utilities modules, with miscellaneous code that are used across all modules, and testing code.
chidb API¶
The chidb API comprises a set of functions that allows client software to access and manipulate chidb files, including executing SQL statements on them. Unless otherwise noted, each API function returns one of the return codes listed in the following table. This table lists every possible return code; the description of each API function notes what return codes can be returned by that specific function.
Name |
Integer code |
Description |
---|---|---|
|
0 |
Succesful result |
|
1 |
Invalid SQL |
|
2 |
Could not allocate memory |
|
3 |
Unable to open the database file |
|
4 |
The database file is not well formed |
|
5 |
SQL statement failed because of a constraint violation |
|
6 |
Data type mismatch |
|
7 |
An I/O error has occurred when accessing the file |
|
8 |
API used incorrectly |
|
100 |
|
|
101 |
|
chidb_open
¶
The chidb_open
function is used to open a chidb file. Its signature
is the following:
int chidb_open(
const char* file,
chidb** db
);
file
is the filename of the chidb file to open. If the file does not
exist, it will be created. db
is used to return a pointer to a
chidb
variable. The chidb
type is an opaque type representing
a chidb database. In other words, an API user should not be concerned
with what is contained in a chidb
variable, and should simply use it
as a representation of a chidb database to pass along to other API
functions.
The return value of the function can be CHIDB_OK
, CHIDB_ENOMEM
,
CHIDB_ECANTOPEN
, CHIDB_ECORRUPT
, or CHIDB_EIO
.
chidb_close
¶
The chidb_close
function is used to close a chidb file. Its
signature is the following:
int chidb_close(chidb *db);
db
is the database to close.
The return value of the function can be CHIDB_OK
or
CHIDB_EMISUSE
(if called on a database that is already closed).
int chidb_prepare
¶
The chidb_prepare
function is used to prepare a SQL statement for
execution. Internally, this will require compiling the SQL statement
(but not running it yet). The function’s signature is:
int chidb_prepare(
chidb* db,
const char* sql,
chidb_stmt** stmt
);
db
is the database on which to run the SQL statement. sql
is the
SQL statement itself. stmt
is used to return a pointer to a
chidb_stmt
variable. The chidb_stmt
type is an opaque type
representing a SQL statement.
The return value of the function can be CHIDB_OK
,
CHIDB_EINVALIDSQL
, CHIDB_ENOMEM
.
int chidb_step
¶
The chidb_step
function runs a prepared SQL statement until a result
row is available (or just runs the SQL statement to completion if it is
not meant to produce a result row, such as an INSERT statement). Its
signature is:
int chidb_step(chidb_stmt *stmt);
stmt
is the SQL statement to run.
If the statement is a SELECT statement, chidb_step
returns
CHIDB_ROW
each time a result row is produced. The values of the
result row can be accessed using the column access functions described
below. Thus, chidb_step
has to be called repeatedly to access all
the rows returned by the query. Once there are no more rows left, or if
the statement is not meant to produce any results, then CHIDB_DONE
is returned (note that this function does not return CHIDB_OK
).
The function can also return CHIDB_ECONSTRAINT
, CHIDB_EMISMATCH
,
CHIDB_EMISUSE
(if called on a finalized SQL statement), or
CHIDB_EIO
.
int chidb_finalize
¶
The chidb_finalize
function finalizes a SQL statement, freeing all
resources associated with it.
int chidb_finalize(chidb_stmt *stmt);
stmt
is the SQL statement to finalize.
The return value of the function can be CHIDB_OK
or
CHIDB_EMISUSE
(if called on a statement that is already finalized).
Column access functions¶
Once a SQL statement has been prepared, the following three functions can be used to obtain information on the columns of the rows that will be returned by the statement:
int chidb_column_count(chidb_stmt *stmt);
int chidb_column_type(
chidb_stmt* stmt,
int col
);
const char *chidb_column_name(
chidb_stmt* stmt,
int col
);
In all these functions, the stmt
parameter is the prepared SQL
statement.
chidb_column_count
returns the number of columns in the result rows.
If the SQL statement is not meant to produce any results (such as an
INSERT statement), then 0 is returned.
chidb_column_type
returns the type of column col
(columns are
numbered from 0). The supported types are summarized in the folllowing
table.
Header Value |
SQL Type |
Description |
---|---|---|
0 |
|
Null value |
1 |
|
1-byte signed integer |
2 |
|
2-byte signed integer |
4 |
|
4-byte signed integer |
\(2\cdot\texttt{n}+13\) |
|
Character string |
chidb_column_name
returns a pointer to a null-terminated string
containing the name of column col
. The API client does not have to
free()
the returned string. It is the API function’s responsibility
to allocate and free the memory for this string.
If chidb_step
returns CHIDB_ROW
, the following two functions can
be used to access the contents of each column:
int chidb_column_int(
chidb_stmt* stmt,
int col
);
const char *chidb_column_text(
chidb_stmt* stmt,
int col
);
In all these functions, the stmt
parameter is the SQL statement.
chidb_column_int
returns the integer value in column col
of the
row. The column must be of type BYTE
, SMALLINT
, or INTEGER
.
chidb_column_text
returns a pointer to a null-terminated string
containing the value in column col
of the row. The API client does
not have to free()
the returned string. It is the API function’s
responsibility to allocate and free the memory for this string.
Note that none of these functions return error codes. Calling the column access functions on an unprepared statement, or accessing column values on a statement that has not produced a result row, will produce unexpected behaviour.
Database Machine¶
The database machine (or DBM) is a computing machine specifically designed to operate on chidb files, and includes instructions such as “Create a new table”, “Find a record with key \(k\)”, etc. The DBM architecture is summarized in the following figure:
- The DBM program
A DBM program is composed of one or more DBM instructions. An instruction contains an operation code and up to four operands: P1, P2, P3, and P4. P1 through P3 are signed 32-bit integers, and P4 is a pointer to a null-terminated string. All the DBM instructions are described below. Instructions in the DBM program are numbered from 0.
- Program counter
Keeps track of what instruction is currently being executed. Certain instructions can directly modify the program counter to jump to a specific instruction in the program.
- Registers
The DBM has an unlimited number of registers. A register can contain a 32-bit signed integer, a pointer to a null-terminated string or to raw binary data, or a NULL value. Registers are numbered from 0.
- Cursors
A cursor is a pointer to a specific entry in a B-Tree. Cursors must be able to move to the next or previous entry in a B-Tree in amortized \(O(1)\) time. Hint: this will require an auxiliary data structure of size \(O(\log(n))\).
A DBM starts executing a program on the \(0^{\textrm{th}}\)
instruction, and executes subsequent instructions sequentially until a
Halt
instruction is encountered, or until the program counter
advances past the end of the program (which is equivalent to a Halt
instruction with all its parameters set to 0). Note that it is also
possible for individual instructions to fail, resulting in a premature
termination of the program.
Register Manipulation Instructions¶
Instruction |
P1 |
P2 |
P3 |
P4 |
Description |
---|---|---|---|---|---|
|
An integer \(i\) |
A register \(r\) |
Store \(i\) in \(r\). |
||
|
A length \(l\) |
A register \(r\) |
A string \(s\) |
Store \(s\) (with length \(l\)) in \(r\). |
|
|
A register \(r\) |
Store a null value in \(r\). |
|||
|
A register \(r_1\) |
A register \(r_2\) |
Make a shallow copy of the contents of \(r_1\) into \(r_2\). i.e., \(r_2\) must be left pointing to the same value as \(r_1\). |
Control Flow Instructions¶
Instruction |
P1 |
P2 |
P3 |
P4 |
Description |
---|---|---|---|---|---|
|
A register \(r_1\) |
A jump address \(j\) |
A register \(r_2\) |
If the contents of \(r_1\) are equal to the contents of \(r_2\), jump to \(j\). Otherwise, do nothing. This instruction assumes that the types of the contents of both registers are the same. The behaviour when either of the registers is NULL is undefined. |
|
|
Same as |
||||
|
Same as |
||||
|
Same as |
||||
|
Same as |
||||
|
Same as |
||||
|
An integer \(n\) |
An error message \(s\) |
Halt execution of the database machine and return error code \(n\). If \(n\neq 0\), set the machine’s error message to \(s\). |
||
|
Does nothing. |
Note: Unless instructed otherwise, comparison of binary registers is not necessary. If instructed to implement it, then comparisons must be made byte by byte. Two binary blobs are equal if and only if they have the same length and contain the exact same bytes. If two binary blobs have different lengths, order is determined by the common bytes between the two blobs. If the common bytes are equal, then the blob with the fewer bytes is considered to be less than the blob with more bytes.
Database Opening and Closing Instructions¶
Instruction |
P1 |
P2 |
P3 |
P4 |
Description |
---|---|---|---|---|---|
|
A cursor \(c\) |
A register \(r\). The register must contain a page number \(n\) |
The number of columns in the table (0 if opening an index) |
Opens the B-Tree rooted at the page \(n\) for read-only access and stores a cursor for it in \(c\) |
|
|
Same as |
||||
|
A cursor \(c\) |
Closes cursor \(c\) and frees up any resources associated with it. |
Cursor Manipulation Instructions¶
Instruction |
P1 |
P2 |
P3 |
P4 |
Description |
---|---|---|---|---|---|
|
A cursor \(c\) |
A jump address \(j\) |
Makes cursor \(c\) point to the first entry in the B-Tree. If the B-Tree is empty, then jump to \(j\). |
||
|
A cursor \(c\) |
A jump address \(j\) |
Advance cursor \(c\) to the next entry in the B-Tree and jump to \(j\). If there are no more entries (if cursor \(c\) was pointing at the last entry in the B-Tree), do nothing. |
||
|
Same as |
||||
|
A cursor \(c\) |
A jump address \(j\) |
A register \(r\). The register must contain a key \(k\) |
Move cursor \(c\) to point to the entry with key equal to \(k\). If the cursor points to an index B-Tree, the key that must match is \(IdxKey\) in the \((IdxKey,PKey)\) pair. If the B-Tree doesn’t contain \(k\), jump to \(j\). |
|
|
Same as |
||||
|
Same as |
||||
|
Same as |
||||
|
Same as |
||||
|
A cursor \(c\) |
A jump address \(j\) |
A register \(r\). The register must contain a key \(k\) |
Cursor \(c\) points to an index entry containing a \((IdxKey,PKey)\) pair. If \(IdxKey\) is greater than \(k\), jump to \(j\). Otherwise, do nothing. |
|
|
Same as |
||||
|
Same as |
||||
|
Same as |
Cursor Access Instructions¶
Instruction |
P1 |
P2 |
P3 |
P4 |
Description |
---|---|---|---|---|---|
|
A cursor \(c\) |
A column number \(n\) |
A register \(r\). |
Store in register \(r\) the value in the \(n\)-th column of the entry pointed at by cursor \(c\). Columns are numbered from 0. |
|
|
A cursor \(c\) |
A register \(r\). |
Store in register \(r\) the value of the key of the entry pointed at by cursor \(c\). |
||
|
A cursor \(c\) |
A register \(r\). |
Cursor \(c\) points to an index entry containing a \((IdxKey,PKey)\) pair. Store \(PKey\) in \(r\). |
Database Record Instructions¶
Instruction |
P1 |
P2 |
P3 |
P4 |
Description |
---|---|---|---|---|---|
|
A register \(r_1\) |
An integer \(n\) |
A register \(r_2\) |
Create a database record using the values from registers \(r_1\) through \(r_1+n-1\), and store the record in \(r_2\). |
|
|
A register \(r\) |
An integer \(n\) |
This instructions indicates that a result row has been produced and pauses execution for the database machine user to fetch the result row. The result row is formed by the values stored in registers \(r\) through \(r+n-1\). |
Insertion Instructions¶
Instruction |
P1 |
P2 |
P3 |
P4 |
Description |
---|---|---|---|---|---|
|
A cursor \(c\) |
A register \(r_1\). The register must contain a database record \(v\) |
A register \(r_2\). The register must contain a key \(k\) |
Inserts an entry, with key \(k\) and value \(v\), in the B-Tree pointed at by cursor \(c\). |
|
|
A cursor \(c\) |
A register \(r_1\). The register must contain a key \(IdxKey\) |
A register \(r_2\). The register must contain a key \(PKey\) |
Add a new \((IdxKey,PKey)\) entry in the index B-Tree pointed at by cursor \(c\). |
B-Tree Creation Instructions¶
Instruction |
P1 |
P2 |
P3 |
P4 |
Description |
---|---|---|---|---|---|
|
A register \(r\) |
Create a new table B-Tree and store its root page in \(r\). |
|||
|
Same as |