Custom Types I
Overview
Over the course of the next two lectures, we will be implementing a linked list together in Rust! Linked lists are not all that useful in most situations, but they are very simple to understand, and their pointer-y nature will give us a challenge in Rust, helping to solidify your understanding of ownership, borrowing, lifetimes, and error handling.
New material
Here’s a recap of new things introduced today, for anyone wanting a reference after watching the lecture:
- You can define your own types/structs using this syntax:
struct MyType { field1: i32, field2: String, }
- You can initialize a struct using this syntax:
let my_object = MyType {field1: 1, field2: "Hello World".to_string()};
- The Box type stores a pointer to heap-allocated memory. You can put anything inside a Box; a Box<u32> is a heap-allocated integer (not sure why you’d do this, but you can), and a Box<Node> is a heap-allocated Node. Box::new(...) allocates memory and initializes it to the value you pass in, and the Box drop function frees the heap memory whenever the Box is dropped.
  - If you’re more familiar with C++, this is very similar to std::unique_ptr.
- If you have a reference to an Option with something inside (&Option<T>) and you want an Option containing a reference to that thing (Option<&T>), you can call Option::as_ref. This method peeks inside the Option that you have a reference to. If that Option is None, then it just returns you None. If that Option has something inside, then it peeks in, grabs a reference to that thing, and returns you a new Option that is Some(that reference). You may never need to use this unless you’re implementing data structure-y code, and it’s totally possible to get by without it (you can do this using a 4-line match expression), but it’s a handy trick to have in your arsenal.
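To make the Option::as_ref bullet a bit more concrete, here is a minimal sketch that puts the method next to the hand-written match it replaces. The helper name peek_with_match is made up for this illustration; Option::as_ref is the standard library method.

```rust
// Hypothetical helper: roughly what Option::as_ref does, written out as a match.
fn peek_with_match<T>(opt: &Option<T>) -> Option<&T> {
    match opt {
        Some(inner) => Some(inner), // `inner` is already a `&T` here
        None => None,
    }
}

fn main() {
    let full: Option<String> = Some("Hello World".to_string());
    let empty: Option<String> = None;

    // Both versions turn an &Option<String> into an Option<&String>
    // without taking ownership of the String.
    let a: Option<&String> = full.as_ref();
    let b: Option<&String> = peek_with_match(&full);
    assert_eq!(a, b);
    assert_eq!(empty.as_ref(), peek_with_match(&empty));

    // `full` is still usable afterwards, because we only borrowed from it.
    println!("{:?} {:?}", a, full);
}
```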
Linked lists
As opposed to vectors, which store data in a contiguous array, linked lists store elements in separate heap-allocated nodes, where each node points to the next node in the list.
A linked list in C
To implement this in C, we can define a node struct
containing a value and a
pointer to the next node in the list:
typedef struct Node {
    int value;
    struct Node* next;  // C needs the "struct" keyword here
} Node;
Then, in order to create a linked list, we can allocate a few nodes, assign
them values, and set their next
pointers to point to each other:
#include <stdlib.h>

int main() {
    Node* first = (Node*)malloc(sizeof(Node));
    first->value = 1;
    Node* second = (Node*)malloc(sizeof(Node));
    second->value = 2;
    Node* third = (Node*)malloc(sizeof(Node));
    third->value = 3;
    first->next = second;
    second->next = third;
    third->next = NULL;
}
A fun exercise might be to think about how many ways you can mess this up. I can think of several off the top of my head:
- Supplying the wrong size to malloc
- Forgetting to assign a value to one of the nodes (this will be uninitialized memory)
- Forgetting to assign a next pointer, particularly for the last node. If you don’t set third->next = NULL, then it will be an uninitialized pointer, and any code that traverses this list will follow the pointer to the next element even though no such element exists
- Forgetting to free memory, or freeing incorrectly (e.g. use-after-free)
A linked list in Rust
First, we need to define a Node
type. A reasonable suggestion given what you
know so far is to try this:
struct Node {
value: i32, // i32 is a 32-bit signed int, equivalent to "int" in C
next: Node,
}
However, this won’t work because it’s an infinitely recursive type:
There is no actual way to represent this in memory, since it has infinite size.
In C, we got around this problem by storing a pointer to the next node. A pointer has a fixed size of 8 bytes on 64-bit systems, so this fixes our problem. How should we do this in Rust?
So far, the closest thing we’ve seen to a pointer is a reference. However, the compiler needs to be able to analyze the code to ensure that no reference outlives the owner (in order to prevent dangling pointers / use-after-frees). Because of this, it’s usually not a good idea to put references inside of structs, because then, the compiler has a really hard time analyzing how long the reference survives, particularly if your struct gets passed around a lot.
Instead, we use something called a Box. A Box is a pointer to heap memory,
where the Box constructor (Box::new
) allocates and initializes heap memory,
and the Box drop function frees the memory whenever the owner of the Box is
done using it.
We can define our struct using Box like this:
struct Node {
value: i32,
next: Box<Node>,
}
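Here is a small sketch of what Box buys us. It checks that a Box<Node> is pointer-sized regardless of what Node contains, which is why the struct now has a finite size. The helper boxed_roundtrip is a made-up name for illustration, and the exact size of Node depends on the platform, so the code prints it rather than assuming a number.

```rust
use std::mem::size_of;

#[allow(dead_code)]
struct Node {
    value: i32,
    next: Box<Node>,
}

// Hypothetical helper for illustration: allocate an integer on the heap
// and read it back through the Box.
fn boxed_roundtrip(n: u32) -> u32 {
    let boxed: Box<u32> = Box::new(n); // heap allocation happens here
    *boxed // read the value; the heap memory is freed when `boxed` goes out of scope
}

fn main() {
    // A Box<Node> is just a pointer, so it has the size of a pointer
    // no matter how big Node is.
    assert_eq!(size_of::<Box<Node>>(), size_of::<usize>());
    // Node now has a finite, known size: an i32 plus one pointer (plus padding).
    println!("Node is {} bytes", size_of::<Node>());
    println!("{}", boxed_roundtrip(7));
}
```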
Handling nulls
This is great, but we run into problems as soon as we try creating a node. What
if we want to create a list with only one node? Then next
shouldn’t point
anywhere, but Rust doesn’t have any way to create an empty Box. (That would be
a null pointer, and the point of last lecture is that we want to avoid nulls.)
Instead, we use an Option to indicate whether there is a next node or not:
struct Node {
value: i32,
next: Option<Box<Node>>,
}
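With this definition, a one-element list is easy to express. The sketch below builds one and then handles the "no next node" case explicitly; the helper name single is an assumption made for this example.

```rust
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

// Hypothetical helper: build a list that contains exactly one node.
fn single(value: i32) -> Box<Node> {
    Box::new(Node { value, next: None })
}

fn main() {
    let only = single(1);
    // There is no null pointer to forget about: "no next node" is an
    // ordinary value that the match has to handle.
    match &only.next {
        Some(next) => println!("{} is followed by {}", only.value, next.value),
        None => println!("{} is the last node", only.value),
    }
}
```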
Assembling a list
Now, we can create a few nodes:
let first = Box::new(Node {value: 1, next: None});
let second = Box::new(Node {value: 2, next: None});
let third = Box::new(Node {value: 3, next: None});
And try to connect them into a list:
first.next = Some(second);
second.next = Some(third);
This doesn’t work, but the compiler error is pretty straightforward:
error[E0594]: cannot assign to `first.next`, as `first` is not declared as mutable
--> src/main.rs:12:5
|
7 | let first = Box::new(Node {value: 1, next: None});
| ----- help: consider changing this to be mutable: `mut first`
...
12 | first.next = Some(second);
| ^^^^^^^^^^ cannot assign
Remember that variables in Rust are immutable by default. We need to add mut to make these mutable.
let mut first = Box::new(Node {value: 1, next: None});
let mut second = Box::new(Node {value: 2, next: None});
let mut third = Box::new(Node {value: 3, next: None});
first.next = Some(second);
second.next = Some(third);
More errors!
error[E0382]: assign to part of moved value: `*second`
--> src/main.rs:12:5
|
8 | let mut second = Box::new(Node {value: 2, next: None});
| ---------- move occurs because `second` has type `std::boxed::Box<Node>`, which does not implement the `Copy` trait
...
11 | first.next = Some(second);
| ------ value moved here
12 | second.next = Some(third);
| ^^^^^^^^^^^ value partially assigned here after move
This is an ownership issue: the compiler is saying that second
has been moved
elsewhere, and we’re trying to use it after it has been moved.
Where did it get moved to? Let’s draw this out. When we first create the three nodes, the world looks like this:
Then, after this line:
first.next = Some(second);
the world looks like this:
If you take another look at the compiler error message, you can actually see the compiler trying to explain this to us:
11 | first.next = Some(second);
| ------ value moved here
The second node has been moved out of the second variable and into the Some option that was placed in first.next.
This is the kind of annoying problem that happens a lot in Rust. In C, we have pointers, and we can use the pointers whenever we want (even if they aren’t actually valid anymore, which is a problem). In Rust, the compiler applies a set of specific rules to invalidate your variables whenever you transfer ownership. Although this protects us against memory errors, it can be extremely annoying, and sometimes we need to rewrite our code in different ways in order to satisfy the compiler.
Fortunately for us, there is a simple fix in our case: we can just swap the two lines.
second.next = Some(third);
first.next = Some(second);
This first transfers ownership of the third node into the second node, then transfers ownership of the second node into the first node.
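Putting the corrected order together, a complete version might look like the sketch below. The build_list function is a hypothetical wrapper around the steps above. Note that third is never modified after creation, so it does not need mut here.

```rust
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

// Hypothetical helper wrapping the lecture's steps in a function.
fn build_list() -> Box<Node> {
    let mut first = Box::new(Node { value: 1, next: None });
    let mut second = Box::new(Node { value: 2, next: None });
    let third = Box::new(Node { value: 3, next: None });

    // Link from the back: `third` moves into `second`, then `second` moves into `first`.
    second.next = Some(third);
    first.next = Some(second);
    first
}

fn main() {
    let first = build_list();
    // Follow the chain by pattern matching on each `next` field.
    if let Some(second) = &first.next {
        if let Some(third) = &second.next {
            println!("{} -> {} -> {}", first.value, second.value, third.value);
        }
    }
}
```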
Now, first effectively owns the entire list. When the first variable goes out of scope, the first Node will be dropped, which will cause its next Option to be dropped, which will cause the second Node to be dropped, and so on until all the memory is freed.
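One way to watch this chain of drops happen is to give each node a Drop implementation that records its value. The sketch below does that with a variant of the node type called LoggedNode, a name invented for this example. It also uses Rc and RefCell to share the log between nodes; those types are not covered in this lecture and are only scaffolding for the experiment.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical variant of Node that also carries a shared log,
// so we can observe the order in which nodes are dropped.
struct LoggedNode {
    value: i32,
    next: Option<Box<LoggedNode>>,
    log: Rc<RefCell<Vec<i32>>>,
}

impl Drop for LoggedNode {
    fn drop(&mut self) {
        // This runs before the node's fields (including `next`) are dropped.
        self.log.borrow_mut().push(self.value);
    }
}

fn drop_order() -> Vec<i32> {
    let log = Rc::new(RefCell::new(Vec::new()));
    let mut first = Box::new(LoggedNode { value: 1, next: None, log: Rc::clone(&log) });
    let mut second = Box::new(LoggedNode { value: 2, next: None, log: Rc::clone(&log) });
    let third = Box::new(LoggedNode { value: 3, next: None, log: Rc::clone(&log) });
    second.next = Some(third);
    first.next = Some(second);

    // Dropping the owner of the list frees every node, front to back.
    drop(first);

    let order = log.borrow().clone();
    order
}

fn main() {
    println!("{:?}", drop_order());
}
```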
Looping over the list
In C, the way you’d traditionally iterate over a linked list is to create a
Node*
pointer pointing to the first element; then, advance that pointer
through the list, stopping once the pointer becomes NULL:
Node* curr = first;
while (curr != NULL) {
printf("%d\n", curr->value);
curr = curr->next;
}
We can do something similar in Rust. First, we take a reference to the first element:
let curr: &Box<Node> = &first;
Then, we can loop through until we reach the end of the list.
Note that in the C code, we loop until curr is NULL, but curr can’t be NULL in Rust. Instead, we can make curr an Option, stopping the loop when we see that it is None:
let mut curr: Option<&Box<Node>> = Some(&first);
while curr.is_some() {
// we can unwrap the Option because we know for sure that curr is Some
println!("{}", curr.unwrap().value);
let next: &Option<Box<Node>> = &curr.unwrap().next;
curr = next;
}
This code is really close, but it doesn’t quite compile because of a type mismatch. Conceptually, next is a reference to curr.next, so its type is &Option<Box<Node>>. However, curr has type Option<&Box<Node>>: it is supposed to be an Option that is None if we’ve hit the end of the list, or a reference to an element in the list if we haven’t reached the end yet.
There is a method called Option::as_ref that is handy in situations such as these. as_ref takes a reference to an Option. If the Option is None, then it returns None back to you, but if the Option is Some, then it peeks inside and returns to you Some(a reference to the thing inside). It’s not often that you need this unless you’re working with data structure-y code, but it can be handy to keep in your bag of tricks.
let mut curr: Option<&Box<Node>> = Some(&first);
while curr.is_some() {
// we can unwrap the Option because we know for sure that curr is Some
println!("{}", curr.unwrap().value);
let next: &Option<Box<Node>> = &curr.unwrap().next;
// Get an Option with a reference to the next element
curr = next.as_ref();
}
And with that, we have a linked list that we can iterate over and print out!
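For comparison, the same traversal can be written a little more compactly with while let, which folds the is_some() check and the unwrap() into a single pattern match. This is only a sketch of one alternative, not the version built in lecture: the function name collect_values is made up, it collects values into a Vec instead of printing them, and it uses Option::as_deref (a standard library relative of as_ref) to get an Option<&Node> directly.

```rust
struct Node {
    value: i32,
    next: Option<Box<Node>>,
}

// Hypothetical helper: walk the list and collect the values instead of printing them.
fn collect_values(first: &Node) -> Vec<i32> {
    let mut values = Vec::new();
    let mut curr: Option<&Node> = Some(first);
    // `while let` combines the `is_some()` check and the `unwrap()` into one step.
    while let Some(node) = curr {
        values.push(node.value);
        // `as_deref` is like `as_ref`, but also looks through the Box:
        // &Option<Box<Node>> becomes Option<&Node>.
        curr = node.next.as_deref();
    }
    values
}

fn main() {
    let third = Box::new(Node { value: 3, next: None });
    let second = Box::new(Node { value: 2, next: Some(third) });
    let first = Box::new(Node { value: 1, next: Some(second) });
    println!("{:?}", collect_values(&first));
}
```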