Crafting Marwood

by Erik Bremen

This book is built with mdBook and Mermaid-JS. Its source may be found at marwood-book.

Introduction

Crafting Marwood is a book describing the process behind researching and building Marwood, a Scheme compiler and runtime written in Rust. Marwood may be interacted with on the web at repl.marwood.io, and its source is MIT Licensed and may be found on github.

What is Scheme?

Scheme is a programming language invented in 1975 by Guy Steel and Gerald Jay Sussman at MIT. It's the first dialect of lisp to use lexical scope, and its simplicity makes it a great choice for hobby programming language implementers.

Here's a sample of scheme's syntax, demonstrating a recursive implementation of factorial:

(define (factorial n)
    (if (= n 0) 
        1
        (* n (factorial (- n 1)))))

Why Scheme?

Scheme has very simple grammar, and small number of core language features that once implemented provide the building blocks for the rest of the implementation.

R7Small, the latest standard at the time of writing, was published in 2013 and is just under 90 pages of text including full library specification and appendix.

The simplicity of scheme allows the implementer to spend more time exploring complex branches of programming language implementations, such as garbage collection, macro expansion, optimizations, etc.

How is Marwood different?

When researching scheme implementation material I came across two types of sources:

  1. Academic material dating back to the 1970s. This material is quite abstract and a majority of these papers and books describe implementing a metacircular evaluator. There are very few materials on implementing scheme in a language like C or Rust.

  2. Modern scheme codebases such as Chez Scheme, Guile, Chicken, Chibi. These implementations have the best performance, but are composed of millions of lines of code over years of implementation from dozens of developers.

Marwood's goal was to provide an example of an implemenation of scheme that fits somewhere between #1 and #2.

Who is this book for?

This book is walkthrough through Marwood's source code. It is aimed at an audience with some experience with the Rust programming language, parsers, and compilers.

For a detailed tutorial series on writing an interpreter, I would recommend Crafting Interpreters.

Resources and Influences

These resources were found useful when researching scheme implementations:

Design Overview

Highlevel Design Choices

The first major decision when constructing Marwood was to decide how Marwood would evaluate scheme:

  1. Direct evaluation of the AST in the host language: This is the most natural way to attempt to write an interpreter for Scheme, but it has a few short-comings. Certain features like tail-call optimization and call/cc may be more difficult to implement depending on the host language, and other considerations like memory management are directly affected by the host language.

    This approach makes a lot of sense if the host language is Scheme, however. There are numerous texts (e.g. sicp) on writing meta-circular evaluators in Scheme, and it is one of the many brilliant examples of Scheme's elegant design.

  2. Compilation to an existing target architecture: The target architecture could be hardware based (e.g. x86, amd64, arm) such as in An Incremental Approach to Compiler Construction, or virtual (JVM, WASM, etc).

  3. Compilation to a custom virtual machine: A custom virtual machine is created designed to evaluate Scheme.

Marwood uses a custom virtual machine approach.

Marwood Library

The Marwood library allows the creation of Scheme virtual machines, which may be used to evaluate scheme expressions. Each virtual machine represents the state of a scheme environment (global environment, heap, stack, etc).

Here's an example demonstrating creation of a recursive factorial procedure, and evaluation of (factorial n) for n in 0..10:

use marwood::vm::Vm;

fn main() {
    let mut vm = Vm::new();
    let code = r#"
        (define (factorial n)
            (let factorial ([n n] [acc 1])
               (if (zero? n)
                   acc
                   (factorial (- n 1) (* acc n)))))
    "#;

    vm.eval_text(&code).unwrap();

    for it in 0..10 {
        let (cell, _) = vm.eval_text(&format!("(factorial {})", it)).unwrap();
        println!("the factorial of {} is {}", it, cell);
    }
}

Which produces the following output:

the factorial of 0 is 1
the factorial of 1 is 1
the factorial of 2 is 2
the factorial of 3 is 6
the factorial of 4 is 24
the factorial of 5 is 120
the factorial of 6 is 720
the factorial of 7 is 5040
the factorial of 8 is 40320
the factorial of 9 is 362880

A breakdown of Marwood's Components

The remainder of this text describes the design decisions behind each of Marwood's major library components:

  • A parser and tokenizer used to create Marwood's AST cell object. This object acts as both input to the VM's eval() function, and also the result of an evaluation. Marwood's tokenizer is also useful in other situations, such as providing input to REPL syntax highlighting and bracket matching.

  • A virtual machine that represents a scheme environment and may be used to execute scheme. The VM may be used to evaluate a Cell produced by Marwood's parser. Successful evaluation results in an output Cell, that represents the result of the evaluation.

  • A compiler that given a Cell and a VM, constructs compiled bytecode to be executed on the VM.

  • Numerous built-in Scheme library procedures written in rust, and a scheme prelude containing library procedures written in scheme.

  • The repl and web-repl crates. These creates are separate from the main Marwood library crates, and provide example implementations of REPLs that use the Marwood library.

This diagram illustrates a Marwood read-eval-print loop. Text is read from the readline library, tokenized, parsed, evaluated and then the resulting Cell object is printed back to the user.

The VM object maintains any state expected by the user between evaluations. For example, if the user enters (define x 10), then on evaluation the VM object will modify its heap to represent the new binding for x.

flowchart LR
    A(Readline) -->|Text| B(Tokenizer)
    B -->|Token Stream| C(Parser)
    C -->|Cell| D(VM)
    D -->|Cell| A

The remaining sections of this text describe in detail how these different components were designed.

Parsing & Printing

Scheme, having an incredibly simple grammar, lends itself well to a hand-written lexer and recursive descent parser.

Marwood's parser is composed of three separate components:

  • the Cell object, which represents a Scheme sexpr in Rust

  • the lexer (or scanner or tokenizer), which is responsible for turning plain-text scheme into a list of tokens to be consumed by Marwood's parser.

  • the parser, which is responsible for constructing an aggregate structure called Cell from the lexer output.

Example

This example demonstrates the manual use of Marwood's lexer, parser, and Cell types. It also shows use of Marwood's printer. The alternative printing syntax {:#} provides additional escaping, and is the printer used for Scheme's write procedure.

    let scheme_code = r#"
        (quote (#\x 20 puppies))
    "#;

    let tokens: Vec<Token> = lex::scan(scheme_code).unwrap();
    let cell = parse::parse(scheme_code, &mut tokens.iter().peekable()).unwrap();

    println!("{}", cell);
    println!("{:#}", cell);

outputs:

'(x 20 puppies)
'(#\x 20 puppies)

S-Expressions

When designing a scheme parser, we first have to think a little bit about how to represent Scheme expressions in Rust's type system.

Fortunately Scheme's source code is already representable in a data structure called an S-expression (or sexpr). This is one of the fundamental properties of a lisp: source code is data.

A sexpr can be thought of as an aggregate type that can represent one of two things:

  1. an atom, such as a booleans, numbers, symbols, strings, nil, etc.
  2. a pair in the form (car . cdr), where *car *and cdr are themselves S-expressions

The expression (+ 2 3) in scheme is sugar for a tree of pairs, which could be rewritten (+ . (2 . (3 . ()))).

This diagram depicts the above expression as a tree:

flowchart TD
    A(( )) -->|car| +((+))
    A -->|cdr| B(( ))
    B -->|car| 2((2))
    B -->|cdr| C(( ))
    C -->|car| 3((3))
    C -->|cdr| nil((nil))

Improper Lists

The expression (1 2 . 3) is an example of an *improper list, a list not terminated with '(). It relies on special use of a dot preceding the very last element in a list. This list could also be written as (1 . (2 . 3)).

flowchart TD
    A(( )) -->|car| +((+))
    A -->|cdr| B(( ))
    B -->|car| 2((2))
    B -->|cdr| 3((3))

Improper lists are less common in Scheme, but there are a few parts of R7Rs, such as variable argument support, that rely on improper lists.

Examples of improper lists are:

'(1 2 3 4 5 6 7 8 9 . 10)
'((10 10) (20 20) . (/ 30 30))

This example is invalid scheme because there are two expressions after the dot:

'(1 2 . 3 4)

Cell

If Scheme can be represented by an aggregate type called a sexpr, then we just need to create a type in Rust that represents an S-expression: a type that can be either an atom, or a pair. Fortunately Rust's enum type allows exactly this: the creation of a type that represents one of several possible variants.

Marwood's Cell is the enum type that represents Scheme data and code in Marwood. It is the main input to Marwood's compiler, macro transformers, and printer, and is the output of Marwood's VM as a result of evaluating scheme.

Most Cell's variants represent atom types in Scheme, such as booleans, numbers, symbols, characters, and strings:

pub enum Cell {
    Bool(bool),
    Char(char),
    Number(Number),
    Nil,
    String(String),
    Symbol(String),
    ...
}

As an example, parsing the scheme expression #f would result in Marwood's parser outputting the value Cell::Bool(false).

Pairs

Cell also contains an aggregate type called Pair, which represents a sexpr pair:

pub enum Cell {
    ...
    Pair(Box<Cell>, Box<Cell>),
    ...
}

Pair is a pair of Cells representing the car and cdr parts of a cons pair. This structure can be used to represent pairs in scheme, which in turn may be used to represent lists. This is the main building block for representing Scheme code's tree structure in Rust.

Vectors

Vector represents a literal scheme vector, such as #(10 20 30).

pub enum Cell {
    ...
    Vector(Vec<Cell>),
    ...
}

Special Variants

The Cell enum also contains a few variants that can only be produced as output of Marwood's evaluator. For example, the Cell::Void variant is produced by some Marwood procedures that evaluate to #<void> (e.g. define, set!). It is not possible to produce these variants via Marwood's parser.

#[derive(Debug, Eq, PartialEq, Hash, Clone)]
pub enum Cell {
    ...
    Continuation,
    Macro,
    Procedure(Option<String>),
    Undefined,
    Void,
}

Tokenizing

Marwood's lexer is in the form:

fn scan(text: &str) -> Result<Vec<Token>, Error>

Given scheme code as input, the scan function returns a vector of tokens according to the Scheme grammar Each token contains a span, which may be used to extract the token's text from the source text, and the token type (e.g. LeftParen).

pub struct Token {
    /// (start, end) index of original span in the source &str
    pub span: (usize, usize),
    /// The type output by the scanner
    pub token_type: TokenType,
}

pub enum TokenType {
    Char,
    Dot,
    False,
    LeftParen,
    Number,
    NumberPrefix,
    RightParen,
    SingleQuote,
    String,
    Symbol,
    True,
    WhiteSpace,
    HashParen,
}

Error::Incomplete

The special error Error::Incomplete is returned in certain circumstances where the input looks incomplete, and may be hint to the REPL interface that the user intends to enter a multi-line expression.

An example that may cause an Error::Incomplete error in Marwood's scanner is a mismatched double quote: "hello world.

Scanner Example

Given this code:

'(10 20 "puppies") ; heterogeneous

The tokenizer will produce the following tokens:

[
    Token {
        span: (0, 1),
        token_type: SingleQuote,
    },
    Token {
        span: (1, 2),
        token_type: LeftParen,
    },
    Token {
        span: (2,4),
        token_type: Number,
    },
    Token {
        span: (5,7),
        token_type: Number,
    },
    Token {
        span: (8,17),
        token_type: String,
    },
    Token {
        span: (17,18),
        token_type: RightParen,
    }
]

Parsing

The Parse Function

Marwood's parser is a direct consumer of Vec<Token> returned by the lexer.

parse::parse() takes an iterator over &Token, the original source &str that was supplied to lex::scan, and returns a Cell.

pub fn parse<'a, T: Iterator<Item = &'a Token>>(
    text: &str,
    cur: &mut Peekable<T>,
) -> Result<Cell, Error>

parse will only consume one expression from cur. If cur may contain multiple expressions, the cursor position that parse left off at can be used to determine the cursor position of the next expression.

For example, this scheme source contains five expressions and would require five separate calls to parse.

1 2 3
(+ 1 2)(+ 3 4)

Error::Incomplete

Similarly to scan, parse() will return Error::Incomplete if the token stream does not contain a complete expression.

Some examples of incomplete expressions:

'(1 2 3
'(1 2 3)'(4 5
#(1 2 3)

Virtual Machine

Marwood has a stack based virtual machine represented by the Vm object. Each Vm represents a full scheme runtime environment.

Scheme expressions evaluated on the VM are compiled into byte code represented by [op codes], which are evaluated by the VM.

#[derive(Debug)]
pub struct Vm {
    /// The heap and global environment
    heap: Heap,
    globenv: GlobalEnvironment,

    /// The current program stack
    stack: Stack,

    /// Registers
    acc: VCell,
    ep: usize,
    ip: (usize, usize),
    bp: usize,

    /// System Interface (display, write, etc).
    sys: Box<dyn SystemInterface>,
}

The Vm object provides an eval interface for evaluating scheme expressions, and is the main interface for the console and web repl:

fn eval(&mut self, cell: &Cell) -> Result<Cell, Error>

VM Components

The Vm is composed of these high-level components, which are explored later in this chapter:

Registers

The Vm has a handful of registers that are used to maintain currently running program state:

RegisterDescription
%accAccumulator
%ipInstruction Pointer
%bpFrame Base Pointer
%spStack Pointer
%epEnvironment Pointer

The %acc (or accumulator) register contains the result of any Vm instruction that produces a result. The %acc register is also used by procedures to store the result of procedure application. For example, the procedure (+ 10 10) would result in %acc containing the result 20.

The %ip register contains a tuple that points to the currently running block of bytecode, and offset in the bytecode of the next instruction to be executed.

The %bp, %sp and %ep registers maintain state of Marwood's stack and call frame state and are discussed in detail in the procedure application section.

Op Codes

Marwood's instruction set is built specifically for executing scheme. It's not a general purpose virtual machine.

The MOV instruction is used to move values between Marwood's registers, stack, global and lexical environments.

The CALL, CLOSURE, ENTER, RET, TCALL, and VARARG instructions support procedure application.

The JMP and JNT provide branching support for the compiler to support the scheme if primitive.

The CONS and VPUSH instructions are instructions used to build lists or vectors at runtime, and are primarily used to support the quasiquote language feature.

The HALT instruction stops the virtual machine, returning the contents of the %acc register as the result to eval.

OpcodeDescription
CALL %accGiven the arguments for the procedure in %acc have been pushed on the stack, jump to the procedure in %acc.
CLOSURE %accCreate a lexical environment as a result of evaluating a (lambda ...) expression.
CONSPerforms the cons operation on the first two values of the stack, storing the resulting pair in %acc
ENTERSetup the currently executing procedure's stack frame
HALTHalt program, returning the result contained within ACC
JNT <OFFSET>Set %ip to OFFSET if %acc is #f
JMP <OFFSET>Set %ip to OFFSET
MOV <SRC> <DEST>Move the value from SRC into DEST
PUSHPush the value in ACC on to the stack
RETReturn from a procedure entered via CALL
TCALL %accIdentical to a CALL instruction, except that a tail optimizing CALL is performed.
VPUSHPush the value in %acc onto the vector at the top of the stack, storing the resulting vector in %acc

Why a second Cell type?

While Marwood's Cell type is able to represent any scheme expression, it's not suitable for representing Scheme structure in Marwood's VM.

Consider the following expressions that create a small list bound to x, and then bind y to a subset of the list:

(define x '(1 2 3))
(define y (cdr x))

This is because the expression (define y (cdr x)) has bound y to a sub-structure of x. It is not a copy. If y were mutated with set-car! or set-cdr! we would expect the structure that x is bound to realize the mutation also.

flowchart TD
    x --> A
    y --> B
    A(( )) -->|car| 1((1))
    A -->|cdr| B(( ))
    B -->|car| 2((2))
    B -->|cdr| C(( ))
    C -->|car| 3((3))
    C -->|cdr| nil((nil))

Further, if x were to be bound to another value, we expect that Marwood be able to garbage collect any part of the structure no longer referenced. In this example, the pair x is pointing to would no longer be required and may be collected. Only the structure referenced by y would remain:

(define x 0)
flowchart TD
    y --> B(( ))
    B -->|car| 2((2))
    B -->|cdr| C(( ))
    C -->|car| 3((3))
    C -->|cdr| nil((nil))

Representing this relationship with Marwood's existing Cell data structure would be very difficult. With the Cell data structure, pairs are represented as Pair(Box<Cell>, Box<Cell>). It's not possible for two pairs to refer to the same data structure.

VCell

The VCell type is how Marwood represents scheme at runtime. It's also used to represent other runtime objects, such as VM op codes, registers, and even lexical environments. Marwood's stack and heaps are just vectors of VCell. It is the universal type!

Instead of a pair being represented by (Box<Cell>, Box<Cell>), a VCell pair is represented by (usize, usize), each usize corresponding to the locations of the car and cdr portions on the heap. This is explored further in the section on Marwood's heap.

#[derive(Clone, Debug, Eq, PartialEq)]
pub enum VCell {
    Bool(bool),
    Char(char),
    Nil,
    Number(Number),
    Pair(usize, usize),
    Symbol(Rc<String>),
    String(Rc<RefCell<String>>),
    Vector(Rc<Vector>),
    ...
}

Note: Marwood's VCell enum is kept a maximum of 24 bytes. 8 bytes represent the enum tag, and an additional 16 bytes remain for data. Any VCell variants that may exceed this size are boxed with Rc, though the most common types fit well within this limit. Keeping VCell as small as possible has the benefit that it may be trivially cloneable, which comes in handy when encountering any borrow checker related issues with reading VCell values from Marwood's heap and stack.

VCell > Scheme

VCell's primary role is to represent scheme data in Marwood's VM, but it's also used to represent Marwood's internal instruction format (byte code) and values on Marwood's stack, heap, global and lexical environments. The remaining VCell variants are used to represent the various types that may be encountered.

#[derive(Clone, Debug, Eq, PartialEq)]
pub enum VCell {
    ...

    Continuation(Rc<Continuation>),
    Closure(usize, usize),
    Lambda(Rc<Lambda>),
    LexicalEnv(Rc<LexicalEnvironment>),
    LexicalEnvSlot(usize),
    LexicalEnvPtr(usize, usize),
    Macro(Rc<Transform>),

    Acc,
    ArgumentCount(usize),
    BasePointer(usize),
    BasePointerOffset(i64),
    BuiltInProc(Rc<BuiltInProc>),
    EnvironmentPointer(usize),
    GlobalEnvSlot(usize),
    InstructionPointer(usize, usize),
    OpCode(OpCode),
    Ptr(usize),
}

Stack

Marwood's stack is represented by an upward growing stack of VCell values, and a stack pointer %sp that represents the current top of the stack.

The stack's main purpose is to support the application of procedures, which include the pushing of arguments and meta data required for call frame setup.

Here's an example stack during evaluation of the expression (+ 10 5), with the stack pointer pointing at slot 7, and the arguments to + having been pushed in slots 5 and 6.

SlotValueSP
7%argc[2]
6Fixnum(5)
5Fixnum(10)
4%bp[$00]
3%ip[$223][$06]
2%ep[$ffffffffffffffff]
1%argc[0]
0Undefined

Stack Structure

Marwood's stack is backed by a Rust Vec<VCell>, and the stack pointer sp which contains the index of the current top of the stack in the stack vector. Operations such as push and pop are backed by the rust Vector's implementation of push and pop.

pub struct Stack {
    /// Stack contents
    stack: Vec<VCell>,

    /// Stack Pointer. SP points to the top value to be pushed onto the stack,
    /// This value backs the SP register of the VM
    sp: usize,
}

Stack manipulation is provided by push and pop functions:

    pub fn push<T: Into<VCell> + Display>(&mut self, vcell: T)
    pub fn pop(&mut self) -> Result<&VCell, Error>

Access to values values on the stack are provided by either direct index into the stack, or offset relative to the current stack pointer:


    pub fn get(&self, index: usize) -> Result<&VCell, Error>
    pub fn get_mut(&mut self, index: usize) -> Result<&mut VCell, Error>
    pub fn get_offset(&self, offset: i64) -> Result<&VCell, Error> {
    pub fn get_offset_mut(&mut self, offset: i64) -> Result<&mut VCell, Error>

Stack Growth

The push operation is backed directly by Vec's push function, which will grow the underlying vector automatically. The stack does not currently have any method of shrinking the stack vector.

Stack API Example

    let mut stack = Stack::new();

    // Push
    stack.push(VCell::number(1));
    stack.push(VCell::number(2));

    // Access relative to %sp
    assert_eq!(stack.get_offset(0), Ok(&VCell::number(2)));
    assert_eq!(stack.get_offset(-1), Ok(&VCell::number(1)));
    assert_eq!(stack.get_offset(-2), Ok(&VCell::Undefined));

    // Pop
    assert_eq!(stack.pop(), Ok(&VCell::number(2)));
    assert_eq!(stack.pop(), Ok(&VCell::number(1)));
    assert_eq!(stack.pop(), Err(InvalidStackIndex(0)));

Heap

This small snippset of scheme is useful for reasoning about why scheme might need a heap:

(define x 10)
(define y (* x x ))
(define animals '(cats otters puppies))
(define sub-animals (cdr animals))

Both the value for x and y are bound to numbers. Mutating x and y only require changing what value the symbol is bound to in the environment.

But, animals and sub-animals are a little more nuanced. animals is a list composed of pairs and symbols, and more importantly sub-animals refers to tail of the same list containing the values '(otters puppies). That is, both animals and sub-animals refer to parts of the same data structure. Not a copy.

If sub-animals were modified with set-car! or set-cdr!, the mutation must also be realized on the list that animals is bound to.

One way of supporting this type of funcitonality is to store these shared values on a heap. animals and sub-animals can then contain references into the same data structure on a heap.

Heap Structure

Like the stack, Marwood's heap is composed of a Vec<VCell>, along with various supporting structures used to track free slots, intern symbols, and garbage collection.

pub struct Heap {
    chunk_size: usize,
    free_list: Vec<usize>,
    heap: Vec<VCell>,
    heap_map: gc::Map,
    symbol_table: HashMap<String, usize>,
}

chunk_size refers to the number of heap slots allocated when the heap needs to grow. On initialization, the Heap will allocate chunk_size slots initialized to VCell::Undefined, and add the index of each unused heap slot to free_list. This free list contains indexes on the heap of unused slots ready to be allocated.

    pub fn new(chunk_size: usize) -> Heap {
        Heap {
            chunk_size,
            heap: vec![VCell::undefined(); chunk_size],
            free_list: (0..chunk_size).rev().into_iter().collect()
        }
    }

Heap References

References into Marwood's heap are of the type type HeapRef = usize; and are direct indexes into the heap. These heap references can be found in various places in Marwood's VCell and VM structures:

  • VCell::Ptr(HeapRef)
  • VCell::Pair(HeapRef, HeapRef)
  • VCell::Closure(HeapRef, HeapRef)

Alloc & Free

The heap's core API is composed of alloc, free and get operations on the heap.

    pub fn grow(&mut self)
    pub fn alloc(&mut self) -> usize
    pub fn free(&mut self, ptr: usize)

    pub fn get_at_index(&self, ptr: usize) -> &VCell
    pub fn get_at_index_mut(&mut self, ptr: usize) -> &mut VCell
    pub fn get<'a, T: Into<Cow<'a, VCell>>>(&self, vcell: T) -> VCell

alloc returns the next available slot on the free_list, calling grow to grow the heap by chunk_size elements if the free_list is empty. The disposition of the slot is also marked as State::Allocated in the heap_map, which is used by Marwood's garbage collector and discussed in more detail in later sections.

    pub fn alloc(&mut self) -> usize {
        match self.free_list.pop() {
            None => {
                self.grow();
                self.alloc()
            }
            Some(ptr) => {
                self.heap_map.set(ptr, State::Allocated);
                ptr
            }
        }
    }

free returns a heap slot back to the free_list. The contents is set back to VCell::Undefined, which may help catch any bugs in Marwood where a heap reference is used when it no longer should be.

The slot is also marked State::Free in the heap_map, a data structure used by Marwood's garbage collector.

    pub fn free(&mut self, ptr: usize) {
        self.heap_map.set(ptr, State::Free);
        if let Some(VCell::Symbol(sym)) = self.heap.get(ptr) {
            self.symbol_table.remove(&**sym);
        }
        *self.heap.get_mut(ptr).unwrap() = VCell::Undefined;
        self.free_list.push(ptr);
    }

The core get operations on the heap are not notable, except that the get functions will panic if given an index that is not valid for the current heap.

The function get acts as a wrapper around get_at_index and only performs a heap lookup if the supplied vcell is a pointer type. This allows code in Marwood to access values on the stack or environment slots without having to check of the value is a reference first.

    pub fn get_at_index(&self, ptr: usize) -> &VCell {
        self.heap.get(ptr).expect("heap index out of bounds")
    }

    pub fn get_at_index_mut(&mut self, ptr: usize) -> &mut VCell {
        self.heap.get_mut(ptr).expect("heap index out of bounds")
    }

    pub fn get<'a, T: Into<Cow<'a, VCell>>>(&self, vcell: T) -> VCell {
        match vcell.into() {
            Cow::Borrowed(vcell) => match vcell {
                VCell::Ptr(ptr) => self.get_at_index(*ptr).clone(),
                vcell => vcell.clone(),
            },
            Cow::Owned(vcell) => match vcell {
                VCell::Ptr(ptr) => self.get_at_index(ptr).clone(),
                vcell => vcell,
            },
        }
    }

Put

The put function puts the supplied vcell on the next available slot on the heap returned by alloc. If vcell was already a pointer type, then it's returned instead of being double boxed by the heap. This removes the need for a lot of boxing code in Marwood to check if the value it's boxing is already boxed.

    pub fn put<T: Into<VCell> + Clone>(&mut self, vcell: T) -> VCell {
        let vcell = vcell.into();
        match &vcell {
            VCell::Ptr(_) => vcell,
            VCell::Symbol(sym) => match self.symbol_table.get(sym.deref()) {
                Some(ptr) => VCell::ptr(*ptr),
                None => {
                    let ptr = self.alloc();
                    *self.heap.get_mut(ptr)
                        .expect("heap index is out of bounds") = vcell.clone();
                    self.symbol_table.insert(sym.deref().into(), ptr);
                    VCell::ptr(ptr)
                }
            },
            vcell => {
                let ptr = self.alloc();
                *self.heap.get_mut(ptr)
                    .expect("heap index is out of bounds") = vcell.clone();
                VCell::Ptr(ptr)
            }
        }
    }

Symbol Interning

The heap provides symbol interning by use of Heap::symbol_table. On put of a symbol, the heap will check if the symbol already exists in the symbol table and will return the stored heap location if the symbol already exists. This results in the same symbols always having the same heap location in Marwood. Why is this useful? This allows various parts of Marwood to refer to symbols by their heap reference instead of the string.

The maybe_put function acts as a wrapper around put, and will only put a VCell value if the value is one that must be boxed on a heap. Atom values such as numbers, bool, nil, etc are immutable values that may not always need to be boxed.

    pub fn maybe_put<T: Into<VCell> + Clone>(&mut self, vcell: T) -> VCell

Put & Get Cell

The put_cell and maybe_put_cell are versions of put and maybe_put that recursively place a Cell structure on the heap. Aggregate structures such as pairs and vectors may end up allocating multiple heap slots to represent the structure in the heap.

    pub fn put_cell(&mut self, ast: &cell::Cell) -> VCell
    pub fn maybe_put_cell(&mut self, ast: &cell::Cell) -> VCell

The following call to put_cell ends up allocating 10 slots on the heap to create the list structure and storage for values:

    let mut heap = Heap::new(8192);
    heap.put_cell(&parse!("(puppies 42.0 cats puppies #t)"));

The chart below contains the resulting heap allocations for the storage of this list in the heap. Note that because puppies is an interned symbol at position $00 in the heap, both pairs that reference the value puppy point to the same heap position of $00.

SlotValue
0Symbol(puppies)
1Float(42.0)
2Symbol(cats)
3#t
4()
5($03 . $04)
6($00 . $05)
7($02 . $06)
8($01 . $07)
9($00 . $08)

Global Environment

Numbers