Why does this matter? As Steve Yegge said, "If you don't know how compilers work, then you don't know how computers work." Yegge describes 8 problems that can be solved with compilers (or equally well with interpreters, or with Yegge's typical heavy dosage of cynicism).
Scheme syntax is different from most other programming languages. Consider:
Java has a wide variety of syntactic conventions (keywords, infix operators, three kinds of brackets, operator precedence, dot notation, quotes, commas, semicolons), but Scheme syntax is much simpler:
Java:

```java
if (x.val() > 0) {
  return fn(A[i] + 3 * i,
            new String[] {"one", "two"});
}
```

Scheme:

```scheme
(if (> (val x) 0)
    (fn (+ (aref A i) (* 3 i))
        (quote (one two))))
```
On this page we will cover all the important points of the Scheme language and its interpretation (omitting some minor details). We will get there in two steps: first we define a simplified language, and then the near-full Scheme language.
Here is a short example program in the simplified language:

```scheme
(define r 10)
(* pi (* r r))
```

The table below lists all the allowable expressions:
| Expression | Syntax | Semantics and Example |
|---|---|---|
| variable reference | *symbol* | A symbol is interpreted as a variable name; its value is the variable's value. Example: r ⇒ 10 (assuming r was previously defined to be 10) |
| constant literal | *number* | A number evaluates to itself. Examples: 12 ⇒ 12 or -3.45e+6 ⇒ -3.45e+6 |
| conditional | (if *test* *conseq* *alt*) | Evaluate *test*; if true, evaluate and return *conseq*; otherwise evaluate and return *alt*. Example: (if (> 10 20) (+ 1 1) (+ 3 3)) ⇒ 6 |
| definition | (define *symbol* *exp*) | Define a new variable and give it the value of evaluating the expression *exp*. Example: (define r 10) |
| procedure call | (*proc* *arg*...) | If *proc* is anything other than one of the symbols if, define, or quote, then it is treated as a procedure. Evaluate *proc* and all the *args*, and then apply the procedure to the list of arg values. Example: (sqrt (* 2 8)) ⇒ 4.0 |
In the Syntax column of this table, *symbol* must be a symbol, *number* must be an integer or floating-point number, and the other placeholders (*test*, *conseq*, *alt*, *exp*, *proc*, *arg*) can be any expression. The notation *arg*... means zero or more repetitions of *arg*.
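An interpreter has two parts: parsing, which turns the program text into an abstract syntax tree, and execution (eval), which computes a result from that tree:

program ➡ parse ➡ abstract-syntax-tree ➡ eval ➡ result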
And here is a short example of what we want parse and eval to be able to do (begin evaluates each expression in order and returns the final one):
```python
>>> program = "(begin (define r 10) (* pi (* r r)))"
>>> parse(program)
['begin', ['define', 'r', 10], ['*', 'pi', ['*', 'r', 'r']]]
>>> eval(parse(program))
314.1592653589793
```
First, let's be explicit about how Scheme objects are represented in Python:

```python
Symbol = str              # A Scheme Symbol is implemented as a Python str
Number = (int, float)     # A Scheme Number is implemented as a Python int or float
Atom   = (Symbol, Number) # A Scheme Atom is a Symbol or Number
List   = list             # A Scheme List is implemented as a Python list
Exp    = (Atom, List)     # A Scheme expression is an Atom or List
Env    = dict             # A Scheme environment (defined below)
                          # is a mapping of {variable: value}
```
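Because Number, Atom, and Exp are plain tuples of classes, they are meant for isinstance checks rather than for a static type checker. The short sketch below (illustrative only, not part of the interpreter) shows that isinstance accepts these nested tuples directly. Note that True also counts as a Number, since Python's bool is a subclass of int.

```python
Symbol = str
Number = (int, float)
Atom   = (Symbol, Number)
List   = list
Exp    = (Atom, List)

# isinstance accepts a tuple of classes, and such tuples may be nested,
# so Atom and Exp can be used directly as isinstance targets.
print(isinstance('r', Atom))              # True: a symbol is an atom
print(isinstance(-3.45e+6, Number))       # True: a float is a number
print(isinstance(['*', 'r', 'r'], Exp))   # True: a list is an expression
print(isinstance(['*', 'r', 'r'], Atom))  # False: a list is not an atom
print(isinstance({'r': 10}, Exp))         # False: a dict is not an expression
print(isinstance(True, Number))           # True: bool is a subclass of int
```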
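Parsing begins with tokenizing: breaking the program text into a list of tokens (parentheses, symbols, and numbers). A simple approach is to put spaces around each parenthesis and then split on whitespace:

```python
def tokenize(chars: str) -> list:
    "Convert a string of characters into a list of tokens."
    return chars.replace('(', ' ( ').replace(')', ' ) ').split()
```

Here we apply tokenize to our sample program: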
```python
>>> program = "(begin (define r 10) (* pi (* r r)))"
>>> tokenize(program)
['(', 'begin', '(', 'define', 'r', '10', ')', '(', '*', 'pi', '(', '*', 'r', 'r', ')', ')', ')']
```
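Why pad the parentheses first? This minimal comparison, repeating tokenize so it runs on its own, shows that a plain str.split leaves parentheses glued to their neighbors. It also shows a limitation of this approach: a token can never contain a space, so a string literal such as "a b" would be split in two (the simplified language has no strings, so this does not matter here).

```python
def tokenize(chars: str) -> list:
    "Convert a string of characters into a list of tokens."
    return chars.replace('(', ' ( ').replace(')', ' ) ').split()

program = "(+ 1 (* 2 3))"

# Splitting on whitespace alone keeps parentheses attached to symbols and numbers.
print(program.split())      # ['(+', '1', '(*', '2', '3))']

# Padding each parenthesis with spaces first makes every paren its own token.
print(tokenize(program))    # ['(', '+', '1', '(', '*', '2', '3', ')', ')']

# A quoted string containing a space is split in two: tokens never contain spaces.
print(tokenize('(print "a b")'))  # ['(', 'print', '"a', 'b"', ')']
```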
Our function parse will take a string representation of a program as input, call tokenize to get a list of tokens, and then call read_from_tokens to assemble an abstract syntax tree. read_from_tokens looks at the first token; if it is a ')' that's a syntax error. If it is a '(', then we start building up a list of sub-expressions until we hit a matching ')'. Any non-parenthesis token must be a symbol or number. We'll let Python make the distinction between them: for each non-paren token, first try to interpret it as an int, then as a float, and if it is neither of those, it must be a symbol. Here is the parser:
```python
def parse(program: str) -> Exp:
    "Read a Scheme expression from a string."
    return read_from_tokens(tokenize(program))

def read_from_tokens(tokens: list) -> Exp:
    "Read an expression from a sequence of tokens."
    if len(tokens) == 0:
        raise SyntaxError('unexpected EOF')
    token = tokens.pop(0)
    if token == '(':
        L = []
        while tokens and tokens[0] != ')':   # read sub-expressions up to the matching ')'
            L.append(read_from_tokens(tokens))
        if not tokens:                        # ran out of tokens before finding ')'
            raise SyntaxError('unexpected EOF')
        tokens.pop(0) # pop off ')'
        return L
    elif token == ')':
        raise SyntaxError('unexpected )')
    else:
        return atom(token)

def atom(token: str) -> Atom:
    "Numbers become numbers; every other token is a symbol."
    try:
        return int(token)
    except ValueError:
        try:
            return float(token)
        except ValueError:
            return Symbol(token)
```

parse works like this:
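```python
>>> program = "(begin (define r 10) (* pi (* r r)))"
>>> parse(program)
['begin', ['define', 'r', 10], ['*', 'pi', ['*', 'r', 'r']]]
```

We're almost ready to define eval. But we need one more concept first: the environment, a mapping from variable names to their values. By default, eval will use a global environment that includes the standard procedures (such as sqrt and max, and operators like + and *):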
```python
import math
import operator as op

def standard_env() -> Env:
    "An environment with some Scheme standard procedures."
    env = Env()
    env.update(vars(math)) # sin, cos, sqrt, pi, ...
    env.update({
        '+':op.add, '-':op.sub, '*':op.mul, '/':op.truediv,
        '>':op.gt, '<':op.lt, '>=':op.ge, '<=':op.le, '=':op.eq,
        'abs':     abs,
        'append':  op.add,
        'apply':   lambda proc, args: proc(*args),
        'begin':   lambda *x: x[-1],
        'car':     lambda x: x[0],
        'cdr':     lambda x: x[1:],
        'cons':    lambda x,y: [x] + y,
        'eq?':     op.is_,
        'expt':    pow,
        'equal?':  op.eq,
        'length':  len,
        'list':    lambda *x: List(x),
        'list?':   lambda x: isinstance(x, List),
        'map':     lambda proc, *lists: List(map(proc, *lists)),  # Python 3's map is lazy, so build a list
        'max':     max,
        'min':     min,
        'not':     op.not_,
        'null?':   lambda x: x == [],
        'number?': lambda x: isinstance(x, Number),
        'print':   print,
        'procedure?': callable,
        'round':   round,
        'symbol?': lambda x: isinstance(x, Symbol),
    })
    return env

global_env = standard_env()
```
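Each entry in the environment is an ordinary Python value or callable, so a Scheme procedure call ultimately becomes a Python function call. Here is a trimmed-down sketch (small_env is a hypothetical helper holding only a few of the entries above) that calls some of them directly. It also illustrates why map needs a wrapper: in Python 3 the built-in map returns a lazy iterator, not a list, so a list-building wrapper is what lets a Scheme call such as (map fib ...) print as a Scheme list.

```python
import math
import operator as op

def small_env() -> dict:
    "A hypothetical, trimmed-down version of standard_env for illustration."
    env = dict(vars(math))                 # sqrt, pi, ...
    env.update({
        '+': op.add,
        'begin': lambda *x: x[-1],         # value of the last argument
        'car': lambda x: x[0],
        'cdr': lambda x: x[1:],
        'cons': lambda x, y: [x] + y,
        'map': lambda proc, *lists: list(map(proc, *lists)),  # eager: returns a list
    })
    return env

env = small_env()
print(env['sqrt'](env['+'](7, 9)))   # 4.0, like (sqrt (+ 7 9))
print(env['cons'](1, [2, 3]))        # [1, 2, 3]
print(env['cdr']([1, 2, 3]))         # [2, 3]
print(env['begin'](1, 2, 3))         # 3
print(env['map'](abs, [-1, 2, -3]))  # [1, 2, 3]
print(type(map(abs, [-1])))          # <class 'map'>: the built-in alone is lazy
```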
We are now ready for the implementation of eval. As a refresher, we repeat the table of Lispy Calculator forms:
| Expression | Syntax | Semantics and Example |
|---|---|---|
| variable reference | *symbol* | A symbol is interpreted as a variable name; its value is the variable's value. Example: r ⇒ 10 (assuming r was previously defined to be 10) |
| constant literal | *number* | A number evaluates to itself. Examples: 12 ⇒ 12 or -3.45e+6 ⇒ -3.45e+6 |
| conditional | (if *test* *conseq* *alt*) | Evaluate *test*; if true, evaluate and return *conseq*; otherwise evaluate and return *alt*. Example: (if (> 10 20) (+ 1 1) (+ 3 3)) ⇒ 6 |
| definition | (define *symbol* *exp*) | Define a new variable and give it the value of evaluating the expression *exp*. Example: (define r 10) |
| procedure call | (*proc* *arg*...) | If *proc* is anything other than one of the symbols if, define, or quote, then it is treated as a procedure. Evaluate *proc* and all the *args*, and then apply the procedure to the list of arg values. Example: (sqrt (* 2 8)) ⇒ 4.0 |
Here is the code for eval, which closely follows the table:
```python
def eval(x: Exp, env=global_env) -> Exp:
    "Evaluate an expression in an environment."
    if isinstance(x, Symbol):        # variable reference
        return env[x]
    elif isinstance(x, Number):      # constant number
        return x
    elif x[0] == 'if':               # conditional
        (_, test, conseq, alt) = x
        exp = (conseq if eval(test, env) else alt)
        return eval(exp, env)
    elif x[0] == 'define':           # definition
        (_, symbol, exp) = x
        env[symbol] = eval(exp, env)
    else:                            # procedure call
        proc = eval(x[0], env)
        args = [eval(arg, env) for arg in x[1:]]
        return proc(*args)
```
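To watch the evaluation order, here is a sketch of an instrumented variant (eval_traced is a hypothetical name, and it omits define for brevity) that appends every procedure call to a list. It illustrates two properties of the eval above: arguments are evaluated before the procedure is applied, so inner calls are recorded first; and if evaluates only one of its two branches.

```python
import math
import operator as op

Symbol = str
Number = (int, float)

env = {'+': op.add, '*': op.mul, '>': op.gt, 'pi': math.pi, 'r': 10}

def eval_traced(x, env, calls):
    "Like the Lispy Calculator eval (minus define), but logs each call in `calls`."
    if isinstance(x, Symbol):            # variable reference
        return env[x]
    elif isinstance(x, Number):          # constant number
        return x
    elif x[0] == 'if':                   # conditional: only one branch is evaluated
        (_, test, conseq, alt) = x
        exp = (conseq if eval_traced(test, env, calls) else alt)
        return eval_traced(exp, env, calls)
    else:                                # procedure call
        proc = eval_traced(x[0], env, calls)
        args = [eval_traced(arg, env, calls) for arg in x[1:]]
        calls.append((x[0], args))       # logged once all arguments are ready
        return proc(*args)

calls = []
print(eval_traced(['if', ['>', 10, 20], ['+', 1, 1], ['+', 3, 3]], env, calls))  # 6
print(calls)     # [('>', [10, 20]), ('+', [3, 3])]: (+ 1 1) never runs

calls = []
eval_traced(['*', 'pi', ['*', 'r', 'r']], env, calls)
print(calls[0])  # ('*', [10, 10]): the inner call is recorded first
```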
We're done! You can see it all in action:
```python
>>> eval(parse("(begin (define r 10) (* pi (* r r)))"))
314.1592653589793
```
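It is tedious to type eval(parse(...)) for every expression, so here is repl, an interactive read-eval-print loop, together with schemestr, which prints a value in Scheme notation:

```python
def repl(prompt='lis.py> '):
    "A prompt-read-eval-print loop."
    while True:
        val = eval(parse(input(prompt)))   # input() is Python 3's replacement for raw_input()
        if val is not None:
            print(schemestr(val))

def schemestr(exp):
    "Convert a Python object back into a Scheme-readable string."
    if isinstance(exp, List):
        return '(' + ' '.join(map(schemestr, exp)) + ')'
    else:
        return str(exp)
```

Here is repl in action: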
```
>>> repl()
lis.py> (define r 10)
lis.py> (* pi (* r r))
314.1592653589793
lis.py> (if (> (* 11 11) 120) (* 7 6) oops)
42
lis.py> (list (+ 1 1) (+ 2 2) (* 2 3) (expt 2 3))
(2 4 6 8)
lis.py>
```
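schemestr is close to an inverse of parse: printing a parsed expression gives back the same program with its whitespace normalized (numbers are printed in Python's own format). Here is a self-contained sketch that repeats tokenize, atom, read_from_tokens, and schemestr in compact form. It also shows that parse reads only the first expression in its input and silently ignores any tokens left over.

```python
Symbol = str
List = list

def tokenize(chars):
    return chars.replace('(', ' ( ').replace(')', ' ) ').split()

def atom(token):
    try:
        return int(token)
    except ValueError:
        try:
            return float(token)
        except ValueError:
            return Symbol(token)

def read_from_tokens(tokens):
    if not tokens:
        raise SyntaxError('unexpected EOF')
    token = tokens.pop(0)
    if token == '(':
        L = []
        while tokens and tokens[0] != ')':
            L.append(read_from_tokens(tokens))
        if not tokens:
            raise SyntaxError('unexpected EOF')
        tokens.pop(0)                    # pop off ')'
        return L
    elif token == ')':
        raise SyntaxError('unexpected )')
    else:
        return atom(token)

def parse(program):
    return read_from_tokens(tokenize(program))

def schemestr(exp):
    if isinstance(exp, List):
        return '(' + ' '.join(map(schemestr, exp)) + ')'
    return str(exp)

messy = """(begin   (define r 10)
                  (* pi (* r r)))"""
print(schemestr(parse(messy)))   # (begin (define r 10) (* pi (* r r)))
print(parse("1 2"))              # 1: the leftover token '2' is ignored
```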
Next, we extend the simplified language with three new special forms:

| Expression | Syntax | Semantics and Example |
|---|---|---|
| quotation | (quote *exp*) | Return the *exp* literally; do not evaluate it. Example: (quote (+ 1 2)) ⇒ (+ 1 2) |
| assignment | (set! *symbol* *exp*) | Evaluate *exp* and assign that value to *symbol*, which must have been previously defined (with a define or as a parameter to an enclosing procedure). Example: (set! r2 (* r r)) |
| procedure | (lambda (*symbol*...) *exp*) | Create a procedure with parameter(s) named *symbol*... and *exp* as the body. Example: (lambda (r) (* pi (* r r))) |
The lambda special form (an obscure nomenclature choice that refers to Alonzo Church's lambda calculus) creates a procedure. We want procedures to work like this:
```
lis.py> (define circle-area (lambda (r) (* pi (* r r))))
lis.py> (circle-area (+ 5 5))
314.1592653589793
```

There are two steps here. In the first step, the lambda expression is evaluated to create a procedure that takes a single parameter, named r, and refers to the global variables pi and *. This procedure becomes the value of the new variable circle-area. In the second step, circle-area is called with the value of (+ 5 5), which is 10, as its argument. We want r to take on the value 10, but it wouldn't do to just set r to 10 in the global environment. What if we were using r for some other purpose? We wouldn't want a call to circle-area to alter that value. Instead, we want to arrange for there to be a local variable named r that we can set to 10 without worrying about interfering with any global variable that happens to have the same name. Calling a procedure introduces these new local variable(s), binding each symbol in the parameter list of the function to the corresponding value in the argument list of the function call.
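Python's standard library has a structure that behaves like this layering: collections.ChainMap looks up a key in a sequence of mappings from first to last, and writes only to the first one. The sketch below is an analogy, not how Lispy implements it (Lispy defines its own Env class for this). call_frame is a hypothetical helper that binds parameters to arguments with zip, as described above.

```python
from collections import ChainMap
import math

global_env = {'pi': math.pi, 'r': 'used for some other purpose'}

def call_frame(parms, args, outer):
    "Hypothetical helper: a new innermost scope binding each parameter to its argument."
    return ChainMap(dict(zip(parms, args)), outer)

local_env = call_frame(['r'], [10], global_env)

print(local_env['r'])     # 10: found in the innermost scope
print(local_env['pi'])    # 3.141592653589793: found in the outer (global) scope
local_env['r'] = 20       # writes go to the innermost scope only
print(global_env['r'])    # used for some other purpose (the global r is untouched)
```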
[Figure: nested environments. The outer (global) environment contains pi: 3.141592653589793, *: <built-in function mul>, ...]
When we look up a variable in such a nested environment, we look first at the innermost level, but if we don't find the variable name there, we move to the next outer level. Procedures and environments are intertwined, so let's define them together:
```python
class Env(dict):
    "An environment: a dict of {'var': val} pairs, with an outer Env."
    def __init__(self, parms=(), args=(), outer=None):
        self.update(zip(parms, args))
        self.outer = outer
    def find(self, var):
        "Find the innermost Env where var appears."
        if var in self:
            return self
        if self.outer is None:        # searched up to the global env without finding var
            raise LookupError(var)
        return self.outer.find(var)

class Procedure(object):
    "A user-defined Scheme procedure."
    def __init__(self, parms, body, env):
        self.parms, self.body, self.env = parms, body, env
    def __call__(self, *args):
        return eval(self.body, Env(self.parms, args, self.env))

global_env = standard_env()  # rebuild the global env as an instance of the new Env class
```

We see that every procedure has three components: a list of parameter names, a body expression, and an environment that tells us what other variables are accessible from the body. For a procedure defined at the top level this will be the global environment, but it is also possible for a procedure to refer to the local variables of the environment in which it was defined (and not the environment in which it is called).
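Python resolves free variables by the same rule, using the environment where a function was defined, so it offers a quick analogy for the last point. This is plain Python rather than Lispy, and the names are illustrative only. Under dynamic scoping (the alternative rule, where a procedure sees its caller's variables) the result would be 'call site' instead.

```python
x = 'global'

def make_reader():
    x = 'definition site'
    def reader():
        return x          # looked up where reader was defined, not where it is called
    return reader

def call_from_elsewhere(f):
    x = 'call site'       # this local x is invisible to f
    return f()

print(call_from_elsewhere(make_reader()))   # definition site
```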
Env is a subclass of dict, so it has all the methods that dict has. In addition, it defines two methods. The constructor __init__ builds a new environment from a list of parameter names and a corresponding list of argument values; the new environment holds those {variable: value} pairs as its inner part and also refers to the given outer environment. The method find locates the right environment for a variable: either the inner one or an outer one.
To see how these all go together, here is the new definition of eval. Note that the clause for variable reference has changed: we now have to call env.find(x) to find at what level the variable x exists; then we can fetch the value of x from that level. (The clause for define has not changed, because a define always adds a new variable to the innermost environment.) There are three new clauses: for quote, we return the expression without evaluating it; for set!, we find the environment level where the variable exists and set it to a new value; and for lambda, we create a new procedure object with the given parameter list, body, and environment.
```python
def eval(x, env=global_env):
    "Evaluate an expression in an environment."
    if isinstance(x, Symbol):        # variable reference
        return env.find(x)[x]
    elif not isinstance(x, List):    # constant
        return x
    op, *args = x
    if op == 'quote':                # quotation
        return args[0]
    elif op == 'if':                 # conditional
        (test, conseq, alt) = args
        exp = (conseq if eval(test, env) else alt)
        return eval(exp, env)
    elif op == 'define':             # definition
        (symbol, exp) = args
        env[symbol] = eval(exp, env)
    elif op == 'set!':               # assignment
        (symbol, exp) = args
        env.find(symbol)[symbol] = eval(exp, env)
    elif op == 'lambda':             # procedure
        (parms, body) = args
        return Procedure(parms, body, env)
    else:                            # procedure call
        proc = eval(op, env)
        vals = [eval(arg, env) for arg in args]
        return proc(*vals)
```
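The difference between the define and set! clauses comes down to which Env receives the write. Here is a small sketch that repeats the Env class so it runs on its own (here find raises LookupError for an unknown name, a choice made in this sketch for a clearer error). define and set_bang are hypothetical helpers that mirror the two clauses.

```python
class Env(dict):
    "An environment: a dict of {'var': val} pairs, with an outer Env."
    def __init__(self, parms=(), args=(), outer=None):
        self.update(zip(parms, args))
        self.outer = outer
    def find(self, var):
        "Find the innermost Env where var appears."
        if var in self:
            return self
        if self.outer is None:
            raise LookupError(var)
        return self.outer.find(var)

def define(env, var, val):
    "Like the define clause: always bind in the innermost env."
    env[var] = val

def set_bang(env, var, val):
    "Like the set! clause: rebind var in the env where it already exists."
    env.find(var)[var] = val

outer = Env(['balance'], [100.00])
inner = Env(['amt'], [-20.00], outer)

set_bang(inner, 'balance', inner.find('balance')['balance'] + inner['amt'])
print(outer['balance'])                 # 80.0: set! updated the outer binding

define(inner, 'balance', 0)
print(inner.find('balance') is inner)   # True: the new binding shadows the outer one
print(outer['balance'])                 # 80.0: define did not touch the outer env
```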
To appreciate how procedures and environments work together, consider this program and the environment that gets formed when we evaluate (account1 -20.00):
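[Figure: the bank-account program, and the nested environments (drawn as colored boxes) formed when evaluating (account1 -20.00)]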
Each rectangular box represents an environment, and the color of the box matches the color of the variables that are newly defined in the environment. In the last two lines of the program we define account1 and call (account1 -20.00); this represents the creation of a bank account with a 100 dollar opening balance, followed by a 20 dollar withdrawal. In the process of evaluating (account1 -20.00), we will eval the expression highlighted in yellow. There are three variables in that expression. amt can be found immediately in the innermost (green) environment. But balance is not defined there: we have to look at the green environment's outer env, the blue one. And finally, the variable + is not found in either of those; we need to do one more outer step, to the global (red) environment. This process of looking first in inner environments and then in outer ones is called lexical scoping. Env.find(var) finds the right environment according to lexical scoping rules.
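The same structure can be mirrored with Python closures, as an analogy only: make_account and its inner function update are hypothetical Python names standing in for the Scheme program described above. balance lives in the enclosing scope (the blue environment), amt in the innermost one (the green environment), and nonlocal plays the role of set! by rebinding balance where it already exists rather than creating a new local variable.

```python
def make_account(balance):
    def update(amt):
        nonlocal balance          # like set!: rebind the enclosing balance
        balance = balance + amt
        return balance
    return update

account1 = make_account(100.00)
print(account1(-20.00))   # 80.0
print(account1(-20.00))   # 60.0: the balance persists between calls
account2 = make_account(50.00)
print(account2(10.00))    # 60.0: each account has its own balance
```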
Let's see what we can do now:
```
>>> repl()
lis.py> (define circle-area (lambda (r) (* pi (* r r))))
lis.py> (circle-area 3)
28.274333877
lis.py> (define fact (lambda (n) (if (<= n 1) 1 (* n (fact (- n 1))))))
lis.py> (fact 10)
3628800
lis.py> (fact 100)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
lis.py> (circle-area (fact 10))
4.1369087198e+13
lis.py> (define first car)
lis.py> (define rest cdr)
lis.py> (define count (lambda (item L) (if L (+ (equal? item (first L)) (count item (rest L))) 0)))
lis.py> (count 0 (list 0 1 2 3 0 0))
3
lis.py> (count (quote the) (quote (the more the merrier the bigger the better)))
4
lis.py> (define twice (lambda (x) (* 2 x)))
lis.py> (twice 5)
10
lis.py> (define repeat (lambda (f) (lambda (x) (f (f x)))))
lis.py> ((repeat twice) 10)
40
lis.py> ((repeat (repeat twice)) 10)
160
lis.py> ((repeat (repeat (repeat twice))) 10)
2560
lis.py> ((repeat (repeat (repeat (repeat twice)))) 10)
655360
lis.py> (pow 2 16)
65536.0
lis.py> (define fib (lambda (n) (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2))))))
lis.py> (define range (lambda (a b) (if (= a b) (quote ()) (cons a (range (+ a 1) b)))))
lis.py> (range 0 10)
(0 1 2 3 4 5 6 7 8 9)
lis.py> (map fib (range 0 10))
(1 1 2 3 5 8 13 21 34 55)
lis.py> (map fib (range 0 20))
(1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765)
```

We now have a language with procedures, variables, conditionals (if), and sequential execution (the begin procedure). If you are familiar with other languages, you might think that a while or for loop would be needed, but Scheme manages to do without these just fine. The Scheme report says "Scheme demonstrates that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language." In Scheme you iterate by defining recursive functions.
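For comparison, here are Python transliterations of three of the recursive Scheme definitions above (scheme_range is renamed to avoid shadowing Python's built-in range). Two Python behaviors carry over into Lispy here: count adds booleans, which works because bool is a subclass of int; and (if L ...) treats an empty list as false, which is Python's truthiness rule rather than Scheme's (in standard Scheme only #f is false). Also, each recursive call consumes Python stack frames, so recursion depth is bounded by Python's recursion limit (see sys.getrecursionlimit(); 1000 by default in CPython).

```python
def count(item, L):
    # (if L (+ (equal? item (first L)) (count item (rest L))) 0)
    return (item == L[0]) + count(item, L[1:]) if L else 0

def scheme_range(a, b):
    # (if (= a b) (quote ()) (cons a (range (+ a 1) b)))
    return [] if a == b else [a] + scheme_range(a + 1, b)

def fib(n):
    # (if (< n 2) 1 (+ (fib (- n 1)) (fib (- n 2))))
    return 1 if n < 2 else fib(n - 1) + fib(n - 2)

print(count(0, [0, 1, 2, 3, 0, 0]))                                        # 3
print(count('the', 'the more the merrier the bigger the better'.split()))  # 4
print(scheme_range(0, 10))                  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
print(list(map(fib, scheme_range(0, 10))))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```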
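How big is the interpreter? Counting the non-comment, non-blank lines of lis.py:

```
bash$ grep "^\s*[^#\s]" lis.py | wc
117 497 4276
```

That is 117 lines (497 words, 4276 bytes).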
From there Tony and I split paths. He reasoned that the hard part was the interpreter for expressions; he needed Lisp for that, but he knew how to write a tiny C routine for reading and echoing the non-Lisp characters and link it in to the Lisp program. I didn't know how to do that linking, but I reasoned that writing an interpreter for this trivial language (all it had was set variable, fetch variable, and string concatenate) was easy, so I wrote an interpreter in C. So, ironically, Tony wrote a Lisp program (with one small routine in C) because he was a C programmer, and I wrote a C program because I was a Lisp programmer.
In the end, we both got our theses done (Tony, Peter).
To learn more about Scheme consult some of the fine books (by Friedman and Felleisen, Dybvig, Queinnec, Harvey and Wright or Sussman and Abelson), videos (by Abelson and Sussman), tutorials (by Dorai, PLT, or Neller), or the reference manual.
I also have another page describing a more advanced version of Lispy.