December 2017, rev. Nov. 2018

I used to believe it's better to implement eval in Lisp. That given some core operators, the rest of Lisp could be written in itself. This is a powerful feature because it lets the programmer change the language at runtime.

Then I noticed writing eval in Lisp presents a bootstrap problem. How does eval get evaluated when eval doesn't exist yet?

Supposedly Lisp runs in a REPL: it reads (R) source code, evaluates (E) the source code with eval, prints (P) the results, and then loops (L) to do the same thing again. How did eval get there? If eval will be written in Lisp as a function and therefore doesn't exist yet, how is eval already part of the REPL?

What evaluates eval is something else that came to life before the REPL and before eval: there is a pre-eval.

Pre-eval is what evaluates the language core operators. There is some part of the language that says, more or less, if I get as input (def eval ...) I'm going to create a function called eval. If I get (= x 1) I'm going to create a variable called x. It's impossible to write eval without pre-eval.

Here's an example Lisp that writes eval as a function. The language core doesn't have a built-in REPL but instead runs only one expression and then exits. Since there's no REPL yet, there's no ability to loop over expressions, so the REPL has to be written in a single expression:

(list (def eval ...) (loop (print (eval (read)))))

To evaluate list, Lisp needs pre-eval. Irrespective of the language used to write the REPL, be it C or Lisp, eval depends on pre-eval.

Read Depends On Pre-Read

The trail of dependencies in the REPL grows longer once we notice that computation doesn't begin at eval but at read. Eval depends on read. Imagine the following Lisp that has no built-in REPL and as a result must implement a REPL outside the language core:

(list (read))

If read is a built-in operator that reads data from the input stream, what reads list which came before it? What reads the first (?

What's really going on is that Lisp uses a built-in pre-read. Pre-read is what reads input from the input stream and tells apart lists from atoms (lists contain parentheses). It also tells apart strings from symbols (strings are within double quotes), numbers from decimals (decimals contain a dot), numbers from strings, etc. Pre-read reads before finding read, the operator that reads.

One surprise here is that internally the built-in pre-read contains a loop too. The loop isn't only in the REPL. To read an entire string, for example, pre-read must keep reading characters one by one until it finds the ending double quote. In principle read can be implemented as a function in Lisp that calls read-char, a built-in operator that reads a character at a time, similar to getc() in the C library. But something else must have come to life to read this function before being able to use it. That's pre-read.

Pre-read has an additional twist. Pre-read can only call functions that don't need eval. If a Lisp is designed to have reader macros, when pre-read detects a matching character it must call a corresponding function for it. Since eval doesn't exist yet, this function must not need eval. Or if it needs eval, reader macros can be used only after eval has been defined as a function.

See the dependencies? If Lisp is designed with the REPL baked in the core, then it's impossible to have a clean separation between read, eval, print, and loop. Eval depends on pre-eval which depends on pre-read which may need eval. There's a circular dependency.

Given these limitations, how could one define the hypothetically clean and maximally rewritable Lisp, the Lisp that keeps the REPL phases separate and outside the language core?

One way is to process with pre-read and pre-eval only one expression, as shown in the list example earlier, and let the first few subexpressions inside the expression have fewer features:

Define read as a function but without support for reader macros or read-syntax, define eval as a function, then define any reader macros or read-syntax, then redefine read with support for reader macros or read-syntax, and then define the REPL, all within a single list block that is read by pre-read and evaluated by pre-eval.

I hope this isn't too hygienic of a way to build a REPL.