November 2018

Some features in Lisp seem at odds with each other. Reader macros change input while it's read from the stream. Read-syntax changes tokens. Source code walking changes everything.

Are all three necessary? Could one or more of these features get built on top of the others?

Reader macros take the input stream as a parameter and as a result can hijack the reading process. They get triggered by single-character hooks. When triggered, a reader macro can parse many tokens together and return a list instead of parsing only a single token.

For example, a reader macro expands the shorter sequence of three tokens [+ _ 1] into the proper but longer (fn (_) (+ _ 1)) by adding a hook on [ that parses input from the stream on its own until it reaches the ]. [1]
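A rough sketch of that hook, written in Python rather than Lisp so the mechanics stay visible. This is a toy reader under stated assumptions, not any real Lisp's implementation: the names read_form, read_until, read_bracket_fn and READER_MACROS are all made up for illustration, and atoms are kept as plain strings.

```python
# Toy reader, for illustration only. Every name here is hypothetical.

DELIMITERS = "()[]"


def skip_spaces(src, pos):
    """Advance pos past any whitespace."""
    while pos < len(src) and src[pos].isspace():
        pos += 1
    return pos


def read_token(src, pos):
    """Read one atom: characters up to whitespace or a delimiter."""
    start = pos
    while pos < len(src) and not src[pos].isspace() and src[pos] not in DELIMITERS:
        pos += 1
    return src[start:pos], pos


def read_until(src, pos, closer):
    """Read forms until the closing character; return (list, next position)."""
    items = []
    while True:
        pos = skip_spaces(src, pos)
        if pos >= len(src):
            raise SyntaxError("missing " + closer)
        if src[pos] == closer:
            return items, pos + 1
        form, pos = read_form(src, pos)
        items.append(form)


def read_bracket_fn(src, pos):
    """Reader macro for '[': consume input up to ']' and wrap it in a fn."""
    body, pos = read_until(src, pos, "]")
    return ["fn", ["_"], body], pos


# Single-character hooks. The reader hands the stream to the hook,
# which may consume as many tokens as it likes.
READER_MACROS = {"[": read_bracket_fn}


def read_form(src, pos=0):
    """Read one form starting at pos; return (form, next position)."""
    pos = skip_spaces(src, pos)
    if pos >= len(src):
        raise SyntaxError("unexpected end of input")
    ch = src[pos]
    if ch in READER_MACROS:
        return READER_MACROS[ch](src, pos + 1)
    if ch == "(":
        return read_until(src, pos + 1, ")")
    if ch in ")]":
        raise SyntaxError("unexpected " + ch)
    return read_token(src, pos)
```

Calling read_form("[+ _ 1]") returns the nested list ["fn", ["_"], ["+", "_", "1"]] along with the position just past the ]. The hook works the same way when the bracket sits deep inside another list, because the reader checks for the hook character at the start of every form.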

This type of processing can't be done with read-syntax because the input spans three tokens and read-syntax takes a single token as input. It's also hard to do with a code walker because [+ _ 1] isn't just (or even) a list: it could be anywhere in the parse tree. A code walker has to add hooks everywhere to find this expression. After reader macros, the easiest way to find this expression may be a function that descends recursively over the source code represented as a list.

Reader macros show their power in another example, parsing `,@(list 1 2) to convert it to (backquote (unlist (list 1 2))). Read-syntax can't parse this because it can't jump the gap from the end of a token to the beginning of a list: it breaks between `,@ and (. The reach of read-syntax ends where the token ends.

Given this power it doesn't seem that reader macros are going away. Now how about read-syntax?

Read-syntax is the best way to change an unknown token. Reader macros don't work here, because the single character that activates them must be at the beginning of a token, and not all tokens have a distinct character at the front. One example is symtab!name!syms, which refers to nested hash tables. Lisp has to read the whole token before it can figure out what to do with the ! characters sandwiched inside. A reader macro activated on ! misses the beginning, "symtab". [2]
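A minimal sketch of read-syntax on such a token, again in Python for illustration. The function name expand_bang is hypothetical, and the expansion it produces, nested lookups of the form ((symtab (quote name)) (quote syms)), is an assumption modeled on Arc-style syntax; the text above only says the token refers to nested hash tables.

```python
def expand_bang(token):
    """Rewrite a complete token like a!b!c into nested lookup forms.

    The function sees the whole token at once, which is why it can use
    the text before the first '!'. Tokens without an interior '!' (for
    example "set!") are returned unchanged.
    """
    parts = token.split("!")
    if len(parts) < 2 or not all(parts):
        return token
    form = parts[0]
    for key in parts[1:]:
        # Each '!' becomes one more level of lookup with a quoted key.
        form = [form, ["quote", key]]
    return form


# The reader would call this once per finished token.
tokens = "symtab!name!syms".split()
forms = [expand_bang(t) for t in tokens]
```

Here forms holds a single entry, [["symtab", ["quote", "name"]], ["quote", "syms"]]. Nothing in the sketch can look past the end of the token, which is the same limit described earlier for `,@(list 1 2).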

A source code walker can change symtab!name!syms, so in principle it can replace read-syntax. This comes at a cost, because a code walker makes a second pass over the source code after all of it has been read. The cost may not be so bad in comparison, because read-syntax also goes over every token while it's being read. The bigger difference is that read-syntax needs less effort to configure.

Source code walking may have more to do with writing output than it does with reading input. Its key advantage is knowing the structure of the parse tree. Want to output JavaScript from Lisp? The code walker can tell where an if statement starts from a quoted list that merely contains an if. But when reading input, a recursive descent solution may be easier to write.
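To make the point about structure concrete, here is a small sketch of a walker that emits JavaScript, written in Python. It is a guess at the shape of such a tool, not a real compiler: emit_js and emit_data are hypothetical names, the parse tree is assumed to be nested Python lists with strings for symbols, and an if is assumed to always have a test, a then branch and an else branch.

```python
import json


def emit_data(datum):
    """Emit quoted data: lists become arrays, symbols become strings."""
    if isinstance(datum, list):
        return "[" + ", ".join(emit_data(d) for d in datum) + "]"
    return json.dumps(datum)


def emit_js(form):
    """Translate a tiny Lisp subset into a JavaScript expression string."""
    if not isinstance(form, list):
        return str(form)
    if not form:
        raise ValueError("empty form")
    head = form[0]
    if head == "quote":
        # Inside a quote nothing is code, so an "if" here is only data.
        return emit_data(form[1])
    if head == "if":
        # In code position, "if" becomes a conditional expression.
        test, then, alt = (emit_js(f) for f in form[1:4])
        return f"({test} ? {then} : {alt})"
    # Anything else is treated as a function call.
    args = ", ".join(emit_js(arg) for arg in form[1:])
    return f"{emit_js(head)}({args})"
```

With this, emit_js(["if", ["ready", "x"], "a", "b"]) gives the string (ready(x) ? a : b), while emit_js(["quote", ["if", "x", 1, 2]]) gives the array literal ["if", "x", 1, 2]. The walker only knows which is which because it arrives at each list from its parent.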


[1]  If proper equals longer, inappropriate equals shorter.

[2]  The source of power for both reader macros and read-syntax, that they operate on input while it's read, is also a source of weakness. (This sounds familiar.) After they transform source code, the original source code is gone from memory. Just as it's powerful to replace 6.years.ago with a numeric Unix timestamp, it's hard for a human to read a numeric Unix timestamp and infer instantly that it means 6 years ago. The solution may be to preserve the original meaning: (ago (* year 6)).