September 2017

While writing the first version of Lip, I bumped into three programming abstractions that as far as I know no other language had. This is rare in languages. Most programming abstractions were discovered about 50 years ago. Three abstractions surfacing now may be a sign of a new era.

To be clear, these aren't new programming abstractions but old ones that hadn't been fully fleshed out. I can spot an early form of them in existing work. All three can be more general.

What led to this discovery was that I tried to write a language that keeps programs small. Keeping the program small isn't far from the best way to program. [1] And since the language is a program itself, I thought it had to be worth keeping the language small too. I kept reusing code and trimming code where I could until the language had me cornered. I saw no other way to do what I needed in the fewest lines of code without compromising on power unless the language had these abstractions.

1. Source code paging

The least ground-breaking abstraction is source code paging: the ability to load and unload source code on demand. [2]

Some languages have the ability to load code for a missing function when the function is called. When a function is missing in Perl for example, a function called AUTOLOAD runs to load it.

Languages are missing two features on autoloading. One, they trigger the autoloader for missing functions but not for missing variables. This looks like a minor detail, but it throws everything off in languages like Lisp where everything is treated the same. Missing the ability to autoload variables makes the programmer do more work. [3]

Two, there's no auto-unloader. If one wrote a program that continually loaded other programs, it'd eventually make the computer run out of memory or swap enough that it kills the program. [4] The other half of loading code automatically is removing code automatically.
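To make loading and unloading concrete, here is a minimal sketch in Python rather than Lip. Everything in it is an assumption made for the example: the SourcePager class, its lookup method, the REMOTE_SOURCE table that stands in for source code fetched from somewhere else, and the least-recently-used rule for choosing what to unload. It doesn't describe any existing autoloader.

```python
from collections import OrderedDict

# Hypothetical stand-in for source code that lives somewhere else.
REMOTE_SOURCE = {
    "double": "def double(x):\n    return 2 * x\n",
    "square": "def square(x):\n    return x * x\n",
    "limit": "limit = 10\n",
}


class SourcePager:
    """Pages definitions in on first use and pages the oldest ones out."""

    def __init__(self, fetch, capacity):
        self.fetch = fetch            # maps a name to its source text
        self.capacity = capacity      # how many definitions stay in memory
        self.loaded = OrderedDict()   # name -> value, least recently used first

    def lookup(self, name):
        if name in self.loaded:
            self.loaded.move_to_end(name)       # mark as recently used
            return self.loaded[name]
        scope = {}
        exec(self.fetch(name), scope)           # page in: evaluate the source
        value = scope[name]                     # works for functions and variables
        self.loaded[name] = value
        if len(self.loaded) > self.capacity:
            self.loaded.popitem(last=False)     # page out the least recently used
        return value


pager = SourcePager(REMOTE_SOURCE.__getitem__, capacity=2)
result = pager.lookup("double")(pager.lookup("limit"))  # a function and a variable
pager.lookup("square")                                  # pushes "double" out
resident = list(pager.loaded)
```

The same lookup serves functions and variables, and anything paged out is simply fetched again the next time it is asked for.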

These are small details though compared to the bigger problem. Before the code is loaded in memory it's loaded on disk with a package manager. Why must the programmer manually install the code that will be autoloaded later? How's that automatic if it's manual? It'd be less work if the package manager got out of the way.

The package manager is a bad autoloader in this respect. It can't add, remove, or upgrade code in memory, and it takes an extra step. An autoloader is the more general solution. Which suggests two changes to get a better language: (a) have an autoloader, and (b) drop the package manager.

The final difference is to name the autoloader a source code pager. Why source code? Why not have the autoloader work on binary?

The reason is that source code is more powerful than binary. It's less work to change source code to produce a different binary than to change the binary directly. The autoloader can still work on binaries if it needs to, but most times the programmer changes source code.

Ever got that error message 'puhs' is not a git command? Git then asks if you meant push. If Git can tell what you meant, why not just do it?

Interlisp did, with a feature called DWIM (Do-What-I-Mean). [5] It could notice typos in variable names and use the variable with the closest name. If the user missed pressing the Shift key and typed a 9 instead of the left parenthesis, it'd fix it for the user automatically.

The more general case of source code paging is to not only find and load on demand the source code the user asked for but also to understand what the user meant when they asked. Do what I mean is a valuable enough idea to have made it into Google search.
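A closest-name lookup of this kind can be sketched with Python's standard difflib module. This is only an illustration: dwim_lookup and the COMMANDS table are hypothetical names, the 0.6 cutoff is just difflib's default, and nothing here reflects how Interlisp or Git decide what the user meant.

```python
import difflib

# Hypothetical table of known names, standing in for commands or variables.
COMMANDS = {"push": "pushing", "pull": "pulling", "commit": "committing"}


def dwim_lookup(name, env, cutoff=0.6):
    """Return env[name], falling back to the closest known name."""
    if name in env:
        return env[name]
    # Ask for the single best match at or above the similarity cutoff.
    matches = difflib.get_close_matches(name, list(env), n=1, cutoff=cutoff)
    if not matches:
        raise NameError(name)        # nothing close enough to guess safely
    return env[matches[0]]


meant = dwim_lookup("puhs", COMMANDS)
```

The cutoff is the design choice that matters: set it too low and the lookup guesses when it should ask.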

I added in Lip two hooks (page-fn and page-var) that get triggered when a program calls either a missing function or a missing variable. These hooks load source code while the program runs and through the network instead of a package manager, caching what they load. Paging code out is missing and there's no DWIM.

2. Code walking

A forgotten way to change source code is with a code walker: a program that changes a program. There are two ways to use it.

The first is to use the code walker as a source to source transformer. To take source code as input, change it, and produce source code as output. There's nothing running live. Let's call this a static code walker.

The second way is to use the code walker live. Right before the code executes, the code walker can change it. Imagine running a = b + 1 in a dynamic language. When the language is about to evaluate this statement, the code walker takes over and changes it to issue an update to a database.
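The a = b + 1 case can be sketched in Python with the standard ast module. It's an illustration under assumptions, not a description of Lip: the AssignToUpdate walker, the run_walked helper, and db_update (here just a dictionary standing in for a database) are hypothetical names, and Python stands in for the dynamic language.

```python
import ast


class AssignToUpdate(ast.NodeTransformer):
    """Rewrites `name = expr` into the call `db_update('name', expr)`."""

    def visit_Assign(self, node):
        target = node.targets[0]
        if len(node.targets) == 1 and isinstance(target, ast.Name):
            call = ast.Call(
                func=ast.Name(id="db_update", ctx=ast.Load()),
                args=[ast.Constant(value=target.id), node.value],
                keywords=[],
            )
            return ast.copy_location(ast.Expr(value=call), node)
        return node                      # leave other assignments alone


def run_walked(source, env):
    """Walk the code right before it runs, then run the rewritten code."""
    tree = AssignToUpdate().visit(ast.parse(source))
    ast.fix_missing_locations(tree)      # new nodes need line numbers to compile
    exec(compile(tree, "<walked>", "exec"), env)


database = {}                            # stand-in for a real database
env = {"b": 41, "db_update": database.__setitem__}
run_walked("a = b + 1", env)
```

After the run the value is in database under the key "a", and no variable named a was created in env.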

Similar to autoloading, the general case of both static and live code walking is to operate on source code, not binary. [6] A code walker can change source code into binary but not the other way around. [7] So while for example a static code walker could take binary as input and produce binary as output, it's far from the easiest way to work. It's easier to work with source code.

Now, when would a programmer want to code walk? There's a big range of possibilities here.

Ordinarily a programmer changes source code by hand. It's less work to make a single change manually. But when the change has to happen in many places then it's less work and more reliable to make the change automatically.

What kind of change? Any kind. A code walker can change variables, array indexes, loops, conditionals: any language operator instead of just functions. It can add new abilities to existing source code, or take parts of the source code out. It's common to wrap existing primitives, such as doing more before and after assignment to a specific variable. [8] It's also common to add logging, or to change one function call for another. A code walker isn't only a wrapper like a macro but is more powerful because it can gather state from the whole program instead of only the macro call.
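One of the changes above, replacing one function call with another, makes a compact example of a static code walker: source code in, source code out, nothing running. The sketch uses Python's ast module (ast.unparse needs Python 3.9 or later). SwapCall, walk_source, slow_sum, and fast_sum are names invented for the example.

```python
import ast


class SwapCall(ast.NodeTransformer):
    """Replaces calls to one function name with calls to another."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Call(self, node):
        self.generic_visit(node)         # walk nested calls first
        if isinstance(node.func, ast.Name) and node.func.id == self.old:
            node.func = ast.copy_location(
                ast.Name(id=self.new, ctx=ast.Load()), node.func
            )
        return node


def walk_source(source, walker):
    """Take source code as input and return the changed source code."""
    tree = walker.visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)


rewritten = walk_source(
    "total = slow_sum(xs) + slow_sum(ys)",
    SwapCall("slow_sum", "fast_sum"),
)
```

Both calls are changed in one pass, which is where a walker starts to beat editing by hand.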

Can't it all be done statically? Not always, because not all of the source code is available ahead of time. The source code that ends up running might not be written by the user who runs it but by others, loaded from the network at runtime, and from locations that are unknown when the program starts. The web works this way. So it seems powerful to have a live code walker. [9]

Which is an exciting prospect considering none of the most popular or powerful languages have one. C, Go, Javascript, Lisp, OCaml, Python, and Smalltalk don't. Most don't have a static code walker either. Code walking may be the most powerful programming abstraction and yet it's missing from most languages.

I can't help but wonder how well it'd work to run the web on code walkers. Rather than just run a web app, tweak it before running it. Change not just what is sent to the browser, but also what runs on the server. It's kind of paralyzing to think to what extent source code could be changed if one could access all of it. Bugs could get fixed automatically. Algorithms optimized. Monoliths parallelized. This must be worth a try.

Most code walkers so far worked statically and were written as third party libraries. [10] But because the general solution is to also have a live code walker, I added this ability in the language core.

To get a live code walker, I added in eval the ability to look at a code-walkers hash table that has code walker functions for the language operators. Rather than evaluate the code, if this variable exists Lip calls the code walker with the source code as a parameter and evaluates the code walker's result.

To get a static code walker, I added in Lip a new operator called walk. Internally walk calls eval with a flag that means don't evaluate the code when eval recurses in it but instead return source code. [11] The static code walker also uses the code-walkers variable.
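The two mechanisms above can be approximated in a few lines of Python. This is a sketch of the idea, not Lip's implementation: expressions are nested lists, evaluate stands in for eval, its code_walkers argument stands in for the code-walkers table, and emit_source stands in for the flag that walk passes. The double_to_add walker is made up for the example. One simplification to note: each node is walked once, and the walker's result is not walked again at that same node.

```python
import operator

OPERATORS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(expr, code_walkers=None, emit_source=False):
    """Evaluate a nested-list expression, letting code walkers rewrite it first."""
    if not isinstance(expr, list):
        return expr                          # numbers evaluate to themselves
    walker = (code_walkers or {}).get(expr[0])
    if walker is not None:
        expr = walker(expr)                  # source code in, source code out
        if not isinstance(expr, list):
            return expr
    head, *args = expr
    args = [evaluate(arg, code_walkers, emit_source) for arg in args]
    if emit_source:
        return [head, *args]                 # static: return source, don't run it
    return OPERATORS[head](*args)            # live: run the rewritten code


def double_to_add(expr):
    """Hypothetical walker: rewrite (* x 2) as (+ x x)."""
    if len(expr) == 3 and expr[2] == 2:
        return ["+", expr[1], expr[1]]
    return expr


program = ["+", 1, ["*", 5, 2]]
walkers = {"*": double_to_add}
live_value = evaluate(program, walkers)
static_source = evaluate(program, walkers, emit_source=True)
```

Sharing one recursive descent between the live and static cases is what keeps the second mode down to a flag.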

3. Safe mode

Another abstraction I haven't seen in other programming languages is safe mode: the ability to run untrusted code safely.

I'd like to hand off some code to the language and say, there, run it, but make sure the code doesn't harm anything. I don't want the code to accidentally wipe out my hard drive. I want code written by others and sent over the network to run safely on my local machine.

This feature seemed to be the safest if hardcoded in the language core. If it were implemented outside as a library, a bug in the library or how the library is loaded can break safety. Browsers got hacked this way. I don't want to take such chances. I want the guarantee of safety.

I added this feature in Lip as extra parameters to eval. In Lisp eval takes as parameters the expression it will evaluate and in some dialects an environment. In Lip it takes more parameters, one of which is an access control list that tells eval which language operators and functions it's allowed to use.

For example, to lock Lip down to run only as a calculator: [12]

(eval '(+ 1 2)
      t
      (hash 'only (hash '+ t
                        '- t
                        '* t
                        '/ t)))
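For readers who don't know Lip, roughly the same idea can be sketched in Python. The names below (safe_eval, PRIMITIVES, NotAllowed, and the harmless wipe-disk stand-in) are assumptions for illustration. Lip's eval takes more parameters than this, and the sketch models only the list of allowed operators.

```python
import operator


class NotAllowed(Exception):
    """Raised when code uses an operator outside the granted list."""


# Everything the toy language can do. "wipe-disk" is a harmless stand-in
# for an operation that untrusted code must never reach.
PRIMITIVES = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    "wipe-disk": lambda: "disk wiped",
}


def safe_eval(expr, only=None):
    """Evaluate a nested-list expression using only the operators in `only`."""
    if not isinstance(expr, list):
        return expr                          # numbers evaluate to themselves
    head, *args = expr
    if only is not None and head not in only:
        raise NotAllowed(head)               # refuse before doing any work
    values = [safe_eval(arg, only) for arg in args]
    return PRIMITIVES[head](*values)


CALCULATOR = {"+", "-", "*", "/"}
answer = safe_eval(["+", 1, 2], CALCULATOR)

try:
    safe_eval(["+", 1, ["wipe-disk"]], CALCULATOR)
    blocked = False
except NotAllowed:
    blocked = True
```

The check sits inside the evaluator itself, so nested code can't reach an operator the caller didn't grant.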

Impact

I realize there's not much new here. All these abstractions already exist in one form or another. But once I made all three more general and put them in the same language I started to think about the options they open up.

One option is to remove abstractions. I don't see why the abstraction of a client and a server must exist. It's easier to think of an application as if it runs on a single cpu. Some types of private code will still need to run only on some machines, but other than that, a client and a server that can't harm each other are both one and the same. With safe mode they can be. To fix problems with latency, add caching at the right places. To make the program run on other machines, rewrite it with a code walker. I haven't tried this.

Same goes with the abstraction of a network protocol. Why have an application-level network protocol? Given a basic protocol like TCP/IP, an advanced protocol is a closure as source code sent from the client that runs on the server under safe mode. [13]

I tried removing the abstraction of an application-level network protocol in LipOS, a middleware I wrote in Lip that pages source code over the network. I had to add one more feature to safe mode: the ability to drop its privileges. [14] With this change closures written on the client ran safely on the server, with the server eval controlling which operators can run.

If all of this turns out to work, that'd be five abstractions that work differently than what we're used to. I can't possibly know if this is how people will want to program in the future but I don't want to go back to programming without these options.

Thanks to the reviewer who pointed me to a couple of past references to look at.

Notes:

[1]  The advantage of keeping code small isn't only that you do less work, but also that it makes it easy to notice when a piece doesn't fit well with the others. Smallness fuels rigor.

[2]  Operating systems have a similar ability called memory paging that loads and unloads binary instead of source code. With source code paging this ability is placed elsewhere and earlier. Instead of the OS doing the paging, now the language does it. The language can still choose to compile the source code into binary and page that but the starting point can always be source code. The OS isn't designed to do this.

[3]  For example, triggering the autoloader by hand before the variable is used, to load the right module that has the variable. It also pollutes the code. Logic that should be only in the autoloader now spreads into the programmer's code. Plus it may load more code than necessary when all you need is a variable.

[4]  This happens often with my browser. My terminal doesn't fare very well either.

[5]  Do what I mean appeared first in BBN Lisp 1.85. The next version became Interlisp.

Although DWIM in Interlisp was missing source code paging, DWIM seems like a better end goal.

[6]  A code walker is different than an autoloader. A code walker takes as input code that exists. An autoloader finds code that's missing. One may trigger the other.

[7]  There are limits to what binary instrumentation can do without help from hardware or the language. In general it's impossible in binary to handle data in code and code in data, or to handle self-modifying code.

[8]  Databases are an example of code that wraps primitives. A database trigger is a live code walker on assignment (for insert and update) and key deletion (for delete). A database view is a live code walker on variable access (select). One way to model transactions is with a live code walker that on BEGIN saves data in temp space and on COMMIT assigns the data.

They're the same by the way, a database and a language, because the problem they try to solve is a general one. Their difference is they approach the problem from opposite ends. Today's database is an ugly, inflexible programming language; today's language is a database without transactions and no persistence. This isn't a solved problem if we judge by how much progress they each made towards the other.

[9]  Experts may have noticed a code walker isn't far from a compiler. A compiler is a collection of code walkers; a code walker is one internal pass in a compiler.

[10]  The earliest version of a live code walker on function call seems to be a feature called advising in the PILOT system in Warren Teitelman's 1966 PhD thesis. Later it was added in Interlisp.

A similar system called LISP70 used rewrite rules. It seems to be a static code walker that pattern matched code. So there may be room for a pattern matching live code walker.

[11]  This had to be a different operator than quote because quote ends the recursive descent.

[12]  The idea isn't new. I'm told that in the Burroughs 5000 a process ran only the language operators and the environment that was granted to it. The granted data structure was external to the code and could be changed on the fly, and every entity in the environment on the B5000 could be "advised". This was the first pass at, and the basis of, capability architectures, and it was in the hardware of a higher-level machine from 1962.

I'm guessing this feature was provided by the OS and not integrated in the programming language. I doubt safe mode is a feature of the languages that ran on the B5000, ALGOL and COBOL. So this permutation of the idea may be new.

[13]  Running client-written closures safely on the server has been done over the years. When a Postscript closure is sent to the printer, it prints by running safely inside a closed address space. Same for HTML and Javascript.

Lip's safe mode though not only provides a closed address space but also lets the server choose which operators and functions to disallow. Postscript, HTML, and Javascript don't.

[14]  A later version of LipOS didn't need the ability to drop privileges. This feature is still useful in safe mode though, to let privileged functions run user-supplied functions without granting them elevated privileges.