April 2019

When the datatype of an expression isn't known until run-time, it's impossible to guarantee at compile-time that the expression is a variable that should be written to. This becomes a problem in a language that has no GC and no special syntax.

Example:

(= a (hash 1 "one" 2 (hash)))
(= ((a 2) "some") 7)

Here (a 2) refers to a hash, and the assignment sets the key "some" to 7 in it. The naive version of the IR generated for this assignment reads a from the environment, reads (a 2) from that, writes ("some" 7) into (a 2), and then writes a back to the environment. Writing the value back is needed to avoid GC.

['r1', '=', 'hashget3(env,a)']
['r2', '=', 'mfa(r1,2)']
['r3', '=', 'setkv1(r2,"some",7)']
['r4', '=', 'setkv3(env,a,r1)']

This works if a is a hash, but what if it isn't? If a is a function, (a 2) returns the result of the function call and a shouldn't be saved back in the environment. The function a shouldn't be redefined.
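
To make the failure concrete, here is a rough Python simulation of the four naive IR steps. The function names mirror the IR above, but their bodies are my assumptions for illustration: reading a variable is modeled as taking a private copy, and the log argument is a hypothetical extra that records environment writes.

```python
import copy

def hashget3(env, name):
    # Assumption: without GC, reading a variable yields a private copy.
    return copy.deepcopy(env[name])

def mfa(target, arg):
    # Assumption: call the target if it is a function, otherwise index into it.
    if callable(target):
        return target(arg)
    return target[arg]

def setkv1(h, key, value):
    # Set key to value in the hash h.
    h[key] = value
    return h

def setkv3(env, name, value, log):
    # Write the value back into the environment and record the write.
    log.append(name)
    env[name] = value
    return env

def naive_assign(env, name, index, key, value):
    # The four naive IR steps, executed unconditionally.
    log = []
    r1 = hashget3(env, name)
    r2 = mfa(r1, index)
    r3 = setkv1(r2, key, value)
    r4 = setkv3(env, name, r1, log)
    return log

# Case 1: a is a hash, so the write-back is what makes the change stick.
env_hash = {"a": {1: "one", 2: {}}}
writes_hash = naive_assign(env_hash, "a", 2, "some", 7)

# Case 2: a is a function, but the same IR still writes a back.
def make_hash(n):
    return {}

env_fn = {"a": make_hash}
writes_fn = naive_assign(env_fn, "a", 2, "some", 7)
```

In the second case the log still contains a write to a, even though only the function's return value was assigned to. That is the write the naive IR shouldn't perform.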

It's impossible to guarantee at compile-time that the left-hand side of this indirect assignment (the a in (a 2)) is a variable that should be written to. If the program is multi-threaded, another thread can set a to a different datatype. So if IR was generated for a as a hash and a changes to a function, the IR would be invalid.

The same goes for (= ((list 1 2 3) 0) 7), which doesn't refer to a variable at all, in a highly dynamic Lisp where a live code walker may replace expressions at run-time. But let's keep live code walking out of this because it's not the root cause of the need to generate IR differently. Multi-threading is sufficient to trigger this problem too. The root cause is that, for variables, a copy must be saved back to avoid GC, and that is what IR_SETKV3 is for.

In both examples nothing should be saved back in the environment. The value that is indirectly assigned to is the return value of the function or the expression, not a variable. An unknown expression may not be a variable.

I didn't notice this problem before because I mostly used indirect access to read, as in (list 1 (a 2) 3). Now I'm using it to write, and writing is harder. So although IR_MFA seemed unnecessary in Lisp for reading, it seems necessary for indirect writing when the goal is to avoid GC.

Other languages use syntax to handle this. But adding syntax removes the possibility for the expression to change at run-time.

In Python, indirect assignment of a variable is expressed with square bracket syntax: a[2] = {"some": 7}. The square brackets imply a can be changed and the assignment operator implies a does get changed. The absence of parentheses implies a isn't a function. [1]

For a function call, the syntax that implies the variable a doesn't get changed is the parentheses: a(2)["some"] = 7. Function a doesn't get redefined.

For a list, square brackets around the list (not after it) imply no variable in the global environment gets changed: [1, 2, 3][0] = 7.

In Python, the syntax defines ahead of time whether a variable gets changed. It makes the language easier to write.
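
One way to see that Python settles this before anything runs is to parse the three assignments from above and look at what sits to the left of the last pair of square brackets. This is only a sketch; target_kind is a name I made up, while ast.parse and the node classes come from Python's standard ast module.

```python
import ast

def target_kind(source):
    # Parse a single assignment statement and take its first target,
    # which for these examples is always a Subscript node.
    target = ast.parse(source).body[0].targets[0]
    # The node class of the subscripted expression is fixed at parse time.
    return type(target.value).__name__

kind_variable = target_kind('a[2] = {"some": 7}')   # a plain variable name
kind_call = target_kind('a(2)["some"] = 7')         # a function call
kind_literal = target_kind('[1, 2, 3][0] = 7')      # a list literal
```

The three results are "Name", "Call" and "List". Only the first form names a variable whose contents change, and the parser knows that without knowing what a holds at run-time.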

It's kind of amusing that square brackets and parentheses came to the rescue here for Python and most languages but not for Lisp. Lisp is full of parentheses and instead of benefiting from them it's as if it drowned in them. Why? Because adding syntax is a limiting solution and somehow Lisp knew better and avoided syntax.

Indirect assignment of unknown expressions, with no GC and no syntax to help, may be a new problem. A type check at run-time may not be a bad solution.
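
A minimal sketch of what such a run-time check could look like, written in Python rather than IR. Everything here is an assumption for illustration: checked_assign is a hypothetical name, the environment is a plain dict, and reading a variable is modeled as taking a copy.

```python
import copy

def checked_assign(env, name, index, key, value):
    # Assumption: reading a variable yields a private copy (no GC).
    target = copy.deepcopy(env[name])
    if callable(target):
        # The name holds a function: assign into its return value only.
        result = target(index)
        result[key] = value
        return False  # nothing written back to the environment
    # The name holds a hash: update the copy and save it back.
    target[index][key] = value
    env[name] = target
    return True  # the variable was written back

env_hash = {"a": {1: "one", 2: {}}}
wrote_hash = checked_assign(env_hash, "a", 2, "some", 7)

def make_hash(n):
    return {}

env_fn = {"a": make_hash}
wrote_fn = checked_assign(env_fn, "a", 2, "some", 7)
```

Because the check runs on the value that was actually read, it doesn't matter whether another thread changed the datatype of a after the IR was generated.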

Notes:

[1] The square brackets don't imply that a is a variable. What's on the left of the square brackets may be a list, as shown in a later example.