September 2017, rev. November 2017, rev. April 2018

I keep forgetting how references work in Lip. I'd better write this down.

The need for references started from vm_ir_fnorarray. It calls vm_ir_hashget that returns a reference because it's faster than making a copy of the value that what was in the hash table (the global environment). I want to free only the reference instead of also free what the reference points to. [1]

Later I wanted vm_mk_list_deref to create references to each element of the list. [2] The list is implemented as an array instead of a linked list to reduce use of malloc. [3] It'd be faster to replace a list element without having to recreate the list. But creating a reference here is different than creating a reference to a hash table. If only the reference is freed, the element it points to leaks memory.

So I added a flag to tell these two types of references apart. I'm calling them free and chained. A chained reference frees what it points to and it's used to avoid memory leaks. [4] A free reference avoids copying a value from a hash table, and avoids copying a parameter passed in a function call. [5]

It's safe for one data structure to contain another using a chained reference when there's only one entrypoint to the data structure. A list can contain a list by chained reference. A list can contain a hash; a hash a hash; a hash a list. When there's only one entrypoint, all references are chained, not free. Which means there's no way to access the contained data structures by some other variable.

Trouble begins with free references, because they open the possibility to access deallocated memory. When an object is freed, accessing it from another object that points to it with a free reference will crash.

Lip's solution is to eliminate this possibility. It doesn't give the user the option to create free references. If the user wants to create a list that points to another list, they get a chained reference, not a free one: the list makes a new copy. The only exception is Lip itself using free references in internal commands, like follow a free reference in vm_ir_fnorarray. There Lip frees the free reference on its own.

So both free and chained references are set indirectly by the language. Up to this point, the user has no direct control to create references.

This is still wasteful though when making copies of big data structures. Copies take time and memory. So it's important to give the user some way to refer to another data structure — just not with free references. This was the idea behind having references outside the core in the Lip interpreter. To let the reference be looked up by the program, not the language core.

Based on what I know so far, I expect the user to create user references: some way to lookup another variable. A user reference is something like a search query in a database. It contains enough to find what you want but doesn't point to the data directly. A user reference is not a memory address that points to a value directly. A user reference must first be interpreted.

Say there are two lists x and y, and x should point to y rather than make a copy. I expect the user to add in x something that, when read, looks up the value in y or returns nil (if y or the value in y is missing), and when written, writes the value in y or does nothing (if y is missing). For example, a user reference can be a function (or a tagged type containing a function) that looks up the list element in the environment. Or it could be a code walker on assignment and on list dereference.

At worst the programmer can use symbols. When a program encounters a symbol, use conditional logic that looks up the symbol in the environment. NewLisp tried this and it works.

It may turn out that user references and free references are the same thing. That what the language core needs to represent a user reference can also be used to represent a free reference.

Another source of trouble is functions. Function parameters are always freed after a function call returns. If a free reference was passed as a function parameter, the free reference is freed (and the value it pointed to isn't, which is what we want). If a reference was passed as a function parameter and the parameter was used as the return value, freeing both the parameter and the return value would cause a double free.

Once again Lip's solution is to eliminate this possibility. A function's return value is always dereferenced and deep copied, whether it uses a free reference or a chained one, and irrespective of the object type.

Whether free or chained, setting a variable to a new value when the old value was a reference preserves the reference. It makes the reference point to the new value (and frees what the reference pointed to), rather than free the reference and set the variable to the new value.









Notes:

[1]  Later I learned that freeing a reference in vm_ir_fnorarray (which implements the IR_FOA instruction) works only if the reference was a free reference. I fixed vm_ir_fnorarray to not free a chained reference, because the chained reference has an IR_FREE instruction generated for it that frees the reference later.

For example, vm_ir_fnorarray should not free for ((cut (list 1 2 3) 1 nil) 0) because the return value of cut is freed later.

I also learned that it's better to always make a copy of what was in the global environment instead of create a reference.

[2]  Had it not created references, list elements could contain VM registers, which don't stick across REPL iterations. It's semantically invalid to save a VM register on the heap.

This may be incomplete design, and the solution may involve a list stack. But changing the design here doesn't eliminate the need to creates references for values looked up in hash tables. References avoid the overhead of making copies.

[3]  This may still call malloc to create each object the reference points to. But only for dynamic objects like lists and hash tables, and only if these objects are currently allocated in the data section. Objects already on the heap don't need another malloc.

It avoids malloc for integer, decimal, and bool because they're saved directly. A linked list doesn't. Creating in Lip a list of 3 numbers calls malloc only once.

It avoids malloc in one more way. The reference (not what the reference points to) is saved directly in the array. So there's no need to malloc space on the heap to save it.

Objects in Lip have a flag that tells if the object is allocated inline (e.g. in the data section) or the heap.

[4]  This is the opposite of a strong and a weak reference used in garbage collection. In GC a weak reference means don't protect an object from being collected; let it possibly get collected. Here a chained reference means collect the object instead of protect it; since there's no GC, no object is ever protected.

[5]  A free reference is not used when copying a list or a hash. Anything copied is treated as a copy. As explained later, if a user wants a reference instead of a copy, they must create a user reference manually.