April 2017, rev. July 2017

No programming language I know of lets the programmer create struct variables on the stack that have an unknown number of fields of an unknown and different data type for each one.

Like a list of strings and ints. You can't define a struct in C with an unknown number of fields.

It can be done on the heap. Many languages have built-in data types or libraries that use runtime logic to save lists of fields on the heap.

But not on the stack. The number of fields is unknown. The type of fields is unknown. Without knowing how big the variables are, the language can't calculate ahead of time how much stack space to reserve for them. This calculation needs runtime logic. No wonder there's no syntax to define stack-bound variables like these in powerful and popular languages like C, Java, or OCaml.

This is unfortunate, because it's faster to create variables on the stack than the heap. On the stack you get space fast by incrementing a register. On the heap you call malloc, a function call that's slower.

To be more precise, the problem is that the variables are saved in a contiguous memory area. The stack is always contiguous but the heap isn't.

The closest thing we have for dealing with this problem is variadic functions in C. These functions allow passing on the stack an unknown number of arguments, by running a loop that extracts them one by one. Variadic functions are uncommon but they're not that hard to write.

Since variadic functions are possible, why wasn't this idea also applied to data types? Why is there no such thing as variadic data types?

One place to look for the answer is the loop that extracts the parameters. When all parameters are of the same type, it's easy to increment a pointer by a constant value each time to get to the next parameter; the value is the size of the type. For example, if all parameters are integers, always increment the pointer by 4. It's easy to write this once in a loop and get it right.

But it's hard to get it right with many data types of varying sizes. It's confusing. The programmer continually has to think about the order of the variables and their types, and if one data type or pointer increment goes wrong then everything breaks. It gets more confusing if types contain other types, like lists or hash tables. And even more confusing if there are many loops like this, say in recursive programs like parsers, because it's harder in a recursive program to pinpoint which of the many loops in many functions has a bug as opposed to a program with a single function.

Type safety doesn't help here. Since the language doesn't offer a way to express variadic data types, the type verifier doesn't have a way to check them.

One thing that helps get variadic data types as parameters passed and extracted correctly is being focused and methodical. Another is having a debugging function at hand that dumps the raw data in binary. Another is having a testsuite, especially for recursive programs. A single small testcase there can help fix 50%-80% of the bugs.

I don't know how hard it'd be to add variadic data types into programming languages. The first thing to change is our mental model of what a data type can be. A data type isn't only a list of symbols. A data type is something more. In general, a data type definition also needs code.

A programming language that wants to include stack-bound variadic data types in its type system must let the programmer define in the data type the loop that saves and loads the data. A better way to access the variables would help too rather than move pointers around by hand.