Efficient Mutation

Here's the thing – I'm not very good at the whole concept of dealing with non-mutating data. Partly because I have a super difficult time understanding how you can write programs as the primary function of a program is to mutate data. So it's hard for me to grok the reason for design decisions such as "no pointers" and type-defined value/reference semantics. To me, they are actually a detriment to understanding how a program works.

So, I thought I'd post this to see if I can get some help from the community. I'll be glad to be corrected on this issue as I genuinely do not understand why there is this huge strive for immutable data everywhere.

Disclaimer: I completely understand why portions of your code would use immutable data; protecting yourself or others from accidentally causing side-effects is usually a good thing. That's not what I'm asking about though.

Ok, so what's the problem? That's easy: a syntax tree. I always find it best to ground my examples in real-world examples, and as I'm working on a programming language (Proteus) in my spare time, this seems like a great candidate.

The question is: how can I build up a tree over time efficiently without mutating but by retrieving new copies of the data? Further, how can I modify just a portion of the tree without needing to perform a copy of the entire tree?

It's the latter question that I'm really more interested in. I can completely see how the compiler could optimize away the copy while a type is being additively modified, especially since there are no other references to it. However, let's say you're building a code editor and you change the name of a function. There's no reason to re-parse the entire file; you know the location of the function in the file and where it sits in the AST, so the parse should be able to just parse the surrounding change and apply the delta to the tree.

Another way that I think about this problem is through the lens of a stream of changes that happen to the tree over time. So let's say we have the following stream:

  1. Parse top-level node (delta: add root node)
  2. Parse func decl (delta: add child to root node)
  3. Parse func body (delta: add children to func decl node)

Right, these are some very high-level deltas. With these three deltas, I don't see how to avoid the copy in the non-mutable world of the top-level node multiple times. In the world of mutation, I'd simply append the child to the parent for each of these deltas.

Am I just misunderstanding something?

To me, the efficiency of the program is extremely important. That's the time that the user is going to spend waiting for your code to run; it's important.

Efficient Mutation