Language Design: Building a Modern C, Round 1

When I think about building a modern C language, there really are just a set of a few things I’m thinking:

  1. Cleaner, more consistent syntax
  2. Proper meta-programming (e.g. a real preprocessor and introspection)
  3. Consistency across types
  4. Better built-in types (e.g. a proper string type and bounded arrays)
    So let’s just start with that.

Here is a very basic C program that was the start of the Proteus compiler:

#include <stdio.h>
#include <stdlib.h>

int main(int ArgumentCount, char** ArgumentValues) {
      if (ArgumentCount != 2) {
          printf("Invalid usage!\n");
          return -1;
      }

    printf("compiling file: %s\n", ArgumentValues[1]);
    FILE *InputFile = fopen(ArgumentValues[1], "r");
    if (InputFile == 0) {
        printf("Unable to open file.\n");
        return -1;
    }

    fseek(InputFile, 0L, SEEK_END);
    long FileLength = ftell(InputFile);
    rewind(InputFile);

    char *FileBuffer = (char *)malloc(sizeof(char) * (FileLength + 1));
    if (FileBuffer == 0) {
        printf("Unable to allocate memory for the file buffer.\n");
        return -1;
    }

    if (fread(FileBuffer, 1, FileLength, InputFile) != FileLength) {
        printf("Unable to read the entire file.\n");
        return -1;
    }
    FileBuffer[FileLength] = '\0';

    printf("file size: %ld\n", FileLength);
    printf("file contents: \n%s\n", FileBuffer);

    fclose(InputFile);

    return 0;
}

This is a translation of some of the items I mentioned above:

import stdlib

let arguments = environment.arguments
guard if arguments.count != 2:
    println "Invalid usage!"; exit -1

let filename = arguments[1]
println "compiling file: {0}\n" filename
guard let file = sys.io.fopen filename sys.io.flags.read:
    println "Unable to open file."; exit -1

guard let content = sys.io.read file:
    println "Unable to read the entire file."; exit -1

println "file size: {0}" file.size
println "file contents: \n{0}" content

This includes:

  • module system
  • improvements to the standard library
  • type inference
  • syntactical noise reduction (the : is the “open scope” syntactic sugar)
  • error handling mechanism (guard, which forces the code to exit scope if the condition fails)

I’m not entirely convinced on the lack of () to invoke functions, but they seem fairly superfluous. Removing them also makes this type of code clearer, in my opinion: foo 2 (3 + 2) vs. foo(2, (3 + 2)). Reading the second option requires a context shift to understand that the first ( is a function call group, where as the second ( is a grouping of work that should be done first. Plus, who doesn’t want to be more like Haskell?

Language Design: Building a Modern C, Round 1

Dependent Types: I”m Missing Something

I’ve been seeing a bunch of mentions about dependent types on Twitter lately and saw this blog post get mentioned.

Clearly I’m missing something.

The goal is provably correct code. Ok, I’ll suspend the fact that in the real-world computer hardware actually fails or memory can be corrupted, and I’ll focus on the provability of the code being provably correct in the abstract sense.

I still don’t get it.

One of the article’s example is this:

func compare(x: UInt, y: UInt) -> NSComparisonResult {
    // BACKWARDS! IT LIES!
    // (And how many times have you accidentally flipped these?)
    return (x <= y) ? .OrderedDescending : .OrderedAscending
}

func compare(x: UInt, y: UInt) -> UInt {
    // I hope you weren't actually planning on relying on this function.
    return .OrderedAscending
}

Ok… let’s continue on.

Dependent types are provided to fix this:

enum Order(left: UInt, right: UInt) -> Type {
    /// Represents that `(left <= right)`.
    /// Witnessed by a proof that `left <= right`.
    case lessThanOrEqual(because: LEQ(x: left, y: right)):
        Order(left, right)

    /// Represents that `(left >= right)`.
    /// Witnessed by a proof that `right <= left`.
    case greaterThanOrEqual(because: LEQ(x: right, y: left)):
        Order(left, right)
}

This is the part I don’t get: we allow for incorrect code to be written in the non-dependent-typed case, but we assume that we can’t do the same with dependently-typed code? Why? What’s preventing me from swapping left and right in the Order type that is returned?

I need to go and author more proofs to validate that the dependent types are all indeed correct.

Joe Groff tweeted this for some potential clarification:

//platform.twitter.com/widgets.js
In that link, we can see this example:

Definition negb (b : bool) : bool :=
match b with
    | true => false
    | false => true
end.

Definition negb' (b : bool) : bool :=
if b then false else true.

Theorem negb_inverse : forall b : bool, negb (negb b) = b.
destruct b.

But here’s the problem, this code also satisfies the proofs given so far:

Definition negb (b : bool) : bool :=
match b with
    | true => true
    | false => true
end.

Definition negb' (b : bool) : bool :=
if b then true else true.

Theorem negb_inverse : forall b : bool, negb (negb b) = b.
destruct b.

It’s up to the programmer to realize that we have actually not created all of the proofs required to prove correctness. So here’s my fundamental question: how do we know that we have enough proofs to actually prove our code is correct? I’ve only briefly looked over the topic, so maybe I’m simply missing the answer to this.

The paper does provide that last proof that we need:

Theorem negb_ineq : forall b : bool, negb b <> b.
destruct b; discriminate.
Qed.

But again, my question is how can we prove that we indeed have all of the proofs we need? Without that proof, I simply do not understand how dependent-types can actually produce provably correct code.

It’s an interesting topic; I’m not arguing against that or it’s application in domain spaces. I’m just trying to grok how this is objectively better than non-dependent-typed code with good test coverage. From a complexity standpoint, to me, it looks like a big loss; I get others will have less trouble on that end.

Dependent Types: I”m Missing Something

Proteus Development Starts

Today, development on Proteus officially starts!

I’ve created a Patreon if you feel keen to supporting the development. The plan is simple: I’ll work on it weekly posting updates every two weeks. Backers at different levels will get a few additional perks.

The goals of the language are simple: create a modern-day C language, and on top of that, create a modern-day Objective-C language.

I’ll be writing the parser using Swift, so other platform support will come when Swift is open sourced and working on various platforms. I expect to see a lot of interesting tangential posts comparing the two languages.

This is an experiment. At the end of it, I hope to have a language more streamlined and “safe” than C without sacrificing the raw hardware access that it provides. On top of that will be the dynamic runtime that will serve as the modern-day Objective-C implementation.

Eventually, the project will be completely open-sourced.

Although this project was inspired by Handmade Hero, it will be not “handmade”. I’ll be writing the lexer and parsing for the language by hand, and likely with few other tools. However, the compilation steps will be leveraging LLVM. So, I expect we’ll see some interesting Swift-to-LLVM C API challenges along the way.

More details to come later!

Proteus Development Starts