Error Handling – Take Two

Make sure to see the update below for a bit more information on the causes of the memory usage.

In my seemingly never-ending and not-quite-achievable goal of beating NSJSONSerialization in both performance and memory utilization when parsing a JSON string, I've come across another pearl of wisdom with regards to Swift: ignore my Error Handling in Swift piece, and others like it, that recommend using an Either<T, U> type as in other languages (at least for the current version of Swift, as of Beta 6).

I have been able to get my parsing speed to within 0.01s of NSJSONSerialization; while my goal is domination, I am also pragmatic (at times). Next up was memory utilization. Unfortunately, I was (and still am) far behind the total memory usage of the ObjC version. So, like a good little software engineer, I fired up Instruments and started investigating what I saw.

When you investigate memory usage, there are three primary concerns to watch out for:

  1. Total amount of memory used over the life of the scenario
  2. Total amount of memory ever actually in use at any given time
  3. Highest spike in memory used over the life of the scenario

Instruments visualizes this data pretty nicely for us:

screenshot of instruments with multiple memory profiles visualized in the editor

The picture above shows the results of the NSJSONSerialization code path. My implementation actually has a better "total persistent bytes" overall: 1.92MB vs. the 2.51MB shown above. However, the total memory used in mine was about 6.5MB, while NSJSONSerialization only used about 4.7MB.

Taking a Dive

There are a couple of approaches we can take to tracking down and solving memory issues:

  1. Examine the code
  2. Examine the profiles

Unfortunately, the profiles were not really helping me track down the root cause of the issues, but they were illustrative in helping me understand that I was creating many, many copies of objects all over the place.

Examining the Error type

I first took a quick look over my code to see if I could spot anything obvious. There was one thing I noticed right off the bat: FailableOf<T> stores an Error object in its Failure case. Well, the Error type is a struct with three values in it, and since I return a FailableOf<T> from all of my parsing calls, I'm going to need to return a copy of that Error, even if it's empty, all of the time.

Knowing that the Error object is going to be copied so many times throughout the call chain, we can instead mark the Error type as a public final class.

When we do this, the total memory usage drops to 6.06MB.

The other option is to create a backing class to store all of the data; that class looks like this:

public struct Error {
    public typealias ErrorInfoDictionary = [String:String]

    class ErrorInfo {
        let code: Int
        let domain: String
        let userInfo: ErrorInfoDictionary?

        init(code: Int, domain: String, userInfo: ErrorInfoDictionary?) {
            self.code = code
            self.domain = domain
            self.userInfo = userInfo
        }
    }

    var errorInfo: ErrorInfo

    public var code: Int { return errorInfo.code }
    public var domain: String { return errorInfo.domain }
    public var userInfo: ErrorInfoDictionary? { return errorInfo.userInfo }

    public init(code: Int, domain: String, userInfo: ErrorInfoDictionary?) {
        self.errorInfo = ErrorInfo(code: code, domain: domain, userInfo: userInfo)
    }
}
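To see why the class-backed struct keeps copies cheap, here is a small stand-alone sketch; Info and Wrapped are hypothetical stand-ins for ErrorInfo and Error. Copying the struct copies only the reference to the backing object, not the payload:

```swift
// Info plays the role of the ErrorInfo backing class above.
final class Info {
    let code: Int
    init(code: Int) { self.code = code }
}

// Wrapped plays the role of the struct Error wrapper.
struct Wrapped {
    let info: Info
    init(code: Int) { self.info = Info(code: code) }
    var code: Int { return info.code }
}

let a = Wrapped(code: 7)
let b = a                    // struct copy: one reference copied, payload shared
assert(b.code == 7)
assert(a.info === b.info)    // both copies point at the same backing object
```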

However, that seems a lot more complicated than simply doing this:

public final class Error {
    public typealias ErrorInfoDictionary = [String:String]

    public let code: Int
    public let domain: String
    public let userInfo: ErrorInfoDictionary?

    public init(code: Int, domain: String, userInfo: ErrorInfoDictionary?) {
        self.code = code
        self.domain = domain
        self.userInfo = userInfo
    }
}

And since all my values are immutable to begin with, I'm not sure why I would choose the struct approach for this problem.

Investigating the FailableOf<T>

Since I'm having copying issues with the Error (gist) type, it is only logical to look at the FailableOf<T> type next. Instead of using my JSON parser as the test ground, I decided to create a little sample app that would loop many times, calling a function that returned each of the following types:

  • FailableOf<T> – my implementation of the Either<T, U> concept (gist)
  • Either<T, U> – a more generic solution to my FailableOf<T> problem (gist)
  • (T, Error) – a tuple that contains the two pieces of information

The sample program is straightforward:

func either<T>(value: T) -> Either<T, Error> {
    return Either(left: value)
}

// test: either
for var i = 0; i < 100_001; i++ {
    let r = either(i)
    if (r.right != nil) {
        println("error at \(i)")
    }
}

Each of the different constructs has the same form (gist).

This is where I found something interesting: both the FailableOf<T> and Either<T, U> tests take up about 3MB of memory, while the (T, Error) test only takes 17KB. Clearly, there must be some missed compiler optimizations in Swift. Regardless, the tuple approach is clearly the one we should be taking, at least for now, if we really care about every ounce of memory.

In order to work with it better in my code, I created a typealias and used named tuples:

/// The type that represents the result of the parse.
public typealias JSParsingResult = (value: JSValue?, error: Error?)

After updating all of the JSON.parse code to return this new type, memory usage is down to 5.33MB! Simply by switching from the struct-based approach to this named-tuple approach (which I think is just a good approach, frankly), I was able to shave off another 700KB of unnecessary memory creation.
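As a hedged sketch of the shape this gives the parsing API, here is a made-up function using a named-tuple result; parseAnswer and Err are hypothetical stand-ins for the real parse functions and Error type:

```swift
// Err stands in for the post's final-class Error type.
final class Err {
    let code: Int
    init(code: Int) { self.code = code }
}

// A made-up parse-like function returning the named-tuple result shape.
func parseAnswer(text: String) -> (value: Int?, error: Err?) {
    if text == "42" {
        return (value: 42, error: nil)
    }
    return (value: nil, error: Err(code: 1))
}

let good = parseAnswer(text: "42")
assert(good.value == 42 && good.error == nil)

let bad = parseAnswer(text: "not a number")
assert(bad.value == nil && bad.error != nil)
```

Callers check .error first and only then use .value, which mirrors how the JSON.parse call sites read.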

I'm not done investigating other opportunities right now, but things are starting to look really promising here.

UPDATE After some more investigating, I realized why the enum case was causing such memory bloat: all of the types stored in enum cases need to be boxed until Swift implements proper generic support for enums.


Swift Proposal: protected

There has been much said about protected and how Swift needs, I mean, NEEDS, the "protected" keyword. In fact, there has been so much ruckus about it that the Swift team wrote a blog entry on it: Access Control and protected.

While I wholeheartedly agree that the protected keyword is a terrible idea from an inheritance perspective, the intent behind the notion has great value. I'm going to define that intent as this:

The ability to separate concerns of implementors and consumers.

: .callout

If we focus on that definition, it's really not that hard to imagine how we can extend the existing public, internal, and private access modifiers that Swift already offers with a fourth option: protected.

I propose that we could enable the following:

  1. Introduce the protected keyword
  2. Modify the import rules to include a protected modifier

The rule for the protected keyword would be quite simple:

Protected access enables entities to be used within any source file from their defining module, and also in a source file from another module that imports the defining module with the protected modifier. You typically use protected access to specify the public interface for those wishing to extend the functionality of your types, but hiding that functionality from the consumers of your API.

: .callout

An example would be this:

Defined in module FooMod

public struct Foo {
    public func foo() {}
    protected func bar() {}

    public var fizzy: Int
    protected var fuzzy: Int
}

protected func MakeSuperFoo() -> Foo {}

Then, in another module, you would have to use the following in order to gain access to the protected members.

import FooMod                  // Brings in all of the public members
import protected FooMod        // Brings in all of the protected members

let f: Foo = MakeSuperFoo()    // available only with the protected import
f.foo()                        // available via the normal import
f.bar()                        // available only because of `import protected FooMod`

I think this fits into the existing access control mechanism perfectly and delivers the high-level intent of what people are asking for with protected.


The Reasoning Behind the Choices

Sharing code in public is interesting in many ways. Sometimes the choices we make about design are somewhat arbitrary, as there are many options before us. Sometimes those choices are deliberate and methodical, with a well-reasoned approach for how we got there. Then there are those times where you just do something dumb…

If you’re going to be willing to share your code for the world to see, you really need to be OK with being wrong about something and learning from it. But you also need to know how to stick to your guns when you think you are doing things right. This post is going to be a bit about both, using my latest JSON parsing articles as illustrations: Generators Need a current Value and Improving Code with Generics.

The primary goal of the code that I wrote was to enable the ability to parse through a JSON string and create a JSON object representation from that string. However, in that article, I presented a much lower level view of the problem and framed it in such a way as to remove all of the context on why and how I reached that decision.

Wes Campaigne posted some great feedback over on GitHub about the approach I took to the problem.

I thought the whole

buffer.next()
while buffer.current != nil {
    if let unicode = buffer.current { // ... somewhere, buffer.next() is called

dance was kind of ugly: you’re dealing with the overhead of using a generator, but receiving none of the benefits it provides (e.g. for in loops). Also, using a struct for your BufferedGenerator seems odd — you end up using a class as a backing store anyway, and having it as a struct means using inout parameters all over the place. There’s a discussion on the dev forums that argues the case why GeneratorTypes should, in general, just be reference types.

Wes makes some great points, and his RewindableGenerator<S> is a very good class that solves the specific problem I was looking at better (both in terms of the applicability of the use cases and in how the code that consumes it should work).

The only real problem, which I forgot about when I first looked at his solution, was that the performance difference between using the GeneratorType and the Index types for Strings is fairly significant: nearly a 2.5x slowdown.

When I was first solving this problem, I looked at the following approaches:

  1. A String.Index-based approach grabbing individual characters. This led me to find out how String works with unicode combining characters.
  2. Then I tried using String.UTF8View.Index; after all, they are both indexes, so it should be a fairly easy change. Well… it turns out that String.Index is a BidirectionalIndexType while String.UTF8View.Index is only a ForwardIndexType. At this point, I realized that I basically needed to re-write a significant portion of my algorithm. I did so, making sure that all of my previous() calls were updated; this also required some fairly ugly hacks to get everything working. Then I found out two new things after more investigation into the topic:
    1. Performance of the GeneratorType construct was significantly faster than the Index based construct.
    2. There is a better view into the string: String.UnicodeScalarView. With the String.UTF8View, I had to create strings by passing a pointer to a UInt8 array that I had to keep track of while parsing the string. It was fairly ugly, but it worked. =)

Both of these led me to the realization that another parser re-write was coming… however, this time I knew I needed to use GeneratorType, and I knew that I wanted to get rid of a lot of the hacks I had made. This was the start of the Generators Need a current Value and Improving Code with Generics posts.

Well, I was able to get rid of some of my hacks, but then Wes’ comments came. I already wasn’t very pleased with the implementation of the JSON parser as it still had some hacks in it and some somewhat cryptic logic, but hey, it worked! But as I thought about Wes’ comments some more, I knew there was a better way.

So I started integrating Wes’ solution into my parsing code. But, I had already forgotten a lesson I had learned earlier: Index based approaches suck at perf, big time!

At this point, I had already re-written the parsing to provide some significantly better error messages (thanks in part to using for (idx, scalar) in enumerate(generator) {}, which was now possible due to Wes’ updates) and a much cleaner logic flow. However, I wanted to get my performance back down.

That’s when I came up with this class: ReplayableGenerator

final public class ReplayableGenerator<S: SequenceType> : GeneratorType, SequenceType {
    typealias Sequence = S

    private var firstRun = true
    private var usePrevious = false
    private var previousElement: Sequence.Generator.Element? = nil
    private var generator: Sequence.Generator

    public init(_ sequence: Sequence) {
        self.generator = sequence.generate()
    }

    public func next() -> Sequence.Generator.Element? {
        switch usePrevious {
        case true:
            usePrevious = false
            return previousElement

        default:
            previousElement = generator.next()
            return previousElement
        }
    }

    public func replay() {
        usePrevious = true
        return
    }

    public func generate() -> ReplayableGenerator {
        switch firstRun {
        case true:
            firstRun = false
            return self

        default:
            self.replay()
            return self
        }
    }

    public func atEnd() -> Bool {
        let element = next()
        replay()

        return element == nil
    }
}

I’ve been experimenting with using switch-statements over if-statements; I’m greatly liking their readability in many cases. However, there does seem to be a bug where case true and case false do not create an exhaustive list, so I use default.

: .info

These were the constraints:

  1. Index-based iterators and lookups are significantly slower than GeneratorType with a for-in loop; they cannot be used.
  2. The GeneratorType is only a forward-moving iterator.
  3. There is no ability to inspect the previous character in the construct. This is vital because when we parse values, oftentimes we need to inspect the next value to determine whether we should stop parsing the current value. However, once we do that, we are in a bit of a situation: the parser really needs to start parsing from that previous character, because otherwise it’s going to call next() and skip over the just-visited character. Bad mojo.

This class provided everything I needed, while the semantics of it also allowed me to create a much better parse(). The integration was also easy as I simply needed to replace the previous() calls with a replay() call.
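To illustrate the replay() semantics in isolation, here is a minimal, array-backed sketch; Replayable is a hypothetical stand-in for ReplayableGenerator, not the real class:

```swift
// Minimal replay-able iterator over an array, mirroring the semantics
// of ReplayableGenerator without the generic sequence machinery.
final class Replayable<Element> {
    private let elements: [Element]
    private var index = 0
    private var usePrevious = false
    private var previous: Element? = nil

    init(_ elements: [Element]) { self.elements = elements }

    func next() -> Element? {
        if usePrevious {
            usePrevious = false
            return previous
        }
        previous = index < elements.count ? elements[index] : nil
        index += 1
        return previous
    }

    // Make the next call to next() return the last element again.
    func replay() { usePrevious = true }
}

let scanner = Replayable([1, 2, 3])
assert(scanner.next() == 1)
assert(scanner.next() == 2)
scanner.replay()
assert(scanner.next() == 2)   // the same value is handed out once more
assert(scanner.next() == 3)
assert(scanner.next() == nil)
```

This is exactly the pattern the parser needs: peek at a scalar, decide the current value is finished, then replay() so the next parse call sees that scalar again.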

With this implementation, I was able to get my performance back down to 0.25s vs. 0.17s (JSON.parse vs. NSJSONSerialization).

Remember, oftentimes people are able to look at a problem you have been working on and shed new light on the situation. While Wes’ solution was not directly applicable to my situation, his thought process on why his implementation was better was superbly helpful in rethinking the semantics of what I was doing. Ultimately, I’m fairly happy with the results of the parser now… except for that perf! =)

So thanks Wes for helping me think about the problem better. Oh, and you can judge my parsing code here: JSValue.Parsing.


Improving Code with Generics

Update: I updated the post to make use of S: SequenceType instead of T: GeneratorType; it's a cleaner API.

: .info

Yesterday, I wrote about how we needed to build the following class:

struct UnicodeScalarParsingBuffer {
    var generator: String.UnicodeScalarView.Generator
    var current: UnicodeScalar? = nil

    init(_ generator: String.UnicodeScalarView.Generator) {
        self.generator = generator
    }

    mutating func next() -> UnicodeScalar? {
        self.current = generator.next()
        return self.current
    }
}

When we look at the code above, we can observe a few things:

  1. The code is tightly coupled to String.UnicodeScalarView.Generator
  2. The code is tightly coupled to UnicodeScalar
  3. The code loosely conforms to GeneratorType

We can make this code better and more suitable for other instances of GeneratorType; or to put it another way, generic.

Let's start with bullet #3: we should conform to the GeneratorType protocol because this really is simply another type of generator.

The definition starts to take shape like this:

// A first sketch; this doesn't compile as-is, since GeneratorType has an
// associated Element type and cannot be used as a bare property type.
struct BufferedGenerator : GeneratorType {
    var generator: GeneratorType
    mutating func next() -> UnicodeScalar?
}

Bullets #1 and #2 are two sides of the same coin, as Generator and Generator.Element are really defined from the same construct.

The interface now looks more like this:

struct BufferedGenerator<S: SequenceType> : GeneratorType {
    typealias Sequence = S

    var generator: Sequence.Generator
    var current: Sequence.Generator.Element? = nil

    init(_ sequence: Sequence) {
        self.generator = sequence.generate()
    }

    mutating func next() -> Sequence.Generator.Element? {
        self.current = generator.next()
        return self.current
    }
}

This implementation now lets us use any type of SequenceType as a BufferedGenerator.

We use SequenceType as the generic constraint instead of GeneratorType because it creates a better ownership model for the underlying generator. The call to next() should only be done from a single generator; this code puts that burden on BufferedGenerator<S> instead of the caller.

: .info
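The current-tracking idea can also be shown in a stripped-down, array-backed sketch; Buffered below is a hypothetical stand-in for BufferedGenerator<S>:

```swift
// Wraps an array and keeps the last-returned element available as `current`.
struct Buffered<Element> {
    private let elements: [Element]
    private var index = 0
    var current: Element? = nil

    init(_ elements: [Element]) { self.elements = elements }

    mutating func next() -> Element? {
        current = index < elements.count ? elements[index] : nil
        index += 1
        return current
    }
}

var buffer = Buffered([10, 20])
assert(buffer.current == nil)   // nothing consumed yet
assert(buffer.next() == 10)
assert(buffer.current == 10)    // the last value sticks around
assert(buffer.next() == 20)
assert(buffer.next() == nil)
assert(buffer.current == nil)
```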

Generics can be a great way to reduce type information that simply doesn't need to be there. In this case, there was no reason that the original UnicodeScalarParsingBuffer needed to be tied to a specific type. Generics can also help greatly in code reuse, which is almost always a good thing.

The full source for the json-swift library can be found over on GitHub.


Generators Need a current value

When you build a parser, you need the ability to scan through your tokens to build up your output. Swift offers us a few different constructs for iteration; I needed the one that would be the fastest, after all, I'm building a parser!

I wrote a small test suite to test the various iteration types, which essentially boils down to two options for Strings:

  1. The traditional for-loop that uses an index value
  2. The GeneratorType based approach

Index-based for-loop

Here's the code for this one:

var string = ""

let scalars = self.largeJSON.unicodeScalars
self.measureBlock() {
    for var idx = scalars.startIndex; idx < scalars.endIndex; idx = idx.successor() {
        let scalar = scalars[idx]
        scalar.writeTo(&string)
    }
}

It's pretty straightforward: simply start at startIndex and traverse your way through the string until you hit endIndex. There are a couple of gotchas though, the most significant being that Swift doesn't allow Int-based indexing – all of the types have their own special indexing type. The other thing to watch out for: not all of them are bidirectional.

GeneratorType Approach

var string = ""

self.measureBlock() {
    for scalar in self.largeJSON.unicodeScalars {
        scalar.writeTo(&string)
    }
}

This one is fairly simple as well: simply loop through all of the unicode values. We can also write this loop in a slightly different way:

var string = ""

self.measureBlock() {
    var generator = self.largeJSON.unicodeScalars.generate()
    for var scalar = generator.next(); scalar != nil; scalar = generator.next() {
        scalar?.writeTo(&string)
    }
}

In my testing, I found that the GeneratorType-based approach was about 18% faster. This is significant enough for me to use it. =)

Implementing the Parsing

Next up is actually parsing the JSON string. The basic idea is to look for specific tokens and call into one of these methods:

  1. parseObject – used to parse out a JSON object (e.g. dictionary)
  2. parseArray – used to parse an array
  3. parseNumber – used to parse a number value
  4. parseString – used to parse out a string, also used when parsing keys from a dictionary
  5. parseTrue – used to parse the boolean value true
  6. parseFalse – used to parse the boolean value false
  7. parseNull – used to parse the literal value null

I think that about covers the basics of what I need. And here comes the problem… each of these needs one or more pieces of information:

  1. The current generator value so that increments can be done
  2. The current character the generator is pointing to
  3. The character used at the start of the parse call

When we look at the API for GeneratorType, we find that it only supports next(). Hmm… that's not going to be sufficient. So now we are left with two choices:

  1. Pass the current unicode token around with our generator instance, or
  2. Package up the generator and the current unicode token into a single class

To me, this is a no-brainer. As soon as we introduce this coupling, it is best to package up the dependencies and maintain that state with a single value.

Ideally, we would simply be able to extend the GeneratorType instance for String.UnicodeScalarView; however, we cannot extend types with stored properties, so we are left with creating an entirely new type to box this functionality.

: .info

To work around this limitation, I created the following type:

struct UnicodeScalarParsingBuffer {
    var generator: String.UnicodeScalarView.Generator
    var current: UnicodeScalar? = nil

    init(_ generator: String.UnicodeScalarView.Generator) {
        self.generator = generator
    }

    mutating func next() -> UnicodeScalar? {
        self.current = generator.next()
        return self.current
    }
}

I find this to be a deficiency in the current implementation of GeneratorType. While it may be the case that you are always working in the same scope, it is sometimes necessary to pass this context around. Once you start doing that, you're going to need that current value; otherwise you need to pass both the generator and the current value around – no one really wants to do that.

The full source for the json-swift library can be found over on GitHub; the parsing code is here.


Combining Characters

In my JSON Parsing post, I talked about an issue I was having with a particular character set:

let string = "\"\u{aaef}abcd"
countElements(string)           // 5
countElements(string.utf8)      // 8

Well, it turns out that \u{aaef} is a unicode combining character that modifies the character before it. Some combining characters combine into a single visible character, but there are also combining characters that still result in multiple visible characters, as seen above.

However, it turns out there is a view into the string that gives me what I want:

let string = "\"\u{aaef}abcd"
countElements(string.unicodeScalars)     // 6

If we take a look at a few other examples, we can see that unicodeScalars gives us the full make-up of the unicode values that make up the string.

let single = "è"    // \u{e8}
for scalar in single.unicodeScalars {
    println("\(scalar) (\(scalar.value))")      // prints: è
}

let combined = "e\u{300}"
for scalar in combined.unicodeScalars {
    println("\(scalar) (\(scalar.value))")      // prints: e, `
}

Notice the difference between the two: the first is a single unicode value; the second is the letter "e" combined with the combining grave accent (`).
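One related detail worth noting (easy to verify in a playground): Swift compares Strings by canonical equivalence, so the two spellings above are equal as Strings even though their scalar views differ.

```swift
let single = "\u{e8}"       // è as one precomposed scalar
let combined = "e\u{300}"   // e + combining grave accent

assert(single == combined)                          // equal as Strings
assert(Array(single.unicodeScalars).count == 1)     // one scalar
assert(Array(combined.unicodeScalars).count == 2)   // two scalars
```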

The only downside that I've run into with this approach is that it seems to be significantly slower than the UTF8-based approach I was using earlier.


JSON Parsing

As part of my diving into Swift, I've been using JSON as one of my learning projects: json-swift. Continuing on that track, I took a look at what it would take to create a JSON parser using only Swift and no ObjC bridging. The results: OK, but lots of room for improvement.

One of the nice things about Swift is that it tries to abstract away all of the unicode information from you and create a nice, simple API for you to work with. Well, that's nice when it works, but there are cases that I ran into where it seemed to simply not be doing what I expected.

One such example:

let string = "\"\u{aaef}abcd"
countElements(string)           // 5
countElements(string.utf8)      // 8

// The raw bytes:
34      // "
234     // makes up \u{aaef}
171
175
97      // a
98      // b
99      // c
100     // d
In case that character isn't showing up for you: it's a unicode character, though I'm not sure of its name.

I do not know enough about unicode, so I don't know all of the ins and outs of why the " is attached to the unicode character (probably something to do with it being only three bytes), so some of you might be saying: duh! That's ok. =)

: .info

That's not the worst part though: evidently the value \u{aaef} forms a single character that is treated somewhat like a quote, so you have to escape it, hence: \"\u{aaef}abcd.

I really wanted to try the String class out, so I had to use String.UTF8View as the mapping, compare each of the bytes, and build up my strings manually using UnsafePointer and String.fromCString:

static func parseString(string: String.UTF8View, inout startAt index: String.UTF8View.Index, quote: UInt8) -> FailableOf<JSValue> {
    var bytes = [UInt8]()

    index = index.successor()
    for ; index != string.endIndex; index = index.successor() {
        let cu = string[index]
        if cu == quote {
            // Determine if the quote is being escaped or not...
            var count = 0
            for byte in reverse(bytes) {
                if byte == Token.Backslash.toRaw() { count++ }
                else { break }
            }

            if count % 2 == 0 {     // an even number means matched slashes, not an escape
                index = index.successor()

                bytes.append(0)
                let ptr = UnsafePointer<CChar>(bytes)
                return FailableOf(JSValue(JSBackingValue.JSString(String.fromCString(ptr)!)))
            }
            else {
                bytes.append(cu)
            }
        }
        else {
            bytes.append(cu)
        }
    }

    let info = [
        ErrorKeys.LocalizedDescription: ErrorCode.ParsingError.message,
        ErrorKeys.LocalizedFailureReason: "Error parsing JSON string."]
    return FailableOf(Error(code: ErrorCode.ParsingError, domain: JSValueErrorDomain, userInfo: info))
}
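The escape check buried in the middle of parseString can be pulled out into a small helper to make the rule explicit; isEscaped is a hypothetical extraction, not part of the real code (and it uses reversed() where the code above uses reverse()):

```swift
// A closing quote is escaped only when preceded by an odd number of
// backslashes; an even count means the backslashes escape each other.
func isEscaped(bytes: [UInt8]) -> Bool {
    var count = 0
    for byte in bytes.reversed() {    // count trailing backslashes
        if byte == 0x5C { count += 1 } else { break }
    }
    return count % 2 == 1
}

assert(isEscaped(bytes: Array("ab\\".utf8)) == true)     // ab\  -> quote escaped
assert(isEscaped(bytes: Array("ab\\\\".utf8)) == false)  // ab\\ -> real closing quote
assert(isEscaped(bytes: Array("ab".utf8)) == false)
```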

Since I do not know enough about unicode, I'm not sure if these are Swift bugs, limitations, or simply a lack in my own understanding.

Some of the problems I ran into:

  1. Unable to take a character from String and figure out the bytes that made it, so I have to use String.UTF8View.
  2. The String.UTF8View.Index is not Comparable, though it is Equatable; I was doing index < endIndex initially.
  3. The String.UTF8View.Index is forward indexing only; originally, part of my algorithm would also step backwards through part of the string.
  4. Performance of my algorithm is about 3x slower than the NSJSONSerialization algorithm: 0.4s for mine vs. 0.12s for NSJSONSerialization to parse a roughly 688KB file. I still need to investigate the memory usage.

I'm going to keep playing with it, especially as the later betas come out. I may try a purely functional based algorithm as well and see how that plays out in Swift.


Implicit Chaining and Context

If you’ve been following along, I’ve been struggling with the optional chaining syntax.

The Swift team has been fairly active in helping people with their troubles and confusion, and Chris Lattner responded to mine here (login required).

I won’t quote all of it (mainly because I don’t think I’m supposed to), but the gist of it was this: if Swift supported implicit optional chaining, then it would be unclear which code was executed in the chain and which was not (this is my paraphrase, not Chris’ exact words).

An example:

No implicit chaining

my.delegate?.can().call()?.some(stuff())
foo.bar(baz())

With implicit chaining

my.delegate.can().call().some(stuff())
foo.bar(baz())

Chris’ argument is that with implicit chaining, the code above is ambiguous. In line #1, it’s explicit that there are many potential breaks along the chain, so it’s clear that everything to the right of a ? has the potential of not being called. For line #2, it’s clear there are no breaks. However, with implicit chaining, lines #1 and #2 no longer carry that information, and it’s unclear if stuff() or baz() are ever called.

I understand his point; after all, it is clear (in any editing/reading environment) that there are multiple failure (or, maybe more accurately, short-circuiting) points along the way when we use ?. But it still didn’t sit right with me.

I kept asking myself why… and I think I know why it didn’t sit well: the example is void of any and all context that could potentially already answer that question for us.

For example, if we know that Swift supports implicit optional chaining, then we already know that member lookups can potentially fail. Now, you might argue that I’ve simply made every member lookup ambiguous. But I don’t think that is the case either.

You see, the code samples above are all taken out of context. Code doesn’t live in isolation, but it participates in the context around it.

Here’s a snippet of code from my JSON tests:

func testValidateSingleValueNumberUsagePatternOptionalChaining() {
    var json: JSValue = 123

    let value = json.number?.distanceTo(100) ?? 0
    XCTAssertEqual(value, -23)
}

If I had written:

func testValidateSingleValueNumberUsagePatternOptionalChaining() {
    var json: JSValue = 123

    let value = json.number.distanceTo(100) ?? 0
    XCTAssertEqual(value, -23)
}

Or even:

func testValidateSingleValueNumberUsagePatternOptionalChaining() {
    var json: JSValue = 123

    let value = json.number.distanceTo(distanceToMoon()) ?? 0
    XCTAssertEqual(value, distanceToMoon() - 123)
}

The code carries no loss in value. Even without the aid of coding tools, the ?? operator already tells me I’m working with a left-hand side that is an Optional<T> and a right-hand side of type T. Context is also how I know that the type of value is Double (not an Int like you might be expecting): JSValue.number holds a Double?.
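That type relationship around ?? is easy to see in a tiny sketch; maybeNumber here is a made-up stand-in for JSValue.number:

```swift
let maybeNumber: Double? = nil

// Left of ?? is Double?, right is Double, and the whole expression is Double.
let value: Double = maybeNumber ?? 0
assert(value == 0)

let present: Double? = 123
assert((present ?? 0) == 123)
```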

I think it’s easy to take a code snippet, or worse, make up a line of code that has no intrinsic meaning, and make a good case for why it has the potential to cause ambiguity. My argument is not that it’s impossible to write ambiguous code with implicit chaining, but rather that the context of your code ensures that the ambiguity is seldom, if ever, there.

In the end, I think it simply comes down to this:

Writing code is like writing in any language – oftentimes we have constructs and words that look the same but are different, such as “I read a book yesterday” and “I read every morning”. The usage of “read”, when taken out of the context of its environment, carries too little information to reveal its full meaning. However, once we provide the surrounding context, the meaning is made explicitly clear.

As a corollary, we shouldn’t then say that because I can write “I read in the morning”, the usage of “read” should not be allowed because it creates ambiguity: do I read every morning, or did I read this morning? Rather, we say that the sentence is ambiguous when taken out of its full context, and we should either add more context or, if the statement is meant to stand on its own, rephrase it to disambiguate the meaning.

I think pragmatically, the use of Optional<T> is the same: context reveals the explicit meaning of the code.


The Case for Implicit Optional Chaining

Update #2: Chris Lattner has stated this isn't a good idea because it can lead to ambiguity in code that may have side effects, such as foo.bar(baz()). The question is: does baz() get called or not? With foo?.bar(baz()) it is clear that there is a potential for baz() to not get called.

I understand what he is saying, though I'm not sure I'm in agreement yet. Anyhow, it's nice to have this dialogue with the fine folks on the Swift team.

: .info
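To make the side-effect concern concrete, here's a minimal sketch of Chris's example (Foo and its bar method are hypothetical stand-ins; a counter is used instead of a print so the effect is observable):

```swift
class Foo {
    func bar(x: Int) {}
}

var bazCalls = 0
func baz() -> Int {
    bazCalls += 1        // a visible side effect
    return 1
}

let foo: Foo? = nil

// With explicit chaining, the `?` flags that the whole expression –
// including the evaluation of baz() – is skipped when foo is nil:
foo?.bar(baz())

// bazCalls is still 0 here: baz() never ran. With implicit chaining,
// foo.bar(baz()) would read like an unconditional call, hiding the
// fact that baz() might not execute.
```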

I've been wrestling with the verbosity of Optional Chaining for quite some time now. It started with my first JSON Parsing article and really hasn't gone away since then. Much of the work there was done to explicitly hide the fact that a lookup could fail. Why do we need that? The language already has a construct to help us with this: Optionals.

Here's the example code that's essentially in the Swift Programming Language guide linked above:

class Residence {
    var numberOfRooms: Int = 1
}

class Person {
    var residence: Residence?
}

var john = Person()
// john.residence = Residence()  // uncomment this to toggle which if-else branch you get

if let roomCount = john.residence?.numberOfRooms {
    println("John's residence has \(roomCount) room(s).")
} else {
    println("Unable to retrieve the number of rooms.")
}

In the above, the lone ? isn't that big of a deal. Though, I do think it's superfluous.

Let's get the cat out of the bag: Swift is all about type inference. We can seldom look at code out of context and reason about the type that is stored in any given variable. For instance, what is the type of numberOfRooms? It looks like an Int, but it could be an Int?, a Double, a UInt, maybe an Int8, or whatever.

There is only one way to know: look at the definition.

A handy way to do that: ⌥ + Click

Screenshot showing the inferred type of 'let roomCount' is Int?

However, I'm also going to assert this: we do know the type while we are authoring the code. We know because we are going to use it and we need to know. This is subtly different from the above but just as important. The context we are in gives us clarity over what it is.

This is what I think the code should look like:

if let roomCount = john.residence.numberOfRooms {
    println("John's residence has \(roomCount) room(s).")
} else {
    println("Unable to retrieve the number of rooms.")
}

That's it: just remove the ?. Wait, doesn't that add confusion? How did numberOfRooms end up as an Int? requiring us to use if-let? Well, we know because at the time of authoring, we knew that residence is backed by Residence?. When we come back to modify the code, it doesn't matter how roomCount became an Int?; it only matters that it is one and we need to work with it as one.

The only time you will care why roomCount became an Int? is when you need to modify that variable itself, but then you are going to need to understand the entire chain john.residence.numberOfRooms at that point anyway. We've actually lost very little here and gained clarity in syntax.

Let's change the example above to contain a little richer data:

class RoomInfo {
    var description: String = ""
    var width: Double = 0
    var depth: Double = 0
}

class Residence {
    var rooms: [RoomInfo]? = nil
}

class Person {
    var residence: Residence?
}

var john = Person()

if let roomCount = john.residence?.rooms?.count {
    println("John's residence has \(roomCount) room(s).")
} else {
    println("Unable to retrieve the number of rooms.")
}

This is where things are starting to get really ugly for me. Why are there two ?s? Yes, I know that both residence and rooms are an Optional<T>, but I don't really care. The purpose of the code is this: retrieve the number of rooms at John's residence. The ?s do not help me get there. Further, I'll add that if you really care about the ?, you should be checking which of the items, residence or rooms, is nil, but the simple truth is this: we do not actually care; we only care about the end result of the chain.

So, instead, simply make the code what we want:

if let roomCount = john.residence.rooms.count {
    println("John's residence has \(roomCount) room(s).")
} else {
    println("Unable to retrieve the number of rooms.")
}

The code above tells me one simple truth: when I get a roomCount, it's going to be an Optional<T> because something down the chain could have failed; I do not care what failed, just that it failed, and I'm going to handle that.

The code with the ?. tells me something different. It tells me that on your way to roomCount the members residence or rooms could have returned nil. I can get that info from the type declaration too, if I really wanted it. Don't make me repeat myself, especially when I'm doing so and it is not even important enough for me to do something about.

Update: August 13th, 2014

: .info

There was an interesting question asked about Optional<T> methods and extensions that collide with the type T. Here's an example (current Swift syntax; no implicit optional chaining):

extension Optional: Printable {
    public var description: String {
        return "a message for you"
    }
}

// Add the extension for String:
extension String: Printable {
    public var description: String {
        return "printable: string"
    }
}

let string: String? = nil
let s1 = string.description     // s1: String
let s2 = string?.description    // s2: String?

Today that is not ambiguous; we know which one to call. With implicit chaining:

let string: String? = nil
let s1 = string.description

So, do we call the description on String or on Optional<T>?

I'm going to assert the following:

  1. We want to optimize for the non-Optional<T>; after all, that's why I want implicit chaining.
  2. The Optional<T> is already a special construct in the language and I'm ok with adding more rules in certain cases for when we need to deal with it directly as I think those cases are the exception, not the rule.

This means that I want the String.description version called.

If we really want the Optional<T> version, we have a way to do so:

let s1 = (string as Optional<String>).description

In the rare case where there is a collision, the compiler could do the following:

  1. Create an error or warning (I'd prefer a warning) that there is ambiguity, and
  2. Provide two "fix-it" options for the two valid cases for you to disambiguate, or
  3. Do nothing… I tend to think this is actually the right answer.

We could also (as mentioned by Wallacy Freitas on the devforums) simply invert the ? usage to treat it as explicitly working with the optional:

let s1 = string?.description

Again, I think this is simply an explicit example of the rare case. With the ? today, we need to always defend against this possibility. I would rather optimize for the normal case and provide a way around the corner case.


Fixed Enum Layout Sizes

WARNING: Turns out there are some nasty side-effects, see the update below.

: .warning

If you have ever attempted to create a generic enum, you have most certainly encountered the following error message:

error: unimplemented IR generation feature non-fixed multi-payload enum layout

: .warning

Yikes! Well, it turns out that Swift (whether by design or as a current limitation) needs to know the full layout size of the enum at compile time. So code like this:

enum FailableOf<T> {
  case Success(T)
  case Failure(Error)

  init(_ value: T) {
    self = .Success(value)
  }

  init(_ error: Error) {
    self = .Failure(error)
  }

  var failed: Bool { /* ... */ }
  var error: Error? { /* ... */ }

  var value: T? {
    switch self {
    case .Success(let value):
      return value

    default:
      return nil
    }
  }
}

is just not possible to write. I had worked around this by using a wrapper class, FailableWrapper<T> (see Error Handling in Swift).
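For context, the wrapper-class workaround boxes the generic payload in a reference type; since a class instance is always a fixed-size pointer, the enum's layout becomes known at compile time. This is a sketch of the shape of that workaround, not the exact code from the earlier post:

```swift
// Stand-in for the author's Error struct (the real one has three
// fields and is defined in the earlier error-handling post):
public struct Error {
    public let message: String
    public init(message: String) { self.message = message }
}

// Boxing the payload in a class gives the enum a fixed-size
// (pointer-sized) payload, sidestepping the non-fixed layout error.
public class FailableWrapper<T> {
    public let value: T
    public init(_ value: T) { self.value = value }
}

public enum FailableOf<T> {
    case Success(FailableWrapper<T>)
    case Failure(Error)

    public init(_ value: T) {
        self = .Success(FailableWrapper(value))
    }

    public var value: T? {
        switch self {
        case .Success(let wrapper):
            return wrapper.value

        default:
            return nil
        }
    }
}
```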

However, Rob Napier tweeted a much more elegant solution.

When I saw that, I thought, "well duh!". So much better than my workaround.

Here's the source for my full FailableOf<T> class; no more need for the wrapper class.

public enum FailableOf<T> {
    case Success(@autoclosure () -> T)
    case Failure(Error)

    public init(_ value: T) {
        self = .Success(value)
    }

    public init(_ error: Error) {
        self = .Failure(error)
    }

    public var failed: Bool {
        switch self {
        case .Failure(_):
            return true

        default:
            return false
        }
    }

    public var error: Error? {
        switch self {
        case .Failure(let error):
            return error

        default:
            return nil
        }
    }

    public var value: T? {
        switch self {
        case .Success(let value):
            return value()

        default:
            return nil
        }
    }
}
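Consuming a FailableOf<T> looks the same as it did with the wrapper-based version; the autoclosure is purely an implementation detail hidden behind the value property:

```swift
let parsed = FailableOf(42)

if parsed.failed {
    println("error: \(parsed.error)")
} else {
    // value invokes the stored autoclosure to produce the wrapped Int
    println("parsed: \(parsed.value!)")   // prints "parsed: 42"
}
```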

UPDATE August 6th: Thanks to John Vasileff for pointing out what should have been an obvious design flaw with this approach.

: .info

Yep… he's right. =)
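The tweet itself isn't preserved here, but one problem inherent in the @autoclosure payload is easy to demonstrate (this may or may not be the exact flaw John pointed out): constructing the case directly stores an unevaluated expression, so any side effects replay every time the value is read.

```swift
var evaluations = 0
func expensiveValue() -> Int {
    evaluations += 1   // a side effect we can count
    return 42
}

// Constructing the case directly wraps the *expression*, not its result:
let result = FailableOf<Int>.Success(expensiveValue())

let a = result.value   // runs expensiveValue()
let b = result.value   // runs it again
// evaluations is now 2, not 1 – the payload is recomputed on every
// access, repeating whatever side effects the expression had.
```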

Back to my wrapper class for now.
