The Problem with Enums

In a bit of back and forth on Twitter about tuples, structs, and enums, Chris wrote up a good overview of why you might use one or the other. Check it out here: Tuples, Structs and Enums. It's a good read.

Here's my dirty secret: I really dislike enums for exactly the reason they exist. The entire purpose of an enum is to solve problems like the one Chris mentions: "In the previous example, we've used String to use the currency, but what if we want to restrict our program to only currencies we know about?".

I emphasized the problem, well, my problem with enums: a complete lack of extensibility in the case options.

If you build a library that handles currencies and model them with enums, you have forever locked your users into using the currencies you have explicitly allowed in your enum definition.

Take Chris' example:

enum Currency {
   case EUR
   case USD
   case YEN
}

If you, as the consumer of the library, want to add CAD to the set of supported currencies, you are out of luck.

You can get the source code and make changes, sure. However, then you need to maintain two different versions unless you can get that change merged back upstream, which is not always possible.

Whenever you make an enum, ask yourself the question: do I need to constrain this to a specific set of possibilities (Either is a good example), or do I need to constrain it to a certain class of possibilities? In the latter case, it might actually be more appropriate to use a protocol instead.

An example of that would be the following:

private let usd = USD()
private let eur = EUR()
private let yen = YEN()

protocol CurrencyType {
    class var currency: Self { get }
    class var symbol: String { get }
}

final class USD : CurrencyType {
    class var currency: USD { return usd }
    class var symbol: String { return "$" }
}

final class EUR : CurrencyType {
    class var currency: EUR { return eur }
    class var symbol: String { return "€" }
}

final class YEN : CurrencyType {
    class var currency: YEN { return yen }
    class var symbol: String { return "¥" }
}

typealias Currency = (value: Int, currency: CurrencyType)     // or as a struct

let fiveDollars = Currency(5, USD.currency)

There are many other ways to model what I did above; this is just an illustrative example.
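For instance, the protocol-based design lets a consumer of the library add CAD entirely in their own module, building on the CurrencyType and Currency definitions above; the library itself never needs to change (a hypothetical sketch):

```swift
// Hypothetical: defined in the *consumer's* module, not the library.
// CAD conforms to CurrencyType just like the library's own currencies.
private let cad = CAD()

final class CAD : CurrencyType {
    class var currency: CAD { return cad }
    class var symbol: String { return "$" }
}

// Works with the library's Currency typealias, no library changes needed.
let price = Currency(20, CAD.currency)
println("\(price.currency.symbol)\(price.value)")
```

This is exactly the extensibility an enum forecloses: a new conforming type is always possible, a new case is not.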

I really wish Swift had solved the problem of case extensions for enums. It would help in those cases where we really do want a nice set of possible values, but we would also like others to be able to extend those cases in their own code.


Tuples Are The New Struct – Revisited

Yesterday I wrote about using tuples in place of your dumb data types. However, my example used a class of problem that is typically modeled using an Either<T,U> type. That, understandably, added some confusion that was not intended.

Today, let's instead take a look at an example that is hopefully a little less contentious: the Point type (just two-dimensional).

At a quick glance, I see the following as ways in which we might model the Point type as a tuple, struct, class, or even as an enum.

Here are some sample uses for each of them:

Tuple

typealias Point = (x: Int, y: Int)

let p = Point(2, 6)                   // Point(x: 2, y: 6) is also valid
println("(x, y) = (\(p.x), \(p.y))")

Struct

struct Point {
    var x: Int
    var y: Int
}

let p = Point(x: 2, y: 3)
println("(x, y) = (\(p.x), \(p.y))")

Class

class Point {
    var x: Int
    var y: Int

    init(x: Int, y: Int) {
        self.x = x
        self.y = y
    }
}

let p = Point(x: 2, y: 3)
println("(x, y) = (\(p.x), \(p.y))")

Enum

enum Point {
    case Components(Int, Int)
}

let p = Point.Components(2, 3)
switch p {
case let .Components(comp):
    println("(x, y) = (\(comp.0), \(comp.1))")
}

Now, each of the above approaches has its positives and negatives. However, to me, the tuple has all of the right behavior out of the box. The enum-based approach is the most verbose, and I'm unclear on any distinct advantage it has over the tuple and the struct/class options.
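One example of that out-of-the-box behavior: a tuple-based Point can be destructured and pattern-matched directly, with no extra code on the type (repeating the typealias so the snippet stands alone):

```swift
typealias Point = (x: Int, y: Int)

let p = Point(2, 6)

// Destructuring comes for free with tuples.
let (x, y) = p
println("(x, y) = (\(x), \(y))")

// So does pattern matching in a switch.
switch p {
case (0, 0):
    println("at the origin")
case (_, 0):
    println("on the x-axis")
default:
    println("somewhere else")
}
```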

If you're anything like me, you might tend to write your code in stages:

  1. There's the initial prototyping and scaffolding to make sure your thoughts apply to code.
  2. Then the roughing in with types and better names.
  3. Finally we get to the fleshed out public API surface.

Along the way there is a lot of back and forth between the stages. I tend to start with the least amount of code so that it's easier to throw away. So, in fleshing out my API, the Point might remain a tuple throughout all of the stages.

Also, by starting with a tuple, I have to specifically ask myself the question: do I really need to add this function or private data here? Is there a better way to model this?

Don't forget about the tuple. Of course, your mileage may vary.


Tuples Are The New Struct

I published a "revised" look at this that uses a less controversial example as people were getting stuck on this example being an Either<T,U> (enum based) instead of looking at structs vs. tuples.

: .info

I've been playing around with using named tuples instead of structs for pure data types. One such use case was in returning errors from a function.

Let's say we have a function foo and we want it to return an Int or an Error?. There are lots of ways to model this: structs, enums, tuples, inout parameters, global error state, etc…

Of course, each has their positives and negatives. However, I want to look at what the difference of the struct and the tuple implementation looks like.

The function definition will look like this:

func foo() -> IntOrError { /* ... */ }

The struct would look like this (Error is my own custom error type, you could use NSError too):

struct IntOrError {
    let value: Int
    let error: Error?
}

The tuple would look like this:

typealias IntOrError = (value: Int, error: Error?)

Usage of the two looks exactly the same:

let result = foo()
if let error = result.error {
    println("Uh oh! An error occurred")
}
else {
    println("The value is: \(result.value)")
}

So should you use tuples? Well, I don't know. =) There are two big disadvantages of the tuple approach:

  1. No generics support; I cannot create: typealias ErrorOf<T> = (value: T, error: Error?). I consider this a deficiency in the generics system of Swift though.
  2. No ability to add functionality to the type itself; essentially no OOP-style programming. This also extends to specific initializers.
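To illustrate the first limitation: since a generic typealias is not expressible, the generic version has to be written as a struct (or class) instead. A sketch of the workaround, using NSError in place of the custom Error type for brevity:

```swift
import Foundation

// The generic typealias we would like:
//     typealias ErrorOf<T> = (value: T, error: Error?)
// is not expressible, but a generic struct gives us the same shape.
struct ErrorOf<T> {
    let value: T
    let error: NSError?   // NSError stands in for the custom Error type
}

let ok = ErrorOf(value: 42, error: nil)
println("value: \(ok.value)")
```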

However, I see some benefits to the tuple approach too:

  1. Much faster prototyping while maintaining good readability of code
  2. Currently (as of Beta 6), tuples as return types perform much better than structs. This should get better though. Update: I think the root cause of this was due to rdar://18111139: Swift: Optimizer Perf Bug with inline/external class definitions.
  3. "Upgrading" to a full-blown struct requires only updating the definition of your code (beware, this would be a breaking change for code linking your code though).
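Point 3 in practice: swapping the typealias for a struct with the same member names leaves the usage code untouched. A sketch (NSError again stands in for the custom Error type):

```swift
import Foundation

// Before: typealias IntOrError = (value: Int, error: Error?)
// After: a struct with the same member names — the call sites below don't change.
struct IntOrError {
    let value: Int
    let error: NSError?
}

func foo() -> IntOrError {
    return IntOrError(value: 7, error: nil)
}

// Identical to the tuple-based usage shown earlier.
let result = foo()
if let error = result.error {
    println("Uh oh! An error occurred")
}
else {
    println("The value is: \(result.value)")
}
```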

If all you need is a dumb data type, try out the named tuple!


Error Handling – Take Two

Make sure to see the update below for a bit more information on the causes of the memory usage.

In my seemingly never ending and not quite achievable goal of beating NSJSONSerialization in both performance and memory utilization for parsing a JSON string, I've come across another pearl of wisdom with regards to Swift: ignore my Error Handling in Swift piece and others that recommend using Either<T,U> as in other languages (at least for the current version of Swift, as of Beta 6).

I have been able to get my parsing speed to within 0.01s of NSJSONSerialization; while my goal is domination, I also am pragmatic (at times). Next up was memory utilization. Unfortunately, I was (and still am), far behind the total memory usage of the ObjC version. So like a good little software engineer, I fired up Instruments and started investigating what I saw.

When you investigate memory usage, there are three primary concerns that we need to watch out for:

  1. Total amount of memory used over the life of the scenario
  2. Total amount of memory ever actually in use at any given time
  3. Highest spike in memory used over the life of the scenario

Instruments visualizes this data pretty nicely for us:

screenshot of instruments with multiple memory profiles visualized in the editor

The picture above is showing the results of the NSJSONSerialization code path. My implementation actually has a better "total persistent bytes" overall of 1.92MB vs. the 2.51MB shown above. However, the total memory used in mine was about 6.5MB while we see that NSJSONSerialization only used about 4.7MB.

Taking a Dive

There are a couple of approaches we can take to tracking down and solving memory issues:

  1. Examine the code
  2. Examine the profiles

Unfortunately, the profiles were not really helping me track down the root cause of the issues, but they were illustrative in helping me understand that I was creating many, many copies of objects all over the place.

Examining the Error type

I first took a quick look over my code to see if I could see anything obvious. There was one thing I noticed right off the bat: FailableOf<T> stores an Error object in its Failure case. Well, the Error type is a struct with three values in it, and since I return a FailableOf<T> from all of my parsing calls, I'm going to need to return a copy of that Error, even if it's empty, all of the time.

Knowing that the Error object is going to be copied so many times throughout the call chain, we can instead mark the Error type as a public final class.

When we do this, the total memory usage drops to 6.06MB.

The other option is to keep Error a struct and create a backing class to store all of the data; that looks like this:

public struct Error {
    public typealias ErrorInfoDictionary = [String:String]

    class ErrorInfo {
        let code: Int
        let domain: String
        let userInfo: ErrorInfoDictionary?

        init(code: Int, domain: String, userInfo: ErrorInfoDictionary?) {
            self.code = code
            self.domain = domain
            self.userInfo = userInfo
        }
    }

    var errorInfo: ErrorInfo

    public var code: Int { return errorInfo.code }
    public var domain: String { return errorInfo.domain }
    public var userInfo: ErrorInfoDictionary? { return errorInfo.userInfo }

    public init(code: Int, domain: String, userInfo: ErrorInfoDictionary?) {
        self.errorInfo = ErrorInfo(code: code, domain: domain, userInfo: userInfo)
    }
}

However, that seems a lot more complicated than simply doing this:

public final class Error {
    public typealias ErrorInfoDictionary = [String:String]

    public let code: Int
    public let domain: String
    public let userInfo: ErrorInfoDictionary?

    public init(code: Int, domain: String, userInfo: ErrorInfoDictionary?) {
        self.code = code
        self.domain = domain
        self.userInfo = userInfo
    }
}

And since all my values are immutable to begin with, I'm not sure why I would choose the struct approach for this problem.

Investigating the FailableOf<T>

Since I'm having copying issues with the Error (gist) type, it is only logical to look at the FailableOf<T> type next. Instead of using my JSON parser as the test ground, I decided to create a little sample app that would loop many times calling a function that returned the following types:

  • FailableOf<T> – my implementation of the Either<T, U> concept (gist)
  • Either<T, U> – a more generic solution to my FailableOf<T> problem (gist)
  • (T, Error) – a tuple that contains the two pieces of information

The sample program is straightforward:

func either<T>(value: T) -> Either<T, Error> {
    return Either(left: value)
}

// test: either
for var i = 0; i < 100_001; i++ {
    let r = either(i)
    if (r.right != nil) {
        println("error at \(i)")
    }
}

Each of the different constructs have the same form (gist).
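For comparison, the tuple-based version of the test has this shape (a hypothetical reconstruction along the same lines as the either version above; NSError stands in for the custom Error type):

```swift
import Foundation

// Hypothetical tuple-based equivalent of the either<T> test function.
func tupled<T>(value: T) -> (value: T, error: NSError?) {
    return (value, nil)
}

// test: tuple
for var i = 0; i < 100_001; i++ {
    let r = tupled(i)
    if (r.error != nil) {
        println("error at \(i)")
    }
}
```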

This is where I found something interesting: both the FailableOf<T> and Either<T, U> tests take up about 3MB of memory, while the (T, Error) tests only take 17KB. Clearly, there have to be some missed compiler optimizations in Swift. Regardless, the tuple approach is clearly the one we should be taking, at least for now, if we really care about every ounce of memory.

In order to work with it better in my code, I create a typealias and use named tuples:

/// The type that represents the result of the parse.
public typealias JSParsingResult = (value: JSValue?, error: Error?)

After updating all of the JSON.parse code to return this new type, memory usage is down to 5.33MB! Simply by switching from a struct-based approach to this named tuple approach (which I think is just as good, frankly), I was able to shave off another 700KB of unnecessary memory creation.

I'm not done investigating other opportunities right now, but things are starting to look really promising here.

UPDATE: After some more investigating, I realized why the enum case was causing such memory bloat: we need to box all of the types that get stored in enum cases until Swift implements proper generic support for enums.


Swift Proposal: protected

There has been much said about protected and how Swift needs, I mean, NEEDS, the "protected" keyword. In fact, there has been so much ruckus about it that the Swift team wrote a blog entry on it: Access Control and protected.

While I wholeheartedly agree that the protected keyword is a terrible idea from an inheritance perspective, the intent behind the notion has great value. I'm going to define that intent as this:

The ability to separate concerns of implementors and consumers.

: .callout

If we focus on that definition, it's really not that hard to imagine how we can extend the existing public, internal, and private access modifiers that Swift already offers with a fourth option: protected.

I propose that we could enable the following:

  1. Introduce the protected keyword
  2. Modify the import rules to include a protected modifier

The rule for the protected keyword would be quite simple:

Protected access enables entities to be used within any source file from their defining module, and also in a source file from another module that imports the defining module with the protected modifier. You typically use protected access to specify the public interface for those wishing to extend the functionality of your types, while hiding that functionality from the consumers of your API.

: .callout

An example would be this:

Defined in module FooMod

public struct Foo {
    public func foo() {}
    protected func bar() {}

    public var fizzy: Int
    protected var fuzzy: Int
}

protected func MakeSuperFoo() -> Foo { /* ... */ }

Then, in another module, you would have to use the following in order to gain access to the protected members.

import FooMod                  // Brings in all of the public members
import protected FooMod        // Brings in all of the protected members

let f: Foo = MakeSuperFoo()
f.foo()
f.bar()

I think this fits into the existing access control mechanism perfectly and provides a way to express the high-level intent of what people are asking for with protected.


The Reasoning Behind the Choices

Sharing code in public is interesting in many ways. Sometimes the choices we make about design are somewhat arbitrary as there are many options before us. Sometimes those choices are deliberate and methodical with a well reasoned approach on how you got there. Then there are those times where you just do something dumb…

If you’re going to be willing to share your code for the world to see, you really need to be OK with being wrong about something and learning from it. But you also need to know how to stick to your guns when you think you are doing things right. This post is going to be a bit about both, using my latest JSON parsing articles as illustrations: Generators Need a current Value and Improving Code with Generics.

The primary goal of the code that I wrote was to enable the ability to parse through a JSON string and create a JSON object representation from that string. However, in that article, I presented a much lower level view of the problem and framed it in such a way as to remove all of the context on why and how I reached that decision.

Wes Campaigne posted some great feedback over on GitHub about the approach I took to the problem.

I thought the whole

buffer.next()
while buffer.current != nil {
    if let unicode = buffer.current { // ... somewhere, buffer.next() is called

dance was kind of ugly: you’re dealing with the overhead of using a generator, but receiving none of the benefits it provides (e.g. for in loops). Also, using a struct for your BufferedGenerator seems odd — you end up using a class as a backing store anyway, and having it as a struct means using inout parameters all over the place. There’s a discussion on the dev forums that argues the case why GeneratorTypes should, in general, just be reference types.

Wes makes some great points, and his RewindableGenerator<S> is a very good class that solves the specific problem I was looking at better (both in terms of the applicability of the use cases and in how the code that consumes it should work).

The only real problem, which I forgot when I first looked at his solution, was that the performance difference between using the GeneratorType and the Index types for Strings is fairly significant, nearly a 2.5x slowdown.

When I was first solving this problem, I looked at the following approaches:

  1. A String.Index based approach grabbing individual characters. This led me to find out how String works with unicode combining characters.
  2. Then I tried using String.UTF8View.Index; after all, they are both indexes, so it should be a fairly easy change. Well… it turns out that String.Index is a BidirectionalIndexType but String.UTF8View.Index is only a ForwardIndexType. At this point, I realized that I basically needed to re-write a significant portion of my algorithm. I did so, making sure that all of my previous() calls were updated; this also required some fairly ugly hacks to get everything to work. Then I found out two new things after more investigation into the topic:
    1. Performance of the GeneratorType construct was significantly faster than the Index based construct.
    2. There is a better view into the string: String.UnicodeScalarView. With the String.UTF8View, I had to create strings by passing a pointer to a UInt8 array that I had to keep track of while parsing the string. It was fairly ugly, but it worked. =)

Both of these led me to the realization that another parser re-write was coming… however, this time, I knew I needed to use GeneratorType, and I knew that I wanted to get rid of a lot of the hacks I had made. This was the start of the Generators Need a current Value and Improving Code with Generics posts.

Well, I was able to get rid of some of my hacks, but then Wes’ comments came. I already wasn’t very pleased with the implementation of the JSON parser as it still had some hacks in it and some somewhat cryptic logic, but hey, it worked! But as I thought about Wes’ comments some more, I knew there was a better way.

So I started integrating Wes’ solution into my parsing code. But, I had already forgotten a lesson I had learned earlier: Index based approaches suck at perf, big time!

At this point, I had already re-written the parsing to provide some significantly better error messages (thanks in-part to using for (idx, scalar) in enumerate(generator) {} that was now possible due to Wes’ updates) and a much cleaner logic flow. However, I wanted to get my performance back down.

That’s when I came up with this class: ReplayableGenerator

final public class ReplayableGenerator<S: SequenceType> : GeneratorType, SequenceType {
    typealias Sequence = S

    private var firstRun = true
    private var usePrevious = false
    private var previousElement: Sequence.Generator.Element? = nil
    private var generator: Sequence.Generator

    public init(_ sequence: Sequence) {
        self.generator = sequence.generate()
    }

    public func next() -> Sequence.Generator.Element? {
        switch usePrevious {
        case true:
            usePrevious = false
            return previousElement

        default:
            previousElement = generator.next()
            return previousElement
        }
    }

    public func replay() {
        usePrevious = true
    }

    public func generate() -> ReplayableGenerator {
        switch firstRun {
        case true:
            firstRun = false
            return self

        default:
            self.replay()
            return self
        }
    }

    public func atEnd() -> Bool {
        let element = next()
        replay()

        return element == nil
    }
}

I’ve been experimenting with using switch-statements over if-statements; I’m greatly liking their readability in many cases. However, there does seem to be a bug where case true and case false do not create an exhaustive list, so I use default.

: .info

These were the constraints:

  1. Index based iterators and lookups are significantly slower than GeneratorType and for-loop; they cannot be used.
  2. The GeneratorType is only a forward-moving iterator.
  3. There is no ability to inspect the previous character in the construct. This is vital because when we parse values, often times we need to inspect the next value to determine if we stop parsing the current value. However, once we do this, we are in a bit of a situation as the parser really needs to start parsing from that previous character because it’s going to call next() and skip over the just visited character. Bad mojo.

This class provided everything I needed, while the semantics of it also allowed me to create a much better parse(). The integration was also easy as I simply needed to replace the previous() calls with a replay() call.
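A small usage sketch of those semantics, building on the ReplayableGenerator class above: after replay(), the next call to next() hands back the same element again, which is exactly the "back up one character" behavior the parser needs.

```swift
// Building on ReplayableGenerator defined above.
let generator = ReplayableGenerator("abc".unicodeScalars)

let first = generator.next()      // "a"
generator.replay()
let again = generator.next()      // "a" again — the element is replayed
let second = generator.next()     // "b" — iteration resumes normally
```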

With this implementation, I was able to get my performance back down to 0.25s vs. 0.17s (JSON.parse vs. NSJSONSerialization).

Remember, often times people are able to look at a problem you have been working on and shed new light on the situation. While Wes’ solution was not directly applicable to my situation, his thought process on why his implementation was better was superbly helpful in rethinking the semantics of what I was doing. Ultimately, I’m fairly happy with the results of the parser now… except for that perf! =)

So thanks Wes for helping me think about the problem better. Oh, and you can judge my parsing code here: JSValue.Parsing.


Improving Code with Generics

Update: I updated the post to make use of S: SequenceType instead of T: GeneratorType; it's a cleaner API.

: .info

Yesterday, I wrote about how we needed to build the following type:

struct UnicodeScalarParsingBuffer {
    var generator: String.UnicodeScalarView.Generator
    var current: UnicodeScalar? = nil

    init(_ generator: String.UnicodeScalarView.Generator) {
        self.generator = generator
    }

    mutating func next() -> UnicodeScalar? {
        self.current = generator.next()
        return self.current
    }
}

When we look at the code above, we can observe a few things:

  1. The code is tightly coupled to String.UnicodeScalarView.Generator
  2. The code is tightly coupled to UnicodeScalar
  3. The code loosely conforms to GeneratorType

We can make this code better and more suitable for other instances of GeneratorType; or to put it another way, generic.

Let's start from bullet #3; we should be conforming to the GeneratorType protocol because this really is simply another type of generator.

The definition starts to take shape like this:

struct BufferedGenerator : GeneratorType {
    var generator: GeneratorType    // not valid yet — the concrete generator type still needs to come from a generic parameter
    mutating func next() -> UnicodeScalar?
}

Bullets #1 and #2 are two sides of the same coin, as Generator and Generator.Element are really defined from the same construct.

The interface now looks more like this:

struct BufferedGenerator<S: SequenceType> : GeneratorType {
    typealias Sequence = S

    var generator: Sequence.Generator
    var current: Sequence.Generator.Element? = nil

    init(_ sequence: Sequence) {
        self.generator = sequence.generate()
    }

    mutating func next() -> Sequence.Generator.Element? {
        self.current = generator.next()
        return self.current
    }
}

This implementation now lets us use any type of SequenceType as a BufferedGenerator.
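A quick usage sketch with a String's unicodeScalars, which is the very case the original UnicodeScalarParsingBuffer was hard-coded to (building on the BufferedGenerator defined above):

```swift
// Building on the generic BufferedGenerator defined above.
var buffer = BufferedGenerator("hi".unicodeScalars)

buffer.next()                     // advances; `current` now holds "h"
if let scalar = buffer.current {
    println("current scalar: \(scalar)")
}
```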

We use SequenceType as the generic constraint instead of GeneratorType because it creates a better ownership model for the underlying generator. The call to next() should only be done from a single generator; this code puts that burden on BufferedGenerator<S> instead of the caller.

: .info

Generics can be a great way to reduce type information that simply doesn't need to be there. In this case, there was no reason that the original UnicodeScalarParsingBuffer needed to be tied to a specific type. Generics can also help greatly in code reuse, which is almost always a good thing.

The full source for the json-swift library can be found over on GitHub.
