The Optimization Game

I guess I'll play this game for a bit… my response to Josephs' response

(Overcoming Swift Resistance Explored) to my blog article about debug build performance, especially regarding the performance of arrays.

First I want to acknowledge that Swift Debug builds are hideously slow.

We should be able to stop right there. We both agree that non-optimized builds are hideously slow. But…

I still don't think it is an absolute block on any real development because

there are fairly simple workarounds.

Maybe, just maybe, we have different "real" development that we're doing. Could there possibly be a significant difference between the types of apps we are attempting to build?

The answer is obviously: yes.

I'm trying to prototype a game layer that must run at the minimum of 30Hz. Most apps are not real-time systems; every millisecond counts in a game. With a handful of milliseconds you could have better particle effects, better game AI, or more realistic physics. You only get 0.033s (33ms) in each frame

(@30Hz) to do your work. The more frames you have to use to compute things, the worse overall experience your players are going to have.

The other problem with this statement (and the solution presented) is that it assumes that we can optimize the slow parts of the game by putting the code into a framework. Ok… besides the annoying mess that is to pull stuff in and out of there, and completely ignoring that this is a trivial example of one algorithm that would need to be moved, it completely misses the fact that it's that very slow part that I need to debug because something strange is going on. If my debug build is now running at 2 or 3 Hz, it becomes extremely difficult

(impossible really) to actually play to game to get the error that is to be debugged.

So while it is most definitely not that huge of a deal for most app developers, it most certainly is to any one processing a large amount of data or building a real-time system. It creates a tremendous developer workflow problem.

These are the developers I'm talking to. These are the ones the blog post is for. Do not use arrays (as of Xcode 6.1.1) because they will attempt to suck your soul from your very body during development if you process a lot of data from them in tight loops.

If you are not iterating over a lot of data frequently, don't worry too much about it. But if you're building a SpriteKit game… don't use an array to store all of your elements that you'll be iterating over because it's going to suck if you have a bunch of them.

Again, my focus is on developer workflow. That is extremely important to me. Every slow down I hit is an impediment to me and solving my code problems.

Swift is Fast!

We get it, you think Swift is fast enough. Great!

So, I downloaded your code and ran the both the ObjC and Swift versions (CMD+R within Xcode).

-Onone: [Pixel] avg time: 0.0175069s, stddev: 0.000430218s, diff: 82%
-O:     [Pixel] avg time: 0.0177036s, stddev: 0.000400837s, diff: 11%

Ironically, I kept getting the -Onone build to be slightly faster, but there is some variance here, so all good. 82% faster than the ObjC debug version and 11% faster than the ObjC release build. Great! The workarounds suck, but ok.

There's one major oversight in these numbers though: the ObjC version is nowhere near the optimal code. We can help the compiler out in a very trivial way by changing the ObjC1 render function to this:

void RenderGradient(RenderBufferRef buffer, int offsetX, int offsetY)
{
    int width = buffer->width + offsetX;
    int height = buffer->height + offsetY;
    uint32_t *pixel = (uint32_t *)&buffer->pixels[0];

    for (int y = offsetY; y < height; ++y) {
        for (int x = offsetX; x < width; ++x) {
            *pixel = 0xFF << 24 | (x & 0xFF) << 16 | (y & 0xFF) << 8;
            ++pixel;
        }
    }
}

That's just taking 5 minutes to break out the common operations. Maybe more can be done, but I wouldn't want to go further than that without understanding if there was really a need because the code above is still clear and readable.

The timings after:

-Onone: [Pixel] avg time: 0.0178055s, stddev: 0.000500147s, diff: 53%
-O:     [Pixel] avg time: 0.0184924s, stddev: 0.00152481s, diff: -94%

The debug speed is good. You beat me there. Of course, yours is the optimized version. Fortunately, I can do the same hack you did, but much easier since compiler flags per source file is supported for ObjC.

So, pulling RenderGradient out and telling the compiler to always compile with the -Os flag gives us this result instead:

-Onone: [Pixel] avg time: 0.018294s, stddev: 0.00158887s, diff: -90%
-O:     [Pixel] avg time: 0.0180182s, stddev: 0.00145248s, diff: -89%

So no, Swift is not faster than ObjC if you take the time to optimize the code path as you did with the Swift version. In fact, Swift is now twice as slow as the more optimized version of the ObjC code.

Pull request to see what I did is here: https://github.com/josephlord/OvercomingSwiftResistance/pull/1.

  1. I keep calling it ObjC code because it's in a .m file, but this is just C.
The Optimization Game

Swift Resistance Explored

Evidently I struck a nerve with some people on this one, while others simply missed the entire point of the blog post. Let's revisit Swift Resistance and dig deeper into the problem.

I'm going to put my claim right up here so that it will not be missed:

Claim: The performance of DEBUG (that is, -Onone optimization) builds of Swift can have vastly different performance characteristics depending on the type of Swift intrinsics and foundational types that are being used. Some of these choices can lead down a path where your DEBUG builds are all but useless to use during development.

I'll talk about the implications of this towards the end.

Goals

Problem Statement: Design an algorithm that fills in a buffer of pixel data with a gradient starting with green in the bottom right corner and turning into blue in the top left corner. RGB colors should be used with values in the range [0, 255].

This algorithm must be written in Swift. In addition to that, it is meant to be used in a game loop with a desired framerate of 30 FPS for the software rendered algorithm at a resolution of 960×540 (this is 1/8th the desired rate of the hardware accelerated algorithm at 1920×1080@60Hz).

Additional Information: An existing algorithm already exists in an ObjC program; we will take the algorithm from there and use that as a baseline for both performance and functionality. When rendered to screen, your image should look something like the image below:

A picture of a window with green-to-blue gradient squares.

In addition, the data must be processed sequentially; parallelization of this algorithm is not allowed.

Strategy

Given that an existing algorithm exists, the first attempt at such an algorithm should be a straight port ignoring the language features. After that algorithm is working, it would be good to explore algorithms that may be more natural to the language.

So the two approaches that will be used are:

  1. Use an UnsafeMutablePointer to create a buffer of memory that will be

manipulated. 2. Use an Array to act as the buffer so that we are not using "unsafe" Swift

code.

ObjC Baseline

Here is the algorithm for the ObjC version of the code:

#import <Foundation/Foundation.h>

typedef struct {
    uint8_t red;
    uint8_t blue;
    uint8_t green;
    uint8_t alpha;
} Pixel;

typedef struct {
    Pixel *pixels;
    int width;
    int height;
} RenderBuffer, *RenderBufferRef;

RenderBufferRef RenderBufferCreate(int width, int height)
{
    assert(width > 0);
    assert(height > 0);

    RenderBufferRef buffer = malloc(sizeof(RenderBuffer));
    assert(buffer);

    buffer->pixels = malloc(width * height * sizeof(Pixel));
    assert(buffer->pixels);

    buffer->width = width;
    buffer->height = height;

    return buffer;
}

void RenderBufferRelease(RenderBufferRef buffer)
{
    if (buffer->pixels) {
        free(buffer->pixels);
    }

    buffer->width = 0;
    buffer->height = 0;
}

void RenderGradient(RenderBufferRef buffer, int offsetX, int offsetY)
{
    int offset = 0;
    for (int y = 0, height = buffer->height; y < height; ++y) {
        for (int x = 0, width = buffer->width; x < width; ++x) {
            Pixel pixel = { 0, y + offsetY, x + offsetX, 0xFF };
            buffer->pixels[offset] = pixel;
            ++offset;
        }
    }
}

int main(int argc, const char * argv[]) {
    uint64_t start = mach_absolute_time();

    RenderBufferRef buffer = RenderBufferCreate(960, 540);

    const int NUMER_OF_ITERATIONS = 30;
    for (int i = 0; i < NUMER_OF_ITERATIONS; ++i) {
        RenderGradient(buffer, i, i * 2);
    }

    RenderBufferRelease(buffer);

    uint64_t elapsed = mach_absolute_time() - start;
    printf("elapsed time: %fs\n", (float)elapsed / NSEC_PER_SEC);

    return 0;
}

This code was compiled as a command-line tool under two different optimization flags: -O0 and -Os. These are the default "debug" and "release" configs.

  • The timing output for -O0 (debug) was: 0.099769s
  • The timing output for -Os (release) was: 0.020427s

Both of these timings fall well within the target goal of 30Hz1.

Swift Implementations

We already know that we are going to need to have multiple algorithms for the Swift version, so it's important to setup out test harness in a reusable way. One of the things to note about Swift is that while types and functions can be private to a file, they still collide in name usage. This means that each test implementation we write will need to be wrapped in a function.

The first thing we'll do is create a command-line tool in Swift, create the two schemes (for debug and release) to make timing easier.

Ok, let's start with what our test rig will look like:

import Foundation

let NUMBER_OF_ITERATIONS = 30

#if DEBUG
let BASELINE: Float = 0.099769
#else
let BASELINE: Float = 0.020427
#endif

func timing(samples: Int, iterations: Int, fn: (Int) -> Float) -> (avg: Float, stddev: Float, diff: Int) {
    var timings = [Float](count: samples, repeatedValue: 0.0)
    for s in 0..<samples {
        timings[s] = fn(iterations)
    }

    let avg = reduce(timings, 0.0, +) / Float(samples)

    let sums = reduce(timings, 0.0) { sum, x in ((x - avg) * (x - avg)) + sum }
    let stddev = sqrt(sums / Float(timings.count - 1))
    let diff = Int(((BASELINE - avg) / BASELINE * 100.0) + 0.5)
    return (avg, stddev, diff)
}

println("Swift Rendering Tests: \(NUMBER_OF_ITERATIONS) iterations per test")
println("---------------------")

There is a simple timing function that allows us to capture the average time across some number of samples and some iterations of the rendering function.

NOTE: You'll need to add a custom build flag (-D DEBUG) to your Swift compiler options so the #if DEBUG will match.

The Unsafe Swift Approach

The naïve approach is to simply copy and paste the ObjC code into main.swift. Then simply make the necessary updates get it to compile.

The result is this (remember, we need to wrap everything in a closure so that the names do not collide as we add more tests that may want to use the same name for a struct but lay it out a little differently):

import Foundation

func unsafeMutablePointerTest(iterations: Int) -> Float {
    struct Pixel {
        var red: Byte
        var green: Byte
        var blue: Byte
        var alpha: Byte
    }

    struct RenderBuffer {
        var pixels: UnsafeMutablePointer<Pixel>
        var width: Int
        var height: Int

        init(width: Int, height: Int) {
            assert(width > 0)
            assert(height > 0)

            pixels = UnsafeMutablePointer.alloc(width * height * sizeof(Pixel))

            self.width = width
            self.height = height
        }

        mutating func release() {
            pixels.dealloc(width * height * sizeof(Pixel))
            width = 0
            height = 0
        }
    }

    func RenderGradient(var buffer: RenderBuffer, offsetX: Int, offsetY: Int)
    {
        var offset = 0
        for (var y = 0, height = buffer.height; y < height; ++y) {
            for (var x = 0, width = buffer.width; x < width; ++x) {
                let pixel = Pixel(
                    red: 0,
                    green: Byte((y + offsetY) & 0xFF),
                    blue: Byte((x + offsetX) & 0xFF),
                    alpha: 0xFF)
                buffer.pixels[offset] = pixel;
                ++offset;
            }
        }
    }

    let start = mach_absolute_time()

    var buffer = RenderBuffer(width: 960, height: 540)

    for (var i = 0; i < iterations; ++i) {
        RenderGradient(buffer, i, i * 2);
    }

    buffer.release()

    return Float(mach_absolute_time() - start) / Float(NSEC_PER_SEC)
}

Here are the timings:

  • DEBUG: avg time: avg time: 0.186799s, stddev: 0.0146862s, diff: -86%
  • RELEASE: avg time: 0.0223397s, stddev: 0.00101094s, diff: -8%

The timing code is here (add this to main.swift):

let timing1 = timing(10, NUMBER_OF_ITERATIONS) { n in unsafeMutablePointerTest(n) }
println("UnsafeMutablePointer<Pixel> avg time: \(timing1.avg)s, stddev: \(timing1.stddev)s, diff: \(timing1.diff)%")

This is not looking too bad; both are well within our target rate in both configurations. However, we can see that both of these implementations are slower than the ObjC version.

Takeaway: While this implementation is slower than the ObjC version, there is nothing blocking us at this time from being able to maintain a solid 30Hz in both debug and release builds. This is great news.

The "Safe" Swift Approach

UPDATE: I made a pretty obvious (well, easy to overlook, but still should

have been obvious) and significant error in this section… of course that would

happen in a post I try to better show the issues. I used var instead of

inout on the buffer… which, of course, creates a copy of the buffer array

each time… yeah, it's was that bad. Ironically, it didn't affect the debug

performance, but it did help the release build.

The func RenderGradient(var buffer: RenderBuffer, offsetX: Int, offsetY: Int)

should have been defined as: func RenderGradient(inout buffer: RenderBuffer, offsetX: Int, offsetY: Int).

The original implementation created a copy of the buffer each time which left

us with an empty buffer outside of the function call. Not what we wanted.

Again, this mistake only had two repercussions: incorrect functionality and

a 2x regression on the release build. The debug build is still as

painfully slow.

Let me know if you spot any other mistakes.

Now, there is another way that Swift allows us to access a region of contiguous memory: arrays. After all, that is really what the semantics are. So let's give it a try:

import Foundation

func pixelArrayTest(iterations: Int) -> Float {
    struct Pixel {
        var red: Byte
        var green: Byte
        var blue: Byte
        var alpha: Byte
    }

    struct RenderBuffer {
        var pixels: [Pixel]
        var width: Int
        var height: Int

        init(width: Int, height: Int) {
            assert(width > 0)
            assert(height > 0)

            let pixel = Pixel(red: 0, green: 0, blue: 0, alpha: 0xFF)
            pixels = [Pixel](count: width * height, repeatedValue: pixel)

            self.width = width
            self.height = height
        }
    }

    func RenderGradient(inout buffer: RenderBuffer, offsetX: Int, offsetY: Int)
    {
        var offset = 0
        for (var y = 0, height = buffer.height; y < height; ++y) {
            for (var x = 0, width = buffer.width; x < width; ++x) {
                let pixel = Pixel(
                    red: 0,
                    green: Byte((y + offsetY) & 0xFF),
                    blue: Byte((x + offsetX) & 0xFF),
                    alpha: 0xFF)
                buffer.pixels[offset] = pixel;
                ++offset;
            }
        }
    }

    let start = mach_absolute_time()

    var buffer = RenderBuffer(width: 960, height: 540)

    for (var i = 0; i < iterations; ++i) {
        RenderGradient(&buffer, i, i * 2);
    }

    return Float(mach_absolute_time() - start) / Float(NSEC_PER_SEC)
}

The nice thing about this change is that it was super easy to do; it was really only a handful of changes. This method has the benefit that I should not ever leak the buffer because I forgot to call dealloc on the UnsafeMutablePointer value.

Let's check out the timings:

  • DEBUG: avg time: 27.9754s, stddev: 0.0333994s, diff: -27939%
  • RELEASE: avg time: 0.0287606s, stddev: 0.00180078s, diff: -40%

What on earth just happened… 27.9s seconds to compute that loop above 30 times… this loop, right here:

int offset = 0;
for (int y = 0, height = buffer->height; y < height; ++y) {
    for (int x = 0, width = buffer->width; x < width; ++x) {
        Pixel pixel = { 0, y + offsetY, x + offsetX, 0xFF };
        buffer->pixels[offset] = pixel;
        ++offset;
    }
}

That's 960 * 540 (518,400) iterations. This is unacceptable. This is what my previous blog post was entirely about. There is not a SINGLE argument that you can make where you can justify this performance characteristic. Not one.

Now, had this performance been like 300% slower, I might have been able to take that and the safety justifications… maybe. At least if it was only 300% slower I'd still be in a spot where I could run my game at 30Hz with reasonable head room left over for other logic to run.

But no… we are talking about this loop taking 1 entire SECOND to compute. It was nearly 28,000% slower…

Here's a screenshot of the profile with the NUMBER_OF_ITERATIONS dropped down to 2 (there's no way I was going to sit through another full 10 samples of 30 iterations).

A screenshot of the Instruments 'profile of death' for this monstronsity.

Ok… now, I'm left with a few of choices:

  1. Say screw it and leave Swift on the table, convulsing from the seizure this

basic loop setting a value in an array just caused it. 2. Go back to the UnsafeMutablePointer method that was back in the land of all

things sane was in, but then I get to risk all of my consumers forgetting to

call release(). But you know… we're all adults here (in spirit at least),

we should be able to handle our own memory. And really, if I'm going to need

to resort to naked pointers, I might as well stick with C, yeah? 3. I can create another wrapper around the array so that I can use a backing

array to keep track of the memory for me, but expose an unsafe pointer to

that array. 4. Say screw it with the non-optimized builds and live in the land of crap(ier)

debugging which completely breaks the logic flow of your algorithms. Yes,

there is a time for this land, but that time is not the in the beginning of

your project when you are still prototyping, scaffolding, and shaping your

program into what it will one day be.

Of these options, #3 is the worst choice, in my opinion. This is a choice where you are explicitly said you want a "safe" array, but, in this part of the code, I'm just going to go hog wild. It exists for a reason; I'm guessing that one of those reasons is because performance sucks.

Here's the code update for that option:

buffer.pixels.withUnsafeMutableBufferPointer { (inout p: UnsafeMutableBufferPointer<Pixel>) -> () in
    var offset = 0
    for (var y = 0, height = buffer.height; y < height; ++y) {
        for (var x = 0, width = buffer.width; x < width; ++x) {
            let pixel = Pixel(
                red: 0,
                green: Byte((y + offsetY) & 0xFF),
                blue: Byte((x + offsetX) & 0xFF),
                alpha: 0xFF)
            p[offset] = pixel
            ++offset;
        }
    }
}

Let's check out the timings:

  • DEBUG: avg time: 1.18535s, stddev: 0.00684964s, diff: -1087%
  • RELEASE: avg time: 0.0402743s, stddev: 0.0018447s, diff: -96%

Ok… at least I cannot literally go the bathroom, get a drink from the kitchen, come back to my computer and wait for another few minutes while all the iterations are going. However, it's still too slow in DEBUG mode and it's still twice as slow as the ObjC version in RELEASE mode.

Conclusion

OK, so let's be explicitly clear here: this post is not about how Swift is horrendously slow in builds you'll be giving your customers. No, it's about how terribly slow and painful your life as a developer will be trying to write any amount of Swift code that works on any reasonable amount of data inside of arrays in Swift. However, you can see that none of the Swift options are faster or even as fast as the C version. And frankly, none of them are really that much clearer… but that's a different topic.

And don't tell me to open another Swift bug; I have. I have opened many Swift bugs since last WWDC. This post is a way for me to better reflect the current state of Swift to others. It's a way to let the people at Apple see the real impact developers like myself are feeling when trying to even the most basic of things in Swift, and so that when people do run into these issues, they won't have to bang their heads against the wall trying to figure out to solve the problem.

Here is the source for both the Swift and the ObjC versions: SwiftResistance.zip. I release it all in the public domain; do whatever you want with it.

  1. Hz, or hertz, is simply a measurement of cycles per second.
Swift Resistance Explored

Swift Resistance

I feel like every time I sit down to actually start using Swift, I'm constantly battling with the language. My latest challenge: creating a backing buffer of pixel data that will be rendered to the screen.

This really shouldn't be a difficult thing to do, so hopefully someone can point out the obvious thing I'm doing wrong in Swift so that my performance is so terrible.

Here's the basic code that I wanted to profile (of course, there is a version for each of the languages):

void RenderWeirdGradient(BackingBuffer *buffer)
{
    for (int y = 0; y < buffer->height; ++y) {
        int row = y * buffer->width;
        for (int x = 0; x < buffer->width; ++x) {
            Pixel p = { 0, y + GreenOffset, x + BlueOffset, 255 };
            buffer->data[row + x] = p;
        }
    }
}

- (void)drawRect:(NSRect)dirtyRect
{
    CFTimeInterval start = CACurrentMediaTime();

    if (buffer.data) {
        RenderWeirdGradient(&buffer);

        CGContextRef contextRef = [[NSGraphicsContext currentContext] graphicsPort];
        assert(contextRef != NULL);

        NSData *data = [NSData dataWithBytesNoCopy:buffer.data
                                            length:buffer.width * buffer.height * sizeof(Pixel)
                                      freeWhenDone:NO];
        CGDataProviderRef providerRef = CGDataProviderCreateWithCFData((CFDataRef)data);

        CGImageRef imageRef = CGImageCreate(buffer.width, buffer.height,
                                            buffer.bitsPerComponent,
                                            buffer.bitsPerPixel,
                                            sizeof(Pixel) * buffer.width,
                                            buffer.colorSpace,
                                            buffer.bitmapInfo,
                                            providerRef,
                                            NULL,
                                            true,
                                            kCGRenderingIntentDefault);

        CGContextDrawImage(contextRef, NSMakeRect(0, 0, buffer.width, buffer.height), imageRef);

        CGImageRelease(imageRef);
        CGDataProviderRelease(providerRef);
    }

    CFTimeInterval elapsed = CACurrentMediaTime() - start;
    NSLog(@"elapsed: %f", elapsed);
}

Hopefully the code is straight forward, basically it just renders a window that looks the picture below with the pattern moving at each update call.

![window with blue-green gradient squares][window_pic]

Let's just start with the timings…

|———————–|———–|

| Language | Timing |

|:———————-|———-:|

| ObjC (NO ARC, -O0) | 0.01137s |

| ObjC (NO ARC, -Os) | 0.01051s |

|———————————–|

| ObjC (ARC, -O0) | 0.01159s |

| ObjC (ARC, -Os) | 0.00983s |

|———————————–|

| Swift * (ARC, -Onone) | 0.03005s |

| Swift * (ARC, -O) | 0.01707s |

|———————————–|

| Swift [] (ARC, -Onone)| 1.19796s |

| Swift [] (ARC, -O) | 0.02701s |

|———————————–|

: style="width: 50%"

The "Swift *" is a version that uses UnsafeMutablePointer<UInt8> to handle the buffer. The "Swift []" is one that uses an array of pixel data, where pixel is simply a struct:

struct Pixel {
    var red: Byte       = 0x00
    var green: Byte     = 0x00
    var blue: Byte      = 0x00
    var alpha: Byte     = 0xFF
}

OK… that is not a freaking typo up there. It literally takes 1.2 SECONDS to render each frame. Both ObjC versions did it in 0.01s and even the pointer version of Swift only took 0.03s seconds.

THAT IS 120 TIMES SLOWER!!!!

I can hear you now: oh, that's a debug build, you should always do your performance profiling with optimizations on.

My answer is this: I'm not trying to profile the app, I'm simply trying to build a small game. I cannot do it because the performance is so flipping terrible in debug mode. Trying to debug your app with optimized code is just a pain. Sure, at the end of the project, that's probably the right thing to be doing, but until you get there, builds without optimizations are just so much easier to work with.

So what are we to do?

Well… if you want to use Swift, you have to bang your head against your desk and try different things until something gets better. In this case, I can make the performance significantly better (though, let's be honest here, it's still

37 TIMES slower than the ObjC version):

func renderWeirdGradient(blueOffset: Int, _ greenOffset: Int) {
    for var y = 0; y < buffer.height; y++ {
        let row = y * buffer.width
        for var x = 0; x < buffer.width; x++ {
            // Simply using the "FAST" code block changes the timing from 1.2s
            // per call to 0.37s per call.

            // ---- SLOW CODE BLOCK
            buffer.pixels[row + x].green = Byte((y + greenOffset) & 0xFF)
            buffer.pixels[row + x].blue = Byte((y + blueOffset) & 0xFF)
            // ---- END SLOW CODE BLOCK

            // ---- FASTER CODE BLOCK
            let pixel = Pixel(
                red: 0,
                green: Byte((y + greenOffset) & 0xFF),
                blue: Byte((x + blueOffset) & 0xFF),
                alpha: 255)

            buffer.pixels[row + x] = pixel
            // ---- END FASTER CODE BLOCK
        }
    }

    self.needsDisplay = true
}

I ran into this same type of stuff when I was working on the JSON parser. The fundamental problem with Swift, with regards to performance, is that it is impossible to reason what the performance of your code is going to be. It is impossible to know if your code is just horrendously slow in debug, or if the optimizing is going to get ride of all of the extra crap that is going on.

This sucks.

What I'm Left With

Seriously, what are we supposed to be doing here? Is it actually possible to build highly responsive apps that process a lot of complex information in Swift?

If most of your code is Swift code that is simply bridging into ObjC or C, you will probably read this post and not have a clue what I'm talking about. You are benefiting from the speed of ObjC.

The sad state of affairs is that, today, Swift is slow. I don't see that changing by next WWDC.

Swift Source: https://github.com/owensd/handmade-swift/tree/day005_badperf

ObjC Projects: BackingBufferObjC.zip

Swift Resistance

Response to Monads (and Functional Programming)

NOTE: I've posted an update at the end based on some of the Twitter

conversations.

Rui Peres asked me for my opinion about a piece he wrote about

Monads, partially the context of Swift. Sure!

I'm by no means an expert in functional programming, so be sure to question all things I say. =) Chris Eidhof knows this stuff much, MUCH better than I do and could probably offer a much different perspective.

So, my thoughts… well. It's more complicated than that. =) It's more complicated because it actually feeds into something I've been feeling for many years and only recently really been trying to understand and figure out if there is something tangible I can do to make it better, if for no one else, at least for myself.

So, general critique of the article: I think it was a fair overview of monads and somewhat of an introduction to functional programming in general. I would have maybe added some mention about monads and their use for capturing side-effects in programs, but that's about it. Oh, and make sure the code samples work and state the version of Xcode the Swift samples work on as that's a moving target.

As for functional programming in general: I think there is a tendency to treat it as a magic bullet for solving many of the problems in computer science. Now, this is not to knock on functional programming in the slightest; I believe that it's a paradigm that can teach us many good things. But like everything, it must be used in proper moderation and with care in order to be truly effective.

When I think of functional programming, I see it as trying to apply some core principles to your code base. I would call these out as:

  1. Functional evaluation – everything is a function and should be composable. 2. Clean state – side effects should be scoped to their context.

And that's really it; everything else stems from those two things. Composable functions like map are just a satisfaction of goal #1. Monads are a way to make goal #2 happen. At the end of the day, these are all abstractions.

Here's the thing to remember though: there's no such thing as zero-cost abstractions (slides, video). I'll touch on this a bit more later.

Back to the article… let's look at one of Rui's examples (modified to make Xcode 6.1.1 happy):

public final class Box<T>
{
    let value : T

    public init(_ value : T) {
        self.value = value
    }
}

func flatten<T>(box : Box<Box <T>>) -> Box <T> {
    return box.value
}

func map<T,U>(box : Box<T>, f : T -> U) -> Box<U> {
    return Box(f(box.value))
}

infix operator  >>=  {associativity left}
func >>= <T,U>(box : Box<T>, f : T -> Box<U>) -> Box<U> {
    return flatten(map(box,f))
}

let box = Box(4)
let sum3 = { Box($0 + 3) }
let toString = { Box(String($0 + 0)) }
let iReallyLike = { Box("I really like the number " + $0) }

Each line is perfectly understandable. Each is also fairly near to its most generic abstraction for each case (at least in terms to make Swift happy). But it's the next line that gets me:

let luckyNumber7 = box >>= sum3 >>= toString >>= iReallyLike

These are the questions that I ask myself for most of my code:

  1. What does it do? 2. Is it maintainable? 3. Is the performance OK? 4. How about memory usage? 5. Is it safe? 6. Reentrant? 7. Does it need to be?

Now, #1 can be tricky. It's a pretty common convention to define the >>= as composition type operator (or more formally, the bind operator), so it's fairly safe to say what it does without explicitly checking. However, you still need to do so if you really want to be sure.

The basic point of the operator is to take something like this (from an imperative view)1:

iReallyLike(toString(sum3(box.value).value).value).value

And turn it into the most compostable solution presented above. Granted, if we were not boxing the types we'd simply have:

iReallyLike(toString(sum3(box)))

Of course, if we didn't have the boxed types, well, then we wouldn't have the monads, and that is, after all, the point of the post.

What you need to realize is that a monad is an abstraction. Again, that doesn't make it a bad abstraction.

In the example above, we have multiple considerations to make:

  1. How much memory does each Monad copy take? 2. Can the memory be safely shared to reduce the amount of memory necessary? 3. How much function overhead are we actually imposing with every call? 4. Can we write a compiler that is smart enough to know when it is safe to

deconstruct the unwrapping of the monad? 5. If we hit a performance issue, do I have a way that I can make this better

without getting rid of the abstractions? Without making the code difficult to

maintain?

These are just some of the basic questions we need to ask ourselves.

You really should be asking yourself these questions. I feel like we have

regressed significantly as an industry because we've gotten fat and are used

to fast machines with lots of memory. Even our smart phones have gigs of RAM.

Too often we blindly make the trades without even understanding we are doing

it.

Here's an example that I saw recently from an email conversation I was having with another individual about a blog post I had written:

func functionalCount(sentence: String) -> Int {
    let words = sentence.componentsSeparatedByString(" ")
    return words
        .map { countElements($0) }
        .filter { $0 > 3 }
        .reduce(0, combine: +)
}

The purpose was to count the number of characters in all of the words that had more than three characters in them. Now, maybe this is a naïve approach to writing this solution, but it is a straight-forward implementation of using the individual components to glue together a solution.

Here's the same, basic algorithm in the imperative form:

func imperativeCount(sentence: String) -> Int {
    var count = 0
    let words = sentence.componentsSeparatedByString(" ")

    for word in words {
        let characters = countElements(word)
        if characters > 3 {
            count += characters
        }
    }

    return count
}

The rough timing and memory estimates I did showed basically a 2x improvement for the imperative version and a 70KB memory reduction. The sample text was 10 instances of the Gettysburg Address. I'm sure we could spend a good deal of time fine tuning the performance of each of them. But, if we are in somewhat of an agreement that this is idiomatic code for the two styles, it's a fair question to ask: what's the point of the functional version and is it inherently better?

Here's my high-level, meta point: the functional version doesn't offer any intrinsic value at understanding the problem as a whole. In fact, both are really just a means to the same end: counting those characters.

Update: January 19th, 2015

I received a bit of feedback on Twitter, and since Twitter is a terrible place to actually have a conversation of any substance, I'll put my replies here.

First off, I'm not surprised to see the recommendations for improvements to the functional implementation. While that misses the point somewhat, especially since that code was not mine to begin with, I'll make the updates so we can see some of the potential benefits.

func lazyFunctionalCount(sentence: String) -> Int {
    let words = sentence.componentsSeparatedByString(" ")
    return reduce(
        lazy(words)
            .map { countElements($0) }
            .filter { $0 > 3 },
        0, +)
}

This is faster than the initial version, and essentially the same memory footprint as the imperative version. I'd argue that readability has suffered though.

Next, my "imperative" version wasn't really all that imperative. I agree. But that is one of the points of the article: we should let the good ideas of the functional world flow back into the imperative world. I think that a synergy can be had when we merge the two instead of attempt to stay wholly in one of the realms.

Rob Napier mentions that "Binding (as in for-in) rather than storing vars is the heart of FP. Bind vs. store is the whole argument." I agree with that. I think another way of looking at it is in the context of "what languages" vs. "how languages". I think there is great value in expressing your logic as a description of what you want to do instead of how you want to do it.

Take for example this imperative approach to the problem:

func reallyImperativeCount(sentence: String) -> Int {
    var count = 0, size = 0
    let utf16 = sentence.utf16

    for var idx = utf16.startIndex; idx != utf16.endIndex; idx++ {
        let c = utf16[idx]

        if c == 32 {
            if size > 3 {
                count += size
            }
            size = 0
        }
        else {
            size++
        }
    }

    if size > 3 {
        count += size
    }

    return count
}

This is ugly… but boy is it fast. About 26x faster than the initial imperative solution. The above solution tells you the "how to do it" and has lost much of the "what to do". It's also error-prone; I had to spend about 10 minutes tracking down a couple of bugs in this implementation that were gracefully handled in the other two solutions above.

But I think this illustrates one of my main points: abstractions have a cost. It's not always important to write the truly imperative solution above, but when it is necessary, it should be possible, and preferably, fit into the same domain space as the original problem and not feel foreign in the language.

  1. Yeah, yeah. The bind operator is for transferring one monadic value

    to another monadic value via a function that takes the non-monadic

    value as a parameter.

Response to Monads (and Functional Programming)

Swift Makefiles – Take 2

A couple of days I ago I started a basic Makefile for compiling Swift code. That was a pretty basic Makefile and didn't actually result in a usable executable or library at the end.

Today we'll remedy that.

Makefile

Below is a Makefile that is starting to pull in more of the options and settings that Xcode uses. My intention is to create a Makefile that can capture most of those settings and allow me to simply clone it for my other projects.

# A more complicated build file that is starting to look a lot like how
# Xcode internally structures its builds. The goal here is to show how
# one might actually go about building a full Xcode-capable build system
# without all of the requirements of Xcode, xcodebuild, and xcproj files.
#
# There will likely be limitations to this process, but lets see how far
# the wind takes us.

## USER CONFIGURABLE SETTINGS ##
CONFIG       = debug
PLATFORM     = macosx
ARCH         = x86_64
MODULE_NAME  = tool
MACH_O_TYPE  = mh_execute

## GLOBAL SETTINGS ##
ROOT_DIR            = $(shell dirname $(realpath $(lastword $(MAKEFILE_LIST))))
BUILD_DIR           = $(ROOT_DIR)/bin
SRC_DIR             = $(ROOT_DIR)/src
LIB_DIR             = $(ROOT_DIR)/lib

TOOLCHAIN           = Toolchains/XcodeDefault.xctoolchain/usr/lib/swift/$(PLATFORM)
TOOLCHAIN_PATH      = $(shell xcode-select --print-path)/$(TOOLCHAIN)

SWIFT               = $(shell xcrun -f swift) -frontend -c -color-diagnostics

## COMPILER SETTINGS ##
CFLAGS       = -g -Onone
SDK_PATH     = $(shell xcrun --show-sdk-path -sdk $(PLATFORM))

## LINKER SETTINGS ##
LD           = $(shell xcrun -f ld)
LDFLAGS      = -syslibroot $(SDK_PATH) -lSystem -arch $(ARCH) \
                -macosx_version_min 10.10.0 \
                -no_objc_category_merging -L $(TOOLCHAIN_PATH) \
                -rpath $(TOOLCHAIN_PATH)
OBJ_EXT      = 
OBJ_PRE      =

ifeq (mh_dylib, $(MACH_O_TYPE))
    OBJ_EXT  = .dylib
    OBJ_PRE  = lib
    LDFLAGS += -dylib
endif

## BUILD LOCATIONS ##
PLATFORM_BUILD_DIR    = $(BUILD_DIR)/$(MODULE_NAME)/bin/$(CONFIG)/$(PLATFORM)
PLATFORM_OBJ_DIR      = $(BUILD_DIR)/$(MODULE_NAME)/obj/$(CONFIG)/$(PLATFORM)
PLATFORM_TEMP_DIR     = $(BUILD_DIR)/$(MODULE_NAME)/tmp/$(CONFIG)/$(PLATFORM)

SOURCE = $(notdir $(wildcard $(SRC_DIR)/*.swift))

## BUILD TARGETS ##
tool: setup $(SOURCE) link

## COMPILE RULES FOR FILES ##

%.swift:
    $(SWIFT) $(CFLAGS) -primary-file $(SRC_DIR)/$@ \
        $(addprefix $(SRC_DIR)/,$(filter-out $@,$(SOURCE))) -sdk $(SDK_PATH) \
        -module-name $(MODULE_NAME) -o $(PLATFORM_OBJ_DIR)/$*.o -emit-module \
        -emit-module-path $(PLATFORM_OBJ_DIR)/$*~partial.swiftmodule

main.swift:
    $(SWIFT) $(CFLAGS) -primary-file $(SRC_DIR)/main.swift \
        $(addprefix $(SRC_DIR)/,$(filter-out $@,$(SOURCE))) -sdk $(SDK_PATH) \
        -module-name $(MODULE_NAME) -o $(PLATFORM_OBJ_DIR)/main.o -emit-module \
        -emit-module-path $(PLATFORM_OBJ_DIR)/main~partial.swiftmodule

link:
    $(LD) $(LDFLAGS) $(wildcard $(PLATFORM_OBJ_DIR)/*.o) \
        -o $(PLATFORM_BUILD_DIR)/$(OBJ_PRE)$(MODULE_NAME)$(OBJ_EXT)

setup:
    $(shell mkdir -p $(PLATFORM_BUILD_DIR))
    $(shell mkdir -p $(PLATFORM_OBJ_DIR))
    $(shell mkdir -p $(PLATFORM_TEMP_DIR))

While there seems to be a lot going on in this file, it is all fairly straight forward. Basically it just does the following:

  1. Configures the settings for the build.
  2. Compiles each individual file, treating main.swift as special.
  3. Links the object files together to create a combined binary.

The Makefile doesn't support everything yet; for instance, there is no optimization flags being set for release builds. But, it is a start.

Testing Our Work

To test it out, create our go to foo.swift and main.swift files:

foo.swift

public func foo() -> String {
    let message = testing()
    return "Foo - \(message)"
}

main.swift

import Foundation

public func testing() -> String { return "testing!" }

println("Hello, World!")
println("testing: \(testing())")
println("foo: \(foo())")

As you can see, there is a bit of circular usage as both foo and testing are used in both files. This allows us a basic check that each file can see each others symbols and invoke them.

Running make gets us the following output (raw output):

$ make
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/swift -frontend -c -color-diagnostics -g -Onone -primary-file /Users/dowens/Projects/Playground/SwiftOptions/SwiftTool/SwiftTool/src/foo.swift \
        /Users/dowens/Projects/Playground/SwiftOptions/SwiftTool/SwiftTool/src/main.swift -sdk /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk \
        -module-name tool -o /Users/dowens/Projects/Playground/SwiftOptions/SwiftTool/SwiftTool/bin/tool/obj/debug/macosx/foo.o -emit-module \
        -emit-module-path /Users/dowens/Projects/Playground/SwiftOptions/SwiftTool/SwiftTool/bin/tool/obj/debug/macosx/foo~partial.swiftmodule
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/swift -frontend -c -color-diagnostics -g -Onone -primary-file /Users/dowens/Projects/Playground/SwiftOptions/SwiftTool/SwiftTool/src/main.swift \
        /Users/dowens/Projects/Playground/SwiftOptions/SwiftTool/SwiftTool/src/foo.swift -sdk /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk \
        -module-name tool -o /Users/dowens/Projects/Playground/SwiftOptions/SwiftTool/SwiftTool/bin/tool/obj/debug/macosx/main.o -emit-module \
        -emit-module-path /Users/dowens/Projects/Playground/SwiftOptions/SwiftTool/SwiftTool/bin/tool/obj/debug/macosx/main~partial.swiftmodule
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld -syslibroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.10.sdk -lSystem -arch x86_64 -macosx_version_min 10.10.0 -no_objc_category_merging -L /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift/macosx -rpath /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/swift/macosx /Users/dowens/Projects/Playground/SwiftOptions/SwiftTool/SwiftTool/bin/tool/obj/debug/macosx/foo.o /Users/dowens/Projects/Playground/SwiftOptions/SwiftTool/SwiftTool/bin/tool/obj/debug/macosx/main.o \
        -o /Users/dowens/Projects/Playground/SwiftOptions/SwiftTool/SwiftTool/bin/tool/bin/debug/macosx/tool

The thing to note is that a tool executable has been built as can be seen in that last line.

$ bin/tool/bin/debug/macosx/tool 
Hello, World!
testing: testing!
foo: Foo - testing!

Running it, we can see that everything seems to be in working order.

So that's it, a somewhat basic Makefile that can be used to create executables and libraries without having to rely on Xcode. Now, should we be doing this? I'll leave that question as an exercise to the reader.

Swift Makefiles – Take 2

Compiling Individual Files

Continuing the adventure of compiling Swift files without Xcode, you may have the need to compile individual files. I’m not sure if we can use this method to build up our own incremental builds for Swift as I don’t have any projects large enough to experiment with yet, but it could be interesting.

I’m going to start out with a couple of files: foo.swift and main.swift.

foo.swift

public func foo() -> String { return "Foo" }

main.swift

println("Hello, World!")

Now, if you simply run:

$ swiftc -sdk $(xcrun --show-sdk-path) -emit-object foo.swift -o foo.o

You’ll get a binary that will contain a main function as the compiler will add one in for your implicitly. When you go to link the files, you’ll end up with duplicate symbol collision.

$ nm foo.o 
00000000000000d8 s EH_frame0
0000000000000065 s L___unnamed_1
0000000000000010 T __TF3foo3fooFT_SS
0000000000000118 S __TF3foo3fooFT_SS.eh
                 U __TFSSCfMSSFT21_builtinStringLiteralBp8byteSizeBw7isASCIIBi1__SS
                 U __TFSsa6C_ARGCVSs5Int32
                 U __TFSsa6C_ARGVGVSs20UnsafeMutablePointerGS_VSs4Int8__
0000000000000030 T _main
0000000000000140 S _main.eh
0000000000000000 t _top_level_code
00000000000000f0 s _top_level_code.eh                     

Remember how I said that there are some interesting differences between swift and swiftc, well, here’s another one.

What we need to do is tell Swift to compile foo.swift but also let the compiler know that we have some other files. We can do that like this:

swift -frontend -c -sdk $(xcrun --show-sdk-path) -emit-object -primary-file foo.swift main.swift -o foo.o

NOTE!! You cannot use the -primary-file switch with swiftc; it’s only available when you use swift -frontend -c. What I believe is happening is that the primary file is getting compiled and the compiler is using all of the other input files for determining available symbols.

After updating the files to look like this:

foo.swift

public func foo() -> String {
    let message = testing()
    return "Foo - \(message)"
}

main.swift

public func testing() -> String { return "testing!" }
println("Hello, World!")

Then compiling foo.swift:

$ swift -frontend -c -sdk $(xcrun --show-sdk-path) -emit-object -primary-file foo.swift main.swift -o foo.o

We can see all of the goodies, including the all important missing main function definition.

$ nm foo.o
0000000000000268 s EH_frame0
00000000000001f2 s L___unnamed_1
0000000000000000 T __TF3foo3fooFT_SS
0000000000000280 S __TF3foo3fooFT_SS.eh
                 U __TF3foo7testingFT_SS
                 U __TFSS30convertFromStringInterpolationfMSSFtGSaSS__SS
                 U __TFSS37convertFromStringInterpolationSegmentfMSSFSSSS
                 U __TFSSCfMSSFT21_builtinStringLiteralBp8byteSizeBw7isASCIIBi1__SS
                 U __TFSa20convertFromHeapArrayU__fMGSaQ__FTBp5ownerBo5countBw_GSaQ__
                 U __TMdSS
                 U __swift_FORCE_LOAD_$_swiftFoundation
0000000000000218 S __swift_FORCE_LOAD_$_swiftFoundation_$_foo
0000000000000160 t _arraydestroy
00000000000002a8 s _arraydestroy.eh
0000000000000200 s _metadata
                 U _swift_allocObject
                 U _swift_deallocObject
                 U _swift_unknownRelease
                 U _swift_unknownRetain

There are a lot more options available to us, like:

  • -emit-module which would let us produce partial .swiftmodule files.
  • -emit-module-path for the path of the partial .swiftmodule files.
  • -emit-module-doc-path for the path of the partial .swiftdoc files.

Of course, this is only the tip of the iceberg. Next, you’ll need to merge the

swiftmodule files together and then link the object files together to actually create a usable executable or library.

That’s for another day.

Compiling Individual Files

swift vs swiftc

While working on [yesterday's post][yesterday], I noticed that Xcode uses the

swift command to perform its compiles rather than the swiftc command. What's odd, and maybe some others know more information here, is that swiftc is just a link to swift.

$ xcrun -f swift
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/swift

$ xcrun -f swiftc
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/swiftc

$ xcrun -f swift | ls -l swift*
-rwxr-xr-x  1 root  wheel  35546880 Nov 17 10:16 swift
-rwxr-xr-x  1 root  wheel    505488 Nov 17 10:16 swift-demangle
-rwxr-xr-x  1 root  wheel     46832 Nov 17 10:16 swift-stdlib-tool
lrwxr-xr-x  1 root  wheel         5 Nov 18 12:38 swiftc -> swift

However, it seems the swift tool itself is detecting how it's actually being run and switching modes internally.

This is obvious if you just run the two:

  1. swift – opens the Swift interpreter
  2. swiftc – prints an error <unknown>:0: error: no input files

So why am I bringing this up? Well, I found it interesting. =) In the end, I believe that swiftc is virtually the same as swift -frontend -c; however, I've not been able to confirm that fully and the error messages are slightly different between swiftc and swift -frontend -c, so the code path has to be at least a little different.

swift vs swiftc