Guess the Protocol Behavior

In today's post (well, this evening's, really), it's time to play "guess what happens".

Let's start with the given code:

public protocol Animal {
    func speak()
}

extension Animal {
    public func speak() {
        print("rawr?")
    }
}

Simple enough: all animals say "rawr?" by default.

Let's add a bit more to the puzzle:

public struct Dog : Animal {
    public func speak() {
        print("ruff ruff!")
    }

    public init() {} // yes, we must define this because of "good reasons"
}

This is all pretty straightforward; nothing really interesting here. So let's create a little function to get the animals talking.

public func talk(animals: [Animal]) {
    animals.forEach { $0.speak() }
}

This is where it is going to start to get interesting:

public struct Sheep : Animal {
    public init() {}
}

And then:

let animals : [Animal] = [Dog(), Sheep()]
talk(animals)

The output is:

ruff ruff!
rawr?

But what if this is added:

extension Sheep {
    func speak() {
        print("bah!")
    }
}

What if I told you the output was still:

ruff ruff!
rawr?

Can you tell me why that is?

If I did this:

Sheep().speak()

The output would, correctly, be this:

bah!

The issue here is where the talk() function is defined. If talk() is defined within the same module as the extension for Sheep, then the output is the following:

ruff ruff!
bah!

However, if the talk() function is defined outside of that module, well then the output is:

ruff ruff!
rawr?

This behavior is unsettling to me. On the one hand, it makes some sense that you cannot change the functionality of another module with extensions to types belonging to that module. On the other hand, if I provide an extension for Sheep in my module, I'll be able to use the new functionality just fine there, but any time the type gets used in another module, the functionality will fall back to the original behavior.

This just sounds like a scary source of bugs waiting to happen. I think the solution might be to simply disallow extensions to protocols that are not defined within the same module. I'd rather lose out on potential functionality to maintain certain guarantees in my program.

Thoughts?

Update August 20th, 2015 @ 7:30am HST

The above explanation of talk() is a bit incorrect; here's the version I meant to copy in:

public func talkOf(animals: [Sheep]) {
    animals.forEach { $0.speak() }
}

The issue with the original talk() function is that, because the Sheep type is defined within another module, the extension will never be used when the value is accessed through the base protocol, Animal.

Here are the three talk()-like functions I used, first the in-module versions and then the out-of-module versions:

public func talk(animals: [Animal]) {
    print("in-module: talk")
    animals.forEach { $0.speak() }
}

public func talkOf<T : Animal>(animals: [T]) {
    print("in-module: talkOf")
    animals.forEach { $0.speak() }
}

public func sheepTalk(animals: [Sheep]) {
    print("in-module: sheepTalk")
    animals.forEach { $0.speak() }
}


public func talk(animals: [Animal]) {
    print("out-of-module: talk")
    animals.forEach { $0.speak() }
}

public func talkOf<T : Animal>(animals: [T]) {
    print("out-of-module: talkOf")
    animals.forEach { $0.speak() }
}

public func sheepTalk(animals: [Sheep]) {
    print("out-of-module: sheepTalk")
    animals.forEach { $0.speak() }
}

The in-module versions are "my code"; that's the code where the Sheep extension is defined (whether the extension is public or internal had no effect). The out-of-module code is where the Animal protocol and Sheep type are defined. The thing to note is that even within my own module, where I've defined the extension for Sheep, if I use the base type Animal, I won't see my extension's behavior:

print("\nin-module: Sheep()")
let s = Sheep()
s.speak()

print("\nin-module: Animal - Sheep()")
let a: Animal = Sheep()
a.speak()

The output is:

in-module: Sheep()
bah!

in-module: Animal - Sheep()
rawr?

In any case, a simple error or warning when defining extensions on types defined in a different module would alleviate this problem.
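The cross-module case can't be reproduced in a single file, but a related effect can. The sketch below is a single-file analogue, not the same setup: it assumes speak() is not declared as a protocol requirement at all and only lives in the protocol extension, so the call is chosen from the static type of the variable. It is written in current Swift syntax and returns strings instead of printing directly so the result is easy to check.

```swift
// Single-file analogue (an assumption for illustration, not the post's two-module setup):
// speak() is NOT a requirement of the protocol, so it is dispatched statically.
protocol Animal {}

extension Animal {
    func speak() -> String { return "rawr?" }
}

struct Sheep: Animal {
    // Shadows the extension method, but only when the static type is Sheep.
    func speak() -> String { return "bah!" }
}

let sheep = Sheep()            // static type: Sheep
let animal: Animal = Sheep()   // static type: Animal

print(sheep.speak())    // uses Sheep's own method
print(animal.speak())   // uses the protocol extension's method
```

The same instance answers differently depending on the type it is viewed through, which is the kind of surprise described above.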

Project File: ProtocolDispatch.zip


Protocols – My Current Recommendations

The big talk about Swift lately is around protocols. Everything should be a protocol they say! Well, that's great in theory, however, in practice, that attitude can lead to some really unfortunate side-effects.

Here are the top two things that I always try to keep in mind when working with protocols in my code:

1. Don't treat protocols as a type

A lot of solutions I see (and what I did initially) are basically treating protocols as a base class in your type hierarchy. I don't think this is really where protocols shine. This design pattern is still the "object-oriented" way of thinking about the problem.

To put it another way, if your protocol really only has meaning within your type hierarchy, ask yourself if it really makes sense to make it a protocol. I don't think an answer of, "well, I want my type to be a struct so I need to use a protocol here instead" is a good reason. Decompose it and make it more applicable if that's really the case.

Further validation of this: http://swiftdoc.org/swift-2/. Notice anything about all of those protocols (well, all of the ones not prefixed with _)? All of them can be applied to multiple different types regardless of type hierarchy.

2. Don't make your protocols generic unless you really have to!

Hopefully this is just a point-in-time problem, but as soon as you make your protocols generic, you lose the ability to have a typed collection of heterogeneous instances of that protocol. I consider this a serious design limitation. For instance, all of the non-Self constrained functionality of a protocol should be safely callable from any place that protocol wants to be used.

This also applies to having your protocol adhere to generic protocols, such as Equatable. Generics are infectious.

Doing this:

protocol Foo : Equatable {}

Is almost certainly going to cause you some significant grief down the line.

Here's a practical example:

Let's say we want to model an HTTP response and we want to support two different types of response types: strings and JSON data.

It might be tempting to do something like this:

class HTTPResponse<ResponseType> {
    var response: ResponseType
    init(response: ResponseType) { self.response = response }
}

I think this is a bad approach. As soon as this happens, we have artificially limited our ability to use this type; it cannot be used in collections in a heterogeneous fashion, for example. Now, why might I want a collection of these that have different ResponseType representations? Well, let's say I want to build a response/request playback engine for testing. The collection of responses I get back will be of my supported types: string and JSON data.

One option to address this is to simply use AnyObject. This works, but that pretty much sucks.

Another approach to address this problem is with protocols. However, instead of just creating a ResponseType protocol, let's think about what we really want from this. What I really care about is that any ResponseType that is provided to an HTTPResponse can be represented as a String.

With that in mind, we end up with something like this:

protocol StringRepresentable {
    var stringRepresentation: String { get }
}

class HTTPResponse {
    var response: StringRepresentable
    init(response: StringRepresentable) { self.response = response }
}

To me, this is vastly superior, as it allows the consumers of the API to be much more flexible while still maintaining some type clarity.
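Here's a minimal sketch of the playback scenario, in current Swift syntax. StringResponse, the fields property on JSONResponse, and the recorded array are hypothetical names made up for this illustration; only StringRepresentable and HTTPResponse come from the code above.

```swift
protocol StringRepresentable {
    var stringRepresentation: String { get }
}

// Hypothetical payload types, for illustration only.
struct StringResponse: StringRepresentable {
    let text: String
    var stringRepresentation: String { return text }
}

struct JSONResponse: StringRepresentable {
    let fields: [String: String]
    var stringRepresentation: String {
        // Sort the keys so the rendering is deterministic.
        let pairs = fields
            .sorted { $0.key < $1.key }
            .map { "\"\($0.key)\": \"\($0.value)\"" }
        return "{" + pairs.joined(separator: ", ") + "}"
    }
}

final class HTTPResponse {
    var response: StringRepresentable
    init(response: StringRepresentable) { self.response = response }
}

// One typed collection holding both kinds of response.
let recorded: [HTTPResponse] = [
    HTTPResponse(response: StringResponse(text: "OK")),
    HTTPResponse(response: JSONResponse(fields: ["status": "ok"])),
]

let playback = recorded.map { $0.response.stringRepresentation }
print(playback)
```

The collection stays typed as [HTTPResponse], and the playback loop never needs to know which concrete payload it is holding.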

Of course, this doesn't come without its own drawbacks, and I'd be remiss not to point them out. If you actually want to deal with the specific type for the response, you need to cast it.

class JSONResponse : StringRepresentable {
    var stringRepresentation: String = "{}"
}

let http = HTTPResponse(response: JSONResponse())
let json = http.response as? JSONResponse

This is still significantly better though. I, the caller, know what the response type is supposed to be, or what the possible values could be. This is starkly different from when I'm looping over the collection, pulling out the responses, and want to get the value of each response: the consumer of the code could have created other response types, such as XMLResponse, and my code would have no way of knowing about them.
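A sketch of what the collection case might look like, in current Swift syntax. The describe(_:) helper and the body of XMLResponse are hypothetical, added only to show a loop that casts to the types it knows about and falls back to the string representation for anything else.

```swift
protocol StringRepresentable {
    var stringRepresentation: String { get }
}

final class JSONResponse: StringRepresentable {
    var stringRepresentation: String = "{}"
}

// Hypothetical type that a consumer of the API added later.
final class XMLResponse: StringRepresentable {
    var stringRepresentation: String = "<root/>"
}

final class HTTPResponse {
    var response: StringRepresentable
    init(response: StringRepresentable) { self.response = response }
}

// Hypothetical helper: cast to the types this code knows about,
// and fall back to the protocol for everything else.
func describe(_ http: HTTPResponse) -> String {
    switch http.response {
    case let json as JSONResponse:
        return "JSON: \(json.stringRepresentation)"
    default:
        return "unknown: \(http.response.stringRepresentation)"
    }
}

let responses = [
    HTTPResponse(response: JSONResponse()),
    HTTPResponse(response: XMLResponse()),
]
for http in responses {
    print(describe(http))
}
```

The default branch is the important part: unknown response types still produce something usable instead of being silently dropped.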

In a perfect world, we could do this:

class HTTPResponse<ResponseType : StringRepresentable> {
    var response: ResponseType
}

let responses = [json, string]  // responses is an array of HTTPResponse where ResponseType is unrealized

You would still need to cast the response in the collection use case, however, using the json instance directly would still give you full type validation.

Until we can get there though, I'll take the collection type of [HTTPResponse] over [AnyObject] every time.


Goto Fail and Swift

So this is a blog post about a pet-peeve of mine. The claim: "Swift cannot have bugs like Apple's goto-fail bug."

This is rubbish!

The biggest problem I have with much of the analysis of this bug is the focus on the missing braces around the if-statements. No, the problem is that the code is terrible to begin with, and it obviously had no tests, which would have been trivial to write.

So, we have to start with the following assumptions to see just how we get this code in Swift:

  1. The code structure was just poor to begin with.
  2. Compiler settings were disabled so the "unreachable code paths" warning didn't show (or was ignored).
  3. Evidently tests were thought to be optional.

So, here's basically the same code in Swift:

enum SSLHashSHA1 {
    static func update(inout hashCxt: Int, _ call: Int) -> OSStatus {
        hashCxt = call
        if call == 4 { return -1 }
        return 0
    }
}

func isAnyOneSafe() -> OSStatus {
    var err: OSStatus = 0
    var hashCtx: Int = 0

    err = SSLHashSHA1.update(&hashCtx, 0)
    if (err != 0) {
        return err
    }
    err = SSLHashSHA1.update(&hashCtx, 1)
    if (err != 0) {
        return err
    }
    err = SSLHashSHA1.update(&hashCtx, 2)
    if (err != 0) {
        return err
    }
    err = SSLHashSHA1.update(&hashCtx, 3)
    if (err != 0) {
        return err
    }
    return err
    err = SSLHashSHA1.update(&hashCtx, 4)
    if (err != 0) {
        return err
    }

    return err
}

isAnyOneSafe()  // returns 0, should return -1 though

The point of this post is to debunk the myth that you are immune to these same types of stupid errors just because you are writing code in Swift. That's simply not true. A "merge error" or a "copy and paste error" like the above is pretty easy to make and to miss if people aren't paying attention and aren't code reviewing changes.

Moral of the story: the compiler tells you jack squat about the correctness of your code; it only tells you that you have passed the rules to generate machine code based on the language rules. You still need to write tests to make sure that your code is actually functioning correctly.
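To make the testing point concrete, here's a minimal sketch in current Swift syntax. It is a condensed stand-in for the code above rather than a copy: Status is a hypothetical alias standing in for OSStatus, the repeated update/if pairs are folded into loops, and buggyCheck, fixedCheck, and reportsFailure are names invented for this example.

```swift
typealias Status = Int32   // hypothetical stand-in for OSStatus

func update(_ hashCtx: inout Int, _ call: Int) -> Status {
    hashCtx = call
    return call == 4 ? -1 : 0   // only the last step fails
}

// Mirrors the bug: the function returns before step 4 ever runs.
func buggyCheck() -> Status {
    var hashCtx = 0
    for call in 0...3 {
        let err = update(&hashCtx, call)
        if err != 0 { return err }
    }
    return 0
}

// What the code was meant to do: run every step, including step 4.
func fixedCheck() -> Status {
    var hashCtx = 0
    for call in 0...4 {
        let err = update(&hashCtx, call)
        if err != 0 { return err }
    }
    return 0
}

// The "test": a check that is known to contain a failing step must report it.
func reportsFailure(_ check: () -> Status) -> Bool {
    return check() != 0
}

print("buggy version reports the failure:", reportsFailure(buggyCheck))
print("fixed version reports the failure:", reportsFailure(fixedCheck))
```

Both versions compile without complaint; only the test tells them apart.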

P.S. If your counter-argument is that this is terrible code and you shouldn't write it that way to begin with: of course it's terrible code! However, the fact is that the C version is also terrible code and it shouldn't have been written that way either. Unfortunately, bad code happens regardless of language.

P.P.S. Swift does provide us a nice way to write this code that is, in my opinion, even better than some of the cleaned up C-versions from the links above:

enum ResultCode : ErrorType {
    case Error
    case Success
}

enum SSLHashSHA1 {
    static func update(inout hashCxt: Int, _ call: Int) throws {
        hashCxt = call
        if call == 4 {
            throw ResultCode.Error
        }
        // No throw on success: returning normally lets the next update() call run.
    }
}

func isAnyOneSafe() throws {
    var hashCtx: Int = 0

    try SSLHashSHA1.update(&hashCtx, 0)
    try SSLHashSHA1.update(&hashCtx, 1)
    try SSLHashSHA1.update(&hashCtx, 2)
    try SSLHashSHA1.update(&hashCtx, 3)
    try SSLHashSHA1.update(&hashCtx, 4)
}

do {
    try isAnyOneSafe()
}
catch {
    print("yay!")
}
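
For readers on a newer toolchain, here's a sketch of the same idea in current Swift syntax, where ErrorType has become Error and inout moves after the parameter label. HashError and failedStep() are hypothetical names added for this example; the error carries the failing step so the caller can see where things went wrong.

```swift
// Hypothetical error type for this sketch; only failures are thrown.
enum HashError: Error {
    case updateFailed(step: Int)
}

enum SSLHashSHA1 {
    static func update(_ hashCtx: inout Int, _ call: Int) throws {
        hashCtx = call
        if call == 4 {
            throw HashError.updateFailed(step: call)
        }
    }
}

func isAnyOneSafe() throws {
    var hashCtx = 0

    try SSLHashSHA1.update(&hashCtx, 0)
    try SSLHashSHA1.update(&hashCtx, 1)
    try SSLHashSHA1.update(&hashCtx, 2)
    try SSLHashSHA1.update(&hashCtx, 3)
    try SSLHashSHA1.update(&hashCtx, 4)
}

// Hypothetical helper: returns the step that failed, or nil if none did.
func failedStep() -> Int? {
    do {
        try isAnyOneSafe()
        return nil
    } catch HashError.updateFailed(let step) {
        return step
    } catch {
        return nil
    }
}

if let step = failedStep() {
    print("caught failure at step \(step)")
} else {
    print("no failure reported")
}
```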

Be Mindful of Your Filters

let items = 1...100
for i in items {
    if i % 2 != 0 { continue }
    print("\(i)")
}


let items = 1...100
items
    .filter() { $0 % 2 == 0 }
    .forEach() { print("\($0)") }

Two loops, two ways of looking at the problem. The second is better, yeah? It's cleaner, easier to read, easier to understand. All of those lead to more maintainable and less buggy code. So what's the problem?

Performance.

Sure, maybe you won't actually run into any particular issue with this usage, but what if you want to add some more filters?

let items = 1...100
items
    .filter() { $0 % 2 == 0 }
    .filter() { $0 % 3 == 0 }
    .filter() { $0 % 5 == 0 }
    .forEach() { print("\($0)") }

Now, again, in this specific case, it might not be too bad. After all, the first filter() loops through all 100 items, and the second filter() only needs to go through 50 (all of the even numbers). The last filter() then only needs to run through 16 values. Finally, the forEach() is really only working on a collection of 3 items.

This version of the construct doesn't have the performance problem though:

let items = 1...100
for i in items {
    if i % 2 != 0 { continue }
    if i % 3 != 0 { continue }
    if i % 5 != 0 { continue }
    print("\(i)")
}

In case you missed it, the performance problem is that every call to filter() is potentially an O(n) operation. If you want to apply three filter() calls and a forEach(), that is going to be four passes, each over the result of the previous step. In addition to that, each filter() creates a new array of your filtered items.

Bad mojo.

Now, you might be muttering to yourself: premature optimization! You haven't even profiled it! To that, I say: why write code that you know has a good likelihood of being a performance problem? Especially if you don't even need to sacrifice the coding approach to make it better from the start?

Of course, we don't want to just throw away the chained filters because that style is a lot cleaner. Thankfully, there is already a Swift type that helps us out here: LazySequence (and its LazyCollection friend).

let items = 1...100
lazy(items)
    .filter() { $0 % 2 == 0 }
    .filter() { $0 % 3 == 0 }
    .filter() { $0 % 5 == 0 }
    .forEach() { print("\($0)") }

Simply wrapping items in a lazy() call will convert our Sequence into a LazySequence. This gives us the performance benefits of the more iterative-style approach with the benefits of the semantically broken out operations.

This is pretty interesting to watch in a playground with a large collection, as you'll be able to see the filters being applied either as an iteration over each new collection (the non-lazy version) or in sequence as the collection is iterated (the lazy version).
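If you don't have a playground handy, the difference in evaluation order can be sketched with a small trace. This is written in current Swift, where the lazy view is obtained through the lazy property rather than the lazy() call used above; evaluationOrder(lazily:) and the "a"/"b" log labels are hypothetical names for this illustration.

```swift
// Records the order in which the two predicates are evaluated.
// "aN" means the first filter looked at N, "bN" means the second one did.
func evaluationOrder(lazily: Bool) -> [String] {
    var log: [String] = []
    let isEven: (Int) -> Bool = { n in
        log.append("a\(n)")
        return n % 2 == 0
    }
    let isMultipleOfThree: (Int) -> Bool = { n in
        log.append("b\(n)")
        return n % 3 == 0
    }

    let items = 1...6
    if lazily {
        // Each element flows through both filters before the next one is touched.
        for _ in items.lazy.filter(isEven).filter(isMultipleOfThree) {}
    } else {
        // The first filter finishes (and builds an array) before the second starts.
        for _ in items.filter(isEven).filter(isMultipleOfThree) {}
    }
    return log
}

print("eager:", evaluationOrder(lazily: false).joined(separator: " "))
print("lazy: ", evaluationOrder(lazily: true).joined(separator: " "))
```

In the eager trace all of the "a" entries come first; in the lazy trace the "b" check for an element follows immediately after that element passes "a".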

Update: August 11th, 2015 @ 2:15pm

Just to clarify, the above performance gains that we are getting with the use of lazy() are from the following:

  1. Reducing the number of times each element in the sequence is visited.
  2. Removing the intermediate copies of the filtered collection for each filter() or map() call.

This is not reducing the number of filter() calls because that still needs to be done per element, thus we are not really changing the time complexity, per se.
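
A quick way to check that claim is to count predicate invocations for both styles. The sketch below uses current Swift syntax (the lazy property instead of lazy()); predicateCalls(lazily:) and divisible(by:) are hypothetical helpers written for this example.

```swift
// Counts how many times the filter predicates run for the 2/3/5 chain over 1...100.
func predicateCalls(lazily: Bool) -> (calls: Int, matches: [Int]) {
    var calls = 0
    var matches: [Int] = []

    // Builds a predicate that bumps the shared counter on every invocation.
    func divisible(by divisor: Int) -> (Int) -> Bool {
        return { n in
            calls += 1
            return n % divisor == 0
        }
    }

    let items = 1...100
    if lazily {
        let view = items.lazy
            .filter(divisible(by: 2))
            .filter(divisible(by: 3))
            .filter(divisible(by: 5))
        for n in view { matches.append(n) }
    } else {
        let filtered = items
            .filter(divisible(by: 2))
            .filter(divisible(by: 3))
            .filter(divisible(by: 5))
        for n in filtered { matches.append(n) }
    }
    return (calls, matches)
}

let eager = predicateCalls(lazily: false)
let lazyResult = predicateCalls(lazily: true)
print("eager calls: \(eager.calls), matches: \(eager.matches)")
print("lazy calls:  \(lazyResult.calls), matches: \(lazyResult.matches)")
```

Both styles evaluate the predicates 100 + 50 + 16 times; the lazy version just does so without materializing the two intermediate arrays.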

Here are some quick and dirty perf tests (2012 rMBP, Release build):

let items = 1...100000000

func measure(fn: () -> ()) -> NSTimeInterval {
    let start = NSDate().timeIntervalSince1970
    fn()
    return NSDate().timeIntervalSince1970 - start
}

var counter = 0

let time = measure() {
    items
        .filter() { $0 % 2 == 0 }
        .filter() { $0 % 3 == 0 }
        .filter() { $0 % 5 == 0 }
        .forEach() { counter = counter &+ $0 }
}

let lazyTime = measure() {
    lazy(items)
        .filter() { $0 % 2 == 0 }
        .filter() { $0 % 3 == 0 }
        .filter() { $0 % 5 == 0 }
        .forEach() { counter = counter &+ $0 }
}

print("counter: \(counter)")
print("time: \(time)")
print("lazy time: \(lazyTime)")

Output:

counter: 333333366666660
time: 0.795416116714478
lazy time: 0.286408185958862

Another run this time incorporating some map() calls:

let time = measure() {
    items
        .filter() { $0 % 2 == 0 }
        .map() { $0 * 2 }
        .filter() { $0 % 3 == 0 }
        .map() { $0 + 1 }
        .filter() { $0 % 5 == 0 }
        .forEach() { counter = counter &+ $0 }
}

let lazyTime = measure() {
    lazy(items)
        .filter() { $0 % 2 == 0 }
        .map() { $0 * 2 }
        .filter() { $0 % 3 == 0 }
        .map() { $0 + 1 }
        .filter() { $0 % 5 == 0 }
        .forEach() { counter = counter &+ $0 }
}

Output:

counter: 666666500000010
time: 1.12964105606079
lazy time: 0.129108905792236