Looping with iterate() and takeWhile()

There’s a funny thing that happens when you remove a language construct that actually provides value: you need to re-invent ways to support that construct.

The proposal Add scan, takeWhile, dropWhile, and iterate to the stdlib provides a basic way to get back the lost functionality of the C-style for-loop, specifically with iterate() and takeWhile().

The key thing to remember for the implementation is that we must have a lazy version of iterate() in order for this to be semantically comparable to the C-style for-loop that is being replaced. Further, we need to be extremely careful when using the proposed takeWhile() (and other) extensions to be sure we’re getting the lazy versions when we need them.

So let’s look at what an implementation might look like (this is using Swift 2.2). We are going to want to replicate the following loop:

for var n = 1; n < 10; n = n * 2 {
    print("\(n)", terminator: ", ")
}

This loop simply outputs: 1, 2, 4, 8, 

Ok, first we need to define the iterate function:

// Creates a lazy sequence that begins at start. The next item in the
// sequence is calculated using the stride function.
func iterate<T>(initial from: T, stride: T throws -> T) -> StridingSequence<T> {
    return StridingSequence(initial: from, stride: stride)
}

This is going to require that we return a SequenceType (this is renamed to Sequence in Swift 3). But remember, we want this to be lazy, so we really need to conform to the LazySequenceType protocol. That type is going to need to know the starting point and the mechanism to stride through the desired sequence.

struct StridingSequence<Element> : LazySequenceType {
    let initial: Element
    let stride: Element throws -> Element

    init(initial: Element, stride: Element throws -> Element) {
        self.initial = initial
        self.stride = stride
    }

    func generate() -> StridingSequenceGenerator<Element> {
        return StridingSequenceGenerator(initial: initial, stride: stride)
    }
}

Of course, now the StridingSequence is going to need the underlying GeneratorType implementation: StridingSequenceGenerator (the GeneratorType protocol is renamed to IteratorProtocol in Swift 3).

struct StridingSequenceGenerator<Element> : GeneratorType, SequenceType {
    let initial: Element
    let stride: Element throws -> Element
    var current: Element?

    init(initial: Element, stride: Element throws -> Element) {
        self.initial = initial
        self.stride = stride
        self.current = initial
    }

    mutating func next() -> Element? {
        defer {
            if let c = current {
                current = try? stride(c)
            }
            else {
                current = nil
            }
        }
        return current
    }
}

OK… this is getting to be a lot of code. But there’s going to be a big payoff, right?

What we have now is an infinite sequence. We can test it out like so:

for n in iterate(initial: Int(1), stride: { $0 * 2 }) {
    if n >= 10 { break }
    print("\(n)", terminator: ", ")
}
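As an aside: Swift 3 eventually shipped essentially this function as sequence(first:next:) (SE-0094). If you’re on a modern toolchain, the iterate() machinery above collapses to a one-liner; this sketch assumes Swift 3+ naming:

```swift
// Swift 3+: sequence(first:next:) builds the same lazy, infinite
// sequence that iterate(initial:stride:) produces above.
let powers = sequence(first: 1, next: { $0 * 2 })

var output = [Int]()
for n in powers {
    if n >= 10 { break }
    output.append(n)
}
// output is now [1, 2, 4, 8]
```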

At this point, we are pretty close to getting what we want. The last question is how to move the condition out of the body of the loop and into the loop construct itself.

We have two basic options:

  1. Add a while: parameter to the iterate() function, or
  2. Add a takeWhile() function that can be chained.

The proposal I linked to earlier opts to add a takeWhile() function. This is probably the “better” way to go given that we are creating a sequence, and it’s feasible that we may want to do other operations, like filtering.

Unfortunately, this means a bit more code.

Let’s start with the extension to LazySequenceType:

extension LazySequenceType {
    typealias ElementType = Self.Elements.Generator.Element
    func takeWhile(predicate: ElementType -> Bool)
        -> LazyTakeWhileSequence<Self.Elements>
    {
        return LazyTakeWhileSequence(base: self.elements, takeWhile: predicate)
    }
}

This requires us to create another sequence type that knows how to walk our original sequence type but stop when the given condition is met.

struct LazyTakeWhileSequence<Base : SequenceType> : LazySequenceType {
    let base: Base
    let predicate: Base.Generator.Element -> Bool

    init(base: Base, takeWhile predicate: Base.Generator.Element -> Bool) {
        self.base = base
        self.predicate = predicate
    }

    func generate() -> LazyTakeWhileGenerator<Base.Generator> {
        return LazyTakeWhileGenerator(base: base.generate(), takeWhile: predicate)
    }
}

And then this is going to require another generator type that gives us the next item in the sequence, and nil after the condition is met.

struct LazyTakeWhileGenerator<Base : GeneratorType> : GeneratorType, SequenceType {
    var base: Base
    var predicate: Base.Element -> Bool

    init(base: Base, takeWhile predicate: Base.Element -> Bool) {
        self.base = base
        self.predicate = predicate
    }

    mutating func next() -> Base.Element? {
        if let n = base.next() where predicate(n) {
            return n
        }
        return nil
    }
}

Whew! Now we can write this:

for n in iterate(initial: Int(1), stride: { $0 * 2 }).takeWhile({ $0 < 10 }) {
    print("\(n)", terminator: ", ")
}
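For what it’s worth, the takeWhile() half of the proposal eventually landed as prefix(while:) (SE-0045, shipped in Swift 3.1). On a modern toolchain the whole exercise reduces to something like the following; note that prefix(while:) stops pulling from the sequence at the first failing element, so it terminates even on an infinite sequence:

```swift
// Swift 3.1+ spelling of the same loop. Sequence.prefix(while:) stops
// consuming at the first element that fails the predicate, so it is
// safe to call on the infinite sequence.
let taken = sequence(first: 1, next: { $0 * 2 }).prefix(while: { $0 < 10 })
for n in taken {
    print("\(n)", terminator: ", ")
}
// prints: 1, 2, 4, 8,
```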

Of course, we could have just written this and been done with it:

for var n = 1; n < 10; n = n * 2 {
    print("\(n)", terminator: ", ")
}

Summary

It’s honestly really difficult for me to accept this approach as objectively better, especially when I have to write the supporting library code ;). Yes, there are clearly benefits to an iterate() function that you can then perform different operations on, and maybe if I needed to perform some type of filtering with the above loop, like so:

let items = iterate(initial: Int(1), stride: { $0 * 2 })
    .filter({ $0 != 4 })
    .takeWhile({ $0 < 10 })

for n in items {
    print("\(n)", terminator: ", ")
}
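If you try the same chain on a modern toolchain, the laziness point from earlier still bites: the eager Sequence.filter would try to materialize an array out of the infinite sequence and never return, so an explicit .lazy view is required. A rough Swift 3.1+ equivalent:

```swift
// The .lazy view is essential here: without it, filter would run
// eagerly over the infinite sequence and never return.
let filtered = sequence(first: 1, next: { $0 * 2 })
    .lazy
    .filter({ $0 != 4 })
    .prefix(while: { $0 < 10 })

for n in filtered {
    print("\(n)", terminator: ", ")
}
// prints: 1, 2, 8,
```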

I could see the benefit of this approach for some use cases. However, there are also objectively bad things about it. For one, there is a crap ton of code that needs to be written just to get this to work, and I’m not done: I still need to do similar work for the collection types and the non-lazy versions as well.

The other thing: I don’t find it any less cryptic. Sure, things are labeled a bit better, but there’s a lot more syntax in the way now (using an @autoclosure would be nice, but you cannot use anonymous variables like $0 in one). In fact, it’s only after moving the iterate() code onto its own line that things start to become a bit more clear.

Anyhow, if you’re interested in how to implement this, it’s all here. And if there is actually an easier way, PLEASE let me know.

Full gist is here: iterate.swift.


APIs Matter

I asked a poll on Twitter today about API preference between two options (three if you count the updated version):

// the very verbose range-based loop
for n in 0.stride(through: 10, by: 2).reverse() {
    print(n)
}

// the more concise range-based loop
for n in 10.stride(through: 0, by: -2) {
    print(n)
}

// c-style loop
for var n = 10; n >= 0; n -= 2 {
    print(n)
}

And even earlier I wrote this blog article: For Loops and Forced Abstractions.

The primary point of the entry was about being forced into abstractions when they are not necessary.

One of the things that really bothered me were the examples in the Swift blog:

for i in (1...10).reverse() {
    print(i)
}

for i in 0.stride(to: 10, by: 2) {
    print(i)
}

In my opinion, those are really terrible APIs. In addition to being arguably just as bad to visually parse as the c-style for-loop, they still do not convey the intent behind what is being done: they are supposed to be creating a range, and only the first usage even comes close to looking like that. Not only that, there is no symmetry between incrementing and decrementing ranges.

For example, this is invalid in Swift: 10...0. So we have, what I would call, a broken and partial abstraction over the concept of “ranges” or “intervals”. Ironically, that’s exactly the API we need, especially when we are removing the c-style for-loop.

Let’s take a look at the Strideable protocol:

/// Conforming types are notionally continuous, one-dimensional
/// values that can be offset and measured.
public protocol Strideable : Comparable {
    /// A type that can represent the distance between two values of `Self`.
    associatedtype Stride : SignedNumberType
    /// Returns a stride `x` such that `self.advancedBy(x)` approximates
    /// `other`.
    ///
    /// - Complexity: O(1).
    ///
    /// - SeeAlso: `RandomAccessIndexType`'s `distanceTo`, which provides a
    ///   stronger semantic guarantee.
    @warn_unused_result
    public func distanceTo(other: Self) -> Self.Stride
    /// Returns a `Self` `x` such that `self.distanceTo(x)` approximates
    /// `n`.
    ///
    /// - Complexity: O(1).
    ///
    /// - SeeAlso: `RandomAccessIndexType`'s `advancedBy`, which
    ///   provides a stronger semantic guarantee.
    @warn_unused_result
    public func advancedBy(n: Self.Stride) -> Self
}

This seems fairly clear: it’s an abstraction over an item that can be incremented or decremented by some Self.Stride value. In addition, we can also determine the distance between two Strideable instances, so long as they share the same Stride associated type.
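Concretely, for Int (whose Stride is also Int), the two requirements behave like this. I’m using the Swift 3 spellings distance(to:) and advanced(by:) here; Swift 2.2 named them distanceTo(_:) and advancedBy(_:):

```swift
// Strideable in action on Int (Stride == Int).
let start = 5
let gap = start.distance(to: 9)   // how far from 5 to 9: 4
let moved = start.advanced(by: 3) // 5 offset by 3: 8
```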

This is one layer of the abstraction onion, but OK. When applied to numeric types, this gives us the nice ability to add and subtract in a generic and type-safe manner.

The problem, in my opinion, is the extension:

extension Strideable {
    /// Returns the sequence of values (`self`, `self + stride`, `self +
    /// stride + stride`, ... *last*) where *last* is the last value in
    /// the progression that is less than `end`.
    @warn_unused_result
    public func stride(to end: Self, by stride: Self.Stride) -> StrideTo<Self>
}

WHAT!?

This makes absolutely no sense to me. I actually find this API really bad on multiple counts:

  1. Why does a type that is responsible for incrementing itself now have the ability to create a sequence of values?
  2. What definition of “stride” ever means “create a sequence”?
  3. The API has a variable named stride that has a different conceptual meaning altogether than the function with the same name.

In my opinion, this is just a bad API. Further, this goes on to confuse matters at the call sites.

If we must get rid of the c-style for loops, then we need to look at what the alternative is: for-in.

So what is a for-in loop construct?

You use the for-in loop to iterate over a sequence, such as ranges of numbers, items in an array, or characters in a string.

Source: Swift Programming Language: Control Flow.

Great! So what we really want is the ability to create such a range with as little abstraction as possible. The stride API is attempting to do that, but it fails to do so in an appropriate manner.

Instead, we want an API that can be called like this:

for n in range(from: 10, to: 0, by: 2) {
}

And here’s what the signature looks like:

func range<T : Strideable>(
    from start: T,
    to end: T,
    by step: T.Stride = 1) -> Interval<T>

NOTE: Sure, there need to be other variants to support open, closed, left-open, and right-open intervals, but that’s irrelevant for this purpose.
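To make the idea concrete, here’s a minimal sketch of range() written against Swift 3’s free stride(from:to:by:) function. The name range(from:to:by:) is my hypothetical API, not anything in the stdlib, and this sketch keeps the step explicitly signed rather than inferring direction from the endpoints:

```swift
// Hypothetical range() API, sketched as a thin wrapper over Swift 3's
// free stride(from:to:by:) function. Nothing in the stdlib uses this name.
func range<T: Strideable>(from start: T, to end: T, by step: T.Stride) -> StrideTo<T> {
    return stride(from: start, to: end, by: step)
}

for n in range(from: 10, to: 0, by: -2) {
    print(n) // 10, 8, 6, 4, 2 (a half-open interval, so 0 is excluded)
}
```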

Wait a minute… isn’t that the same as what stride() is today? Sure, except:

  1. range() is vastly more explicit in what is actually going on.
  2. Instead of tacking on to the Strideable protocol like a poor man’s side-car, it composes with it, creating an API that is much more natural and expressive.
  3. Creates a much more natural call site.

I still don’t like the removal of the c-style for-loop, but thankfully, Swift 3 will be moving stride back to being a free function. It’s nice having a more “proper” API to work with out of the box.
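With the free function back, the descending loop from the poll at the top reads much more naturally; this assumes Swift 3 naming:

```swift
// Swift 3: stride is a free function again, and the descending,
// inclusive loop is direct to express.
for n in stride(from: 10, through: 0, by: -2) {
    print(n) // 10, 8, 6, 4, 2, 0
}
```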

Now to get it renamed to range


For Loops and Forced Abstractions

In case you haven’t heard, the traditional c-style for loop has been deprecated and is slated for removal in Swift 3.0. More info about that can be found here: New Features in Swift 2.2.

I’m not a fan, at all.

The fundamental reason I’m not a fan is quite simple: the only way to write a for loop now is by leveraging abstractions. Personally, I really dislike being required to use abstractions when they are not necessary.

The defense I hear all the time is this:

Well, the compiler will close that gap or remove the abstraction cost all together.

That’s nice in theory, but it’s patently false in practice. The optimizer can remove some of the abstractions, but it cannot guarantee to remove all of the cost of the abstraction every time.

Here’s the real-world cost of abstractions (not necessarily specific to just this for-loop construct):

Language: C, Optimization: -Os                                          Avg (ms) 
---------------------------------------------------------------------------------
RenderGradient (Pointer Math)                                              9.582 
RenderGradient (SIMD)                                                      4.608 

Language: Swift, Optimization: -O                                       Avg (ms) 
---------------------------------------------------------------------------------
RenderGradient ([Pixel])                                                22.51406 
RenderGradient ([UInt32])                                               18.39304 
RenderGradient (UnsafeMutablePointer)                                   20.67769 
RenderGradient (UnsafeMutablePointer<UInt32>)                           15.29333 
RenderGradient ([Pixel].withUnsafeMutablePointer)                       22.51703 
RenderGradient ([UInt32].withUnsafeMutablePointer)                      19.27868 
RenderGradient ([UInt32].withUnsafeMutablePointer (SIMD))               15.63351 
RenderGradient ([Pixel].withUnsafeMutablePointer (SIMD))                24.48129 

Source: https://github.com/owensd/swift-perf/blob/swift-v3/reports/swift_3_0-march.txt

At best, under an optimized build, we’re looking at a 4x cost in performance. With unchecked builds, it’s possible to get the performance down to equivalent timings. With non-optimized builds, we are talking anywhere from 3 to 88 (!!) times slower than the equivalent C code.

It’s not that I think the for-in style loop isn’t useful. I do. I also completely agree that it should be the one used the majority of the time. However, please don’t force me to use abstractions when I don’t want to or when they are not appropriate.

Here’s the before and after with the upcoming changes of some real code:

for var y = 0, height = buffer.height; y < height; ++y {
    let green = min(int4(Int32(y)) &+ yoffset, 255)

    for var x: Int32 = 0, width = buffer.width; x < width; x += 4 {
        let blue = min(int4(x, x + 1, x + 2, x + 3) &+ xoffset, 255)

        p[offset++] = Pixel(red: 0, green: green.x, blue: blue.x, alpha: 255)
        p[offset++] = Pixel(red: 0, green: green.y, blue: blue.y, alpha: 255)
        p[offset++] = Pixel(red: 0, green: green.z, blue: blue.z, alpha: 255)
        p[offset++] = Pixel(red: 0, green: green.w, blue: blue.w, alpha: 255)
    }
}
for y in 0..<buffer.height {
    let green = min(int4(Int32(y)) &+ yoffset, 255)

    for x in 0.stride(to: buffer.width, by: 4) {
        let x32 = Int32(x)
        let blue = min(int4(x32, x32 + 1, x32 + 2, x32 + 3) &+ xoffset, 255)

        p[offset] = Pixel(red: 0, green: green.x, blue: blue.x, alpha: 255)
        offset += 1

        p[offset] = Pixel(red: 0, green: green.y, blue: blue.y, alpha: 255)
        offset += 1

        p[offset] = Pixel(red: 0, green: green.z, blue: blue.z, alpha: 255)
        offset += 1

        p[offset] = Pixel(red: 0, green: green.w, blue: blue.w, alpha: 255)
        offset += 1
    }
}

I personally don’t consider that a readability win.

  1. It’s more code.
  2. It requires type coercion for x as the Int32 type isn’t stridable.
  3. stride(to:by:) is ambiguous compared to the < operator.

And finally, this is not an acceptable alternative in my opinion:

var y = 0
var height = buffer.height
while y < height {
    let green = min(int4(Int32(y)) &+ yoffset, 255)

    var x: Int32 = 0
    var width = buffer.width
    while x < Int32(width) {
        let x32 = Int32(x)
        let blue = min(int4(x32, x32 + 1, x32 + 2, x32 + 3) &+ xoffset, 255)

        p[offset] = Pixel(red: 0, green: green.x, blue: blue.x, alpha: 255)
        offset += 1

        p[offset] = Pixel(red: 0, green: green.y, blue: blue.y, alpha: 255)
        offset += 1

        p[offset] = Pixel(red: 0, green: green.z, blue: blue.z, alpha: 255)
        offset += 1

        p[offset] = Pixel(red: 0, green: green.w, blue: blue.w, alpha: 255)
        offset += 1

        x += 4
    }

    y += 1
}

Why, you might ask?

  1. It’s extremely easy to forget the increment (I actually did a few moments ago).
  2. The iterator variables are being leaked out of scope.
  3. All of the loop parts (initialization, condition, and increment) are scattered throughout the construct.
  4. The suggested pattern of using defer for incrementing is fundamentally flawed:

var i = 0
while i < 10 {
    defer { i += 1 }
    if i == 5 { break }
}
print(i)

What do you think i is here? What should it be? I’ll give you a hint: they aren’t the same answer.
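To spell out why the answers differ: defer registers its body before the break is evaluated, and it fires on every exit from the scope, including the one taken by break:

```swift
var i = 0
while i < 10 {
    defer { i += 1 }      // fires at the end of EVERY iteration,
                          // including the one that exits via break
    if i == 5 { break }
}
// The break at i == 5 still triggers the pending defer, so i is 6 here.
// A C-style `for var i = 0; i < 10; i += 1` loop would leave it at 5,
// because break skips the increment clause.
print(i)
```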

Yes, the above examples are narrow and specific. But that’s exactly the point. When we need to write for narrow and specific cases, that’s exactly when we need to get outside of the abstraction box that makes for simpler code.

It’s strange watching Swift evolve. Maybe I’m just dense or stuck in my old ways, but I can’t see how this change is aligned with one of Swift’s aspirations of being a systems-level language.
