Be Mindful of Your Filters

let items = 1...100
for i in items {
    if i % 2 != 0 { continue }
    print("\(i)")
}


let items = 1...100
items
    .filter() { $0 % 2 == 0 }
    .forEach() { print("\($0)") }

Two loops, two ways of looking at the problem. The second is better, yeah? It's cleaner, easier to read, easier to understand. All of those lead to more maintainable and less buggy code. So what's the problem?

Performance.

Sure, maybe you won't actually run into any particular issue with this usage, but what if you want to add some more filters?

let items = 1...100
items
    .filter() { $0 % 2 == 0 }
    .filter() { $0 % 3 == 0 }
    .filter() { $0 % 5 == 0 }
    .forEach() { print("\($0)") }

Now, again, in this specific case, it might not be too bad. Afterall, the first filter() loops through all 100 items, the second filter() only needs to go through 50 (all of the even numbers). The last filter() then only needs to run through 16 values. Finally, the forEach() is really only working on collection of 3 items.

This version of the construct doesn't have the performance problem though:

let items = 1...100
for i in items {
    if i % 2 != 0 { continue }
    if i % 3 != 0 { continue }
    if i % 5 != 0 { continue }
    print("\(i)")
}

If you have missed it, the performance problem is that every call of filter() is a potential O(n) operation. If you want to apply three filter() calls and a forEach(), that is going to be four times through the collection. In addition to that, each filter() is creating a new array of your filtered items.

Bad mojo.

Now, you might be muttering to yourself: premature optimization! You haven't even profiled it! To that, I say: why write code that you know has a good likely-hood of being a performance problem? Especially if you don't even need to sacrifice the coding approach to just make it better from the start?

Of course, we don't want to just throw away the chained filters because that style is a lot cleaner. Thankfully, there is already a Swift type that helps us out here: LazySequence (and its LazyCollection friend).

let items = 1...100
lazy(items)
    .filter() { $0 % 2 == 0 }
    .filter() { $0 % 3 == 0 }
    .filter() { $0 % 5 == 0 }
    .forEach() { print("\($0)") }

Simply wrapping items in a lazy() call will convert our Sequence into a LazySequence. This gives us the performance benefits of the more iterative-style approach with the benefits of the semantically broken out operations.

This is pretty interesting to watch in a playground as well with a large collection as you'll be able see the filters being applied as an iteration over each new collection (the non-lazy version) or in-sequence as the collection is iterated (lazy version).

Update: August 11th, 2015 @ 2:15pm

Just to clarify, the above performance gains that we are getting with the use of lazy() are from the following:

  1. Reducing the number of times each element in the sequence is visited.
  2. Removing the intermediate copies of the filtered collection for each filter() or map() call.

This is not reducing the number of filter() calls because that still needs to be done per element, thus we are not really changing the time complexity, per se.

Here is some quick and dirty perf tests (2012 rMBP, Release build):

let items = 1...100000000

func measure(fn: () -> ()) -> NSTimeInterval {
    let start = NSDate().timeIntervalSince1970
    fn()
    return NSDate().timeIntervalSince1970 - start
}

var counter = 0

let time = measure() {
    items
        .filter() { $0 % 2 == 0 }
        .filter() { $0 % 3 == 0 }
        .filter() { $0 % 5 == 0 }
        .forEach() { counter = counter &+ $0 }
}

let lazyTime = measure() {
    lazy(items)
        .filter() { $0 % 2 == 0 }
        .filter() { $0 % 3 == 0 }
        .filter() { $0 % 5 == 0 }
        .forEach() { counter = counter &+ $0 }
}

print("counter: \(counter)")
print("time: \(time)")
print("lazy time: \(lazyTime)")

Output:

counter: 333333366666660
time: 0.795416116714478
lazy time: 0.286408185958862

Another run this time incorporating some map() calls:

let time = measure() {
    items
        .filter() { $0 % 2 == 0 }
        .map() { $0 * 2 }
        .filter() { $0 % 3 == 0 }
        .map() { $0 + 1 }
        .filter() { $0 % 5 == 0 }
        .forEach() { counter = counter &+ $0 }
}

let lazyTime = measure() {
    lazy(items)
        .filter() { $0 % 2 == 0 }
        .map() { $0 * 2 }
        .filter() { $0 % 3 == 0 }
        .map() { $0 + 1 }
        .filter() { $0 % 5 == 0 }
        .forEach() { counter = counter &+ $0 }
}

Output:

counter: 666666500000010
time: 1.12964105606079
lazy time: 0.129108905792236
Be Mindful of Your Filters