Composite Validators – Refined

I recently saw a post on how to compose different validators together. I thought it was a good start, but it didn’t go quite far enough, especially in light of something Rob Napier has so kindly reminded us about: types are the unit of composition in Swift.

Here’s the basic setup: there is a need to create some basic validation for email and passwords (the correctness of the validation rules is not important to the discussion).

Scott modeled out how he conceptually thought of each, which I thought was good. I’ve put those models here, with only some minor name changes:

Email Validation
                ┌────────────────────────┐                
                │    email validation    │                
                └────────────────────────┘                
                             ▲                            
             ┌───────────────┴───────────────┐            
┌────────────────────────┐      ┌────────────────────────┐
│         empty          │      │     invalid format     │
└────────────────────────┘      └────────────────────────┘
Password Validation
                ┌────────────────────────┐          
                │  password validation   │                               
                └────────────────────────┘                               
                             ▲                                           
             ┌───────────────┴───────────────┐                           
┌────────────────────────┐      ┌────────────────────────┐               
│         empty          │      │          weak          │               
└────────────────────────┘      └────────────────────────┘               
                                             ▲                           
                                             │ ┌────────────────────────┐
                                             ├─│         length         │
                                             │ └────────────────────────┘
                                             │ ┌────────────────────────┐
                                             ├─│   missing uppercase    │
                                             │ └────────────────────────┘
                                             │ ┌────────────────────────┐
                                             ├─│   missing lowercase    │
                                             │ └────────────────────────┘
                                             │ ┌────────────────────────┐
                                             └─│     missing number     │
                                               └────────────────────────┘

I think these are conceptually good. However, I wasn’t a fan of how Scott transcribed these models over to their enum representations.

Scott’s Version
enum EmailValidatorError: Error {
    case empty
    case invalidFormat
}

enum PasswordValidatorError: Error {
    case empty
    case tooShort
    case noUppercaseLetter
    case noLowercaseLetter
    case noNumber
}

I think PasswordValidatorError missed the mark, as he flattened out the tree.

My Version
enum EmailValidationError: Error {
	case empty
	case invalidFormat
}

enum PasswordValidationError: Error {
	case empty
	case weak(reasoning: [PasswordStrengthValidationError])
}

enum PasswordStrengthValidationError: Error {
	case length
	case missingUppercase
	case missingLowercase
	case missingNumber
}

I break the weak items out into their own PasswordStrengthValidationError enum to match the conceptual model. This has the benefit of providing a high-level classification of “weak password” while still maintaining the rigor of getting the specific details about why the password is considered weak.
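To see that benefit concretely, here’s a small sketch of how calling code might consume the nested error. The message(for:) helper is hypothetical and not from the post; the enums are just the definitions from above, repeated so the sketch stands alone:

```swift
// The error enums as defined above.
enum PasswordStrengthValidationError: Error {
	case length
	case missingUppercase
	case missingLowercase
	case missingNumber
}

enum PasswordValidationError: Error {
	case empty
	case weak(reasoning: [PasswordStrengthValidationError])
}

// A hypothetical message builder: the outer switch stays small, while
// the weak case still carries the full detail when we need it.
func message(for error: PasswordValidationError) -> String {
	switch error {
	case .empty:
		return "Password must not be empty."
	case .weak(let reasons):
		return "Weak password (\(reasons.count) issue(s))."
	}
}
```

The high-level switch only has two cases to handle, yet nothing about the specific weaknesses is lost.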

Validation

Next up is the modeling of the validator itself. I like that Scott chose to model this as a protocol. After all, we do know that we are going to need two different kinds of validation:

  1. Single validation
  2. Composite validation

However, unlike Scott, I would have modeled both of these with a protocol, like so:

protocol Validator {
	func validate(_ value: String) -> Result
}

protocol CompositeValidator: Validator {
	var validators: [Validator] { get }
	func validate(_ value: String) -> [Result]
}

Side note: I’m also using a more generic Result type vs. Scott’s ValidationResult. Since the Validator protocol isn’t generic, ValueType here is just a typealias for the String being validated:

typealias ValueType = String

enum Result {
	case ok(ValueType)
	case error(Error)
}

The important thing to note about the CompositeValidator is that it supports returning both a single result and a listing of all of the results. The other nice thing about this is that it’s also possible to provide a default implementation for all implementors of this protocol.

extension CompositeValidator {
	func validate(_ value: String) -> [Result] {
		return validators.map { $0.validate(value) }
	}

	func validate(_ value: String) -> Result {
		let results: [Result] = validate(value)
		let errors = results.filter {
			if case .error(_) = $0 {
				return true
			}
			else {
				return false
			}
		}

		return errors.first ?? .ok(value)
	}
}

In the single result case, we simply return the first error value, if present, or return the ok value. This default implementation will come in really handy when we start implementing things later on.
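Because the two validate overloads differ only in their return type, call sites pick one with a type annotation. Here’s a self-contained sketch of how that reads; the NotEmpty and Demo types are hypothetical, used only to demonstrate the call sites, and the other definitions match those shown above:

```swift
// Redeclared here so the sketch stands alone; these match the
// definitions shown earlier in the post.
typealias ValueType = String

enum Result {
	case ok(ValueType)
	case error(Error)
}

protocol Validator {
	func validate(_ value: String) -> Result
}

protocol CompositeValidator: Validator {
	var validators: [Validator] { get }
	func validate(_ value: String) -> [Result]
}

extension CompositeValidator {
	func validate(_ value: String) -> [Result] {
		return validators.map { $0.validate(value) }
	}

	func validate(_ value: String) -> Result {
		let results: [Result] = validate(value)
		let errors = results.filter {
			if case .error = $0 { return true }
			return false
		}
		return errors.first ?? .ok(value)
	}
}

// NotEmpty and Demo are hypothetical, for illustration only.
struct NotEmpty: Validator {
	struct EmptyError: Error {}
	func validate(_ value: String) -> Result {
		return value.isEmpty ? .error(EmptyError()) : .ok(value)
	}
}

struct Demo: CompositeValidator {
	let validators: [Validator] = [NotEmpty()]
}

let demo = Demo()
let single: Result = demo.validate("hi")   // first error, or ok
let all: [Result] = demo.validate("hi")    // every validator's result
```

The explicit Result and [Result] annotations are what select the overload; without one, the call would be ambiguous.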

Types as the Foundation

Remember that types are the foundation? So yeah… let’s start implementing the email validators, as they are the most straightforward of the two.

class EmailEmptyValidator: Validator {
	func validate(_ value: String) -> Result {
		return value.isEmpty ? .error(EmailValidationError.empty) : .ok(value)
	}
}

class EmailFormatValidator: Validator {
	func validate(_ value: String) -> Result {
		let magicEmailRegexStolenFromTheInternet = "^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"

		let emailTest = NSPredicate(format:"SELF MATCHES %@", magicEmailRegexStolenFromTheInternet)

		return emailTest.evaluate(with: value) ?
			.ok(value) :
			.error(EmailValidationError.invalidFormat)
	}
}

The single-value validators are fairly straightforward – you just implement the logic in the validate function based on the value passed in.

Now, to compose the email validations, all we need to do is this:

class EmailValidator: CompositeValidator {
	let validators: [Validator]

	init() {
		self.validators = [
			EmailEmptyValidator(),
			EmailFormatValidator()
		]
	}
}

That’s it! No extra code to worry about as the default CompositeValidator extensions handle this correctly.

The usage code looks like this:

let emailValidator = EmailValidator()
validate(value: "", validator: emailValidator)
validate(value: "invalidEmail@", validator: emailValidator)
validate(value: "validEmail@validDomain.com", validator: emailValidator)

Which will output:

value: "" => error(EmailValidationError.empty)
value: "invalidEmail@" => error(EmailValidationError.invalidFormat)
value: "validEmail@validDomain.com" => ok("validEmail@validDomain.com")
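The validate(value:validator:) helper that produces this output isn’t shown in the post; here’s a hedged sketch of what it might look like (the printed format is my assumption, and the supporting types are repeated so the sketch stands alone):

```swift
// Redeclared so the sketch stands alone; Validator and Result match
// the definitions from earlier in the post.
typealias ValueType = String

enum Result {
	case ok(ValueType)
	case error(Error)
}

protocol Validator {
	func validate(_ value: String) -> Result
}

// A hypothetical helper matching the output format shown above; the
// post never shows its definition, so this is just my best guess.
@discardableResult
func validate(value: String, validator: Validator) -> Result {
	let result = validator.validate(value)
	print("value: \"\(value)\" => \(result)")
	return result
}
```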

Password Validation

The single-value password validators are all straightforward to implement:

class PasswordEmptyValidator: Validator {
	func validate(_ value: String) -> Result {
		return value.isEmpty ? .error(PasswordValidationError.empty) : .ok(value)
	}
}

class PasswordLengthValidator: Validator {
	static let minimumPasswordLength: Int = 8

	func validate(_ value: String) -> Result {
		return value.characters.count >= PasswordLengthValidator.minimumPasswordLength ?
			.ok(value) :
			.error(PasswordStrengthValidationError.length)
	}
}

class PasswordIncludesUppercaseValidator: Validator {
	func validate(_ value: String) -> Result {
		return value.rangeOfCharacter(from: NSCharacterSet.uppercaseLetters) != nil ?
			.ok(value) :
			.error(PasswordStrengthValidationError.missingUppercase)
	}
}

class PasswordIncludesLowercaseValidator: Validator {
	func validate(_ value: String) -> Result {
		return value.rangeOfCharacter(from: NSCharacterSet.lowercaseLetters) != nil ?
			.ok(value) :
			.error(PasswordStrengthValidationError.missingLowercase)
	}
}

class PasswordIncludesNumbersValidator: Validator {
	func validate(_ value: String) -> Result {
		return value.rangeOfCharacter(from: NSCharacterSet.decimalDigits) != nil ?
			.ok(value) :
			.error(PasswordStrengthValidationError.missingNumber)
	}
}

There really is nothing special going on here. However, what does the validation for the weak error type look like? Well, here it is:

class PasswordStrengthValidator: CompositeValidator {
	let validators: [Validator]

	init() {
		self.validators = [
			PasswordLengthValidator(),
			PasswordIncludesUppercaseValidator(),
			PasswordIncludesLowercaseValidator(),
			PasswordIncludesNumbersValidator()
		]
	}

	func validate(_ value: String) -> Result {
		let result = validate(value) as [Result]
		let errors = result.filter { if case .error(_) = $0 { return true }; return false }

		if errors.isEmpty { return .ok(value) }

		let reasons: [PasswordStrengthValidationError] = errors.map {
			if case let .error(reason) = $0 { return reason as! PasswordStrengthValidationError }
			fatalError("This code should never be reached. It is an error if it ever hits.")
		}

		return .error(PasswordValidationError.weak(reasoning: reasons))
	}
}

As you can see, it is necessary to change the default implementation of validate, as we would like to do a custom conversion to box the potential list of failure reasons into a single weak value. If we modeled the error type as Scott had it, this step would be unnecessary.

Finally, we look at the all up composite validator for passwords:

class PasswordValidator: CompositeValidator {
	let validators: [Validator]

	init() {
		self.validators = [
			PasswordEmptyValidator(),
			PasswordStrengthValidator()
		]
	}
}

Again, we can simply use the default implementations for our two validate functions as there is no special logic we need here. The usage code is:

let passwordValidator = PasswordValidator()
validate(value: "", validator: passwordValidator)
validate(value: "psS$", validator: passwordValidator)
validate(value: "passw0rd", validator: passwordValidator)
validate(value: "paSSw0rd", validator: passwordValidator)

With the output of:

value: "" => error(PasswordValidationError.empty)
value: "psS$" => error(PasswordValidationError.weak([PasswordStrengthValidationError.length, PasswordStrengthValidationError.missingNumber]))
value: "passw0rd" => error(PasswordValidationError.weak([PasswordStrengthValidationError.missingUppercase]))
value: "paSSw0rd" => ok("paSSw0rd")

In the End

This is really just a refinement of Scott’s approach. I think he started off with the right thought process. The biggest critique I have of his approach is at the end, when he needed to use functions to encapsulate the construction of the various composite validators. Using functions as constructors can oftentimes mean you simply don’t have the right type abstraction yet. I do believe that is the case here, which is why I introduced the CompositeValidator protocol.

Update: Here’s a link to the full playground source code: https://gist.github.com/owensd/6a99ca908d3c13c8b19c9a42aaf3cd9d.


Swift App Bundle Sizes

I’m looking at building a casual game for iOS, and one of the extremely important things is to make sure that the app bundle size is as small as possible. I want users to be able to download it quickly and, most importantly, not be blocked by the cellular download size limit, which is currently 100MB.

I did a directory listing of the Frameworks directory in an app bundle and was a little dismayed:

Filename                       Size (b)
------------------------------------
libswiftAVFoundation.dylib      254000
libswiftCore.dylib            14221040
libswiftCoreAudio.dylib         396048
libswiftCoreGraphics.dylib      543632
libswiftCoreImage.dylib         207888
libswiftCoreMedia.dylib         253072
libswiftDarwin.dylib            295808
libswiftDispatch.dylib         1093360
libswiftFoundation.dylib       5418720
libswiftGLKit.dylib             246576
libswiftGameplayKit.dylib       212160
libswiftObjectiveC.dylib        258544
libswiftSceneKit.dylib          250544
libswiftSpriteKit.dylib         250064
libswiftUIKit.dylib             348048
libswiftos.dylib                247696
libswiftsimd.dylib             1003200

That’s a total of 25,500,400 bytes (or approx 25 MB)!! Yikes!

Surely, that’s not really the size, right!? The xcarchive must do some thinning and be smaller, yeah?

Filename                       Size (b)
------------------------------------
libswiftAVFoundation.dylib      451440
libswiftCore.dylib            45120816
libswiftCoreAudio.dylib         774336
libswiftCoreGraphics.dylib     1135952
libswiftCoreImage.dylib         257248
libswiftCoreMedia.dylib         450512
libswiftDarwin.dylib            542608
libswiftDispatch.dylib         2541392
libswiftFoundation.dylib      16854896
libswiftGameplayKit.dylib       345296
libswiftGLKit.dylib             310880
libswiftObjectiveC.dylib        488960
libswiftos.dylib                398624
libswiftSceneKit.dylib          348784
libswiftsimd.dylib              693568
libswiftSpriteKit.dylib         395776
libswiftUIKit.dylib            3010368

This is a total of 74,121,456 bytes (or approx 74 MB)… ugh, that’s a lot.

I reached out on Twitter to see who’s actually shipping Swift apps. If the app bundle size is going to really be between 25MB and 75MB for Swift support alone, that’s going to be a deal breaker for me.

Fortunately, all of these file sizes are just the local sizes. The App Store is where all of the thinning is done. The actual size bump for a shipping product with Swift support is much smaller:

Filename                       Size (b)
------------------------------------
libswiftContacts.dylib          147200
libswiftCore.dylib             7680608
libswiftCoreData.dylib          147568
libswiftCoreGraphics.dylib      268016
libswiftCoreImage.dylib         141568
libswiftDarwin.dylib            180576
libswiftDispatch.dylib          147104
libswiftFoundation.dylib        902016
libswiftObjectiveC.dylib        173168
libswiftUIKit.dylib             203408
libswiftWebKit.dylib            147392

Alrighty, so this is looking better, but still not great: 10,138,624 bytes (approx 10 MB). This of course is missing some of the frameworks that I was using above, like Swift support for AV Foundation, but seeing as libswiftCore is the primary culprit of the size, I think it’s safe to say that budgeting for 15 MB for Swift support should be sufficient.

This is still a large amount, but vastly better than I initially feared. I’m still not sure what I’m going to do, but at least I have a better ball park estimation.

Update:

I should also note that the App Store compresses your bundle as well. In the end, it’s really hard to know exactly how big your app bundle is going to be without actually publishing it up to the store. If anyone has any tips or tricks on that, please share on Twitter.


Xcode, Frameworks, and Embedded Frameworks

So last week I spent the better part of a day trying to figure out exactly what was going on when I was trying to build a component using SourceKittenFramework.

It turns out that not all frameworks are created equal in Xcode. Honestly, this wouldn’t be such a big deal if Swift properly supported static libraries, as the rabbit hole for this problem is rooted in a bunch of hacks to make command-line tools work properly with Swift that have dependencies.

I’ll provide a little story about my experience this week.

Embedded Frameworks

Xcode supports the concept of embedding frameworks into your bundle. This is essentially the same thing as the old “Copy Files” build phase where you can copy a dependency into your app bundle under a particular directory, such as “Frameworks”.

However, there is an extremely important distinction between the “Copy Files” build phase and the “Embed Frameworks” option.

The normal output of a framework looks like this:

├── Headers -> Versions/Current/Headers
├── Modules -> Versions/Current/Modules
├── MyFramework -> Versions/Current/MyFramework
├── Resources -> Versions/Current/Resources
└── Versions
    ├── A
    │   ├── Headers
    │   │   ├── MyFramework-Swift.h
    │   │   └── MyFramework.h
    │   ├── Modules
    │   │   ├── MyFramework.swiftmodule
    │   │   │   ├── x86_64.swiftdoc
    │   │   │   └── x86_64.swiftmodule
    │   │   └── module.modulemap
    │   ├── MyFramework
    │   └── Resources
    │       └── Info.plist
    └── Current -> A

This provides all of the necessary content to be able to use this framework both at runtime and as a developer-friendly framework; it has the headers and the module definitions necessary when building and linking against the library.

However, the frameworks that get embedded strip out all of that information and you end up with something like this:

├── MyFramework -> Versions/Current/MyFramework
├── Resources -> Versions/Current/Resources
└── Versions
    ├── A
    │   ├── MyFramework
    │   ├── Resources
    │   │   └── Info.plist
    │   └── _CodeSignature
    │       └── CodeResources
    └── Current -> A

This contains only the content required at runtime. Now, it makes sense why Xcode would do this; after all, it’s being packaged up for use within a target, so it’s already built. Also, this can help reduce the size of bundles by removing all of the information that simply isn’t necessary at runtime.

Had I only known this before…

Unexpected Outcomes

Now… the problem with all of this, of course, comes when we start doing hacks to make things work in the ever-changing landscape of Swift and Xcode.

I ran into this when trying to use SourceKitten. As Swift doesn’t really have a good way to build testable command-line tools or static libraries, SourceKitten follows the pattern that a lot of other tools do: it builds an app target and then copies out the CLI tool and packages up its dependencies.

The start of the problems…

I don’t use Carthage or CocoaPods… but the reasons for that are outside the scope of this post. Needless to say, I simply cloned the repo and ran make install as the ReadMe told me to do.

Everything is happily pulled down and built properly. Then, SourceKitten is installed into my /usr/local path. Great!

Specifically, it creates a structure like this:

/usr/local/Frameworks/SourceKittenFramework.framework
/usr/local/bin/sourcekitten

The @rpath for sourcekitten is then set to @executable_path/../Frameworks.

There is nothing wrong with this setup. It works great.

However… remember, my intention is to now take SourceKittenFramework.framework, link it into my own app, and start converting my prototype that was shelling out to sourcekitten to directly use the API.

So I do what seems reasonable:

  1. Create my app
  2. Create a lib folder
  3. Copy the SourceKittenFramework.framework into lib
  4. Link the framework
  5. Add import SourceKittenFramework
  6. Build

And then I’m greeted with this:

<path/to/file:line:column> error: no such module 'SourceKittenFramework'
import SourceKittenFramework
       ^

Um… what!?

So I spend some time making sure my framework search paths are correct, inspecting the SourceKitten project, searching the web… no idea what is going on.

Ok… so I tried building from within Xcode and looking at the output of the SourceKittenFramework target itself. I still don’t understand the problem yet.

I copy over the version of SourceKittenFramework that is built from the target and I get this:

<path/to/file:line:col>: error: missing required modules: 'SWXMLHash', 'Yaml'
import SourceKittenFramework
       ^

Ok… what is happening!? I still haven’t figured out all of the specialness of “embedded frameworks” at this point.

I tried mucking with the framework search paths to point to the Yaml and SWXMLHash frameworks that I clearly see are within the frameworks, but nothing seems to be working… Including copying the Yaml and SWXMLHash frameworks to be siblings of the SourceKittenFramework.framework.

Ok… maybe source doesn’t work!? I download the framework from the releases page… SAME FLIPPING ERROR!

At this point, I’m quite frustrated, have no idea what is going on, and decide to call it for the day.

/ragequit

Taking Time

I come back the next day. OK, I’m going to get this to work!

I do what I thought were mostly the same steps as above.

Build success… ok, what the flying flip?

The key difference that I did this time was that I copied the Yaml.framework, SWXMLHash.framework, and SourceKittenFramework.framework from the target output of the SourceKittenFramework target.

At this point I was curious as to what had happened. This is where I started doing an analysis of what is different between each of the frameworks. See the “Embedded Frameworks” section above.

Conclusion

If you are providing frameworks to people that you expect to be able to develop with and not just use at runtime, please be sure to distribute the non-embedded framework version! Otherwise, well, all of your consumers will face the above issues.

SourceKitten tracking issue: https://github.com/jpsim/SourceKitten/issues/232

SourceKitten potential fix: https://github.com/jpsim/SourceKitten/pull/233


SE-0117 – The Proposal of Doom

Proposal SE-0117 is causing quite the ruckus. It’s actually a bit amusing to watch as people clamor against preventing inheritance by default.

In case you haven’t noticed… Swift really wants to move as much as possible to being statically defined. This isn’t about “static is better” and “dynamic sucks”; it’s about the Core Team’s belief that being able to author code that the compiler can help you reason about is vastly safer than having code the compiler cannot reason about.

There are two very real and tangible side effects here:

  1. Resiliency
  2. Performance

Resiliency

By default, the code that Swift wants you to write puts you in a spot where you are not pigeonholed into a design that you do not want in the future. Today, when you create a class, if you forget to add the final keyword, then that’s it! It’s a breaking change to later close down that class.

That is exactly the type of design error that Swift is trying to prevent when providing sensible defaults.

Performance

If the compiler can remove all of the virtual dispatching required in a dynamic type hierarchy, performance can get measurably better. Even with all of the optimizations of objc_msgSend and friends, dynamic dispatch is still orders of magnitude slower than static dispatch.
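As a rough sketch of the idea (my own illustration, not code from the proposal): marking a class final tells the compiler that no override can exist, so calls to its methods can be statically dispatched or even inlined.

```swift
// final: the compiler knows render() can never be overridden, so the
// call can be statically dispatched (and potentially inlined).
final class Renderer {
    func render() -> String { return "rendered" }
}

// Non-final: a subclass elsewhere could override draw(), so without
// whole-module knowledge the call may go through dynamic dispatch.
class OpenRenderer {
    func draw() -> String { return "drawn" }
}

let r = Renderer()
print(r.render()) // statically dispatched call
```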

I get the argument: pragmatic programmers claim that they need the ability to work around bugs in frameworks that ship because developers make mistakes. Yes, they do… and code bases that use inheritance for this are writing hacks in their code. Should this be prevented? No… however, this proposal doesn’t actually say that it should be prevented either.

The problem is this: Swift is still in its infancy. Sure, version 3 is coming out soon, but really, let’s call it what it’s closer to: version 0.3. It’s suitable for certain classes of applications, but it’s certainly not suitable for large-scale teams (100s of developers) to author large swaths of code in (I’m talking in the hundreds of thousands or millions of lines of code here). Sure… you can, but it’s painful and expensive (e.g. see the upcoming Swift 2 to Swift 3 migration!).

What’s most important for Swift to get right up front is the underlying model. What’s the right set of defaults for everything to produce code that can benefit from tools to help analyze for correctness? Also, what are the right set of defaults to guard against code authors doing things wrong?

Taking a Step Back

Swift has two fundamental ways of expressing types: structs and classes. The primary difference is not that classes support inheritance and structs do not. The primary difference is that structs are value types and classes are reference types.

Swift could completely remove the idea of inheritance out of classes and they would still be important to have in the Swift type system.

Why is this important? Simple. The way in which we start to model future Swift types is not around “do I need inheritance or not?” It’s around “do I need a raw value type, a reference type, or value semantics?” (the last usually being a mixture of a struct API surface and an internal backing reference type store).
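That last option (a struct API surface over a reference-type store) can be sketched with copy-on-write. This is my own minimal illustration, not code from the post:

```swift
// Internal reference-type backing store.
final class Storage {
    var values: [Int]
    init(values: [Int]) { self.values = values }
}

// Struct API surface with value semantics: copies share the backing
// store until one of them mutates (copy-on-write).
struct Numbers {
    private var storage = Storage(values: [])

    var values: [Int] { return storage.values }

    mutating func append(_ n: Int) {
        // Clone the backing store only when it is shared.
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(values: storage.values)
        }
        storage.values.append(n)
    }
}

var a = Numbers()
a.append(1)
var b = a      // a value copy; storage is still shared here
b.append(2)    // triggers the copy, so a is unaffected
print(a.values) // [1]
print(b.values) // [1, 2]
```

Copies of Numbers behave as independent values, yet the storage is only duplicated when a mutation actually happens.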

As an API author, if I need a value type, I cannot use inheritance. And more importantly, especially with regards to many of the arguments against this proposal, you cannot fix any issues with these struct types via inheritance either. It’s at this point that I find the arguments extremely weak: if it is so crucial that inheritance be enabled by default for class types, why are you not more concerned about Swift’s focus on value types and using value semantics for APIs? After all, there are pretty much no APIs in Swift’s libraries that you will be able to patch this way.

[Un]Safely Breaking Out

Let’s be honest here… the crux of the argument is that there is some error that has happened in the library you’re consuming and it’s either closed source or you are unable to modify the source for any number of reasons… so how do you fix it?

With unsafe code.

There’s absolutely nothing in this proposal that prevents Swift from providing tools to get you access to what you need. For example, imagine a world where you could download a developer Swift module that contains all of the unoptimized code.

This would allow you to write something like this:

class SuperHack : @unsafe YoDontSubclassMe {
	@unsafe override func dontTouch() {
		// Implement your super hack
	}
}

Why is this better? Well, I think it’s better for a variety of reasons:

  1. The code author is able to ship vastly more performant code to all consumers with their original design
  2. The specific use case that you needed to “fix” is clearly documented in your code as being unsafe; future maintainers will know this in a compiler-verifiable way (e.g. not a comment)
  3. Future updates to a framework can easily be audited (e.g. all hacks can be logged against specific versions of frameworks)

So let’s maybe not get so melodramatic that providing no inheritance by default on classes is the end of the world?


Looping with iterate() and takeWhile()

There’s a funny thing that happens when you remove a language construct that actually provides value: you need to re-invent ways to support that construct.

The proposal Add scan, takeWhile, dropWhile, and iterate to the stdlib provides a basic way to get back the lost functionality of the C-style for-loop, specifically with iterate() and takeWhile().

The key thing to remember for the implementation is that we must have a lazy version of iterate() in order for this to be semantically comparable to the C-style for-loop that is being replaced. Further, we need to be extremely careful when using the proposed takeWhile() (and other) extensions to be sure we’re getting the lazy versions when we need them.

So let’s look at what an implementation might look like (this is using Swift 2.2). We are going to want to replicate the following loop:

for var n = 1; n < 10; n = n * 2 {
    print("\(n)", terminator: ", ")
}

This loop simply outputs: 1, 2, 4, 8, 

Ok, first we need to define the iterate function:

// Creates a lazy sequence that begins at start. The next item in the
// sequence is calculated using the stride function.
func iterate<T>(initial from: T, stride: T throws -> T) -> StridingSequence<T>

This is going to require that we return a SequenceType (this is renamed to Sequence in Swift 3). But remember, we want this to be lazy, so we really need to conform to the LazySequenceType protocol. That type is going to need to know the starting point and the mechanism to stride through the desired sequence.

struct StridingSequence<Element> : LazySequenceType {
    let initial: Element
    let stride: Element throws -> Element

    init(initial: Element, stride: Element throws -> Element) {
        self.initial = initial
        self.stride = stride
    }

    func generate() -> StridingSequenceGenerator<Element> {
        return StridingSequenceGenerator(initial: initial, stride: stride)
    }
}

Of course, now the StridingSequence is going to need the underlying GeneratorType implementation: StridingSequenceGenerator (the GeneratorType protocol is renamed to IteratorProtocol in Swift 3).

struct StridingSequenceGenerator<Element> : GeneratorType, SequenceType {
    let initial: Element
    let stride: Element throws -> Element
    var current: Element?

    init(initial: Element, stride: Element throws -> Element) {
        self.initial = initial
        self.stride = stride
        self.current = initial
    }

    mutating func next() -> Element? {
        defer {
            if let c = current {
                current = try? stride(c)
            }
            else {
                current = nil
            }
        }
        return current
    }
}

OK… this is getting to be a lot of code. But there’s going to be a big payoff, right?

What we have now is an infinite sequence. We can test it out like so:

for n in iterate(initial: Int(1), stride: { $0 * 2 }) {
    if n >= 10 { break }
    print("\(n)", terminator: ", ")
}

At this point, we are pretty close to getting what we want. The last question is how to move the condition out of the body of the loop and into the for-loop construct?

We have two basic options:

  1. Add a while: parameter to the iterate() function, or
  2. Add a takeWhile() function that can be chained.

The proposal that I linked to earlier proposes to add a takeWhile() function. This is probably the “better” way to go given that we are creating a sequence and it’s feasible that we may want to do other operations, like filtering.

Unfortunately, this means a bit more code.

Let’s start with the extension to LazySequenceType:

extension LazySequenceType {
    typealias ElementType = Self.Elements.Generator.Element
    func takeWhile(predicate: ElementType -> Bool)
        -> LazyTakeWhileSequence<Self.Elements>
    {
        return LazyTakeWhileSequence(base: self.elements, takeWhile: predicate)
    }
}

This requires us to create another sequence type that knows how to walk our original sequence type but stop when the given condition is met.

struct LazyTakeWhileSequence<Base: SequenceType> : LazySequenceType {
    let base: Base
    let predicate: Base.Generator.Element -> Bool

    init(base: Base, takeWhile predicate: Base.Generator.Element -> Bool) {
        self.base = base
        self.predicate = predicate
    }

    func generate() -> LazyTakeWhileGenerator<Base.Generator> {
        return LazyTakeWhileGenerator(base: base.generate(), takeWhile: predicate)
    }
}

And then this is going to require another generator type that gives us the next item in the sequence and nil after the condition is met.

struct LazyTakeWhileGenerator<Base: GeneratorType> : GeneratorType, SequenceType {
    var base: Base
    var predicate: Base.Element -> Bool

    init(base: Base, takeWhile predicate: Base.Element -> Bool) {
        self.base = base
        self.predicate = predicate
    }

    mutating func next() -> Base.Element? {
        if let n = base.next() where predicate(n) {
            return n
        }
        return nil
    }
}

Whew! Now we can write this:

for n in iterate(initial: Int(1), stride: { $0 * 2 }).takeWhile({ $0 < 10 }) {
    print("\(n)", terminator: ", ")
}

Of course, we could have just written this and been done with it:

for var n = 1; n < 10; n = n * 2 {
    print("\(n)", terminator: ", ")
}

Summary

It’s honestly really difficult for me to accept this approach as objectively better, especially when I have to write the supporting library code ;). Yes, there are clearly benefits to an iterate() function that you can then perform different operations on, and maybe if I needed to perform some type of filtering with the above loop like so:

let items = iterate(initial: Int(1), stride: {$0 * 2})
    .filter({ $0 != 4})
    .takeWhile({ $0 < 10 })

for n in items {
    print("\(n)", terminator: ", ")
}

I could see the benefit of this approach for some use cases. However, there are also objectively bad things about the approach above. For one, there is a crap ton of code that needs to be written just to get this to work, and I’m not even done: I’d need to do similar work for the collection types and the non-lazy versions as well.

The other thing: I don’t find it any less cryptic. Sure, things are labeled a bit better, but there’s a lot more syntax in the way now (using an @autoclosure would be nice, but you cannot use anonymous variables like $0 in one). In fact, it’s only after moving the iterate() call onto its own line that things start to become a bit clearer.

Anyhow, if you’re interested in how to implement this, it’s all here. And if there is actually an easier way, PLEASE let me know.

Full gist is here: iterate.swift.

Looping with iterate() and takeWhile()

APIs Matter

I asked a poll on Twitter today about API preference between two options (three if you count the updated version):

// the very verbose range-based loop
for n in 0.stride(through: 10, by: 2).reverse() {
    print(n)
}

// the more concise range-based loop
for n in 10.stride(through: 0, by: -2) {
    print(n)
}

// c-style loop
for var n = 10; n >= 0; n -= 2 {
    print(n)
}
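As a sanity check that the two range-based spellings really are the same loop, here is the comparison written against the free stride(from:through:by:) function (the spelling Swift 3 moved to):

```swift
// Both range-based spellings above walk 10, 8, 6, 4, 2, 0.
let verbose = Array(stride(from: 0, through: 10, by: 2).reversed())
let concise = Array(stride(from: 10, through: 0, by: -2))
assert(verbose == concise)
print(concise) // [10, 8, 6, 4, 2, 0]
```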

And even earlier I wrote this blog article: For Loops and Forced Abstractions.

The primary point of the entry was about being forced into abstractions when they are not necessary.

One of the things that really bothered me were the examples in the Swift blog:

for i in (1...10).reverse() {
    print(i)
}

for i in 0.stride(to: 10, by: 2) {
    print(i)
}

In my opinion, those are really terrible APIs. In addition to being arguably just as hard to visually parse as the c-style for-loop, they still do not convey the intent behind what is being done: they are supposed to be creating a range, and only the first usage even comes close to looking like that. Not only that, there is no symmetry between incrementing and decrementing ranges.

For example, this is invalid in Swift: 10...0. So we have, what I would call, a broken and partial abstraction over the concept of “ranges” or “intervals”. Ironically, that’s exactly the API we need, especially when we are removing the c-style for-loop.

Let’s take a look at the Strideable protocol:

/// Conforming types are notionally continuous, one-dimensional
/// values that can be offset and measured.
public protocol Strideable : Comparable {
    /// A type that can represent the distance between two values of `Self`.
    associatedtype Stride : SignedNumberType
    /// Returns a stride `x` such that `self.advancedBy(x)` approximates
    /// `other`.
    ///
    /// - Complexity: O(1).
    ///
    /// - SeeAlso: `RandomAccessIndexType`'s `distanceTo`, which provides a
    ///   stronger semantic guarantee.
    @warn_unused_result
    public func distanceTo(other: Self) -> Self.Stride
    /// Returns a `Self` `x` such that `self.distanceTo(x)` approximates
    /// `n`.
    ///
    /// - Complexity: O(1).
    ///
    /// - SeeAlso: `RandomAccessIndexType`'s `advancedBy`, which
    ///   provides a stronger semantic guarantee.
    @warn_unused_result
    public func advancedBy(n: Self.Stride) -> Self
}

This seems fairly clear: it’s an abstraction over a value that can be incremented or decremented by some Self.Stride amount. In addition, we can also determine the distance between two Strideable instances, so long as they share the same Stride associated type.

This is one layer of the abstraction onion, but OK. When applied to numeric types, this gives us the nice ability to add and subtract in a generic and type-safe manner.
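To make that concrete, here’s a small generic function written purely against the Strideable contract, shown in the later distance(to:)/advanced(by:) spelling of the same two requirements; the midpoint name is just for illustration:

```swift
// A generic helper that uses only what Strideable promises: measuring
// the distance between two values, then offsetting by a stride.
// The BinaryInteger constraint on Stride is just to allow division.
func midpoint<T: Strideable>(_ a: T, _ b: T) -> T where T.Stride: BinaryInteger {
    return a.advanced(by: a.distance(to: b) / 2)
}

print(midpoint(0, 10))    // 5
print(midpoint(10, -10))  // 0
```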

The problem, in my opinion, is the extension:

extension Strideable {
    /// Returns the sequence of values (`self`, `self + stride`, `self +
    /// stride + stride`, ... *last*) where *last* is the last value in
    /// the progression that is less than `end`.
    @warn_unused_result
    public func stride(to end: Self, by stride: Self.Stride) -> StrideTo<Self>
}

WHAT!?

This makes absolutely no sense to me. I actually find this API really bad on multiple counts:

  1. Why does a type that is responsible for incrementing itself now have the ability to create a sequence of values?
  2. What definition of “stride” ever means “create a sequence”?
  3. The API has a parameter named stride that has a different conceptual meaning altogether than the function with the same name.

In my opinion, this is just a bad API. Further, this goes on to confuse matters at the call sites.

If we must get rid of the c-style for loops, then we need to look at what the alternative is: for-in.

So what is a for-in loop construct?

You use the for-in loop to iterate over a sequence, such as ranges of numbers, items in an array, or characters in a string.

Source: Swift Programming Language: Control Flow.

Great! So what we really want is the ability to create such a range with as little abstraction as possible. The stride API is attempting to do that, but it fails to do so in an appropriate manner.

Instead, we want an API that can be called like this:

for n in range(from: 10, to: 0, by: 2) {
}

And here’s what the signature looks like:

func range<T : Strideable>(
    from start: T, 
    to end: T,
    by step: T.Stride = 1) -> Interval<T>

NOTE: Sure, there need to be other variants to support open, closed, left-open, and right-open intervals, but that’s irrelevant for this purpose.

Wait a minute… isn’t that the same as what stride() is today? Sure, except:

  1. range() is vastly more explicit about what is actually going on.
  2. Instead of tacking on to the Strideable protocol like a poor man’s side-car, it composes with it instead, creating an API that is much more natural and expressive.
  3. It creates a much more natural call site.
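A minimal sketch of what that composition could look like, written as a thin wrapper over the free stride(from:to:by:) function; the range name and argument labels are this article’s proposal, not a shipping API:

```swift
// Hypothetical range(from:to:by:) from the text, sketched over the
// free stride function. The name is the proposal's, not the
// standard library's.
func range<T: Strideable>(from start: T, to end: T, by step: T.Stride) -> StrideTo<T> {
    return stride(from: start, to: end, by: step)
}

// Reads like a range, decrements naturally, excludes the end value.
let downward = Array(range(from: 10, to: 0, by: -2))
print(downward) // [10, 8, 6, 4, 2]
```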

I still don’t like the removal of the c-style for-loop, but thankfully, Swift 3 will be moving stride back to a free function. It’s nice having a more “proper” API to work with out of the box.

Now to get it renamed to range

APIs Matter

For Loops and Forced Abstractions

In case you haven’t heard, the traditional c-style for loop has been deprecated and is slated for removal in Swift 3.0. More info about that can be found here: New Features in Swift 2.2.

I’m not a fan, at all.

The fundamental reason I’m not a fan is quite simple: the only way to write a for loop now is by leveraging abstractions. Personally, I really dislike being required to use abstractions when they are not necessary.

The defense I hear all the time is this:

Well, the compiler will close that gap or remove the abstraction cost all together.

That’s nice in theory, but it’s patently false in practice. The optimizer can remove some of the abstractions, but it cannot guarantee to remove all of the cost of the abstraction every time.

Here’s the real-world cost of abstractions (not necessarily specific to just this for-loop construct):

Language: C, Optimization: -Os                                          Avg (ms) 
---------------------------------------------------------------------------------
RenderGradient (Pointer Math)                                              9.582 
RenderGradient (SIMD)                                                      4.608 

Language: Swift, Optimization: -O                                       Avg (ms) 
---------------------------------------------------------------------------------
RenderGradient ([Pixel])                                                22.51406 
RenderGradient ([UInt32])                                               18.39304 
RenderGradient (UnsafeMutablePointer)                                   20.67769 
RenderGradient (UnsafeMutablePointer<UInt32>)                           15.29333 
RenderGradient ([Pixel].withUnsafeMutablePointer)                       22.51703 
RenderGradient ([UInt32].withUnsafeMutablePointer)                      19.27868 
RenderGradient ([UInt32].withUnsafeMutablePointer (SIMD))               15.63351 
RenderGradient ([Pixel].withUnsafeMutablePointer (SIMD))                24.48129 

Source: https://github.com/owensd/swift-perf/blob/swift-v3/reports/swift_3_0-march.txt

At best, under an optimized build, we’re looking at a 4x cost in performance. With unchecked builds, it’s possible to get the performance down to equivalent timings. With non-optimized builds, we are talking anywhere from 3 to 88 (!!) times slower than the equivalent C code.

It’s not that I find the for-in style loop useless; I don’t. I also completely agree that it should be the one used the majority of the time. However, please don’t force me to use abstractions when I don’t want to or when they are not appropriate.

Here’s the before and after with the upcoming changes of some real code:

for var y = 0, height = buffer.height; y < height; ++y {
    let green = min(int4(Int32(y)) &+ yoffset, 255)

    for var x: Int32 = 0, width = buffer.width; x < width; x += 4 {
        let blue = min(int4(x, x + 1, x + 2, x + 3) &+ xoffset, 255)

        p[offset++] = Pixel(red: 0, green: green.x, blue: blue.x, alpha: 255)
        p[offset++] = Pixel(red: 0, green: green.y, blue: blue.y, alpha: 255)
        p[offset++] = Pixel(red: 0, green: green.z, blue: blue.z, alpha: 255)
        p[offset++] = Pixel(red: 0, green: green.w, blue: blue.w, alpha: 255)
    }
}

for y in 0..<buffer.height {
    let green = min(int4(Int32(y)) &+ yoffset, 255)

    for x in 0.stride(to: buffer.width, by: 4) {
        let x32 = Int32(x)
        let blue = min(int4(x32, x32 + 1, x32 + 2, x32 + 3) &+ xoffset, 255)

        p[offset] = Pixel(red: 0, green: green.x, blue: blue.x, alpha: 255)
        offset += 1

        p[offset] = Pixel(red: 0, green: green.y, blue: blue.y, alpha: 255)
        offset += 1

        p[offset] = Pixel(red: 0, green: green.z, blue: blue.z, alpha: 255)
        offset += 1

        p[offset] = Pixel(red: 0, green: green.w, blue: blue.w, alpha: 255)
        offset += 1
    }
}

I personally don’t consider that a readability win.

  1. It’s more code.
  2. It requires type coercion for x, as the Int32 type isn’t Strideable.
  3. stride(to:by:) is ambiguous compared to the < operator.

And finally, this is not an acceptable alternative in my opinion:

var y = 0
var height = buffer.height
while y < height {
    let green = min(int4(Int32(y)) &+ yoffset, 255)

    var x: Int32 = 0
    var width = buffer.width
    while x < Int32(width) {
        let x32 = Int32(x)
        let blue = min(int4(x32, x32 + 1, x32 + 2, x32 + 3) &+ xoffset, 255)

        p[offset] = Pixel(red: 0, green: green.x, blue: blue.x, alpha: 255)
        offset += 1

        p[offset] = Pixel(red: 0, green: green.y, blue: blue.y, alpha: 255)
        offset += 1

        p[offset] = Pixel(red: 0, green: green.z, blue: blue.z, alpha: 255)
        offset += 1

        p[offset] = Pixel(red: 0, green: green.w, blue: blue.w, alpha: 255)
        offset += 1

        x += 4
    }

    y += 1
}

Why you might ask?

  1. It’s extremely easy to forget the increment (I actually did a few moments ago).
  2. The iterator variables are being leaked out of scope.
  3. All of the loop parts (initialization, condition, and increment) are scattered throughout the construct.
  4. The suggested pattern of using defer for incrementing is fundamentally flawed:

var i = 0
while i < 10 {
    defer { i += 1 }
    if i == 5 { break }
}
print(i)

What do you think i is here? What should it be? I’ll give you a hint, they aren’t the same answer.
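Running it makes the trap concrete: the defer fires even on the iteration that breaks, so the value that escapes the loop is not the value the body last saw.

```swift
// defer runs even on the pass that breaks out, so i leaves the loop
// one past the last value the body actually observed.
var i = 0
while i < 10 {
    defer { i += 1 }
    if i == 5 { break }
}
print(i) // 6 — but the body broke at 5 and never saw 6
```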

Yes, the above examples are narrow and specific. But that’s exactly the point. When we need to write for narrow and specific cases, that’s exactly when we need to get outside of the abstraction box that makes for simpler code.

It’s strange watching Swift evolve. Maybe I’m just dense or stuck in my old ways, but I can’t see how this change is aligned with one of Swift’s aspirations of being a systems-level language.

For Loops and Forced Abstractions

Access Control Modifiers Proposal Thoughts

The big thread these days on swift-evolution is regarding access control modifiers. Swift supports a fairly limited set today, namely:

  • public: visible outside of the module
  • internal: visible within the module
  • private: visible within the file

There is a proposal, SE-0025: Scoped Access Level, that wants to add another layer to the mix: lexical scoping (I’ll use local in the example for it).

struct Outer {
    local let scopeVisible: Int
    private let fileVisible: Int
    func f() {
        /* scopeVisible is accessible here */
        /* fileVisible is accessible here */
    }
}

let o = Outer()
/* o.scopeVisible is not accessible here */
/* o.fileVisible is accessible here */

Today, in Swift, there is no way to provide this type of scoping mechanism within the same file.

My argument is that this proposal should be rejected; the Core Team thinks otherwise:

To summarize the place we’d like to end up:

  • “public” -> symbol visible outside the current module.
  • “internal” -> symbol visible within the current module.
  • unknown -> symbol visible within the current file.
  • “private” -> symbol visible within the current declaration (class, extension, etc).

The problem, as I see it, is that this is simply a one-off fix for one of the limitations for access modifiers today. There are other, arguably reasonable, asks for access control modifiers as well:

  • visibility to only extensions declared in the same file
  • visibility to only extensions
  • visibility to subclasses
  • visibility to specific functions or types (e.g. C++’s friend)

We could get even more fine grained as well:

  • visibility to only a specific set of modules
  • visibility within a specific submodule

I don’t think that these are all necessarily bad; in fact, some can be quite helpful. However, instead of just accepting this proposal and adding this specific change, I’d rather see the entire access modifier system revisited, because this doesn’t really fix much; it just moves the problem.

The example used is this (where local means lexical scope):

class A {
   local var counter = 0

   // API that hides the internal state
   func incrementCount() { ++counter }

   // hidden API, not visible outside of this lexical scope
   local func advanceCount(dx: Int) { counter += dx }

   // incrementTwice() is not visible here
}

extension A {
   // counter is not visible here
   // advanceCount() is not visible here

   // may be useful only to implement some other methods of the extension
   // hidden from anywhere else, so incrementTwice() doesn’t show up in 
   // code completion outside of this extension
   local func incrementTwice() {
      incrementCount()
      incrementCount()
   }
}

Ok, so this addresses the problem that counter is not meant to be visible outside of type A. Maybe it was unintentionally being leaked. However, if counter is required to be used within one of the extensions, say a reset() function, then counter needs to be promoted to the file-based one. However, by doing so, we are again leaking counter to be more visible than it is supposed to be. So what was really the point?

At the end of the day, if the current access modifiers are not sufficient because they are “too leaky”, then this proposal doesn’t fix the root problem. And if the root problem really is significant enough, then I would think all of the modifiers should be revisited to provide the fine-grained access control system that is actually being asked for.

Of course, I’m also just fine with the three we have and not trying to add all of the complexity required.
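As a postscript: the model the Core Team sketches above is essentially what shipped in Swift 3, with private becoming the scoped level the proposal calls local, and the old file-wide private renamed fileprivate. A sketch in that later spelling:

```swift
// The access model that eventually shipped (Swift 3+): `private` is
// scoped to the declaration, `fileprivate` is visible file-wide.
struct Outer {
    private let scopeVisible = 1       // only inside Outer
    fileprivate let fileVisible = 2    // anywhere in this file

    func f() -> Int {
        return scopeVisible + fileVisible  // both accessible here
    }
}

let o = Outer()
// o.scopeVisible   // error: inaccessible outside the declaration
print(o.fileVisible) // 2 — same file, so fileprivate is visible
print(o.f())         // 3
```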

If you have thoughts on it, be sure to contribute here: http://thread.gmane.org/gmane.comp.lang.swift.evolution/12183/focus=12219

Access Control Modifiers Proposal Thoughts

Tooling Around – Testing in Swift

For those that don’t know, a few of us are working on a set of build tools for Swift. As part of that work, I’ve been thinking about how some unit tests could be done in a much simpler way. D does something interesting for unit tests; it allows you to define them inline and have them runnable at build time. Pretty cool, though D’s implementation is a bit limited.

class Sum
{
    int add(int x, int y) { return x + y; }

    unittest
    {
        Sum sum = new Sum;
        assert(sum.add(3,4) == 7);
        assert(sum.add(-2,0) == -2);
    }
}

If we had the ability to create custom attributes in Swift (ok… this feature really requires custom attributes and compiler plug-ins), I was thinking that I could build something like this:

class Sum {
    func add(x: Int, _ y: Int) -> Int { return x + y }
}

@test("Sum", "add(_:_)", "checkin") {
    let sum = Sum()
    assert(sum.add(4, 5) == 9, "Math is hard!")
    assert(sum.add(-3, 3) == 0)
}

The intent is that this provides us with more functionality than what D offers, namely the ability to filter test cases by a number of factors including type, function names, test type (e.g. checkin), or any other text-based qualifier you want. Also, since it’s an attribute, we could easily strip out these code paths when a flag, say -enable-testing, isn’t used.

So, to run all of the checkin tests, you’d do something like this (assume we had some tool run-tests that is magical for now):

$ run-tests -match "checkin"

This would let us find all of the @test items with checkin as part of the metadata and run them.

Ok… that’s great, but Swift doesn’t allow us to create these attributes… so all hope is lost, right?

Nope, we can hack around to get what we want. =)

Instead, let’s do this:

class Sum {
    func add(x: Int, _ y: Int) -> Int { return x + y }
}

func __test_sum_add_checkin() throws {
    let sum = Sum()
    assert(sum.add(4, 5) == 10, "Math is hard!")
    assert(sum.add(-3, 3) == 0)
}

The idea is fairly simple:

  1. Build a static library of your module that you wish to test; make sure the -enable-testing flag is set.
  2. For each Swift file with methods following our convention (top-level functions that start with __test_), create an executable that calls those functions.
  3. Run the executable.

Boom! Integrated unit tests.
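In effect, the generated runner amounts to something like this hand-rolled sketch; the function name follows the article’s convention, while the registry and the tag filter are purely illustrative, not part of any tool:

```swift
// A hand-rolled picture of the generated runner: a registry of
// convention-named test functions, filtered by a tag in the name
// (here "checkin") and invoked in order. These assertions pass.
struct Sum {
    func add(_ x: Int, _ y: Int) -> Int { return x + y }
}

func __test_sum_add_checkin() throws {
    let sum = Sum()
    precondition(sum.add(4, 5) == 9, "Math is hard!")
    precondition(sum.add(-3, 3) == 0)
}

let tests: [(name: String, body: () throws -> Void)] = [
    ("__test_sum_add_checkin", __test_sum_add_checkin),
]

var ran = 0
do {
    for test in tests where test.name.contains("checkin") {
        try test.body()
        ran += 1
    }
} catch {
    print("test failed: \(error)")
}
print("ran \(ran) test(s)") // ran 1 test(s)
```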

Digging In

I’m using our build tool, but you can probably do something similar with Swift’s Package Manager.

Here’s the contents of my build file:

(package
  :name "IntegratedUnitTests"

  :tasks {
    :build {
      :tool "atllbuild"
      :sources ["Sum.swift"]
      :name "math"
      :output-type "static-library"
      :publish-product true
      :compile-options ["-enable-testing"]
    }

    :test {
      :dependencies ["generate-test-file"]
      :tool "atllbuild"
      :sources ["sum_test.swift"]
      :name "sum_test"
      :output-type "executable"
      :publish-product true
      :link-with ["math.a"]
    }

    :generate-test-file {
      :dependencies ["build"]
      :tool "shell"
      :script "echo '@testable import math' > sum_test.swift && xcrun -sdk macosx swiftc -print-ast Sum.swift | grep __test | sed 's/internal func/try/g' | sed 's/throws//g' >> sum_test.swift"
    }

    :run {
      :dependencies ["test"]
      :tool "shell"
      :script "./bin/sum_test"
    }
  }
)

The build task is responsible for creating the math.a static library. The test task is responsible for creating the test executable. The generate-test-file task creates the source code for the test executable. It does the following:

  1. Creates a new file named sum_test.swift.
  2. Appends @testable import math to it.
  3. Examines the AST for Sum.swift and adds the calls for our test methods.

The final file looks like this:

@testable import math
try __test_sum_add_checkin() 

And when you run it:

assertion failed: Math is hard!: file Sum.swift, line 7

Yay! Inlined test code.

This is just a preview. I plan on fleshing this out some more, but I thought it was interesting enough to post about. =)

Tooling Around – Testing in Swift

Sad State of Enums

Enums… those lovely little beasts of many uses. I really do like associated enums. Well, at least, I really like the idea of associated enums.

The problem: they really suck to work with.

Take for example you simply want to validate that an enum you got back is a specific enum case.

enum Simple {
    case SoEasy
    case Peasy
}

func simple() -> Simple { return .SoEasy }

func testSimple() {
    assertme(simple() == .SoEasy)
}

This is a cake walk with normal enums. But…

enum DoYou {
    case ReallyWantToHurtMe(Bool)
    case ReallyWantToMakeMeCry(Bool)
}

func doyou() -> DoYou { return .ReallyWantToHurtMe(true) }

func testChump() {
    assertme(doyou() == .ReallyWantToHurtMe)
}

GAH! Ok…

func testChump() {
    assertme(case .ReallyWantToHurtMe = doyou())
}

Oh… the case syntax isn’t a real expression…

func testChump() {
    guard case .ReallyWantToHurtMe = doyou() else { assertme(false); return }
}

Yeah… that’s really less than ideal.

This is where I just get really freaking tired of working with associated enums and I do one of two things:

  1. Convert the associated enum into a struct that holds the values plus an enum of just the simple cases.
  2. Add a getter for every case that returns an optional.

The first option has the severe limitation of only really working when the cases hold the same data types or nearly the same. It’s also a bit more annoying.

The second option is just super annoying to have to do. It signals a significant design issue with them. It’s also just a waste of time as well.

So this is what I do:

enum DoYou {
    case ReallyWantToHurtMe(Bool)
    case ReallyWantToMakeMeCry(Bool)

    var reallyWantToHurtMe: Bool? {
        if case let .ReallyWantToHurtMe(value) = self { return value }
        return nil
    }

    var reallyWantToMakeMeCry: Bool? {
        if case let .ReallyWantToMakeMeCry(value) = self { return value }
        return nil
    }
}

func testChump() {
    assertme(doyou().reallyWantToHurtMe != nil)
}
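For what it’s worth, later Swift (4.1+) can synthesize Equatable for enums whose payloads are Equatable, which fixes the equality complaint above, though case-only matching still needs if case. A sketch in that later spelling (with lowercased case names per the later conventions):

```swift
// With synthesized Equatable (Swift 4.1+), whole-value comparison of
// associated enums just works; matching on the case while ignoring
// the payload still requires pattern matching.
enum DoYou: Equatable {
    case reallyWantToHurtMe(Bool)
    case reallyWantToMakeMeCry(Bool)
}

func doyou() -> DoYou { return .reallyWantToHurtMe(true) }

// Payload included: plain == now works.
let hurts = doyou() == .reallyWantToHurtMe(true)
print(hurts) // true

// Case only: still spelled with `if case`.
if case .reallyWantToHurtMe = doyou() {
    print("matched the case")
}
```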

/cry

Sad State of Enums