Looking at my JSON Parsing Performance

I’ve been working on my JSON parser lately in the hope of fixing two major issues:

  1. Correctness
  2. Performance

A lot of the design of the parser was influenced by early Swift limitations, so I was able to go through and get rid of a bunch of the weird boxing and internal backing stores I needed to use back then. Sadly, removing those didn’t really help performance much.

However, there is one piece that is hit by pretty much every part of the parser all of the time: my ReplayableGenerator type. The idea was that I’d simply call next() and replay() in the parsing code, which pulls a lot of that bookkeeping logic out of the parser itself.

The current implementation requires a Sequence. This is fine except that, the way I had things set up, I needed to turn the string into an array of UInt8. It turns out that is relatively expensive. Even when creating the generator from string.utf8 and using that iterator directly, performance was still 10x worse than JSONSerialization.

Uck!

All was not lost though! Instead of using a Sequence.Iterator to back my ReplayableGenerator, I figured I’d just straight up use an UnsafeBufferPointer<UInt8>.
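For a rough idea of what that looks like, here is a minimal sketch of a replayable byte reader over an UnsafeBufferPointer<UInt8>. It is not the actual JSONLib code: the type name ReplayableBytes and its internals are assumptions for illustration, and only next() and replay() come from the description above.

```swift
// Hypothetical sketch; the real ReplayableGenerator in JSONLib may differ.
struct ReplayableBytes {
    private let buffer: UnsafeBufferPointer<UInt8>
    private var index = 0
    private var shouldReplay = false
    private var current: UInt8? = nil

    init(_ buffer: UnsafeBufferPointer<UInt8>) {
        self.buffer = buffer
    }

    // Returns the next byte, or the previous one again if replay() was called.
    mutating func next() -> UInt8? {
        if shouldReplay {
            shouldReplay = false
            return current
        }
        guard index < buffer.count else {
            current = nil
            return nil
        }
        current = buffer[index]
        index += 1
        return current
    }

    // Marks the most recently returned byte to be handed out one more time.
    mutating func replay() {
        shouldReplay = true
    }
}

// Usage: read a byte, push it back, then read it again.
let sample: [UInt8] = Array("[1]".utf8)
let replayed = sample.withUnsafeBufferPointer { (buf: UnsafeBufferPointer<UInt8>) -> [UInt8?] in
    var bytes = ReplayableBytes(buf)
    let first = bytes.next()
    bytes.replay()
    let again = bytes.next()
    let second = bytes.next()
    return [first, again, second]
}
```

Indexing into the buffer is just a bounds check and a load, which is a plausible reason it beats going through a generic iterator.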

Results:

NSJSONSerialization:
performance results: min: 0.0126, max: 0.0215, avg: 0.014

JSONLib:
performance results: min: 0.0364, max: 0.050, avg: 0.0392

Yay! Getting there. There is still more work to be done and some correctness issues to work out, but getting happier with things now.

Just one more quick thing to note: one of the biggest perf gains was changing how I was getting the string content.


let data = string.data(using: String.Encoding.utf8, allowLossyConversion: false)!
return try data.withUnsafeBytes { (ptr: UnsafePointer<UInt8>) -> JSValue in
    let buffer = UnsafeBufferPointer(start: ptr, count: data.count)
    let generator = ReplayableGenerator(buffer)
    let value = try parse(generator)
    try validateRemainingContent(generator)
    return value
}

One thing I maybe should have tried, but forgot to, was getting a lazy string.utf8 back. That might have made some difference.
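One way to explore that idea on a recent Swift 5 toolchain (an assumption; this API did not exist when the code above was written) is withContiguousStorageIfAvailable on string.utf8, which hands over the string’s own UTF-8 bytes when it can and returns nil when it cannot. The helper name withUTF8Bytes is hypothetical, and whether it is actually faster would need measuring.

```swift
// Hypothetical helper: runs `body` over the string's UTF-8 bytes,
// avoiding a copy when the string already stores them contiguously.
func withUTF8Bytes<R>(of string: String,
                      _ body: (UnsafeBufferPointer<UInt8>) throws -> R) rethrows -> R {
    if let result = try string.utf8.withContiguousStorageIfAvailable(body) {
        return result
    }
    // Fallback: copy into an array, as the older code path effectively did.
    return try Array(string.utf8).withUnsafeBufferPointer(body)
}

// Usage: count the bytes without caring which path was taken.
let byteCount = withUTF8Bytes(of: "[1,2]") { $0.count }
```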


Making Mistakes: print()

I’m implementing a Swift version of the Language Server Protocol. The way that it integrates within Visual Studio Code (VS Code) is via stdin and stdout. That’s all fine and dandy. It also makes use of a modified JSON-RPC message construct for its communication.

While testing out my server’s ability to handle commands coming in from stdin, I was simply using print() to output the response message. Anyhow, I’d input a message, and the output was working great.

However, when I went to test it within VS Code, I would get the initialize request, send a message back, and nothing. What I expected to have happen was for VS Code to start sending me more messages.

So what was the problem? I honestly had no idea.

Problem #1: From the LSP spec, it isn’t immediately obvious what the response messages should look like. Should it include the message header? Should it just have the JSON-RPC part? Is my message even formatted correctly? The spec calls for \r\n instead of \n, did I mess that up?

I go through and validate the message and output in all of the different permutations I can think of, but nothing. After spending some time digging around other LSP implementations, I come to the conclusion that I am indeed sending back the right message format, so what could it be?

Problem #2: Esoteric history and undocumented (or implied) behavior.

Ok, so if the message format is correct, maybe the output isn’t actually working as it looks like it is. So I run mkfifo output, mkfifo input, and tail output. Let’s see what is happening.

Running cat initialize.lsp > input (my saved message content for an initialize request) gets my language server to handle the message, but no output.


It turns out that Swift’s print() simply routes to the underlying stdio output, which, if you don’t know, buffers differently depending on what it’s actually writing to. When stdout is a terminal, it’s line-buffered, so each line shows up right away. When it’s a pipe or a file, it’s fully buffered, so nothing is written until the buffer fills or is flushed.

Solution (temporary):

setbuf(stdout, nil)

It’s temporary because I actually need to write the proper version of the output code to ensure that I’m writing the content correctly and only the number of bytes that are specified in the response message.
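A proper version might look something like the sketch below. It assumes the usual LSP base-protocol framing (a Content-Length header, a blank line, then the JSON body) and uses Foundation’s FileHandle, which writes to the file descriptor directly rather than going through stdio’s buffer. The function names frame and send are hypothetical.

```swift
import Foundation

// Hypothetical helper: frames a JSON-RPC payload with a Content-Length
// header. The length is counted in UTF-8 bytes, not characters.
func frame(_ json: String) -> Data {
    let body = Data(json.utf8)
    let header = "Content-Length: \(body.count)\r\n\r\n"
    return Data(header.utf8) + body
}

// Writes the framed message straight to stdout's file descriptor,
// so there is no stdio buffer left to flush.
func send(_ json: String) {
    FileHandle.standardOutput.write(frame(json))
}

// Usage: what would go over the wire for a tiny response.
let framedExample = String(decoding: frame("{\"id\":1}"), as: UTF8.self)
```

Counting bytes rather than characters matters as soon as the payload contains anything outside ASCII.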

Retrospective

Here’s the thing: I actually knew about how stdio buffers its output. However, when I looked at the print() documentation, I simply became complacent and assumed that, since it didn’t mention buffering, it wrote the content out immediately. Later testing, of course, would prove otherwise.

 

The problem here is somewhat symptomatic of our programming culture. It came from a combination of unclear documentation (from two sources, no less) and an assumption of knowledge that a person, even one who has it, can forget to apply in certain contexts.

Hopefully this radar gets fixed:


travis-ci + swift + os/linux config

Since I was having such a hard time getting this working, I figure I’ll put this up here for everyone else too.

I have a project that I want to build for both Linux and OS X using Swift on travis-ci.org.

First you’ll need a proper .travis.yml file:


matrix:
  include:
    - os: linux
      dist: trusty
      sudo: required
      language: cpp
    - os: osx
      osx_image: xcode8.3
      language: objective-c
      sudo: required
script:
  - swift build
  - swift test
before_install:
  - chmod ugo+x ./Scripts/InstallSwift.sh
  - . ./Scripts/InstallSwift.sh
notifications:
  email:
    on_success: never
    on_failure: change


This is what is working for me. The super important parts are dist: trusty and sudo: required. Without those for the Linux build, you won’t get Ubuntu 14.04 and thus won’t be able to run the Swift compiler.

The InstallSwift.sh script looks like this:


#!/bin/bash
if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then
    DIR="$(pwd)"
    cd ..
    export SWIFT_VERSION=swift-3.1.1-RELEASE
    wget https://swift.org/builds/swift-3.1.1-release/ubuntu1404/${SWIFT_VERSION}/${SWIFT_VERSION}-ubuntu14.04.tar.gz
    tar xzf $SWIFT_VERSION-ubuntu14.04.tar.gz
    export PATH="${PWD}/${SWIFT_VERSION}-ubuntu14.04/usr/bin:${PATH}"
    cd "$DIR"
else
    export SWIFT_VERSION=swift-3.1.1-RELEASE
    curl -O https://swift.org/builds/swift-3.1.1-release/xcode/${SWIFT_VERSION}/${SWIFT_VERSION}-osx.pkg
    sudo installer -pkg ${SWIFT_VERSION}-osx.pkg -target /
    export TOOLCHAINS=swift
fi


It’s not a perfect setup, but it was enough to get me up and running. Hopefully it’s helpful to others in the same boat as me.


Build Times

I’m finally getting back to fixing up a few things in my json-swift project, but this makes me sad:

9:53:02 › time swift build
Compile Swift Module 'JSONLib' (10 sources)
swift build 3.69s user 1.03s system 120% cpu 3.918 total

9:53:12 › wc -l Sources/JSONLib/*.swift
 55 Sources/JSONLib/Error.swift
 56 Sources/JSONLib/Functional.swift
 89 Sources/JSONLib/JSValue.Accessors.swift
 38 Sources/JSONLib/JSValue.Encodings.swift
 50 Sources/JSONLib/JSValue.ErrorHandling.swift
 68 Sources/JSONLib/JSValue.Indexers.swift
 74 Sources/JSONLib/JSValue.Literals.swift
 671 Sources/JSONLib/JSValue.Parsing.swift
 271 Sources/JSONLib/JSValue.swift
 67 Sources/JSONLib/ReplayableGenerator.swift
 1439 total

While 3.69s may not seem like a long time… add in compiling and running the unit tests and we are at 7.66s. This is a trivially sized project, and having iteration loops this slow stinks.


Bad APIs

I write a lot of C++ code at my day job, so often I don’t want to do that for my hobby projects. There’s a lot of complexity in C++, no doubt. However, I think many of the troubles of C++ are simply bad API designs.

As just an example, let’s look at the std::map API. I mean, really?


auto nameit = req.parameters.find("name");
if (nameit == req.parameters.end()) {
    // parameter not found
}
else {
    // yay, get the name out!
    // (->first is the key, "name"; the value lives in ->second)
    auto name = nameit->second;
}

if (req.parameters.count("name") == 1) {
    // there is an item here
}
else {
    // there is not
}


OK… so there are some things we know about a map, but the most important is that the key values are unique. That is, if “name” exists, there will be one and only one of them.

So what the flip is up with the iterator stuff? And why on earth is there a count() function instead of just a contains() function?

I’m sure there is some explanation for it, but I’m not terribly interested in it. As a consumer of the API, I don’t really care why you make my life more challenging. I want to work with nice, clean APIs.

Swift vs. C++

Part of what makes Swift look more appealing is the clean syntax and sometimes better APIs (though there are many examples where this isn’t the case today). Let’s take a look at a snippet of a Vapor project I’m toying around with:


let drop = Droplet()

drop.get("/v1/api/spells/:name") { req in
    let name = try req.parameters.extract("name") as String
    return SpellRouter.api(name: name).json
}

drop.get("/spells/:name") { req in
    let name = try req.parameters.extract("name") as String
    let partial = req.query?["partial"]?.bool ?? false
    let res = SpellRouter.get(name: name)
    return partial ? res.partialHtml : res.html
}

drop.run()


That’s some pretty reasonable code. Maybe I’d change a few things, but there’s nothing that makes me go, “what the heck is wrong with you?”

Compare that to what it might look like in straight up C++:


fws::service service;

service.get("/v1/api/spells/:name", [](fws::http_request req) {
    auto nameit = req.parameters.find("name");
    if (nameit == req.parameters.end()) {
        // bad request
        return fws::http_response();
    }
    else {
        // ->second is the mapped value; ->first would just be "name"
        auto name = nameit->second;
        return spells::api(name);
    }
});

service.get("/spells/:name", [](fws::http_request req) {
    auto nameit = req.parameters.find("name");
    auto partialit = req.query.find("partial");
    if (nameit == req.parameters.end()) {
        // bad request
        return fws::http_response();
    }
    bool partial = false;
    if (partialit != req.query.end()) {
        auto value = partialit->second;
        partial = (value == "true" || value == "yes");
    }
    return spells::view(nameit->second, partial);
});

service.run();


I mean… it’s not that terrible, but there is a lot of WTF? in there. The biggest problem is that these APIs cause the code to really switch context from the get() handling to the details of parsing out data. That’s not cool.

Better APIs

We can fix this, though, and create something that is much easier to follow and use:


fws::service service;

service.get("/v1/api/spells/:name", [](fws::http_request req) {
    auto name = req.parameter("name");
    if (!name) return fws::http_response();
    return spells::api(*name);
});

service.get("/spells/:name", [](fws::http_request req) {
    auto name = req.parameter("name");
    auto partial = fws::asbool(req.query("partial"));
    if (!name) return fws::http_response();
    return spells::view(*name, partial);
});

service.run();


This would be the same exact number of lines of code as the Swift version if I change the Swift version to not throw on an error trying to get the parameter. But more importantly, we didn’t lose anything in the C++ version. However, we did gain code that is much easier to read and maintain.

Build better APIs, your code and coworkers will thank you for it.


Extensions and Categories

There’s a lot of talk about how extensions are used for code organization. This is even one of the primary defenses for some of the Swift proposals. However, it’s missing a key component of organization: categorization.

If you remember when dinosaurs roamed the Earth, they wrote code to maneuver their space ships with a language that must have been from aliens: Objective-C.

If you wanted to extend a type and provide yourself some new goodies, you would do so like this:

@implementation Raptor (BirdOfPrey)
// new cool stuff here
@end

In Swift, however, it’s not as clear what we should do.

Comments

Ok, this is the easiest and most straightforward way to do it. It’s close to what we had before. But as a comment, it has a bit of a disconnect from the type. If you’re like me, you’ve also trained yourself to skim over comments like ads on a website.

extension Raptor /* BirdOfPrey */ {
    // new cool stuff here
}

Protocols

Now, of course, I think the most illustrious Swifters would say, “Obvious: use a protocol.”

Yep, you could do that:

protocol BirdOfPrey {}
extension Raptor: BirdOfPrey { /* ... */ }

And that’s not too bad, but you’ll implement it one of two ways:

  1. An empty protocol as you simply don’t care about duplicating the signatures for a one-off categorization mechanism.
  2. A full on protocol, because you’re a good little dinosaur offspring!

However, unless you’re going to be reusing these protocols, both are simply noise that you’re adding to the system that is supposed to simply help you organize your code.

Typealias

Another approach that was mentioned on Twitter:

Honestly, I’d not thought of this approach.

typealias RaptorBirdOfPrey = Raptor
extension RaptorBirdOfPrey {
    // new cool stuff here
}

I’m not personally a fan of this as it confuses the type some and introduces just as much noise as the empty protocol, but with more downsides, in my opinion.

Do Nothing

Of course, this is also an option. Simply do nothing. Sadly for my code, this is often what I end up doing.

Wait A Minute…

I know what some of you are thinking… didn’t we have to define these categories in a .h file so we could use them?

Yes. If you wanted to use them outside of the file, that’s 100% correct. Thanks C.

But! Swift doesn’t have header files so we should not have to be burdened with all this duplicate declaration stuff… but, in this case, we still kind of are if we want to go the protocol route.

It would be great if we could just do this:

extension Raptor (BirdOfPrey) {
    // new cool stuff in our implicit protocol: BirdOfPrey
}

 


There are levels of survival we are prepared to accept

Regarding the Mac Pro announcement and my current hobby endeavors.

 

[Image: the-architect-matrix-there-are-levels-of-survival-we-are-prepared-to-accept.jpg]

There are levels of survival we are prepared to accept…

I’m not talking Hackintosh here, because, well, I don’t really want to mess with all that. What I am talking about is something much more abhorrent!

[Screenshot: swift-windows-ubuntu.PNG]

Yes… that is Visual Studio Code running on Windows 10 with Vapor running on the oh so lovely named Bash on Ubuntu on Windows or Bash on Windows Subsystem for Linux or Bash/WSL for short.

It’s not terribly pleasant, but it works.

Apple, please don’t let us down.


Band-Aids: Swift, fileprivate, and #169

A bit of a rant…

Well… I was pretty disappointed that proposal #159 – Fix Private Access Levels was voted down. But when #169 – Improve Interaction Between Private Declarations and Extensions was approved… seriously?

Why? Why another band-aid? Why another special case rule that I need to learn about? Why introduce more inconsistency in the language?

I just cannot get behind a change where I need to explain to someone why this won’t work:


struct A {
    private var foo: Int = 10
}

extension A {
    func bar() -> Int {
        return self.foo // this will work after #169
    }
}

class C {
    private var foo: Int = 10
}

class D: C {
    func bar() -> Int {
        return self.foo // this will be a compiler error after #169 still
    }
}


Or why, if you want to move an extension from one file to another, you need to change from “private” to “internal”.

See, access control is still deficient for proper modeling if we revert back to Swift v2’s model, but at least there were no special rules to consider.

“Well, you see, private only allows access to those private bits if you’re in the same file.”

“Oh, well, then what does fileprivate do?”

“Well… it allows you to access that from anywhere within the file.”

“Wait, so what’s the difference?”

“You see, private lets only the type and its extensions in that file see it, but not everything else like other types or free functions.”

“Oh, cool, so I’ll just use that to poke into the private stuff in this class I’m subclassing too…”

“Um, actually… you can’t do that.”

“I thought extensions…”

“Right, subclassing isn’t an extension.”

“Ok. So how do I make those private bits only available to my subclasses in this file and not the other functions and types?”

“You can’t.”

“Wait, what? So the private keyword only has this special meaning in the context of extensions?”

“Yes.”

Good times.

P.S. Don’t forget about the already existing special case for private on internal types… 🙄
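To make the exchange above concrete, here is a small single-file sketch, assuming a Swift 4 or later compiler (where #169 is in effect). The only way to let the subclass in is to widen foo to fileprivate, and the free function peek (a hypothetical name) shows what that costs: everything else in the file gets in too.

```swift
struct A {
    private var foo: Int = 10
}

extension A {
    // Same file, extension of the same type: allowed since #169.
    func bar() -> Int { return foo }
}

class C {
    // Widened from private to fileprivate so the subclass below compiles.
    fileprivate var foo: Int = 10
}

class D: C {
    // With `private var foo` in C, this line would be a compiler error.
    func bar() -> Int { return foo }
}

// Neither a subclass nor an extension, yet it can read foo as well.
func peek(_ c: C) -> Int { return c.foo }
```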

 


My Thoughts on Server-Side Swift: Routing

The folks over at objc.io put out another high quality video, this time on Server-Side Swift: Routing. I like the high level goal they are going for, but I actually approach the problem differently.

To me, the key of setting up good architecture is leveraging what you are given while keeping the boundaries between the libraries or frameworks that you are using and your own code.

For example, I’m working on a server-side Swift project for D&D. The full conceptual flow, for the scenario of a user looking up a spell with a given name, looks like this:

  1. User makes a GET request to /spells/burning-hands
  2. Vapor looks up the route and invokes the handler
  3. Handler does conversions and passes to my own SpellRouter.get() function
  4. This function handles the non-Vaporized data calling into my lookup code: Spell.lookup(name:)
  5. Above function returns an array of [Spell] items
  6. The [Spell] is converted into a SpellCard which serves as an abstract HTML representation
  7. That gets rendered into raw HTML
  8. The raw HTML makes it back to the Vapor route handler
  9. A Vapor Response is returned
  10. The user sees the HTML page

It looks like a lot of steps, but it has the following benefits:

  1. Each layer is agnostic of the layer below it. That is, my logic layer knows nothing about Vapor, HTML, or HTTP; it only knows how to perform spell lookups based on strongly-typed parameters, in this case: name.
  2. The boilerplate is essentially nonexistent.
  3. It maintains flexibility without forcing a rigid structure. Swapping out Vapor for another web framework is a matter of writing the conversion functions.
  4. It scales well to multiple rendering needs.

Let’s go over some code to help make things more clear.

Server Hookup

drop.get("/spells", String.self) { req, name in
    SpellRouter.get(name: name).html
}

drop.get("/api/spells/", String.self) { req, name in
    SpellRouter.api(name: name).json
}

The variable drop is a Vapor Droplet instance. This code is in my main.swift file. A couple of things to note:

  1. I do not abstract the path (e.g. /spells) out somewhere else. The server is what needs to know about paths, so this is the level it should exist at.
  2. Vapor pulls out the first parameter as a String because of my usage of the second parameter in the get() function.
  3. In my case, there is very little conversion that I need to do to go from Vapor specifics to my own routing code, just handling the name retrieval, which is done above.

The only other Vapor specific items are the html and json extensions that I have on my own HtmlResponse and JsonResponse types that get() and api() return from the SpellRouter type.

They look like this:

extension HtmlResponse {
    var html: Response {
        return Response(status: .ok,
                       headers: ["Content-Type": "text/html"],
                          body: self.element.html)
    }
}

extension JsonResponse {
    var json: Response {
        return Response(status: .ok,
                       headers: ["Content-Type": "application/json"],
                          body: self.json)
    }
}

The Response class is a Vapor type. But that’s it, the main.swift file here is the only place Vapor is referenced.

Spell Routing

For my routing, I implement the same get() and post() style APIs. This makes the translation process fairly straightforward.

The routing function looks a bit like this:

static func get(name: String) -> HtmlResponse {
    let translatedName = decode(name: name).lowercased()
    let spells = Spell.lookup(name: translatedName)
    if spells.count != 1 {
        return InvalidHtmlResponse(message: "No spells were found with the name: '\(name)'")
    }

    let spell = spells[0]
    return SpellCard(spell: spell)
}

Basically, an incoming request might look like this: /spells/burning-hands. The decode() function converts the name from “burning-hands” to “burning hands”. Then I call into my spell DB via the lookup(name:) function. And in this case, I only return the first spell found via a SpellCard class that is an abstracted HTML view of the card. The api() function returns a SpellApi instead.
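To show how little the routing layer needs to know about the web framework, here is a self-contained sketch of that layer with no Vapor in sight. The names SpellRouter, Spell.lookup(name:), SpellCard, InvalidHtmlResponse, and decode(name:) come from the post; every body, the in-memory spell list, and the markup property are assumptions made up for illustration.

```swift
// Stand-in data layer: the real lookup presumably hits a spell database.
struct Spell {
    let name: String

    static let all = [Spell(name: "burning hands"), Spell(name: "magic missile")]

    static func lookup(name: String) -> [Spell] {
        return all.filter { $0.name == name }
    }
}

// Hypothetical abstract HTML result; `markup` is an invented property.
protocol HtmlResponse {
    var markup: String { get }
}

struct SpellCard: HtmlResponse {
    let spell: Spell
    var markup: String { return "<h1>\(spell.name)</h1>" }
}

struct InvalidHtmlResponse: HtmlResponse {
    let message: String
    var markup: String { return "<p>\(message)</p>" }
}

enum SpellRouter {
    // "burning-hands" -> "burning hands"
    static func decode(name: String) -> String {
        return name.split(separator: "-").joined(separator: " ")
    }

    static func get(name: String) -> HtmlResponse {
        let translatedName = decode(name: name).lowercased()
        let spells = Spell.lookup(name: translatedName)
        guard spells.count == 1 else {
            return InvalidHtmlResponse(message: "No spells were found with the name: '\(name)'")
        }
        return SpellCard(spell: spells[0])
    }
}

// Usage: plain strings in, an abstract response out, no HTTP types involved.
let found = SpellRouter.get(name: "Burning-Hands").markup
let missing = SpellRouter.get(name: "nope").markup
```

Because nothing here imports a web framework, the whole layer can be exercised from a unit test or a command-line tool.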

In the End

Anyhow, I prefer this approach over the enum-style approach that was described. For one, it makes handling different types of requests (GET and POST for example) at the same URL extremely easy. They’ll have to encode that data somehow into their enum to handle it.

I’m interested to see how they progress through the problem though to handle more of the necessary aspects.


Why No External Templates?

I was asked on Twitter:

Could you expand in why [Swiccup] versus a template language?

I gave a brief answer there, but I think it’s worth talking about a bit more. Many of the HTML templating languages out there are simply trying to solve the problem that crafting HTML in code (or crafting HTML in general) can be a real pain.

As an example, the desired output might be this:

<b>resque</b>
<b>hub</b>
<b>rip</b>

You have the data structure of:

"repo": [
  { "name": "resque" },
  { "name": "hub" },
  { "name": "rip" }
]

Using HTML libraries in code is generally a huge pain. But if you can have a file that simply describes this as (in mustache):

{{#repo}}
  <b>{{name}}</b>
{{/repo}}

Or in Leaf, maybe it looks like this:

#loop(repo, "name") {
  <b>#(name)</b>
}

This is so much easier to read, understand, maintain, and author than writing this by hand, which might look something like:

for repo in repos {
  output.writeln("<b>\(repo)</b>")
}

Or if you’re using a typical HTML library:

for repo in repos {
  parent.addChild(HtmlBoldElement(repo))
}

The templates are just better.

So what’s the problem?

The problem is that they still come at a cost. Many times we’re moving from a typed system to an untyped system, so typos creep in easily. Most of the time there are no tools to help ensure that the templates are actually structured properly. This makes everything a runtime validation check.

The other problem is that we are introducing another language to understand and, in many normal use cases, having to figure out the interop between the host and the templating language for custom formatting or serialization of data.

The question is then: are there some tradeoffs that we can make to get the majority of the benefits from both sides? That is, more type safety and language similarity without the horrendous code that we typically write when authoring our templates in code?

Yes, but you need a language that is “powerful” enough to be able to create mini-DSLs.

Instead of writing the #loop from above, we could do something like:

repos.map { name in
  b |> name
}

Now we can leverage many of the benefits of using our compiled language while still getting the vast majority of the benefits from our template system. The really awesome thing is that extensibility is already built in. If you need to format the name in a specific way, you just need to call your function. Depending on the template system you are using, this could actually be a fairly difficult and time-consuming thing to get done.
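As a rough illustration of how small such a mini-DSL can be, the sketch below defines just enough to make b |> name work. It is not Swiccup’s actual implementation: the Tag type, the precedence group, and the escape helper are all assumptions.

```swift
// Hypothetical mini-DSL; Swiccup's real operators and types may differ.
precedencegroup TagApplication {
    associativity: left
}
infix operator |> : TagApplication

struct Tag {
    let name: String
}

let b = Tag(name: "b")

// Minimal escaping so content cannot break out of the element.
func escape(_ text: String) -> String {
    var out = ""
    for ch in text {
        switch ch {
        case "&": out += "&amp;"
        case "<": out += "&lt;"
        case ">": out += "&gt;"
        default: out.append(ch)
        }
    }
    return out
}

// Wraps the escaped content in the given tag.
func |> (tag: Tag, content: String) -> String {
    return "<\(tag.name)>\(escape(content))</\(tag.name)>"
}

// Usage: the loop from the template examples, written in plain Swift.
let repos = ["resque", "hub", "rip"]
let html = repos.map { name in b |> name }.joined(separator: "\n")
```

A misspelled variable or tag name here is a compile error rather than a blank spot on a rendered page, which is the type-safety half of the tradeoff.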

So, why don’t I use external templates? I find them redundant; they don’t add much to solving the problem.
