When you build a parser, you need to scan through your tokens to build up your output. Swift offers a few different constructs for iteration, and I needed the fastest one; after all, I'm building a parser!
I wrote a small test suite to compare the various iteration types, which essentially boils down to two options for Strings:
- The traditional for-loop that uses an index value
- The GeneratorType-based approach
Index-based for-loop
Here's the code for this one:
    var string = ""
    let scalars = self.largeJSON.unicodeScalars
    self.measureBlock() {
        for var idx = scalars.startIndex; idx < scalars.endIndex; idx = idx.successor() {
            let scalar = scalars[idx]
            scalar.writeTo(&string)
        }
    }
It's pretty straightforward: start at startIndex and traverse your way through the string until you hit endIndex. There are a couple of gotchas, though. The most significant is that Swift doesn't allow Int-based indexing; all of the types have their own special indexing type. The other thing to watch out for: not all of those index types are bidirectional.
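For readers on a current toolchain, here is a minimal sketch of the same index-based traversal, assuming Swift 5 or later, where C-style for loops are gone and `successor()` has become `index(after:)`. The sample string and variable names are made up for illustration; this is not the benchmark code.

```swift
// Hypothetical sample input; any string works.
let text = "{\"a\":1}"
let scalars = text.unicodeScalars

var copy = ""
var idx = scalars.startIndex
while idx < scalars.endIndex {
    // Subscripting takes a String.Index, not an Int.
    copy.unicodeScalars.append(scalars[idx])
    // index(after:) is the modern spelling of idx.successor().
    idx = scalars.index(after: idx)
}

// Reaching the n-th scalar means walking an index there;
// writing scalars[3] does not compile.
let fourth = scalars[scalars.index(scalars.startIndex, offsetBy: 3)]
```

The last line shows the practical cost of the indexing gotcha: getting to a position is an explicit walk from a known index rather than a constant-time integer subscript.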
GeneratorType Approach
    var string = ""
    self.measureBlock() {
        for scalar in self.largeJSON.unicodeScalars {
            scalar.writeTo(&string)
        }
    }
This one is fairly simple as well: loop through all of the Unicode scalars. We can also write this loop in a slightly different way:
    var string = ""
    self.measureBlock() {
        var generator = self.largeJSON.unicodeScalars.generate()
        for var scalar = generator.next(); scalar != nil; scalar = generator.next() {
            scalar?.writeTo(&string)
        }
    }
In my testing, I found that the GeneratorType-based approach was about 18% faster. This is significant enough for me to use it. =)
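The explicit generator loop can also be written in current Swift, where `GeneratorType` is named `IteratorProtocol` and `generate()` is `makeIterator()`. The following is a small sketch under that assumption (Swift 5 or later), with a made-up sample string; it shows the shape of the loop only and says nothing about how the timings compare on a modern compiler.

```swift
// Hypothetical sample input.
let text = "[true, null]"
var copy = ""

// makeIterator() is the modern name for generate().
var iterator = text.unicodeScalars.makeIterator()

// next() returns nil once the scalars are exhausted, which ends the loop.
while let scalar = iterator.next() {
    copy.unicodeScalars.append(scalar)
}
```

`while let` replaces the C-style `for` header: the nil check and the advance happen in one expression.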
Implementing the Parsing
Next up is actually parsing the JSON string. The basic idea is to look for specific tokens and call into one of these methods:
- parseObject – used to parse out a JSON object (e.g. dictionary)
- parseArray – used to parse an array
- parseNumber – used to parse a number value
- parseString – used to parse out a string, also used when parsing keys from a dictionary
- parseTrue – used to parse the boolean value true
- parseFalse – used to parse the boolean value false
- parseNull – used to parse the literal value null
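The dispatch implied by the list above can be sketched as a switch on the first scalar of a value. This is only an illustration in current Swift: the `ParseRoutine` enum and the `routine(startingWith:)` function are hypothetical names, not part of the json-swift library, and the mapping simply follows the JSON grammar (every value starts with `{`, `[`, `"`, `-`, a digit, `t`, `f`, or `n`).

```swift
// Hypothetical labels for the parse methods listed above; not the library's API.
enum ParseRoutine {
    case object, array, number, string, literalTrue, literalFalse, literalNull
}

// Picks a routine from the first scalar of a JSON value,
// or returns nil when no JSON value can start with that scalar.
func routine(startingWith scalar: Unicode.Scalar) -> ParseRoutine? {
    switch scalar {
    case "{": return .object
    case "[": return .array
    case "\"": return .string
    case "-", "0"..."9": return .number
    case "t": return .literalTrue
    case "f": return .literalFalse
    case "n": return .literalNull
    default: return nil
    }
}
```

A real parser would also skip whitespace before dispatching and would report an error instead of returning nil.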
I think that about covers the basics of what I need. And here comes the problem… each of these needs one or more of the following pieces of information:
- The current generator value so that increments can be done
- The current character the generator is pointing to
- The character used at the start of the parse call
When we look at the API for GeneratorType, we find that it only supports next(). Hmm… that's not going to be sufficient. So now we are left with two choices:
- Pass the current unicode token around with our generator instance, or
- Package up the generator and the current unicode token into a single type
To me, this is a no-brainer. As soon as we introduce this coupling, it is best to package up the dependencies and maintain that state with a single value.
Ideally, we would simply be able to extend the GeneratorType instance for String.UnicodeScalarView. However, extensions cannot add stored properties to a type, so we are left with creating an entirely new type to box this functionality.
To work around this limitation, I created the following type:
    struct UnicodeScalarParsingBuffer {
        var generator: String.UnicodeScalarView.Generator
        // The scalar most recently returned by next(); nil before the first call and at the end.
        var current: UnicodeScalar? = nil

        init(_ generator: String.UnicodeScalarView.Generator) {
            self.generator = generator
        }

        // Advances the generator and remembers the result in `current`.
        mutating func next() -> UnicodeScalar? {
            self.current = generator.next()
            return self.current
        }
    }
I find this to be a deficiency in the current implementation of GeneratorType. While it may be the case that you are always working in the same scope, it is also necessary at times to pass this context around. Once you start doing that, you're going to need that current value, otherwise you need to pass both generator and current – no one really wants to do that.
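To make the "pass this context around" point concrete, here is a hedged sketch in current Swift (5 or later) of a buffer handed to a helper as a single `inout` value. `ScalarBuffer` restates the idea of the type above using `Iterator` in place of `Generator`, and `parseLiteral(_:from:)` is a hypothetical helper invented for this example; neither is taken from the json-swift source.

```swift
// Modern-Swift restatement of the buffer idea; the name is hypothetical.
struct ScalarBuffer {
    var iterator: String.UnicodeScalarView.Iterator
    var current: Unicode.Scalar? = nil

    init(_ text: String) {
        self.iterator = text.unicodeScalars.makeIterator()
    }

    // Advances the iterator and remembers the result in `current`.
    @discardableResult
    mutating func next() -> Unicode.Scalar? {
        current = iterator.next()
        return current
    }
}

// Hypothetical helper: expects `buffer.current` to be the first scalar of
// `literal` and consumes the remaining scalars. On success, `current` is
// left on the literal's last scalar.
func parseLiteral(_ literal: String, from buffer: inout ScalarBuffer) -> Bool {
    var expected = literal.unicodeScalars.makeIterator()
    guard let first = expected.next(), buffer.current == first else { return false }
    while let scalar = expected.next() {
        guard buffer.next() == scalar else { return false }
    }
    return true
}

var buffer = ScalarBuffer("true,")
buffer.next()  // current is now "t"
let matched = parseLiteral("true", from: &buffer)
```

Because the helper receives one value, it can both inspect the scalar that triggered the call and keep advancing, which is exactly what a bare `next()`-only interface makes awkward.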
The full source for the json-swift library can be found over on GitHub; the parsing code is here.