Sharing code in public is interesting in many ways. Sometimes the choices we make about design are somewhat arbitrary as there are many options before us. Sometimes those choices are deliberate and methodical with a well reasoned approach on how you got there. Then there are those times where you just do something dumb…
If you’re going to be willing to share your code for the world to see, you really need to be OK with being wrong about something and learning from it. But you also need to know how to stick to your guns when you think you are doing things right. This is post is going to be a bit about both using my latest JSON parsing articles as illustrations: Generators Need a current
Value and Improving Code with Generics.
The primary goal of the code that I wrote was to enable the ability to parse through a JSON string and create a JSON object representation from that string. However, in that article, I presented a much lower level view of the problem and framed it in such a way as to remove all of the context on why and how I reached that decision.
Wes Campaigne posted some great feedback over on GitHub about the approach I took to the problem.
I thought the whole
buffer.next()
while buffer.current != nil {
if let unicode = buffer.current { // ... somewhere, buffer.next() is called
dance was kind of ugly: you’re dealing with the overhead of using a generator, but receiving none of the benefits it provides (e.g. for in loops). Also, using a struct for your BufferedGenerator seems odd — you end up using a class as a backing store anyway, and having it as a struct means using inout parameters all over the place. There’s a discussion on the dev forums that argues the case why GeneratorTypes should, in general, just be reference types.
Wes makes some great points, and his RewindableGenerator<S>
is a very good class that solves the specific problem I was looking at better (both in terms of the applicability of the use cases and in how the code that consumes it should work).
The only real problem, which I forgot when I first looked at his solution, was that the performance difference between using the GeneratorType
and the Index
types for Strings
is fairly significant, nearly a 2.5x slowdown.
When I was first solving this problem, I looked at the following approaches:
String.Index
based approach grabbing individual characters. This lead me to find out howString
works with unicode combining characters.- Then I tried using
String.UTF8View.Index
, after all, they are both indexes it should be a fairly easy change. Well… turns out thatString.Index
is aBidirectionalIndexType
butString.UTF8View.Index
is only aForwardIndexType
. At this point, I realized that I basically needed to re-write a significant portion of my algorithm. I did so making sure that all of myprevious()
calls were updated; this also required some fairly ugly hacks to get everything to work. Then I found out two new things after more investigation in the topic:- Performance of the
GeneratorType
construct was significantly faster than theIndex
based construct. - There is a better view into the string
String.UnicodeScalarView
. With theString.UTF8View
, I had to create strings by passing a pointer to anUInt8
array that I had to keep track of while parsing the string. It was fairly ugly, but it worked. =)
- Performance of the
Both of these lead me to the realization that another parser re-write was coming… however, this time, I knew I needed to use GeneratorType
and I knew that I wanted to get rid of a lot of the hacks I did. This was the start of the Generators Need a current
Value and Improving Code with Generics posts.
Well, I was able to get rid of some of my hacks, but then Wes’ comments came. I already wasn’t very pleased with the implementation of the JSON parser as it still had some hacks in it and some somewhat cryptic logic, but hey, it worked! But as I thought about Wes’ comments some more, I knew there was a better way.
So I started integrating Wes’ solution into my parsing code. But, I had already forgotten a lesson I had learned earlier: Index
based approaches suck at perf, big time!
At this point, I had already re-written the parsing to provide some significantly better error messages (thanks in-part to using for (idx, scalar) in enumerate(generator) {}
that was now possible due to Wes’ updates) and a much cleaner logic flow. However, I wanted to get my performance back down.
That’s when I came up with this class: ReplayableGenerator
final public class ReplayableGenerator<S: SequenceType> : GeneratorType, SequenceType {
typealias Sequence = S
private var firstRun = true
private var usePrevious = false
private var previousElement: Sequence.Generator.Element? = nil
private var generator: Sequence.Generator
public init(_ sequence: Sequence) {
self.generator = sequence.generate()
}
public func next() -> Sequence.Generator.Element? {
switch usePrevious {
case true:
usePrevious = false
return previousElement
default:
previousElement = generator.next()
return previousElement
}
}
public func replay() {
usePrevious = true
return
}
public func generate() -> ReplayableGenerator {
switch firstRun {
case true:
firstRun = false
return self
default:
self.replay()
return self
}
}
public func atEnd() -> Bool {
let element = next()
replay()
return element == nil
}
}
I’ve been experimenting with using switch-statements over if-statements; I’m greatly likely their readability in many cases. However, there does seem to be a bug where
case true
andcase false
do not create an exhaustive list, so I usedefault
.
: .info
These were the constraints:
Index
based iterators and lookups are significantly slower thanGeneratorType
andfor-loop
; they cannot be used.- The
GeneratorType
is only a forward-moving iterator. - There is no ability to inspect the previous character in the construct. This is vital because when we parse values, often times we need to inspect the next value to determine if we stop parsing the current value. However, once we do this, we are in a bit of a situation as the parser really needs to start parsing from that previous character because it’s going to call
next()
and skip over the just visited character. Bad mojo.
This class provided everything I needed, while the semantics of it also allowed me to create a much better parse()
. The integration was also easy as I simply needed to replace the previous()
calls with a replay()
call.
With this implementation, I was able to get my performance back down to 0.25s vs. 0.17s (JSON.parse
vs. NSJSONSerialization
).
Remember, often times people are able to look at a problem have been working on and shed new light on the situation. While Wes’ solution was not applicable to my situation, his thought process on why his implementation better was superbly helpful in rethinking the semantics of what I was doing. Ultimately, I’m fairly happy with the results of the parser now… except for that perf! =)
So thanks Wes for helping me think about the problem better. Oh, and you can judge my parsing code here: JSValue.Parsing.