As part of my diving into Swift I've been using JSON as one of my learning projects: json-swift. Continuing on that track, I took a look at what it would take to create a JSON parser using only Swift and no ObjC bridging. The results: ok, but lots of room for improvement.
One of the nice things about Swift is that it tries to abstract away all of the Unicode details and give you a nice, simple API to work with. That's nice when it works, but I ran into cases where it simply did not seem to be doing what I expected.
One such example:
let string = "\"ꫯabcd"
countElements(string) // 5
countElements(string.utf8) // 8
// The raw bytes:
34  // "
234 // 234, 171, 175 together encode ꫯ (U+AAEF)
171
175
97  // a
98  // b
99  // c
100 // d
In case that character isn't showing up, it is U+AAEF, which UTF-8 encodes as the three bytes 234, 171, 175.
I do not know enough about Unicode to know all of the ins and outs of why the " is attached to the Unicode character (probably something to do with it being only three bytes), so some of you might be thinking: duh! That's ok. =)
That's not the worst part though: evidently the value "ꫯ is a single character that is treated somewhat like a quote, so you have to escape it, hence: \"ꫯabcd.
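The likely explanation is that U+AAEF is a combining mark, so it joins the preceding " to form one extended grapheme cluster, which is what Swift counts as a single character; the three-byte encoding is probably incidental. The following is a minimal sketch in later Swift syntax (`count` instead of `countElements`), not the code from json-swift, and it writes the mark as an explicit `\u{AAEF}` escape so that it survives copy and paste:

```swift
// The literal from the post, with the mark written as an explicit escape.
let string = "\"\u{AAEF}abcd"

// Characters are extended grapheme clusters, so the quote and the
// combining mark are counted as one element.
let characterCount = string.count              // 5, matching the post
let scalarCount = string.unicodeScalars.count  // 6: the quote and the mark are separate scalars
let byteCount = string.utf8.count              // 8, matching the post

// The first Character alone carries the quote byte plus the three mark bytes.
let firstCharacterBytes = Array(String(string.first!).utf8)

print(characterCount, scalarCount, byteCount)
print(firstCharacterBytes)  // [34, 234, 171, 175]
```

Looking at `unicodeScalars` or `utf8` rather than at the characters is what lets a parser see the quote on its own.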
I really wanted to try the String class out, so I had to use String.UTF8View as the mapping, compare each of the bytes, and build up my strings manually using UnsafePointer and String.fromCString:
static func parseString(string: String.UTF8View, inout startAt index: String.UTF8View.Index, quote: UInt8) -> FailableOf<JSValue> {
    var bytes = [UInt8]()
    index = index.successor()
    for ; index != string.endIndex; index = index.successor() {
        let cu = string[index]
        if cu == quote {
            // Determine if the quote is being escaped or not...
            var count = 0
            for byte in reverse(bytes) {
                if byte == Token.Backslash.toRaw() { count++ }
                else { break }
            }
            if count % 2 == 0 { // an even number means matched slashes, not an escape
                index = index.successor()
                bytes.append(0)
                let ptr = UnsafePointer<CChar>(bytes)
                return FailableOf(JSValue(JSBackingValue.JSString(String.fromCString(ptr)!)))
            }
            else {
                bytes.append(cu)
            }
        }
        else {
            bytes.append(cu)
        }
    }
    let info = [
        ErrorKeys.LocalizedDescription: ErrorCode.ParsingError.message,
        ErrorKeys.LocalizedFailureReason: "Error parsing JSON string."]
    return FailableOf(Error(code: ErrorCode.ParsingError, domain: JSValueErrorDomain, userInfo: info))
}
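The core of that function is the escape check: when a quote byte shows up, count the backslashes immediately before it, and treat the quote as the terminator only if that count is even. Here is a self-contained sketch of just that idea in later Swift syntax. The name `scanQuoted`, the plain `[UInt8]` input, and the tuple return are assumptions made for illustration; they stand in for the `FailableOf`/`JSValue` types above, and like the original the sketch keeps the backslashes in the result rather than unescaping them.

```swift
/// Hypothetical helper: scans `bytes` starting at the opening quote at `start`.
/// Returns the raw (still escaped) contents and the position just past the
/// closing quote, or nil if the string is never terminated.
func scanQuoted(_ bytes: [UInt8], from start: Int, quote: UInt8 = 0x22) -> (text: String, next: Int)? {
    let backslash: UInt8 = 0x5C
    var collected = [UInt8]()
    var i = start + 1  // skip the opening quote

    while i < bytes.count {
        let byte = bytes[i]
        if byte == quote {
            // Count the run of backslashes directly before this quote.
            var count = 0
            for previous in collected.reversed() {
                if previous == backslash { count += 1 } else { break }
            }
            // An even count means the backslashes pair up with each other,
            // so this quote is not escaped and ends the string.
            if count % 2 == 0 {
                return (String(decoding: collected, as: UTF8.self), i + 1)
            }
        }
        collected.append(byte)
        i += 1
    }
    return nil
}

// Input is: "a\"b" rest
let input = Array("\"a\\\"b\" rest".utf8)
let result = scanQuoted(input, from: 0)
print(result?.text ?? "unterminated")  // a\"b
```

Using `String(decoding:as:)` avoids the manual null terminator and the `UnsafePointer` round trip, at the cost of silently repairing invalid UTF-8 instead of failing.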
Since I do not know enough about Unicode, I'm not sure if these are Swift bugs, limitations, or simply a gap in my own understanding.
Some of the problems I ran into:
- Unable to take a character from String and figure out the bytes that made it, so I have to use String.UTF8View.
- The String.UTF8View.Index is not Comparable, though it is Equatable; I was doing index < endIndex initially.
- The String.UTF8View.Index is forward indexing only; originally, part of my algorithm would also step backwards through part of the string.
- Performance of my algorithm is about 3x slower than the NSJSONSerialization algorithm: roughly 0.4s for mine vs. 0.12s for NSJSONSerialization to parse a roughly 688KB file. I still need to investigate the memory usage.
I'm going to keep playing with it, especially as the later betas come out. I may try a purely functional algorithm as well and see how that plays out in Swift.