JSON Parsing

As part of diving into Swift, I've been using JSON as one of my learning projects: json-swift. Continuing on that track, I looked at what it would take to create a JSON parser using only Swift, with no ObjC bridging. The results: OK, but with lots of room for improvement.

One of the nice things about Swift is that it tries to abstract the Unicode details away from you and give you a nice, simple API to work with. That's nice when it works, but I ran into cases where it simply didn't seem to do what I expected.

One such example:

let string = "\"ꫯabcd"
countElements(string)           // 5
countElements(string.utf8)      // 8

// The raw bytes:
34      // "
234     // 234, 171, 175 encode ꫯ (U+AAEF)
171
175
97      // a
98      // b
99      // c
100     // d

In case that character isn't showing up: it's the Unicode character U+AAEF (the UTF-8 bytes 234, 171, 175 above). I'm not sure of its name.

I don't know enough about Unicode to know all the ins and outs of why the " is attached to that character (probably something to do with it being only three bytes), so some of you might be thinking: duh! That's ok. =)
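A likely explanation is grapheme clustering rather than byte length. U+AAEF is a combining vowel sign from the Meetei Mayek script. A Swift Character is an extended grapheme cluster, so the mark joins whatever scalar precedes it, including a quote. The sketch below is written for current Swift, which uses .count in place of countElements. It prints three views of the same string, and the variable names are only illustrative.

```swift
// The string from the post, with the mark written as an explicit scalar escape.
let string = "\"\u{AAEF}abcd"

// Characters are extended grapheme clusters: the quote and the combining
// mark U+AAEF form one Character, so this is 5, not 6.
let characterCount = string.count                // 5

// Unicode scalars are not clustered, so the quote stands on its own here.
let scalarCount = string.unicodeScalars.count    // 6

// UTF-8 code units: 1 (quote) + 3 (U+AAEF) + 4 (abcd) = 8.
let bytes = Array(string.utf8)                   // [34, 234, 171, 175, 97, 98, 99, 100]

// The first Character is the whole cluster, so it does not equal a bare quote...
let firstIsQuote = string.first == "\""          // false

// ...while the first scalar and the first byte are the quote itself.
let firstScalarIsQuote = string.unicodeScalars.first == "\""     // true
let firstByteIsQuote = string.utf8.first == UInt8(ascii: "\"")   // true

print(characterCount, scalarCount, bytes)
```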


That's not the worst part, though. Evidently "ꫯ is a single character that is treated somewhat like a quote, so you have to escape it, hence: \"ꫯabcd.
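This matters for a parser. A scanner that walks Characters looking for " never sees an opening quote that has a combining mark fused onto it. Here is a small comparison, assuming current Swift. The JSON input and both helper functions are hypothetical, made up for illustration.

```swift
// Hypothetical JSON text: an array holding one string whose content starts with U+AAEF.
let json = "[\"\u{AAEF}abcd\"]"

// Offsets (in Characters) of every Character equal to a bare quote.
func quoteOffsetsByCharacter(_ text: String) -> [Int] {
    return text.enumerated().filter { $0.element == "\"" }.map { $0.offset }
}

// Offsets (in bytes) of every UTF-8 code unit equal to the quote byte (34).
func quoteOffsetsByByte(_ text: String) -> [Int] {
    return text.utf8.enumerated().filter { $0.element == UInt8(ascii: "\"") }.map { $0.offset }
}

// Only the closing quote is found: the opening one is hidden inside the "ꫯ cluster.
let byCharacter = quoteOffsetsByCharacter(json)  // [6]
// Both delimiters are found when scanning bytes.
let byByte = quoteOffsetsByByte(json)            // [1, 9]

print(byCharacter, byByte)
```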

I really wanted to try the String type out, so I had to use String.UTF8View as the mapping, compare each of the bytes, and build up my strings manually using UnsafePointer and String.fromCString:

static func parseString(string: String.UTF8View, inout startAt index: String.UTF8View.Index, quote: UInt8) -> FailableOf<JSValue> {
    var bytes = [UInt8]()

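    index = index.successor()       // skip the opening quote
    for ; index != string.endIndex; index = index.successor() {
        let cu = string[index]
        if cu == quote {
            // Determine if the quote is being escaped or not by counting
            // the backslashes immediately before it.
            var count = 0
            for byte in reverse(bytes) {
                if byte == Token.Backslash.toRaw() { count++ }
                else { break }
            }

            if count % 2 == 0 {     // an even number means matched slashes, not an escape
                index = index.successor()

                bytes.append(0)     // NUL-terminate for String.fromCString
                // A pointer converted from an array is only valid during the call
                // that receives it, so build the string inside withUnsafeBufferPointer.
                let value = bytes.withUnsafeBufferPointer { buffer in
                    String.fromCString(UnsafePointer<CChar>(buffer.baseAddress))
                }
                if let s = value {
                    return FailableOf(JSValue(JSBackingValue.JSString(s)))
                }
                break               // invalid UTF-8: fall through to the error below
            }
            else {
                bytes.append(cu)    // escaped quote: keep it in the string
            }
        }
        else {
            bytes.append(cu)
        }
    }

    // No unescaped closing quote was found (or the bytes were not valid UTF-8).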
    let info = [
        ErrorKeys.LocalizedDescription: ErrorCode.ParsingError.message,
        ErrorKeys.LocalizedFailureReason: "Error parsing JSON string."]
    return FailableOf(Error(code: ErrorCode.ParsingError, domain: JSValueErrorDomain, userInfo: info))
}
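For comparison, here is a minimal sketch of the same byte-scanning approach in current Swift. It assumes the input has already been copied into a [UInt8], so plain Int indices work. It also swaps UnsafePointer plus String.fromCString for String(decoding:as:), which copies the bytes and replaces invalid UTF-8 with U+FFFD instead of failing. The name parseStringBytes and its String? result are hypothetical stand-ins for the post's FailableOf<JSValue>. Like the original, it leaves escape sequences such as \n undecoded.

```swift
/// Hypothetical helper: parses a JSON string starting at `index`, which must
/// point at the opening quote. On success it returns the raw (still escaped)
/// contents and leaves `index` just past the closing quote. It returns nil if
/// the input ends before an unescaped closing quote.
func parseStringBytes(_ input: [UInt8], startAt index: inout Int,
                      quote: UInt8 = UInt8(ascii: "\"")) -> String? {
    let backslash = UInt8(ascii: "\\")
    var bytes: [UInt8] = []

    index += 1  // skip the opening quote
    while index < input.count {
        let byte = input[index]
        if byte == quote {
            // Count the backslashes immediately before this quote.
            var slashes = 0
            for previous in bytes.reversed() {
                if previous == backslash { slashes += 1 } else { break }
            }
            if slashes % 2 == 0 {  // even: the backslashes pair up, so the quote is real
                index += 1
                return String(decoding: bytes, as: UTF8.self)
            }
        }
        bytes.append(byte)  // ordinary byte, or an escaped quote kept as-is
        index += 1
    }
    return nil  // ran off the end without a closing quote
}

// Usage: the JSON text  x "a\"b\\", 1  with the string starting at offset 2.
let input = Array(#"x "a\"b\\", 1"#.utf8)
var position = 2
let parsed = parseStringBytes(input, startAt: &position)
print(parsed ?? "nil", position)  // a\"b\\ 10
```

Copying the input into an array up front costs some memory. In return, you get Comparable Int indices, and looking back at the preceding backslashes becomes trivial.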

Since I don't know enough about Unicode, I'm not sure whether these are Swift bugs, limitations, or simply gaps in my own understanding.

Some of the problems I ran into:

  1. I couldn't take a Character from a String and get the bytes that made it up, so I had to use String.UTF8View.
  2. String.UTF8View.Index is not Comparable, though it is Equatable; I was initially writing index < endIndex.
  3. String.UTF8View.Index only moves forward; originally, part of my algorithm also stepped backwards through part of the string.
  4. My algorithm is roughly 3x slower than NSJSONSerialization: about 0.4s vs. 0.12s to parse a roughly 688KB file. I still need to investigate the memory usage.
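For what it's worth, items 1 to 3 are easier in current Swift, long after the betas this post was written against. String.Index, which the UTF-8 view now shares, is Comparable. String.UTF8View is a BidirectionalCollection, so index(before:) walks backwards. Here is a short sketch with illustrative names:

```swift
let text = "\"\u{AAEF}abcd"

// 1. The bytes behind a single Character: wrap it in a String and take its UTF-8 view.
let firstCharacter = text.first!
let clusterBytes = Array(String(firstCharacter).utf8)  // [34, 234, 171, 175]

// 2. Indices are Comparable, so `<` works as a loop condition.
let utf8 = text.utf8
var index = utf8.startIndex
var forwardCount = 0
while index < utf8.endIndex {
    forwardCount += 1
    index = utf8.index(after: index)
}

// 3. The UTF-8 view is bidirectional, so you can also walk backwards.
var backwardBytes: [UInt8] = []
index = utf8.endIndex
while index > utf8.startIndex {
    index = utf8.index(before: index)
    backwardBytes.append(utf8[index])
}

print(clusterBytes, forwardCount, backwardBytes)
```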

I'm going to keep playing with it, especially as later betas come out. I may also try a purely functional algorithm and see how that plays out in Swift.
