Make sure to see the update below for a bit more information on the causes of the memory usage.
In my seemingly never-ending and not quite achievable goal of beating NSJSONSerialization in both performance and memory utilization for parsing a JSON string, I've come across another pearl of wisdom with regards to Swift: ignore my Error Handling in Swift piece and others that recommend using the Either<T, U> pattern found in other languages (at least for the current version of Swift, as of Beta 6).
I have been able to get my parsing speed to within 0.01s of NSJSONSerialization; while my goal is domination, I am also pragmatic (at times). Next up was memory utilization. Unfortunately, I was (and still am) far behind the total memory usage of the ObjC version. So, like a good little software engineer, I fired up Instruments and started investigating.
When investigating memory usage, there are three primary concerns to watch out for:
- Total amount of memory used over the life of the scenario
- Total amount of memory ever actually in use at any given time
- Highest spike in memory used over the life of the scenario
Instruments visualizes this data pretty nicely for us:
The picture above shows the results of the NSJSONSerialization code path. My implementation actually has a better "total persistent bytes" number overall: 1.92MB vs. the 2.51MB shown above. However, the total memory used in mine was about 6.5MB, while NSJSONSerialization only used about 4.7MB.
Taking a Dive
There are a couple of approaches we can take to tracking down and solving memory issues:
1. Examine the code
2. Examine the profiles
Unfortunately, the profiles were not really helping me track down the root cause of the issues, but they were illustrative in helping me understand that I was creating many, many copies of objects all over the place.
Examining the Error type
I first took a quick look over my code to see if I could spot anything obvious. There was one thing I noticed right off the bat: FailableOf<T> stores an Error object in its Failure case. Well, the Error type is a struct with three values in it, and since I return a FailableOf<T> from all of my parsing calls, I'm going to need to return a copy of that Error, even if it's empty, all of the time.
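For context, here is roughly the shape of FailableOf<T> that I'm talking about (a simplified sketch rather than the exact code; the wrapper class is there because, as of Beta 6, enums can't store generic payloads directly, and there's more on that in the update at the end):

```swift
// Simplified sketch of a FailableOf<T>-style enum; the real implementation
// differs in details, but the important parts are that Failure carries a
// full Error value and Success has to box its generic payload.
public final class FailableValueWrapper<T> {
    public let value: T
    public init(_ value: T) { self.value = value }
}

public enum FailableOf<T> {
    case Success(FailableValueWrapper<T>)
    case Failure(Error)

    public init(_ value: T) {
        self = .Success(FailableValueWrapper(value))
    }

    public init(_ error: Error) {
        self = .Failure(error)
    }
}
```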
Knowing that the Error object is going to be copied so many times throughout the call chain, we can instead mark the Error type as public final class. When we do this, the total memory usage drops to 6.06MB.
The other option is to create a backing class to store all of the data; that approach looks like this:
```swift
public struct Error {
    public typealias ErrorInfoDictionary = [String:String]

    class ErrorInfo {
        let code: Int
        let domain: String
        let userInfo: ErrorInfoDictionary?

        init(code: Int, domain: String, userInfo: ErrorInfoDictionary?) {
            self.code = code
            self.domain = domain
            self.userInfo = userInfo
        }
    }

    var errorInfo: ErrorInfo

    public var code: Int { return errorInfo.code }
    public var domain: String { return errorInfo.domain }
    public var userInfo: ErrorInfoDictionary? { return errorInfo.userInfo }

    public init(code: Int, domain: String, userInfo: ErrorInfoDictionary?) {
        self.errorInfo = ErrorInfo(code: code, domain: domain, userInfo: userInfo)
    }
}
```
However, that seems to be a lot more complicated than simply doing this:
```swift
public final class Error {
    public typealias ErrorInfoDictionary = [String:String]

    public let code: Int
    public let domain: String
    public let userInfo: ErrorInfoDictionary?

    public init(code: Int, domain: String, userInfo: ErrorInfoDictionary?) {
        self.code = code
        self.domain = domain
        self.userInfo = userInfo
    }
}
```
And since all my values are immutable to begin with, I'm not sure why I would choose the struct approach for this problem.
Investigating the FailableOf<T>
Since I'm having copying issues with the Error (gist) type, it is only logical to look at the FailableOf<T> type next. Instead of using my JSON parser as the test ground, I decided to create a little sample app that would loop many times calling a function that returned each of the following types:
- FailableOf<T> – my implementation of the Either<T, U> concept (gist)
- Either<T, U> – a more generic solution to my FailableOf<T> problem (gist); a rough sketch of its shape follows below
- (T, Error) – a tuple that contains the two pieces of information
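Since the gists hold the actual implementations, here is a rough sketch of the Either<T, U> shape the sample below relies on (the initializers and accessors here are assumptions about its interface, and the boxing mirrors what FailableOf<T> has to do):

```swift
// Approximate shape of Either<T, U> used by the sample program; the real
// code is in the linked gist, so treat this as a stand-in.
public final class Box<T> {
    public let value: T
    public init(_ value: T) { self.value = value }
}

public enum Either<T, U> {
    case Left(Box<T>)
    case Right(Box<U>)

    public init(left: T) { self = .Left(Box(left)) }
    public init(right: U) { self = .Right(Box(right)) }

    public var left: T? {
        switch self {
        case .Left(let box): return box.value
        default: return nil
        }
    }

    public var right: U? {
        switch self {
        case .Right(let box): return box.value
        default: return nil
        }
    }
}
```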
The sample program is straightforward:
```swift
func either<T>(value: T) -> Either<T, Error> {
    return Either(left: value)
}

// test: either
for var i = 0; i < 100_001; i++ {
    let r = either(i)
    if (r.right != nil) {
        println("error at \(i)")
    }
}
```
Each of the different constructs has the same form (gist).
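For example, the tuple variant looks roughly like this (a sketch rather than the exact gist code; here the Error is optional so that a successful call can return nil):

```swift
// Rough sketch of the tuple variant of the test; the exact code is in the
// linked gist. Returning (T, Error?) lets success hand back nil for the
// error instead of allocating an empty Error.
func tuple<T>(value: T) -> (T, Error?) {
    return (value, nil)
}

// test: tuple
for var i = 0; i < 100_001; i++ {
    let r = tuple(i)
    if (r.1 != nil) {
        println("error at \(i)")
    }
}
```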
This is where I found something interesting: both the FailableOf<T> and Either<T, U> tests take up about 3MB of memory, while the (T, Error) tests only take 17KB. Clearly, there must be some missed compiler optimizations in Swift. Regardless, the tuple approach is the one we should be taking, at least for now, if we really care about every ounce of memory.
In order to work with it better in my code, I create a typealias and use named tuples:
```swift
/// The type that represents the result of the parse.
public typealias JSParsingResult = (value: JSValue?, error: Error?)
```
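To make the shape concrete, a parsing function and its call site end up looking something like this (a hypothetical example; the real JSON.parse functions have their own signatures, and the JSValue initializers here are assumptions):

```swift
// Hypothetical producer and consumer of JSParsingResult; the actual parser
// code differs, and JSValue(true)/JSValue(false) assume a Bool-taking
// initializer exists.
func parseBool(text: String) -> JSParsingResult {
    switch text {
    case "true": return (JSValue(true), nil)
    case "false": return (JSValue(false), nil)
    default: return (nil, Error(code: 1, domain: "json.parse", userInfo: nil))
    }
}

let result = parseBool("true")
if let error = result.error {
    println("parse failed: \(error.domain) (\(error.code))")
}
else if let value = result.value {
    println("parsed: \(value)")
}
```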
After updating all of the JSON.parse code to return this new type, memory usage is down to 5.33MB!! Simply by switching from a struct-based approach to this named tuple approach (which I think is just as good, frankly), I was able to shave off another 700KB of unnecessary memory creation.
I'm not done investigating other opportunities right now, but things are starting to look really promising here.
UPDATE: After some more investigating, I realized why the enum cases were causing such memory bloat: we need to box all of the types that get stored in them until Swift implements proper generic support for enums.