In my JSON Parsing post, I talked about an issue I was having with a particular sequence of characters:
let string = "\"\u{aaef}abcd"
countElements(string) // 5
countElements(string.utf8) // 8
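Well, it turns out that \u{aaef} is a Unicode combining character that modifies the character before it. Some combining sequences display as a single character, but others, like this one, still show up as multiple visible characters. Swift counts each extended grapheme cluster as one Character, so the quote and \u{aaef} together count as a single element, which is why countElements(string) returns 5 rather than 6. The UTF-8 view returns 8 because \u{aaef} alone takes three bytes.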
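To see the grouping directly, here is a minimal sketch that walks the string one Character at a time and lists the scalars inside each. It assumes Swift 5 or later (where `Character.unicodeScalars` and `Unicode.Scalar.Properties` are available), not the Swift 1.x used above, and the helper names `scalarBreakdown` and `isCombiningMark` are hypothetical.

```swift
// Assumes Swift 5 or later; the snippets in this post use Swift 1.x APIs.

/// Hypothetical helper: for each Character (extended grapheme cluster),
/// return the hex values of the Unicode scalars it contains.
func scalarBreakdown(_ s: String) -> [[String]] {
    return s.map { character in
        character.unicodeScalars.map { String($0.value, radix: 16) }
    }
}

/// Hypothetical helper: true if the scalar's general category is one of the
/// combining-mark categories (Mn, Mc, Me).
func isCombiningMark(_ scalar: Unicode.Scalar) -> Bool {
    switch scalar.properties.generalCategory {
    case .nonspacingMark, .spacingMark, .enclosingMark:
        return true
    default:
        return false
    }
}

let string = "\"\u{aaef}abcd"
for (index, scalars) in scalarBreakdown(string).enumerated() {
    // The first Character holds both the quote (0x22) and U+AAEF.
    print(index, scalars)
}
print(isCombiningMark("\u{aaef}"))  // true
```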
However, it turns out there is another view into the string that gives me what I wanted:
let string = "\"\u{aaef}abcd"
countElements(string.unicodeScalars) // 6
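`countElements` was removed in later Swift releases. Assuming Swift 5 or later, each view exposes `count` directly, so the three numbers can be compared side by side. The per-scalar byte widths show where the 8 comes from.

```swift
// Assumes Swift 5 or later, where `count` replaces the Swift 1.x `countElements`.
let string = "\"\u{aaef}abcd"

let characterCount = string.count              // extended grapheme clusters
let scalarCount = string.unicodeScalars.count  // Unicode scalar values
let utf8Count = string.utf8.count              // UTF-8 code units (bytes)
print(characterCount, scalarCount, utf8Count)  // 5 6 8

// UTF-8 width of each scalar: the quote and ASCII letters take 1 byte, U+AAEF takes 3.
let bytesPerScalar = string.unicodeScalars.map { String(Character($0)).utf8.count }
print(bytesPerScalar)  // [1, 3, 1, 1, 1, 1]
```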
If we look at a few other examples, we can see that the unicodeScalars view gives us the full makeup of the string: every Unicode scalar value in it, in order.
let single = "è" // \u{e8}
for scalar in single.unicodeScalars {
    println("\(scalar) (\(scalar.value))") // prints: è (232)
}
let combined = "e\u{300}"
for scalar in combined.unicodeScalars {
    println("\(scalar) (\(scalar.value))") // prints: e (101), then the combining grave accent (768)
}
Notice the difference between the two: the first is a single Unicode scalar, while the second is the letter "e" followed by the combining grave accent (U+0300).
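The two spellings differ only below the Character level. Swift compares String values by canonical equivalence, so a quick check (Swift 5 or later assumed) shows that both are one Character and compare equal, while their scalar and UTF-8 views differ.

```swift
// Assumes Swift 5 or later.
let single = "\u{e8}"      // è as one precomposed scalar
let combined = "e\u{300}"  // "e" followed by U+0300 COMBINING GRAVE ACCENT

print(single.count, combined.count)                                // 1 1
print(single.unicodeScalars.count, combined.unicodeScalars.count)  // 1 2
print(single.utf8.count, combined.utf8.count)                      // 2 3

// String equality uses canonical equivalence, so these compare equal...
print(single == combined)  // true
// ...even though their scalar sequences are different.
print(single.unicodeScalars.elementsEqual(combined.unicodeScalars))  // false
```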
The only downside I've run into with this approach is that it seems to be significantly slower than the UTF-8-based approach I was using earlier.
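For context on the UTF-8 side of that trade-off: JSON's structural characters (`"`, `\`, `{`, `,` and so on) are all ASCII, and in UTF-8 every byte of a multi-byte sequence is 0x80 or above, so scanning raw bytes for an ASCII delimiter can never land inside \u{aaef} or any other non-ASCII scalar. The function below is a hypothetical sketch assuming Swift 5 or later, not the parser from the JSON Parsing post; it only handles escapes by skipping the byte after a backslash.

```swift
// Assumes Swift 5 or later. Hypothetical helper, not the JSON parser from the earlier post.

/// Given UTF-8 bytes that start at a JSON string's opening quote, returns the
/// byte offset of the closing quote, or nil if there is none.
/// Multi-byte UTF-8 sequences only contain bytes >= 0x80, so they can never be
/// mistaken for the ASCII quote (0x22) or backslash (0x5C).
func closingQuoteOffset(_ bytes: [UInt8]) -> Int? {
    guard bytes.first == UInt8(ascii: "\"") else { return nil }
    var i = 1
    while i < bytes.count {
        switch bytes[i] {
        case UInt8(ascii: "\\"):
            i += 2  // skip the backslash and the byte it escapes
        case UInt8(ascii: "\""):
            return i
        default:
            i += 1
        }
    }
    return nil
}

let literal = Array("\"\u{aaef}ab\\\"cd\" trailing".utf8)
if let end = closingQuoteOffset(literal) {
    // Slicing at an ASCII position never splits a multi-byte scalar.
    let raw = String(decoding: literal[1..<end], as: UTF8.self)
    print(end, raw.unicodeScalars.count)  // 10 7
}
```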