In college and other performance books, we learned about Big O notation. We were taught that the thing you really need to be careful of is the algorithmic complexity of your algorithms. And while that is true to some extent, it leads to being extremely wasteful.
The problem isn’t so much what you’ve been told about algorithmic complexity, the problem is that the comparison is done in the abstract, typically with no consideration of how the computer actually works.
And that’s a huge problem!
Take for instance a problem of inserting items into a the middle of an sequence of values. When we look at the comparison between a linked list and an array of items, the algorithmic complexity tells us that both operations are going to be O(N).
Of course, with a linked list, we only need to:
- Find the location to insert at
- Create the new node
- Update the next pointer of the new new node
- Update the next pointer of the previous node
That sounds pretty cheap!
For an array, we only need to:
- Resize the contiguous piece of memory making up the array
- Copy all the data over
- Add our new value in the correct offset
Well crap, that sounds really expensive!
Spoiler: you’ve been misled.
Yep, that’s right. What if I told you that even in an array of 10,000 items and a linked list of 10,000 items, insertion into the middle is still going to be faster in the array? Would you believe me?
The timings are for inserting 10,000 items into the middle of the vector each time.
- Insertion into a std::vector: 1.35ms
- Insertion into a std::list: 104.89ms
(side note: the std::list was 4x faster than my own linked list implementation, so it’s definitely not an STL issue)
What the heck is going on here!? Ok, this was a 32-bit size structure. What about a bigger struct, like 2048-bits?
- Insertion into a std::vector: 155.15ms
- Insertion into a std::list: 211.38ms
Ok, as expected, the size of the data structure has a much greater impact on the array than the list. But what’s the deal with the original timings?
The problem is: data locality.
This is a key thing that is missed when talking about performance. Regardless of how you implement a list of items, you’re going to have something like this:
node = node->next;
The problem is that the -> operator (de-referrencing) is a relatively expensive operation.
For comparison, here are the timings for insertion into the beginning of the array/list (32-bit structure):
- Insertion into a std::vector: 2.41ms
- Insertion into a std::list: 0.52ms
Remember, computers are extremely fast at dealing with data that is sequential. A lot of that data just gets pulled into the processor’s cache lines and it’s like a shark at a buffet line. When you have to follow pointers around, it’s like having to drive to the grocery store every time you want a bite to eat.
The above article has a much more thorough investigation and provides some excellent advice about increasing the performance of your game loop without messing up your data structures. It continues this same concept of “data locality” and shows just how important it is.
Here’s an interesting Swift comparison on the 32-bit sized data structure for a linked list and an array:
- Insertion into a Array: 1.22ms
- Insertion into a LinkedList: 396ms
If you’ve been wondering where my Swift posts have been… this is why. Honestly, I’m tired of trying to figure out the performance issues in the language (mostly between debug and release builds) and have been spending most of my hobby programming time in C++. ¯\_(ツ)_/¯
UPDATE: There was an error in the Swift version, it’s been fixed and the array times are SIGNIFICANTLY better. It’s still slower than the C++ version, but only by a factor of 2 this time. WAY better.
UPDATE2: When Ints are 64bits… duh! Sorry, Swift’s Int type is 64 bits. Using Int32, as I should have been, get’s us all on even playing field! GO SWIFT! 🙂
All code for this can be found and verified here: https://gist.github.com/owensd/94490781a0ce1dce6737834876ff3a02.
One thought on “Performance Matters: O(N)”
Locality is a big deal in modern CPUs where a cache miss means you gotta go to all the way to RAM to retrieve stuff. Keep your data structures small and simple. Pass and return structs by value from your functions. And above all, avoid premature optimization and profile later. Performance problems tend to crop up in unexpected places!
Comments are closed.