Performance Matters: O(N)

In college and other performance books, we learned about Big O notation. We were taught that the thing you really need to be careful of is the algorithmic complexity of your algorithms. And while that is true to some extent, it leads to being extremely wasteful.

The problem isn’t so much what you’ve been told about algorithmic complexity, the problem is that the comparison is done in the abstract, typically with no consideration of how the computer actually works.

And that’s a huge problem!

Take for instance a problem of inserting items into a the middle of an sequence of values. When we look at the comparison between a linked list and an array of items, the algorithmic complexity tells us that both operations are going to be O(N).

Of course, with a linked list, we only need to:

Find the location to insert at
Create the new node
Update the next pointer of the new new node
Update the next pointer of the previous node

That sounds pretty cheap!

For an array, we only need to:

Resize the contiguous piece of memory making up the array
Copy all the data over
Add our new value in the correct offset

Well crap, that sounds really expensive!

Spoiler: you’ve been misled.

Yep, that’s right. What if I told you that even in an array of 10,000 items and a linked list of 10,000 items, insertion into the middle is still going to be faster in the array? Would you believe me?

The timings are for inserting 10,000 items into the middle of the vector each time.

Insertion into a std::vector: 1.35ms
Insertion into a std::list: 104.89ms

(side note: the std::list was 4x faster than my own linked list implementation, so it’s definitely not an STL issue)

What the heck is going on here!? Ok, this was a 32-bit size structure. What about a bigger struct, like 2048-bits?

Insertion into a std::vector: 155.15ms
Insertion into a std::list: 211.38ms

Ok, as expected, the size of the data structure has a much greater impact on the array than the list. But what’s the deal with the original timings?

The problem is: data locality.

This is a key thing that is missed when talking about performance. Regardless of how you implement a list of items, you’re going to have something like this:

node = node->next;

The problem is that the -> operator (de-referrencing) is a relatively expensive operation.

For comparison, here are the timings for insertion into the beginning of the array/list (32-bit structure):

Insertion into a std::vector: 2.41ms
Insertion into a std::list: 0.52ms

Conclusion

Remember, computers are extremely fast at dealing with data that is sequential. A lot of that data just gets pulled into the processor’s cache lines and it’s like a shark at a buffet line. When you have to follow pointers around, it’s like having to drive to the grocery store every time you want a bite to eat.

Further Reading:

http://gameprogrammingpatterns.com/data-locality.html

The above article has a much more thorough investigation and provides some excellent advice about increasing the performance of your game loop without messing up your data structures. It continues this same concept of “data locality” and shows just how important it is.

Swift Comparison

Here’s an interesting Swift comparison on the 32-bit sized data structure for a linked list and an array:

Insertion into a Array: 1.22ms
Insertion into a LinkedList: 396ms

If you’ve been wondering where my Swift posts have been… this is why. Honestly, I’m tired of trying to figure out the performance issues in the language (mostly between debug and release builds) and have been spending most of my hobby programming time in C++. ¯\_(ツ)_/¯

UPDATE: There was an error in the Swift version, it’s been fixed and the array times are SIGNIFICANTLY better. It’s still slower than the C++ version, but only by a factor of 2 this time. WAY better.

UPDATE2: When Ints are 64bits… duh! Sorry, Swift’s Int type is 64 bits. Using Int32, as I should have been, get’s us all on even playing field! GO SWIFT! 🙂

All code for this can be found and verified here: https://gist.github.com/owensd/94490781a0ce1dce6737834876ff3a02.

One thought on “Performance Matters: O(N)”

kirsch says:

Locality is a big deal in modern CPUs where a cache miss means you gotta go to all the way to RAM to retrieve stuff. Keep your data structures small and simple. Pass and return structs by value from your functions. And above all, avoid premature optimization and profile later. Performance problems tend to crop up in unexpected places!

LikeLike

September 6, 2017 at 5:04 am

Comments are closed.

owensd.io

A builder of things.

Performance Matters: O(N)

Conclusion

Swift Comparison

One thought on “Performance Matters: O(N)”

Conclusion

Swift Comparison

Share this:

One thought on “Performance Matters: O(N)”