Although we seem to have hit a ~3 GHz limit in processor speed, Moore's law may still be holding as more and more cores are added to a processor at this speed. As processors have gotten faster, memory latency has gotten longer over time. This means that understanding how the architecture you are working on accesses memory is critical to the execution speed of your program, and for the first time, may be even more important than the algorithms themselves.

Let me give you an example: For collision detection with actors that move in the world, every level in Ghostbusters: The Video Game (and the Infernal Engine) is partitioned with a BSP tree. Each node in the tree contains a linked list of actor pointers that are in that node. Note that actors may not be in the leaves of the tree, but may be pushed up until one node fully contains the actor.

Here is a simplified example of what our world and actors looked like in previous generations of code:

class CActor {

.

.

.

CVector wPos;

float wRadius;

.

.

.

CActor *nextActorInBSP;

.

.

.

};

struct SBSPNode {

float a,b,c,d;

SBSPNode *left,*right;

CActor *firstActorInBSP;

};

void raytraceActorsInNode(SBSPNode *node,CVector *wRayStart,CVector *wRayEnd) {

CActor *a = node->firstActorInBSP;

while (a != 0) {

checkCollision(&a->wPos,a->wRadius,wRayStart,wRayEnd);

a = a->nextActorInBsp;

}

}

So this is a very simple code example running through a linked list of actors and performing a simplified ray trace on them. The problem with this code, though, is execution speed. While this type of code worked well in previous years, it does not perform well on modern hardware due to the potential of cache misses running through a linked list. Certain hardware can have over a 500 clock tick penalty for an L2 cache miss and a 37 clock penalty for an L1 miss. Neither is acceptable if you want to run your code at 3 GHz. To ensure we run at maximum speed, we rearrange our structures as follows:

New BSP structure:

struct SActorPosRad {

CActor *actor;

CVector wPos;

float wRadius;

};

struct SBSPNode {

float a,b,c,d;

SBSPNode *left,*right;

Array actorList;

};

void raytraceActorsInNode(SBSPNode *node,CVector *wRayStart,CVector *wRayEnd) {

int i;

for (i =0; i wPos,a->wRadius,wRayStart,wRayEnd);

}

}

The array template is a very simple dynamic memory allocator. As long as actors don't move very far in one frame of a game, using a template array here mostly doesn't change from frame to frame. Insertion and deletion from the array is treated as a linear list, and the array is pre-allocated to a minimum size, so that allocation (and de-allocation) will almost never happen. This could even be a static-sized array, with actors overflowing this node pushed up in the tree.

This is a case where we will have to unlearn what has been taught in computer science classes for decades; the use of a linked list may not be the fastest way to store data in memory anymore. We're violating multiple rules we've been taught -- not using a linked list, and using a linear search for inserting and deleting actors in this list. Linear searches are slow, right?

Linear searches (if used properly) can be extremely fast due to the memory access pattern. The cache line on a modern system is anywhere from 64 bytes to 128 bytes in size. That means that, to read any memory location in that vicinity, the memory around that will also have to be read in. So you have free full-speed access to the data around you as well. If you are constantly hopping around in memory, like a linked list does, you will be stalled while the memory around your pointer is brought in as it is de-referenced.

So, while linear searches may not be applicable for all types of data, they can certainly be used for tight game code.

 

Mark Randel is the president and chief technology officer of Terminal Reality Inc., developer of Ghostbusters: The Video Game

 

Available at Amazon.com:

The Saboteur

Assassin's Creed II

Assassin's Creed: Brotherhood

Mario vs. Donkey Kong Mini-Land Mayhem!

Time Crisis: Razing Storm

ArcaniA: Gothic 4

Article: Copyright © iHaveNet

Video Games: Game Optimization for Modern Hardware