Developing “Rush for Glory” game is going to be finished. so I decided to rewrite some parts of the engine recently. this may cause you to rewrite the whole source code. In the first step, I rewrote the containers library. in this new version, I changed the structure of containers to decrease the usage of templates<> to more easier to understand. I used a sampler to decrease allocations. also implementing memory allocators is more easy than before and changeable during the code in some cases. finally implementing a new memory allocator that uses stack memory to avoid calling the malloc function.
Note that some new features are not designed for general purposes. there are many limitations to using them and must be used with care because those features are designed to develop the game and most of the capabilities depend on the needs of a game engine.
Here are some highlights :
Added math base functions to the library with some fast functions like :
// fast sqrt root using Log Base 2 Approximation With One Extra Babylonian Steps float sx_sqrt_fast(constfloat x );
// return sine( x ) from table. maximum absolute error is 0.001f float sx_sin_fast(constfloat x );
// return cosine( x ) from table. maximum absolute error is 0.001f float sx_cos_fast(constfloat x );
// compute the sine and cosine of the angle x at the same time. maximum absolute error is 0.001f void sx_sin_cos_fast(constfloat IN x, float& OUT s, float& OUT c);
and some useful conventional functions
MemManFixed::SetBuffer(void* buffer);
I can change the memory buffer of the allocator by setting a new buffer. this feature is available only on the fixed memory manager. it means that it’s suitable just for arrays and strings.
For example, I wanna change/fill a string in a structure. in this example when I set members of the structure to the allocator, all string functions can be applied easily.
// extract file path from file name and copy that to fileInfo.path
memTmp.SetBuffer(&fileInfo.path);
strTmp = fileName;
strTmp.ExtractFilePath();
// extract the name of the file and copy that to fileInfo.name
memTmp.SetBuffer(&fileInfo.name);
strTmp = fileName;
strTmp.ExtractFileName();
// extract file type and copy that to the fileInfo.type
memTmp.SetBuffer(&fileInfo.type);
strTmp = fileName;
strTmp.ExtractFileExtension();
MemManFixed_inline<memSizeInByte>
This memory manager uses a memory stack of functions. by the way, this feature is available only on the fixed memory manager. this means that it’s suitable just for arrays and strings.
In this example, I try to collect some nodes from the scene manager. this will happen at each frame more than once, at the renderer, AI system &, etc. Implementing a new Array class causes calling allocation/deallocation. the other solution is declaring a static array. the other solution is using a memory stack with some limitations but fast and easy memory management.
example :
// collect nodes from the scene manager
g_engine->m_scene->GetNodesByFrustum( cameraFrustum, nodes );
for(int i =0; i < nodes.Count(); i++) { // ... do something ! }
Although there are some changes like adding new functions and changing algorithms in the other stuff to increase performance and capabilities, but those still need to be optimized more and more.
In the last article, I explained something about the concept of MemPool. Now I want to show you some optimization in design and a simple sample of MemPool that can be downloaded and tested.
Before testing the code imagine that we have a lot of chunks on the pool consecutively. in the last article, I introduced the body of a chunk just like a basket that contains the data. now the chunk can have a fixed design to have more readable code.
New Chunk Structure
As you can see we can place the address of the chunk in the head of the chunk instead of the end line of that and store the address of the behind chunk. all we try to do is make a chain of chunks by minimum size.
Here are the results of my test for the performance of this system for 100000 allocation/deallocation in release mode. I try to simulate a real environment for both the memory pool and OS memory system, but I don’t know how much it’s real. 😉
Also, the source code of this sample already exists at the end of this post. You may change and test it by yourself.
85~116 ms by using new/delete
10~17 ms by using MemPool
NOTE:
There is no force to use a general MemPool in your project! you can make a faster MemPool for each part of your application considering the needs. for instance, I write a fast hybrid MemPool which is faster. but the number of allocations for this hybrid MemPool is limited and must be defined in creation time.
Sometimes we want to allocate and deallocate some memory blocks frequently in high-performance applications, especially in games and game engines. in this case, regular functions like malloc/free may not provide as much speed as we like. Memory pool ( MemPool ) provides a simple and fast memory management scheme for compact memory blocks chiefly. MemPool allocates a specified memory block at initialization time and manages that to use in allocate/deallocate functions instead of OS memory management. MemPool has some advantages and disadvantages. the most important benefits of MemPool are providing high-speed allocations in real-time applications and deallocating all objects ( which are allocated by MemPool ) by releasing the main pool at once. The disadvantages of MemPool directly depend on how the programmer uses that. Using a fast memory pool system is not safe and has some restrictions that I will describe later.
There are many prepared memory pool libraries that you can search, download, and use in your applications. Recently I wrote some types of MemPool for various applications. One of them is more general to use; Thus in this simple article, I try to explain briefly the concept of my memPool structure.
The most important couple of methods that we want from MemPool are Alloc and Free. The first method will store a block of memory from the pool and the second will restore that to the pool and this system will manage every block in this trading. To manage the pool we use some type of object called “Chunk” that contains some information about the memory block requested by the user. The other object is a simple list. We will use this list to store and restore the pointer of empty chunks in the pool. In this system every chunk has three parts and will be created in the pool, so we don’t need conventional new/delete operators to create them. The first part of the chunk contains some little info about the type of the chunk, size, and links. The middle part is an outgoing memory block to the user and the end section is a pointer to the head of the chunk.
Now let’s start playing with this pool and chunks to manage the system. At first, assume that the MemPool class was created and we allocated a large enough memory block for the pool. we create the first empty chunk on the pool set the data size of the chunk to the whole of the pool size ( except Its size because the chunk is carved on the pool ) and finally push this empty chunk to the list.
Every time the method Alloc is called by a user we will create a new chunk on the pool and return the pointer to the Data block in the middle of the chunk. Assume that the first request is received and we want to allocate a new chunk. To do this, we must search in the list of empty chunks and find a chunk that has enough space to allocate the requested memory block size. at this time the list contains only one empty chunk. we can change the type of the chunk from “empty” to “Used”, pop it from the list, and send out the pointer of the data block to the user. but its size is more than the requested size! In this case, we create a new empty chunk by the remaining size and push it to the list of empty chunks. now we have one used chunk and one empty chunk. notice that only one empty chunk is in the list!
Some new requests were received by calling the Alloc method. we perform the request just like above. notice that we have only one empty chunk in the list. As you can see in these images, I show the allocated chunks in orange color and empty chunks in white color. In this step, we have 3 allocated chunks and one empty chunk.
When the user wants to free the memory block number 2. the entry parameter, is the pointer of the block of a chunk, so to find the chunk object we should go back to the size of an object in bytes. Something just like this:
After finding the chunk we change the type of chunk from “used” to “empty” and push it back to the chunk. now we have 2 used chunks and 2 empty chunks. As mentioned above there is no need to push used chunks in any list and we just list empty chunks.
In this important step, the user wants to free memory block 3. As you can see in the picture above, if we remove chunk 3 we have 3 empty chunks respectively and burst. what if this behavior continues? Then we have a lot of separated empty chunks with different little sizes and we can’t allocate a new chunk larger than any of them in size. The solution to this problem is merging empty chunks. to achieve this solution, when the user wants to free a memory block and we find the specified chunk to remove, we should look at the neighbors of the chunk and verify that they are empty or not. If they are empty, merge them and prepare the list of unused chunks in MemPool.
At first, to look for the right neighbor chunk we go to the end of the chunk and merge that with the current chunk. Merging these two chunks is easy by increasing the size of data of the current chunk and changing the block of address in the neighbor chunk. Pop the empty neighbor chunk from the list and push the current empty chunk to the list.
// look at neighbor chunk in right side of the current chunk
CChunk* rch =(CChunk*)((PBYTE)p + pchunk->size + ChunkAddressSize); if(rch->type == chEmpty) { // change the address at the end of the chunk
PChunkPTR prch =(PChunkPTR)((PBYTE)rch +sizeof(CChunk)+ rch->size);
prch->chunk = pchunk;
// exclude the right chunk from the list
emptylist.pop(rch);
pchunk->size += rch->size +sizeof(CChunk)+ ChunkAddressSize; }
In the next step. to look for the neighbor chunk on the left side we just read the block of address behind the current chunk. this address will guide us to the left neighbor chunk and finally, merging these chunks is as simple as the previous step. Now look at chunk 3; by freeing this chunk both left and right neighbors will merge and only one empty chunk will be in the list of empty chunks.
This is a simple concept of a memory pool system and it is certainly different from OS memory management. Using MemPool has some risks and restrictions. for instance, assume that the user allocates a memory block from the pool and wants to use it for a string. if the user traverses the characters of this string and changes data out of size, the chunk of the pool breaks down and may cause the system failure. Also writing some features for MemPool like reallocation methods and protection system for chunks makes the system speed low.
There are some optimizations that we can apply to this system. For example, probably we can use a map to store, restore, and search for suitable empty chunks in the alloc function. when we talk about speed and optimization to increase speed in milliseconds of the time units, attention to how we write our code is important. for example, decreasing temp object in functions for local variables and …
NOTE: There is no force to use a general MemPool in your project! you can make a faster MemPool for each part of your application considering the needs. for instance, I write a fast hybrid MemPool which is faster, but the number of allocations for this hybrid Mempool is limited and must be defined in creation time.