Input System

The input system of the engine is complete. This system is composed of a main static input class that holds all input data, including keyboard keys, mouse buttons, two analog inputs, and one cursor. These data are separated into four players, and the engine's parts can access them through the main input class by specifying the player ID.

myState = sx::io::Input::GetAnalog_2(playerID)->state;
cursorX = sx::io::Input::GetCursor_ABS(playerID)->x;

Each hardware input device – keyboard, mouse, joystick, etc. – can easily be defined and attached to the main input class. For example, to define a new joystick for the engine, we can write a class for that joystick that inherits from the engine's standard input device interface and attach it to the engine with a unique player ID.

class MyJoystick : public sx::io::InputDeviceBase
{
// ...
// .. define joystick
// .. change main input class data
// ...
};

// add joystick connectivity; specify the player ID at creation time
sx::io::PInputDeviceBase newDevice = SEGAN_NEW( MyJoystick(new_playerID) );
sx::io::Input::Attach( newDevice );

Also, we can easily send signals to the player devices to change a value or to use special features of modern joysticks, something like this:

// send signal to the joystick to vibrate it
Input_Signal_Vibration isVibrate;
isVibrate.wLeftMotorSpeed = SX_INPUT_DEFAULT_VIBRATE;
isVibrate.wRightMotorSpeed = SX_INPUT_DEFAULT_VIBRATE;
sx::io::Input::SendSignal(playerID, IST_SHOCK, isVibrate);

And finally, every key and button in this system will have one of the states listed below:

//! input button states
SX_INPUT_STATE_NORMAL      0   // the normal (released) state of the key
SX_INPUT_STATE_DOWN_FIRST  1   // the key has just been pressed for the first time
SX_INPUT_STATE_DOWN_HOLD   2   // after the first down, the state goes to hold
SX_INPUT_STATE_UP         -1   // the press has finished; this state goes back to normal in the next update
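
The exact polling API depends on the engine, but the state machine itself is simple. The following is only a sketch (not engine code) of how these states typically transition, given the raw "is the key physically down" flag on each update:

enum InputState
{
    SX_INPUT_STATE_UP         = -1,
    SX_INPUT_STATE_NORMAL     =  0,
    SX_INPUT_STATE_DOWN_FIRST =  1,
    SX_INPUT_STATE_DOWN_HOLD  =  2
};

InputState UpdateButtonState(InputState current, bool pressedNow)
{
    if (pressedNow)
        return (current == SX_INPUT_STATE_NORMAL || current == SX_INPUT_STATE_UP)
            ? SX_INPUT_STATE_DOWN_FIRST     // just pressed on this update
            : SX_INPUT_STATE_DOWN_HOLD;     // still held from a previous update
    else
        return (current == SX_INPUT_STATE_DOWN_FIRST || current == SX_INPUT_STATE_DOWN_HOLD)
            ? SX_INPUT_STATE_UP             // just released; lasts one update
            : SX_INPUT_STATE_NORMAL;        // back to normal on the next update
}

// game code then only needs checks such as:
//   if (state == SX_INPUT_STATE_DOWN_FIRST)  fireWeapon();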

So when we make our game, we just use one standard input class and don't worry about devices anymore. We can easily find how many players are connected and then, according to the gameplay, use the keys and data in the input system, change input data for each player, or vibrate their joystick sets.

PS: The other parts of the engine are currently under development. The rendering parts, meshes, materials, HUD, AI, script contexts, particles, physics, logic objects, etc. will come together in the core and will sit on the scene system and node structures.

Task Manager for Multitasking

“MULTITHREADED” Or Not “MULTITHREADED”?

This is the first question that fires your brain's sensor if you are thinking about implementing a multitasking system in your 3D application. “D3DCREATE_MULTITHREADED” is a creation flag that makes the Direct3DDevice9 thread-safe. It's easy. It works, but it's slow! Also, this flag will not solve all your problems in a multithreaded system; it just makes some functions of the device thread-safe (Lock/Unlock, for instance).

Many problems still exist and will confront you when synchronizing threads with application events. For example, suppose a thread is decompressing a file to load resource data and copying it into the buffers of a locked surface, and at that moment the application tries to release the surface. What can we do? The first answer is shutting the thread down, but that makes things worse, because terminating a thread is a very dangerous operation: the thread may have allocated a memory block or created events, and if you are using third-party libraries (zlib for decompression, for instance), the result is unpredictable.

Where is the solution?
You have to design a general solution that solves these problems and provides a safe environment to extend without running into them again. I created a Task Manager system that has some internal threads and queues of tasks. For each task, you can specify the destination thread and send the task to the task manager, and the task manager executes it in that thread. On top of that, I made some tasks that call Direct3D functions in the main thread; the other threads can send these tasks to the task manager and wait for them to complete, which makes the Direct3D function calls safe. I also made some tasks for destroying objects, which solves the problem mentioned above.
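
The engine's real task manager is certainly more involved, but the core idea can be sketched like this (Task, TaskManager, and THREAD_MAIN are illustrative names, not the actual SeganX API): every thread owns a queue, any thread may post a task to any destination queue, and each thread drains its own queue in its loop.

#include <deque>
#include <map>
#include <mutex>

typedef int ThreadId;
const ThreadId THREAD_MAIN = 0;

struct Task
{
    virtual ~Task() {}
    virtual void execute() = 0;
};

class TaskManager
{
public:
    // called from any thread: queue a task for the destination thread
    void post(ThreadId dest, Task* task)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_queues[dest].push_back(task);
    }

    // called by each thread in its own loop: run the tasks queued for it
    void run(ThreadId self)
    {
        std::deque<Task*> local;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            local.swap(m_queues[self]);
        }
        for (size_t i = 0; i < local.size(); ++i)
        {
            local[i]->execute();
            delete local[i];
        }
    }

private:
    std::mutex m_mutex;
    std::map<ThreadId, std::deque<Task*> > m_queues;
};

A loader thread, for example, could wrap a Direct3D call in such a task, post it to THREAD_MAIN, and wait on an event until the main loop has executed it.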

The other useful multitasking technique that can be implemented with a task manager is the Self-Delayed Task. This is very important to keep the main thread from waiting. In my application, each task in the main thread's queue is executed in the main loop of the application, so a task with a special mission can check its flags and add itself back to the task manager to execute in the next loop. I used this technique to destroy objects: when an object is about to be destroyed while some of its tasks are still executing in other threads, the destroy task delays itself to the next loop instead of making the main loop wait for the other threads.
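
Building on the hypothetical TaskManager sketch above, a self-delaying destroy task could look roughly like this (the pendingTasks counter is an assumption; the real engine may track in-flight work differently):

#include <atomic>

// hypothetical object carrying a counter of background tasks still using it
struct Object
{
    std::atomic<int> pendingTasks;
    Object() : pendingTasks(0) {}
};

struct DestroyTask : public Task
{
    Object*      object;
    TaskManager* manager;

    DestroyTask(Object* o, TaskManager* m) : object(o), manager(m) {}

    virtual void execute()
    {
        if (object->pendingTasks.load() > 0)
        {
            // another thread is still using the object:
            // delay ourselves to the next main loop instead of waiting
            manager->post(THREAD_MAIN, new DestroyTask(object, manager));
            return;
        }
        delete object;  // nothing in flight anymore; safe to destroy
    }
};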

Now, in the SeganX engine, resources can be created and loaded in the background. Another feature is loading a specified LOD of an object in the background depending on its distance from the camera. With this feature, every object at a far distance loads a low LOD, reducing the time spent on decompressing files, transferring data to buffers, and other operations. For example, to load a 1024×1024 texture for a distant object, the system first loads the 256×256 mipmap, and as the object comes closer, the other mipmaps are loaded consecutively. Another new feature is that the resource manager of the device buffers can classify and cache created buffers for later reuse, decreasing the number of Direct3D creation operations.

Loading levels of a texture's mipmaps in separate tasks in the background
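
As a rough illustration of how a starting mip level might be chosen from the camera distance, here is one simple heuristic; it is only an example and not necessarily the engine's actual formula:

#include <algorithm>
#include <cmath>

// pick the first mip level to load for a texture based on distance:
// the farther the object, the smaller the mip we start with.
// nearDistance is the distance at which the full-resolution mip is wanted.
int ChooseStartMip(float distance, float nearDistance, int topSize /* e.g. 1024 */)
{
    int maxMip = (int)std::log2((double)topSize);                 // 1024 -> 10
    if (distance <= nearDistance) return 0;                       // full resolution
    int mip = (int)std::floor(std::log2(distance / nearDistance));
    return std::min(mip, maxMip);                                 // never below 1x1
}

// example: a 1024x1024 texture viewed from 4x the near distance starts at mip 2 (256x256).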


Address Spaces Problem

As you know, the Windows operating system defines an address space for each process to prevent data hacks and data corruption caused by other processes. This approach has many advantages, but also a few problems.
A simple (but common) problem appears when memory blocks are allocated in one module (a DLL, for instance), transported to another (the application, for instance), and then reallocated or freed there! An exception rises at this point, because different memory managers in different address spaces try to operate on memory blocks they do not own.
You may never see this problem because people usually use default shared libraries, but for me, this is another test and practice 😀
To manage memory block transitions between the engine and application address spaces, I added an sxAlloc object to sxLib that can redirect the memory allocators in sxMemory to the engine from other address spaces. So all the default data containers presented in the last post use the engine's address space and become safe for transporting data between the engine and other processes. However, I can still redefine any container with a different allocator to use wherever I like.
Here is the concept in the image:

Data transition between Engine and Application address spaces
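
The sxAlloc internals are not shown here, but the general idea can be sketched as follows (hypothetical names, not the actual sxLib API): the engine module exports its allocation functions, the library routes every container allocation through redirectable pointers, and so the same memory manager both allocates and frees every block no matter which module touches it.

#include <cstddef>
#include <cstdlib>

typedef void* (*AllocFunc)(size_t size);
typedef void  (*FreeFunc)(void* ptr);

// defaults: the local module's own heap
static void* default_alloc(size_t size) { return std::malloc(size); }
static void  default_free(void* ptr)    { std::free(ptr); }

static AllocFunc g_alloc = default_alloc;
static FreeFunc  g_free  = default_free;

// called once at startup with functions exported by the engine module
void redirect_allocator(AllocFunc allocFn, FreeFunc freeFn)
{
    g_alloc = allocFn;
    g_free  = freeFn;
}

// every container in the library allocates and frees through these
void* mem_alloc(size_t size) { return g_alloc(size); }
void  mem_free(void* ptr)    { g_free(ptr); }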

SeganX Foundation

Well, after more attempts, the SeganX foundation is finally complete and ready to use. It is separated into different groups and namespaces, without any STL footprint.
At first glance, the reason is purely personal, but when you look more deeply into this decision, you will find the most important technical advantage of writing the entire library in-house!
Practice!
Just as every athlete needs to practice his skills, this exercise, in addition to increasing knowledge about data containers and algorithms, increases your coding skills and makes you think more and more about solving problems.

The SeganX foundation covers the tools listed below:

sxLib: the most independent library that contains the basic container classes and functions.

String: Standard String that uses standard OS memory management to store data.
String_inline: Fast String that uses a hybrid memory pool to store data. It's suitable for local/temp variables in functions and should not be used as a class member!
String_256: A String type that uses a fixed-size 256-character buffer.
String_512: A String type that uses a fixed-size 512-character buffer.
String_1024: A String type that uses a fixed-size 1024-character buffer.

Array: Standard Array that uses standard OS memory management to store data.
Array_inline: A fast Array type that uses a hybrid memory pool to store data. It's suitable for local/temp variables in functions and should not be used as a class member!
Array_fix<T, MaxPush>: A fast Array type that uses a fixed-size memory block to store/restore data (see the sketch after these container lists).

List: Standard List that uses OS memory management to store/restore data.
List_256: A fast List type that can hold a maximum of 256 objects.
List_512: A fast List type that can hold a maximum of 512 objects.
List_fix<T, MaxPush>: A fast List type that uses a memory pool to store/restore data.

Queue: Standard Queue that uses OS memory management to store/restore data.
Queue_fix<T, MaxPush>: A fast Queue type that uses a memory pool to store/restore data.
Queue_256: A fast Queue type that can hold a maximum of 256 objects.
Queue_512: A fast Queue type that can hold a maximum of 512 objects.

Stack: Standard Stack that uses standard OS memory management to store data.
Stack_inline: A fast Stack type that uses a hybrid memory pool to store data. It's suitable for local/temp variables in functions and should not be used as a class member!
Stack_fix<T, MaxPush>: A fast Stack type that uses a fixed-size memory block to store/restore data.
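
To give an idea of what the *_fix containers are about, here is a minimal sketch of a fixed-capacity array in the spirit of Array_fix<T, MaxPush>; it is not the engine's actual implementation, only an illustration of the design: all storage lives in place, so pushing and popping never touch the OS memory manager.

template <typename T, int MaxPush>
class FixedArraySketch
{
public:
    FixedArraySketch() : m_count(0) {}

    bool push_back(const T& item)
    {
        if (m_count >= MaxPush) return false;   // capacity is fixed at compile time
        m_items[m_count++] = item;
        return true;
    }

    void clear()            { m_count = 0; }
    int  count() const      { return m_count; }
    T&   operator[](int i)  { return m_items[i]; }

private:
    T   m_items[MaxPush];   // in-place storage: no allocation, no deallocation
    int m_count;
};

// usage:  FixedArraySketch<int, 64> temp;  temp.push_back(42);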

Here, the Map container uses an AVL tree, which, as you know, is a self-balancing binary search tree. Although it has some overhead on insertion and deletion, it has a fast search ( O(log n) ), and I will use its search frequently. Here are the types of my maps:
Map<key, type>: Standard Map that uses OS memory management to store/restore data.
Map_256<key, type>: A fast Map type that can hold a maximum of 256 objects.
Map_512<key, type>: A fast Map type that can hold a maximum of 512 objects.
Map_fix<key, type, MaxPush>: A fast Map type that uses a memory pool to store/restore data.

In addition, there are two memory pool types, three memory manager types, a stream base class, a memory stream class, and some tools in sxClasses that are too boring to describe.

sxSys: this library contains more system tools and functions:
Window class is a simple class to create and modify a window.
Application is the main class, which you do not need to create yourself; this class is responsible for managing windows and for running and controlling the project.
System functions are a series of functions for getting system info, monitoring CPU cores, monitoring power management to keep the system fully awake, working with files, folders, disk drives, etc.
The other things in the sxSys namespace are threads and the task manager, the log system, and some additional useful functions for numbers and strings. These tools and functions need sxLib to compile.

Some other parts, such as the rendering system, engine, and editor, are still under development.

Gamescom exhibition 2010

There has been some delay because of my work on games at Sourena corporation and the pressure of preparing some demos for the Gamescom exhibition held in Germany. Besides all of that, thinking regularly about the design of SeganX turns me into a sleepless machine.
Anyway, the SeganX foundation is nearly complete, and the rendering part is on the way. The foundation contains memory systems, data-structure algorithms and containers, operating system functions, an application framework, a task manager, a file system, etc.
The rendering part, which manages resources, devices, and any objects we need to render a 3D scene, is in progress.
The plan for IO is on paper, and finally, I'm on my way with tired eyes =)

SeganMM

When I was a university student, a little project was suggested to us (me and my friend Hamed) that was supposed to have three-dimensional spaces and GUIs. Without effective tools to implement it, production took too long (about 23 days). After that project, I decided to write some new tools to decrease the production lead time. I initially decided to produce a tool that could be used quickly and easily to create a three-dimensional scene and multimedia software. So SeganScene.exe was built. Then I realized that good multimedia software should have a perfect GUI, and a perfect GUI should also have pieces of music, sounds, videos, and an appropriate scripting language. So SeganMenu.exe was built.

Over time and with each project, I added some parts to the collection. Since the time available to produce this software at each step was very short, rather than investigating and studying the different fields, I decided to invent techniques by myself, and as you know, they may not work well 😉

Now SeganMM (Segan MultiMedia) is just like an old man, and I have stopped developing it in Delphi. My new plan is to design and write some new tools and engines in VC++.

Some of the samples made by these tools are available here: http://www.hamed-khezrian.com/

And here are some screenshots:

 

MemPool example

In the last article, I explained the concept of a MemPool. Now I want to show you an optimization of the design and a simple MemPool sample that can be downloaded and tested.
Before testing the code, imagine that we have a lot of consecutive chunks in the pool. In the last article, I introduced the body of a chunk as a basket that contains the data. Now the chunk can have a fixed design that makes the code more readable.

New Chunk Structure

As you can see, we can place the address of the chunk in its head instead of at its end, and store the address of the chunk behind it. All we are trying to do is make a chain of chunks with the minimum overhead.

Here are the results of my performance test for this system, with 100,000 allocations/deallocations in release mode. I tried to simulate a real environment for both the memory pool and the OS memory system, but I don't know how realistic it is. 😉
Also, the source code of this sample is available at the end of this post. You may change and test it yourself.

85~116 ms by using new/delete
10~17 ms by using MemPool

NOTE:
You are not forced to use a general MemPool in your project! You can make a faster MemPool for each part of your application according to its needs. For instance, I wrote a fast hybrid MemPool that is even faster, but the number of allocations for this hybrid MemPool is limited and must be defined at creation time.

Download: MemPool_test.rar

MemPool

Sometimes we want to allocate and deallocate memory blocks frequently in high-performance applications, especially in games and game engines. In this case, regular functions like malloc/free may not provide as much speed as we would like.
A memory pool (MemPool) provides a simple and fast memory management scheme, chiefly for compact memory blocks. The MemPool allocates a specified memory block at initialization time and manages it in its own allocate/deallocate functions instead of going through OS memory management.
A MemPool has advantages and disadvantages. The most important benefits are high-speed allocations in real-time applications and the ability to deallocate all objects (which were allocated by the MemPool) at once by releasing the main pool. The disadvantages depend directly on how the programmer uses it: a fast memory pool system is not safe and has some restrictions that I will describe later.

There are many ready-made memory pool libraries that you can search for, download, and use in your applications. Recently I wrote several types of MemPool for various applications. One of them is more general purpose, so in this simple article I will try to briefly explain the concept of my MemPool structure.

The most important pair of methods that we want from a MemPool are Alloc and Free. The first takes a block of memory from the pool, the second returns it to the pool, and the system manages every block in this exchange.
To manage the pool, we use an object called a “Chunk” that contains some information about the memory block requested by the user. The other object is a simple list; we use this list to store and restore the pointers of the empty chunks in the pool. In this system, every chunk has three parts and is created inside the pool itself, so we don't need the conventional new/delete operators to create them. The first part of the chunk holds a little info about the type of the chunk, its size, and links. The middle part is the memory block handed out to the user, and the end section is a pointer back to the head of the chunk.

Chunk Structure
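
The exact fields of the chunk are not listed in the post, so the following layout is an assumption for illustration, chosen to stay consistent with the Free and merge snippets below:

#include <cstddef>

enum ChunkType { chEmpty, chUsed };

//  first part of the chunk: a small header with type and size (and, in the real code, links)
struct CChunk
{
    ChunkType type;   // empty or used
    size_t    size;   // size of the user data block that follows the header
};

//  last part of the chunk: an address block pointing back to the chunk's header
struct SChunkPTR
{
    CChunk* chunk;
};
typedef SChunkPTR* PChunkPTR;

//  layout in the pool:  [ CChunk header | user data (size bytes) | SChunkPTR ]
//  the user data starts right after the header:
//      void* data = (BYTE*)pchunk + sizeof(CChunk);
//  and the trailing pointer sits right after the data:
//      PChunkPTR tail = (PChunkPTR)((BYTE*)pchunk + sizeof(CChunk) + pchunk->size);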

Now let's start playing with this pool and its chunks to manage the system. First, assume that the MemPool class has been created and we have allocated a large enough memory block for the pool. We create the first empty chunk in the pool, set the data size of the chunk to the whole pool size (minus the chunk's own size, because the chunk is carved out of the pool), and finally push this empty chunk onto the list.

memory pool with one empty chunk

Every time the Alloc method is called by a user, we create a new chunk in the pool and return the pointer to the data block in the middle of the chunk. Assume that the first request is received and we want to allocate a new chunk. To do this, we must search the list of empty chunks and find a chunk that has enough space for the requested memory block size. At this time the list contains only one empty chunk. We can change the type of the chunk from “empty” to “used”, pop it from the list, and hand the pointer of the data block out to the user. But its size is larger than the requested size! In this case, we create a new empty chunk from the remaining size and push it onto the list of empty chunks. Now we have one used chunk and one empty chunk. Notice that only the empty chunk is in the list!

memory pool with one allocated chunk and one empty chunk
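
For completeness, here is a sketch of what Alloc could look like with this layout. It is only an outline consistent with the Free and merge snippets in this post; the traversal of emptylist is hypothetical and the real implementation surely differs in its details.

void* CMemPool::Alloc(size_t sizeInBytes)
{
    //  find an empty chunk whose data block is big enough for the request
    //  (hypothetical iteration; the real list/search strategy may differ)
    CChunk* pchunk = NULL;
    for (CChunk* ch = emptylist.first(); ch; ch = emptylist.next(ch))
    {
        if (ch->size >= sizeInBytes) { pchunk = ch; break; }
    }
    if (!pchunk) return NULL;       //  no block in the pool is large enough

    //  take it out of the empty list and mark it as used
    emptylist.pop(pchunk);
    pchunk->type = chUsed;

    //  if the remainder is big enough to hold another chunk, split it off
    const size_t overhead  = sizeof(CChunk) + ChunkAddressSize;
    const size_t remainder = pchunk->size - sizeInBytes;
    if (remainder > overhead)
    {
        PBYTE data = (PBYTE)pchunk + sizeof(CChunk);

        //  carve a new empty chunk right after the requested bytes
        CChunk* rest = (CChunk*)(data + sizeInBytes + ChunkAddressSize);
        rest->type = chEmpty;
        rest->size = remainder - overhead;
        emptylist.push(rest);

        //  shrink the used chunk and write its trailing pointer back to its header
        pchunk->size = sizeInBytes;
        PChunkPTR tail = (PChunkPTR)(data + pchunk->size);
        tail->chunk = pchunk;

        //  the old trailing pointer now belongs to the new empty chunk
        PChunkPTR resttail = (PChunkPTR)((PBYTE)rest + sizeof(CChunk) + rest->size);
        resttail->chunk = rest;
    }

    //  hand the middle (data) part of the chunk to the user
    return (PBYTE)pchunk + sizeof(CChunk);
}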

Some new requests are received by calling the Alloc method, and we serve them just like above. Notice that we still have only one empty chunk in the list. As you can see in these images, allocated chunks are shown in orange and empty chunks in white. At this step, we have 3 allocated chunks and one empty chunk.

memory pool with 3 allocated chunks and one empty chunk

Now the user wants to free memory block number 2. The input parameter is the pointer to the data block of a chunk, so to find the chunk object we should step back by the size of the chunk header in bytes. Something like this:

void CMemPool::Free(void* p)
{
    PBYTE address = (PBYTE)p - sizeof(CChunk);
    CChunk* pchunk = (CChunk*) address;
    //  ...
    //  some code to merge neighboring empty chunks (shown below)
    //  ...
    pchunk->type = chEmpty;
    emptylist.push(pchunk);
}

After finding the chunk, we change its type from “used” to “empty” and push it back onto the list. Now we have 2 used chunks and 2 empty chunks. As mentioned above, there is no need to keep used chunks in any list; we only list the empty chunks.

memory pool with 2 allocated chunks and 2 empty chunks

In this important step, the user wants to free memory block 3. As you can see in the picture above, if we simply free chunk 3 we end up with 3 consecutive empty chunks, and the pool becomes fragmented. What if this behavior continues? Then we have a lot of separate empty chunks with various small sizes, and we cannot allocate any new chunk larger than each of them. The solution to this problem is merging empty chunks: when the user frees a memory block and we find the specified chunk, we should look at the chunk's neighbors and check whether they are empty. If they are, we merge them and update the list of unused chunks in the MemPool.

First, to find the right neighbor we jump to the end of the current chunk; if that neighbor is empty, we merge it into the current chunk. Merging the two chunks is easy: increase the data size of the current chunk and update the trailing address block of the neighbor chunk. Then pop the empty neighbor chunk from the list and push the current (now empty) chunk onto the list.

//  look at neighbor chunk in right side of the current chunk
CChunk* rch = (CChunk*)((PBYTE)p + pchunk->size + ChunkAddressSize);
if (rch->type == chEmpty)
{
    //  change the address at the end of the chunk
    PChunkPTR prch = (PChunkPTR)((PBYTE)rch + sizeof(CChunk)+ rch->size);
    prch->chunk = pchunk;

    //  exclude the right chunk from the list
    emptylist.pop(rch);
    pchunk->size += rch->size + sizeof(CChunk) + ChunkAddressSize;
}

In the next step, to find the neighbor chunk on the left side, we just read the address block sitting right behind the current chunk. This address guides us to the left neighbor chunk, and merging these chunks is as simple as the previous step.
Now look at chunk 3: by freeing this chunk, both the left and right neighbors are merged, and only one empty chunk remains in the list of empty chunks.
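
The left-side merge is not shown in the post; a sketch consistent with the snippets above could look like this (same assumed names, and the exact order of operations inside a real Free routine may differ):

//  look at the neighbor chunk on the left side of the current chunk:
//  the address block sitting just behind our header points to its header
//  (a real implementation must also check that we are not at the very start of the pool)
PChunkPTR plch = (PChunkPTR)((PBYTE)pchunk - ChunkAddressSize);
CChunk* lch = plch->chunk;
if (lch->type == chEmpty)
{
    //  grow the left chunk over the current chunk (header + data + trailing address)
    lch->size += sizeof(CChunk) + pchunk->size + ChunkAddressSize;

    //  the trailing address at the end of the merged block must point to the left chunk
    PChunkPTR tail = (PChunkPTR)((PBYTE)lch + sizeof(CChunk) + lch->size);
    tail->chunk = lch;

    //  the left chunk is already in the empty list, so continue working with it
    pchunk = lch;
}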

memory pool with one allocated chunk and one empty chunk

This is the simple concept of a memory pool system, and it is certainly different from OS memory management. Using a MemPool has some risks and restrictions. For instance, assume the user allocates a memory block from the pool and wants to use it for a string: if the user traverses the characters of the string and writes data beyond the block's size, the chunk structure of the pool breaks down and may cause a system failure. Also, adding features to the MemPool such as reallocation methods or a protection system for chunks slows the system down.

There are some optimizations that we can apply to this system. For example, we could use a map to store, restore, and search for suitable empty chunks in the Alloc function. And when we talk about squeezing out milliseconds, attention to how we write our code is important; for example, reducing temporary objects and local variables in functions, and so on.

NOTE:
You are not forced to use a general MemPool in your project! You can make a faster MemPool for each part of your application according to its needs. For instance, I wrote a fast hybrid MemPool that is even faster, but the number of allocations for this hybrid MemPool is limited and must be defined at creation time.

Software Occlusion Culling

Software Occlusion Culling (SOC) is a part of the 3D rendering pipeline. In this technique, objects that are hidden (behind a wall, for example) are removed from the pipeline and are not drawn. This approach increases performance and creates a smoother experience for gamers.

What’s SeganSOC?
SeganSOC (Segan Software Occlusion Culling) is a free tool for easily implementing an occlusion culling system in any type of game engine. Its most important properties are that it is easy to use and independent of the other parts of the engine, which gives the programmer high flexibility when working on the engine. But it sits here in the Tech Museum, because it's written in Delphi, and I may try to write a new one in C++ soon with more optimization!

SeganSOC features:

  • Independent software mesh with serialization functions.
  • A simplifier function implemented using a new method to reduce the number of triangles.
  • A triangle cache to avoid drawing hidden triangles that lie behind big triangles.

For more information and downloads click here …

Script machine

What is a script machine? Why do we use one? Which script machine is the best? There are hundreds of questions about script machines.

A script machine can be a part of a computer program or game engine. It can load and execute external code after the program has been compiled; this code can be a series of commands or C-like source. It is beneficial because we can change parts of the application simply by changing a script file, even after the program is compiled and published. We can drive the gameplay in a game engine by writing script code, without recompilation. Additionally, script commands can be simpler than a programming language and can speed up development.

Most script machines include a lexer, parser, syntax tree, semantic checker, command or bytecode generator, and a virtual machine. You can use ready-made lexer and parser libraries like BISON or COCO, or any other, or write your own lexer and parser (just like me 😉). After that, you parse the source code and create two important components for the next step. The first component is the symbol table and the second is the syntax tree. The symbol table works like a container that holds every symbol used in your script: all constant and variable strings, numbers, functions, and classes are stored in this container. With the syntax tree you can store and represent the script's structure; this tree also helps you control command order and priority, which is very important in math calculations. Don't worry about these steps, because the ready-made lexer and parser libraries will help you create these components 🙂

In the next step, it is usual to generate a list of ByteCodes from the script. These ByteCodes are very similar to assembly language and can be saved in a file to be used later by a virtual machine, which simulates a CPU and translates them. The advantage of using ByteCodes is that you have more control at generation time: it is easier to implement OOP and other features of modern languages, and you can also apply common optimizations to the generated ByteCodes, etc. Here you can find more about script engines.

The other way of compiling and running script code is to generate a list of pointers to functions instead of ByteCodes. In this approach, you create a list of commands in the compile step and just call the command functions at run time; each command contains a pointer to a function and its arguments. Some programmers who build their own script machine like to use “pointer to function” in their design, and some of them believe that using “pointer to function” instead of “ByteCode” can produce the fastest script, even as fast as C++ !!!
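
To make the comparison concrete, here is a minimal sketch of the "pointer to function" style (illustrative names only, not taken from any particular script machine): the compiler emits a flat list of commands, and running the script is just walking that list and calling each function pointer.

#include <cstdio>
#include <vector>

struct Context { double stack[64]; int top; };   // tiny execution context

struct Command
{
    void (*fn)(Context&, double arg);   // the operation to perform
    double arg;                         // one immediate argument, for simplicity
};

static void cmd_push (Context& c, double a) { c.stack[c.top++] = a; }
static void cmd_add  (Context& c, double)   { c.top--; c.stack[c.top - 1] += c.stack[c.top]; }
static void cmd_print(Context& c, double)   { std::printf("%g\n", c.stack[c.top - 1]); }

// the "virtual machine": no decoding, just indirect calls
void run(const std::vector<Command>& program, Context& ctx)
{
    for (size_t i = 0; i < program.size(); ++i)
        program[i].fn(ctx, program[i].arg);
}

int main()
{
    // the "compiled" form of:  print(2 + 3)
    std::vector<Command> program;
    Command c1 = { cmd_push,  2.0 }; program.push_back(c1);
    Command c2 = { cmd_push,  3.0 }; program.push_back(c2);
    Command c3 = { cmd_add,   0.0 }; program.push_back(c3);
    Command c4 = { cmd_print, 0.0 }; program.push_back(c4);

    Context ctx = { { 0 }, 0 };
    run(program, ctx);   // prints 5
    return 0;
}

Note that even here the work still goes through indirect calls on data read from a list, which is exactly where the argument about CPU caches below comes in.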

Well, I can't accept that, because the most important difference between compiled machine code and script commands lies in CPU features such as caches, instruction-level optimization, etc. The CPU caches are responsible for bringing functions and data close to the CPU and reducing access time significantly. Using any list of commands or a virtual machine can keep modern CPUs from fully exploiting these features.