Order of parameter evaluation, some pitfalls

Introduction

In the C++ world, it’s a good idea to create multiple build configurations for your projects to try out different compilers. Especially when your project is open-source, you never know which compiler (version) people will use in the build process, so being conservative with compilers might lead to bug reports which are difficult to close. Not being open to bugs occurring with other compilers can even lead to memory leaks not being noticed. In this post, I’m going to elaborate on this, using as an example a bug I ran into the day before.

The bug

Let me first describe the scenario in which the bug occurred. The game fruitcut, which I’m currently working on (shameless plug detected!), loads md3 model files into C++ structures and uploads these structures to the graphics card for rendering. The md3 loader always worked fine in gcc (multiple versions) and Microsoft Visual Studio 2010 (MSVC2010), but it had one peculiar bug in all versions of clang, even the latest trunk.

Wrong texture coordinates on a melon (in-game screenshot)

As you can see on the left, the mesh looks alright, but the texture coordinates are clearly wrong. Since this bug hasn’t been fixed in 7 months (due to not caring much about clang, to be honest), I decided to investigate. I quickly found out that the texture coordinates’ axes are flipped in clang, so instead of (x,y) we somehow get (y,x) on that compiler.

What a strange bug! This “flipping”, however, can basically only occur in two places: 1. when reading the file and storing the texture coordinates in the internal vector structure, or 2. when the loaded model is uploaded to the GPU. I decided to look at the code for step 1 first. A single (x,y) texture coordinate pair is defined as follows:

class md3_texpos
{
private:
  vector<float,2> coordinates_;
public:
  md3_texpos(
    std::istream &_istream)
  :
    coordinates_(
      md3_read_scalar<float>(
        _istream),
      md3_read_scalar<float>(
        _istream))
  {
    std::swap(
      coordinates_.x(),
      coordinates_.y());
  }

  vector<float,2> coordinates() const
  {
    return coordinates_;
  }
};

The coordinate consists of a two-dimensional vector and, on construction, gets an input stream to read its data from (whether or not this is good object-oriented style is incidental). The function md3_read_scalar<T> reads raw bytes and converts them to a numeric type T, taking care of endianness conversion. This is important because md3 stores values in little endian, while the host machine might be big endian.

Now, although I actually am the author of this code (thanks, git-blame!), I was still puzzled by the call to std::swap in there. There was no comment, so I didn’t know what it was supposed to do, and it contradicted the md3 specification. So the first thing I did was remove it and recompile with clang and gcc. Voila, the texture coordinates were now broken in gcc and fixed in clang. The reason was now clear to me: C++ leaves certain things unspecified, and in this case it is the order in which function arguments are evaluated.

It’s actually pretty simple. Consider the following code fragment, which has three function calls in it:

f(g(),h());

The question is: In which order do the three functions get called? Clearly, f has to be called last, because its arguments have to be initialized before the call. The C++11 standard enforces this in 1.9 15:

When calling a function (whether or not the function is inline), every value computation and side effect associated with any argument expression, or with the postfix expression designating the called function, is sequenced before execution of every expression or statement in the body of the called function.

However, both g → h → f and h → g → f are possible evaluation orders which satisfy that requirement. What does the Standard make of that? Well, for reasons such as performance optimizations by reordering, the order is simply left unspecified, see 5.2.2 4:

When a function is called, each parameter (8.3.5) shall be initialized (8.5, 12.8, 12.1) with its corresponding argument. [ Note: Such initializations are indeterminately sequenced with respect to each other (1.9) — end note ]

Thus, the implementation (i.e. the compiler) has two options for translating the constructor initialization list of md3_texpos, shown in pseudocode below:

md3_texpos(                                md3_texpos(
  std::istream &_istream)                     std::istream &_istream)
{                                           {
  float x =                                   float y =
    md3_read_scalar<float>(                     md3_read_scalar<float>(
      _istream);                                  _istream);
  float y =                                   float x =
    md3_read_scalar<float>(                     md3_read_scalar<float>(
      _istream);                                  _istream);
  coordinates_ =                              coordinates_ =
    vector<float,2>(x,y);                       vector<float,2>(x,y);
}                                           }

Of course, md3_read_scalar is not a pure function (see my post on this). It modifies the input stream, advancing it by 4 bytes. This makes the implementation on the right-hand side behave incorrectly, although it’s valid for the compiler to generate such code. The mysterious std::swap in the original code can now be explained. Both gcc and MSVC2010 chose the right-hand implementation, which necessitates a swap; clang chose the left-hand one (at least in this case).
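The effect is easy to reproduce in isolation. The following is a minimal sketch (the names call_log and run are made up for illustration): g and h record when they run, so the log shows which of the two permitted orders the compiler picked. Which one you get is up to the compiler, so no particular output should be expected.

```cpp
#include <string>

// Records the order in which the functions below are called.
std::string call_log;

int g() { call_log += 'g'; return 1; }
int h() { call_log += 'h'; return 2; }
int f(int a, int b) { call_log += 'f'; return a + b; }

// Calls f(g(),h()) with a fresh log. Afterwards, call_log is either
// "ghf" or "hgf", depending on the compiler's choice.
int run()
{
  call_log.clear();
  return f(g(), h());
}
```

The return value is the same in both cases; only the side effects (here, the log) reveal the order.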

To fix the bug, I rewrote the class so we store the vector’s components separately:

class md3_texpos
{
private:
  float x_,y_;
public:
  md3_texpos(
    std::istream &_istream)
  :
    x_(
      md3_read_scalar<float>(
        _istream)),
    y_(
      md3_read_scalar<float>(
        _istream))
  {
  }

  vector<float,2> coordinates() const
  {
    return vector<float,2>(x_,y_);
  }
};

This fixes the issue and thanks to encapsulation, the API doesn’t change!
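This works because member initializers run in the order in which the members are declared, not in an order of the compiler's choosing. If the vector member had to stay, another option would be to read into named locals first, since separate statements are always executed in source order. Here is a small sketch of that idea; read_scalar is a hypothetical stand-in for md3_read_scalar that parses text instead of raw bytes, and std::pair stands in for vector<float,2>:

```cpp
#include <istream>
#include <sstream>
#include <utility>

// Hypothetical stand-in for md3_read_scalar: reads one
// whitespace-separated value and advances the stream.
template<typename T>
T read_scalar(std::istream &stream)
{
  T value = T();
  stream >> value;
  return value;
}

// Two separate statements: x is always read before y.
std::pair<float,float> read_texpos(std::istream &stream)
{
  float const x = read_scalar<float>(stream);
  float const y = read_scalar<float>(stream);
  return std::make_pair(x, y);
}

// Convenience wrapper so the function can be tried with a string.
std::pair<float,float> read_texpos_from(std::string const &text)
{
  std::istringstream stream(text);
  return read_texpos(stream);
}
```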

Memory leaks

The bug described above isn’t a critical one; it didn’t make the application crash or cause other serious failures. Being careless in combination with the new operator, however, can easily result in memory leaks. Consider the following code snippet:

// Function declaration:
void f(int,std::shared_ptr<T>);

// Function call
f(g(),new T());

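As I’ve described above, this translates either to the first or to the second fragment below. (I say “translates” loosely: std::shared_ptr’s constructor from a raw pointer is explicit, so in real code the conversion has to be spelled out, as the fragments do.) Either we get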

int u = g();
T *t = new T();
f(u,std::shared_ptr<T>(t));

or to

T *t = new T();
int u = g();
f(u,std::shared_ptr<T>(t));

As an exception-safe programmer, you might have already spotted the problem: The second implementation leaks memory if g() throws an exception, because nobody deletes t! And as you can see, using smart pointers like std::shared_ptr<T> does not automatically fix the leak, because the raw pointer is converted to a shared pointer after the throw statement in g.

There are two ways to solve this problem. The first is to immediately convert the raw pointer to a smart pointer:

f(g(),std::shared_ptr<T>(new T()));

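This way (from C++17 on, where the evaluation of one argument may no longer be interleaved with that of another), either…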

  1. g() is executed
  2. g() throws
  3. new is never called

…or…

  1. a new T is constructed on the heap
  2. the raw pointer to the new T instance is stored inside a std::shared_ptr<T>.
  3. g() is executed
  4. g() throws
  5. The destructor of the std::shared_ptr<T> causes a delete t;
  6. No leak occurs

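Before C++17, however, the compiler was still allowed to run new T(), then g(), and only then the std::shared_ptr<T> constructor, so the first solution can still leak with older standards. The second solution works in every standard version: std::make_shared<T>, a variadic function that returns a std::shared_ptr<T> and performs the allocation inside its own call: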

f(g(),std::make_shared<T>());
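To convince yourself that nothing leaks, you can count live objects. The following is a sketch under made-up names (tracked, call_and_catch): tracked counts its instances, g always throws, and after the exception has been caught the counter should be back at zero, whichever argument was evaluated first.

```cpp
#include <memory>
#include <stdexcept>

// Counts how many instances are currently alive.
struct tracked
{
  static int alive;
  tracked() { ++alive; }
  ~tracked() { --alive; }
};

int tracked::alive = 0;

// Always fails, to simulate an exception during argument evaluation.
int g()
{
  throw std::runtime_error("g failed");
}

void f(int, std::shared_ptr<tracked>) {}

// Returns true if the exception from g() reached us.
bool call_and_catch()
{
  try
  {
    f(g(), std::make_shared<tracked>());
  }
  catch(std::runtime_error const &)
  {
    return true;
  }
  return false;
}
```

If make_shared happens to run first, the temporary shared pointer is destroyed during stack unwinding and deletes the object; if g runs first, the object is never created.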

Pure functions in C/C++

Introduction

Functions that only depend on their parameters and not on any global state are usually called pure functions. An example of a non-pure function is this:

int counter = 0;

int return_global_counter(int a)
{
  return counter++;
}

This function is a corner case. It doesn’t depend on the parameters at all and it returns a different value each time it is called. Functions returning void are another example of non-pure functions.

There are lots of examples of pure functions. For example, mathematical functions such as abs, pow and the operator+ for integers are pure. In fact, I’m telling the story backwards here. Pure functions are supposed to model mathematical functions, which are inherently pure (which “mathematical global state” should they depend on, anyway).

The question is, how are pure functions represented in C/C++? The canonical answer is: They aren’t. There’s no keyword to denote or enforce the purity of a function, so from the Standard’s point of view, this article ends here. However, almost all major compilers (except MSVC, apparently) support extensions that allow one to mark a function as pure.

In fact, there are two “purity attributes” that you can add: const and pure.

pure and const

I’ll just quote the gcc manual to explain what “const” means:

Many functions do not examine any values except their arguments, and have no effects except the return value. Note that a function that has pointer arguments and examines the data pointed to must not be declared const. Likewise, a function that calls a non-const function usually must not be const. It does not make sense for a const function to return void.

As you can see, this models purity as explained above, with some sensible constraints (for example, you cannot “break out of” purity by calling non-pure functions). The pointer constraint, however, is very restrictive, especially in C++, because references seem to be treated as pointers in this context, too. I couldn’t find any good information on that, though.

The second attribute is pure and it means the following:

Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables.

This is a less strict form of const. We’re allowed to read global variables, but not write to them.

You can attach these attributes in the following form to a function:

int output(int x) __attribute__((const));

int output(int x)
{
  return x+1;
}

int output2(int x) __attribute__((pure));

int output2(int x)
{
  return x+1;
}

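Note that gcc rejects an attribute written in this trailing position (after the parameter list) on a function definition, which is why each function gets a separate declaration carrying the attribute.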
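To make the difference between the two attributes concrete, here is a small sketch for gcc and clang (other compilers may not understand the attribute syntax). The names scale_factor, scaled and square are made up for this example. scaled reads a global variable, so it may only be pure; square looks at nothing but its argument, so it may be const.

```cpp
// Global state that scaled() reads but never writes.
int scale_factor = 2;

// pure: no side effects, but the result may depend on globals.
int scaled(int x) __attribute__((pure));

int scaled(int x)
{
  return x * scale_factor;
}

// const: no side effects, result depends on the arguments only.
int square(int x) __attribute__((const));

int square(int x)
{
  return x * x;
}
```

Because scaled is only declared pure, the compiler has to assume its result can change after a write to a global variable, so calling it again after modifying scale_factor yields the new value.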

Implications

In theory

Determining which functions are pure/const is a vital part of the optimization process every modern compiler goes through. It is used, for example, in simplifying the loop:

for(container::iterator it = container_.begin(); it != container_.end(); ++it)
{
  ...
}

For most containers, the end member function is pure. It just returns a pointer to one-past the last element. This information allows the compiler to optimize away all but one call to the end function:

for(container::iterator it = container_.begin(),end = container_.end(); it != end; ++it)
{
  ...
}

This, of course, assumes that the loop body doesn’t do anything “nasty” like clearing the container.

The problem with this optimization is that it doesn’t scale to multiple translation units. Let’s assume we have this loop:

for(int i = 0; i < fibonacci(100); ++i)
{
  ...
}

Since fibonacci(100) is a constant value, the compiler would like to optimize it this way:

for(int i = 0,end = fibonacci(100); i < end; ++i)
{
  ...
}

Thus saving many unnecessary calculations. However, let’s say fibonacci is defined in a different translation unit (in a separate math.cpp file maybe). Then the compiler just cannot optimize it. It doesn’t have the information available to infer that this function is const. Bummer.

This is where the function attribute kicks in. Declaring the function like this:

int fibonacci(int) __attribute__((const));

allows the compiler to optimize the loop.

In practice

The question remains: Do the compilers give a crap? Let’s see what happens in a simple example:

#include <iostream>

int output() __attribute__((const));

int output()
{
    std::cout << "lol | ";
    return 1;
}

int main()
{
    output();
}

Compiling this with g++ -Wall test.cpp results in the following output:

test.cpp:14:11: warning: statement has no effect [-Wunused-value]

How did that happen? Well, pure and const functions just read data, they do not write to it. Also, “by contract”, they do not modify the environment (print values and such things). So calling a pure/const function and not using the result simply doesn’t make any sense, which is what gcc (and clang) is trying to tell us here.

That also explains that when we run the program, nothing is printed. Even without optimization enabled, the whole function call gets eliminated (again, by both clang and gcc). Apparently, this type of dead code is removed in a very early stage of analysis.

Let’s modify the example so the function cannot be completely eliminated:

#include <iostream>

int output() __attribute__((const));

int output()
{
    std::cout << "lol | ";
    return 1;
}

int main()
{
    std::cout << output() << " | " << output();
}

What happens here is very interesting. Let’s go through all the optimization/nonoptimization and clang/gcc cases:

        Output with -O0       Output with -O3
gcc     lol | lol | 1 | 1     lol | 1 | 1
clang   lol | 1 | lol | 1     1 | 1

Apparently, gcc doesn’t remove the cout statement in output no matter how hard you tell it to optimize. clang removes the statement when at least -O1 is enabled. gcc also seems to reorder the code a bit so the cout statements are evaluated first and then the values in main are printed. This is, of course, permitted since declaring the function const and then “breaking the contract” implies undefined behavior.

Note that the compilers do not issue a warning about the illegal cout statement in the const function. Purity isn’t checked or enforced, unfortunately.

Random notes

Note that gcc-4.6.3 has the warning flags -Wsuggest-attribute=pure and -Wsuggest-attribute=const, which use the optimizer’s analysis to point out candidates for purity/constness. This seems to work, but it “misses” a lot of functions. For example, trivial getter functions such as the getI function below are candidates for purity (not constness, though):

class testclass
{
private:
  int i;
public:
  int getI() const { return i; }
};

This doesn’t get suggested by gcc, though.

Be cautious when using pure and const with template classes. If you have a container template, for example, you’re better off making no assumptions on the container’s value_type. Maybe the user puts a class in it and expects certain copy semantics, for example. These would be nondeterministic with pure and const attributes.

Also, be cautious with const and classes. The simple getter function above is a candidate for pure, but not for const, as this example shows:

class testclass
{
private:
  int i;
public:
  testclass() : i(0) {}
  void increase() { i++; }
  int getI() const __attribute__((const));
};

int testclass::getI() const { return i; }

int main()
{
  testclass c;
  std::cout << c.getI();
  c.increase();
  std::cout << c.getI();
}

This will output “00” instead of the correct “01”. However, returning a reference might make the getter a candidate for const:

class testclass
{
private:
  int i;
public:
  testclass() : i(0) {}
  void increase() { i++; }
  int const &getI() const __attribute__((const));
};

int const &testclass::getI() const { return i; }

int main()
{
  testclass c;
  std::cout << c.getI();
  c.increase();
  std::cout << c.getI();
}

This will output “01”, as the reference does not change because of the call to increase.

Another minor thing: Not all mathematical functions from math.h are const, or even pure. sqrt, for example, sets errno on an invalid input, and thus modifies global state.


Assigning OpenCL parameters by name

In this article, I’ll explain to you a few ways of implementing “smart” parameter assignment to OpenCL kernels. I’ll not explain in detail what OpenCL is, just enough to give you the problem context.

OpenCL is a framework for implementing parallel algorithms. Currently, the only language you can implement these algorithms in is “OpenCL-C”, which is C99 with some additions (such as vector types) and restrictions (recursion is not allowed). So you’ve got your “host-side code” that runs on the CPU and is written in C, C++, Java or whatever, and your OpenCL-C programs that do the computing work. The OpenCL programs can run on the CPU, the GPU or some other parallel system.

In your OpenCL-C code, you define special functions called kernels that are callable from the outside. Kernels have no return value, but they otherwise act just like normal C functions. They take parameters and can call other normal C functions.

Let’s say we’ve got the following OpenCL kernel (global and kernel are new keywords):

kernel void add_buffers(
  global float const *source1,
  global float const *source2,
  global float *destination)
{
  destination[get_global_id(0)] =
    source1[get_global_id(0)] +
    source2[get_global_id(0)];
}

The kernel takes two float buffers, adds them together and stores the result in a third buffer. We save this program to a file called myfile.cl.

In the C++ Code, we do something like this to start an OpenCL calculation (pseudocode):

// Create the three buffers. You would, of course, fill them
// with meaningful values. I'll skip that step, however.
cl::buffer first_buffer = cl::create_buffer(256 * sizeof(float));
cl::buffer second_buffer = cl::create_buffer(256 * sizeof(float));
cl::buffer third_buffer = cl::create_buffer(256 * sizeof(float));

// Load program from file and compile it
cl::program p = load_program_from_file("myfile.cl");
p.compile();

// Load a specific kernel from the program (kernels are separate objects!)
// You'll get an error if there is no kernel called "add_buffers"
cl::kernel foo = p.create_kernel("add_buffers");

// Set the kernel arguments to our pre-allocated buffers
foo.argument(0,first_buffer);
foo.argument(1,second_buffer);
foo.argument(2,third_buffer);

// Create 256 instances of the program (one for each array element)
foo.start(256);

I hope the pseudocode is readable enough. The actual OpenCL-API is too verbose to be written down here.

The main problem with the code is the parameter passing to the kernel:

foo.argument(0,first_buffer);
foo.argument(1,second_buffer);
foo.argument(2,third_buffer);

As you can see, you have to specify the arguments’ positions, not the names, which is extremely error-prone. Since you often copy&paste the parameter passing, you might end up with:

foo.argument(0,first_buffer);
foo.argument(1,second_buffer);
foo.argument(1,third_buffer);

This compiles and runs just fine, it’ll just do The Wrong Thing.

Another thing to worry about is type-safety. If you accidentally assign a parameter of the wrong type, you either get a slightly cryptic error message (which is fine) or the code, again, compiles and does something strange.

To mitigate the first problem – the positional parameters – a few solutions exist:

  1. OpenCL-1.2 includes a function to get the name of a parameter from the index, see clGetKernelArgInfo. However, virtually no vendor ships with 1.2 yet, so that’s not a real solution.
  2. Use an index to assign parameters:

    int param = 0;
    foo.argument(param++,first_buffer);
    foo.argument(param++,second_buffer);
    foo.argument(param++,third_buffer);
    

    maybe even put all of this inside a macro so you don’t forget to increment the index. Many people do it that way. It has obvious drawbacks, though.

  3. Parse the OpenCL-C code yourself and extract the parameters. OpenCL-C is “just” C99, so it’s not as hard to parse as, say, C++. But it’s still a huge amount of work, just to parse the function headers.
  4. Decorate the OpenCL-C code with easily-parseable macros.

I have implemented the last solution and am quite happy with it so far. The trick is to define macros in your OpenCL-C code (yes, OpenCL-C is preprocessed just like C99), like this:

#define KERNEL_NAME(name) name
#define KERNEL_ARGUMENT(name) name

Then, you write your kernel like this:

#define KERNEL_NAME(name) name
#define KERNEL_ARGUMENT(name) name

kernel void KERNEL_NAME(add_buffers)(
  global float const *KERNEL_ARGUMENT(source1),
  global float const *KERNEL_ARGUMENT(source2),
  global float *KERNEL_ARGUMENT(destination))
{
  destination[get_global_id(0)] =
    source1[get_global_id(0)] +
    source2[get_global_id(0)];
}

In your C++ code, you load and build your program as we saw above, no change here. The macros will expand to the parameter given, so they don’t disturb the parsing process.

Then, we read in the cl file and do some parsing. We extract the first kernel name by searching for KERNEL_NAME in the program. We create an empty vector of strings for the upcoming kernel parameters. We search for occurrences of KERNEL_ARGUMENT until we find the next KERNEL_NAME. Every argument we find we push to the end of the vector. For the next kernel name, we do the same.
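The scan described above can be sketched as follows. This is not the code I actually use, just a minimal illustration under a few assumptions: the names kernel_map, macro_argument and parse_kernels are made up, each line contains at most one macro use (as in the kernel above), and preprocessor lines are skipped so that the #define lines themselves aren't mistaken for a kernel called "name".

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Maps each kernel name to its argument names, in declaration order.
typedef std::map<std::string, std::vector<std::string> > kernel_map;

// Returns the text between "macro(" and the next ")" in line,
// or an empty string if the macro doesn't occur.
std::string macro_argument(std::string const &line, std::string const &macro)
{
  std::size_t const begin = line.find(macro + "(");
  if(begin == std::string::npos)
    return "";
  std::size_t const name_begin = begin + macro.size() + 1;
  std::size_t const name_end = line.find(')', name_begin);
  if(name_end == std::string::npos)
    return "";
  return line.substr(name_begin, name_end - name_begin);
}

kernel_map parse_kernels(std::string const &source)
{
  kernel_map result;
  std::string current_kernel;
  std::istringstream stream(source);
  std::string line;
  while(std::getline(stream, line))
  {
    // Skip preprocessor lines such as the two #defines.
    std::size_t const first = line.find_first_not_of(" \t");
    if(first != std::string::npos && line[first] == '#')
      continue;

    std::string const kernel = macro_argument(line, "KERNEL_NAME");
    if(!kernel.empty())
    {
      current_kernel = kernel;
      result[current_kernel]; // start with an empty argument list
    }

    std::string const argument = macro_argument(line, "KERNEL_ARGUMENT");
    if(!argument.empty() && !current_kernel.empty())
      result[current_kernel].push_back(argument);
  }
  return result;
}
```

An argument's position in the vector is its index, so nothing has to be numbered by hand.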

Using this algorithm, you get a mapping from the kernel names to their arguments, and the arguments automatically get indices. We can now write:

foo.argument("source1",first_buffer);
foo.argument("source2",second_buffer);
foo.argument("destination",third_buffer);

Inside the kernel::argument function, we check if the given argument exists and retrieve its index. Voila. 🙂
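The lookup itself is tiny. Here is a sketch, assuming the argument names of one kernel are stored in a vector in declaration order; the function name argument_index is made up, and throwing an exception for unknown names is just one possible way to report the error:

```cpp
#include <algorithm>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Returns the position of name in names, which is the index to pass
// on to the positional OpenCL call. Throws if the name is unknown.
std::size_t argument_index(
  std::vector<std::string> const &names,
  std::string const &name)
{
  std::vector<std::string>::const_iterator const it =
    std::find(names.begin(), names.end(), name);
  if(it == names.end())
    throw std::invalid_argument("unknown kernel argument: " + name);
  return static_cast<std::size_t>(it - names.begin());
}
```

A misspelled argument name now fails loudly at runtime instead of silently overwriting another parameter.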

Furthermore, we could extend the KERNEL_ARGUMENT macro to include not only the name, but the type:

#define KERNEL_ARGUMENT(type,name) type name

// Example usage
KERNEL_ARGUMENT(global float2 *,foo)

And then we could do type checking, too. This is, of course, a lot more complicated.


Some main facts

Introduction

Every C or C++ programmer knows about the main function. It’s probably the first function every programmer sees. Usually, main is introduced in the following form:

int main(int argc,char *argv[])
{
}

The main function is the application’s “designated start”, meaning that’s basically where the program starts its execution. Apart from that, it’s a very old function and has some very special properties, which I’m going to address here.

Random facts

Simple facts

First of all, main is not necessarily the first function that’s executed in your code. Consider the following simple example:

#include <iostream>
// Include ostream for operator<<, since we're so pedantic 🙂
#include <ostream>

namespace
{
class foo
{
public:
  foo()
  {
    std::cout << "Mother of god, where's main?\n";
  }
};

foo f;
}

int main(int argc,char *argv[])
{
  std::cout << "I'm here!\n";
}

This will output:

Mother of god, where's main?
I'm here!

This must happen since global objects have to be initialized before main starts (as a side note, the name argv stands for “argument vector”, and “vector” stands for “array”).

Additionally, you don’t have to have a main function in your program. If you’re designing a library, there’s no need for a starting point. If you’re compiling and linking an executable, however, you’ll get linker errors if main is missing.

What we definitely know

First of all, what do the parameters to main really mean? If you’ve known C/C++ long enough, you might have developed a certain hesitation when it comes to reality vs. formal definitions. For example, one might expect float and double values to be encoded in the standard IEEE 754 floating point encoding, and that’s certainly true for most systems. But the standard will tell you nothing of the sort. Instead, floating point values are completely implementation-defined. They could be stored as random bit sequences, the Standard doesn’t care, it’s still valid.

Fortunately, when it comes to main, everything’s pretty much as you expect it to be: The argc parameter denotes the number of arguments passed to the program and argv denotes the parameters themselves. It was a little surprising to me that even argv[0] is well-defined. It’s either empty or it’s the name used to invoke the program. Of course, even if argv[0] is not empty, it’s not a name you can rely on. For example, it doesn’t necessarily contain an absolute path to the executable or even the name of the executable file on disk.

The following code is also correct:

#include <iostream>

int main(int argc,char *argv[])
{
  // argv[argc] is a null pointer, so stop at the first null entry
  while(*argv)
  {
    char *current_parameter = *argv;
    std::cout << current_parameter << "\n";
    argv++;
  }
}

The program will print the command line arguments it’s given. This works because it is guaranteed that argv[argc] is NULL.

main’s return code is mostly implementation-defined. There are two values that designate a “successful” program termination: 0 and EXIT_SUCCESS. The latter is defined in cstdlib (or stdlib.h if you’re a C guy). There’s one return value that designates unsuccessful termination: EXIT_FAILURE (also defined in cstdlib or stdlib.h).

Last but not least, there are two main prototypes that every implementation must accept:

int main();
int main(int,char*[]);

There might be other correct prototypes of main (meaning they’re called main and return an int, but they take other parameters), but they’re not portable. main cannot be overloaded, so you have to define exactly one main function in your program.

Note that technically, the following declarations are also correct:

main();
main(int,char*[]);
main(void);

That is simply because in old C (C89), a function declared without a return type returns an int by default; C99 removed this rule. This is ill-formed in C++, however. The notation main(void); is a shorthand for “takes no parameters” in C (originally, main() meant the same as main(...), so “take arbitrarily many parameters”).

Historical oddities

As I said, main is a very old concept. It has been in C since its inception. Because of that and because of backwards compatibility, main has some idiosyncrasies. For example: Why is the argc parameter a signed integer? Can it be negative? Are there even valid use cases where this makes sense? The answer is no. The standard says it cannot be negative. It is an int, however, for historical reasons. You see, main was there before there even was unsigned. It’s like going back to prehistoric times!

Another thing is the argv parameter. Since main is C, its type is char*[] (or char**) and not a nice std::vector<std::string> which we’d all appreciate much more. However, a little stranger to me is the fact that it’s not a constant pointer. Again you may ask yourselves, is there a use case where we might change the parameters passed to the program? Is that even defined? This time, the answer is yes: the C standard explicitly requires argc, argv and the strings argv points to to be modifiable by the program. Still, the historical reason the pointer isn’t const is that main simply is older than const.

It’s a lack of the holy const-correctness, and that might even bother you for a non-ideological reason: you cannot convert the argv pointer to a const char ** without a const_cast! And, evil as I am, I leave it up to you to figure out why. Anyhow, that’s why the main.m files of most of the Cocoa Objective-C programs look like this:

int main(int argc, char *argv[])
{
  return NSApplicationMain(argc, (const char **) argv);
}

main and return are also a strange couple. In some really bad programming books, you stumble upon the following main definition:

void main()
{
}

This is wrong but it might compile on some lenient compilers. What drives people to write it that way, you might ask. I think one of the reasons is that the following program is valid C/C++:

int main()
{
}

Y U NO RETURN ANYTHING?!

You might think that this is undefined behavior, and it is! … in general. With main however, it’s fine. It’s equivalent to:

int main()
{
  return 0;
}

Another historical oddity.

Other mains

As I said above, you don’t really need a main. Libraries, for example, are main-less almost by definition. But there’s another unfortunate use case for not having a standard main as described above, and that’s when you’re building a Windows application.

On Windows, you have two choices when creating an application (actually, when linking an application). You can create either a console application or a graphical application. You might think that the difference lies in what you can do with your program. Do fancy GUI stuff versus boring console text output. But that’s not really true. You can create a console application that creates windows as usual. In fact, a console application creates at least one window: A terminal window showing the output of stdout and stderr. For developers, a handy tool. For end users, “watching shit scroll down their screen” is usually not very healthy. To suppress the console window, create a graphical application.

And here’s the catch: A graphical application doesn’t have a main as defined above. It’s not even a non-portable main taking parameters other than the familiar (argc,argv) pair. It’s a whole new beast, and its name is WinMain. If you haven’t got one in your graphical application, you’ll get a linker error.

Its prototype looks intimidating:

#include <windows.h>

int CALLBACK WinMain(
  HINSTANCE hInstance,
  HINSTANCE hPrevInstance,
  LPSTR lpCmdLine,
  int nCmdShow
);

A lot of cryptic macros in there, not really self-explanatory. At least it returns an integer! I’ll not explain it in intimate detail, just the main points: the first two parameters, hInstance and hPrevInstance, are not that important. If you need hInstance, you can get it from a function instead of having it passed in here. hPrevInstance is always NULL.

The next parameter is the equivalent of argc and argv. Let’s read it together: LPSTR: A long pointer to a string. Just one string, you ask? Yes, it’s not neatly packaged into parts, it’s one string. It’s not even a UTF-16 string (everything else is on Windows).

The last parameter is a hint to the application stating how it is to be shown when it starts. This parameter might indicate that the application starts minimized, for example. I know of no way to get to this value using a function, so you might want to consider passing this along to your window creation subsystem.

The biggest annoyance with this, however, is the very first line of the code above: windows.h. This header is so extremely crappy, I have almost no words for it. It’s full of macros that have very common names. For example, it defines min and max macros, so code using these words will break (for example, std::min(a,b)). It also defines near, so your near viewing planes might get replaced by random crap.
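Two well-known workarounds exist for the min/max problem: define NOMINMAX before including windows.h, which keeps the header from defining the two macros, or put the function name in parentheses, which prevents a function-like macro from being expanded. The second trick can be shown without windows.h at all; in the following sketch, a hand-written min macro stands in for the one from the header, and the function name smaller is made up:

```cpp
#include <algorithm>

// Stand-in for the min macro that windows.h defines.
#define min(a,b) (((a) < (b)) ? (a) : (b))

int smaller(int a, int b)
{
  // std::min(a, b) would be mangled by the macro here. With the
  // parentheses, "min" is not directly followed by "(", so the
  // preprocessor leaves it alone and the real std::min is called.
  return (std::min)(a, b);
}

// Don't let the stand-in macro leak into code that follows.
#undef min
```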

Isolating WinMain

So my advice: Isolate the code that uses windows.h very cleanly from the rest (if you’re not going to use the WinAPI, that is). I’ll give a few hints on how to do that effectively.

There should be one file designated just for the main function. This file should use main on POSIX systems like Linux and Mac OSX, and WinMain on Windows. We want to forward the main function to our own application starting function and pass along all the information we get in main. On POSIX systems, that’s just argc,argv and on Windows, we have the nCmdShow parameter. What we get is the following:

#ifdef _WINDOWS
#include <windows.h>
#include <stdlib.h>
#endif

int the_real_main(int,char *[],int);

#ifdef _WINDOWS
int CALLBACK WinMain(
  HINSTANCE hInstance,
  HINSTANCE hPrevInstance,
  LPSTR lpCmdLine,
  int nCmdShow
)
{
  int argc = __argc;
  char **argv = __argv;
#else
int main(int argc,char *argv[])
{
  int nCmdShow = -1;
#endif
  return the_real_main(argc,argv,nCmdShow);
}

As you can see, we get our argc,argv back from the Windows runtime using the internal variables __argc,__argv which are defined in stdlib.h. Also, on POSIX systems, we (randomly) assign -1 to nCmdShow. The window system (whatever you use) should only access that variable when we’re on Windows, in which case it’s the parameter you receive in WinMain. Now, somewhere in your program, you have your the_real_main function that’s almost completely agnostic of what operating system we’re on.

Almost, you say?

Well, yes. We still need some special behavior for Windows (not just the nCmdShow stuff). On Windows, we cannot just return from main as usual. We have to signal that we’re quitting the application with an exit code and then wait for the quit message, containing the real exit code. Oh dear. Also we might get the quit message before the program is done. Of course, in that case, we also return the message’s exit code parameter.

To signal our application exit, we can explicitly post Windows’ application quit message, WM_QUIT using the function PostQuitMessage. This function gets the exit code as its parameter. After sending the message, your message dispatching loop should wait for the WM_QUIT message and then break.

A note about Qt

To a Qt programmer, all this WinMain stuff sounds unnecessary. Almost all Qt examples just use a main function and everything still works as expected (so you don’t have a console window open all the time). I was curious about how they achieve that and did a little digging. It turns out that if you specify to build your application as a graphical application and you don’t create a WinMain, you will get a linker error unless you link a special library to your program. This suggests that this special library contains the WinMain function, which calls the user’s main function. If that is the case, then the Qt guys rely on undefined behavior, since even forward-declaring main is prohibited, let alone calling it from the outside. Feel free to correct me if I missed something here.

chrono and sge timers

Abstract

This (relatively short) article will explain to you the concepts of chrono, the “time library” of the latest C++ standard. It will also explain how timers in sge work.

Motivation

chrono is defined in the latest standard, but until you decide to use C++0x instead of the good olde C++03, you need to use an alternative implementation (luckily, implementing chrono doesn’t require other C++0x features – it stands on its own).

There are two implementations of chrono: fcppt::chrono and boost::chrono. The latter, however, requires boost-1.47.0, so we’re going to use fcppt::chrono. Both are virtually identical, anyway, since they both implement the same standard.

We want to “manage time”. This means that we don’t care about the current day of the month or how many days there are until Christmas. Things are simpler and more abstract in chrono. We’ve got three “concepts” to learn:

  1. Clocks
  2. Time points
  3. Durations

Let’s start with clocks. A clock is something that you can ask for the current time. But how do you define “current time” in abstract terms? Surely we won’t get back a string “07/30/2011 14:58:58”, since we want to do calculations with the return type.

So instead, you get a number which tells you how many “time units” have elapsed since the clock’s “epoch”. If you’ve heard about the Unix Timestamp, you’ll immediately feel at home. This particular timestamp measures the seconds (or sometimes milliseconds/microseconds, depending on the book you read) since 01/01/1970 0:00. That’s a number you can do calculations with! For example, taking two Unix Timestamps and subtracting them gives you the difference in seconds between two time points. Using that, we could define:

typedef
long
time_point;

typedef
long
duration;

class unix_clock
{
public:
  time_point now() const
  {
    return current_unix_timestamp();
  }
};

This is, of course, very rudimentary. It would be better if time_point and duration were classes with overloaded operators so we can write things like:

unix_clock clock;
time_point current = clock.now();
duration difference = clock.now() - current;

Here, difference would hold how much time has passed between the two calls to now() (probably “0” for most clocks).
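To make the “classes with overloaded operators” idea concrete, here is a minimal sketch. All names in it are made up for illustration, and manual_clock is a fake clock that is advanced by hand so the example doesn’t depend on any operating system call. The real chrono types are templates and offer many more operators.

```cpp
#include <cassert>

namespace timestamp_sketch
{
// A span of time, counted in the clock's ticks.
class duration
{
public:
  explicit duration(long ticks) : ticks_(ticks) {}
  long count() const { return ticks_; }
private:
  long ticks_;
};

// A position in time, stored as ticks since the clock's epoch.
class time_point
{
public:
  explicit time_point(long since_epoch) : since_epoch_(since_epoch) {}
  long time_since_epoch() const { return since_epoch_; }
private:
  long since_epoch_;
};

// time point - time point = duration
inline duration operator-(time_point const &a, time_point const &b)
{
  return duration(a.time_since_epoch() - b.time_since_epoch());
}

// time point + duration = time point
inline time_point operator+(time_point const &t, duration const &d)
{
  return time_point(t.time_since_epoch() + d.count());
}

// Hypothetical clock that only moves when advance() is called.
class manual_clock
{
public:
  manual_clock() : ticks_(0) {}
  void advance(long ticks)
  {
    assert(ticks >= 0); // this clock never runs backwards
    ticks_ += ticks;
  }
  time_point now() const { return time_point(ticks_); }
private:
  long ticks_;
};
}
```

Since the two types are now distinct, nonsense like adding two time points no longer compiles, which a plain long would happily allow.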

Compile time fractions, wtf?

Apart from the Unix Timestamp, there are other clocks imaginable. For example, a clock could define its “epoch” as the system start and measure the seconds since then as a float. Or we could have a clock with its epoch at 0 a.d., measuring in days. And so on.

All of these clocks have different native durations (seconds in integers, seconds in float, days in integers, …). We cannot, however, convert between these different durations yet, since we have no specification on how they relate to each other. They’re just numbers as of yet.

That’s why in chrono, a duration has another piece of information included in its type: A fraction “1/m” telling you how the duration relates to “1 second”. For example, a duration

chrono::duration
<
  int,
  ratio::object<1,1000>
>

tells you that the integer stored in it represents “milliseconds”. As you can see, these fractions are compile time properties, just as the duration of a clock is a compile time property.
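The same idea can be tried out with std::chrono, which I’m using here as a stand-in for fcppt::chrono (std::ratio takes the place of ratio::object). The typedef names and the helper to_milliseconds are made up for this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <limits>
#include <ratio>

namespace ratio_sketch
{
// An integer count of milliseconds: one tick is 1/1000 of a second.
typedef std::chrono::duration<int, std::ratio<1, 1000> > milliseconds;
// An integer count of minutes: one tick is 60/1 seconds.
typedef std::chrono::duration<int, std::ratio<60, 1> > minutes;

// Both fractions are part of the types, so the conversion factor
// (60 * 1000) is known to the compiler and nothing is looked up at run time.
inline milliseconds to_milliseconds(minutes m)
{
  // Guard against overflowing the int representation.
  assert(m.count() <= std::numeric_limits<int>::max() / 60000);
  assert(m.count() >= std::numeric_limits<int>::min() / 60000);
  return std::chrono::duration_cast<milliseconds>(m);
}
}
```

Calling to_milliseconds(minutes(2)) yields a duration whose count() is 120000.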

Getting more concrete

Clock API

Let’s look at the definition of a clock in fcppt::chrono:

class clock
{
public:
  typedef implementation_defined rep;

  // m and n are implementation defined
  typedef ratio::object<m,n> period;

  typedef 
  chrono::duration
  <
    rep,
    period
  > 
  duration;

  typedef 
  chrono::time_point
  <
    clock
  > 
  time_point;

  static bool const is_steady = false;

  static time_point
  now();
};

Let’s go through it one by one.

The first typedef, rep, is the numeric type of the clock (that’s the type calculations are done with). This is usually something pretty large, like a 64 bit integer, so the clock doesn’t “wrap around” too quickly. But as with durations, this can be a floating point type, too!

The second typedef, period, is the fraction I talked about in the previous section.

Then we have durations and time points. Time points just know about the clock type.

The boolean is_steady tells you if the clock guarantees “steadiness”. This means that for two time points t1 and t2 returned by the now function, where t1 was obtained before t2, t1 <= t2 must hold. Additionally, the time between clock ticks must be constant.
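To see that this really is the whole contract, here is a sketch of a user-defined clock that fills in every member from the listing above. It is written against std::chrono as a stand-in for fcppt::chrono. manual_clock and its advance function are made up, and advance is not part of the clock interface.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>
#include <ratio>

namespace clock_sketch
{
// Hypothetical clock counting whole milliseconds, advanced by hand.
class manual_clock
{
public:
  typedef std::int64_t rep;
  typedef std::ratio<1, 1000> period;
  typedef std::chrono::duration<rep, period> duration;
  typedef std::chrono::time_point<manual_clock, duration> time_point;

  // now() never goes backwards and every tick is exactly one
  // millisecond, so this clock may call itself steady.
  static bool const is_steady = true;

  static time_point now()
  {
    return time_point(duration(ticks()));
  }

  // Extra function, only needed because nothing else drives this clock.
  static void advance(duration d)
  {
    assert(d.count() >= 0); // going backwards would break steadiness
    ticks() += d.count();
  }
private:
  static rep &ticks()
  {
    static rep value = 0;
    return value;
  }
};
}
```

Subtracting two results of now() gives a manual_clock::duration, and comparing them with <= works out of the box, because time_point supplies those operators for any clock type.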

Duration API

The duration type looks like this:

template
<
  typename Rep,
  typename Period
>
class duration
{
public:
  typedef Rep rep;
  typedef Period period;

  /// Default constructs a duration with an indeterminate value
  duration();

  /// Constructs a duration from a compatible internal representation
  /**
   * For example seconds(10) will result in a duration representing 10 seconds
  */
  template<typename Rep2>
  explicit duration(
    Rep2 const &);

  /// Constructs a duration from another compatible duration and converts if necessary.
  template<typename Rep2,typename Period2>
  duration(
    duration<Rep2,Period2> const &);

  /// Returns the internal representation
  rep
  count() const;

  duration
  operator+() const;

  duration
  operator-() const;

  duration &
  operator++();

  duration
  operator++(int);

  duration &
  operator--();

  duration
  operator--(int);

  duration &
  operator+=(
    duration const &);

  duration &
  operator-=(
    duration const &);

  duration &
  operator*=(
    rep const &);

  duration &
  operator/=(
    rep const &);

  duration &
  operator%=(
    rep const &);

  duration &
  operator%=(
    duration const &);

  /// The duration with a zero value
  static duration
  zero();

  /// The duration with the minimum value
  static duration
  min();

  /// The duration with the maximum value
  static duration
  max();
private:
  rep rep_;
};

As you can see, you can get the numeric value from the duration and do calculations with durations like subtracting them from another. These calculations, however, can only be performed with durations of the same type. Stuff like…

fcppt::chrono::seconds myseconds(1);
myseconds -= fcppt::chrono::milliseconds(10);

is not possible. However, there are binary variants of these operators so you can subtract/add/… two arbitrary durations:

result_duration_type result = 
  fcppt::chrono::seconds(1) - fcppt::chrono::milliseconds(10);

But what’s result_duration_type, you ask? Well, in general, that’s not so simple. The result will be a duration with at least millisecond resolution, so no information is lost. If you want to convert the result to a coarser duration, you have to use duration_cast. So, put simply, something sane will be produced and you cannot shoot yourself in the foot that easily.
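In std::chrono (used here as a stand-in, I haven’t checked how fcppt spells this), the result type of the binary operators is the std::common_type of the two durations. The sketch below asks the compiler for that type and checks its resolution. The function names remaining and to_whole_seconds are made up.

```cpp
#include <cassert>
#include <chrono>
#include <ratio>
#include <type_traits>

namespace common_type_sketch
{
typedef std::chrono::seconds seconds;
typedef std::chrono::milliseconds milliseconds;

// The type the binary operators pick for seconds and milliseconds.
typedef std::common_type<seconds, milliseconds>::type result_duration_type;

// The finer of the two periods wins, so nothing is lost.
static_assert(
  std::ratio_equal<result_duration_type::period, std::milli>::value,
  "seconds combined with milliseconds has millisecond resolution");

inline result_duration_type remaining(seconds budget, milliseconds used)
{
  assert(used <= budget); // mixed-unit comparisons work the same way
  return budget - used;
}

// Going back to a coarser unit truncates, so it has to be asked for.
inline seconds to_whole_seconds(result_duration_type d)
{
  return std::chrono::duration_cast<seconds>(d);
}
}
```

With a budget of one second and ten milliseconds used, remaining returns 990 ticks, and casting that back to seconds truncates it to zero.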

Time point API

template
<
  typename Clock,
  typename Duration
>   
class time_point
{
public:
  typedef Clock clock;
  typedef Duration duration;
  typedef typename duration::rep rep;
  typedef typename duration::period period;
  
  time_point();
  
  /// Constructs a time_point from a duration.
  /**
   * This duration is interpreted as if it were obtained from time_since_epoch().
  */
  explicit time_point(
    duration const &);

  /// Constructs a time_point from a compatible time_point.
  /**
   * This may convert if necessary.
  */
  template<typename Duration2>
  time_point(
    time_point<clock, Duration2> const &);

  /// Returns the duration from the beginning of the Clock to this time_point
  duration
  time_since_epoch() const;

  time_point &
  operator +=(
    duration const &);

  time_point &
  operator -=(
    duration const &);

  /// The minimal time_point
  static time_point
  min();

  /// The maximal time_point
  static time_point
  max();
private:
  duration d_;
};

As you can see, a time point is just a wrapper around a duration, really. But I wanted to show you its API, anyway. As with durations, there are binary operators for time points: subtracting two time points results in a duration (in the clock’s duration type), and adding a duration to a time point results in a new time point.

Example: Limiting your game’s frame rate

To show the expressivity of chrono, I’m going to show you how to limit your game’s frame rate so that it doesn’t exceed a certain number of frames per second (yes, we should be using vertical sync and a blocking swap to do that. Let’s pretend those mechanisms don’t work).

We have to decide on a clock to use. chrono provides a few predefined clocks, among them high_resolution_clock. This clock might not be steady (see above), but since we’re dealing with small time deltas here, a high-resolution clock seems like a wise choice (its internal duration ratio is nanoseconds).

Let me show you the code and let you figure out for yourselves how it works (this is an exercise, not laziness on my side ;))

#include <fcppt/chrono/high_resolution_clock.hpp>
#include <fcppt/chrono/duration_cast.hpp>
#include <fcppt/chrono/duration.hpp>
#include <fcppt/chrono/duration_arithmetic.hpp>
#include <fcppt/chrono/time_point.hpp>
#include <fcppt/chrono/time_point_arithmetic.hpp>
#include <fcppt/chrono/seconds.hpp>
#include <fcppt/time/sleep_duration.hpp>
#include <fcppt/time/sleep.hpp>

typedef fcppt::chrono::high_resolution_clock clock_type;

typedef clock_type::rep fps_type;

fps_type desired_frames_per_second = 
	60;

// Divide by a rep, get a new duration
clock_type::duration minimum_frame_length(
	fcppt::chrono::duration_cast<clock_type::duration>(
		fcppt::chrono::seconds(1)) / desired_frames_per_second);
	
clock_type::time_point const before_frame = 
	clock_type::now();

render_stuff();

// Subtract two time points, get a clock duration!
clock_type::duration const diff = 
	clock_type::now() - before_frame;

// If the frame was over too quickly, compensate
if(diff < minimum_frame_length)
{
	// Subtract two durations
	fcppt::time::sleep(
		fcppt::chrono::duration_cast<fcppt::time::sleep_duration>(
			minimum_frame_length - diff));
}
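If you want to check your solution to the exercise, here is the same logic packed into functions, as a sketch on top of std::chrono instead of fcppt::chrono. Two things differ from the listing above and are my assumptions: it uses std::chrono::steady_clock, so the two calls to now() cannot go backwards, and it sleeps with std::this_thread::sleep_for. All function names are made up.

```cpp
#include <cassert>
#include <chrono>
#include <thread>

namespace frame_sketch
{
typedef std::chrono::steady_clock clock_type;

// The shortest time a frame may take at the given frame rate.
inline clock_type::duration minimum_frame_length(
  clock_type::rep frames_per_second)
{
  assert(frames_per_second > 0);
  // Divide a duration by a rep, get a new duration
  return std::chrono::duration_cast<clock_type::duration>(
    std::chrono::seconds(1)) / frames_per_second;
}

// How long to sleep after a frame that took frame_time. Zero if the
// frame already took long enough.
inline clock_type::duration time_to_sleep(
  clock_type::duration frame_time,
  clock_type::rep frames_per_second)
{
  clock_type::duration const minimum =
    minimum_frame_length(frames_per_second);
  return frame_time < minimum
    ? minimum - frame_time
    : clock_type::duration::zero();
}

// Runs one frame and sleeps away whatever time is left over.
template<typename RenderFunction>
void run_limited_frame(
  RenderFunction render,
  clock_type::rep frames_per_second)
{
  clock_type::time_point const before_frame = clock_type::now();
  render();
  std::this_thread::sleep_for(
    time_to_sleep(clock_type::now() - before_frame, frames_per_second));
}
}
```

Keeping time_to_sleep free of any clock access is a deliberate choice: the arithmetic can then be checked with fixed durations, without actually waiting.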

sge::timer

In your application (your game, for example), you may want to time things. Say you have a bonus system in your game and you want to let the bonus expire after 10 seconds while the player sees the seconds running down. You can do that with chrono, the structures are there…but it’s not very comfortable. You’d have to store a time point bonus_started yourself, along with a duration bonus_duration and watch if the timer has expired.

In sge, you can define a sge::timer::basic<Clock> which gets a clock as template parameter. Again, in your application, you have to decide which type of clock you want to use. A timer has the following properties:

  1. An interval determining how long it lasts. Internally, this is stored in the clock’s duration type. The interface, however, doesn’t care about the specific type of the duration. You can pass in and retrieve it as any duration type and it will be implicitly converted.
  2. A starting time point, pretty self-explanatory. The only way this member can be modified is to call reset() on the timer.
  3. An activation state which is simply a bool.
  4. An expiration flag which determines if the timer has expired. This can be set explicitly, but is otherwise determined by looking at the interval, the starting time point and the activation state.

Timers can be active or inactive, and they can be expired or not expired. This might be a bit confusing, so let me clarify: A timer is expired, if…

  1. you’ve explicitly set it to be expired (via timer.expired(true))
  2. it’s inactive
  3. the specified interval has elapsed since the last call to reset() or since the timer’s construction

A timer is active if you’ve set it to be active. It’s inactive if you say timer.active(false);. Note that the active flag will not be set to true when you call reset(). The expired flag, however, will be set to false.

The timer interface looks like this:

template<typename Clock>
class basic
{
FCPPT_NONCOPYABLE(
	basic);
public:
	typedef
	Clock
	clock_type;

	typedef
	timer::parameters<clock_type>
	parameters;

	typedef typename
	clock_type::time_point
	time_point;

	typedef typename
	clock_type::duration
	duration;

	explicit basic(
		parameters const &);

	bool expired() const;
	void expired(bool);

	bool active() const;
	void active(bool); 

	template<typename NewDuration>
	NewDuration const interval() const;

	template<typename NewDuration>
	void interval(NewDuration const &);

	time_point const now() const;

	time_point const last_time() const;

	void reset();
private:
	duration interval_;
	bool active_;
	bool expired_;
	time_point last_time_;
};
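To make the expiration rules from above concrete, here is a stripped-down sketch of such a timer. It is not sge’s implementation: basic_timer and manual_clock are made-up names, the constructor takes the interval directly instead of a parameters object, and the duration conversions are left out. It is built on std::chrono so it compiles on its own.

```cpp
#include <cassert>
#include <chrono>

namespace timer_sketch
{
// Hypothetical hand-driven clock, so nobody has to wait for real time.
struct manual_clock
{
  typedef std::chrono::milliseconds duration;
  typedef std::chrono::time_point<manual_clock, duration> time_point;

  static time_point &current()
  {
    static time_point value;
    return value;
  }
  static time_point now() { return current(); }
};

template<typename Clock>
class basic_timer
{
public:
  explicit basic_timer(typename Clock::duration interval)
  : interval_(interval),
    active_(true),
    expired_(false),
    last_time_(Clock::now())
  {
    assert(interval >= Clock::duration::zero());
  }

  bool expired() const
  {
    return
      expired_ ||                             // 1. set explicitly
      !active_ ||                             // 2. inactive
      Clock::now() - last_time_ >= interval_; // 3. interval has elapsed
  }
  void expired(bool e) { expired_ = e; }

  bool active() const { return active_; }
  void active(bool a) { active_ = a; }

  // Restarts the interval and clears the expired flag. The active flag
  // is deliberately left alone.
  void reset()
  {
    last_time_ = Clock::now();
    expired_ = false;
  }
private:
  typename Clock::duration interval_;
  bool active_;
  bool expired_;
  typename Clock::time_point last_time_;
};
}
```

Note that an inactive timer stays expired even after reset(), which is exactly the slightly confusing case described above.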

Since we don’t want to specify all of the 4 properties of a timer when initializing it, there is a parameters class which you give to the constructor. Again, I think the timer is best explained with a simple example: Let’s stall the user for, say, 10 seconds and output a nice progress bar while they are waiting:

#include <sge/timer/basic.hpp>
#include <sge/timer/elapsed_fractional.hpp>
#include <fcppt/chrono/seconds.hpp>
#include <fcppt/chrono/high_resolution_clock.hpp>
#include <iostream>

typedef
sge::timer::basic<fcppt::chrono::high_resolution_clock>
timer_type;

timer_type wait_timer(
	timer_type::parameters(
		fcppt::chrono::seconds(10))
		// Unnecessary, just here for exposition
		.active(true)
		.expired(false));

// Let's reserve 60 characters for the progress bar.
unsigned const progress_bar_width = 60;

while(!wait_timer.expired())
{
	// Retrieve the elapsed time as a floating point value in [0,1]
	double const fraction = 
		sge::timer::elapsed_fractional<double>(wait_timer);

	unsigned const elapsed_time = 
		static_cast<unsigned>(fraction * progress_bar_width);
	
	// \r to rewind to the start of the line
	std::cout << '\r' << '|';
	// Draw the elapsed_time. We could also use one of std::string's constructors
	// for this.
	for(unsigned i = 0; i < elapsed_time; ++i)
		std::cout << '-';
	// Draw the remaining time
	for(unsigned i = 0; i < (progress_bar_width - elapsed_time); ++i)
		std::cout << ' ';
	// Flush so the bar shows up right away (std::cout is buffered)
	std::cout << '|' << std::flush;
}
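The comment in the loop hints at std::string’s constructors. Here is how that variant could look, pulled out into a function so the drawing is separate from the timer. render_progress_bar is a made-up name, and the fraction is assumed to be in [0,1] already.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace progress_sketch
{
// Renders one line of the progress bar for a fraction in [0,1].
inline std::string render_progress_bar(double fraction, std::size_t width)
{
  assert(fraction >= 0.0 && fraction <= 1.0);
  std::size_t const filled =
    static_cast<std::size_t>(fraction * static_cast<double>(width));
  // std::string(n, c) builds a string of n copies of c, which replaces
  // both for loops.
  return
    "\r|" +
    std::string(filled, '-') +
    std::string(width - filled, ' ') +
    "|";
}
}
```

Inside the loop you would then write std::cout << render_progress_bar(fraction, progress_bar_width) << std::flush; and nothing else.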

More clocks

There’s one more thing sge::timer provides, and that’s additional clocks. chrono defines three clocks:

  1. high_resolution_clock we already saw
  2. steady_clock which is guaranteed to be steady
  3. system_clock which provides functions to interact with C’s time api (convert to and from time_t values)

All of these clocks are stateless. They are classes, but they contain no data members. That’s why they have a static member function now(). Now, unfortunately, they can be instantiated:

fcppt::chrono::high_resolution_clock my_highres_clock;
fcppt::chrono::high_resolution_clock::time_point my_now = my_highres_clock.now();

but this is just a bad design decision in the Standard.

Despite that, stateful clocks can be useful, too. For example, sge provides sge::timer::clocks::adjustable<Clock>, which takes a stateless clock and modifies its behavior. adjustable has a data member factor, which determines how fast time moves forward. If this factor is 1.0, time moves at normal speed (relative to Clock). Now, in a game, you could define

typedef
sge::timer::clocks::adjustable<fcppt::chrono::steady_clock>
ingame_clock;

ingame_clock game_clock;

and give this game_clock to all your entities (the player, the bullets etc.). Then to enter pause mode, all you have to do is call

game_clock.factor(
	0.0f);

and the game stops, since all the timers use this clock. Similarly, if you just want to slow time down (yay, Bullet Time!), you could set the time factor to 0.5.

Remember, though, that you have to pass stateful clocks to the timers explicitly. E.g.:

sge::timer::basic<ingame_clock> my_timer(
	sge::timer::parameters<ingame_clock>(
		game_clock,
		fcppt::chrono::seconds(1)));

Also, you have to call game_clock.update() to make time progress.
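Why the update() call is needed becomes clearer when you look at how such a clock could be built. The following is a sketch of the idea, not sge’s code: adjustable_clock and manual_clock are made-up names, and it sits on std::chrono so that it compiles without sge.

```cpp
#include <cassert>
#include <chrono>

namespace adjustable_sketch
{
// Hypothetical hand-driven base clock, so the example runs instantly.
struct manual_clock
{
  typedef std::chrono::milliseconds duration;
  typedef std::chrono::time_point<manual_clock, duration> time_point;

  static time_point &current()
  {
    static time_point value;
    return value;
  }
  static time_point now() { return current(); }
};

// A stateful clock that scales the time reported by Clock.
template<typename Clock>
class adjustable_clock
{
public:
  typedef typename Clock::duration duration;
  typedef std::chrono::time_point<adjustable_clock, duration> time_point;

  adjustable_clock()
  : factor_(1.0f), last_base_time_(Clock::now()), current_()
  {
  }

  float factor() const { return factor_; }
  void factor(float f)
  {
    assert(f >= 0.0f); // time may stand still, but not run backwards
    factor_ = f;
  }

  // Adds the base clock's time since the last update, scaled by the
  // factor. Without this call, now() keeps returning the same value.
  void update()
  {
    typename Clock::time_point const base_now = Clock::now();
    std::chrono::duration<double, typename duration::period> const scaled =
      (base_now - last_base_time_) * static_cast<double>(factor_);
    current_ += std::chrono::duration_cast<duration>(scaled);
    last_base_time_ = base_now;
  }

  // Non-static on purpose: the result depends on this object's state.
  time_point now() const { return current_; }
private:
  float factor_;
  typename Clock::time_point last_base_time_;
  time_point current_;
};
}
```

The scaled time is accumulated step by step instead of being computed as factor times total time, so changing the factor only affects time that passes afterwards.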

Conclusion

That’s all I have to say for now about chrono, durations, clocks, and timers. I hope you see how well the standard guys thought about the time API, and why defining it with templates is a necessary evil in this situation. If you haven’t done much C++ yet, consider chrono one of the “good” C++ APIs.

Managing configuration stuff with json – a complete solution?

In this article, I’m assuming you know what the json file format is (and, of course, have some knowledge of the C++ programming language).

Motivation

Most applications don’t run out of the box. They need to be configured, first. Think of a web server, for example. It often needs to know which port it’s supposed to listen on, or where it should store files it wants to cache. There are a few standard methods available for configuring an application:

  • Command line parameters, which are passed to the program by the operating system as arguments to main (and are usually called “argc” and “argv”, for “argument count” and “argument vector”). You would use those variables to temporarily change the program’s behavior, not permanently.
  • Configuration files, which, on Unix-like systems, are stored either globally inside the /etc directory, or locally, usually below the home directory (think $HOME/.application_name/config_file). Most applications feature both a global and a local configuration file.
  • System-specific configuration databases, one of the most prominent ones being the Windows registry.
  • Preprocessor flags which are specified when compiling the program. They’re usually permanent after compiling/installing the program and cannot be overridden.

In this article, I will describe how to use json to create applications which are configurable via command line, global configuration file and local configuration file. All at once, and with the same syntax.

sge’s json module

Before I can dig into how json is used for configuration, I should explain how we represent json in C++. We will be using sge’s json module, which provides a json parser for files, streams and strings. It uses spirit to do the parsing and basically implements the context-free grammar structure which is depicted on the json home page. To understand the following, you should have the json home page with its diagrams open while reading the article.

Simple types

The grammar on the json website tells you that a json value can be a string, a boolean, a number, an object, an array or “null”. Now, how do we map these json types to C++ types?

Everything except object, array and value itself can be mapped pretty easily. The json sge module defines the following types/typedefs (they’re all in the sge::parse namespace):

json type                   C++ type
int                         json::int_type (typedef for int)
int with fractional part    json::float_type (typedef for float)
null                        An empty struct called json::null
string                      fcppt::string
boolean                     bool

The value type

Inheritance? …

As you can see, the “value” type should be able to hold exactly one of the mentioned json types (except another value). Take a moment and think about how you would define such a “value” type.

If you’re an object-oriented programmer you might think that you have solved the problem: use inheritance. Define value as an abstract base class and derive all the other types from it!

// Our base class
struct value
{
	// We have to define this so it's really a base class (C++ idiosyncrasy)
	virtual ~value() {}
};

// Derived from base class
struct int_type
:
	value
{
	int number;

	int_type(int _number)
	:
		number(_number)
	{
	}
};

struct array
:
	value
{
	std::vector<value*> values;
};

// ...

This approach works. But it’s not quite the right tool for the job. Let me quickly give you some reasons for this:

To be as comfortable as possible, we would like all the json C++ types to be copyable, so we can pass them around freely. Inheritance – at least in C++ – doesn’t really play well into this. You can already see this in the code above. The array class needs to hold values. But it can’t really hold them by value (no pun intended), meaning we cannot define the “values” member to be a std::vector<value> values; because that would cause slicing:

// Let's collect some values
std::vector<value> values;
// Add an integer (as defined above)
values.push_back(int_type(10));
// Let's see if we can output the integer we just inserted
std::cout << dynamic_cast<int_type &>(values.front()).number << "\n";

The code above will throw a bad_cast. Why? Because the vector holds objects of type value. Pushing a subclass into the vector will “cast” (slice!) it into the base class, losing information.
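If you want to see the slicing for yourself, here is a small self-contained sketch that repeats the two classes from above. It uses the pointer form of dynamic_cast, which returns a null pointer instead of throwing, so the outcome can be checked with a plain bool. The two function names are made up.

```cpp
#include <cassert>
#include <vector>

namespace slicing_sketch
{
struct value
{
  virtual ~value() {}
};

struct int_type : value
{
  int number;
  explicit int_type(int _number) : number(_number) {}
};

// Returns true if the element stored in the vector is still an int_type.
inline bool survives_by_value()
{
  std::vector<value> values;
  values.push_back(int_type(10)); // copies only the value part
  assert(!values.empty());
  return dynamic_cast<int_type *>(&values.front()) != 0;
}

// Storing pointers keeps the dynamic type, but now somebody else has to
// own the object being pointed to.
inline bool survives_by_pointer()
{
  int_type ten(10);
  std::vector<value *> values;
  values.push_back(&ten);
  return dynamic_cast<int_type *>(values.front()) != 0;
}
}
```

survives_by_value() returns false and survives_by_pointer() returns true.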

We could work around this using smart pointers and ptr_containers, but that would be really clumsy and we would need new and delete.

There is another, more idealistic, reason for not choosing inheritance to solve the problem. Inheritance is an option if you do not know how many derived classes of value there will be, and to allow the user to extend the functionality by deriving from our base class. This is not the case with json. It’s clearly defined how many “derived” classes we have, and the user shall not extend the functionality.

…no, variants!

So, what’s our solution? Well, if you’re a functional programmer, you immediately think of an algebraic data type (or variant) for our value type. Don’t be afraid, I do not require you to know what that is, I’ll explain it. You can think of a variant as a base class for a fixed number of derived types. A quick example should explain how you use it and what it can do:

// number holds exactly one of the three types mentioned
boost::variant<int,float,double> number;

// Now number is an "int" (and not a float or double)
number = 10;

// We can extract its value using the "get" function
std::cout << boost::get<int>(number) << "\n";

// This, however, will throw an exception, since we didn't store a double in
// the variant
// std::cout << boost::get<double>(number) << "\n";


number = 10.0;

// Now this works
std::cout << boost::get<double>(number) << "\n";

// We can put variants in a regular container!
std::vector<boost::variant<int,float,double> > numbers;
numbers.push_back(10);
numbers.push_back(20.f);
std::cout 
  << boost::get<int>(numbers[0]) 
  << ", " 
  << boost::get<float>(numbers[1]) 
  << "\n";

Variants are obviously copyable and they don’t use new/delete, so they are an ideal candidate for our value. With that in mind, we can complete our json-to-C++-mapping as follows:

json type C++ type
value
variant
<
	json::int_type,
	json::float_type,
	json::null,
	bool,
	json::object,
	json::array,
	fcppt::string
>
array
struct array 
{ 
	json::element_vector elements; 
};

where json::element_vector is a std::vector<json::value>

object
struct object 
{ 
	json::member_vector members; 
};

where json::member_vector is a std::vector<json::member> and json::member is

struct member 
{ 
	fcppt::string name; 
	json::value value; 
};

(yes, calling both the type and the member variable “value” is possible in C++)
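If variants still feel like magic, it may help to see what such a type does under the hood. The following hand-written class is a rough stand-in for boost::variant<int,float,double>: a tag that remembers which type is stored, plus a union for the storage. It is only a sketch with made-up names. boost::variant is generic, also works with non-trivial types like strings, and offers visitors instead of a hand-written switch.

```cpp
#include <cassert>
#include <stdexcept>
#include <vector>

namespace variant_sketch
{
class number
{
public:
  enum kind { is_int, is_float, is_double };

  // Implicit on purpose, so that "number n = 10;" works like a variant.
  number(int v) : kind_(is_int) { storage_.i = v; }
  number(float v) : kind_(is_float) { storage_.f = v; }
  number(double v) : kind_(is_double) { storage_.d = v; }

  kind which() const { return kind_; }

  // Like boost::get, these throw if a different type is stored.
  int as_int() const { check(is_int); return storage_.i; }
  float as_float() const { check(is_float); return storage_.f; }
  double as_double() const { check(is_double); return storage_.d; }

  // Converts whatever is stored to double.
  double to_double() const
  {
    switch(kind_)
    {
    case is_int: return storage_.i;
    case is_float: return storage_.f;
    case is_double: return storage_.d;
    }
    assert(false); // unreachable: kind_ is always one of the three
    return 0.0;
  }
private:
  void check(kind expected) const
  {
    if(kind_ != expected)
      throw std::runtime_error("number holds a different type");
  }

  kind kind_;
  union { int i; float f; double d; } storage_;
};

// number is an ordinary copyable value, so it can live in a container.
inline double sum(std::vector<number> const &numbers)
{
  double result = 0.0;
  for(std::vector<number>::size_type i = 0; i < numbers.size(); ++i)
    result += numbers[i].to_double();
  return result;
}
}
```

The set of alternatives is closed: adding a fourth type means touching the class itself, which is exactly the property we want for json.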

Reading and writing a json tree

To read a json tree, you can call one of the following functions:

Function Description
json::object 
parse_file_exn(
	filesystem::path)
Parses the json file and throws an exception if it doesn’t exist (exn is for “existing”).
bool 
parse_file(
	filesystem::path,
	json::object &)
Parses the json file and returns true if the parsing succeeded. The result is stored in the object that is passed as second parameter.
template<typename It>
bool 
parse_range(
	It &begin,
	It end,
	json::object &)
Parses the iterator range delimited by [begin,end), returns true if parsing succeeded and sets the begin iterator to the place where the parsing ended (which might not equal end). The result is, again, stored in the object passed as third parameter.
bool 
parse_stream(
	fcppt::io::istream &,
	json::object &)
Parses the stream, returns true if parsing succeeded. The result is, again, stored in the object passed as second parameter. Note that fcppt::io::istream is a stream adapted to fcppt::char_type and can thus be used in conjunction with fcppt::string.

If you have an object and want to write it to a file, there is

fcppt::string
output_tabbed(
	object const &);

which uses tabs to indent the json structure in an eye-friendly way.

Manipulating the json tree

Let’s say you’ve read in the following json object (from a file, for example):

{
	"player-name" : "pimiddy",
	"camera" : 
	{
		"x-angle" : 16.0,
		"position" : [4.0,8.0,15.0]
	}
}

You want to extract the various values from the tree.

  • To extract the player name, you have to iterate through the object’s members and boost::get<fcppt::string> the value of the member “player-name”.
  • To extract the “x-angle” inside “camera”, you have to search the “camera” member, convert it to an object, then search the “x-angle” and convert it to float_type.
  • To extract the “position” and store it inside a std::vector<float>, for example, you have to go to “camera”, then search “position”, then convert the (heterogeneous!) array “position” to your std::vector.

Clearly, this is repetitive and annoying. A helper function find_and_convert_member is needed! This function is used as follows:

// Read our file (just to show you how one of the parse_ functions is used)
json::object const &config_file = 
	json::parse_file_exn(
		FCPPT_TEXT("my_file.json"));

// Note: This is a double, not float_type. find_and_convert_member doesn't
// care, as long as it's a floating point type.
double camera_x_angle = 
	json::find_and_convert_member<double>(
		config_file,
		// Unix filesystem-like syntax here
		json::path(FCPPT_TEXT("camera")) / FCPPT_TEXT("x-angle"));

// Again, this could be a double or long double, too
std::vector<float> camera_position = 
	json::find_and_convert_member<std::vector<float> >(
		config_file,
		json::path(FCPPT_TEXT("camera")) / FCPPT_TEXT("position"));

That’s much more readable and more intuitive code! Note that instead of a dynamic std::vector, we could have used an array<float,3>, too.

json for configuration purposes

Command line

With what we’ve seen so far, we can load a json file into an object and extract values from this object. So the greater idea is to read in a json file and use the json tree as a database inside our application.

But we want to take command line parameters into account, too. So, obviously, we need to modify the json tree that comes out of the parse_file_exn function. This is done with the following function call:

int 
main(
	int argc,
	char *argv[])
{
	json::object const &config_file = 
		json::merge_command_line_parameters(
			json::parse_file_exn(
				FCPPT_TEXT("my_file.json")),
			json::create_command_line_parameters(
				argc,
				argv));
}

There’s a little more to this than you might have expected. What’s this create_command_line_parameters? Aren’t the command line parameters already there, “created” by the operating system? Well, yes, but they’re of type char, not fcppt::char_type, and they’re stored in a clumsy char*[] array! So in the code above, we convert them to a std::vector<fcppt::string>, which is more C++y.

The function merge_command_line_parameters takes an existing json tree (in our case, read directly from a file) and “merges” the tree with directives contained in the command line parameters. The above code allows you to issue the following:

./my_program 'player-name="test"' 'camera/position=[16.0,23.0,42.0]' 'camera/x-angle=18.0'

The syntax should be very clear:

option-path=option-value

where option-path may contain a ‘/’ to indicate “descend into the object”. option-value has to be a valid json type, so it can even be a whole object!

./my_program 'camera={ "position" : [1.0,2.0,3.0], "x-angle" : 32.0 }'
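To give you an idea of what happens to such an argument, here is a sketch of the first step: splitting it into the path components and the value text. This is not sge’s parser. split_option and the option struct are made up, and the value is left as a string instead of being parsed as json.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

namespace option_sketch
{
struct option
{
  std::vector<std::string> path; // e.g. "camera", "x-angle"
  std::string value;             // unparsed json text, e.g. "18.0"
};

// Splits "a/b/c=value" into its path components and the raw value.
inline option split_option(std::string const &argument)
{
  // Only the first '=' separates path and value. The value itself may
  // contain more of them, inside a json string for example.
  std::string::size_type const equals = argument.find('=');
  if(equals == std::string::npos || equals == 0)
    throw std::invalid_argument("expected option-path=option-value");

  option result;
  result.value = argument.substr(equals + 1);

  std::string const path = argument.substr(0, equals);
  std::string::size_type start = 0;
  for(;;)
  {
    std::string::size_type const slash = path.find('/', start);
    // substr with npos as the length takes everything up to the end
    result.path.push_back(
      path.substr(
        start,
        slash == std::string::npos ? slash : slash - start));
    if(slash == std::string::npos)
      break;
    start = slash + 1;
  }
  assert(!result.path.empty());
  return result;
}
}
```

The path tells the merge where to descend in the tree, and the value text would then go through the regular json parser.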

There are two important things to keep in mind when using merge_command_line_parameters:

  1. Your system shell parses the command line first. And while doing that, it might respond to certain special characters. For example, ‘[‘ and ‘]’ have special meaning in the bash shell. Also, quoting stuff with “” gives unexpected results. So my advice is: Always surround your arguments with apostrophes.
  2. merge_command_line_parameters does some type checking when merging your parameters. This means, for example, that
    player-name=10.0

    will throw an exception, since the original type of player-name is string! It doesn’t, however, recursively check object types (yet), so the following is valid:

    camera='{ "lol" : "this works" }'

    since “camera” has type object before and after the merge.

User configuration files

We’ve seen that we can merge an existing tree with a single new value. The next step is to merge two json trees. This situation occurs when you read a global configuration file, but want to incorporate changes from a user configuration file. We can merge two trees with the merge_trees function:

json::object const &config_file = 
	json::merge_trees(
		json::parse_file_exn(
			global_path()/FCPPT_TEXT("my_file.json")),
		json::parse_file_exn(
			local_path()/FCPPT_TEXT("my_local_file.json")));

Now we’ve got a complete configuration system with user config files and command line manipulation, all in json! In the next section, I’ll explain how all of it fits together.

Writing back changes and putting it all together

One thing might bug you: What if I want to manipulate the json tree in my program and write back the changed values to my user configuration file? The json module provides two wrappers for this situation. One is the function modify_user_value:

void
modify_user_value(
	json::object const &global_configuration,
	json::object &user_configuration,
	json::path const &,
	json::value const &new_value);

It takes the global_configuration object to determine if the json path we want to change was defined in the global configuration file. Thus, the global configuration file serves as a “declaration file”, determining which json values exist and can be changed. It also compares the type of the value in global_configuration with the type contained in new_value. If it doesn’t match, you’ll get an exception. This, again, prevents you from accidentally writing 10.0 to the player-name variable, which would result in an error when re-loading the user config file at the next application start.

However, you should consider modify_user_value a low-level function. This is indicated by the fact that the last parameter is a value, not an arbitrary type. This means that you cannot pass a std::vector<int> to this function, you have to convert it into a json type, first (the convert_to function does this for you).

As a higher-level construct, sge provides user_config_variable, which is a template class:

template<typename T>
class user_config_variable
{
public:
	user_config_variable(
		json::object const &global_configuration,
		json::object &local_configuration,
		json::path const &);

	void
	value(
		T const &);

	T const &
	value() const;

	fcppt::signal::auto_connection
	change_callback(
		function<void(T const &)>);

	~user_config_variable();
};

If the fcppt::signal type looks strange to you, consider reading part III of the sge tutorial. I’ll assume that you know what signals are and how they’re used.

So what’s the idea behind user_config_variable? I’m going to answer this question and, in addition, I’m going to explain how to put together all the json functions presented so far to create a json configuration system for an application. The idea behind user_config_variable is illustrated best by the use case that made me write this class:

In the game fruitcut, which I’m currently working on, I’ve got a configuration variable “music volume” which is a floating point value in the range [0,1]. This variable is read from the global configuration file, but it may be changed by the user in the game’s main menu (via a slider). The change should, of course, be permanent, so I need to write the volume back to the user configuration file when the application exits.

The class responsible for music is called music_controller. This class, however, doesn’t contain any json configuration code, since it should be as reusable and abstract as possible (and maybe someone wants to use it in conjunction with his configuration system which isn’t based on json). The class has a method volume to set the volume. The initial volume is given to it in the constructor.

Looking at the class user_config_variable, the following steps have to be done to create a configuration system and to integrate a “writeable” configuration variable:

  1. Read in the global configuration file, store it in global_config_temp.
  2. Read in the user configuration file, store it in user_config.
  3. Merge the global config, the user config and the command line parameters, store the result in global_config. This will be the main database which we query when we need a config value, since it contains all the information the user and the system gave us.
  4. Create a user_config_variable called music_volume (with T = float), pass it global_config, user_config and the correct path.
  5. Create the music_controller, pass music_volume.value() to it. That’s the volume specified in the merged global_config, so it might come from the global config, the user config or the command line.
  6. Register the music_controller::volume(...) setter with music_volume via change_callback, so the controller is updated whenever the value changes.
  7. Modify music_volume when the slider in the main menu is updated.
  8. user_config_variable will update the user_config tree in its destructor. So, after the variable is destroyed, you’re free to write back user_config to a file using output_tabbed (see above).
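
Steps 4 to 8 can be sketched without sge as follows. Everything here is a simplified assumption: config_sketch is a flat path → float map instead of a json tree, config_variable_sketch is fixed to T = float, callbacks are plain std::function objects instead of fcppt signals (so there is no connection object), and the destructor writes the value back unconditionally.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

typedef std::map<std::string, float> config_sketch; // hypothetical

class config_variable_sketch
{
public:
	config_variable_sketch(
		config_sketch const &merged_config,
		config_sketch &user_config,
		std::string path)
	:
		user_config_(user_config),
		path_(std::move(path)),
		value_(merged_config.at(path_))
	{
	}

	config_variable_sketch(config_variable_sketch const &) = delete;

	void
	value(float const v)
	{
		value_ = v;
		for(std::function<void(float)> const &callback : callbacks_)
			callback(value_);
	}

	float value() const { return value_; }

	void
	change_callback(std::function<void(float)> callback)
	{
		callbacks_.push_back(std::move(callback));
	}

	// Step 8: write the current value into the user tree
	~config_variable_sketch() { user_config_[path_] = value_; }
private:
	config_sketch &user_config_;
	std::string path_;
	float value_;
	std::vector<std::function<void(float)>> callbacks_;
};

// Knows nothing about configuration files, just like in the text
class music_controller
{
public:
	explicit music_controller(float const initial) : volume_(initial) {}
	void volume(float const v) { volume_ = v; }
	float volume() const { return volume_; }
private:
	float volume_;
};

// Returns the volume the controller ends up with after the slider moved
float
slider_moved(
	config_sketch const &merged_config,
	config_sketch &user_config,
	float const slider_position)
{
	config_variable_sketch music_volume(
		merged_config, user_config, "audio/music-volume"); // step 4
	music_controller music(music_volume.value()); // step 5
	music_volume.change_callback(
		[&music](float const v) { music.volume(v); }); // step 6
	music_volume.value(slider_position); // step 7
	return music.volume();
} // step 8 happens here, when music_volume is destroyed
```

After slider_moved returns, user_config contains the new volume and could be written to disk.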

Game development in sge/C++: Part IV (statechart)

Introduction

Welcome back to the fourth “episode” of our journey to create a top down shooter game. Note that there were some minor changes to sge and I’ve updated the repository and the blog code. It’s nothing really noteworthy for us, though.

So, what’s the topic of this post, how do we continue? Well, we definitely need some structure for the game. Something to build upon which won’t change as frequently as the other parts. Again employing a modern C++ library, we’re going to create a “statechart” for the application.

States

Motivation

I’ll first try to motivate the key ideas which make up the boost::statechart library which we’ll be using, then introduce the library itself.

As you can easily imagine, a game “as a whole” almost always consists of a set of states. For example, most games have a main menu. That’s a state, and it is different from the “ingame” state and the “game is over, please enter your name” state. States can form a hierarchy. Below “ingame”, there might be two substates “paused” and “running”. The menu state might consist of a collection of submenu states.

Of course, each state has its own local data attached to it. The menu state stores some gui widgets, the ingame state stores the game’s objects and so on. This data shouldn’t be allocated all the time, just while the state is active. Therefore, it is wise to create one class for each state (or substate, or subsubstate) and arrange things so that the class’s constructor is called when the state is entered and its destructor is called when it is exited (following the RAII principle). To make the discussion easier, consider the following very simple hierarchy which we will be using in the game (at least in the beginning):

Our game's first statechart


States are dynamically constructed and destroyed as certain events occur. When the user presses the “pause” button, “running” is destroyed and “paused” is constructed (“ingame”, however, isn’t touched, which makes it a good place to store more persistent game data such as all the game’s objects). These events should be classes, too, since we might want to pass them around and store relevant data in them.
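
Before we get to the library, here’s a hand-rolled sketch of that idea using nothing but std::unique_ptr. It is not how boost::statechart works internally; the class names mirror the diagram, and the event_log is only there so you can observe the order of constructor and destructor calls.

```cpp
#include <memory>
#include <string>
#include <vector>

typedef std::vector<std::string> event_log;

struct substate
{
	virtual ~substate() {}
};

struct running : substate
{
	explicit running(event_log &_log) : log_(_log) { log_.push_back("enter running"); }
	~running() { log_.push_back("exit running"); }
	event_log &log_;
};

struct paused : substate
{
	explicit paused(event_log &_log) : log_(_log) { log_.push_back("enter paused"); }
	~paused() { log_.push_back("exit paused"); }
	event_log &log_;
};

struct ingame
{
	explicit ingame(event_log &_log)
	:
		log_(_log),
		objects_(),
		is_paused_(false),
		active_(new running(_log))
	{
	}

	void
	toggle_pause()
	{
		active_.reset(); // the old substate's destructor runs here
		is_paused_ = !is_paused_;
		if(is_paused_)
			active_.reset(new paused(log_));
		else
			active_.reset(new running(log_));
	}

	event_log &log_;
	std::vector<int> objects_; // persistent data, untouched by the toggle
	bool is_paused_;
	std::unique_ptr<substate> active_;
};
```

Calling toggle_pause once on a fresh ingame object logs “enter running”, “exit running”, “enter paused”, while objects_ keeps its contents.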

Note that statechart has a lot of capabilities (asynchronous state machines, orthogonal states, history, …) and we will be using a very small subset at first. Later on, we might explore more possibilities.

boost::statechart

Writing a state management system from scratch is a very difficult task. There are some ready-made solutions available, but basically just one sticks out: boost::statechart. You’ll see that we can basically take the picture above and translate it into C++ code. Note that the picture also contains a “machine”. This is really the states’ “engine”. It’s a top-level container which contains the currently active state. Also, the user can store additional data in the machine which is then globally available for all the states.

I’m now going to give you a skeleton for boost::statechart which implements the diagram (see footnote 1). This is just to show you:

  1. how to (syntactically) define a single state, substates and the machine
  2. how the states are interconnected and which class must “know” which one (this is not completely trivial)

#include <boost/statechart/state_machine.hpp>
#include <boost/statechart/state.hpp>

// ----------------- machine begin -----------------
namespace sgeroids { namespace states { namespace ingame { class superstate; } } }

namespace sgeroids {
class machine
:
	public boost::statechart::state_machine<machine,states::ingame::superstate>
{
public:
	machine();

	sge::systems::instance const &
	systems() const;

	void
	run();
private:
	sge::systems::instance systems_;
	bool running_;
};
}
// ----------------- machine end ----------------- 

//  ----------------- superstate begin ----------------- 
namespace sgeroids { namespace states { namespace ingame { class running; } } }

namespace sgeroids { namespace states { namespace ingame {
class superstate
:
	// machine (the "context") must be complete, running (the initial state)
	// doesn't have to be
	public boost::statechart::state<superstate,machine,running>
{
public:
	// Wtf, my_context? See below
	explicit
	superstate(
		my_context);
};
} } }
//  ----------------- superstate end ----------------- 

//  ----------------- running begin ----------------- 
namespace sgeroids { namespace states { namespace ingame {
class running
:
	// superstate (the context) now has to be complete
	public boost::statechart::state<running,superstate>
{
public:
	// Wtf, my_context? See below
	explicit
	running(
		my_context);
};
} } }
//  ----------------- running end ----------------- 

//  ----------------- paused begin ----------------- 
namespace sgeroids { namespace states { namespace ingame 
{
class paused
:
	// superstate (the context) now has to be complete
	public boost::statechart::state<paused,superstate>
{
public:
	// Wtf, my_context? See below
	explicit
	paused(
		my_context);
};
} } } 
// ----------------- paused end ----------------- 

As you can see, I finally chose a name for the game: “sgeroids”. It’s asteroids for sge on steroids. Thanks to nille for that.

So, we put our states in the “states” namespace and we let the filesystem and namespace hierarchy follow the state hierarchy, which is usually a good idea. You can see three types of objects in the code:

  1. The state machine, derived from boost::statechart::state_machine, which receives itself as the template parameter as well as the initial state.
  2. “leaf” states like ingame::running and ingame::paused which have no further child states. They receive themselves and their “context” or “father” state as template parameters.
  3. “inner” states like ingame::superstate which receive themselves, the context state and the initial child state as template parameters.

The astute — as well as the unastute — reader might have noticed that all of statechart’s classes are templatized by…themselves! This is called the curiously recurring template pattern and I won’t go into it right now because we probably won’t be using it in our code (except for the statechart declarations). Just keep in mind that the first argument to just about any class in statechart is the name of the class again. It’s a common source of copy-and-paste errors.
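
If you’ve never seen the pattern, here’s a tiny standalone illustration. It has nothing to do with statechart’s real classes: named and running_sketch are made-up names, and the only point is that the base class can call into the derived class without virtual functions because it receives the derived type as a template argument.

```cpp
#include <string>

template<typename Derived>
class named
{
public:
	std::string
	describe() const
	{
		// The cast is safe as long as Derived really inherits from
		// named<Derived>. Passing the wrong class name here is exactly
		// the copy-and-paste error mentioned above.
		return "state " + static_cast<Derived const &>(*this).name();
	}
};

class running_sketch
:
	public named<running_sketch>
{
public:
	std::string name() const { return "running"; }
};
```

Calling running_sketch().describe() goes through the base class template and ends up in running_sketch::name.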

My stuff!

There’s another little oddity about the statechart code. The state constructors have exactly one parameter, a my_context object. This looks a little weird, but it’s necessary to pass some data to the states, since we want to be able to access the father state or the machine from within the constructor.

If we receive no parameters in the constructor, that’s not possible (unless the machine is a global object). And, oh well, the statechart author felt it was a good idea to typedef the state’s base class to my_base and the “context” to my_context.
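
To see why the constructor parameter is needed, consider this heavily simplified imitation. It is emphatically not boost::statechart’s implementation: state_sketch, machine_sketch and ingame_sketch are hypothetical, and the real context() is a template (context<machine>()). The sketch only shows how a base class can hand the “outer” object to a derived state through a my_context argument, so that the derived constructor can already use it.

```cpp
// Hypothetical miniature of a state base class
template<typename Derived, typename Context>
class state_sketch
{
public:
	struct my_context
	{
		Context &outer;
	};

	typedef state_sketch my_base;

	Context &
	context()
	{
		return outer_;
	}
protected:
	explicit
	state_sketch(
		my_context _context)
	:
		outer_(
			_context.outer)
	{
	}
private:
	Context &outer_;
};

struct machine_sketch
{
	int lives;
};

class ingame_sketch
:
	public state_sketch<ingame_sketch,machine_sketch>
{
public:
	explicit
	ingame_sketch(
		my_context _context)
	:
		my_base(
			_context),
		// The machine is reachable here, inside the constructor
		lives_at_start_(
			context().lives)
	{
	}

	int lives_at_start() const { return lives_at_start_; }
private:
	int lives_at_start_;
};
```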

Filling in the blanks

Now that there’s a statechart skeleton we have to think about which state does what and how to transition between states and when. This is usually a pretty easy process. Which variables to put where is largely self-explanatory, so we’ll skip that for now (in the code above I already created some methods and variable names which we’ll use later on).

More interesting is our game loop, since we will base it on statechart events. For now, we will create two events which will be sent every frame:

  1. A tick event which instructs the currently active state to “update the game”. This means stepping the collision engine, updating timers and so on. The event contains no data.
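  2. A render event which instructs the currently active state to draw the current frame. Like tick, it contains no data.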

The definition in C++ is trivial:

#include <boost/statechart/event.hpp>

namespace sgeroids { namespace events {
class tick
:
	public boost::statechart::event<tick>
{
};
} }

namespace sgeroids { namespace events {
class render
:
	public boost::statechart::event<render>
{
};
} }

Notice the CRTP again. Now on to the main loop, which we will initiate inside a member function of the machine:

void
sgeroids::machine::run()
{
	while(running_)
	{
		// Get our input events!
		systems_.window()->dispatch();

		// process_event is derived from state_machine
		process_event(
			events::tick());

		sge::renderer::scoped_block scoped_render_block(
			systems_.renderer());

		process_event(
			events::render());
	}
}

The main loop should seem pretty familiar to you now. The process_event function sends an event “down the state chain”. It will first go as deep as possible in the hierarchy, stopping at states::ingame::running (unless we’re in pause mode). In this state, it’ll look for a “reaction” to the event. If it finds one, it’s done (unless you instructed it to defer/forward the event). If it doesn’t, it’ll go up a level to states::ingame::superstate and look for a reaction there, and so on:


The event chain induced by a call to process_event, ordered chronologically with a color coding
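
The lookup order itself is easy to imitate. In the following sketch, which uses no statechart code, state_node and dispatch are made-up names, events are plain strings, and which state handles which event is chosen arbitrarily for the example. It only models “start at the innermost state, walk outwards until somebody reacts”; deferring and forwarding are left out.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct state_node
{
	std::string name;
	std::vector<std::string> handled_events; // events this state reacts to
	state_node const *parent; // nullptr for the outermost state
};

// Returns the name of the state that reacted, or an empty string if the
// event fell through all levels.
std::string
dispatch(
	state_node const &innermost,
	std::string const &event)
{
	for(state_node const *s = &innermost; s != nullptr; s = s->parent)
	{
		if(
			std::find(
				s->handled_events.begin(),
				s->handled_events.end(),
				event)
			!= s->handled_events.end())
			return s->name;
	}

	return "";
}
```

With a superstate that handles “tick” and a running state below it that handles “render”, dispatching “render” stops at running, whereas “tick” travels one level up to the superstate.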

pause, running, pause, …

This leaves but one thing to discuss: How to write event reactions! Fortunately, it’s not difficult at all. Let’s learn how to do it by implementing the transition from “running” to “paused”. First, we’re creating a new event:

#include <boost/statechart/event.hpp>

namespace sgeroids { namespace events {
class pause_toggle
:
	public boost::statechart::event<pause_toggle>
{
};
} }

This event will be triggered from ingame::superstate so the states below it can catch it and do the transition.

Transition between running and paused


We add a few things to the superstate:

class superstate
:
	public boost::statechart::state<superstate,machine,running>
{
public:
	superstate(my_context _context)
	:
		my_base(
			_context),
		handle_keypress_connection_(
			context<machine>().systems().keyboard_collector().key_callback(
				boost::bind(
					&superstate::handle_keypress,
					this,
					_1)))
	{
	}
private:
	fcppt::signal::scoped_connection handle_keypress_connection_;

	void
	handle_keypress(
		sge::input::keyboard::key_event const &)
	{
		context<machine>().process_event(
			events::pause_toggle());
	}
};

There are a few new things here. In the constructor you see how to initialize a state. As I said, the state’s base class is typedeffed to my_base and it expects a my_context as the constructor argument. After the base class is initialized, we can call the context() function to access either a state that’s higher up in the hierarchy or even the machine. In this case, we fetch the machine and use the systems() accessor function to get the keyboard collector.

In our key callback (the last function in the code sample), we “inject” the pause_toggle event to the machine just like we injected tick and render earlier.
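
If binding a member function as a callback is new to you, here’s the mechanism in isolation. The sketch uses std::bind and std::function as stand-ins for boost::bind and sge’s signals; keyboard_sketch and toggler_sketch are made-up classes, keys are plain ints, and connection handling (the scoped_connection part) is left out. Note that a pointer to a member function always has to be spelled with the class name.

```cpp
#include <functional>
#include <utility>

class keyboard_sketch
{
public:
	typedef std::function<void(int)> callback;

	void
	key_callback(callback _callback)
	{
		callback_ = std::move(_callback);
	}

	// Simulates a key press arriving from the window system
	void
	press(int const _key)
	{
		if(callback_)
			callback_(_key);
	}
private:
	callback callback_;
};

class toggler_sketch
{
public:
	explicit
	toggler_sketch(keyboard_sketch &_keyboard)
	:
		toggles_(0)
	{
		_keyboard.key_callback(
			std::bind(
				&toggler_sketch::handle_keypress,
				this,
				std::placeholders::_1));
	}

	int toggles() const { return toggles_; }
private:
	void
	handle_keypress(int)
	{
		++toggles_; // the real state would inject pause_toggle here
	}

	int toggles_;
};
```

Every call to press on the keyboard ends up in handle_keypress of the object whose this pointer was bound.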

Event handlers

On to the transition in running:

#include "superstate.hpp"
#include "paused.hpp"
#include "../../events/pause_toggle.hpp"
#include "../../events/render.hpp"
#include <sge/input/keyboard/key_event_fwd.hpp>
#include <fcppt/signal/scoped_connection.hpp>
#include <boost/statechart/state.hpp>
#include <boost/statechart/custom_reaction.hpp>
#include <boost/mpl/list.hpp>

namespace sgeroids
{
namespace states
{
namespace ingame
{
class running
:
	public boost::statechart::state<running,superstate>
{
public:
	typedef
	boost::mpl::list
	<
		boost::statechart::custom_reaction<events::pause_toggle>,
		boost::statechart::custom_reaction<events::render>
	>
	reactions;

	explicit
	running(
		my_context _context)
	:
		my_base(
			_context)
	{
	}

	boost::statechart::result
	react(
		events::pause_toggle const &)
	{
		return transit<paused>();
	}

	boost::statechart::result
	react(
		events::render const &)
	{
		// TODO
		return discard_event();
	}

	~running();
};
}
}
}

Again we use a type container from boost::mpl (last time we used it to store the properties of our sprites). This time, we use it to store the collection of event reactions. custom_reaction is just a wrapper saying “this class has a react member function to react to event x”. This react function returns a result, which has one of the following effects:

  1. discard the event, which just means “move along, nothing to see here”
  2. a transition to a different state (which we are using to get from running to paused and back)
  3. forward the event and let the state above deal with it
  4. defer the event, pushing it into an internal queue so it’s processed later (maybe after a transition?)

You see two types of results in the event handlers: The render event is just discarded. In pause_toggle, we execute the transition to paused. Pretty simple, isn’t it?

What’s next?

I hope you like statechart so far and are not disappointed that there were no “visible” additions in this article. In the next article, we will focus on clocks, durations and time in general, introducing chrono, the time library of the upcoming standard C++0x. This will help us limit the game’s frame rate so it doesn’t consume more CPU time than necessary. Also, we need a way to stop the game time so the “pause” state makes sense.

Note that in the repository you can find the skeleton in the “4” directory. It contains only the files talked about here, so it doesn’t compile/link.

Footnotes

1 Note that I’m breaking sge style a bit here, the namespace declarations are usually one below the other, not next to each other. But I didn’t want to blow up the code in the blog.
