12

Non-determinism in C++

 5 years ago
source link: https://www.tuicool.com/articles/IveaUnR
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

A deterministic program, when given the same input, will always result in the same output. This intuitive, albeit quite fuzzily defined, property is often times pretty important for correct program. Sources of non-determinism can be quite subtle – and once they creep into your program, they can propagate and amplify and have enormous consequences. It is pretty much well-known the butterfly effect.

When discussing this problem, it is important to know what exactly makes up the input and the output of the program. For example, when logging times to a logfile, and considering this an actual output, no two runs will ever be the same – so this is usually not considered an output relevant for determinism. Which brings us to the first common source of non-determinism:

Time

If any part of your program depends on the time it is run at, it will be easily be non-deterministic. Common cases are using the time to initializing some variable depending on the time, or using the time for some kind of numerical integration, like computing a value over time. Also, using execution time as an output respective for determinism is hopeless on a normal desktop computer – but can be crucial for a real-time system.

Random number generation

Random number generation seems like an obvious candidate, yet most random number generators are not really random, but only pseudo-random. For example, std::mersenne_twister_engine will generate the same sequence of values every time, when initialized with the same seed. So do not initialize this with a non-deterministic input like the time, and it will be predictable. However, std::random_device might not share this property and give you fresh non-deterministic input. As a weird middle ground, std::default_random_engine will probably give you the same results when compiled with the same compiler/standard-lib, but on another compiler version or OS, it will not. Subtle.

The allocator

Another source of non-determinism that is pretty tricky is the allocator. For example, consider the following piece of code:

template <class T>
T sum(std::set<Thingy*> const& set)
{
  T result{};
  for (auto const& each : set)
    result += each->value();
  return result;
}

Is this deterministic or not? It depends. Now let’s assume that all the Thingy s were allocated using standard new . In that case, the actual pointers, Thingy* are non-deterministic, and hence the order of the Thingy* s in the set is random. But does this matter? Well if T is std::uint32_t , it does not. Order in addition does not matter for unsigned integers, even with overflows. However, if T is float , then it does matter and the whole result becomes unpredictable, at least in the general case (it will even be predictable, if e.g. all the numbers in the computation are integers that are exactly representable as floats). Other languages have “insertion-ordered” containers to get around this problem. A sensible approximation in C++ is to use the (unordered_)set and (unordered_)map containers together with another list to iterate on.

The thread scheduler

When you cannot really control the order of instructions, which is really the whole point of threading, you will have a harder time making things deterministic. Like the allocator problem, this is usually also paired with floating-point arithmetic. The workaround here is to make sure that the order of computation does not influence the final result. One common way around this is to sort the output by a unique criteria. For example, if you use multiple threads to report the intersections of a bunch of line segments, you can later sort them by their position in space.

There’s of course the honorable mention for uninitialized variables, but I’m sure your static analyzer will complain about it. Any interaction with the “outside” of your program, any side-effect, be it filesystems, user input, output or cosmic background radiation can lead to non-determinism, so be sure to know the context well enough and plan accordingly to your determinism requirements.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK