I know that undefined behaviour in C++ can pretty much allow the compiler to do anything it wants. However, I had a crash that surprised me, as I assumed that the code was safe enough.
In this case, the real problem happened only on a specific platform using a specific compiler, and only if optimization was enabled.
I tried several things in order to reproduce the problem and simplify it as much as possible. Here's an extract of a function called Serialize, which takes a bool parameter and copies the string "true" or "false" to an existing destination buffer.
If this function came up in a code review, there would be no way to tell that it could, in fact, crash if the bool parameter was an uninitialized value.
// Zero-filled global buffer of 16 characters
char destBuffer[16];

void Serialize(bool boolValue) {
    // Determine which string to print based on boolValue
    const char* whichString = boolValue ? "true" : "false";

    // Compute the length of the string we selected
    const size_t len = strlen(whichString);

    // Copy string into destination buffer, which is zero-filled (thus already null-terminated)
    memcpy(destBuffer, whichString, len);
}
If this code is compiled with clang 5.0.0 with optimizations enabled, it can crash.
The ternary expression boolValue ? "true" : "false" looked safe enough to me. I was assuming, “Whatever garbage value is in boolValue doesn’t matter, since it will evaluate to true or false anyhow.”
I have set up a Compiler Explorer example that shows the problem in the disassembly; here is the complete example. Note: the combination I found that reproduces the issue is Clang 5.0.0 with -O2 optimisation.
#include <iostream>
#include <cstring>

// Simple struct, with an empty constructor that doesn't initialize anything
struct FStruct {
    bool uninitializedBool;

    __attribute__ ((noinline)) // Note: the constructor must be declared noinline to trigger the problem
    FStruct() {};
};

char destBuffer[16];

// Small utility function that copies the string "true" or "false" into destBuffer depending on the value of the parameter
void Serialize(bool boolValue) {
    // Determine which string to copy depending on whether 'boolValue' evaluates to true or false
    const char* whichString = boolValue ? "true" : "false";

    // Compute the length of the string we selected
    size_t len = strlen(whichString);

    // Copy the string into the zero-filled destination buffer
    memcpy(destBuffer, whichString, len);
}

int main()
{
    // Locally construct an instance of our struct here on the stack. The bool member uninitializedBool is uninitialized.
    FStruct structInstance;

    // Copy "true" or "false" into destBuffer
    Serialize(structInstance.uninitializedBool);
    return 0;
}
The problem arises because of the optimizer: it was clever enough to deduce that the strings “true” and “false” differ in length by exactly 1. So instead of actually calculating the length, it uses the value of the bool itself, which should technically be either 0 or 1, and does this:
const size_t len = strlen(whichString); // original code
const size_t len = 5 - boolValue; // clang clever optimization
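To make the arithmetic concrete, here is a minimal sketch that models the two computations side by side. The function names are hypothetical, and the second function only imitates what the optimized code appears to do: it takes the raw byte as an unsigned char (an assumption about how the bool is stored) so that the out-of-range case can be shown with well-defined unsigned arithmetic rather than by actually reading an invalid bool.

```cpp
#include <cstddef>
#include <cstring>

// Length computed the straightforward way, as in the original Serialize.
std::size_t LengthViaStrlen(bool boolValue) {
    const char* whichString = boolValue ? "true" : "false";
    return std::strlen(whichString);
}

// Hypothetical model of the optimized arithmetic, applied to the raw byte
// assumed to back the bool. Unsigned subtraction wraps around instead of
// being undefined, so this is safe to call with any byte value.
std::size_t LengthViaArithmetic(unsigned char rawByte) {
    return std::size_t{5} - rawByte;
}
```

For a byte of 0 or 1 both functions agree (5 and 4 respectively). For a garbage byte such as 255, the subtraction wraps around to an enormous size_t, far larger than the 16-byte destBuffer, which would be consistent with memcpy crashing.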
While this is “clever”, so to speak, my question is: Does the C++ standard allow a compiler to assume a bool can only have an internal numerical representation of ‘0’ or ‘1’ and use it in such a way?
Or is this a case of implementation-defined behaviour, in which the implementation assumes that all its bools will only ever contain 0 or 1, and any other value is undefined behaviour territory?
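For comparison, here is a sketch of the same pattern with the member given a default initializer, so that the bool always holds a valid value when it is read. The names FStructInit, destBufferInit and SerializeInit are hypothetical and only exist to keep this variant separate from the repro above. Returning the length is also an addition for illustration.

```cpp
#include <cstddef>
#include <cstring>

// Same idea as FStruct, but the member has a default initializer,
// so it never holds an indeterminate value.
struct FStructInit {
    bool initializedBool = false;
};

// Zero-filled global buffer of 16 characters
char destBufferInit[16];

// Copies "true" or "false" into destBufferInit and returns the copied length.
std::size_t SerializeInit(bool boolValue) {
    const char* whichString = boolValue ? "true" : "false";
    const std::size_t len = std::strlen(whichString);
    std::memcpy(destBufferInit, whichString, len);
    return len;
}
```

With the member initialized, whichever way the compiler computes the length, it should come out as 4 or 5, which fits in the buffer.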