
GotW #99 Solution: Postconditions (Difficulty: 7/10) – Sutter’s Mill

source link: https://herbsutter.com/2021/02/08/gotw-99-solution-postconditions-difficulty-7-10/


Herb Sutter | C++, GotW | 2021-02-08, 2021-02-09


This special Guru of the Week series focuses on contracts. Postconditions are directly related to assertions (see GotW #97)… but how, exactly? And since we can already write postconditions using assertions, why would language support benefit us more for writing postconditions than for writing (ordinary) assertions?

1. What is a postcondition, and how is it related to an assertion?

A function’s postconditions document “what it does” — they assert the function’s intended effects, including the return value and any other caller-visible side effects, which must hold at every return point when the function returns to the caller.

A postcondition IS-AN assertion in every way described in GotW #97, with the special addition that whereas a general assertion is always checked where it is written, a postcondition is written on the function and checked at every return (which could be multiple places). Otherwise, it’s “just an assertion”: As with an assertion, if a postcondition is false then it means there is a bug, likely right there inside the function on which the postcondition is written (or in the postcondition itself), because if prior contracts were well tested then likely this function created the first unexpected state. [2]

Explain your answer using the following example, which uses a variation of a proposed post-C++20 syntax for postconditions. [1]

// Example 1(a): A postcondition along the lines proposed in [1]
string combine_and_decorate( const string& x, const string& y )
    [[post( _return_.size() > x.size() + y.size() )]]
{
    if (x.empty()) {
        return "[missing] " + y + optional_suffix();
    } else {
        return x + ' ' + y + something_computed_from(x);
    }
}

The above would be roughly equivalent to writing the test before every return statement instead:

// Example 1(b): What a compiler might generate for Example 1(a)
string combine_and_decorate( const string& x, const string& y )
{
    if (x.empty()) {
        auto&& _return_ = "[missing] " + y + optional_suffix();
        assert( _return_.size() > x.size() + y.size() );
        return std::forward<decltype(_return_)>(_return_);
    } else {
        auto&& _return_ = x + ' ' + y + something_computed_from(x);
        assert( _return_.size() > x.size() + y.size() );
        return std::forward<decltype(_return_)>(_return_);
    }
}

2. Rewrite the example in Question 1 to show how to approximate the same effect using assertions in today’s C++. Are there any drawbacks to your solution compared to having language support for postconditions?

We could always write Example 1(b) by hand, but language support for postconditions is better in two key ways:

(A) The programmer should only write the condition once.

(B) The programmer should not need to write forwarding boilerplate by hand to make looking at the return value efficient.

How can we approximate those advantages?

Option 1 (basic): Named return object + an exit guard

The simplest way to achieve (A) would be to use the C-style goto exit; pattern:

// Example 2(a)(i): C-style “goto exit;” postcondition pattern
string combine_and_decorate( const string& x, const string& y )
{
    auto _return_ = string();
    if (x.empty()) {
        _return_ = "[missing] " + y + optional_suffix();
        goto post;
    } else {
        _return_ = x + ' ' + y + something_computed_from(x);
        goto post;
    }
post:
    assert( _return_.size() > x.size() + y.size() );
    return _return_;
}

If you were thinking, “in C++ this wants a scope guard,” you’re right! [3] Guards still need access to the return value, so the structure is basically similar:

// Example 2(a)(ii): scope_guard pattern, along the lines of [3]
string combine_and_decorate( const string& x, const string& y )
{
    auto _return_ = string();
    auto post = std::experimental::scope_success([&]{
        assert( _return_.size() > x.size() + y.size() );
    });
    if (x.empty()) {
        _return_ = "[missing] " + y + optional_suffix();
        return _return_;
    } else {
        _return_ = x + ' ' + y + something_computed_from(x);
        return _return_;
    }
}

Advantages:

  • Achieved (A). The programmer writes the condition only once.

Drawbacks:

  • Didn’t achieve (B). There’s no forwarding boilerplate, but only because we’re not even trying to forward…
  • Overhead (maybe). … and to look at the return values we require a named return value and a move assignment into that object, which is overhead if the function wasn’t already doing that.
  • Brittle. The programmer has to remember to convert every return site to _return_ = ...; goto post; or _return_ = ...; return _return_;… If they forget, the code silently compiles but doesn’t check the postcondition.
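If your standard library doesn’t ship std::experimental::scope_success, Example 2(a)(ii) can still be tried with a small stand-in. The following is a minimal sketch, not the P0052 facility: success_guard is a hypothetical name, and the bodies of optional_suffix and something_computed_from are placeholders, since the article leaves them unspecified.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <utility>

// Hypothetical stand-in for std::experimental::scope_success:
// runs f on scope exit unless the scope is being left by an exception.
template <typename F>
class success_guard {
    F   f_;
    int ecount_ = std::uncaught_exceptions();
public:
    explicit success_guard(F f) : f_(std::move(f)) {}
    success_guard(const success_guard&) = delete;
    success_guard& operator=(const success_guard&) = delete;
    ~success_guard() {
        if (std::uncaught_exceptions() == ecount_) f_();
    }
};

// Placeholder helpers (assumptions, not from the article).
inline std::string optional_suffix() { return "!"; }
inline std::string something_computed_from(const std::string& s) {
    return " (" + std::to_string(s.size()) + ")";
}

std::string combine_and_decorate(const std::string& x, const std::string& y)
{
    auto _return_ = std::string();
    // Caveat: the guard runs after the return value has been initialized
    // from _return_, so it sees the intended value only when that copy/move
    // is elided (NRVO), which mainstream compilers typically do here.
    auto post = success_guard([&]{
        assert(_return_.size() > x.size() + y.size());
    });
    if (x.empty()) {
        _return_ = "[missing] " + y + optional_suffix();
        return _return_;
    } else {
        _return_ = x + ' ' + y + something_computed_from(x);
        return _return_;
    }
}
```

With these placeholder helpers, combine_and_decorate("a", "b") yields "a b (1)" and combine_and_decorate("", "b") yields "[missing] b!", and the guard checks the postcondition on each normal return.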

Option 2 (better): “return post” postcondition pattern

Here’s a second way to do it that achieves both goals, using a local function (which we have to write as a lambda in C++):

// Example 2(b): “return post” postcondition pattern
string combine_and_decorate( const string& x, const string& y )
{
    auto post = [&](auto&& _return_) -> auto&& {
        assert( _return_.size() > x.size() + y.size() );
        return std::forward<decltype(_return_)>(_return_);
    };
    if (x.empty()) {
        return post( "[missing] " + y + optional_suffix() );
    } else {
        return post( x + ' ' + y + something_computed_from(x) );
    }
}

Advantages:

  • Achieved (A). The programmer writes the condition only once.
  • Efficient. We can look at return values efficiently, without requiring a named return value and a move assignment.

Drawbacks:

  • Didn’t achieve (B). We still have to write the forwarding boilerplate, but at least it’s only in one place.
  • Brittle. The programmer has to remember to convert every return site to return post. If they forget, the code silently compiles but doesn’t check the postcondition.

Option 3 (mo’betta): Wrapping up option 2… with a macro

We can improve Option 2 by wrapping the boilerplate up in a macro (sorry). Note that instead of “MY_” you’d use your company’s preferred unique macro prefix: [4]

// Eliminate forward-boilerplate with a macro (written only once)
#define MY_POST(postconditions)                            \
    auto post = [&](auto&& _return_) -> auto&& {           \
        assert( postconditions );                          \
        return std::forward<decltype(_return_)>(_return_); \
    };

And then the programmer can just write:

// Example 2(c): “return post” with boilerplate inside a macro
string combine_and_decorate( const string& x, const string& y )
{   MY_POST( _return_.size() > x.size() + y.size() );
    if (x.empty()) {
        return post( "[missing] " + y + optional_suffix() );
    } else {
        return post( x + ' ' + y + something_computed_from(x) );
    }
}

Advantages:

  • Achieved (A) and (B). The programmer writes the condition only once, and doesn’t write the forwarding boilerplate.
  • Efficient. We can look at the return value without requiring a local variable for the return value, and without an extra move operation to put the value there.
  • Future-friendly. You may have noticed that I changed my usual brace style to write { MY_POST on a single line; that’s to make it easily replaceable with search-and-replace. If you systematically declare the condition as { MY_POST at the start of the function, and systematically write return post() to use it, the code is likely more future-proof — if we get language support for postconditions with a syntax like [1], migrating your code to that could be as simple as search-and-replace:

{ MY_POST( * )  →  [[post _return_: * ]] {

return post( * )  →  return *

Drawbacks:

  • (improved) Brittle. It’s still a manual pattern, but now we have the option of making it impossible for the programmer to forget return post by extending the macro to include a check that post was used before each return (see [5]). That’s feasible to put into the Option 3 macro, whereas it was not realistic to ask the programmer to write out by hand in Options 1 and 2.

GUIDELINE: If you don’t already use a way to write postconditions as code, consider trying something like MY_POST until language support is available. It’s legal C++ today, it’s not terrible, and it’s future-friendly to adopting future C++ language contracts.

Finally, all of these options share a common drawback:

  • Less composable/toolable. The next library or team will have THEIR_POST convention that’s different, which makes it hard to write tools to support both styles. Language support has an important incidental benefit of providing a common syntax that portable code and tools can rely upon.

3. Should a postcondition be expected to be true if the function throws an exception back to the caller?

First, let’s generalize the question: Anytime you see “if the function throws an exception,” mentally rewrite it to “if the function reports that it couldn’t do what it advertised, namely complete its side effects.” That’s independent of whether it reports said failure using an exception, std::error_code, HRESULT, errno, or any other way.

Then the question answers itself: No, by definition. A postcondition documents the side effects, and if those weren’t achieved then there’s nothing to check. And for postconditions involving the return value we can add: No, those are meaningless by construction, because it doesn’t exist.

“But wait!” someone might interrupt. “Aren’t there still things that need to be true on function exit even if the function failed?” Yes, but those aren’t postconditions. Let’s take a look.

Justify your answer with example(s).

Consider this code:

// Example 3: (Not) a reasonable postcondition?
void append_and_decorate( string& x, string&& y )
    [[post( x.size() <= x.capacity() && /* other non-corruption */ )]]
{
    x += y + optional_suffix();
}

This can seem like a sensible “postcondition” even when an exception is thrown, but it is testing whether x is still a valid object of its type… and sure, that had better be true. But that’s an invariant, which should be written once on the type [2], not a postcondition to be laboriously repeated arbitrarily many times on every function that ever might touch an object of that type.

When reasoning about function failures, we use the well-known Abrahams error safety guarantees, and now it becomes important to understand them in terms of invariants:

  • The nofail guarantee is “the function cannot fail” (e.g., such functions should be noexcept), and so doesn’t apply here since we’re discussing what happens if the function does fail.
  • The basic guarantee is “no corruption,” every object we might have tried to modify is still a valid object of its type… but that’s identical to saying “the object still meets the invariants of its type.”
  • The strong guarantee is “all or nothing,” so in the case we’re talking about where an error is being reported, a strong guarantee function is again saying that all invariants hold. (It also says observable state did not change, but I’ll ignore that for now; for how we might want to check that, see [6].)

So we’re talking primarily about class invariants… and those should hold on both successful return and error exit, and they should be written on the type rather than on every function that uses the type.

GUIDELINE: If you’re trying to write a “postcondition” that should still be true even if an exception or other error is reported, you’re probably either trying to write an invariant instead [2], or trying to check the strong did-nothing guarantee [6].
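As a rough illustration of writing the invariant once on the type, here’s a minimal sketch. bounded_stack, its invariant() function, and the nested check helper are all hypothetical names invented for this example; no in-language invariant syntax is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical type whose invariant is stated once, on the type.
class bounded_stack {
    std::vector<int> data_;
    std::size_t      limit_;

    // The class invariant, written in exactly one place.
    bool invariant() const { return data_.size() <= limit_; }

    // Hypothetical helper: checks the invariant on entry and on every exit,
    // whether the member function returns normally or throws.
    struct check {
        const bounded_stack& self;
        explicit check(const bounded_stack& s) : self(s) {
            assert(self.invariant());
        }
        ~check() { assert(self.invariant()); }
    };

public:
    explicit bounded_stack(std::size_t limit) : limit_(limit) {}

    void push(int v) {
        check inv{*this};
        if (data_.size() == limit_) {
            throw std::length_error("bounded_stack is full");  // error exit
        }
        data_.push_back(v);                                    // normal exit
    }

    std::size_t size() const { return data_.size(); }
};
```

Here push reports failure by throwing, and the same invariant check runs on that error exit as on a successful return, without being repeated as a “postcondition” on each member function.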

4. Should postconditions be able to refer to both the initial (on entry) and final (on exit) value of a parameter, if those could be different?

If so, give an example.

Consider this code, which uses a strawman _in_() syntax for referring to subexpressions of the postcondition that should be computed on entry so they can refer to the “in” value of the parameter (note: this was not proposed in [1]):

// Example 4(a): Consulting “in” state in a postcondition
void instrumented_push( vector<widget>& c, const widget& value )
    [[post( _in_(c.size())+1 == c.size() )]]
{
    c.push_back(value);
    // perform some extra work, such as logging which
    // values are added to which containers, then return
}

Postconditions like this one express relative side effects, where the “out” state is a delta from the “in” state of the parameter. To write postconditions like this one, we have to be able to refer to both states of the parameter, even for parameters that must be modifiable.

Note that this doesn’t require taking a copy of the parameter… that would be expensive for c! Rather, an implementation would just evaluate any _in_ subexpression on entry and store only that result as a temporary, then evaluate the rest of the expression on exit. For example, in this case the implementation could generate something like this:

// Example 4(b): What an implementation might generate for 4(a)
void instrumented_push( vector<widget>& c, const widget& value )
{
    auto __in_c_size = c.size();
    c.push_back(value);
    // perform some extra work, such as logging which
    // values are added to which containers, then return
    assert( __in_c_size+1 == c.size() );
}
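In today’s C++, the same “evaluate on entry, store only the result” idea can be approximated by hand with a lambda init-capture. This is a minimal sketch under assumptions: widget is a placeholder type, and the post lambda is a hand-written pattern rather than generated code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Placeholder element type (an assumption for this sketch).
struct widget { int id; };

void instrumented_push(std::vector<widget>& c, const widget& value)
{
    // The init-capture evaluates c.size() once, on entry, and stores only
    // that result; c itself is captured by reference, not copied.
    auto post = [&c, in_size = c.size()] {
        assert(in_size + 1 == c.size());
    };
    c.push_back(value);
    // ... extra work such as logging would go here ...
    return post();
}
```

Only a single size value is stored per call, so the cost stays small even when c is large.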

Notes

[1] G. Dos Reis, J. D. Garcia, J. Lakos, A. Meredith, N. Myers, and B. Stroustrup. “P0542: Support for contract based programming in C++” (WG21 paper, June 2018). Subsequent EWG discussion favored changing “expects” to “pre” and “ensures” to “post,” and, to keep the examples legal, compilable (if unenforced) C++20 for this article, I also changed the syntax from : to ( ) and named the return value _return_ for postconditions. That’s not a statement of preference; it’s just so the examples can compile today, which makes them easier to check.

[2] Upcoming GotWs will cover preconditions and invariants, including how invariants relate to postconditions.

[3] P. Sommerlad and A. L. Sandoval. “P0052: Generic Scope Guard and RAII Wrapper for the Standard Library” (WG21 paper, February 2019). Based on pioneering work by Andrei Alexandrescu and Petru Marginean starting with “Change the Way You Write Exception-Safe Code – Forever” (Dr. Dobb’s Journal, December 2000), and widely implemented in D and other languages, the Folly library, and more.

[4] In a real system we’d want a few more variations, such as:

// A separate _V version for functions that don’t return
// a value, because 'void' isn’t regular
#define MY_POST_V(postconditions)                          \
    auto post = [&]{ assert( postconditions ); };

// Parallel _DECL forms to work on forward declarations,
// for people who want to repeat the postcondition there
#define MY_POST_DECL(postconditions)   // intentionally empty
#define MY_POST_V_DECL(postconditions) // intentionally empty

Note: We could try to combine MY_POST_V and MY_POST by always creating both a single-parameter lambda and a no-parameter lambda, and then “overloading” them using something like compose from Boost’s wonderful Higher-Order Functions library by Paul Fultz II. Then in a void-returning function return post() still works fine even with empty parens. I didn’t do that because the in-language contracts proposed in [1] use a slightly different syntax depending on whether there’s a return value, so if our syntax doesn’t have such a distinction it will be harder to migrate this macro to a syntax like [1] with a simple search-and-replace.

[5] We could add extra machinery to help the programmer remember to write return post, so that executing a return without post will assert: set a flag on every post() evaluation, and then assert that flag in the destructor of an RAII object on every normal return. The code is pretty simple with a scope guard [3]:

// Check that the programmer wrote “return post” each time
#define MY_POST_CHECKED                                     \
    auto post_checked = false;                              \
    auto post_guard = std::experimental::scope_success([&]{ \
        assert( post_checked );                             \
    });

Then in MY_POST and MY_POST_V, pull in this machinery and then also set post_checked:

#define MY_POST(postconditions)                             \
    MY_POST_CHECKED                                         \
    auto post = [&](auto&& _return_) -> auto&& {            \
        assert( postconditions );                           \
        post_checked = true;                                \
        return std::forward<decltype(_return_)>(_return_);  \
    };
#define MY_POST_V(postconditions)                           \
    MY_POST_CHECKED                                         \
    auto post = [&]{                                        \
        assert( postconditions );                           \
        post_checked = true;                                \
    };

If you don’t have a scope guard helper, you can roll your own, where “successful exit” is detectable by seeing that the std::uncaught_exceptions() exception count hasn’t changed:

// Hand-rolled alternative if you don’t have a scope guard
#define MY_POST_CHECKED                                     \
    auto post_checked = false;                              \
    struct post_checked_ {                                  \
        const bool *pflag;                                  \
        const int  ecount = std::uncaught_exceptions();     \
        post_checked_(const bool* p) : pflag{p} {}          \
        ~post_checked_() {                                  \
            assert( *pflag ||                               \
                    ecount != std::uncaught_exceptions() ); \
        }                                                   \
    } post_checked_guard{&post_checked};
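To see the pieces of this note working together, here is a minimal self-contained sketch that assembles the hand-rolled MY_POST_CHECKED with MY_POST and MY_POST_V. The helper bodies and the append_twice function are placeholders invented for illustration; they are not part of the article’s examples.

```cpp
#include <cassert>
#include <exception>
#include <string>
#include <utility>
#include <vector>

// On a normal (non-exception) exit, assert that post() was evaluated.
#define MY_POST_CHECKED                                     \
    auto post_checked = false;                              \
    struct post_checked_ {                                  \
        const bool *pflag;                                  \
        const int  ecount = std::uncaught_exceptions();     \
        post_checked_(const bool* p) : pflag{p} {}          \
        ~post_checked_() {                                  \
            assert( *pflag ||                               \
                    ecount != std::uncaught_exceptions() ); \
        }                                                   \
    } post_checked_guard{&post_checked};

#define MY_POST(postconditions)                             \
    MY_POST_CHECKED                                         \
    auto post = [&](auto&& _return_) -> auto&& {            \
        assert( postconditions );                           \
        post_checked = true;                                \
        return std::forward<decltype(_return_)>(_return_);  \
    };

#define MY_POST_V(postconditions)                           \
    MY_POST_CHECKED                                         \
    auto post = [&]{                                        \
        assert( postconditions );                           \
        post_checked = true;                                \
    };

// Placeholder helpers (assumptions, not from the article).
inline std::string optional_suffix() { return "!"; }
inline std::string something_computed_from(const std::string& s) {
    return " (" + std::to_string(s.size()) + ")";
}

std::string combine_and_decorate(const std::string& x, const std::string& y)
{   MY_POST( _return_.size() > x.size() + y.size() );
    if (x.empty()) {
        return post( "[missing] " + y + optional_suffix() );
    } else {
        return post( x + ' ' + y + something_computed_from(x) );
    }
}

// Hypothetical void-returning function using the _V form.
void append_twice(std::vector<int>& v, int value)
{   MY_POST_V( v.size() >= 2 && v[v.size()-1] == v[v.size()-2] );
    v.push_back(value);
    v.push_back(value);
    return post();
}
```

If a return site in either function were written without post, post_checked would still be false on normal exit and the guard’s destructor would assert.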

[6] For strong-guarantee functions, we could try to check that all observable state is the same as on function entry. In some cases, we can partly do that… for example, writing the test that a failed vector::push_back didn’t invalidate any pointers into the container may sound hard, but it’s actually the easy part of that function’s “error exit” condition! Using a strawman syntax like [1], extended to include an “error” exit condition:

// (Using a hypothetical “error exit” condition)
// This is enough to check that no pointers into *this are invalid
template <typename T, typename Allocator>
constexpr void vector<T,Allocator>::push_back( const T& )
    [[error( _in_.data() == data() && _in_.size() == size() )]] ;

But other “error exit” checks for this same function would be hard, expensive, or impossible to express. For example, it would be expensive to write the check that all elements in the vector have their original values, which would require first taking a deep copy of the container.
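Lacking an in-language “error exit” condition, the cheap part of this check can be approximated today with a guard that acts only when the scope is left by an exception. This is a minimal sketch: checked_push_back, on_error, and the flaky test type are hypothetical names invented here, and only data() and size() are compared, not element values.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <stdexcept>
#include <vector>

// Hypothetical element type whose copy constructor can be told to throw.
struct flaky {
    static inline bool fail_next_copy = false;
    int value = 0;
    flaky() = default;
    explicit flaky(int v) : value(v) {}
    flaky(const flaky& other) : value(other.value) {
        if (fail_next_copy) {
            fail_next_copy = false;
            throw std::runtime_error("copy failed");
        }
    }
    flaky& operator=(const flaky&) = default;
};

// Hypothetical wrapper: on an exception exit, check that the buffer pointer
// and the size are the same as they were on entry.
template <typename T>
void checked_push_back(std::vector<T>& v, const T& value)
{
    struct on_error {
        const std::vector<T>& v;
        const T*              in_data;   // "in" value of data()
        const std::size_t     in_size;   // "in" value of size()
        const int             ecount;
        explicit on_error(const std::vector<T>& vec)
            : v(vec), in_data(vec.data()), in_size(vec.size()),
              ecount(std::uncaught_exceptions()) {}
        ~on_error() {
            // Only check when leaving because of a new exception.
            if (std::uncaught_exceptions() > ecount) {
                assert(in_data == v.data() && in_size == v.size());
            }
        }
    } guard{v};
    v.push_back(value);
}
```

Only two small values are stored on entry, which is why this part of the error exit condition is cheap; checking that every element kept its original value would still need a deep copy.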

Acknowledgments

Thank you to the following for their feedback on this material: Joshua Berne, Gábor Horváth, Andrzej Krzemieński, James Probert, Bjarne Stroustrup, Andrew Sutton.

