
Formatting Text in C++: The Old and The New Ways

source link: https://mariusbancila.ro/blog/2023/09/12/formatting-text-in-c-the-old-and-the-new-ways/

Posted on September 12, 2023 by Marius Bancila

When it comes to formatting a piece of text in C++, there are several approaches you can employ:

  • I/O streams, particularly std::stringstream with stream operations (such as operator <<)
  • the printf family of functions, particularly sprintf
  • the C++20 format library, particularly std::format / std::format_to
  • any third-party library, {fmt} in particular (the origin of the new standard format library)

The first two options represent the old ways. The format library is, obviously, the new, modern way of formatting text. But which is better in terms of performance? Let’s try to find out.

In which examples are presented…

First, let’s look at a few simple examples of formatting text. Let’s say we want to produce text of the form "severity=1,error=42,reason=access denied". We can do this as follows:

  • with streams
int severity = 1;
unsigned error = 42;
std::string reason = "access denied";

std::stringstream ss;
ss << "severity=" << severity
   << ",error=" << error
   << ",reason=" << reason;

std::string text = ss.str();
  • with printf
int severity = 1;
unsigned error = 42;
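std::string reason = "access denied";

std::string text(50, '\0');
int n = sprintf(text.data(), "severity=%d,error=%u,reason=%s", severity, error, reason.c_str()); // %s needs a C string
text.resize(n); // drop the unused '\0' padding after the formatted text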
  • with format
int severity = 1;
unsigned error = 42;
std::string reason = "access denied";

std::string text = std::format("severity={},error={},reason={}", severity, error, reason);

// or

std::string text;
std::format_to(std::back_inserter(text), "severity={},error={},reason={}", severity, error, reason);

std::format is similar to printf in many respects, except that you don’t need to provide type specifiers such as %d, %u, or %s, only an argument placeholder, {}. Format specifiers are, of course, available as well, but they are beyond the point of this article.

std::format_to is useful for appending text because it writes to an output buffer through an output iterator. This allows us to append text conditionally, as in the following example, where the reason is added to the message only if it is not empty:

std::string text = std::format("severity={},error={}", severity, error);

if(!reason.empty())
  std::format_to(std::back_inserter(text), ",reason={}", reason);

In which performance is compared…

With all these options, which one is the best to use? In general, stream operations are slow, and {fmt} is known to be significantly faster. But not all cases are equal, and when you want to optimize, you should measure for yourself rather than make decisions based on generalities.

I asked myself this question recently, when I noticed that a project I’m currently involved in makes large-scale use of std::stringstream to format log messages. Most of these messages involved one to three arguments. For instance:

std::stringstream ss;
ss << "component id: " << id;

std::string msg = ss.str();

// or

std::stringstream ss;
ss << "source: " << source << "|code=" << code;

std::string msg = ss.str();

I thought that replacing std::stringstream with std::format should be beneficial for performance, but I wanted to measure how much faster it would be. So I wrote the following program to compare the alternatives. It does the following:

  • formats text of the form "Number 42 is great!"
  • compares std::stringstream, sprintf, std::format, and std::format_to
  • runs a variable number of iterations, from 1 to 1000000, and determines the average time per iteration
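#include <chrono>
#include <cstdio>
#include <format>
#include <iterator>
#include <print>
#include <random>
#include <sstream>
#include <string>
#include <vector>

int main()
{
   {
      // warm-up: use a string stream once before any timing starts
      std::stringstream ss;
      ss << 42;
   }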

   using namespace std::chrono_literals;

   std::random_device rd{};
   auto mtgen = std::mt19937{ rd() };
   auto ud = std::uniform_int_distribution<>{ -1000000, 1000000 };

   std::vector<int> iterations{ 1, 2, 5, 10, 100, 1000, 10000, 100000, 1000000 };

   std::println("{:>10} {:>12} {:>7} {:>9} {:>6}", "iterations", "stringstream", "sprintf", "format_to", "format");
   std::println("{:>10} {:>12} {:>7} {:>9} {:>6}", "----------", "------------", "-------", "---------", "------");

   for (int count : iterations)
   {
      std::vector<int> numbers(count);
      for (std::size_t i = 0; i < numbers.size(); ++i)
      {
         numbers[i] = ud(mtgen);
      }

      long long t1, t2, t3, t4;

      {
         auto start = std::chrono::high_resolution_clock::now();

         for (std::size_t i = 0; i < numbers.size(); ++i)
         {
            std::stringstream ss;
            ss << "Number " << numbers[i] << " is great!";
            std::string s = ss.str();
         }

         auto end = std::chrono::high_resolution_clock::now();
         t1 = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
      }

      {
         auto start = std::chrono::high_resolution_clock::now();

         for (std::size_t i = 0; i < numbers.size(); ++i)
         {
            std::string str(100, '\0');
            std::sprintf(str.data(), "Number %d is great!", numbers[i]);
         }

         auto end = std::chrono::high_resolution_clock::now();
         t2 = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
      }

      {
         auto start = std::chrono::high_resolution_clock::now();

         for (std::size_t i = 0; i < numbers.size(); ++i)
         {
            std::string s;
            std::format_to(std::back_inserter(s), "Number {} is great!", numbers[i]);
         }

         auto end = std::chrono::high_resolution_clock::now();
         t3 = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
      }

      {
         auto start = std::chrono::high_resolution_clock::now();

         for (std::size_t i = 0; i < numbers.size(); ++i)
         {
            std::string s = std::format("Number {} is great!", numbers[i]);
         }

         auto end = std::chrono::high_resolution_clock::now();
         t4 = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
      }

      std::println("{:<10} {:<12.2f} {:<7.2f} {:<9.2f} {:<7.2f}", count, t1/1000.0 / count, t2 / 1000.0 / count, t3 / 1000.0 / count, t4 / 1000.0 / count);
   }
}

The results vary slightly from one execution to another and, as one would expect, differ from machine to machine. On my machine, a 64-bit Release build produces the following results (average time per iteration, in microseconds):

iterations stringstream sprintf format_to format
---------- ------------ ------- --------- ------
1          29.60        11.80   1.80      0.60
2          10.00        4.20    0.55      0.50
5          1.56         0.56    0.34      0.26
10         1.61         1.15    0.26      0.31
100        1.15         0.28    0.22      0.26
1000       1.17         0.30    0.24      0.26
10000      1.29         0.28    0.23      0.24
100000     0.87         0.18    0.15      0.16
1000000    0.74         0.18    0.15      0.16

For a single iteration, sprintf is about 2.5 times faster than std::stringstream, while std::format_to and std::format are roughly 16x and 50x faster than std::stringstream, respectively, and roughly 6x and 20x faster than sprintf. These ratios shrink as more iterations are measured, but std::format is still about 5 times faster than std::stringstream and roughly on par with sprintf. Since in my case log messages are not generated in a loop, I can conclude that the speed-up can exceed an order of magnitude.
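
For reference, these are the speed-up ratios computed from the first and last rows of the table (times in microseconds, slower divided by faster):

```latex
\begin{aligned}
\text{1 iteration:}\quad
&\frac{29.60}{11.80} \approx 2.5 \;(\text{stringstream/sprintf}),\quad
\frac{29.60}{1.80} \approx 16.4 \;(\text{stringstream/format\_to}),\quad
\frac{29.60}{0.60} \approx 49.3 \;(\text{stringstream/format})\\
&\frac{11.80}{1.80} \approx 6.6 \;(\text{sprintf/format\_to}),\quad
\frac{11.80}{0.60} \approx 19.7 \;(\text{sprintf/format})\\
\text{1000000 iterations:}\quad
&\frac{0.74}{0.16} \approx 4.6 \;(\text{stringstream/format}),\quad
\frac{0.18}{0.16} \approx 1.1 \;(\text{sprintf/format})
\end{aligned}
```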

When two arguments are written to the output text, the numbers are similar. The program differs only slightly, generating text of the form "Numbers 42 and 43 are great!":

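#include <chrono>
#include <cstdio>
#include <format>
#include <iterator>
#include <print>
#include <random>
#include <sstream>
#include <string>
#include <vector>

int main()
{
   {
      // warm-up: use a string stream once before any timing starts
      std::stringstream ss;
      ss << 42;
   }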

   using namespace std::chrono_literals;

   std::random_device rd{};
   auto mtgen = std::mt19937{ rd() };
   auto ud = std::uniform_int_distribution<>{ -1000000, 1000000 };

   std::vector<int> iterations{ 1, 2, 5, 10, 100, 1000, 10000, 100000, 1000000 };

   std::println("{:>10} {:>12} {:>7} {:>9} {:>6}", "iterations", "stringstream", "sprintf", "format_to", "format");
   std::println("{:>10} {:>12} {:>7} {:>9} {:>6}", "----------", "------------", "-------", "---------", "------");

   for (int count : iterations)
   {
      std::vector<int> numbers(count);
      for (std::size_t i = 0; i < numbers.size(); ++i)
      {
         numbers[i] = ud(mtgen);
      }

      long long t1, t2, t3, t4;

      {
         auto start = std::chrono::high_resolution_clock::now();

         for (std::size_t i = 0; i < numbers.size(); ++i)
         {
            std::stringstream ss;
            ss << "Numbers " << numbers[i] << " and " << numbers[i] + 1 << " are great!";
            std::string s = ss.str();
         }

         auto end = std::chrono::high_resolution_clock::now();
         t1 = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
      }

      {
         auto start = std::chrono::high_resolution_clock::now();

         for (std::size_t i = 0; i < numbers.size(); ++i)
         {
            std::string str(100, '\0');
            sprintf(str.data(), "Numbers %d and %d are great!", numbers[i], numbers[i] + 1);
         }

         auto end = std::chrono::high_resolution_clock::now();
         t2 = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
      }

      {
         auto start = std::chrono::high_resolution_clock::now();

         for (std::size_t i = 0; i < numbers.size(); ++i)
         {
            std::string s;
            std::format_to(std::back_inserter(s), "Numbers {} and {} are great!", numbers[i], numbers[i] + 1);
         }

         auto end = std::chrono::high_resolution_clock::now();
         t3 = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
      }

      {
         auto start = std::chrono::high_resolution_clock::now();

         for (std::size_t i = 0; i < numbers.size(); ++i)
         {
            std::string s = std::format("Numbers {} and {} are great!", numbers[i], numbers[i] + 1);
         }

         auto end = std::chrono::high_resolution_clock::now();
         t4 = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
      }

      std::println("{:<10} {:<12.2} {:<7.2} {:<9.2} {:<7.2}", count, t1 / 1000.0 / count, t2 / 1000.0 / count, t3 / 1000.0 / count, t4 / 1000.0 / count);
   }
}

The results are in the same range as shown previously (although, again, they vary from one execution to another):

iterations stringstream sprintf format_to format
---------- ------------ ------- --------- ------
1          27           4.7     5.8       0.8
2          8.1          1.4     0.9       0.75
5          3.4          0.8     0.62      0.46
10         4.3          0.82    0.44      0.38
100        1.9          0.45    0.31      0.33
1000       1.9          0.46    0.37      0.35
10000      1.8          0.38    0.29      0.31
100000     1.3          0.26    0.22      0.24
1000000    1.2          0.27    0.23      0.25

In which compatibility of formatting is discussed…

Although moving from std::stringstream to std::format is straightforward in most cases, some things do not work the same way and therefore require extra work. Examples include formatting pointers and arrays of unsigned characters.

You can easily write the value of a pointer to an output buffer as follows:

int a = 42;

std::stringstream ss;
ss << "address=" << &a;
std::string text = ss.str();

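The text will have a form such as "address=00000004D4DAE218" (this is the Visual C++ output; how a stream writes a pointer is implementation-defined, and other standard libraries typically use a 0x prefix and lowercase digits). But this does not work with std::format: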

int a = 42;

std::string text = std::format("address={}", &a); // error, does not know how to format

This snippet will generate errors (which vary with the compiler) because std::format does not know how to format an int*. You can obtain the same result as before by treating the pointer as an integer value of type std::uintptr_t (from <cstdint>, an unsigned integer type large enough to hold a pointer) and using a format specifier such as :016X (16 uppercase hexadecimal digits with leading zeros):

std::string text = std::format("address={:016X}", reinterpret_cast<std::uintptr_t>(&a));

Now the result will be the same (keep in mind, though, that on 32-bit platforms pointers have only 8 hexadecimal digits, so a specifier such as :08X is the matching choice there).

Here is another example, with an array of unsigned characters, which std::stringstream writes as a null-terminated string of characters:

unsigned char str[]{3,4,5,6,0};

std::stringstream ss;
ss << "str=" << str;
std::string text = ss.str();

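The string text will contain "str=" followed by the characters with codes 3, 4, 5, and 6, which a console using code page 437 displays as "str=♥♦♣♠".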

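Trying the same with std::format fails again, because it does not know how to format the array (this is the C++20 behavior; with C++23 range formatting support, the call compiles but prints the elements as numbers, as in [3, 4, 5, 6, 0]):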

std::string text = std::format("str={}", str); // error, does not know how to format

We can write the content of the array in a loop, as follows:

std::string text = "str=";
for (auto c : str)
   std::format_to(std::back_inserter(text), "{}", c);

The content of text will be "str=34560", because std::format writes every unsigned char as a number, not as a character, and the loop also visits the terminating 0. To obtain the same result as before, you need to cast each element to char explicitly and stop at the terminating 0, which the stream operator does not write either:

std::string text = "str=";
for (auto c : str)
{
   if (c == 0)
      break;   // stop at the terminator, like the stream operator does
   std::format_to(std::back_inserter(text), "{}", static_cast<char>(c));
}

Bonus talk

If you format text only to write it to the console, passing the result of std::format / std::format_to to std::cout (or an alternative), there is no need for that in C++23, which introduced std::print and std::println:

int severity = 1;
unsigned error = 42;
std::string reason = "access denied";

std::println("severity={},error={},reason={}", severity, error, reason);
