C++ Core Guidelines: Rules for Strings
The C++ Core Guidelines use the term string for a sequence of characters. Consequently, the guidelines are about C-strings, C++-strings, the C++17 std::string_views, and std::bytes.
In this post, I will only loosely refer to the guidelines and ignore the strings that are part of the guidelines support library, such as gsl::string_span, zstring, and czstring. For short, in this post I call a std::string a C++-string and a const char* a C-string.
Let me start with the first rule:
SL.str.1: Use std::string to own character sequences
Maybe you know another string that owns its character sequence: a C-string. Don’t use a C-string! Why? Because you have to take care of the memory management, the string termination character, and the string length.
    // stringC.c

    #include <stdio.h>
    #include <string.h>

    int main( void ){

      char text[10];

      strcpy(text, "The Text is too long for text.");   // (1) the literal is too long for text
      printf("strlen(text): %zu\n", strlen(text));      // (2) text has no termination character '\0'
      printf("%s\n", text);

      text[sizeof(text)-1] = '\0';
      printf("strlen(text): %zu\n", strlen(text));

      return 0;
    }
The simple program stringC.c has undefined behavior in line (1) and line (2). Compiling it with a rusty GCC 4.8 seems to work fine.
The C++ variant does not have the same issues.
    // stringCpp.cpp

    #include <iostream>
    #include <string>

    int main(){

      std::string text{"The Text is not too long."};

      std::cout << "text.size(): " << text.size() << std::endl;
      std::cout << text << std::endl;

      text +=" And can still grow!";

      std::cout << "text.size(): " << text.size() << std::endl;
      std::cout << text << std::endl;

    }
The output of the program should not surprise you.
In the case of a C++ string, I cannot make these errors because the std::string itself takes care of the memory management and the termination character. Additionally, if you access the elements of the C++ string with the member function at instead of the index operator, an out-of-range index throws a std::out_of_range exception instead of causing undefined behavior. You can read the details of at in my previous post: C++ Core Guidelines: Avoid Bounds Errors.
Do you know what was strange in C++, including C++11? There was no way to write a C++-string literal; you always had to start from a C-string. This is strange because we want to get rid of the C-string. This inconsistency is gone with C++14.
SL.str.12: Use the s suffix for string literals meant to be standard-library strings
With C++14, we got C++-string literals. A C++-string literal is a C-string literal with the suffix s: "cStringLiteral"s.
Let me show you an example that makes my point: C-string literals and C++-string literals are different.
    // stringLiteral.cpp

    #include <iostream>
    #include <string>
    #include <utility>

    int main(){

      using namespace std::string_literals;                         // (1)

      std::string hello = "hello";                                  // (2)

      auto firstPair = std::make_pair(hello, 5);
      auto secondPair = std::make_pair("hello", 15);                // (3)
      // auto secondPair = std::make_pair("hello"s, 15);            // (4)

      if (firstPair < secondPair) std::cout << "true" << std::endl; // (5)

    }
It’s a pity that I must make the namespace std::string_literals available in line (1) to use the C++-string literals. Line (2) is the critical line in the example. I use the C-string literal "hello" to create a C++ string. This is why the type of firstPair is std::pair<std::string, int>, but the type of secondPair is std::pair<const char*, int>. Ultimately, the comparison in line (5) fails to compile because the two pairs have different types. Look carefully at the last line of the compiler’s error message.
When I use the C++-string literal in line (4) instead of the C-string literal in line (3), the program compiles and behaves as expected: it prints true.
C++-string literals were a C++14 feature. Let’s jump three years further. With C++17, we got std::string_view and std::byte. I have already written about std::string_view in particular. Therefore, I will only recap the most important facts.
SL.str.2: Use std::string_view or gsl::string_span to refer to character sequences
Okay, a std::string_view only refers to the character sequence. To say it more explicitly: A std::string_view does not own the character sequence. It represents a view of a sequence of characters. This sequence of characters can be a C++ string or a C-string. A std::string_view only needs two pieces of information: the pointer to the character sequence and its length. It supports the reading part of the interface of the std::string. In addition, std::string_view has two operations that std::string lacks: remove_prefix and remove_suffix. Both shrink the view; they never modify the characters it refers to.
Maybe you wonder: Why do we need a std::string_view? A std::string_view is cheap to copy and allocates no memory. My previous post C++17 – Avoid Copying with std::string_view shows the impressive performance numbers of a std::string_view.
As I already mentioned, with C++17 we also got std::byte.
SL.str.4: Use char* to refer to a single character and SL.str.5: Use std::byte to refer to byte values that do not necessarily represent characters
If you don’t follow rule str.4 and use const char* as a C-string, you may end up with critical issues.
    #include <iostream>

    char arr[] = {'a', 'b', 'c'};    // no terminating '\0'

    void print(const char* p)
    {
        std::cout << p << '\n';
    }

    void use()
    {
        print(arr);   // run-time error; potentially very bad
    }
arr decays to a pointer when used as an argument of the function print. Because arr is not zero-terminated, the output operator reads past the end of the array, which is undefined behavior. You’re mistaken if you now think you can use std::byte as a character.
std::byte is a distinct type implementing the concept of a byte as specified in the C++ language definition. This means a std::byte is neither an integer nor a character, so it is not open to the programmer errors that come from accidentally treating raw memory as a number or as text. Its job is to access object storage. Consequently, its interface consists only of operators for bitwise operations.
    namespace std {

        template <class IntType>
            constexpr byte operator<<(byte b, IntType shift);
        template <class IntType>
            constexpr byte operator>>(byte b, IntType shift);
        constexpr byte operator|(byte l, byte r);
        constexpr byte operator&(byte l, byte r);
        constexpr byte operator~(byte b);
        constexpr byte operator^(byte l, byte r);

    }
You can use the function template std::to_integer<IntegerType>(b) to convert a std::byte b to an integer type and the call std::byte{integer} to do it the other way around. integer has to be a non-negative value not greater than std::numeric_limits<unsigned char>::max().
What’s next?
I’m almost done with the rules for the standard library. Only a few rules for iostreams and the C standard library are left. So you know what I will write about in my next post.
Rainer Grimm