The Regular Expression Library


My original plan was to write about the rules of the C++ Core Guidelines for the regex and chrono libraries, but besides the subsection title, there is no content available. I have already written a few posts about time functionality, so I'm done with that part. Today, I fill the gap and write about the regex library.




Okay, here are my rules for regular expressions.

Only use a Regular Expression if you have to

Regular expressions are powerful but also sometimes expensive and complicated machinery to work with text. When the interface of a std::string or the algorithms of the Standard Template Library can do the job, use them.  

Okay, but when should you use regular expressions? Here are the typical use-cases.

Use Cases for Regular Expressions

  • Check if a text matches a text pattern: std::regex_match
  • Search for a text pattern in a text: std::regex_search
  • Replace a text pattern with a text: std::regex_replace
  • Iterate through all text patterns in a text: std::regex_iterator and std::regex_token_iterator

I hope you noticed it. The operations work on text patterns and not on text.

First, you should use raw strings to write your regular expression.

Use Raw Strings for Regular Expressions

First of all, for simplicity, I will break the previous rule.

The regular expression for the text C++ is quite ugly: C\\+\\+. You have to use two backslashes for each + sign. First, the + sign is a special character in a regular expression. Second, the backslash is a special character in a string. Therefore, one backslash escapes the + sign, and the other backslash escapes the backslash.
By using a raw string literal, the second backslash is no longer necessary because the backslash is not interpreted in the string.

The following short example may not convince you.

std::string regExpr("C\\+\\+");
std::string regExprRaw(R"(C\+\+)");


Both strings stand for a regular expression that matches the text C++. Admittedly, the raw string R"(C\+\+)" is still quite ugly to read. R"(Raw String)" delimits the raw string. By the way, regular expressions and path names on Windows such as "C:\temp\newFile.txt" are typical use cases for raw strings.

Imagine you want to search for a floating point number in a text, which you identify by the following sequence of characters: Tabulator FloatingPointNumber Tabulator \\DELIMITER. Here is a concrete example of this pattern: "\t5.5\t\\DELIMITER".

The following program uses a regular expression encoded in a string and in a raw string to match this pattern.

// regexSearchFloatingPoint.cpp

#include <regex>
#include <iostream>
#include <string>

int main(){

    std::cout << std::endl;

    std::string text = "A text with floating point number \t5.5\t\\DELIMITER and more text.";
    std::cout << text << std::endl;
    std::cout << std::endl;

    std::regex rgx("\\t[0-9]+\\.[0-9]+\\t\\\\DELIMITER");          // (1) 
    std::regex rgxRaw(R"(\t[0-9]+\.[0-9]+\t\\DELIMITER)");         // (2) 

    if (std::regex_search(text, rgx)) std::cout << "found with rgx" << std::endl;
    if (std::regex_search(text, rgxRaw)) std::cout << "found with rgxRaw" << std::endl;

    std::cout << std::endl;

}

The regular expression rgx("\\t[0-9]+\\.[0-9]+\\t\\\\DELIMITER") is pretty ugly. Each backslash of the regular expression has to be written as two backslashes in the string literal (line 1), so matching the single literal backslash in front of DELIMITER takes four of them. In contrast, using a raw string to define a regular expression makes it possible to express the pattern you are looking for directly in the regular expression: rgxRaw(R"(\t[0-9]+\.[0-9]+\t\\DELIMITER)") (line 2). The subexpression [0-9]+\.[0-9]+ of the regular expression stands for a floating point number: at least one digit [0-9]+ followed by a dot \. followed by at least one digit [0-9]+.

Just for completeness, the output of the program.


Honestly, this example was rather simple. Most of the time, you want to analyse your match result.

For further analysis use your match_results

Using a regular expression typically consists of three steps. This holds for std::regex_search and std::regex_match.

  1. Define the regular expression.
  2. Store the result of the search.
  3. Analyse the result.

Let's see what that means. This time I want to find the first e-mail address in a text. The following regular expression (RFC 5322 Official Standard) for an e-mail address does not find all e-mail addresses because they are very irregular.



For readability, I made a line break in the regular expression. The first line matches the local part and the second line the domain part of the e-mail address. My program uses a simpler regular expression for matching an e-mail address. It's not perfect, but it will do its job. Additionally, I want to match the local part and the domain part of my e-mail address.

Here we are:

// regexSearchEmail.cpp

#include <regex>
#include <iostream>
#include <string>

int main(){

  std::cout << std::endl;

  // jane.doe@example.com is a placeholder address for illustration
  std::string emailText = "A text with an email address: jane.doe@example.com.";

  // (1) 
  std::string regExprStr(R"(([\w.%+-]+)@([\w.-]+\.[a-zA-Z]{2,4}))");
  std::regex rgx(regExprStr);

  // (2)
  std::smatch smatch;

  if (std::regex_search(emailText, smatch, rgx)){
    // (3)  

    std::cout << "Text: " << emailText << std::endl;
    std::cout << std::endl;
    std::cout << "Before the email address: " << smatch.prefix() << std::endl;
    std::cout << "After the email address: " << smatch.suffix() << std::endl;
    std::cout << std::endl;
    std::cout << "Length of email address: " << smatch.length() << std::endl;
    std::cout << std::endl;
    std::cout << "Email address: " << smatch[0] << std::endl;          // (6)
    std::cout << "Local part: " << smatch[1] << std::endl;             // (4)
    std::cout << "Domain name: " << smatch[2] << std::endl;            // (5)

  }

  std::cout << std::endl;

}


Lines 1, 2, and 3 mark the beginning of the three typical steps of using a regular expression. The regular expression in line 1 needs a few additional words.

Here it is: ([\w.%+-]+)@([\w.-]+\.[a-zA-Z]{2,4})

  • [\w.%+-]+: At least one of the following characters: "\w", ".", "%", "+", or "-". "\w" stands for a word character.
  • [\w.-]+\.[a-zA-Z]{2,4}: At least one of "\w", ".", or "-", followed by a dot ".", followed by 2 - 4 characters from the range a-z or the range A-Z.
  • (...)@(...): The parentheses stand for a capture group. They allow you to identify a submatch in a match. The first capture group (line 4) is the local part of an address. The second capture group (line 5) is the domain part of the e-mail address. You can address the entire match with the 0-th capture group (line 6).


The output of the program shows the detailed analysis.


What's next?

I'm not done. There is more to write about regular expressions in my next post. I will write about various types of text and iterating through all matches.

