{"id":5740,"date":"2019-07-08T19:19:43","date_gmt":"2019-07-08T19:19:43","guid":{"rendered":"https:\/\/www.modernescpp.com\/index.php\/regular-expressions\/"},"modified":"2023-06-26T10:05:06","modified_gmt":"2023-06-26T10:05:06","slug":"regular-expressions","status":"publish","type":"post","link":"https:\/\/www.modernescpp.com\/index.php\/regular-expressions\/","title":{"rendered":"The Regular Expression Library"},"content":{"rendered":"<p>My original plan was to write about the rules of the C++ Core Guidelines for the regex and chrono library, but besides the subsection title, no content is available. I already wrote a few posts about&nbsp;<a href=\"https:\/\/www.modernescpp.com\/index.php\/tag\/time\">time <\/a>functionality. So I&#8217;m done. Today, I fill the gap and write about the regex library.<\/p>\n<p><!--more--><\/p>\n<div id=\"simple-translate\">&nbsp;<\/div>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5737\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/concept-18290_1280.jpg\" alt=\"concept 18290 1280\" width=\"500\" height=\"333\" style=\"display: block; margin-left: auto; margin-right: auto;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/concept-18290_1280.jpg 1280w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/concept-18290_1280-300x200.jpg 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/concept-18290_1280-1024x682.jpg 1024w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/concept-18290_1280-768x512.jpg 768w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>Okay, here are my rules for regular expressions.<\/p>\n<\/p>\n<h2>Only use a Regular Expression if you have to<\/h2>\n<p>Regular expressions are powerful but also sometimes expensive and complicated machinery to work with text. 
When the interface of a <span style=\"font-family: courier new, courier;\">std::string<\/span> or the algorithms of the Standard Template Library can do the job, use them.<\/p>\n<p>Okay, but when should you use regular expressions? Here are the typical use cases.<\/p>\n<h3>Use Cases for Regular Expressions<\/h3>\n<ul>\n<li>Check if a text matches a text pattern: <span style=\"font-family: courier new, courier;\">std::regex_match<\/span><\/li>\n<li>Search for a text pattern in a text: <span style=\"font-family: courier new, courier;\">std::regex_search<\/span><\/li>\n<li>Replace a text pattern with a text: <span style=\"font-family: courier new, courier;\">std::regex_replace<\/span><\/li>\n<li>Iterate through all matches of a text pattern in a text: <span style=\"font-family: courier new, courier;\">std::regex_iterator<\/span> and <span style=\"font-family: courier new, courier;\">std::regex_token_iterator<\/span><\/li>\n<\/ul>\n<p>I hope you noticed it: the operations work on text patterns and not on literal text.<\/p>\n<p>First, you should use raw strings to write your regular expression.<\/p>\n<h2>Use Raw Strings for Regular Expressions<\/h2>\n<p>First of all, for simplicity, I will break the previous rule.<\/p>\n<p>The regular expression for the text C++ is quite ugly: <span style=\"font-family: courier new, courier;\">C\\\\+\\\\+<\/span>. You have to use two backslashes for each + sign. First, the + sign is a special character in a regular expression. Second, the backslash is a special character in a string. 
Therefore, one backslash escapes the + sign; the other backslash escapes the backslash.<br \/>With a raw string literal, the second backslash is no longer necessary because the backslash is not interpreted in the string.<\/p>\n<p>The following short example may not convince you.<\/p>\n<div style=\"background: #f0f3f3; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\">std<span style=\"color: #555555;\">::<\/span>string regExpr(<span style=\"color: #cc3300;\">\"C<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\\\<\/span><span style=\"color: #cc3300;\">+<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\\\<\/span><span style=\"color: #cc3300;\">+\"<\/span>);\r\nstd<span style=\"color: #555555;\">::<\/span>string regExprRaw(R<span style=\"color: #cc3300;\">\"(C\\+\\+)\"<\/span>);\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Both strings stand for a regular expression that matches the text<span style=\"font-family: courier new, courier;\"> C++<\/span>. Admittedly, even the raw string R&#8221;(C\\+\\+)&#8221; is still quite ugly to read. <span style=\"font-family: courier new, courier; color: #ff0000;\"><strong>R&#8221;(<\/strong><\/span>Raw String<span style=\"font-family: courier new, courier; color: #ff0000;\"><strong>)&#8221;<\/strong> <\/span>delimits the raw string. By the way, regular expressions and path names on Windows such as <span style=\"font-family: courier new, courier;\">&#8220;C:\\temp\\newFile.txt&#8221;<\/span> are typical use cases for raw strings.<\/p>\n<p>Imagine you want to search for a floating-point number in a text, which you identify by the following sequence of characters: Tab FloatingPointNumber Tab \\\\DELIMITER. 
Here is a concrete example for this pattern: <span style=\"font-family: courier new, courier;\">&#8220;\\t5.5\\t\\\\DELIMITER<\/span>&#8220;.<\/p>\n<p>The following program uses a regular expression encoded in a string and a raw string to match this pattern.<\/p>\n<div style=\"background: #f0f3f3; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #0099ff; font-style: italic;\">\/\/ regexSearchFloatingPoint.cpp<\/span>\r\n\r\n<span style=\"color: #009999;\">#include &lt;regex&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;string&gt;<\/span>\r\n\r\n<span style=\"color: #007788; font-weight: bold;\">int<\/span> <span style=\"color: #cc00ff;\">main<\/span>(){\r\n\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n\r\n    std<span style=\"color: #555555;\">::<\/span>string text <span style=\"color: #555555;\">=<\/span> <span style=\"color: #cc3300;\">\"A text with floating pointer number <\/span><span style=\"color: #cc3300; font-weight: bold;\">\\t<\/span><span style=\"color: #cc3300;\">5.5<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\t\\\\<\/span><span style=\"color: #cc3300;\">DELIMITER and more text.\"<\/span>;\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> text <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n    \r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n\r\n    std<span style=\"color: #555555;\">::<\/span>regex rgx(<span style=\"color: #cc3300;\">\"<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\\\<\/span><span style=\"color: 
#cc3300;\">t[0-9]+<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\\\<\/span><span style=\"color: #cc3300;\">.[0-9]+<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\\\<\/span><span style=\"color: #cc3300;\">t<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\\\\\\\<\/span><span style=\"color: #cc3300;\">DELIMITER\"<\/span>);          <span style=\"color: #0099ff; font-style: italic;\">\/\/ (1)<\/span> \r\n    std<span style=\"color: #555555;\">::<\/span>regex rgxRaw(R<span style=\"color: #cc3300;\">\"(<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\t<\/span><span style=\"color: #cc3300;\">[0-9]+\\.[0-9]+<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\t\\\\<\/span><span style=\"color: #cc3300;\">DELIMITER)\"<\/span>);         <span style=\"color: #0099ff; font-style: italic;\">\/\/ (2)<\/span> \r\n\r\n    <span style=\"color: #006699; font-weight: bold;\">if<\/span> (std<span style=\"color: #555555;\">::<\/span>regex_search(text, rgx)) std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"found with rgx\"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n    <span style=\"color: #006699; font-weight: bold;\">if<\/span> (std<span style=\"color: #555555;\">::<\/span>regex_search(text, rgxRaw)) std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"found with rgxRaw\"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n\r\n}\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>The regular expression<strong><span style=\"color: #000000; font-family: 'courier new', courier;\"> 
rgx(&#8220;\\\\t[0-9]+\\\\.[0-9]+\\\\t\\\\\\\\DELIMITER&#8221;)<\/span> <\/strong>is pretty ugly. To find n &#8220;<strong>\\<\/strong>&#8221;-symbols (line 1), you have to write 2 * n &#8220;\\&#8221;-symbols. In contrast, using a raw string to define a regular expression makes it possible to express the pattern you are looking for directly in the regular expression: <strong><span style=\"color: #000000; font-family: 'courier new', courier;\">rgxRaw(R&#8221;(\\t[0-9]+\\.[0-9]+\\t\\\\DELIMITER)&#8221;) <\/span><\/strong><span style=\"color: #000000;\">(line 2). The subexpression <span style=\"font-family: courier new, courier;\"><strong>[0-9]+\\.[0-9]+<\/strong>&nbsp;<\/span>of the regular expression stands for a floating-point number: at least one digit <strong><span style=\"font-family: 'courier new', courier;\">[0-9]+<\/span> <\/strong>followed by a dot<strong><span style=\"font-family: 'courier new', courier;\"> \\.<\/span><\/strong> followed by at least one digit<strong><span style=\"font-family: 'courier new', courier;\"> [0-9]+<\/span><\/strong>.<span style=\"font-family: courier new, courier;\">&nbsp; <\/span><\/span><span style=\"color: #000000; font-family: courier new, courier;\"> <\/span><\/p>\n<p>Just for completeness, here is the output of the program.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5738\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/regexSearchFloatingPoint.png\" alt=\"regexSearchFloatingPoint\" width=\"550\" height=\"184\" style=\"display: block; margin-left: auto; margin-right: auto;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/regexSearchFloatingPoint.png 1214w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/regexSearchFloatingPoint-300x100.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/regexSearchFloatingPoint-1024x342.png 1024w, 
https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/regexSearchFloatingPoint-768x256.png 768w\" sizes=\"auto, (max-width: 550px) 100vw, 550px\" \/><\/p>\n<p>Honestly, this example was relatively simple. Most of the time, you want to analyze your match result.<\/p>\n<h2>For further analysis, use your <span style=\"font-family: courier new, courier;\">match_results<\/span><\/h2>\n<p>Using a regular expression typically consists of three steps. This holds for <span style=\"font-family: courier new, courier;\">std::regex_search<\/span> and <span style=\"font-family: courier new, courier;\">std::regex_match<\/span>.<\/p>\n<ol>\n<li>Define the regular expression.<\/li>\n<li>Store the result of the search.<\/li>\n<li>Analyze the result.<\/li>\n<\/ol>\n<p>Let&#8217;s see what that means. This time, I want to find the first e-mail address in a text. The following regular expression (RFC 5322 Official Standard) for an e-mail address does not find all e-mail addresses because e-mail addresses are very irregular.<\/p>\n<div style=\"background: #f0f3f3; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\"> \t\r\n(<span style=\"color: #555555;\">?:<\/span>[a<span style=\"color: #555555;\">-<\/span>z0<span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span><span style=\"color: #555555;\">!<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">#$<\/span><span style=\"color: #555555;\">%&amp;<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">'<\/span><span style=\"color: #555555;\">*+\/=?^<\/span>_<span style=\"color: #aa0000; background-color: #ffaaaa;\">`<\/span>{<span style=\"color: #555555;\">|<\/span>}<span style=\"color: #555555;\">~-<\/span>]<span style=\"color: #555555;\">+<\/span>(<span style=\"color: #555555;\">?:<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>.[a<span style=\"color: #555555;\">-<\/span>z0<span style=\"color: #555555;\">-<\/span><span style=\"color: 
#ff6600;\">9<\/span><span style=\"color: #555555;\">!<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">#$<\/span><span style=\"color: #555555;\">%&amp;<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">'<\/span><span style=\"color: #555555;\">*+\/=?^<\/span>_<span style=\"color: #aa0000; background-color: #ffaaaa;\">`<\/span>{<span style=\"color: #555555;\">|<\/span>}<span style=\"color: #555555;\">~-<\/span>]<span style=\"color: #555555;\">+<\/span>)<span style=\"color: #555555;\">*|<\/span><span style=\"color: #cc3300;\">\"(?:[<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\x01<\/span><span style=\"color: #cc3300;\">-<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\x08\\x0b\\x0c\\x0e<\/span><span style=\"color: #cc3300;\">-<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\x1f<\/span><span style=\"color: #cc3300;\">\\x21<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\x23<\/span><span style=\"color: #cc3300;\">-<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\x5b\\x5d<\/span><span style=\"color: #cc3300;\">-<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\x7f<\/span><span style=\"color: #cc3300;\">]|<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\\\<\/span><span style=\"color: #cc3300;\">[<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\x01<\/span><span style=\"color: #cc3300;\">-<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\x09\\x0b\\x0c\\x0e<\/span><span style=\"color: #cc3300;\">-<\/span><span style=\"color: #cc3300; font-weight: bold;\">\\x7f<\/span><span style=\"color: #cc3300;\">])*\"<\/span>)\r\n<span style=\"color: #aa0000; background-color: #ffaaaa;\">@<\/span>(<span style=\"color: #555555;\">?:<\/span>(<span style=\"color: #555555;\">?:<\/span>[a<span style=\"color: #555555;\">-<\/span>z0<span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span>](<span style=\"color: 
#555555;\">?:<\/span>[a<span style=\"color: #555555;\">-<\/span>z0<span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span><span style=\"color: #555555;\">-<\/span>]<span style=\"color: #555555;\">*<\/span>[a<span style=\"color: #555555;\">-<\/span>z0<span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span>])<span style=\"color: #555555;\">?<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>.)<span style=\"color: #555555;\">+<\/span>[a<span style=\"color: #555555;\">-<\/span>z0<span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span>](<span style=\"color: #555555;\">?:<\/span>[a<span style=\"color: #555555;\">-<\/span>z0<span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span><span style=\"color: #555555;\">-<\/span>]<span style=\"color: #555555;\">*<\/span>[a<span style=\"color: #555555;\">-<\/span>z0<span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span>])<span style=\"color: #555555;\">?|<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>[(<span style=\"color: #555555;\">?:<\/span>(<span style=\"color: #555555;\">?:<\/span><span style=\"color: #ff6600;\">25<\/span>[<span style=\"color: #ff6600;\">0<\/span><span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">5<\/span>]<span style=\"color: #555555;\">|<\/span><span style=\"color: #ff6600;\">2<\/span>[<span style=\"color: #ff6600;\">0<\/span><span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">4<\/span>][<span style=\"color: #ff6600;\">0<\/span><span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span>]<span style=\"color: #555555;\">|<\/span>[<span style=\"color: #ff6600;\">01<\/span>]<span style=\"color: #555555;\">?<\/span>[<span style=\"color: #ff6600;\">0<\/span><span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span>][<span style=\"color: 
#ff6600;\">0<\/span><span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span>]<span style=\"color: #555555;\">?<\/span>)<span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>.){<span style=\"color: #ff6600;\">3<\/span>}(<span style=\"color: #555555;\">?:<\/span><span style=\"color: #ff6600;\">25<\/span>[<span style=\"color: #ff6600;\">0<\/span><span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">5<\/span>]<span style=\"color: #555555;\">|<\/span><span style=\"color: #ff6600;\">2<\/span>[<span style=\"color: #ff6600;\">0<\/span><span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">4<\/span>][<span style=\"color: #ff6600;\">0<\/span><span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span>]<span style=\"color: #555555;\">|<\/span>[<span style=\"color: #ff6600;\">01<\/span>]<span style=\"color: #555555;\">?<\/span>[<span style=\"color: #ff6600;\">0<\/span><span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span>][<span style=\"color: #ff6600;\">0<\/span><span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span>]<span style=\"color: #555555;\">?|<\/span>[a<span style=\"color: #555555;\">-<\/span>z0<span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span><span style=\"color: #555555;\">-<\/span>]<span style=\"color: #555555;\">*<\/span>[a<span style=\"color: #555555;\">-<\/span>z0<span style=\"color: #555555;\">-<\/span><span style=\"color: #ff6600;\">9<\/span>]<span style=\"color: #555555;\">:<\/span>(<span style=\"color: #555555;\">?:<\/span>[<span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x01<span style=\"color: #555555;\">-<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x08<span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x0b<span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x0c<span style=\"color: 
#aa0000; background-color: #ffaaaa;\">\\<\/span>x0e<span style=\"color: #555555;\">-<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x1f<span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x21<span style=\"color: #555555;\">-<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x5a<span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x53<span style=\"color: #555555;\">-<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x7f]<span style=\"color: #555555;\">|<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">\\\\<\/span>[<span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x01<span style=\"color: #555555;\">-<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x09<span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x0b<span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x0c<span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x0e<span style=\"color: #555555;\">-<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>x7f])<span style=\"color: #555555;\">+<\/span>)<span style=\"color: #aa0000; background-color: #ffaaaa;\">\\<\/span>])\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>For readability, I made a line break in the regular expression. The first line matches the local part, and the second line the domain part of the e-mail address. My program uses a more straightforward regular expression for matching an e-mail address. It&#8217;s not perfect, but it will do its job. 
Additionally, I want to match the local part and the domain part of my e-mail address.<\/p>\n<p>Here we are:<\/p>\n<div style=\"background: #f0f3f3; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #0099ff; font-style: italic;\">\/\/ regexSearchEmail.cpp<\/span>\r\n\r\n<span style=\"color: #009999;\">#include &lt;regex&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;string&gt;<\/span>\r\n\r\n<span style=\"color: #007788; font-weight: bold;\">int<\/span> <span style=\"color: #cc00ff;\">main<\/span>(){\r\n\r\n  std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n\r\n  std<span style=\"color: #555555;\">::<\/span>string emailText <span style=\"color: #555555;\">=<\/span> <span style=\"color: #cc3300;\">\"A text with an email address: rainer@grimm-jaud.de.\"<\/span>;\r\n\r\n  <span style=\"color: #0099ff; font-style: italic;\">\/\/ (1) <\/span>\r\n  std<span style=\"color: #555555;\">::<\/span>string regExprStr(R<span style=\"color: #cc3300;\">\"(([\\w.%+-]+)@([\\w.-]+\\.[a-zA-Z]{2,4}))\"<\/span>);\r\n  std<span style=\"color: #555555;\">::<\/span>regex rgx(regExprStr);\r\n\r\n  <span style=\"color: #0099ff; font-style: italic;\">\/\/ (2)<\/span>\r\n  std<span style=\"color: #555555;\">::<\/span>smatch smatch;\r\n\r\n  <span style=\"color: #006699; font-weight: bold;\">if<\/span> (std<span style=\"color: #555555;\">::<\/span>regex_search(emailText, smatch, rgx)){\r\n      \r\n    <span style=\"color: #0099ff; font-style: italic;\">\/\/ (3)  <\/span>\r\n\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"Text: \"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> emailText <span style=\"color: 
#555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"Before the email address: \"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> smatch.prefix() <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"After the email address: \"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> smatch.suffix() <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"Length of email address: \"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> smatch.length() <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"Email address: \"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> smatch[<span style=\"color: #ff6600;\">0<\/span>] <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;          <span style=\"color: #0099ff; font-style: 
italic;\">\/\/ (6)<\/span>\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"Local part: \"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> smatch[<span style=\"color: #ff6600;\">1<\/span>] <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;             <span style=\"color: #0099ff; font-style: italic;\">\/\/ (4)<\/span>\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"Domain name: \"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> smatch[<span style=\"color: #ff6600;\">2<\/span>] <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;            <span style=\"color: #0099ff; font-style: italic;\">\/\/ (5)<\/span>\r\n\r\n  }\r\n\r\n  std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n\r\n}\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Lines 1, 2, and 3 begin the three typical steps of using a regular expression. 
The regular expression in line 1 needs a few additional words.<\/p>\n<p>Here it is: <span style=\"color: #000000; font-family: courier new, courier;\">([\\w.%+-]+)@([\\w.-]+\\.[a-zA-Z]{2,4})<\/span><\/p>\n<ul>\n<li><span style=\"color: #000000; font-family: courier new, courier;\"><strong>[\\w.%+-]+<\/strong>: <\/span><span style=\"color: #000000;\">At least one of the following characters:<\/span><span style=\"color: #000000;\"> <\/span><strong><span style=\"color: #000000;\"><\/span><span style=\"color: #000000; font-family: 'courier new', courier;\">&#8220;\\w&#8221;,<\/span><span style=\"color: #000000;\"><\/span><span style=\"color: #000000; font-family: 'courier new', courier;\"> &#8220;.&#8221;, <\/span><span style=\"color: #000000;\"><\/span><span style=\"color: #000000; font-family: 'courier new', courier;\">&#8220;%&#8221;, <\/span><span style=\"color: #000000;\"><\/span><span style=\"color: #000000; font-family: 'courier new', courier;\">&#8220;+&#8221;, <\/span><\/strong><span style=\"color: #000000;\">or&nbsp;<\/span><strong><span style=\"color: #000000; font-family: 'courier new', courier;\">&#8220;-&#8221;. 
&#8220;\\w&#8221; <\/span><\/strong><span style=\"color: #000000;\">stands for a word character (a letter, a digit, or an underscore).<\/span><span style=\"color: #000000; font-family: courier new, courier;\"><br \/><\/span><\/li>\n<li><span style=\"color: #000000; font-family: courier new, courier;\"><strong>[\\w.-]+\\.[a-zA-Z]{2,4}<\/strong>: <\/span><span style=\"color: #000000;\">At least one of the characters<\/span><span style=\"color: #000000; font-family: courier new, courier;\"><strong> &#8220;\\w&#8221;, &#8220;.&#8221;, &#8220;-&#8221;<\/strong>,<\/span><span style=\"color: #000000;\"> followed by a dot<\/span><span style=\"color: #000000; font-family: courier new, courier;\"><strong> &#8220;.&#8221;<\/strong>, <\/span><span style=\"color: #000000;\">followed by <strong>2 &#8211; 4<\/strong> characters from the range <strong>a-z<\/strong> or <strong>A-Z.<\/strong><\/span><span style=\"color: #000000; font-family: courier new, courier;\"><br \/><\/span><\/li>\n<li><span style=\"color: #000000; font-family: courier new, courier;\"><strong>(&#8230;)@(&#8230;)<\/strong>: <\/span><span style=\"color: #000000;\">Each pair of parentheses defines a capture group. Capture groups allow you to identify a sub-match in a match. The first capture group (line 4) is the local part of the e-mail address. The second capture group (line 5) is the domain part of the e-mail address. 
You can address the entire match with the 0th capture group (line 6).<\/span><span style=\"color: #000000; font-family: courier new, courier;\"><br \/><\/span><\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<p>The output of the program shows a detailed analysis.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5739\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/regexSearchEmail.png\" alt=\"regexSearchEmail\" width=\"500\" height=\"281\" style=\"display: block; margin-left: auto; margin-right: auto;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/regexSearchEmail.png 1132w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/regexSearchEmail-300x168.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/regexSearchEmail-1024x574.png 1024w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2019\/07\/regexSearchEmail-768x431.png 768w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<h2>What&#8217;s next?<\/h2>\n<p>I&#8217;m not done. There is more to write about regular expressions in my <a href=\"https:\/\/www.modernescpp.com\/index.php\/more-rules-to-the-regular-expression-library\">next post<\/a>. I will write about various types of text and how to iterate through all matches.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>My original plan was to write about the rules of the C++ Core Guidelines for the regex and chrono library, but besides the subsection title, no content is available. I already wrote a few posts about&nbsp;time functionality. So I&#8217;m done. 
Today, I fill the gap and write about the regex library.<\/p>\n","protected":false},"author":21,"featured_media":5737,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[372],"tags":[469],"class_list":["post-5740","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-modern-c","tag-regular-expressions"],"_links":{"self":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5740","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/comments?post=5740"}],"version-history":[{"count":1,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5740\/revisions"}],"predecessor-version":[{"id":6780,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5740\/revisions\/6780"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media\/5737"}],"wp:attachment":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media?parent=5740"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/categories?post=5740"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/tags?post=5740"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}