{"id":5036,"date":"2016-11-21T07:18:44","date_gmt":"2016-11-21T07:18:44","guid":{"rendered":"https:\/\/www.modernescpp.com\/index.php\/type-traits-performance-matters\/"},"modified":"2023-06-26T12:35:40","modified_gmt":"2023-06-26T12:35:40","slug":"type-traits-performance-matters","status":"publish","type":"post","link":"https:\/\/www.modernescpp.com\/index.php\/type-traits-performance-matters\/","title":{"rendered":"Type-Traits: Performance Matters"},"content":{"rendered":"<p>If you look carefully, you see type-traits have a big optimization potential. The type-traits support the first step to analyze the code at the compile-time and, in the second step, to optimize the code based on that analysis. How is that possible? Dependent on the type of variable, a faster variant of an algorithm will be chosen.<\/p>\n<p><!--more--><\/p>\n<h2>Work on the entire memory area<\/h2>\n<p>The idea is quite straightforward and is used in current Standard Template Library implementations (STL) implementations. If the elements of a container are simple enough, the algorithm of the STL, like <span style=\"font-family: courier new,courier;\">std::copy, std::fill,<\/span> or&nbsp; <span style=\"font-family: courier new,courier;\">std::equal <\/span>will directly be applied to the memory area.&nbsp; Instead of using <span style=\"font-family: courier new,courier;\">std::copy<\/span> to copy the elements individually, all is done in one large step.&nbsp; Internally, C functions like<span style=\"font-family: courier new,courier;\"> <a href=\"http:\/\/en.cppreference.com\/w\/cpp\/string\/byte\/memcmp\">memcmp<\/a>, <a href=\"http:\/\/en.cppreference.com\/w\/cpp\/string\/byte\/memset\">memset<\/a>, <\/span><a href=\"http:\/\/en.cppreference.com\/w\/cpp\/string\/byte\/memcpy\"><span style=\"font-family: courier new,courier;\">memcpy,<\/span> <\/a>or <a href=\"http:\/\/en.cppreference.com\/w\/cpp\/string\/byte\/memmove\"><span style=\"font-family: courier new,courier;\">memmove<\/span><\/a> are used. The small difference between <span style=\"font-family: courier new,courier;\">memcpy<\/span> and <span style=\"font-family: courier new,courier;\">memmove<\/span> is that <span style=\"font-family: courier new,courier;\">memmove<\/span> can deal with overlapping memory areas. <span style=\"font-family: courier new,courier;\"><br \/><\/span><\/p>\n<p>The implementations of the algorithm <span style=\"font-family: courier new,courier;\">std::cop<\/span>y, <span style=\"font-family: courier new,courier;\">std::fill,<\/span> or<span style=\"font-family: courier new,courier;\"> std::equal<\/span> use a simple strategy.&nbsp; <span style=\"font-family: courier new,courier;\">std::copy<\/span> is like a wrapper. This wrapper checks if the element is simple enough. If so, the wrapper will delegate the&nbsp;work to the optimized copy function. If not, the general copy algorithm will be used. This one copies each element after the other. If the elements are simple enough, the functions of the type-traits library will be used to make the right decision.&nbsp;<\/p>\n<p>The graphic shows this strategy once more:<\/p>\n<p>&nbsp;<img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5033\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/ContainerVersusElement.png\" alt=\"ContainerVersusElement\" width=\"700\" height=\"526\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/ContainerVersusElement.png 806w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/ContainerVersusElement-300x226.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/ContainerVersusElement-768x577.png 768w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><\/p>\n<p>That was the theory, but here is the practice. Which strategy is used by <span style=\"font-family: courier new,courier;\">std::fill<\/span>?<\/p>\n<\/p>\n<h2>std::fill<\/h2>\n<p><a href=\"http:\/\/en.cppreference.com\/w\/cpp\/algorithm\/fill\"> std::fill&nbsp;<\/a>assigns each element in the range a value. The listing shows a simple implementation.<\/p>\n<div style=\"background: #ffffff none repeat scroll 0% 0%; overflow: auto; width: auto; border-width: 0.1em 0.1em 0.1em 0.8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0px; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n41\r\n42\r\n43\r\n44\r\n45\r\n46\r\n47\r\n48\r\n49\r\n50\r\n51<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0px; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ fill.cpp<\/span>\r\n \r\n<span style=\"color: #0000ff;\">#include &lt;cstring&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;chrono&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;type_traits&gt;<\/span>\r\n\r\n<span style=\"color: #0000ff;\">namespace<\/span> my{\r\n\r\n  <span style=\"color: #0000ff;\">template<\/span> &lt;<span style=\"color: #0000ff;\">typename<\/span> I, <span style=\"color: #0000ff;\">typename<\/span> T, <span style=\"color: #2b91af;\">bool<\/span> b&gt;\r\n  <span style=\"color: #2b91af;\">void<\/span> fill_impl(I first, I last, <span style=\"color: #0000ff;\">const<\/span> T&amp; val, <span style=\"color: #0000ff;\">const<\/span> std::integral_constant&lt;<span style=\"color: #2b91af;\">bool<\/span>, b&gt;&amp;){\r\n    <span style=\"color: #0000ff;\">while<\/span>(first != last){\r\n      *first = val;\r\n      ++first;\r\n    }\r\n  }\r\n\r\n  <span style=\"color: #0000ff;\">template<\/span> &lt;<span style=\"color: #0000ff;\">typename<\/span> T&gt;\r\n  <span style=\"color: #2b91af;\">void<\/span> fill_impl(T* first, T* last, <span style=\"color: #0000ff;\">const<\/span> T&amp; val, <span style=\"color: #0000ff;\">const<\/span> std::true_type&amp;){\r\n    std::memset(first, val, last-first);\r\n  }\r\n\r\n  <span style=\"color: #0000ff;\">template<\/span> &lt;<span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">I<\/span>, <span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">T<\/span>&gt;\r\n  <span style=\"color: #0000ff;\">inline<\/span> <span style=\"color: #2b91af;\">void<\/span> fill(I first, I last, <span style=\"color: #0000ff;\">const<\/span> T&amp; val){\r\n    <span style=\"color: #008000;\">\/\/ typedef std::integral_constant&lt;bool,std::has_trivial_copy_assign&lt;T&gt;::value &amp;&amp; (sizeof(T) == 1)&gt; boolType;<\/span>\r\n    <span style=\"color: #0000ff;\">typedef<\/span> std::integral_constant&lt;<span style=\"color: #2b91af;\">bool<\/span>,std::is_trivially_copy_assignable&lt;T&gt;::value &amp;&amp; (<span style=\"color: #0000ff;\">sizeof<\/span>(T) == 1)&gt; boolType;\r\n    fill_impl(first, last, val, boolType());\r\n  }\r\n}\r\n\r\n<span style=\"color: #0000ff;\">const<\/span> <span style=\"color: #2b91af;\">int<\/span> arraySize = 100000000;\r\n<span style=\"color: #2b91af;\">char<\/span> charArray1[arraySize]= {0,};\r\n<span style=\"color: #2b91af;\">char<\/span> charArray2[arraySize]= {0,};\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n  <span style=\"color: #0000ff;\">auto<\/span> begin= std::chrono::system_clock::now();\r\n  my::fill(charArray1, charArray1 + arraySize,1);\r\n  <span style=\"color: #0000ff;\">auto<\/span> last=  std::chrono::system_clock::now() - begin;\r\n  std::cout &lt;&lt;  <span style=\"color: #a31515;\">\"charArray1: \"<\/span> &lt;&lt; std::chrono::duration&lt;<span style=\"color: #2b91af;\">double<\/span>&gt;(last).count() &lt;&lt; <span style=\"color: #a31515;\">\" seconds\"<\/span> &lt;&lt; std::endl;\r\n\r\n  begin= std::chrono::system_clock::now();\r\n  my::fill(charArray2, charArray2 + arraySize, <span style=\"color: #0000ff;\">static_cast<\/span>&lt;<span style=\"color: #2b91af;\">char<\/span>&gt;(1));\r\n  last=  std::chrono::system_clock::now() - begin;\r\n  std::cout &lt;&lt;  <span style=\"color: #a31515;\">\"charArray2: \"<\/span> &lt;&lt; std::chrono::duration&lt;<span style=\"color: #2b91af;\">double<\/span>&gt;(last).count() &lt;&lt; <span style=\"color: #a31515;\">\" seconds\"<\/span> &lt;&lt; std::endl;\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p><span style=\"font-family: courier new,courier;\">my::fill<\/span> make in line 27 the decision which implementation of <span style=\"font-family: courier new,courier;\">my::fill_impl<\/span> is applied. To use the optimized variant, the elements should have a compiler-generated copy assignment operator <span style=\"font-family: courier new,courier;\">std::is_trivially_copy_assignable&lt;T&gt;<\/span> and should be 1 byte large: <span style=\"font-family: courier new,courier;\">sizeof(T) == 1. <\/span>The function<span style=\"font-family: courier new,courier;\"> std::is_trivially_copy_assignable <\/span>is part of the type-traits. I explain in the post <a href=\"https:\/\/www.modernescpp.com\/index.php\/check-types\">Check types<\/a> the magic behind the type-traits functions.<\/p>\n<p>My GCC 4.8 calls instead of the function <span style=\"font-family: courier new,courier;\">std::is_trivially_copy_assignable<\/span>&nbsp;<span style=\"font-family: courier new,courier;\">std::has_trivial_copy_assign<\/span>. If you request the copy assignment operator with the keyword default from the compiler, the operator will be trivial. <span style=\"font-family: courier new,courier;\"><\/span><\/p>\n<div style=\"background: #ffffff none repeat scroll 0% 0%; overflow: auto; width: auto; border-width: 0.1em 0.1em 0.1em 0.8em;\">\n<pre style=\"margin: 0px; line-height: 125%;\"><span style=\"color: #0000ff;\">struct<\/span> TrivCopyAssign{\r\n  TrivCopyAssign&amp; <span style=\"color: #0000ff;\">operator<\/span>=(<span style=\"color: #0000ff;\">const<\/span> TrivCopyAssign&amp; other)= <span style=\"color: #0000ff;\">default<\/span>;\r\n};<br \/>\r\n<\/pre>\n<\/div>\n<p>Back to the code example. If the expression <span style=\"font-family: courier new,courier;\">boolType()<\/span> in line 27 is <span style=\"font-family: courier new,courier;\">true,<\/span> the optimized version of <span style=\"font-family: courier new,courier;\">my::fill_impl<\/span> in lines 18 &#8211; 21 will be used. This variant fills in opposite to the generic variant <span style=\"font-family: courier new,courier;\">my::fill_impl<\/span> (line 10 -16) the entire memory area &#8211; consisting of 100 million entries &#8211; with the value 1.&nbsp; <span style=\"font-family: courier new,courier;\">sizeof(char)<\/span> is 1.<\/p>\n<p>What about the performance of the program? I compiled the program without optimization. The execution of the optimized variant is about 3 times faster on windows; about 20 times faster on Linux.<\/p>\n<h3>Microsoft Visual 15<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5034\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/fillWindows.PNG\" alt=\"fillWindows\" width=\"500\" height=\"141\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/fillWindows.PNG 661w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/fillWindows-300x85.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<h3>GCC 4.8<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5035\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/filLLinux.png\" alt=\"filLLinux\" width=\"431\" height=\"196\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/filLLinux.png 431w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/filLLinux-300x136.png 300w\" sizes=\"auto, (max-width: 431px) 100vw, 431px\" \/><\/p>\n<p>The decision of which variant of an algorithm should be used is sometimes not so easy to make.<\/p>\n<h2>std::equal<\/h2>\n<p>The implementer of<span style=\"font-family: courier new,courier;\"> std::equal<\/span> had special humor because he called his decision criteria <span style=\"font-family: courier new,courier;\">__simple.<\/span> The code is copied from the GCC 4.8 STL implementation.<\/p>\n<div style=\"background: #ffffff none repeat scroll 0% 0%; overflow: auto; width: auto; border-width: 0.1em 0.1em 0.1em 0.8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0px; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0px; line-height: 125%;\"><span style=\"color: #0000ff;\">template<\/span>&lt;<span style=\"color: #0000ff;\">typename<\/span> _II1, <span style=\"color: #0000ff;\">typename<\/span> _II2&gt;\r\n<span style=\"color: #0000ff;\">inline<\/span> <span style=\"color: #2b91af;\">bool<\/span> __equal_aux(_II1 __first1, _II1 __last1, _II2 __first2){\r\n  <span style=\"color: #0000ff;\">typedef<\/span> <span style=\"color: #0000ff;\">typename<\/span> iterator_traits&lt;_II1&gt;::value_type _ValueType1;\r\n  <span style=\"color: #0000ff;\">typedef<\/span> <span style=\"color: #0000ff;\">typename<\/span> iterator_traits&lt;_II2&gt;::value_type _ValueType2;\r\n  <span style=\"color: #0000ff;\">const<\/span> <span style=\"color: #2b91af;\">bool<\/span> __simple = ((__is_integer&lt;_ValueType1&gt;::__value\r\n                         || __is_pointer&lt;_ValueType1&gt;::__value )\r\n                        &amp;&amp; __is_pointer&lt;_II1&gt;::__value\r\n                        &amp;&amp; __is_pointer&lt;_II2&gt;::__value\r\n                        &amp;&amp;__are_same&lt;_ValueType1, _ValueType2&gt;::__value\r\n                        );\r\n  <span style=\"color: #0000ff;\">return<\/span> std::__equal&lt;__simple&gt;::equal(__first1, __last1, __first2);\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>I have a different perception of<span style=\"font-family: courier new,courier;\"> __simple<\/span>. To use the optimized variant of<span style=\"font-family: courier new,courier;\"> std::equal<\/span>, the container elements have to fulfil some assurances. The elements of the container have to be of the same type (line 9) and have to be an integral or a pointer (line 5 and 6). In addition, the iterators have to be pointers (lines 7 and 8).<span style=\"font-family: courier new,courier;\"><br \/><\/span><\/p>\n<h2>What&#8217;s next?<\/h2>\n<p>They didn&#8217;t make it in the C++98 standard. But we have them in C++11: hash tables. The official name is an unordered associative container. Unofficially, they are often called dictionaries. They promise one important feature: performance because their access time is constant in the optimal case.<\/p>\n<p>Why we need the unordered associative container? What makes them different from the C++98 ordered associate containers (<span style=\"font-family: courier new,courier;\">std::map, std::set, std::multimap, <\/span>and<span style=\"font-family: courier new,courier;\"> std::multiset<\/span>)? That&#8217;s the story of the<a href=\"https:\/\/www.modernescpp.com\/index.php\/hash-tables\"> next post.<\/a><span id=\"transmark\"><\/span><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p><strong><br \/><\/strong><span class=\"h3\"><\/span><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<\/p>\n<p><span style=\"font-family: courier new,courier;\"><\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>If you look carefully, you see type-traits have a big optimization potential. The type-traits support the first step to analyze the code at the compile-time and, in the second step, to optimize the code based on that analysis. How is that possible? Dependent on the type of variable, a faster variant of an algorithm will [&hellip;]<\/p>\n","protected":false},"author":21,"featured_media":5033,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[364],"tags":[419],"class_list":["post-5036","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-embedded","tag-type-traits"],"_links":{"self":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5036","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/comments?post=5036"}],"version-history":[{"count":1,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5036\/revisions"}],"predecessor-version":[{"id":6924,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5036\/revisions\/6924"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media\/5033"}],"wp:attachment":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media?parent=5036"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/categories?post=5036"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/tags?post=5036"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}