{"id":5424,"date":"2018-04-13T11:39:05","date_gmt":"2018-04-13T11:39:05","guid":{"rendered":"https:\/\/www.modernescpp.com\/index.php\/c-core-guidelines-the-remaining-rules-to-performance\/"},"modified":"2023-06-26T11:52:47","modified_gmt":"2023-06-26T11:52:47","slug":"c-core-guidelines-the-remaining-rules-to-performance","status":"publish","type":"post","link":"https:\/\/www.modernescpp.com\/index.php\/c-core-guidelines-the-remaining-rules-to-performance\/","title":{"rendered":"C++ Core Guidelines: The Remaining Rules about Performance"},"content":{"rendered":"<p>Today, I will write about the remaining ten rules of performance. Ten rules seem to be a lot, but only two have content.<\/p>\n<p><!--more--><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5408\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/03\/cheetah.jpg\" alt=\"\" style=\"display: block; margin-left: auto; margin-right: auto;\" data-alt=\"cheetah\" width=\"640\" height=\"407\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/03\/cheetah.jpg 640w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/03\/cheetah-300x191.jpg 300w\" sizes=\"auto, (max-width: 640px) 100vw, 640px\" \/><\/p>\n<p>Here are the remaining rules to performance:<\/p>\n<ul>\n<li><a href=\"http:\/\/isocpp.github.io\/CppCoreGuidelines\/CppCoreGuidelines#Rper-Comp\">Per.11: Move computation from run time to compile time<\/a><\/li>\n<li><a href=\"http:\/\/isocpp.github.io\/CppCoreGuidelines\/CppCoreGuidelines#Rper-alias\">Per.12: Eliminate redundant aliases<\/a><\/li>\n<li><a href=\"http:\/\/isocpp.github.io\/CppCoreGuidelines\/CppCoreGuidelines#Rper-indirect\">Per.13: Eliminate redundant indirections<\/a><\/li>\n<li><a href=\"http:\/\/isocpp.github.io\/CppCoreGuidelines\/CppCoreGuidelines#Rper-alloc\">Per.14: Minimize the number of allocations and deallocations<\/a><\/li>\n<li><a href=\"http:\/\/isocpp.github.io\/CppCoreGuidelines\/CppCoreGuidelines#Rper-alloc0\">Per.15: Do not allocate on a critical branch<\/a><\/li>\n<li><a href=\"http:\/\/isocpp.github.io\/CppCoreGuidelines\/CppCoreGuidelines#Rper-compact\">Per.16: Use compact data structures<\/a><\/li>\n<li><a href=\"http:\/\/isocpp.github.io\/CppCoreGuidelines\/CppCoreGuidelines#Rper-struct\">Per.17: Declare the most-used member of a time-critical struct first<\/a><\/li>\n<li><a href=\"http:\/\/isocpp.github.io\/CppCoreGuidelines\/CppCoreGuidelines#Rper-space\">Per.18: Space is time<\/a><\/li>\n<li><a href=\"http:\/\/isocpp.github.io\/CppCoreGuidelines\/CppCoreGuidelines#Rper-access\">Per.19: Access memory predictably<\/a><\/li>\n<li><a href=\"http:\/\/isocpp.github.io\/CppCoreGuidelines\/CppCoreGuidelines#Rper-context\">Per.30: Avoid context switches on the critical path<\/a><\/li>\n<\/ul>\n<p>Let&#8217;s have a closer look at rule Per.11 and rule Per.19.<\/p>\n<h3><a href=\"http:\/\/isocpp.github.io\/CppCoreGuidelines\/CppCoreGuidelines#Rper-Comp\">Per.11: Move computation from run time to compile time<\/a><\/h3>\n<p>The example shows the simple gcd algorithm calculates the greatest common divisior at runtime. gcd uses for its calculation the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Euclidean_algorithm\">euclidean algorithm<\/a>.<\/p>\n<div style=\"background: #f0f3f3; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #007788; font-weight: bold;\">int<\/span> <span style=\"color: #cc00ff;\">gcd<\/span>(<span style=\"color: #007788; font-weight: bold;\">int<\/span> a, <span style=\"color: #007788; font-weight: bold;\">int<\/span> b){\r\n    <span style=\"color: #006699; font-weight: bold;\">while<\/span> (b <span style=\"color: #555555;\">!=<\/span> <span style=\"color: #ff6600;\">0<\/span>){\r\n        <span style=\"color: #006699; font-weight: bold;\">auto<\/span> t<span style=\"color: #555555;\">=<\/span> b;\r\n        b<span style=\"color: #555555;\">=<\/span> a <span style=\"color: #555555;\">%<\/span> b;\r\n        a<span style=\"color: #555555;\">=<\/span> t;\r\n    }\r\n    <span style=\"color: #006699; font-weight: bold;\">return<\/span> a;\r\n}\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>By declaring <span style=\"font-family: courier\\ new, courier;\">gcd<\/span> as <span style=\"font-family: courier\\ new, courier;\">constexpr<\/span> <span style=\"font-family: courier\\ new, courier;\">gcd<\/span> can quite easily make a&nbsp; function that can run at compile time. There are only a few restrictions on <span style=\"font-family: courier\\ new, courier;\">constexpr<\/span> functions in C++14. <span style=\"font-family: courier\\ new, courier;\">gcd<\/span> must not use static or <span style=\"font-family: courier\\ new, courier;\">thread_local<\/span> variables, exception handling, or goto statements, and all variables must be initialized. Additionally, variables have to be of<a href=\"http:\/\/en.cppreference.com\/w\/cpp\/concept\/LiteralType\"> literal type. <\/a><\/p>\n<p>Let&#8217;s try it out.<\/p>\n<div style=\"background: #f0f3f3; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #0099ff; font-style: italic;\">\/\/ gcd.cpp<\/span>\r\n\r\n<span style=\"color: #009999;\">#include &lt;iostream&gt;<\/span>\r\n\r\nconstexpr <span style=\"color: #007788; font-weight: bold;\">int<\/span> <span style=\"color: #cc00ff;\">gcd<\/span>(<span style=\"color: #007788; font-weight: bold;\">int<\/span> a, <span style=\"color: #007788; font-weight: bold;\">int<\/span> b){\r\n    <span style=\"color: #006699; font-weight: bold;\">while<\/span> (b <span style=\"color: #555555;\">!=<\/span> <span style=\"color: #ff6600;\">0<\/span>){\r\n        <span style=\"color: #006699; font-weight: bold;\">auto<\/span> t<span style=\"color: #555555;\">=<\/span> b;\r\n        b<span style=\"color: #555555;\">=<\/span> a <span style=\"color: #555555;\">%<\/span> b;\r\n        a<span style=\"color: #555555;\">=<\/span> t;\r\n    }\r\n    <span style=\"color: #006699; font-weight: bold;\">return<\/span> a;\r\n}\r\n\r\n<span style=\"color: #007788; font-weight: bold;\">int<\/span> <span style=\"color: #cc00ff;\">main<\/span>(){\r\n\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n\r\n    constexpr <span style=\"color: #006699; font-weight: bold;\">auto<\/span> res <span style=\"color: #555555;\">=<\/span> gcd(<span style=\"color: #ff6600;\">121<\/span>, <span style=\"color: #ff6600;\">11<\/span>);            <span style=\"color: #006699; font-weight: bold;\">\/\/ (1)<\/span>\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"gcd(121, 11) = \"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> res <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n\r\n    <span style=\"color: #006699; font-weight: bold;\">auto<\/span> val <span style=\"color: #555555;\">=<\/span> <span style=\"color: #ff6600;\">121<\/span>;                               <span style=\"color: #006699; font-weight: bold;\">\/\/ (3)<\/span>\r\n    <span style=\"color: #006699; font-weight: bold;\">auto<\/span> res2 <span style=\"color: #555555;\">=<\/span> gcd(val, <span style=\"color: #ff6600;\">11<\/span>);                     <span style=\"color: #006699; font-weight: bold;\">\/\/ (2)<\/span>\r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"gcd(val, 11) = \"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> res2 <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n  \r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n\r\n}\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Declaring <span style=\"font-family: courier\\ new, courier;\">gcd<\/span> to a <span style=\"font-family: courier\\ new, courier;\">constexpr<\/span> function does not mean that it has to run at compile time. It means that <span style=\"font-family: courier\\ new, courier;\">gcd<\/span> has the potential to run at compile time. A constexpr function has to be executed at compile if used in a constant expression. (1) is a constant expression because I ask for the result with a <span style=\"font-family: courier\\ new, courier;\">contexpr<\/span> variable <span style=\"font-family: courier\\ new, courier;\">res&nbsp;<\/span>(2). (3) is not a constant expression because <span style=\"font-family: courier\\ new, courier;\">res2<\/span> is not. When I change <span style=\"font-family: courier\\ new, courier;\">res2<\/span> to <span style=\"font-family: courier\\ new, courier;\">constexpr auto res2<\/span>, I will get an error: <span style=\"font-family: courier\\ new, courier;\">val<\/span> is not a constant expression. Here is the output of the program.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4980\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/10\/gcd.png\" alt=\"gcd\" width=\"300\" height=\"187\" style=\"display: block; margin-left: auto; margin-right: auto;\" \/><\/p>\n<p>Once more, here is the critical observation. <strong>You can use a <span style=\"font-family: courier\\ new, courier;\">constexpr<\/span> function at runtime and at compile time. To use it at compile, its arguments have to be constant expressions.<\/strong><\/p>\n<p>Don&#8217;t believe that line (1) will be executed at compile time? Here is the proof: the assembler instructions to line (1) generated from GCC 7.3 with maximum optimization level. I created the output with the help of the <a href=\"https:\/\/godbolt.org\/\">Compiler Explorer<\/a> from Matt Godbolt.&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5418\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/gcdAssembler.png\" alt=\"gcdAssembler\" style=\"display: block; margin-left: auto; margin-right: auto;\" width=\"599\" height=\"60\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/gcdAssembler.png 599w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/gcdAssembler-300x30.png 300w\" sizes=\"auto, (max-width: 599px) 100vw, 599px\" \/><\/p>\n<p>The function call&nbsp;<span style=\"font-family: courier\\ new, courier;\">gcd(121, 11)<\/span> boils down to its result: 11.<\/p>\n<p>Templates are often used to decide at compile time. There is an excellent example of this idea in the C++ Core Guidelines. A popular technique is to provide a handle for storing small objects on the stack and big objects on the heap. Here is an example:<\/p>\n<p>&nbsp;<\/p>\n<div style=\"background: #f0f3f3; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #006699; font-weight: bold;\">template<\/span><span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #006699; font-weight: bold;\">typename<\/span> T<span style=\"color: #555555;\">&gt;<\/span>\r\n<span style=\"color: #006699; font-weight: bold;\">struct<\/span> Scoped {     <span style=\"color: #0099ff; font-style: italic;\">\/\/ store a T in Scoped<\/span>\r\n        <span style=\"color: #0099ff; font-style: italic;\">\/\/ ...<\/span>\r\n    T obj;\r\n};\r\n\r\n<span style=\"color: #006699; font-weight: bold;\">template<\/span><span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #006699; font-weight: bold;\">typename<\/span> T<span style=\"color: #555555;\">&gt;<\/span>\r\n<span style=\"color: #006699; font-weight: bold;\">struct<\/span> On_heap {    <span style=\"color: #0099ff; font-style: italic;\">\/\/ store a T on the free store<\/span>\r\n        <span style=\"color: #0099ff; font-style: italic;\">\/\/ ...<\/span>\r\n        T<span style=\"color: #555555;\">*<\/span> objp;\r\n};\r\n\r\n<span style=\"color: #006699; font-weight: bold;\">template<\/span><span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #006699; font-weight: bold;\">typename<\/span> T<span style=\"color: #555555;\">&gt;<\/span>\r\n<span style=\"color: #006699; font-weight: bold;\">using<\/span> Handle <span style=\"color: #555555;\">=<\/span> <span style=\"color: #006699; font-weight: bold;\">typename<\/span> std<span style=\"color: #555555;\">::<\/span>conditional<span style=\"color: #555555;\">&lt;<\/span>(<span style=\"color: #006699; font-weight: bold;\">sizeof<\/span>(T) <span style=\"color: #555555;\">&lt;=<\/span> on_stack_max),  <span style=\"color: #0099ff; font-style: italic;\">\/\/ (1)<\/span>\r\n                    Scoped<span style=\"color: #555555;\">&lt;<\/span>T<span style=\"color: #555555;\">&gt;<\/span>,      <span style=\"color: #0099ff; font-style: italic;\">\/\/ first alternative                  (2)<\/span>\r\n                    On_heap<span style=\"color: #555555;\">&lt;<\/span>T<span style=\"color: #555555;\">&gt;<\/span>      <span style=\"color: #0099ff; font-style: italic;\">\/\/ second alternative                 (3)<\/span>\r\n               <span style=\"color: #555555;\">&gt;::<\/span>type;\r\n\r\n<span style=\"color: #007788; font-weight: bold;\">void<\/span> <span style=\"color: #cc00ff;\">f<\/span>()\r\n{\r\n    Handle<span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #007788; font-weight: bold;\">double<\/span><span style=\"color: #555555;\">&gt;<\/span> v1;                   <span style=\"color: #0099ff; font-style: italic;\">\/\/ the double goes on the stack<\/span>\r\n    Handle<span style=\"color: #555555;\">&lt;<\/span>std<span style=\"color: #555555;\">::<\/span>array<span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #007788; font-weight: bold;\">double<\/span>, <span style=\"color: #ff6600;\">200<\/span><span style=\"color: #555555;\">&gt;&gt;<\/span> v2;  <span style=\"color: #0099ff; font-style: italic;\">\/\/ the array goes on the free store<\/span>\r\n    <span style=\"color: #0099ff; font-style: italic;\">\/\/ ...<\/span>\r\n}\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>How does it work? <span style=\"font-family: courier\\ new, courier;\">std::conditional<\/span> in line (1) is a ternary operator from the <a href=\"http:\/\/en.cppreference.com\/w\/cpp\/header\/type_traits\">type-traits library. <\/a> In contrast to the ternary operator at runtime (<code>a&nbsp;? b&nbsp;: c<\/code>)&nbsp;<span style=\"font-family: courier\\ new, courier;\">std::condition<\/span> will be executed at compile time. This means if&nbsp;<span style=\"font-family: courier\\ new, courier;\">std<span style=\"color: #555555;\">::<\/span>conditional<span style=\"color: #555555;\">&lt;<\/span>(<span style=\"color: #006699; font-weight: bold;\">sizeof<\/span>(T) <span style=\"color: #555555;\">&lt;=<\/span> on_stack_max) is <\/span>evaluated to true, the first alternative is used at runtime. If not,&nbsp; the second alternative.&nbsp;<\/p>\n<\/p>\n<h3><a href=\"http:\/\/isocpp.github.io\/CppCoreGuidelines\/CppCoreGuidelines#Rper-access\">Per.19: Access memory predictably<\/a><\/h3>\n<p>What does that mean? If you read an <span style=\"font-family: courier\\ new, courier;\">int<\/span> from memory, more than the size of one <span style=\"font-family: courier\\ new, courier;\">int<\/span> is read from memory. An entire cache line is read from memory and stored in a cache. On modern architectures, a cache line typically has 64 bytes. If you now request an additional variable from memory and this variable is on the previous cache, the read directly uses this cache, and the operation is much faster.<\/p>\n<p>A data structure such as <span style=\"font-family: courier\\ new, courier;\">std::vector<\/span>, which stores its data in a contiguous memory block, is a cache line data structure because each element in the cache line will typically be used. A std::vector has a size and a capacity and can grow only in one direction. The capacity is greater than its size and indicates when it is necessary to allocate memory. This argumentation also applies to&nbsp;<span style=\"font-family: 'courier new', courier;\">std::vector<\/span> and <span style=\"font-family: 'courier new', courier;\">std::array<\/span>, although std::array has no capacity.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5419\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/vector.png\" alt=\"vector\" width=\"500\" height=\"86\" style=\"display: block; margin-left: auto; margin-right: auto;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/vector.png 2195w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/vector-300x52.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/vector-1024x176.png 1024w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/vector-768x132.png 768w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/vector-1536x264.png 1536w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/vector-2048x352.png 2048w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<p>My argumentation to <span style=\"font-family: 'courier new', courier;\">std::vector<\/span> will not hold for a <span style=\"font-family: courier\\ new, courier;\">std::list<\/span> or a <span style=\"font-family: courier\\ new, courier;\">std::forward_list<\/span>. Both containers consist of nodes that are double or single-linked.<\/p>\n<p>&nbsp;<img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5420\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/list.png\" alt=\"list\" width=\"500\" height=\"35\" style=\"display: block; margin-left: auto; margin-right: auto;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/list.png 2249w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/list-300x21.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/list-1024x72.png 1024w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/list-768x54.png 768w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/list-1536x108.png 1536w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/list-2048x144.png 2048w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<p>The std::forward_list can only grow in one direction.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5421\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/forward_list.png\" alt=\"forward list\" width=\"500\" height=\"38\" style=\"display: block; margin-left: auto; margin-right: auto;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/forward_list.png 2079w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/forward_list-300x23.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/forward_list-1024x78.png 1024w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/forward_list-768x58.png 768w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/forward_list-1536x117.png 1536w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/forward_list-2048x156.png 2048w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<p><span style=\"font-family: courier\\ new, courier;\">std::deque<\/span> is in between because it is a double-linked list of small arrays.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5380\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/02\/deque.png\" alt=\"deque\" width=\"500\" height=\"37\" style=\"display: block; margin-left: auto; margin-right: auto;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/02\/deque.png 2174w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/02\/deque-300x22.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/02\/deque-1024x76.png 1024w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/02\/deque-768x57.png 768w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/02\/deque-1536x114.png 1536w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/02\/deque-2048x152.png 2048w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<p>This was the theory of cache lines. Now I&#8217;m curious. Does it make a difference to read and accumulate all elements from <span style=\"font-family: courier\\ new, courier;\">std::vector, a std::deque, std::list, <span>and<\/span> std::forward_list<\/span>. The minor program should answer.<\/p>\n<div style=\"background: #f0f3f3; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #0099ff; font-style: italic;\">\/\/ memoryAccess.cpp<\/span>\r\n\r\n<span style=\"color: #009999;\">#include &lt;forward_list&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;chrono&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;deque&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;iomanip&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;list&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;string&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;vector&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;numeric&gt;<\/span>\r\n<span style=\"color: #009999;\">#include &lt;random&gt;<\/span>\r\n\r\n<span style=\"color: #006699; font-weight: bold;\">const<\/span> <span style=\"color: #007788; font-weight: bold;\">int<\/span> SIZE <span style=\"color: #555555;\">=<\/span> <span style=\"color: #ff6600;\">100<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">'<\/span><span style=\"color: #ff6600;\">000<\/span><span style=\"color: #aa0000; background-color: #ffaaaa;\">'<\/span><span style=\"color: #ff6600;\">000<\/span>; \r\n\r\n<span style=\"color: #006699; font-weight: bold;\">template<\/span> <span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #006699; font-weight: bold;\">typename<\/span> T<span style=\"color: #555555;\">&gt;<\/span>\r\n<span style=\"color: #007788; font-weight: bold;\">void<\/span> sumUp(T<span style=\"color: #555555;\">&amp;<\/span> t, <span style=\"color: #006699; font-weight: bold;\">const<\/span> std<span style=\"color: #555555;\">::<\/span>string<span style=\"color: #555555;\">&amp;<\/span> cont){            <span style=\"color: #0099ff; font-style: italic;\">\/\/ (5)<\/span>\r\n  \r\n  std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>fixed <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>setprecision(<span style=\"color: #ff6600;\">10<\/span>);\r\n\r\n  <span style=\"color: #006699; font-weight: bold;\">auto<\/span> begin<span style=\"color: #555555;\">=<\/span> std<span style=\"color: #555555;\">::<\/span>chrono<span style=\"color: #555555;\">::<\/span>steady_clock<span style=\"color: #555555;\">::<\/span>now();\r\n  std<span style=\"color: #555555;\">::<\/span><span style=\"color: #007788; font-weight: bold;\">size_t<\/span> res <span style=\"color: #555555;\">=<\/span> std<span style=\"color: #555555;\">::<\/span>accumulate(t.begin(), t.end(), <span style=\"color: #ff6600;\">0LL<\/span>);\r\n  std<span style=\"color: #555555;\">::<\/span>chrono<span style=\"color: #555555;\">::<\/span>duration<span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #007788; font-weight: bold;\">double<\/span><span style=\"color: #555555;\">&gt;<\/span> last<span style=\"color: #555555;\">=<\/span>  std<span style=\"color: #555555;\">::<\/span>chrono<span style=\"color: #555555;\">::<\/span>steady_clock<span style=\"color: #555555;\">::<\/span>now() <span style=\"color: #555555;\">-<\/span> begin;\r\n  std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> cont <span style=\"color: #555555;\">&lt;&lt;<\/span>  std<span style=\"color: #555555;\">::<\/span>endl;\r\n  std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"time: \"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> last.count() <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n  std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> <span style=\"color: #cc3300;\">\"res: \"<\/span> <span style=\"color: #555555;\">&lt;&lt;<\/span> res <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n  std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n \r\n  std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n     \r\n}\r\n\r\n<span style=\"color: #007788; font-weight: bold;\">int<\/span> main(){\r\n    \r\n    std<span style=\"color: #555555;\">::<\/span>cout <span style=\"color: #555555;\">&lt;&lt;<\/span> std<span style=\"color: #555555;\">::<\/span>endl;\r\n    \r\n    std<span style=\"color: #555555;\">::<\/span>random_device seed;                            <span style=\"color: #0099ff; font-style: italic;\">\/\/ (1)<\/span>\r\n    std<span style=\"color: #555555;\">::<\/span>mt19937 engine(seed());\r\n    std<span style=\"color: #555555;\">::<\/span>uniform_int_distribution<span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #007788; font-weight: bold;\">int<\/span><span style=\"color: #555555;\">&gt;<\/span> dist(<span style=\"color: #ff6600;\">0<\/span>, <span style=\"color: #ff6600;\">100<\/span>);\r\n    std<span style=\"color: #555555;\">::<\/span>vector<span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #007788; font-weight: bold;\">int<\/span><span style=\"color: #555555;\">&gt;<\/span> randNumbers;\r\n    randNumbers.reserve(SIZE);\r\n    <span style=\"color: #006699; font-weight: bold;\">for<\/span> (<span style=\"color: #007788; font-weight: bold;\">int<\/span> i<span style=\"color: #555555;\">=<\/span><span style=\"color: #ff6600;\">0<\/span>; i <span style=\"color: #555555;\">&lt;<\/span> SIZE; <span style=\"color: #555555;\">++<\/span>i){\r\n        randNumbers.push_back(dist(engine));\r\n    }\r\n    \r\n    {\r\n      std<span style=\"color: #555555;\">::<\/span>vector<span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #007788; font-weight: bold;\">int<\/span><span style=\"color: #555555;\">&gt;<\/span> myVec(randNumbers.begin(), randNumbers.end());\r\n      sumUp(myVec,<span style=\"color: #cc3300;\">\"std::vector&lt;int&gt;\"<\/span>);                 <span style=\"color: #0099ff; font-style: italic;\">\/\/ (2)<\/span>\r\n    }\r\n\r\n    \r\n    {\r\n      std<span style=\"color: #555555;\">::<\/span>deque<span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #007788; font-weight: bold;\">int<\/span><span style=\"color: #555555;\">&gt;<\/span>myDec(randNumbers.begin(), randNumbers.end());\r\n      sumUp(myDec,<span style=\"color: #cc3300;\">\"std::deque&lt;int&gt;\"<\/span>);                 <span style=\"color: #0099ff; font-style: italic;\">\/\/ (3)<\/span>\r\n    }\r\n    \r\n    {\r\n      std<span style=\"color: #555555;\">::<\/span>list<span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #007788; font-weight: bold;\">int<\/span><span style=\"color: #555555;\">&gt;<\/span>myList(randNumbers.begin(), randNumbers.end());\r\n      sumUp(myList,<span style=\"color: #cc3300;\">\"std::list&lt;int&gt;\"<\/span>);                 <span style=\"color: #0099ff; font-style: italic;\">\/\/ (4)<\/span>\r\n    }\r\n    \r\n    {\r\n      std<span style=\"color: #555555;\">::<\/span>forward_list<span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #007788; font-weight: bold;\">int<\/span><span style=\"color: #555555;\">&gt;<\/span>myForwardList(randNumbers.begin(), randNumbers.end());\r\n      sumUp(myForwardList,<span style=\"color: #cc3300;\">\"std::forward_list&lt;int&gt;\"<\/span>);  <span style=\"color: #0099ff; font-style: italic;\">\/\/ (5)<\/span>\r\n    } \r\n    \r\n}\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>The program <span style=\"font-family: courier\\ new, courier;\">memoryAccess.cpp<\/span>&nbsp;first creates 100 Million random numbers between 0 and 100 (1). Then it accumulates the elements using a <span style=\"font-family: courier\\ new, courier;\">std::vector<\/span> (2), a <span style=\"font-family: courier\\ new, courier;\">std::deque<\/span> (3), a <span style=\"font-family: courier\\ new, courier;\">std::list<\/span> (4), and a <span style=\"font-family: courier\\ new, courier;\">std::forward_list<\/span> (5). The actual work is done in the function <span style=\"font-family: courier\\ new, courier;\">sumUp (<\/span>6). I assume that Linux and Windows use a pretty similar implementation of <span style=\"font-family: courier\\ new, courier;\">std::accumulate<\/span>; therefore, the access time of the elements is the dominant factor for the overall performance.<\/p>\n<div style=\"background: #f0f3f3; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #006699; font-weight: bold;\">template<\/span><span style=\"color: #555555;\">&lt;<\/span><span style=\"color: #006699; font-weight: bold;\">class<\/span> <span style=\"color: #00aa88; font-weight: bold;\">InputIt<\/span>, <span style=\"color: #006699; font-weight: bold;\">class<\/span> <span style=\"color: #00aa88; font-weight: bold;\">T<\/span><span style=\"color: #555555;\">&gt;<\/span>\r\nT accumulate(InputIt first, InputIt last, T init)\r\n{\r\n    <span style=\"color: #006699; font-weight: bold;\">for<\/span> (; first <span style=\"color: #555555;\">!=<\/span> last; <span style=\"color: #555555;\">++<\/span>first) {\r\n        init <span style=\"color: #555555;\">=<\/span> init <span style=\"color: #555555;\">+<\/span> <span style=\"color: #555555;\">*<\/span>first;\r\n    }\r\n    <span style=\"color: #006699; font-weight: bold;\">return<\/span> init;\r\n}\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>I compiled the program with maximum optimization and executed it on Linux and Windows. I&#8217;m not interested in the comparison between Linux and Windows because that would be a comparison between a desktop PC and a Laptop. I&#8217;m interested in the relative performance numbers of the four containers. Here are they:<img loading=\"lazy\" decoding=\"async\" class=\" alignright size-full wp-image-5422\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/memoryAccessWin.PNG\" alt=\"memoryAccessWin\" width=\"250\" height=\"371\" style=\"float: right;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/memoryAccessWin.PNG 505w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/memoryAccessWin-202x300.png 202w\" sizes=\"auto, (max-width: 250px) 100vw, 250px\" \/><\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" alignleft size-full wp-image-5423\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/memoryAccessLinux.png\" alt=\"memoryAccessLinux\" width=\"250\" height=\"368\" style=\"float: left;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/memoryAccessLinux.png 353w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2018\/04\/memoryAccessLinux-204x300.png 204w\" sizes=\"auto, (max-width: 250px) 100vw, 250px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>To get the big picture. Here are my observations:<\/p>\n<ul>\n<li><span style=\"font-family: courier\\ new, courier;\">std::vector&nbsp;<\/span>is 10 times faster than <span style=\"font-family: courier\\ new, courier;\">std::list<\/span> or <span style=\"font-family: courier\\ new, courier;\">std::forward_list<\/span> on Linux and <span style=\"font-family: courier\\ new, courier;\">std::vector<\/span> is 30 times faster than <span style=\"font-family: courier\\ new, courier;\">std::list<\/span> or <span style=\"font-family: courier\\ new, courier;\">std::forward_list<\/span> on Windows.<\/li>\n<li><span style=\"font-family: courier\\ new, courier;\">std::vector<\/span> is 1.5 times faster than <span style=\"font-family: courier\\ new, courier;\">std::deque<\/span> on Linux, but&nbsp; <span style=\"font-family: courier\\ new, courier;\">std::vecto<\/span>r is 5 times faster than <span style=\"font-family: courier\\ new, courier;\">std::deque<\/span> on Windows. The reason may be that the double-linked small arrays are bigger on Linux than on Windows. The effect is that the access time is more vectors.<\/li>\n<li><span style=\"font-family: courier\\ new, courier;\">std::deque is<\/span> 6 times faster than <span style=\"font-family: courier\\ new, courier;\">std::list<\/span> and <span style=\"font-family: courier\\ new, courier;\">std::forward_list<\/span>. This holds for Linux and Windows.<\/li>\n<li><span style=\"font-family: courier\\ new, courier;\">std::list<\/span> and <span style=\"font-family: courier\\ new, courier;\">std::forward_list<\/span> are in the same ballpark on Linux and Windows.<\/li>\n<\/ul>\n<p>I don&#8217;t want to overestimate my performance numbers, but one key observation is obvious. The more cache line aware the container is, the faster is the access time of the elements: <span style=\"font-family: courier\\ new, courier;\"><strong>std::vector &gt; std::deque &gt; (std::list, std::forward_list).<\/strong><\/span><\/p>\n<h2>What&#8217;s next?<\/h2>\n<p>That was it to performance. In the <a href=\"https:\/\/goo.gl\/GpvmGT\">next post<\/a>, I will start to write about the concurrency rules. I&#8217;m pretty curious.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today, I will write about the remaining ten rules of performance. Ten rules seem to be a lot, but only two have content.<\/p>\n","protected":false},"author":21,"featured_media":5408,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[372],"tags":[470],"class_list":["post-5424","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-modern-c","tag-performance"],"_links":{"self":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5424","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/comments?post=5424"}],"version-history":[{"count":1,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5424\/revisions"}],"predecessor-version":[{"id":6831,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5424\/revisions\/6831"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media\/5408"}],"wp:attachment":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media?parent=5424"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/categories?post=5424"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/tags?post=5424"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}