{"id":4944,"date":"2016-09-11T08:53:37","date_gmt":"2016-09-11T08:53:37","guid":{"rendered":"https:\/\/www.modernescpp.com\/index.php\/my-conclusion-summation-of-a-vector-in-three-variants\/"},"modified":"2023-06-26T12:43:18","modified_gmt":"2023-06-26T12:43:18","slug":"my-conclusion-summation-of-a-vector-in-three-variants","status":"publish","type":"post","link":"https:\/\/www.modernescpp.com\/index.php\/my-conclusion-summation-of-a-vector-in-three-variants\/","title":{"rendered":"My Conclusion: Summation of a Vector in three Variants"},"content":{"rendered":"<p>After I&#8217;ve calculated in three different ways the sum of a <span style=\"font-family: courier new,courier;\">std::vector<\/span> I want to draw my conclusions.<\/p>\n<p><!--more--><\/p>\n<p>&nbsp;<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_three_strategies\"><\/span>The three strategies<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>At first, all numbers are in an overview. First, the <a href=\"https:\/\/www.modernescpp.com\/index.php\/single-threaded-sum-of-the-elements-of-a-vector\">single-threaded <\/a>variant; second, the <a href=\"https:\/\/www.modernescpp.com\/index.php\/multithreaded-summation-of-a-vector\">multiple threads with a shared summation variable<\/a>; last, the <a href=\"https:\/\/www.modernescpp.com\/index.php\/multithreaded-summation-with-minimal-synchronization\">multiple threads with minimal synchronization<\/a>. I have to admit that I was astonished by the last variant.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Single-threaded_1\"><\/span><span style=\"color: #3366ff;\"><\/span>Single-threaded<span style=\"color: #3366ff;\"><a href=\"https:\/\/www.modernescpp.com\/index.php\/blog\/schnelle-addition-in-einem-thread\"> (1)<\/a><\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4902\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/SingleThreadedAdditionEng.png\" alt=\"SingleThreadedAdditionEng\" width=\"607\" height=\"276\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/SingleThreadedAdditionEng.png 607w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/SingleThreadedAdditionEng-300x136.png 300w\" sizes=\"auto, (max-width: 607px) 100vw, 607px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Multiple_threads_with_a_shared_summation_variable_2\"><\/span><span style=\"color: #3366ff;\"><\/span>Multiple threads with a shared summation variable <span style=\"color: #3366ff;\"><a href=\"https:\/\/www.modernescpp.com\/index.php\/blog\/multithreaded-additon-mit-einer-geteilten-variable\">(2)<\/a><\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4920\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/MultithraedeSharedVariableEng.png\" alt=\"MultithraedeSharedVariableEng\" width=\"700\" height=\"186\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/MultithraedeSharedVariableEng.png 881w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/MultithraedeSharedVariableEng-300x80.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/MultithraedeSharedVariableEng-768x204.png 768w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Multiple_threads_with_minimal_synchronization_3\"><\/span><span style=\"color: #3366ff;\"><\/span>Multiple threads with minimal synchronization <span style=\"color: #3366ff;\"><a href=\"https:\/\/www.modernescpp.com\/index.php\/blog\/multithreaded-addition-in-unabhaengigen-threads\">(3)<\/a><\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4941\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/MultipleThreadsEng.png\" alt=\"MultipleThreadsEng\" width=\"700\" height=\"196\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/MultipleThreadsEng.png 868w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/MultipleThreadsEng-300x84.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/MultipleThreadsEng-768x215.png 768w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><\/p>\n<h2><span class=\"ez-toc-section\" id=\"My_observations\"><\/span>My observations<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>For simplicity reasons, I will only reason about Linux. Thanks to Andreas Sch\u00e4fer (<a href=\"https:\/\/plus.google.com\/u\/0\/+AndreasSch%C3%A4fer_gentryx\" class=\"moz-txt-link-freetext\">https:\/\/plus.google.com\/u\/0\/+AndreasSch%C3%A4fer_gentryx<\/a>), who gave me more profound insight.&nbsp;<\/p>\n<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Single_threaded\"><\/span>Single threaded<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The range-based for-loop and the STL algorithm std::accumulate are in the same league. This observation holds for the maximal optimized and non-optimized programs. Interestingly, the maximally optimized version is about 30 times faster than the non-optimized version. The compiler uses for the summation in case of the optimized version&nbsp;vectorized instruction (SSE or AVX). Therefore, the loop counter will be increased by 2 (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Streaming_SIMD_Extensions\">SSE<\/a>) or 4 (AVC).<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Multiple_threads_with_a_shared_summation_variable\"><\/span>Multiple threads with a shared summation variable<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The synchronization on each access to the shared variable <span style=\"color: #3366ff;\"><strong>(2)<\/strong><\/span> shows on point: Synchronization is expensive. Although I break the sequential consistency with the relaxed semantics, the program is about 40 times slower than the pendants <span style=\"color: #3366ff;\"><strong>(1)<\/strong> <\/span>or <span style=\"color: #3366ff;\"><strong>(3). <\/strong><span style=\"color: #000000;\">Not only out of performance reasons, but our goal must also be to minimize the synchronization of the shared variable.<\/span><\/span><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Multiple_threads_with_minimal_synchronization\"><\/span>Multiple threads with minimal synchronization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The summation with minimal synchronized threads (4 atomic operations or locks) <span style=\"color: #3366ff;\"><strong>(3)<\/strong><\/span>&nbsp;is hardly faster than the range-based for-loop or <span style=\"font-family: courier new,courier;\">std::accumulate <span style=\"color: #3366ff;\"><strong>(1)<\/strong><\/span><\/span>. However, that holds in the multithreading variant, where four threads can work independently on four cores. That surprised me because I was expecting a nearly fourfold improvement. But what surprised me, even more was that my four cores were not fully utilized.<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4943\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadUtilization.png\" alt=\"threadUtilization\" width=\"800\" height=\"169\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadUtilization.png 851w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadUtilization-300x63.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadUtilization-768x162.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>The reason is simple. The cores can&#8217;t get the data fast enough from memory. Or to say it the other way around. The memory slows down the cores.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"My_conclusion\"><\/span>My conclusion<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>My conclusion from the performance measurements is to use for such a simple operation std::accumulate. That&#8217;s for two reasons. First, the <em>performance boost<\/em> of variant <span style=\"color: #00ccff;\"><strong><span style=\"color: #3366ff;\">(3) <\/span><\/strong><span style=\"color: #3366ff;\"><span style=\"color: #000000;\">doesn&#8217;t justify the expense; second, C++ will have in C++17 a parallel version of <span style=\"font-family: courier new,courier;\">std::accumulate. <\/span>Therefore, switching from the sequential to the parallel version is very easy.&nbsp; <\/span> <\/span><span style=\"color: #3366ff;\"> <\/span><\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Whats_next\"><\/span>What&#8217;s next?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The time library does not belong to the multithreading library, but it&#8217;s an essential component of the multithreading capabilities of C++. For example, you have to wait for an absolute time for a lock or put your thread for a relative time to sleep. So in the <a href=\"https:\/\/www.modernescpp.com\/index.php\/the-time-library\">next post, <\/a>I will write about time.<\/p>\n<p>&nbsp;<\/p>\n<p>.&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>After I&#8217;ve calculated in three different ways the sum of a std::vector I want to draw my conclusions.<\/p>\n","protected":false},"author":21,"featured_media":4902,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[369],"tags":[434,470,504,519],"class_list":["post-4944","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-multithreading-application","tag-atomics","tag-performance","tag-relaxed-semantics","tag-sequential-consistency"],"_links":{"self":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4944","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/comments?post=4944"}],"version-history":[{"count":1,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4944\/revisions"}],"predecessor-version":[{"id":6948,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4944\/revisions\/6948"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media\/4902"}],"wp:attachment":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media?parent=4944"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/categories?post=4944"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/tags?post=4944"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}