{"id":4942,"date":"2016-09-07T19:36:31","date_gmt":"2016-09-07T19:36:31","guid":{"rendered":"https:\/\/www.modernescpp.com\/index.php\/multithreaded-summation-with-minimal-synchronization\/"},"modified":"2023-06-26T12:43:40","modified_gmt":"2023-06-26T12:43:40","slug":"multithreaded-summation-with-minimal-synchronization","status":"publish","type":"post","link":"https:\/\/www.modernescpp.com\/index.php\/multithreaded-summation-with-minimal-synchronization\/","title":{"rendered":"Multithreaded: Summation with Minimal Synchronization"},"content":{"rendered":"<p>Until now, I&#8217;ve used two strategies to sum up the elements of a <span style=\"font-family: courier new,courier;\">std::vector<\/span>. First, I did the whole calculation in one thread (<a href=\"https:\/\/www.modernescpp.com\/index.php\/single-threaded-sum-of-the-elements-of-a-vector\">Single Threaded: Summation of a vector<\/a>); second, multiple threads shared the same variable for the result (<a href=\"https:\/\/www.modernescpp.com\/index.php\/multithreaded-summation-of-a-vector\">Multithreaded: Summation of a vector<\/a>). The second strategy, in particular, was extremely naive. In this post, I apply what I learned from both posts. My goal is for the threads to perform their summations as independently of each other as possible and thereby reduce the synchronization overhead.&nbsp;<\/p>\n<p><!--more--><\/p>\n<p>&nbsp;<\/p>\n<p>To let the threads work independently and therefore minimize synchronization, I have a few ideas in mind: local variables, <a href=\"https:\/\/www.modernescpp.com\/index.php\/thread-local-data\">thread-local data<\/a>, and <a href=\"https:\/\/www.modernescpp.com\/index.php\/tasks\">tasks<\/a> should all work. Now I&#8217;m curious.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"My_strategy\"><\/span>My strategy<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>My strategy remains the same. 
As in my last post, I use my desktop PC with four cores and GCC, and my laptop with two cores and cl.exe. I provide the results without and with maximum optimization. For the details, look here: <a href=\"https:\/\/www.modernescpp.com\/index.php\/thread-safe-initialization-of-a-singleton\">Thread-safe initialization of a singleton.<\/a>&nbsp;<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Local_variables\"><\/span>Local variables<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Since each thread has a local summation variable, it can do its job without synchronization. Only the final step, adding up the local summation variables, needs care: adding the local results to the shared sum is a critical section that must be protected. This can be done in various ways. A quick remark first: since only four additions take place, it doesn&#8217;t matter much from a performance perspective which synchronization I use. Nevertheless, I will use a <span style=\"font-family: courier new,courier;\">std::lock_guard<\/span>, an atomic with sequential consistency, and an atomic with relaxed semantics.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"std_lock_guard\"><\/span>std::lock_guard<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>&nbsp;<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n41\r\n42\r\n43\r\n44\r\n45\r\n46\r\n47\r\n48\r\n49\r\n50\r\n51\r\n52\r\n53\r\n54\r\n55\r\n56\r\n57\r\n58\r\n59<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ localVariable.cpp<\/span>\r\n\r\n<span style=\"color: #0000ff;\">#include 
&lt;mutex&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;chrono&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;random&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;thread&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;utility&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;vector&gt;<\/span>\r\n\r\nconstexpr <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> size= 100000000;   \r\n\r\nconstexpr <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> firBound=  25000000;\r\nconstexpr <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> secBound=  50000000;\r\nconstexpr <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> thiBound=  75000000;\r\nconstexpr <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> fouBound= 100000000;\r\n\r\nstd::mutex myMutex;\r\n\r\n<span style=\"color: #2b91af;\">void<\/span> sumUp(<span style=\"color: #2b91af;\">unsigned<\/span> <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span>&amp; sum, <span style=\"color: #0000ff;\">const<\/span> std::vector&lt;<span style=\"color: #2b91af;\">int<\/span>&gt;&amp; val, <span style=\"color: #2b91af;\">unsigned<\/span> <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> beg, <span style=\"color: #2b91af;\">unsigned<\/span> <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> end){\r\n    <span style=\"color: #2b91af;\">unsigned<\/span> <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> tmpSum{};\r\n    <span style=\"color: #0000ff;\">for<\/span> (<span style=\"color: #0000ff;\">auto<\/span> i= beg; i &lt; end; ++i){\r\n        tmpSum += 
val[i];\r\n    }\r\n    std::lock_guard&lt;std::mutex&gt; lockGuard(myMutex);\r\n    sum+= tmpSum;\r\n}\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n  std::vector&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; randValues;\r\n  randValues.reserve(size);\r\n\r\n  std::mt19937 engine;\r\n  std::uniform_int_distribution&lt;&gt; uniformDist(1,10);\r\n  <span style=\"color: #0000ff;\">for<\/span> ( <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> i=0 ; i&lt; size ; ++i) randValues.push_back(uniformDist(engine));\r\n \r\n  <span style=\"color: #2b91af;\">unsigned<\/span> <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> sum{}; \r\n  <span style=\"color: #0000ff;\">auto<\/span> start = std::chrono::system_clock::now();\r\n  \r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t1(sumUp,std::ref(sum),std::ref(randValues),0,firBound);\r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t2(sumUp,std::ref(sum),std::ref(randValues),firBound,secBound);\r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t3(sumUp,std::ref(sum),std::ref(randValues),secBound,thiBound);\r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t4(sumUp,std::ref(sum),std::ref(randValues),thiBound,fouBound);   \r\n  \r\n  t1.join();\r\n  t2.join();\r\n  t3.join();\r\n  t4.join();\r\n  \r\n  std::chrono::duration&lt;<span style=\"color: #2b91af;\">double<\/span>&gt; dur= std::chrono::system_clock::now() - start;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"Time for addition \"<\/span> &lt;&lt; dur.count() &lt;&lt; <span style=\"color: #a31515;\">\" seconds\"<\/span> &lt;&lt; std::endl;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"Result: \"<\/span> &lt;&lt; sum &lt;&lt; std::endl;\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Lines 25 and 26 
are the critical lines. Here, the local summation result <span style=\"font-family: courier new,courier;\">tmpSum<\/span> is added to the shared <span style=\"font-family: courier new,courier;\">sum<\/span>. This is exactly the spot where the following examples with local variables differ.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Without_optimization\"><\/span>Without optimization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4922\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariable.png\" alt=\"localVariable\" width=\"400\" height=\"182\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariable.png 415w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariable-300x137.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4923\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariablewin.png\" alt=\"localVariablewin\" width=\"450\" height=\"145\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariablewin.png 724w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariablewin-300x97.png 300w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Maximum_optimization\"><\/span>Maximum optimization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4924\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableOpt.png\" alt=\"localVariableOpt\" width=\"400\" height=\"182\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableOpt.png 415w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableOpt-300x137.png 300w\" sizes=\"auto, 
(max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4925\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableOptwin.png\" alt=\"localVariableOptwin\" width=\"450\" height=\"145\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableOptwin.png 724w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableOptwin-300x97.png 300w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Atomic_operations_with_sequential_consistency\"><\/span>Atomic operations with sequential consistency<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>My first optimization is to replace the shared summation variable <span style=\"font-family: courier new,courier;\">sum<\/span>, which is protected by a <span style=\"font-family: courier new,courier;\">std::lock_guard<\/span>, with an atomic.<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n41\r\n42\r\n43\r\n44\r\n45\r\n46\r\n47\r\n48\r\n49\r\n50\r\n51\r\n52\r\n53\r\n54\r\n55\r\n56<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ localVariableAtomic.cpp<\/span>\r\n\r\n<span style=\"color: #0000ff;\">#include &lt;atomic&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;chrono&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;random&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include 
&lt;thread&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;utility&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;vector&gt;<\/span>\r\n\r\nconstexpr <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> size= 100000000;   \r\n\r\nconstexpr <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> firBound=  25000000;\r\nconstexpr <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> secBound=  50000000;\r\nconstexpr <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> thiBound=  75000000;\r\nconstexpr <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> fouBound= 100000000;\r\n\r\n<span style=\"color: #2b91af;\">void<\/span> sumUp(std::atomic&lt;<span style=\"color: #2b91af;\">unsigned<\/span> <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span>&gt;&amp; sum, <span style=\"color: #0000ff;\">const<\/span> std::vector&lt;<span style=\"color: #2b91af;\">int<\/span>&gt;&amp; val, <span style=\"color: #2b91af;\">unsigned<\/span> <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> beg, <span style=\"color: #2b91af;\">unsigned<\/span> <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> end){\r\n    <span style=\"color: #2b91af;\">unsigned<\/span> <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> tmpSum{};\r\n    <span style=\"color: #0000ff;\">for<\/span> (<span style=\"color: #0000ff;\">auto<\/span> i= beg; i &lt; end; ++i){\r\n\t    tmpSum += val[i];\r\n    }\r\n    sum+= tmpSum;\r\n}\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n  std::vector&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; randValues;\r\n  
randValues.reserve(size);\r\n\r\n  std::mt19937 engine;\r\n  std::uniform_int_distribution&lt;&gt; uniformDist(1,10);\r\n  <span style=\"color: #0000ff;\">for<\/span> ( <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span> i=0 ; i&lt; size ; ++i) randValues.push_back(uniformDist(engine));\r\n \r\n  std::atomic&lt;<span style=\"color: #2b91af;\">unsigned<\/span> <span style=\"color: #2b91af;\">long<\/span> <span style=\"color: #2b91af;\">long<\/span>&gt; sum{}; \r\n  <span style=\"color: #0000ff;\">auto<\/span> start = std::chrono::system_clock::now();\r\n  \r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t1(sumUp,std::ref(sum),std::ref(randValues),0,firBound);\r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t2(sumUp,std::ref(sum),std::ref(randValues),firBound,secBound);\r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t3(sumUp,std::ref(sum),std::ref(randValues),secBound,thiBound);\r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t4(sumUp,std::ref(sum),std::ref(randValues),thiBound,fouBound);   \r\n  \r\n  t1.join();\r\n  t2.join();\r\n  t3.join();\r\n  t4.join();\r\n  \r\n  std::chrono::duration&lt;<span style=\"color: #2b91af;\">double<\/span>&gt; dur= std::chrono::system_clock::now() - start;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"Time for addition \"<\/span> &lt;&lt; dur.count() &lt;&lt; <span style=\"color: #a31515;\">\" seconds\"<\/span> &lt;&lt; std::endl;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"Result: \"<\/span> &lt;&lt; sum &lt;&lt; std::endl;\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<h3><span class=\"ez-toc-section\" id=\"Without_optimization-2\"><\/span>Without optimization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4926\" 
src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomic.png\" alt=\"localVariableAtomic\" width=\"400\" height=\"182\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomic.png 415w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomic-300x137.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4927\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicwin.png\" alt=\"localVariableAtomicwin\" width=\"500\" height=\"161\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicwin-300x96.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicwin-768x245.png 768w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Maximum_optimization-2\"><\/span>Maximum optimization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h2><span class=\"ez-toc-section\" id=\"i\"><\/span>&nbsp;<img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4928\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicOpt.png\" alt=\"localVariableAtomicOpt\" width=\"400\" height=\"182\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicOpt.png 415w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicOpt-300x137.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4929\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicOptwin.png\" alt=\"localVariableAtomicOptwin\" width=\"500\" height=\"161\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicOptwin.png 724w, 
https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicOptwin-300x97.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h2><span class=\"ez-toc-section\" id=\"Atomic_operations_with_relaxed_semantic\"><\/span>Atomic operations with relaxed semantics<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>We can do better. Instead of the default memory ordering, sequential consistency, I use relaxed semantics. This is well-defined because the order in which the additions in line 23 take place doesn&#8217;t matter; each <span style=\"font-family: courier new,courier;\">fetch_add<\/span> is still atomic. Furthermore, the main thread reads <span style=\"font-family: courier new,courier;\">sum<\/span> only after it has joined all four threads, and the completion of each thread synchronizes with the corresponding <span style=\"font-family: courier new,courier;\">join<\/span> call.<\/p>\n<p>&nbsp;<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n41\r\n42\r\n43\r\n44\r\n45\r\n46\r\n47\r\n48\r\n49\r\n50\r\n51\r\n52\r\n53\r\n54\r\n55\r\n56<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\">\/\/ localVariableAtomicRelaxed.cpp\r\n\r\n<span style=\"color: #008000;\">#include &lt;atomic&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;chrono&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;random&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;thread&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;utility&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;vector&gt;<\/span>\r\n\r\nconstexpr long long size= 100000000;   \r\n\r\nconstexpr long long firBound=  25000000;\r\nconstexpr long long secBound=  50000000;\r\nconstexpr long long thiBound=  75000000;\r\nconstexpr long long fouBound= 
100000000;\r\n\r\nvoid sumUp(std::atomic&lt;unsigned long long&gt;&amp; sum, const std::vector&lt;int&gt;&amp; val, unsigned long long beg, unsigned long long end){\r\n    unsigned long long tmpSum{};\r\n    <span style=\"color: #0000ff;\">for<\/span> (auto i= beg; i &lt; end; ++i){\r\n\t    tmpSum += val[i];\r\n    }\r\n    sum.fetch_add(tmpSum,std::memory_order_relaxed);\r\n}\r\n\r\nint main(){\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n  std::vector&lt;int&gt; randValues;\r\n  randValues.reserve(size);\r\n\r\n  std::mt19937 engine;\r\n  std::uniform_int_distribution&lt;&gt; uniformDist(1,10);\r\n  <span style=\"color: #0000ff;\">for<\/span> ( long long i=0 ; i&lt; size ; ++i) randValues.push_back(uniformDist(engine));\r\n \r\n  std::atomic&lt;unsigned long long&gt; sum{}; \r\n  auto start = std::chrono::system_clock::now();\r\n  \r\n  std::thread t1(sumUp,std::ref(sum),std::ref(randValues),0,firBound);\r\n  std::thread t2(sumUp,std::ref(sum),std::ref(randValues),firBound,secBound);\r\n  std::thread t3(sumUp,std::ref(sum),std::ref(randValues),secBound,thiBound);\r\n  std::thread t4(sumUp,std::ref(sum),std::ref(randValues),thiBound,fouBound);   \r\n  \r\n \r\n  t1.join();\r\n  t2.join();\r\n  t3.join();\r\n  t4.join();\r\n  std::chrono::duration&lt;double&gt; dur= std::chrono::system_clock::now() - start;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"Time for addition \"<\/span> &lt;&lt; dur.count() &lt;&lt; <span style=\"color: #a31515;\">\" seconds\"<\/span> &lt;&lt; std::endl;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"Result: \"<\/span> &lt;&lt; sum &lt;&lt; std::endl;\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Without_optimization-3\"><\/span>Without optimization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4930\" 
src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicRelaxed.png\" alt=\"localVariableAtomicRelaxed\" width=\"400\" height=\"182\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicRelaxed.png 415w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicRelaxed-300x137.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4931\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicRelaxedwin.png\" alt=\"localVariableAtomicRelaxedwin\" width=\"500\" height=\"161\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicRelaxedwin.png 724w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicRelaxedwin-300x97.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Maximum_optimization-3\"><\/span>Maximum optimization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4932\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicRelaxedOpt.png\" alt=\"localVariableAtomicRelaxedOpt\" width=\"400\" height=\"182\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicRelaxedOpt.png 415w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicRelaxedOpt-300x137.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4933\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicRelaxedOptwin.png\" alt=\"localVariableAtomicRelaxedOptwin\" width=\"500\" height=\"161\" 
srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicRelaxedOptwin.png 724w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/localVariableAtomicRelaxedOptwin-300x97.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<p>The next strategy is similar, but now I use thread-local data.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Thread_local_data\"><\/span>Thread-local data<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/www.modernescpp.com\/index.php\/thread-local-data\">Thread-local data<\/a> is data that each thread exclusively owns; it is created when needed. Therefore, thread-local data is a perfect fit for the local summation variable <span style=\"font-family: courier new,courier;\">tmpSum<\/span>.<\/p>\n<p>&nbsp;<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n41\r\n42\r\n43\r\n44\r\n45\r\n46\r\n47\r\n48\r\n49\r\n50\r\n51\r\n52\r\n53\r\n54\r\n55\r\n56\r\n57<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\">\/\/ threadLocal.cpp\r\n\r\n<span style=\"color: #008000;\">#include &lt;atomic&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;chrono&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;random&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;thread&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;utility&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;vector&gt;<\/span>\r\n\r\nconstexpr long long 
size= 100000000;   \r\n\r\nconstexpr long long firBound=  25000000;\r\nconstexpr long long secBound=  50000000;\r\nconstexpr long long thiBound=  75000000;\r\nconstexpr long long fouBound= 100000000;\r\n\r\nthread_local unsigned long long tmpSum= 0;\r\n\r\nvoid sumUp(std::atomic&lt;unsigned long long&gt;&amp; sum, const std::vector&lt;int&gt;&amp; val, unsigned long long beg, unsigned long long end){\r\n    <span style=\"color: #0000ff;\">for<\/span> (auto i= beg; i &lt; end; ++i){\r\n        tmpSum += val[i];\r\n    }\r\n    sum.fetch_add(tmpSum,std::memory_order_relaxed);\r\n}\r\n\r\nint main(){\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n  std::vector&lt;int&gt; randValues;\r\n  randValues.reserve(size);\r\n\r\n  std::mt19937 engine;\r\n  std::uniform_int_distribution&lt;&gt; uniformDist(1,10);\r\n  <span style=\"color: #0000ff;\">for<\/span> ( long long i=0 ; i&lt; size ; ++i) randValues.push_back(uniformDist(engine));\r\n \r\n  std::atomic&lt;unsigned long long&gt; sum{}; \r\n  auto start = std::chrono::system_clock::now();\r\n  \r\n  std::thread t1(sumUp,std::ref(sum),std::ref(randValues),0,firBound);\r\n  std::thread t2(sumUp,std::ref(sum),std::ref(randValues),firBound,secBound);\r\n  std::thread t3(sumUp,std::ref(sum),std::ref(randValues),secBound,thiBound);\r\n  std::thread t4(sumUp,std::ref(sum),std::ref(randValues),thiBound,fouBound);   \r\n  \r\n  t1.join();\r\n  t2.join();\r\n  t3.join();\r\n  t4.join();\r\n  \r\n  std::chrono::duration&lt;double&gt; dur= std::chrono::system_clock::now() - start;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"Time for addition \"<\/span> &lt;&lt; dur.count() &lt;&lt; <span style=\"color: #a31515;\">\" seconds\"<\/span> &lt;&lt; std::endl;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"Result: \"<\/span> &lt;&lt; sum &lt;&lt; std::endl;\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>I declare in line 18 the 
thread-local variable <span style=\"font-family: courier new,courier;\">tmpSum<\/span> and use it for the additions in lines 22 and 24. The subtle difference between the thread-local variable and the local variables in the previous programs is that the lifetime of a thread-local variable is bound to the lifetime of its thread, whereas the lifetime of a local variable is bound to its scope.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Without_optimization-4\"><\/span>Without optimization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4754\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/05\/threadLocal.png\" alt=\"threadLocal\" width=\"400\" height=\"182\" style=\"margin: 15px;\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4934\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadLocalwin.png\" alt=\"threadLocalwin\" width=\"500\" height=\"162\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadLocalwin.png 724w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadLocalwin-300x97.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Maximum_optimization-4\"><\/span>Maximum optimization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4935\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadLocalOpt.png\" alt=\"threadLocalOpt\" width=\"400\" height=\"182\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadLocalOpt.png 415w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadLocalOpt-300x137.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4936\" 
src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadLocalOptwin.png\" alt=\"threadLocalOptwin\" width=\"500\" height=\"162\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadLocalOptwin.png 724w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/threadLocalOptwin-300x97.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<p>The question is: can we calculate the sum quickly without any explicit synchronization? Yes, we can.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Tasks\"><\/span>Tasks<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>With <a href=\"https:\/\/www.modernescpp.com\/index.php\/tasks\">tasks<\/a>, we can do the whole job without explicit synchronization. Each partial sum is computed in a separate thread, and the final summation happens in a single thread. For more details, see my post on <a href=\"https:\/\/www.modernescpp.com\/index.php\/tasks\">tasks<\/a>. In the following program, I use promises and futures.<\/p>\n<p>&nbsp;<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n41\r\n42\r\n43\r\n44\r\n45\r\n46\r\n47\r\n48\r\n49\r\n50\r\n51\r\n52\r\n53\r\n54\r\n55\r\n56\r\n57\r\n58\r\n59\r\n60\r\n61\r\n62\r\n63\r\n64\r\n65\r\n66\r\n67\r\n68<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\">\/\/ tasks.cpp\r\n\r\n<span style=\"color: #008000;\">#include &lt;chrono&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;future&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #008000;\">#include 
&lt;random&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;thread&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;utility&gt;<\/span>\r\n<span style=\"color: #008000;\">#include &lt;vector&gt;<\/span>\r\n\r\nconstexpr long long size= 100000000;   \r\n\r\nconstexpr long long firBound=  25000000;\r\nconstexpr long long secBound=  50000000;\r\nconstexpr long long thiBound=  75000000;\r\nconstexpr long long fouBound= 100000000;\r\n\r\nvoid sumUp(std::promise&lt;unsigned long long&gt;&amp;&amp; prom, const std::vector&lt;int&gt;&amp; val, unsigned long long beg, unsigned long long end){\r\n\tunsigned long long sum={};\r\n\t<span style=\"color: #0000ff;\">for<\/span> (auto i= beg; i &lt; end; ++i){\r\n\t    sum += val[i];\r\n    }\r\n    prom.set_value(sum);\r\n}\r\n\r\nint main(){\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n  std::vector&lt;int&gt; randValues;\r\n  randValues.reserve(size);\r\n\r\n  std::mt19937 engine;\r\n  std::uniform_int_distribution&lt;&gt; uniformDist(1,10);\r\n  <span style=\"color: #0000ff;\">for<\/span> ( long long i=0 ; i&lt; size ; ++i) randValues.push_back(uniformDist(engine));\r\n \r\n  std::promise&lt;unsigned long long&gt; prom1;\r\n  std::promise&lt;unsigned long long&gt; prom2;\r\n  std::promise&lt;unsigned long long&gt; prom3;\r\n  std::promise&lt;unsigned long long&gt; prom4;\r\n  \r\n  auto fut1= prom1.get_future();\r\n  auto fut2= prom2.get_future();\r\n  auto fut3= prom3.get_future();\r\n  auto fut4= prom4.get_future();\r\n  \r\n  \r\n  auto start = std::chrono::system_clock::now();\r\n\r\n  std::thread t1(sumUp,std::move(prom1),std::ref(randValues),0,firBound);\r\n  std::thread t2(sumUp,std::move(prom2),std::ref(randValues),firBound,secBound);\r\n  std::thread t3(sumUp,std::move(prom3),std::ref(randValues),secBound,thiBound);\r\n  std::thread t4(sumUp,std::move(prom4),std::ref(randValues),thiBound,fouBound);\r\n  \r\n  auto sum= fut1.get() + fut2.get() + fut3.get() + fut4.get();\r\n \r\n  
std::chrono::duration&lt;double&gt; dur= std::chrono::system_clock::now() - start;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"Time for addition \"<\/span> &lt;&lt; dur.count() &lt;&lt; <span style=\"color: #a31515;\">\" seconds\"<\/span> &lt;&lt; std::endl;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"Result: \"<\/span> &lt;&lt; sum &lt;&lt; std::endl;\r\n  \r\n  t1.join();\r\n  t2.join();\r\n  t3.join();\r\n  t4.join();\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>In lines 37 &#8211; 45, I define the four promises and create the associated futures from them. In lines 50 &#8211; 53, each promise is moved into a separate thread. A promise can only be moved, not copied; therefore, I use <span style=\"font-family: courier new,courier;\">std::move<\/span>. The work package of each thread is the function <span style=\"font-family: courier new,courier;\">sumUp<\/span> (lines 18 &#8211; 24), which takes a promise by rvalue reference as its first argument. In line 55, the futures ask for the results. 
The<span style=\"font-family: courier new,courier;\"> get<\/span> call is blocking.&nbsp;<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Without_optimization-5\"><\/span>Without optimization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4937\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/tasks.png\" alt=\"tasks\" width=\"400\" height=\"182\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/tasks.png 415w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/tasks-300x137.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4938\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/taskswin.png\" alt=\"taskswin\" width=\"450\" height=\"145\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/taskswin.png 724w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/taskswin-300x97.png 300w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Maximum_optimization-5\"><\/span>Maximum optimization<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4939\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/tasksOpt.png\" alt=\"tasksOpt\" width=\"400\" height=\"182\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/tasksOpt.png 415w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/tasksOpt-300x137.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4940\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/tasksOptwin.png\" alt=\"tasksOptwin\" width=\"450\" height=\"145\" 
srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/tasksOptwin.png 724w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/tasksOptwin-300x97.png 300w\" sizes=\"auto, (max-width: 450px) 100vw, 450px\" \/><\/p>\n<p>Finally, here are all the numbers in an overview.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_overview\"><\/span>The overview<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>As previously mentioned, on Linux the numbers are quite similar for all variants. That&#8217;s no surprise because I always use the same strategy: calculate the partial sums locally without synchronization, then add them up. Only the addition of the partial sums has to be synchronized. What astonished me was that maximum optimization makes no significant difference.&nbsp;<\/p>\n<p>On Windows, the story is different. First, it makes a big difference whether I compile the program with or without maximum optimization; second, the program runs much slower on Windows than on Linux. I&#8217;m not sure whether this is because my Windows laptop has only two cores, whereas my Linux desktop PC has four.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4941\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/MultipleThreadsEng.png\" alt=\"MultipleThreadsEng\" width=\"700\" height=\"196\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/MultipleThreadsEng.png 868w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/MultipleThreadsEng-300x84.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/09\/MultipleThreadsEng-768x215.png 768w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Whats_next\"><\/span>What&#8217;s next?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>In the <a href=\"https:\/\/www.modernescpp.com\/index.php\/my-conclusion-summation-of-a-vector-in-three-variants\">next post<\/a>, I will reason about the numbers for summing up a vector and the results that can be derived 
from it.<\/p>\n<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Until now, I&#8217;ve used two strategies to summate a std::vector. First, I did the whole math in one thread (Single Threaded: Summation of a vector); second multiple threads shared the same variable for the result (Multithreaded: Summation of a vector). In particular, the second strategy was extremely naive. In this post, I will apply my [&hellip;]<\/p>\n","protected":false},"author":21,"featured_media":4922,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[369],"tags":[434,470,504,519,446,487],"class_list":["post-4942","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-multithreading-application","tag-atomics","tag-performance","tag-relaxed-semantics","tag-sequential-consistency","tag-tasks","tag-thread_local"],"_links":{"self":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4942","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/comments?post=4942"}],"version-history":[{"count":1,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4942\/revisions"}],"predecessor-version":[{"id":6949,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4942\/revisions\/6949"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media\/4922"}],"wp:attachment":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2
\/media?parent=4942"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/categories?post=4942"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/tags?post=4942"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}