{"id":5191,"date":"2017-02-16T18:43:10","date_gmt":"2017-02-16T18:43:10","guid":{"rendered":"https:\/\/www.modernescpp.com\/index.php\/multithreading-in-c-17-and-c-20\/"},"modified":"2023-06-26T12:22:35","modified_gmt":"2023-06-26T12:22:35","slug":"multithreading-in-c-17-and-c-20","status":"publish","type":"post","link":"https:\/\/www.modernescpp.com\/index.php\/multithreading-in-c-17-and-c-20\/","title":{"rendered":"Multithreading with C++17 and C++20"},"content":{"rendered":"<p>Forecasts about the future are difficult, particularly when they are about C++20. Nevertheless, I will look into the crystal ball and write in the following posts about what we will get with C++17 and what we can hope for with C++20.<\/p>\n<p><!--more--><\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5188\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/timelineCpp17andCpp20.png\" alt=\"timelineCpp17andCpp20\" width=\"700\" height=\"305\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/timelineCpp17andCpp20.png 890w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/timelineCpp17andCpp20-300x131.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/timelineCpp17andCpp20-768x335.png 768w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><\/p>\n<p>Since C++11, C++ has addressed the requirements of multicore architectures. The standard published in 2011 defines how a program should behave in the presence of multiple threads. The multithreading capabilities of C++11 consist of two parts. 
On the one hand, there is the well-defined memory model; on the other hand, there is the standardized threading API.<\/p>\n<p>The well-defined memory model deals with the following questions.<\/p>\n<ol>\n<li>What are atomic operations?<\/li>\n<li>Which sequence of operations is guaranteed?<\/li>\n<li>When are the memory effects of operations visible?<\/li>\n<\/ol>\n<p>The standardized threading interface in C++11 consists of the following components.<\/p>\n<ol>\n<li>Threads<\/li>\n<li>Tasks<\/li>\n<li>Thread-local data<\/li>\n<li>Condition variables<\/li>\n<\/ol>\n<p>If that does not sound too boring, read the posts about the <a href=\"https:\/\/www.modernescpp.com\/index.php\/category\/multithreading-memory-model\">memory model<\/a> and the <a href=\"https:\/\/www.modernescpp.com\/index.php\/category\/multithreading\">standardized threading API<\/a>.<\/p>\n<p>Seen through my multithreading glasses, C++14 does not have much to offer. It added <a href=\"https:\/\/www.modernescpp.com\/index.php\/reader-writer-locks\">reader-writer locks<\/a>.<\/p>\n<p>The question that arises is: What does the future of C++ have to offer?<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5189\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/timelineCpp17andCpp20_1.png\" alt=\"timelineCpp17andCpp20 1\" width=\"700\" height=\"365\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/timelineCpp17andCpp20_1.png 896w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/timelineCpp17andCpp20_1-300x157.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/timelineCpp17andCpp20_1-768x401.png 768w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><\/p>\n<h2>C++17<\/h2>\n<p>With C++17, most of the algorithms of the Standard Template Library will be available in a parallel version. To use them, you invoke an algorithm with a so-called execution policy. 
This execution policy specifies whether the algorithm runs sequentially (<span style=\"font-family: courier new,courier;\">std::execution::seq<\/span>), in parallel (<span style=\"font-family: courier new,courier;\">std::execution::par<\/span>), or in parallel and vectorized (<span style=\"font-family: courier new,courier;\">std::execution::par_unseq<\/span>).<\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\">#include &lt;algorithm&gt;\r\n#include &lt;execution&gt;\r\n#include &lt;vector&gt;\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main() {\r\n  std::vector&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; vec = {3, 2, 1, 4, 5, 6, 10, 8, 9, 4};\r\n\r\n  std::sort(vec.begin(), vec.end());                            <span style=\"color: #008000;\">\/\/ sequential as ever<\/span>\r\n  std::sort(std::execution::seq, vec.begin(), vec.end());       <span style=\"color: #008000;\">\/\/ sequential<\/span>\r\n  std::sort(std::execution::par, vec.begin(), vec.end());       <span style=\"color: #008000;\">\/\/ parallel<\/span>\r\n  std::sort(std::execution::par_unseq, vec.begin(), vec.end()); <span style=\"color: #008000;\">\/\/ parallel and vectorized<\/span>\r\n}\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Consequently, the first and second calls of the sort algorithm run sequentially, the third in parallel, and the fourth in parallel and vectorized.<\/p>\n<p>C++20 offers totally new multithreading concepts. The key idea is that multithreading becomes a lot simpler and less error-prone.<\/p>\n<h2>C++20<\/h2>\n<h3>Atomic smart pointers<\/h3>\n<p>The smart pointers <span style=\"font-family: courier new,courier;\">std::shared_ptr<\/span> and <span style=\"font-family: courier new,courier;\">std::weak_ptr<\/span> have a conceptual issue in multithreaded programs: they share mutable state. 
Therefore, they are prone to <a href=\"https:\/\/www.modernescpp.com\/index.php\/threads-sharing-data\">data races<\/a> and, consequently, to undefined behavior. <span style=\"font-family: courier new,courier;\">std::shared_ptr<\/span> and <span style=\"font-family: courier new,courier;\">std::weak_ptr<\/span> guarantee that incrementing or decrementing the reference counter is an atomic operation and that the resource is deleted exactly once. However, neither guarantees that concurrent reads and writes of the same smart pointer object are free of data races. The new atomic smart pointers solve this issue.<\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\">std::atomic_shared_ptr\r\nstd::atomic_weak_ptr\r\n<span style=\"color: #008000;\">\/\/ C++20 standardized these as std::atomic&lt;std::shared_ptr&lt;T&gt;&gt; and std::atomic&lt;std::weak_ptr&lt;T&gt;&gt;<\/span>\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>With tasks in the form of promises and futures, C++11 introduced a new multithreading concept. Although tasks have a lot to offer, they have a big drawback: futures cannot be composed in C++11.<\/p>\n<h3>std::future extensions<\/h3>\n<p>That will not hold for futures in C++20. 
Futures can then be composed so that a future becomes ready when<\/p>\n<ul>\n<li>its predecessor becomes ready:<\/li>\n<\/ul>\n<p style=\"padding-left: 60px;\"><strong>then:<\/strong><\/p>\n<pre style=\"margin: 0px; line-height: 125%; padding-left: 60px;\">future&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; f1 = async([]() { <span style=\"color: #0000ff;\">return<\/span> 123; });\r\nfuture&lt;string&gt; f2 = f1.then([](future&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; f) {\r\n  <span style=\"color: #0000ff;\">return<\/span> std::to_string(f.get());\r\n});\r\n<\/pre>\n<ul>\n<li>one of its predecessors becomes ready:<\/li>\n<\/ul>\n<p style=\"padding-left: 60px;\"><strong>when_any:<\/strong><\/p>\n<div style=\"background: #ffffff none repeat scroll 0% 0%; overflow: auto; width: auto; border-width: 0.1em 0.1em 0.1em 0.8em; padding-left: 60px;\">\n<pre style=\"margin: 0px; line-height: 125%;\">future&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; futures[] = {async([]() { <span style=\"color: #0000ff;\">return<\/span> intResult(125); }),\r\n                         async([]() { <span style=\"color: #0000ff;\">return<\/span> intResult(456); })};\r\nfuture&lt;when_any_result&lt;vector&lt;future&lt;<span style=\"color: #2b91af;\">int<\/span>&gt;&gt;&gt;&gt; any_f = when_any(begin(futures), end(futures));\r\n<\/pre>\n<\/div>\n<ul>\n<li>all of its predecessors become ready:<\/li>\n<\/ul>\n<p style=\"padding-left: 60px;\"><strong>when_all:<\/strong><\/p>\n<div style=\"background: #ffffff none repeat scroll 0% 0%; overflow: auto; width: auto; border-width: 0.1em 0.1em 0.1em 0.8em; padding-left: 60px;\">\n<pre style=\"margin: 0px; line-height: 125%;\">future&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; futures[] = {async([]() { <span style=\"color: #0000ff;\">return<\/span> intResult(125); }),\r\n                         async([]() { <span style=\"color: 
#0000ff;\">return<\/span> intResult(456); })};\r\nfuture&lt;vector&lt;future&lt;<span style=\"color: #2b91af;\">int<\/span>&gt;&gt;&gt; all_f = when_all(begin(futures), end(futures));\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>C++14 has no semaphores, which let threads control access to a shared resource. No problem: with C++20, we get latches and barriers.<\/p>\n<h3>Latches and barriers<\/h3>\n<p>You can use latches and barriers to make threads wait at a synchronization point until a counter becomes zero. The difference is that a <span style=\"font-family: courier new,courier;\">std::latch<\/span> can be used only once, whereas a <span style=\"font-family: courier new,courier;\">std::barrier<\/span> and a <span style=\"font-family: courier new,courier;\">std::flex_barrier<\/span> can be used more than once. Unlike a <span style=\"font-family: courier new,courier;\">std::barrier<\/span>, a <span style=\"font-family: courier new,courier;\">std::flex_barrier<\/span> can adjust its counter after each iteration.<\/p>\n<p>&nbsp;<\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #2b91af;\">void<\/span> doWork(threadpool* pool){\r\n  latch completion_latch(NUMBER_TASKS);\r\n  <span style=\"color: #0000ff;\">for<\/span> (<span style=\"color: #2b91af;\">int<\/span> i = 0; i &lt; NUMBER_TASKS; ++i){\r\n    pool-&gt;add_task([&amp;]{\r\n      <span style=\"color: #008000;\">\/\/ perform the work<\/span>\r\n      ...\r\n      completion_latch.count_down();\r\n    });\r\n  }\r\n  <span style=\"color: #008000;\">\/\/ block until all tasks are done<\/span>\r\n  completion_latch.wait();\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>The thread running the function <span 
style=\"font-family: courier new,courier;\">doWork<\/span> waits in line 11 until the <span style=\"font-family: courier new,courier;\">completion_latch<\/span> becomes 0. The <span style=\"font-family: courier new,courier;\">completion_latch<\/span> is initialized with <span style=\"font-family: courier new,courier;\">NUMBER_TASKS<\/span> in line 2 and decremented in line 7.<\/p>\n<p>Coroutines are generalized functions. Unlike functions, coroutines can suspend and resume their execution while keeping their state.<\/p>\n<h3>Coroutines<\/h3>\n<p>Coroutines are often the tool of choice for implementing cooperative multitasking in operating systems, event loops, infinite lists, or pipelines.<\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\">generator&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; getInts(<span style=\"color: #2b91af;\">int<\/span> first, <span style=\"color: #2b91af;\">int<\/span> last){\r\n  <span style=\"color: #0000ff;\">for<\/span> (<span style=\"color: #0000ff;\">auto<\/span> i = first; i &lt;= last; ++i){\r\n    co_yield i;\r\n  }\r\n}\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n  <span style=\"color: #0000ff;\">for<\/span> (<span style=\"color: #0000ff;\">auto<\/span> i: getInts(5, 10)){\r\n    std::cout &lt;&lt; i &lt;&lt; <span style=\"color: #a31515;\">\" \"<\/span>;                      <span style=\"color: #008000;\">\/\/ 5 6 7 8 9 10<\/span>\r\n  }\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>The function <span style=\"font-family: courier new,courier;\">getInts<\/span> (lines 1 &#8211; 5) returns a generator that produces a value on request. The expression <span style=\"font-family: courier new,courier;\">co_yield<\/span> serves two purposes. 
First, it returns a new value; second, it waits until a new value is requested. The range-based for loop successively requests the values 5 to 10.<\/p>\n<p>With transactional memory, the well-established idea of transactions will be applied to software.<\/p>\n<h3>Transactional memory<\/h3>\n<p>The idea of transactional memory is based on transactions from database theory. A transaction is an action that provides the properties <strong>A<\/strong>tomicity, <strong>C<\/strong>onsistency, <strong>I<\/strong>solation, and <strong>D<\/strong>urability (ACID). Except for durability, all of these properties will hold for transactional memory in C++. C++ will have transactional memory in two flavors: synchronized blocks and atomic blocks. Both are executed in a single total order and behave as if a global lock protected them. Unlike synchronized blocks, atomic blocks cannot execute transaction-unsafe code.<\/p>\n<p>Therefore, you can invoke <span style=\"font-family: courier new,courier;\">std::cout<\/span> in a synchronized block but not in an atomic one.<\/p>\n<p>&nbsp;<\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #2b91af;\">int<\/span> func() { \r\n  <span style=\"color: #0000ff;\">static<\/span> <span style=\"color: #2b91af;\">int<\/span> i = 0; \r\n  synchronized{ \r\n    std::cout &lt;&lt; <span style=\"color: #a31515;\">\"Not interleaved \\n\"<\/span>; \r\n    ++i; \r\n    <span style=\"color: #0000ff;\">return<\/span> i;  \r\n  } \r\n}\r\n \r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n  std::vector&lt;std::<span style=\"color: #0000ff;\">thread<\/span>&gt; v(10); \r\n  <span style=\"color: 
#0000ff;\">for<\/span>(<span style=\"color: #0000ff;\">auto<\/span>&amp; t: v) \r\n    t = std::<span style=\"color: #0000ff;\">thread<\/span>([]{ <span style=\"color: #0000ff;\">for<\/span>(<span style=\"color: #2b91af;\">int<\/span> n = 0; n &lt; 10; ++n) func(); });\r\n  <span style=\"color: #0000ff;\">for<\/span>(<span style=\"color: #0000ff;\">auto<\/span>&amp; t: v) t.join();\r\n} \r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>The <span style=\"font-family: courier new,courier;\">synchronized<\/span> keyword in line 3 guarantees that executions of the synchronized block (lines 3 &#8211; 7) do not overlap. In particular, there is a single total order over all synchronized blocks. To put it another way: the end of each synchronized block synchronizes with the start of the next synchronized block in that order.<\/p>\n<p>&nbsp;<\/p>\n<p>Although I titled this post Multithreading with C++17 and C++20, C++ gets, besides the parallel STL, another parallel feature: task blocks.<\/p>\n<h3>Task blocks<\/h3>\n<p>Task blocks implement the fork-join paradigm. 
The graphic shows the key idea.<\/p>\n<p>&nbsp;<img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5190\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/ForkJoin.png\" alt=\"ForkJoin\" width=\"700\" height=\"178\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/ForkJoin.png 1271w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/ForkJoin-300x76.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/ForkJoin-1024x261.png 1024w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2017\/02\/ForkJoin-768x196.png 768w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><\/p>\n<p>By using <span style=\"font-family: courier new,courier;\">run<\/span> in a task block, you can fork new tasks that will be joined at the end of the task block.<\/p>\n<p>&nbsp;<\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table style=\"width: 525px; height: 186px;\">\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #0000ff;\">template<\/span> &lt;<span style=\"color: #0000ff;\">typename<\/span> Func&gt; \r\n<span style=\"color: #2b91af;\">int<\/span> traverse(node&amp; n, Func &amp;&amp; f){ \r\n    <span style=\"color: #2b91af;\">int<\/span> left = 0, right = 0; \r\n    define_task_block(                 \r\n        [&amp;](task_block&amp; tb){ \r\n            <span style=\"color: #0000ff;\">if<\/span> (n.left) tb.run([&amp;]{ left = traverse(*n.left, f); }); \r\n            <span style=\"color: #0000ff;\">if<\/span> (n.right) tb.run([&amp;]{ right = traverse(*n.right, f); });\r\n         }\r\n    );                                                         \r\n    <span style=\"color: #0000ff;\">return<\/span> f(n) + left + right; \r\n} 
<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p><span style=\"font-family: courier new,courier;\">traverse<\/span> is a function template that invokes the callable <span style=\"font-family: courier new,courier;\">f<\/span> on each node of the tree rooted at <span style=\"font-family: courier new,courier;\">n<\/span>. The expression <span style=\"font-family: courier new,courier;\">define_task_block<\/span> defines the task block. In this region, you have the task block <span style=\"font-family: courier new,courier;\">tb<\/span> at your disposal to start new tasks. That is exactly what happens for the left and right branches of the tree (lines 6 and 7). Line 9 is the end of the task block and, therefore, the synchronization point.<\/p>\n<h2>What&#8217;s next?<\/h2>\n<p>Now that I have given an overview of the new multithreading features in C++17 and C++20, I will provide the details in the <a href=\"https:\/\/www.modernescpp.com\/index.php\/parallel-algorithm-of-the-standard-template-library\">next posts<\/a>, starting with the parallel STL. I&#8217;m quite sure that this post has raised more questions than it has answered.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Forecasts about the future are difficult, particularly when they are about C++20. 
Nevertheless, I will look into the crystal ball and write in the following posts about what we will get with C++17&nbsp; and what we can hope for with C++20.<\/p>\n","protected":false},"author":21,"featured_media":5188,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[367],"tags":[482],"class_list":["post-5191","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-multithreading-c-17-and-c-20","tag-outdated"],"_links":{"self":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5191","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/comments?post=5191"}],"version-history":[{"count":1,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5191\/revisions"}],"predecessor-version":[{"id":6886,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5191\/revisions\/6886"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media\/5188"}],"wp:attachment":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media?parent=5191"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/categories?post=5191"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/tags?post=5191"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}