{"id":4853,"date":"2016-08-16T05:53:27","date_gmt":"2016-08-16T05:53:27","guid":{"rendered":"https:\/\/www.modernescpp.com\/index.php\/ongoing-optimization-acquire-release-semantic-with-cppmem\/"},"modified":"2023-10-21T20:13:09","modified_gmt":"2023-10-21T20:13:09","slug":"ongoing-optimization-acquire-release-semantic-with-cppmem","status":"publish","type":"post","link":"https:\/\/www.modernescpp.com\/index.php\/ongoing-optimization-acquire-release-semantic-with-cppmem\/","title":{"rendered":"Ongoing Optimization: Acquire-Release Semantics with CppMem"},"content":{"rendered":"<p>With the acquire-release semantics, we break the sequential consistency. In the acquire-release semantics, synchronization occurs between atomic operations on the same atomic and not between threads.<\/p>\n<p><!--more--><\/p>\n<h2>Acquire-release semantic<\/h2>\n<p>The acquire-release semantic is more lightweight and, therefore, faster than the <a href=\"https:\/\/www.modernescpp.com\/index.php\/sequential-consistency\">sequential consistency<\/a> because the synchronization only occurs between atomic operations. 
But the intellectual challenge increases as well.<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; border-width: .1em .1em .1em .8em; padding: .2em .6em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\n 2\n 3\n 4\n 5\n 6\n 7\n 8\n 9\n10\n11\n12\n13\n14\n15\n16\n17\n18\n19\n20\n21\n22\n23\n24\n25<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ ongoingOptimizationAcquireRelease.cpp<\/span>\n\n<span style=\"color: #0000ff;\">#include &lt;atomic&gt;<\/span>\n<span style=\"color: #0000ff;\">#include &lt;iostream&gt;<\/span>\n<span style=\"color: #0000ff;\">#include &lt;thread&gt;<\/span>\n\nstd::atomic&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; x{0};\nstd::atomic&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; y{0};\n\n<span style=\"color: #2b91af;\">void<\/span> writing(){  \n  x.store(2000,std::memory_order_relaxed);  \n  y.store(11,std::memory_order_release);\n}\n\n<span style=\"color: #2b91af;\">void<\/span> reading(){  \n  std::cout &lt;&lt; y.load(std::memory_order_acquire) &lt;&lt; <span style=\"color: #a31515;\">\" \"<\/span>;  \n  std::cout &lt;&lt; x.load(std::memory_order_relaxed) &lt;&lt; std::endl;\n}\n\n<span style=\"color: #2b91af;\">int<\/span> main(){\n  std::<span style=\"color: #0000ff;\">thread<\/span> thread1(writing);\n  std::<span style=\"color: #0000ff;\">thread<\/span> thread2(reading);\n  thread1.join();\n  thread2.join();\n}\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>At first glance, you will notice that all operations are atomic. So the program is well-defined. But the second glance shows more. The atomic operations on y are annotated with the flags<span style=\"font-family: courier new,courier;\"> std::memory_order_release<\/span> (line 12) and <span style=\"font-family: courier new,courier;\">std::memory_order_acquire <\/span>(line 16). 
In contrast, the atomic operations on x are annotated with <span style=\"font-family: courier new,courier;\">std::memory_order_relaxed<\/span>. So there are no <a href=\"https:\/\/www.modernescpp.com\/index.php\/synchronization-and-ordering-constraints\">synchronization and ordering constraints<\/a> for x. Which values are possible for x and y can therefore only be answered through y.<\/p>\n<p>The following holds:<\/p>\n<ol>\n<li><span style=\"font-family: courier new,courier;\">y.store(11,std::memory_order_release)<\/span> <strong>synchronizes-with<\/strong> <span style=\"font-family: courier new,courier;\">y.load(std::memory_order_acquire)<\/span><\/li>\n<li><span style=\"font-family: courier new,courier;\">x.store(2000,std::memory_order_relaxed)<\/span><strong> is visible before <\/strong> <span style=\"font-family: courier new,courier;\">y.store(11,std::memory_order_release)<\/span><\/li>\n<li><span style=\"font-family: courier new,courier;\">y.load(std::memory_order_acquire)<\/span> <strong>is visible before <\/strong> <span style=\"font-family: courier new,courier;\">x.load(std::memory_order_relaxed)<\/span><\/li>\n<\/ol>\n<p>I will elaborate a little bit more on these three statements. The key idea is that the store of y in line 12 synchronizes with the load of y in line 16, provided the load reads the value written by the store. The reason is that the operations occur on the same atomic and follow the acquire-release semantic. So y uses <span style=\"font-family: courier new,courier;\">std::memory_order_release <\/span>in line 12 and <span style=\"font-family: courier new,courier;\">std::memory_order_acquire <\/span>in line 16. But the pairwise operations on y have another very interesting property. They establish a kind of barrier relative to y. 
So\u00a0<span style=\"font-family: courier new,courier;\">x.store(2000,std::memory_order_relaxed)<\/span> can not be executed <strong>after <\/strong><span style=\"font-family: courier new,courier;\">y.store(std::memory_order_release)<\/span>, so\u00a0<span style=\"font-family: courier new,courier;\">x.load()\u00a0<\/span>can not be executed <strong>before<\/strong><span style=\"font-family: courier new,courier;\"> y.load()<\/span>.<\/p>\n<p>The reasoning was in the case of the acquire-release semantic more sophisticated than in the case of the<a href=\"https:\/\/www.modernescpp.com\/index.php\/sequential-consistency\"> sequential consistency<\/a>. But the possible values for x and y are the same. Only the combination y == 11 and x == 0 is no possible.<\/p>\n<p>Three different interleavings of the threads are possible, producing the three different combinations of x and y.<\/p>\n<ol>\n<li><span style=\"font-family: courier new,courier;\">thread1<\/span> will be executed before <span style=\"font-family: courier new,courier;\">thread2<\/span>.<\/li>\n<li><span style=\"font-family: courier new,courier;\">thread2<\/span> will be executed before <span style=\"font-family: courier new,courier;\">thread1<\/span>.<\/li>\n<li><span style=\"font-family: courier new,courier;\">thread1<\/span> executes<span style=\"font-family: courier new,courier;\"> x.store(2000)<\/span>, before <span style=\"font-family: courier new,courier;\">thread2<\/span> will be\u00a0exectued.<\/li>\n<\/ol>\n<p>At the end the table.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4849\" style=\"margin: 15px;\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/sukzessiveOptimierungSequenzielleKonsistenzEng.png\" alt=\"sukzessiveOptimierungSequenzielleKonsistenzEng\" width=\"308\" height=\"230\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/sukzessiveOptimierungSequenzielleKonsistenzEng.png 308w, 
https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/sukzessiveOptimierungSequenzielleKonsistenzEng-300x224.png 300w\" sizes=\"auto, (max-width: 308px) 100vw, 308px\" \/><\/p>\n<h3>CppMem<\/h3>\n<p>First, here is the program once more, this time in <a href=\"http:\/\/svr-pes20-cppmem.cl.cam.ac.uk\/cppmem\/\">CppMem<\/a> syntax.<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #2b91af;\">int<\/span> main(){\n  atomic_int x= 0; \n  atomic_int y= 0;\n  {{{ { \n      x.store(2000,memory_order_relaxed);\n      y.store(11,memory_order_release);\n      }\n  ||| {\n      y.load(memory_order_acquire);\n      x.load(memory_order_relaxed);\n      } \n  }}}\n}\n<\/pre>\n<\/div>\n<p>We already know that all results except (y=11, x=0) are possible.<\/p>\n<h3>Possible executions<\/h3>\n<p>Have a look at the three graphs with consistent executions. The graphs show the acquire-release semantics between the store-release of y and the load-acquire of y. It makes no difference whether the reading of y (<strong><span style=\"color: #ff0000;\">rf<\/span><\/strong>) occurs in the main thread or a separate thread. 
The graphs show the synchronizes-with relation with an <span style=\"color: #a23ff3;\">sw <\/span>arrow.<\/p>\n<h4>Execution for (y=0, x=0)<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4842\" style=\"margin: 15px;\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/first.png\" alt=\"first\" width=\"380\" height=\"232\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/first.png 380w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/first-300x183.png 300w\" sizes=\"auto, (max-width: 380px) 100vw, 380px\" \/><\/p>\n<h4>Execution for (y=0, x=2000)<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4843\" style=\"margin: 15px;\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/second.png\" alt=\"second\" width=\"448\" height=\"221\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/second.png 448w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/second-300x148.png 300w\" sizes=\"auto, (max-width: 448px) 100vw, 448px\" \/><\/p>\n<h4>Execution for (y=11, x=2000)<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4844\" style=\"margin: 15px;\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/third.png\" alt=\"third\" width=\"448\" height=\"221\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/third.png 448w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/third-300x148.png 300w\" sizes=\"auto, (max-width: 448px) 100vw, 448px\" \/><\/p>\n<h2>What&#8217;s next?<\/h2>\n<p>But we can do better. Why should x be atomic? There is no reason. That was my first but incorrect assumption. Why? 
You will read it in the <a href=\"https:\/\/www.modernescpp.com\/index.php\/ongoing-optimization-a-data-race-with-cppmem\">next post.<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>With the acquire-release semantics, we break the sequential consistency. In the acquire-release semantics, synchronization occurs between atomic operations on the same atomic and not between threads.<\/p>\n","protected":false},"author":21,"featured_media":4849,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[369],"tags":[505,434,486,521],"class_list":["post-4853","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-multithreading-application","tag-acquire-release-semantic","tag-atomics","tag-cppmem","tag-ongoing-optimization"],"_links":{"self":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4853","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/comments?post=4853"}],"version-history":[{"count":2,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4853\/revisions"}],"predecessor-version":[{"id":8535,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4853\/revisions\/8535"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media\/4849"}],"wp:attachment":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media?parent=4853"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/categ
ories?post=4853"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/tags?post=4853"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}