{"id":4852,"date":"2016-08-13T04:55:35","date_gmt":"2016-08-13T04:55:35","guid":{"rendered":"https:\/\/www.modernescpp.com\/index.php\/ongoing-optiimization-sequential-consistency-with-cppmem\/"},"modified":"2023-06-26T12:46:44","modified_gmt":"2023-06-26T12:46:44","slug":"ongoing-optiimization-sequential-consistency-with-cppmem","status":"publish","type":"post","link":"https:\/\/www.modernescpp.com\/index.php\/ongoing-optiimization-sequential-consistency-with-cppmem\/","title":{"rendered":"Ongoing Optimization: Sequential Consistency with CppMem"},"content":{"rendered":"<p>With atomic data types, you can tailor your program to your needs and optimize it. But now we are in the domain of multithreading experts.<\/p>\n<p><!--more--><\/p>\n<h2>Sequential consistency<\/h2>\n<p>If you don&#8217;t specify the <a href=\"https:\/\/www.modernescpp.com\/index.php\/c-memory-model\">memory model<\/a>, <a href=\"https:\/\/www.modernescpp.com\/index.php\/sequential-consistency\">sequential consistency<\/a> will be used. The sequential consistency guarantees two properties. Each thread executes its instructions in source code order, and all threads follow a global order.<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; border-width: .1em .1em .1em .8em; padding: .2em .6em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ OngoingOptimizationSequentialConsistency.cpp<\/span>\r\n\r\n<span style=\"color: #0000ff;\">#include &lt;atomic&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;thread&gt;<\/span>\r\n\r\nstd::atomic&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; x{0};\r\nstd::atomic&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; y{0};\r\n\r\n<span style=\"color: #2b91af;\">void<\/span> writing(){  \r\n  x.store(2000);  \r\n  y.store(11);\r\n}\r\n\r\n<span style=\"color: #2b91af;\">void<\/span> reading(){  \r\n  std::cout &lt;&lt; y.load() &lt;&lt; <span style=\"color: #a31515;\">\" \"<\/span>;  \r\n  std::cout &lt;&lt; x.load() &lt;&lt; std::endl;\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>This knowledge is sufficient to analyze the program. Because x and y are atomic, the program has no <a href=\"https:\/\/www.modernescpp.com\/index.php\/threads-sharing-data\">race condition<\/a>. So there is only the question. What values are possible for x and y? But the question is <em>easy<\/em> to answer. Because of the sequential consistency, all threads must follow a global order.<\/p>\n<p>&nbsp;<\/p>\n<p>It holds:<\/p>\n<ol>\n<li><span style=\"font-family: courier new,courier;\">x.store(2000); <\/span><strong>happens-before<\/strong> <span style=\"font-family: courier new,courier;\">y.store(11);<\/span><\/li>\n<li><span style=\"font-family: courier new,courier;\">std::cout &lt;&lt; y.load() &lt;&lt; &#8221; &#8220;;<\/span> <strong>happens-before<\/strong> <span style=\"font-family: courier new,courier;\">std::cout &lt;&lt; x.load() &lt;&lt; std::endl;<\/span><\/li>\n<\/ol>\n<p>Therefor: <span style=\"font-family: courier new,courier;\">x.load()<\/span> can not have 0, if <span style=\"font-family: courier new,courier;\">y.load()<\/span> is 11, because<span style=\"font-family: courier new,courier;\"> x.store(2000)<\/span> happens before<span style=\"font-family: courier new,courier;\"> y.store(11)<\/span>.<\/p>\n<p>All other values for x and y are possible. Here are three possible interleavings, producing the three different results for x and y.<\/p>\n<ol>\n<li><span style=\"font-family: courier new,courier;\">thread1<\/span> will be completely executed before <span style=\"font-family: courier new,courier;\">thread2<\/span>.<\/li>\n<li><span style=\"font-family: courier new,courier;\">thread2<\/span> will be completely executed before <span style=\"font-family: courier new,courier;\">thread1<\/span>.<\/li>\n<li><span style=\"font-family: courier new,courier;\">thread1<\/span> executes the first instruction <span style=\"font-family: courier new,courier;\"> x.store(2000)<\/span>, before <span style=\"font-family: courier new,courier;\">thread2<\/span>&nbsp;will be completely executed.<\/li>\n<\/ol>\n<p>Here all values for x and y.&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4849\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/sukzessiveOptimierungSequenzielleKonsistenzEng.png\" alt=\"sukzessiveOptimierungSequenzielleKonsistenzEng\" style=\"margin: 15px;\" width=\"308\" height=\"230\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/sukzessiveOptimierungSequenzielleKonsistenzEng.png 308w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/sukzessiveOptimierungSequenzielleKonsistenzEng-300x224.png 300w\" sizes=\"auto, (max-width: 308px) 100vw, 308px\" \/><\/p>\n<p>So how does this look like in <a href=\"https:\/\/www.modernescpp.com\/index.php\/cppmem-an-overview\">CppMem<\/a>.<\/p>\n<\/p>\n<h2>CppMem<\/h2>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #2b91af;\">int<\/span> main(){\r\n  atomic_int x= 0; \r\n  atomic_int y= 0;\r\n  {{{ { \r\n      x.store(2000);\r\n      y.store(11);\r\n      }\r\n  ||| {\r\n      y.load();\r\n      x.load();\r\n      } \r\n  }}}\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>At first a little bit of syntax of CppMem. CppMem uses in line 2 and 3 the&nbsp;<span style=\"font-family: courier new,courier;\">typedef atomic_int<\/span> for <span style=\"font-family: courier new,courier;\">std::atomic&lt;int&gt;<\/span>.<\/p>\n<p>If I execute the program, I&#8217;m overwhelmed by the sheer number of execution candidates.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4801\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/07\/SequenzielleKonsistenz.png\" alt=\"SequenzielleKonsistenz\" width=\"800\" height=\"615\" style=\"margin: 15px;\" \/><\/p>\n<p>384 (<span style=\"color: #ff0000;\"><strong>1<\/strong><\/span>) possible execution candidates, and only 6 of them are consistent. No candidate has a <a href=\"https:\/\/www.modernescpp.com\/index.php\/threads-sharing-data\">data race<\/a>. How does that work?&nbsp;<\/p>\n<p>But I&#8217;m only interested in consistent executions. I use interface (<strong><span style=\"color: #ff0000;\">2<\/span><\/strong>) to analyze the six annotated graphs. The other (378) are not consistent. That means, for example, they do not respect the <a href=\"http:\/\/en.cppreference.com\/w\/cpp\/atomic\/memory_order\">modification order<\/a>. So I totally ignore them.<\/p>\n<p>We know already that all values for x and y are possible, except for y= 11 and x= 0. That&#8217;s because of the default memory model.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4849\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/sukzessiveOptimierungSequenzielleKonsistenzEng.png\" alt=\"sukzessiveOptimierungSequenzielleKonsistenzEng\" style=\"margin: 15px;\" width=\"308\" height=\"230\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/sukzessiveOptimierungSequenzielleKonsistenzEng.png 308w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/sukzessiveOptimierungSequenzielleKonsistenzEng-300x224.png 300w\" sizes=\"auto, (max-width: 308px) 100vw, 308px\" \/><\/p>\n<p>Now the questions are. Which interleavings of the threads produce which values for x and y? I already introduce the symbols in the annotated graph (<a href=\"https:\/\/www.modernescpp.com\/index.php\/cppmem-an-overview\">CppMem &#8211; An overview<\/a>); therefore, I will concentrate my analysis on the results for x and y.<\/p>\n<h3>Execution for (y= 0, x= 0)<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4842\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/first.png\" alt=\"first\" style=\"margin: 15px;\" width=\"380\" height=\"232\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/first.png 380w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/first-300x183.png 300w\" sizes=\"auto, (max-width: 380px) 100vw, 380px\" \/><\/p>\n<h3>Executions for (y= 0, x= 2000)<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4843\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/second.png\" alt=\"second\" style=\"margin: 15px;\" width=\"448\" height=\"221\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/second.png 448w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/second-300x148.png 300w\" sizes=\"auto, (max-width: 448px) 100vw, 448px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4844\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/third.png\" alt=\"third\" style=\"margin: 15px;\" width=\"448\" height=\"221\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/third.png 448w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/third-300x148.png 300w\" sizes=\"auto, (max-width: 448px) 100vw, 448px\" \/><\/p>\n<h3><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4845\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/four.png\" alt=\"four\" style=\"margin: 15px;\" width=\"448\" height=\"221\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/four.png 448w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/four-300x148.png 300w\" sizes=\"auto, (max-width: 448px) 100vw, 448px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4850\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/five.png\" alt=\"five\" style=\"margin: 15px;\" width=\"382\" height=\"229\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/five.png 382w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/five-300x180.png 300w\" sizes=\"auto, (max-width: 382px) 100vw, 382px\" \/><\/h3>\n<h3>&nbsp;<\/h3>\n<h3>Execution for (y= 11, x= 2000)<\/h3>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4851\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/six.png\" alt=\"six\" style=\"margin: 15px;\" width=\"382\" height=\"229\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/six.png 382w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/six-300x180.png 300w\" sizes=\"auto, (max-width: 382px) 100vw, 382px\" \/><\/p>\n<p>Do you know why I used the red numbers in the graphs? I have because I&#8217;m not done with my analysis.&nbsp;<\/p>\n<h3>Deeper insights<\/h3>\n<p>If I look at the six different thread interleavings in the following graphic, I have the question? Which sequence of instructions corresponds to which graph? Here is the solution. I have assigned to each sequence of instructions the corresponding graph.<\/p>\n<p>&nbsp;<\/p>\n<h4>Sequences of&nbsp;instructions<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4786\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/06\/atomicInterleavingEng.png\" alt=\"atomicInterleavingEng\" width=\"700\" height=\"525\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/06\/atomicInterleavingEng.png 960w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/06\/atomicInterleavingEng-300x225.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/06\/atomicInterleavingEng-768x576.png 768w\" sizes=\"auto, (max-width: 700px) 100vw, 700px\" \/><\/p>\n<p>I start with the simpler cases:<\/p>\n<ul>\n<li><strong><span style=\"color: #ff0000;\">(1)<\/span><\/strong>: It&#8217;s quite simple to assign the graph (1) to the sequence (1). In the sequence (1) have x and y the values&nbsp;0, because<span style=\"font-family: courier new,courier;\"> y.load()<\/span> and <span style=\"font-family: courier new,courier;\">x.load()<\/span> are executed before the operations <span style=\"font-family: courier new,courier;\">x.store(2000)<\/span> and<span style=\"font-family: courier new,courier;\"> y.store(11)<\/span>.<\/li>\n<li><strong><span style=\"color: #ff0000;\">(6)<\/span><\/strong>: The argumentation for the execution (6) is accordingly. y has the value 11 and y the value 2000 if all load operations happen after all store operations.&nbsp;<\/li>\n<li><strong><span style=\"color: #ff0000;\">(2),(3),(4),(5): <\/span><\/strong><span style=\"color: #000000;\">Now to the more interesting cases, in which y has den value 0 and x has the value 2000. The yellow arrows (sc) are the key to my reasoning because they stand for the sequence of instructions. For example, let&#8217;s look at execution <strong><span style=\"color: #ff0000;\">(2)<\/span><\/strong>.<\/span>\n<ul>\n<li><strong><span style=\"color: #ff0000;\">(2)<\/span><\/strong>: The sequence of the yellow arrows (sc) in the graph <strong><span style=\"color: #ff0000;\">(2)<\/span><\/strong> is: Write x= 2000 =&gt; Read y= 0 =&gt; Write y= 11 =&gt; Read x= 2000. This corresponds to the instructions of the second interleaving of threads <strong><span style=\"color: #ff0000;\">(2)<\/span><\/strong>.&nbsp;<\/li>\n<\/ul>\n<p>&nbsp;<\/p>\n<\/li>\n<\/ul>\n<h2>What&#8217;s next?<\/h2>\n<p>In the<a href=\"https:\/\/www.modernescpp.com\/index.php\/ongoing-optimization-acquire-release-semantic-with-cppmem\"> next post<\/a>, I will break the sequential consistency. So what will happen if a base my optimization on the acquire-release semantic?<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>With atomic data types, you can tailor your program to your needs and optimize it. But now we are in the domain of multithreading experts.<\/p>\n","protected":false},"author":21,"featured_media":4849,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[369],"tags":[434,486,521,519],"class_list":["post-4852","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-multithreading-application","tag-atomics","tag-cppmem","tag-ongoing-optimization","tag-sequential-consistency"],"_links":{"self":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4852","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/comments?post=4852"}],"version-history":[{"count":1,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4852\/revisions"}],"predecessor-version":[{"id":6956,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4852\/revisions\/6956"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media\/4849"}],"wp:attachment":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media?parent=4852"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/categories?post=4852"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/tags?post=4852"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}