{"id":4790,"date":"2016-06-17T06:44:57","date_gmt":"2016-06-17T06:44:57","guid":{"rendered":"https:\/\/www.modernescpp.com\/index.php\/the-atomic-flag\/"},"modified":"2023-06-26T12:52:48","modified_gmt":"2023-06-26T12:52:48","slug":"the-atomic-flag","status":"publish","type":"post","link":"https:\/\/www.modernescpp.com\/index.php\/the-atomic-flag\/","title":{"rendered":"The Atomic Flag"},"content":{"rendered":"<p>Atomics guarantee two characteristics. On the one hand, they are atomic, on the other, they provide synchronization and order constraints on the program execution.<\/p>\n<p><!--more--><\/p>\n<p>&nbsp;<\/p>\n<p>I introduced in the last post sequential consistency as the default behavior of atomic operations. But what does that mean? You can specify for each atomic operation the memory order. If not specified, <span style=\"font-family: courier new,courier;\">std::memory_order_seq_cst<\/span> is used.<\/p>\n<p>So this piece of code<\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\">x.store(1);\r\nres= x.load();\r\n<\/pre>\n<\/div>\n<p>is equivalent to the following piece of code.<\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\">x.store(1,std::memory_order_seq_cst);\r\nres= x.load(std::memory_order_seq_cst);<span style=\"font-family: arial,helvetica,sans-serif;\"><\/span>\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>For simplicity reasons, I will use the first spelling in this post.<\/p>\n<\/p>\n<h2>std::atomic_flag<\/h2>\n<p><span style=\"font-family: courier new,courier;\">std::atomic_flag<\/span> has a simple interface. Its method <span style=\"font-family: courier new,courier;\">clear<\/span> enables you the set its value to <span style=\"font-family: courier new,courier;\">false,<\/span> with <span style=\"font-family: courier new,courier;\">test_and_set<\/span> back to <span style=\"font-family: courier new,courier;\">true. <\/span>In case you use <span style=\"font-family: courier new,courier;\">test_and_set<\/span>, you get the old value back. To use <span style=\"font-family: courier new,courier;\">std::atomic_flag<\/span> it must be initialized to <span style=\"font-family: courier new,courier;\">false<\/span> with the constant ATOMIC_FLAG_INIT<span style=\"font-family: courier new,courier;\">.<\/span> That is not so thrilling. But <span style=\"font-family: courier new,courier;\">std::atomic_flag<\/span> has two very interesting properties.&nbsp;<\/p>\n<p><strong><span style=\"font-family: courier new,courier;\">std::atomic_flag<\/span><\/strong> is<\/p>\n<ul>\n<li>the only lock-free atomic.<\/li>\n<li>the building block for higher thread abstractions.<\/li>\n<\/ul>\n<p>The only lock-free atomic? The remaining more powerful atomics can provide their functionality by using a <a href=\"https:\/\/www.modernescpp.com\/index.php\/the-risk-of-mutexes\">mutex.<\/a> That is according to the C++ standard.&nbsp; So these atomics have a method <span style=\"font-family: courier new,courier;\">is_lock_free<\/span> to check if the atomic uses a mutex internally. On popular platforms, I always get the answer <span style=\"font-family: courier new,courier;\">false.<\/span> But you should be aware of that.<\/p>\n<p>The interface of a std::atomic_flag is sufficient to build a spinlock. You can protect a critical section similar to a mutex with a spinlock. But the spinlock will not passively wait in opposition to a mutex until it gets it mutex. It will eagerly ask for the critical section. So it saves the expensive context change in the wait state, but it fully utilizes the CPU:<\/p>\n<p>The example shows the implementation of a spinlock with the help of <span style=\"font-family: courier new,courier;\">std::atomic_flag.<\/span><\/p>\n<p>&nbsp;<\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ spinLock.cpp<\/span>\r\n\r\n<span style=\"color: #0000ff;\">#include &lt;atomic&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;thread&gt;<\/span>\r\n\r\n<span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">Spinlock<\/span>{\r\n  std::atomic_flag flag;\r\npublic:\r\n  Spinlock(): flag(ATOMIC_FLAG_INIT) {}\r\n\r\n  <span style=\"color: #2b91af;\">void<\/span> lock(){\r\n    <span style=\"color: #0000ff;\">while<\/span>( flag.test_and_set() );\r\n  }\r\n\r\n  <span style=\"color: #2b91af;\">void<\/span> unlock(){\r\n    flag.clear();\r\n  }\r\n};\r\n\r\nSpinlock spin;\r\n\r\n<span style=\"color: #2b91af;\">void<\/span> workOnResource(){\r\n  spin.lock();\r\n  <span style=\"color: #008000;\">\/\/ shared resource<\/span>\r\n  spin.unlock();\r\n}\r\n\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n\r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t(workOnResource);\r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t2(workOnResource);\r\n\r\n  t.join();\r\n  t2.join();\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Threads t and t2 (lines 31 and 32) fight for the critical section. For simplicity reasons, the critical section in line 24 consists only of a comment. How does it work? The class <span style=\"font-family: courier new,courier;\">Spinlock<\/span> has &#8211; similar to a mutex &#8211; the two methods <span style=\"font-family: courier new,courier;\">lock<\/span> and <span style=\"font-family: courier new,courier;\">unlock.<\/span> In addition, Spinlock&#8217;s constructor initializes in line 9 the <span style=\"font-family: courier new,courier;\">std::atomic_flag<\/span> to <span style=\"font-family: courier new,courier;\">false.<\/span> Two scenarios can happen if thread t wants to execute the function workOnResource.<\/p>\n<p><strong>First,<\/strong> the thread t gets the lock. So the lock invocation was successful. The lock invocation is successful if the initial value of the flag in line 12 is <span style=\"font-family: courier new,courier;\">false.<\/span> In this case, thread t sets it in an atomic operation to <span style=\"font-family: courier new,courier;\">true.<\/span> That value <span style=\"font-family: courier new,courier;\">true<\/span> is the value the while loop returns to the other thread t2 if it tries to get the lock. So thread t2 is caught in the rat race. Thread t2 cannot set the value of the flag to <span style=\"font-family: courier new,courier;\">false.<\/span> So t2 must eagerly wait until thread t1 executes the <span style=\"font-family: courier new,courier;\">unlock<\/span> method and sets the flag to <span style=\"font-family: courier new,courier;\">false<\/span> (lines 15 &#8211; 17).<\/p>\n<p><strong>Second,<\/strong> the thread t don&#8217;t get the lock. So we are in scenario 1 with changed roles.<\/p>\n<p>It&#8217;s very interesting to compare a spinlock&#8217;s active waiting with a mutex&#8217;s passive waiting.<\/p>\n<h2>Spinlock versus Mutex<\/h2>\n<p>What happens to the CPU load if the function <span style=\"font-family: courier new,courier;\">workOnRessoucre<\/span> locks the spinlock for 2 seconds (lines 23 &#8211; 25)?<\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ spinLockSleep.cpp<\/span>\r\n\r\n<span style=\"color: #0000ff;\">#include &lt;atomic&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;thread&gt;<\/span>\r\n\r\n<span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">Spinlock<\/span>{\r\n  std::atomic_flag flag;\r\npublic:\r\n  Spinlock(): flag(ATOMIC_FLAG_INIT) {}\r\n\r\n  <span style=\"color: #2b91af;\">void<\/span> lock(){\r\n    <span style=\"color: #0000ff;\">while<\/span>( flag.test_and_set() );\r\n  }\r\n\r\n  <span style=\"color: #2b91af;\">void<\/span> unlock(){\r\n    flag.clear();\r\n  }\r\n};\r\n\r\nSpinlock spin;\r\n\r\n<span style=\"color: #2b91af;\">void<\/span> workOnResource(){\r\n  spin.lock();\r\n  std::this_thread::sleep_for(std::chrono::milliseconds(2000));\r\n  spin.unlock();\r\n}\r\n\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n\r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t(workOnResource);\r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t2(workOnResource);\r\n\r\n  t.join();\r\n  t2.join();\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>If the theory is right, one of the four cores of my PC must be fully utilized, precisely what you can see in the screenshot.<\/p>\n<p>&nbsp;<img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4788\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/06\/spinLockSleep.png\" alt=\"spinLockSleep\" width=\"628\" height=\"592\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/06\/spinLockSleep.png 628w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/06\/spinLockSleep-300x283.png 300w\" sizes=\"auto, (max-width: 628px) 100vw, 628px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>The screenshot shows nicely that the load of one core gets 100%. But my PC is fair. Each time a different core has to perform the busy waiting.<\/p>\n<p>I use in the next concise program a mutex instead of a spinlock.<\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #0000ff;\">#include &lt;mutex&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;thread&gt;<\/span>\r\n\r\nstd::mutex mut;\r\n\r\n<span style=\"color: #2b91af;\">void<\/span> workOnResource(){\r\n  mut.lock();\r\n  std::this_thread::sleep_for(std::chrono::milliseconds(5000));\r\n  mut.unlock();\r\n}\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n\r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t(workOnResource);\r\n  std::<span style=\"color: #0000ff;\">thread<\/span> t2(workOnResource);\r\n\r\n  t.join();\r\n  t2.join();\r\n\r\n}\r\n<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Although I execute the program several times, I cannot observe a higher load of the cores.<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4789\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/06\/mutex.png\" alt=\"mutex\" width=\"631\" height=\"592\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/06\/mutex.png 631w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/06\/mutex-300x281.png 300w\" sizes=\"auto, (max-width: 631px) 100vw, 631px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h2>What&#8217;s next?<\/h2>\n<p>The <a href=\"https:\/\/www.modernescpp.com\/index.php\/the-atomic-boolean\">next post<\/a> will be about the class template <span style=\"font-family: courier new,courier;\">std::atomic<\/span>. The various specializations for bools, integral types, and pointers provide a more powerful interface than <span style=\"font-family: courier new,courier;\">std::atomic_flag.<\/span>(<strong>Proofreader Alexey Elymanov<\/strong>)<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Atomics guarantee two characteristics. On the one hand, they are atomic, on the other, they provide synchronization and order constraints on the program execution.<\/p>\n","protected":false},"author":21,"featured_media":4788,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[368],"tags":[434,519],"class_list":["post-4790","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-multithreading-memory-model","tag-atomics","tag-sequential-consistency"],"_links":{"self":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4790","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/comments?post=4790"}],"version-history":[{"count":1,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4790\/revisions"}],"predecessor-version":[{"id":6970,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4790\/revisions\/6970"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media\/4788"}],"wp:attachment":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media?parent=4790"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/categories?post=4790"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/tags?post=4790"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}