{"id":4885,"date":"2016-08-23T05:49:48","date_gmt":"2016-08-23T05:49:48","guid":{"rendered":"https:\/\/www.modernescpp.com\/index.php\/thread-safe-initialization-of-a-singleton\/"},"modified":"2023-06-26T12:44:42","modified_gmt":"2023-06-26T12:44:42","slug":"thread-safe-initialization-of-a-singleton","status":"publish","type":"post","link":"https:\/\/www.modernescpp.com\/index.php\/thread-safe-initialization-of-a-singleton\/","title":{"rendered":"Thread-Safe Initialization of a Singleton"},"content":{"rendered":"<p>There are a lot of issues with the singleton pattern. I&#8217;m aware of that. But the singleton pattern is an ideal use case for a variable that only has to be initialized in a thread-safe way. From that point on, you can use it without synchronization. So in this post, I discuss different ways to initialize a singleton in a multithreading environment. You get the performance numbers and can reason about your use cases for the thread-safe initialization of a variable.<\/p>\n<p><!--more--><\/p>\n<p>&nbsp;<\/p>\n<p>There are many different ways to initialize a singleton in C++11 in a thread-safe way. From a bird&#8217;s-eye view, you can have guarantees from the C++ runtime, locks, or atomics.&nbsp;I&#8217;m totally curious about the performance implications.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"My_strategy\"><\/span>My strategy<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>As a reference point for my performance measurement, I use a singleton object which I access sequentially 40 million times. The first access will initialize the object. In contrast, the access from the multithreading program will be done by four threads. Here I&#8217;m only interested in the performance. The program will run on two real PCs. My Linux PC has four cores; my Windows PC has two. I compile the program both with maximum optimization and without optimization. 
To compile the program with maximum optimization, I have to use a <span style=\"font-family: courier new,courier;\">volatile<\/span> variable in the static method <span style=\"font-family: courier new,courier;\">getInstance<\/span>. If not, the compiler will optimize away my access to the singleton, and my program becomes too fast.<\/p>\n<p>I have three questions in mind:<\/p>\n<ol>\n<li>What is the relative performance of the different singleton implementations?<\/li>\n<li>Is there a significant difference between Linux (GCC) and Windows (cl.exe)?<\/li>\n<li>What&#8217;s the difference between the optimized and non-optimized versions?<\/li>\n<\/ol>\n<p>Finally, I collect all numbers in a table. The numbers are in seconds.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_reference_values\"><\/span>The reference values<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<h3><span class=\"ez-toc-section\" id=\"The_both_compilers\"><\/span>Both compilers<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The command line gives you the details of the compilers. Here are GCC and cl.exe.<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4858\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/gcc.png\" alt=\"gcc\" width=\"400\" height=\"247\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/gcc.png 640w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/gcc-300x185.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4859\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/cl_exe.PNG\" alt=\"cl exe\" width=\"500\" height=\"132\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/cl_exe.PNG 1235w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/cl_exe-300x79.png 300w, 
https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/cl_exe-1024x270.png 1024w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/cl_exe-768x203.png 768w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<h3><span class=\"ez-toc-section\" id=\"The_reference_code\"><\/span>The reference code<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>At first, the single-threaded case. Of course, without synchronization.<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ singletonSingleThreaded.cpp<\/span>\r\n\r\n<span style=\"color: #0000ff;\">#include &lt;chrono&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;iostream&gt;<\/span>\r\n\r\nconstexpr <span style=\"color: #0000ff;\">auto<\/span> tenMill= 10000000;\r\n\r\n<span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">MySingleton<\/span>{\r\npublic:\r\n  <span style=\"color: #0000ff;\">static<\/span> MySingleton&amp; getInstance(){\r\n    <span style=\"color: #0000ff;\">static<\/span> MySingleton instance;\r\n    <span style=\"color: #008000;\">\/\/ volatile int dummy{};<\/span>\r\n    <span style=\"color: #0000ff;\">return<\/span> instance;\r\n  }\r\nprivate:\r\n  MySingleton()= <span style=\"color: #0000ff;\">default<\/span>;\r\n  ~MySingleton()= <span style=\"color: #0000ff;\">default<\/span>;\r\n  MySingleton(<span style=\"color: #0000ff;\">const<\/span> MySingleton&amp;)= <span style=\"color: #0000ff;\">delete<\/span>;\r\n  MySingleton&amp; <span 
style=\"color: #0000ff;\">operator<\/span>=(<span style=\"color: #0000ff;\">const<\/span> MySingleton&amp;)= <span style=\"color: #0000ff;\">delete<\/span>;\r\n  \r\n};\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n    \r\n  constexpr <span style=\"color: #0000ff;\">auto<\/span> fourtyMill= 4* tenMill;\r\n  \r\n  <span style=\"color: #0000ff;\">auto<\/span> begin= std::chrono::system_clock::now();\r\n  \r\n  <span style=\"color: #0000ff;\">for<\/span> ( <span style=\"color: #2b91af;\">size_t<\/span> i= 0; i &lt;= fourtyMill; ++i){\r\n       MySingleton::getInstance();\r\n  }\r\n  \r\n  <span style=\"color: #0000ff;\">auto<\/span> end= std::chrono::system_clock::now() - begin;\r\n  \r\n  std::cout &lt;&lt; std::chrono::duration&lt;<span style=\"color: #2b91af;\">double<\/span>&gt;(end).count() &lt;&lt; std::endl;\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>In the reference implementation, I use the so-called Meyers Singleton. The elegance of this implementation is that the singleton object <span style=\"font-family: courier new,courier;\">instance<\/span> in line 11 is a static variable with block scope. Therefore, <span style=\"font-family: courier new,courier;\">instance<\/span> is initialized exactly when the static method <span style=\"font-family: courier new,courier;\">getInstance<\/span> (lines 10 &#8211; 14) is executed for the first time. In line 12, the <span style=\"font-family: courier new,courier;\">volatile<\/span> variable <span style=\"font-family: courier new,courier;\">dummy<\/span> is commented out. 
When I translate the program with maximum optimization, that has to change, so the call <span style=\"font-family: courier new,courier;\">MySingleton::getInstance()<\/span> will not be optimized away.&nbsp;<\/p>\n<p>Now the raw numbers on Linux and Windows.<\/p>\n<h4>Without optimization<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4860\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded.png\" alt=\"singletonSingleThreaded\" style=\"margin: 15px;\" width=\"411\" height=\"146\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded.png 411w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded-300x107.png 300w\" sizes=\"auto, (max-width: 411px) 100vw, 411px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4861\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded_win.png\" alt=\"singletonSingleThreaded win\" width=\"500\" height=\"117\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded_win.png 912w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded_win-300x70.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded_win-768x180.png 768w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<h4>Maximum Optimization<\/h4>\n<p>&nbsp;<img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4862\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded_opt.png\" alt=\"singletonSingleThreaded opt\" width=\"500\" height=\"114\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded_opt.png 640w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded_opt-300x68.png 300w\" sizes=\"auto, (max-width: 
500px) 100vw, 500px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4863\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded_win_opt.png\" alt=\"singletonSingleThreaded win opt\" width=\"400\" height=\"108\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded_win_opt.png 691w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSingleThreaded_win_opt-300x81.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Guarantees_of_the_C_runtime\"><\/span>Guarantees of the C++ runtime<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>I already presented the details to the thread-safe initialization of variables in the post <a href=\"https:\/\/www.modernescpp.com\/index.php\/thread-safe-initialization-of-data\">Thread-safe initialization of data. <\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Meyers_Singleton\"><\/span>Meyers Singleton<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The beauty of the Meyers Singleton in C++11 is that it&#8217;s automatically thread-safe. That is guaranteed by the standard: <a href=\"https:\/\/www.modernescpp.com\/index.php\/thread-safe-initialization-of-data\">Static variables with block scope. <\/a>The Meyers Singleton is a static variable with block scope, so we are done. 
It&#8217;s still left to rewrite the program for four threads.<\/p>\n<p>&nbsp;<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n41\r\n42\r\n43\r\n44\r\n45<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ singletonMeyers.cpp<\/span>\r\n\r\n<span style=\"color: #0000ff;\">#include &lt;chrono&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;future&gt;<\/span>\r\n\r\nconstexpr <span style=\"color: #0000ff;\">auto<\/span> tenMill= 10000000;\r\n\r\n<span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">MySingleton<\/span>{\r\npublic:\r\n  <span style=\"color: #0000ff;\">static<\/span> MySingleton&amp; getInstance(){\r\n    <span style=\"color: #0000ff;\">static<\/span> MySingleton instance;\r\n    <span style=\"color: #008000;\">\/\/ volatile int dummy{};<\/span>\r\n    <span style=\"color: #0000ff;\">return<\/span> instance;\r\n  }\r\nprivate:\r\n  MySingleton()= <span style=\"color: #0000ff;\">default<\/span>;\r\n  ~MySingleton()= <span style=\"color: #0000ff;\">default<\/span>;\r\n  MySingleton(<span style=\"color: #0000ff;\">const<\/span> MySingleton&amp;)= <span style=\"color: #0000ff;\">delete<\/span>;\r\n  MySingleton&amp; <span style=\"color: #0000ff;\">operator<\/span>=(<span style=\"color: #0000ff;\">const<\/span> MySingleton&amp;)= <span style=\"color: #0000ff;\">delete<\/span>;\r\n\r\n};\r\n\r\nstd::chrono::duration&lt;<span style=\"color: #2b91af;\">double<\/span>&gt; 
getTime(){\r\n\r\n  <span style=\"color: #0000ff;\">auto<\/span> begin= std::chrono::system_clock::now();\r\n  <span style=\"color: #0000ff;\">for<\/span> ( <span style=\"color: #2b91af;\">size_t<\/span> i= 0; i &lt;= tenMill; ++i){\r\n      MySingleton::getInstance();\r\n  }\r\n  <span style=\"color: #0000ff;\">return<\/span> std::chrono::system_clock::now() - begin;\r\n  \r\n};\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n \r\n    <span style=\"color: #0000ff;\">auto<\/span> fut1= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut2= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut3= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut4= std::async(std::launch::async,getTime);\r\n    \r\n    <span style=\"color: #0000ff;\">auto<\/span> total= fut1.get() + fut2.get() + fut3.get() + fut4.get();\r\n    \r\n    std::cout &lt;&lt; total.count() &lt;&lt; std::endl;\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>I use the singleton object in the function <span style=\"font-family: courier new,courier;\">getTime<\/span> (lines 24 &#8211; 32). The function is executed by four <a href=\"https:\/\/www.modernescpp.com\/index.php\/asynchronous-function-calls\"><span style=\"font-family: courier new,courier;\">std::async<\/span><\/a> calls in lines 36 &#8211; 39. The results of the associated <a href=\"https:\/\/www.modernescpp.com\/index.php\/asynchronous-function-calls\"><span style=\"font-family: courier new,courier;\">futures<\/span><\/a> are summed up in line 41. That&#8217;s all. 
Only the execution time is missing.<\/p>\n<h4>Without optimization<\/h4>\n<p>&nbsp;<img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4864\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers.png\" alt=\"singletonMeyers\" width=\"400\" height=\"142\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers.png 411w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers-300x107.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4865\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers_win.png\" alt=\"singletonMeyers win\" width=\"500\" height=\"117\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers_win.png 912w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers_win-300x70.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers_win-768x180.png 768w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<h4>Maximum optimization<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4866\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers_opt.png\" alt=\"singletonMeyers opt\" width=\"500\" height=\"113\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers_opt.png 640w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers_opt-300x68.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4867\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers_win_opt.png\" alt=\"singletonMeyers win opt\" width=\"400\" height=\"108\" 
srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers_win_opt.png 691w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonMeyers_win_opt-300x81.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/p>\n<p>The next step is the function <span style=\"font-family: courier new,courier;\">std::call_once<\/span> in combination with the flag <span style=\"font-family: courier new,courier;\">std::once_flag.<\/span><span style=\"font-family: courier new,courier;\"><\/span><span style=\"font-family: courier new,courier;\"><br \/><\/span><\/p>\n<h2><span class=\"ez-toc-section\" id=\"The_function_std_call_once_and_the_flag_std_once_flag\"><\/span>The function std::call_once and the flag std::once_flag<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>You can use the function <span style=\"font-family: courier new,courier;\"><a href=\"https:\/\/www.modernescpp.com\/index.php\/thread-safe-initialization-of-data\">std::call_once<\/a> <\/span>to register a callable executed exactly once. 
The flag <span style=\"font-family: courier new,courier;\">std::once_flag<\/span> in the following implementation guarantees that the singleton is initialized in a thread-safe way.<\/p>\n<p>&nbsp;<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n41\r\n42\r\n43\r\n44\r\n45\r\n46\r\n47\r\n48\r\n49\r\n50\r\n51\r\n52\r\n53\r\n54\r\n55\r\n56<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ singletonCallOnce.cpp<\/span>\r\n\r\n<span style=\"color: #0000ff;\">#include &lt;chrono&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;future&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;mutex&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;thread&gt;<\/span>\r\n\r\nconstexpr <span style=\"color: #0000ff;\">auto<\/span> tenMill= 10000000;\r\n\r\n<span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">MySingleton<\/span>{\r\npublic:\r\n  <span style=\"color: #0000ff;\">static<\/span> MySingleton&amp; getInstance(){\r\n    std::call_once(initInstanceFlag, &amp;MySingleton::initSingleton);\r\n    <span style=\"color: #008000;\">\/\/ volatile int dummy{};<\/span>\r\n    <span style=\"color: #0000ff;\">return<\/span> *instance;\r\n  }\r\nprivate:\r\n  MySingleton()= <span style=\"color: #0000ff;\">default<\/span>;\r\n  ~MySingleton()= <span style=\"color: #0000ff;\">default<\/span>;\r\n  MySingleton(<span style=\"color: #0000ff;\">const<\/span> MySingleton&amp;)= <span style=\"color: 
#0000ff;\">delete<\/span>;\r\n  MySingleton&amp; <span style=\"color: #0000ff;\">operator<\/span>=(<span style=\"color: #0000ff;\">const<\/span> MySingleton&amp;)= <span style=\"color: #0000ff;\">delete<\/span>;\r\n\r\n  <span style=\"color: #0000ff;\">static<\/span> MySingleton* instance;\r\n  <span style=\"color: #0000ff;\">static<\/span> std::once_flag initInstanceFlag;\r\n\r\n  <span style=\"color: #0000ff;\">static<\/span> <span style=\"color: #2b91af;\">void<\/span> initSingleton(){\r\n    instance= <span style=\"color: #0000ff;\">new<\/span> MySingleton;\r\n  }\r\n};\r\n\r\nMySingleton* MySingleton::instance= nullptr;\r\nstd::once_flag MySingleton::initInstanceFlag;\r\n\r\nstd::chrono::duration&lt;<span style=\"color: #2b91af;\">double<\/span>&gt; getTime(){\r\n\r\n  <span style=\"color: #0000ff;\">auto<\/span> begin= std::chrono::system_clock::now();\r\n  <span style=\"color: #0000ff;\">for<\/span> ( <span style=\"color: #2b91af;\">size_t<\/span> i= 0; i &lt;= tenMill; ++i){\r\n      MySingleton::getInstance();\r\n  }\r\n  <span style=\"color: #0000ff;\">return<\/span> std::chrono::system_clock::now() - begin;\r\n  \r\n};\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut1= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut2= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut3= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut4= std::async(std::launch::async,getTime);\r\n    \r\n    <span style=\"color: #0000ff;\">auto<\/span> total= fut1.get() + fut2.get() + fut3.get() + fut4.get();\r\n    \r\n    std::cout &lt;&lt; total.count() &lt;&lt; std::endl;\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Here are the numbers.<\/p>\n<h4>Without optimization<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full 
wp-image-4868\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce.png\" alt=\"singletoCallOnce\" width=\"400\" height=\"132\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce.png 471w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce-300x99.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4869\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce_win.png\" alt=\"singletoCallOnce win\" width=\"500\" height=\"117\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce_win.png 795w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce_win-300x70.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce_win-768x180.png 768w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<h4>Maximum optimization<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4870\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce_opt.png\" alt=\"singletoCallOnce opt\" width=\"500\" height=\"113\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce_opt.png 640w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce_opt-300x68.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4871\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce_win_opt.png\" alt=\"singletoCallOnce win opt\" width=\"400\" height=\"108\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce_win_opt.png 691w, 
https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletoCallOnce_win_opt-300x81.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/p>\n<p>Of course, the most obvious way is to protect the singleton with a lock.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Lock\"><\/span>Lock<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>The mutex wrapped in a <a href=\"https:\/\/www.modernescpp.com\/index.php\/prefer-locks-to-mutexes\">lock<\/a> guarantees that the singleton is initialized in a thread-safe way.<\/p>\n<p>&nbsp;<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n41\r\n42\r\n43\r\n44\r\n45\r\n46\r\n47\r\n48\r\n49\r\n50\r\n51\r\n52\r\n53\r\n54<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ singletonLock.cpp<\/span>\r\n\r\n<span style=\"color: #0000ff;\">#include &lt;chrono&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;future&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;mutex&gt;<\/span>\r\n\r\nconstexpr <span style=\"color: #0000ff;\">auto<\/span> tenMill= 10000000;\r\n\r\nstd::mutex myMutex;\r\n\r\n<span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">MySingleton<\/span>{\r\npublic:\r\n  <span style=\"color: #0000ff;\">static<\/span> MySingleton&amp; getInstance(){\r\n    std::lock_guard&lt;std::mutex&gt; myLock(myMutex);\r\n    <span style=\"color: #0000ff;\">if<\/span> ( !instance ){\r\n        instance= <span style=\"color: 
#0000ff;\">new<\/span> MySingleton();\r\n    }\r\n    <span style=\"color: #008000;\">\/\/ volatile int dummy{};<\/span>\r\n    <span style=\"color: #0000ff;\">return<\/span> *instance;\r\n  }\r\nprivate:\r\n  MySingleton()= <span style=\"color: #0000ff;\">default<\/span>;\r\n  ~MySingleton()= <span style=\"color: #0000ff;\">default<\/span>;\r\n  MySingleton(<span style=\"color: #0000ff;\">const<\/span> MySingleton&amp;)= <span style=\"color: #0000ff;\">delete<\/span>;\r\n  MySingleton&amp; <span style=\"color: #0000ff;\">operator<\/span>=(<span style=\"color: #0000ff;\">const<\/span> MySingleton&amp;)= <span style=\"color: #0000ff;\">delete<\/span>;\r\n\r\n  <span style=\"color: #0000ff;\">static<\/span> MySingleton* instance;\r\n};\r\n\r\n\r\nMySingleton* MySingleton::instance= nullptr;\r\n\r\nstd::chrono::duration&lt;<span style=\"color: #2b91af;\">double<\/span>&gt; getTime(){\r\n\r\n  <span style=\"color: #0000ff;\">auto<\/span> begin= std::chrono::system_clock::now();\r\n  <span style=\"color: #0000ff;\">for<\/span> ( <span style=\"color: #2b91af;\">size_t<\/span> i= 0; i &lt;= tenMill; ++i){\r\n       MySingleton::getInstance();\r\n  }\r\n  <span style=\"color: #0000ff;\">return<\/span> std::chrono::system_clock::now() - begin;\r\n  \r\n};\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n  \r\n    <span style=\"color: #0000ff;\">auto<\/span> fut1= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut2= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut3= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut4= std::async(std::launch::async,getTime);\r\n    \r\n    <span style=\"color: #0000ff;\">auto<\/span> total= fut1.get() + fut2.get() + fut3.get() + fut4.get();\r\n    \r\n    std::cout &lt;&lt; total.count() &lt;&lt; std::endl;\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>How 
fast is the classical thread-safe implementation of the singleton pattern?<\/p>\n<h4>Without optimization<\/h4>\n<p>&nbsp;<img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4872\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonLock.png\" alt=\"singletonLock\" width=\"500\" height=\"139\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonLock.png 549w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonLock-300x84.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4873\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonLock_win.png\" alt=\"singletonLock win\" width=\"400\" height=\"140\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonLock_win.png 613w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonLock_win-300x105.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/p>\n<h4>Maximum optimization<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4874\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonLock_opt.png\" alt=\"singletonLock opt\" width=\"500\" height=\"139\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonLock_opt.png 549w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonLock_opt-300x84.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4875\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonLock_win_opt.png\" alt=\"singletonLock win opt\" width=\"400\" height=\"140\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonLock_win_opt.png 613w, 
https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonLock_win_opt-300x105.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/p>\n<p>Not so fast. Atomics should make a difference.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"Atomic_variables\"><\/span>Atomic variables<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>With atomic variables, my job becomes exceptionally challenging. Now I have to use the <a href=\"https:\/\/www.modernescpp.com\/index.php\/c-memory-model\">C++ memory model<\/a>. I base my implementation on the well-known <a href=\"https:\/\/www.modernescpp.com\/index.php\/thread-safe-initialization-of-data\">double-checked locking pattern.<\/a><\/p>\n<h3><span class=\"ez-toc-section\" id=\"Sequential_consistency\"><\/span>Sequential consistency<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The handle to the singleton is atomic. Because I didn&#8217;t specify a memory order for the atomic operations, the default applies: <a href=\"https:\/\/www.modernescpp.com\/index.php\/sequential-consistency\">sequential consistency. 
<\/a><\/p>\n<p>&nbsp;<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n41\r\n42\r\n43\r\n44\r\n45\r\n46\r\n47\r\n48\r\n49\r\n50\r\n51\r\n52\r\n53\r\n54\r\n55\r\n56\r\n57\r\n58\r\n59\r\n60\r\n61\r\n62<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ singletonAcquireRelease.cpp<\/span>\r\n\r\n<span style=\"color: #0000ff;\">#include &lt;atomic&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;future&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;mutex&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;thread&gt;<\/span>\r\n\r\nconstexpr <span style=\"color: #0000ff;\">auto<\/span> tenMill= 10000000;\r\n\r\n<span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">MySingleton<\/span>{\r\npublic:\r\n  <span style=\"color: #0000ff;\">static<\/span> MySingleton* getInstance(){\r\n    MySingleton* sin= instance.load();\r\n    <span style=\"color: #0000ff;\">if<\/span> ( !sin ){\r\n      std::lock_guard&lt;std::mutex&gt; myLock(myMutex);\r\n      sin= instance.load();\r\n      <span style=\"color: #0000ff;\">if<\/span>( !sin ){\r\n        sin= <span style=\"color: #0000ff;\">new<\/span> MySingleton();\r\n        instance.store(sin);\r\n      }\r\n    }   \r\n    <span style=\"color: #008000;\">\/\/ volatile int dummy{};<\/span>\r\n    <span style=\"color: #0000ff;\">return<\/span> sin;\r\n  }\r\nprivate:\r\n  MySingleton()= <span style=\"color: #0000ff;\">default<\/span>;\r\n  
~MySingleton()= <span style=\"color: #0000ff;\">default<\/span>;\r\n  MySingleton(<span style=\"color: #0000ff;\">const<\/span> MySingleton&amp;)= <span style=\"color: #0000ff;\">delete<\/span>;\r\n  MySingleton&amp; <span style=\"color: #0000ff;\">operator<\/span>=(<span style=\"color: #0000ff;\">const<\/span> MySingleton&amp;)= <span style=\"color: #0000ff;\">delete<\/span>;\r\n\r\n  <span style=\"color: #0000ff;\">static<\/span> std::atomic&lt;MySingleton*&gt; instance;\r\n  <span style=\"color: #0000ff;\">static<\/span> std::mutex myMutex;\r\n};\r\n\r\n\r\nstd::atomic&lt;MySingleton*&gt; MySingleton::instance;\r\nstd::mutex MySingleton::myMutex;\r\n\r\nstd::chrono::duration&lt;<span style=\"color: #2b91af;\">double<\/span>&gt; getTime(){\r\n\r\n  <span style=\"color: #0000ff;\">auto<\/span> begin= std::chrono::system_clock::now();\r\n  <span style=\"color: #0000ff;\">for<\/span> ( <span style=\"color: #2b91af;\">size_t<\/span> i= 0; i &lt;= tenMill; ++i){\r\n       MySingleton::getInstance();\r\n  }\r\n  <span style=\"color: #0000ff;\">return<\/span> std::chrono::system_clock::now() - begin;\r\n  \r\n};\r\n\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut1= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut2= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut3= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut4= std::async(std::launch::async,getTime);\r\n    \r\n    <span style=\"color: #0000ff;\">auto<\/span> total= fut1.get() + fut2.get() + fut3.get() + fut4.get();\r\n    \r\n    std::cout &lt;&lt; total.count() &lt;&lt; std::endl;\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Now I&#8217;m curious.<\/p>\n<h4>Without optimization<\/h4>\n<p>&nbsp;<img loading=\"lazy\" decoding=\"async\" class=\" size-full 
wp-image-4876\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency.png\" alt=\"singletonSequentialConsistency\" width=\"400\" height=\"132\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency.png 471w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency-300x99.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4877\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency_win.png\" alt=\"singletonSequentialConsistency win\" width=\"500\" height=\"117\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency_win.png 795w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency_win-300x70.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency_win-768x180.png 768w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<h4>Maximum optimization<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4878\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency_opt.png\" alt=\"singletonSequentialConsistency opt\" width=\"500\" height=\"113\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency_opt.png 640w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency_opt-300x68.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4879\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency_win_opt.png\" alt=\"singletonSequentialConsistency win opt\" width=\"400\" 
height=\"108\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency_win_opt.png 691w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonSequentialConsistency_win_opt-300x81.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/p>\n<p>&nbsp;<\/p>\n<p>But we can do better. There is one more optimization opportunity.<\/p>\n<h3><span class=\"ez-toc-section\" id=\"Acquire-release_Semantic\"><\/span>Acquire-release Semantic<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p>The read of the singleton (line 14) is an acquire operation, and the write (line 20) is a release operation. Because both operations act on the same atomic, I don&#8217;t need sequential consistency. The C++ standard guarantees that a release operation synchronizes with an acquire operation on the same atomic that reads the stored value. Both conditions hold in this case. Therefore, I can weaken the memory ordering in lines 14 and 20. The second load in line 17 runs under the mutex, which already provides the required ordering, so it can even be relaxed. <a href=\"https:\/\/www.modernescpp.com\/index.php\/acquire-release-semantic\">Acquire-release semantic<\/a> is sufficient.<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n41\r\n42\r\n43\r\n44\r\n45\r\n46\r\n47\r\n48\r\n49\r\n50\r\n51\r\n52\r\n53\r\n54\r\n55\r\n56\r\n57\r\n58\r\n59\r\n60\r\n61\r\n62<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ singletonAcquireRelease.cpp<\/span>\r\n\r\n<span style=\"color: #0000ff;\">#include &lt;atomic&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;iostream&gt;<\/span>\r\n<span 
style=\"color: #0000ff;\">#include &lt;future&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;mutex&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;chrono&gt;<\/span>\r\n\r\nconstexpr <span style=\"color: #0000ff;\">auto<\/span> tenMill= 10000000;\r\n\r\n<span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">MySingleton<\/span>{\r\npublic:\r\n  <span style=\"color: #0000ff;\">static<\/span> MySingleton* getInstance(){\r\n    MySingleton* sin= instance.load(std::memory_order_acquire);\r\n    <span style=\"color: #0000ff;\">if<\/span> ( !sin ){\r\n      std::lock_guard&lt;std::mutex&gt; myLock(myMutex);\r\n      sin= instance.load(std::memory_order_relaxed);\r\n      <span style=\"color: #0000ff;\">if<\/span>( !sin ){\r\n        sin= <span style=\"color: #0000ff;\">new<\/span> MySingleton();\r\n        instance.store(sin,std::memory_order_release);\r\n      }\r\n    }   \r\n    <span style=\"color: #008000;\">\/\/ volatile int dummy{};<\/span>\r\n    <span style=\"color: #0000ff;\">return<\/span> sin;\r\n  }\r\nprivate:\r\n  MySingleton()= <span style=\"color: #0000ff;\">default<\/span>;\r\n  ~MySingleton()= <span style=\"color: #0000ff;\">default<\/span>;\r\n  MySingleton(<span style=\"color: #0000ff;\">const<\/span> MySingleton&amp;)= <span style=\"color: #0000ff;\">delete<\/span>;\r\n  MySingleton&amp; <span style=\"color: #0000ff;\">operator<\/span>=(<span style=\"color: #0000ff;\">const<\/span> MySingleton&amp;)= <span style=\"color: #0000ff;\">delete<\/span>;\r\n\r\n  <span style=\"color: #0000ff;\">static<\/span> std::atomic&lt;MySingleton*&gt; instance;\r\n  <span style=\"color: #0000ff;\">static<\/span> std::mutex myMutex;\r\n};\r\n\r\n\r\nstd::atomic&lt;MySingleton*&gt; MySingleton::instance;\r\nstd::mutex MySingleton::myMutex;\r\n\r\nstd::chrono::duration&lt;<span style=\"color: #2b91af;\">double<\/span>&gt; getTime(){\r\n\r\n  <span style=\"color: #0000ff;\">auto<\/span> begin= 
std::chrono::system_clock::now();\r\n  <span style=\"color: #0000ff;\">for<\/span> ( <span style=\"color: #2b91af;\">size_t<\/span> i= 0; i &lt;= tenMill; ++i){\r\n       MySingleton::getInstance();\r\n  }\r\n  <span style=\"color: #0000ff;\">return<\/span> std::chrono::system_clock::now() - begin;\r\n  \r\n};\r\n\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut1= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut2= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut3= std::async(std::launch::async,getTime);\r\n    <span style=\"color: #0000ff;\">auto<\/span> fut4= std::async(std::launch::async,getTime);\r\n    \r\n    <span style=\"color: #0000ff;\">auto<\/span> total= fut1.get() + fut2.get() + fut3.get() + fut4.get();\r\n    \r\n    std::cout &lt;&lt; total.count() &lt;&lt; std::endl;\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>&nbsp;<\/p>\n<p>The acquire-release semantic performs about the same as sequential consistency. That&#8217;s not surprising because, on x86, both memory orderings are very similar. I would expect different numbers on an ARMv7 or PowerPC architecture. 
You can read the details on Jeff Preshing&#8217;s blog <a href=\"http:\/\/preshing.com\/\">Preshing on Programming<\/a>.<\/p>\n<h4>Without optimization<\/h4>\n<p>&nbsp;<img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4880\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease.png\" alt=\"singletonAcquireRelease\" width=\"400\" height=\"132\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease.png 471w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease-300x99.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4881\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease_win.png\" alt=\"singletonAcquireRelease win\" width=\"500\" height=\"117\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease_win.png 795w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease_win-300x70.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease_win-768x180.png 768w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><\/p>\n<h4>Maximum optimization<\/h4>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4882\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease_opt.png\" alt=\"singletonAcquireRelease opt\" width=\"500\" height=\"113\" style=\"margin: 15px;\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease_opt.png 640w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease_opt-300x68.png 300w\" sizes=\"auto, (max-width: 500px) 100vw, 500px\" \/><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4883\" 
src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease_win_opt.png\" alt=\"singletonAcquireRelease win opt\" width=\"400\" height=\"108\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease_win_opt.png 691w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/singletonAcquireRelease_win_opt-300x81.png 300w\" sizes=\"auto, (max-width: 400px) 100vw, 400px\" \/><\/p>\n<p>If I forgot an important variant of the thread-safe singleton pattern, please let me know and send me the code. I will measure it and add the numbers to the comparison.<\/p>\n<h2><span class=\"ez-toc-section\" id=\"All_numbers_at_one_glance\"><\/span>All numbers at a glance<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>Don&#8217;t take the numbers too seriously. I executed each program only once, and the executable is optimized for four cores even on my two-core Windows PC. But the numbers give a clear indication. The Meyers Singleton is the easiest to understand and the fastest one. The lock-based implementation is by far the slowest one. This ranking holds regardless of the platform.<\/p>\n<p>But the numbers show more. Optimization counts. 
This statement is not entirely accurate for the&nbsp;<span style=\"font-family: courier new,courier;\">std::lock_guard<\/span>-based implementation of the singleton pattern.<\/p>\n<p>&nbsp;<\/p>\n<p><img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-4884\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/comparisonSingletonEng.png\" alt=\"comparisonSingletonEng\" width=\"800\" height=\"204\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/comparisonSingletonEng.png 911w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/comparisonSingletonEng-300x76.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/08\/comparisonSingletonEng-768x196.png 768w\" sizes=\"auto, (max-width: 800px) 100vw, 800px\" \/><\/p>\n<h2><span class=\"ez-toc-section\" id=\"Whats_next\"><\/span>What&#8217;s next?<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p>I&#8217;m not so sure. This post is a translation of a German post I wrote half a year ago. My German post got a lot of reactions. I&#8217;m not sure what will happen this time. A few days later, I&#8217;m sure. The <a href=\"https:\/\/www.modernescpp.com\/index.php\/single-threaded-sum-of-the-elements-of-a-vector\">next post<\/a> will be about adding the elements of a vector. First, the summation runs in a single thread.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>There are a lot of issues with the singleton pattern. I&#8217;m aware of that. But the singleton pattern is an ideal use case for a variable, which can only be initialized in a thread-safe way. From that point on, you can use it without synchronization. 
So in this post, I discuss different ways to initialize [&hellip;]<\/p>\n","protected":false},"author":21,"featured_media":4858,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[369],"tags":[505,434,430,433,470,519,404,520],"class_list":["post-4885","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-multithreading-application","tag-acquire-release-semantic","tag-atomics","tag-lock","tag-mutex","tag-performance","tag-sequential-consistency","tag-singleton","tag-static"],"_links":{"self":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4885","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/comments?post=4885"}],"version-history":[{"count":1,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4885\/revisions"}],"predecessor-version":[{"id":6952,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/4885\/revisions\/6952"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media\/4858"}],"wp:attachment":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media?parent=4885"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/categories?post=4885"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/tags?post=4885"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}