Ongoing Optimization: Acquire-Release Semantics with CppMem

With acquire-release semantics, we give up sequential consistency. In acquire-release semantics, synchronization takes place between atomic operations on the same atomic variable, not between threads.

Acquire-release semantic

The acquire-release semantic is more lightweight and, therefore, faster than sequential consistency because synchronization only occurs between atomic operations. But the intellectual challenge increases.

// ongoingOptimizationAcquireRelease.cpp

#include <atomic>
#include <iostream>
#include <thread>

std::atomic<int> x{0};
std::atomic<int> y{0};

void writing(){  
  x.store(2000,std::memory_order_relaxed);  
  y.store(11,std::memory_order_release);
}

void reading(){  
  std::cout << y.load(std::memory_order_acquire) << " ";  
  std::cout << x.load(std::memory_order_relaxed) << std::endl;
}

int main(){
  std::thread thread1(writing);
  std::thread thread2(reading);
  thread1.join();
  thread2.join();
}

At first glance, you will notice that all operations are atomic, so the program is well-defined. But a second glance shows more. The atomic operations on y are annotated with the flags std::memory_order_release (line 12) and std::memory_order_acquire (line 16). In contrast, the atomic operations on x are annotated with std::memory_order_relaxed, so there are no synchronization and ordering constraints for x. The question of which values are possible for x and y can only be answered with the help of y.

It holds:

y.store(11,std::memory_order_release) synchronizes-with y.load(std::memory_order_acquire)
x.store(2000,std::memory_order_relaxed) is visible before y.store(11,std::memory_order_release)
y.load(std::memory_order_acquire) is visible before x.load(std::memory_order_relaxed)

I will elaborate a little bit more on these three statements. The key idea is that the store of y in line 12 synchronizes with the load of y in line 16, provided the load reads the value written by the store. The reason is that both operations act on the same atomic and follow the acquire-release semantic: y uses std::memory_order_release in line 12 and std::memory_order_acquire in line 16. But the paired operations on y have another very interesting property. They establish a kind of barrier relative to y. So x.store(2000,std::memory_order_relaxed) cannot be executed after y.store(11,std::memory_order_release), and x.load(std::memory_order_relaxed) cannot be executed before y.load(std::memory_order_acquire).

The reasoning for the acquire-release semantic was more sophisticated than for sequential consistency, but the possible values for x and y are the same. Only the combination y == 11 and x == 0 is not possible.

Three different interleavings of the threads are possible, producing the three different combinations of x and y.

thread1 is executed before thread2.
thread2 is executed before thread1.
thread1 executes x.store(2000) before thread2 is executed, but y.store(11) only after thread2 has loaded y.

In the end, the table.

y     x      possible?
0     0      yes
0     2000   yes
11    0      no
11    2000   yes

CppMem

First, here is the program once more, this time for CppMem.

int main(){
  atomic_int x= 0; 
  atomic_int y= 0;
  {{{ { 
      x.store(2000,memory_order_relaxed);
      y.store(11,memory_order_release);
      }
  ||| {
      y.load(memory_order_acquire);
      x.load(memory_order_relaxed);
      } 
  }}}
}

We already know that all results except (y=11, x=0) are possible.

Possible executions

Have a look at the three graphs with consistent executions. The graphs show the acquire-release semantic between the store-release of y and the load-acquire of y. It makes no difference whether the reading of y (rf) occurs in the main thread or in a separate thread. The graphs show the synchronizes-with relation with an sw arrow.

Execution for (y=0, x=0)

Execution for (y=0, x=2000)

Execution for (y=11, x=2000)

What’s next?

But we can do better. Why should x be atomic? There is no reason. That was my first but incorrect assumption. Why? You will read it in the next post.
