With the acquire-release semantic, we break sequential consistency. In the acquire-release semantic, synchronization takes place between atomic operations on the same atomic and not between threads.
The acquire-release semantic is more lightweight and therefore faster than sequential consistency, because synchronization only takes place between atomic operations. But the intellectual challenge increases as well.
std::cout << y.load(std::memory_order_acquire) << " ";
std::cout << x.load(std::memory_order_relaxed) << std::endl;
At first glance, you will notice that all operations are atomic, so the program is well defined. But a second glance shows more. The atomic operations on y are annotated with the flags std::memory_order_release (line 12) and std::memory_order_acquire (line 16). In contrast, the atomic operations on x are annotated with std::memory_order_relaxed, so there are no synchronization and ordering constraints for x. The question of which values are possible for x and y can therefore only be answered by looking at y.
- y.store(11,std::memory_order_release) synchronizes-with y.load(std::memory_order_acquire)
- x.store(2000,std::memory_order_relaxed) is visible before y.store(11,std::memory_order_release)
- y.load(std::memory_order_acquire) is visible before x.load(std::memory_order_relaxed)
I will elaborate a little bit more on these three statements. The key idea is that the store of y in line 12 synchronizes with the load of y in line 16. The reason is that both operations take place on the same atomic and follow the acquire-release semantic: y uses std::memory_order_release in line 12 and std::memory_order_acquire in line 16. But the pairwise operations on y have another very interesting property. They establish a kind of barrier relative to y. So x.store(2000,std::memory_order_relaxed) can not be executed after y.store(11,std::memory_order_release), and x.load(std::memory_order_relaxed) can not be executed before y.load(std::memory_order_acquire).
The reasoning in the case of the acquire-release semantic was more sophisticated than in the case of sequential consistency. But the possible values for x and y are the same. Only the combination y == 11 and x == 0 is not possible.
Three different interleavings of the threads are possible, which produce three different combinations of the values of x and y.
- thread1 will be executed before thread2.
- thread2 will be executed before thread1.
- thread1 executes x.store(2000) before thread2 is executed.
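In summary, the possible values for y and x:
- y == 0 and x == 0: possible
- y == 0 and x == 2000: possible
- y == 11 and x == 2000: possible
- y == 11 and x == 0: not possible
First, here is the program once more in CppMem.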
atomic_int x= 0;
atomic_int y= 0;
As we already know, all results except (y=11, x=0) are possible.
Have a look at the three graphs of the consistent executions. The graphs show that there is an acquire-release semantic between the store-release of y and the load-acquire of y. It makes no difference whether the read of y (rf) takes place in the main thread or in a separate thread. The graphs show the synchronizes-with relation as an sw arrow.
Execution for (y=0, x=0)
Execution for (y=0, x=2000)
Execution for (y=11, x=2000)
But we can do better. Why should x be an atomic? There is no reason. That was my first, but incorrect, assumption. Why? You will read about it in the next post.