Multithreading with C++17 and C++20


Forecasts about the future are difficult, in particular when they are about C++20. Nevertheless, I will take a look into the crystal ball and write in the next posts about what we will get with C++17 and what we can hope for with C++20.

 

[Figure: timeline of C++17 and C++20]

Since C++11, C++ has faced the requirements of multicore architectures. The standard published in 2011 defines how a program should behave in the presence of many threads. The multithreading capabilities of C++11 consist of two parts: on the one hand, the well-defined memory model; on the other hand, the standardised threading API.

The well-defined memory model deals with the following questions.

  1. What are atomic operations?
  2. Which sequence of operations is guaranteed?
  3. When are memory effects of operations visible?

The standardised threading interface in C++11 consists of the following components.

  1. Threads
  2. Tasks
  3. Thread-local data
  4. Condition variables

If that is not too boring for you, read the posts about the memory model and the standardised threading API.

Wearing my multithreading glasses, C++14 does not have much to offer: it only added reader-writer locks.

The question that arises is: what does the C++ future have to offer?

 

[Figure: timeline of C++17 and C++20]

C++17

With C++17, most of the algorithms of the Standard Template Library will be available in a parallel version. Therefore, you can invoke an algorithm with a so-called execution policy. This execution policy specifies whether the algorithm runs sequentially (std::execution::seq), in parallel (std::execution::par), or in parallel and vectorised (std::execution::par_unseq).


std::vector<int> vec = {3, 2, 1, 4, 5, 6, 10, 8, 9, 4};

std::sort(vec.begin(), vec.end());                            // sequential as ever
std::sort(std::execution::seq, vec.begin(), vec.end());       // sequential
std::sort(std::execution::par, vec.begin(), vec.end());       // parallel
std::sort(std::execution::par_unseq, vec.begin(), vec.end()); // parallel and vectorized

 

Therefore, the first and second variations of the sort algorithm run sequentially, the third in parallel, and the fourth in parallel and vectorised.

C++20 offers totally new multithreading concepts. The key idea is that multithreading becomes a lot simpler and less error-prone.

C++20

Atomic smart pointers

The smart pointers std::shared_ptr and std::weak_ptr have a conceptual issue in multithreaded programs: they share mutable state. Therefore, they are prone to data races and hence to undefined behaviour. std::shared_ptr and std::weak_ptr guarantee that incrementing or decrementing the reference counter is an atomic operation and that the resource will be deleted exactly once, but neither guarantees that access to its resource is atomic. The new atomic smart pointers solve this issue.

std::atomic_shared_ptr
std::atomic_weak_ptr

 

std::future extensions

With tasks called promises and futures, we got a new multithreading concept in C++11. Although tasks have a lot to offer, they have a big drawback: futures cannot be composed in C++11.

That will not hold for futures in C++20. A future becomes ready when

  • its predecessor becomes ready:

 then:

future<int> f1 = async([]{ return 123; });
future<string> f2 = f1.then([](future<int> f){
  return to_string(f.get());      // .get() does not block because f is ready
});
  • one of its predecessors becomes ready:

when_any:

future<int> futures[] = {async([]() { return intResult(125); }),                          
                         async([]() { return intResult(456); })};
future<vector<future<int>>> any_f = when_any(begin(futures),end(futures));
  • all of its predecessors become ready:

when_all:

future<int> futures[] = {async([]() { return intResult(125); }),                          
                         async([]() { return intResult(456); })};
future<vector<future<int>>> all_f = when_all(begin(futures), end(futures));

 

C++14 has no semaphores. Semaphores enable threads to control access to a common resource. No problem: with C++20, we get latches and barriers.

Latches and barriers

You can use latches and barriers to wait at a synchronisation point until a counter becomes zero. The difference is that a std::latch can only be used once, while std::barrier and std::flex_barrier can be used more than once. In contrast to a std::barrier, a std::flex_barrier can adjust its counter after each iteration.

 

void doWork(threadpool* pool){
  latch completion_latch(NUMBER_TASKS);
  for (int i = 0; i < NUMBER_TASKS; ++i){
    pool->add_task([&]{
      // ... perform the work ...
      completion_latch.count_down();
    });
  }
  // block until all tasks are done
  completion_latch.wait();
}

 

The thread running the function doWork waits in the call completion_latch.wait() until completion_latch becomes zero. The completion_latch is initialised to NUMBER_TASKS, and each task decrements it via count_down.

Coroutines

Coroutines are generalised functions. In contrast to functions, you can suspend and resume the execution of a coroutine while keeping its state.

Coroutines are often the means of choice to implement cooperative multitasking in operating systems, event loops, infinite lists, or pipelines.

generator<int> getInts(int first, int last){
  for (auto i = first; i <= last; ++i){
    co_yield i;
  }
}

int main(){
  for (auto i: getInts(5, 10)){
    std::cout << i << " ";                      // 5 6 7 8 9 10
  }
}

 

The function getInts returns a generator that yields a value on request. The expression co_yield serves two purposes: first, it returns a new value, and second, it waits until a new value is requested. The range-based for loop successively requests the values from 5 to 10.

Transactional memory

With transactional memory, the well-established idea of transactions will be applied in software. The idea is based on transactions from database theory: a transaction is an action that provides the properties Atomicity, Consistency, Isolation, and Durability (ACID). Except for durability, all these properties will hold for transactional memory in C++. C++ will have transactional memory in two flavours: synchronised blocks and atomic blocks. Both have in common that they are executed in total order and behave as if they were protected by a global lock. In contrast to synchronised blocks, atomic blocks cannot execute transaction-unsafe code.

Therefore, you can invoke std::cout in a synchronised block but not in an atomic block.

 

int func() {
  static int i = 0;
  synchronized {
    std::cout << "Not interleaved\n";
    ++i;
    return i;
  }
}

int main(){
  std::vector<std::thread> v(10);
  for (auto& t: v)
    t = std::thread([]{ for (int n = 0; n < 10; ++n) func(); });
  for (auto& t: v) t.join();
}

 

The synchronized keyword guarantees that the executions of the synchronised block will not overlap. That means in particular that there is a single, total order between all synchronised blocks. To put it the other way around: the end of each synchronised block synchronises with the start of the next synchronised block.

 

Although I called this post Multithreading in C++17 and C++20, task blocks give us, besides the parallel STL, more parallel features in C++.

Task blocks

Task blocks implement the fork-join paradigm. The graphic shows the key idea.

[Figure: fork-join paradigm]

By using run in a task block you can fork new tasks that will be joined at the end of the task block.

 

template <typename Func>
int traverse(node& n, Func&& f){
  int left = 0, right = 0;
  define_task_block(
    [&](task_block& tb){
      if (n.left) tb.run([&]{ left = traverse(*n.left, f); });
      if (n.right) tb.run([&]{ right = traverse(*n.right, f); });
    }
  );
  return f(n) + left + right;
}

 

traverse is a function template that invokes the function f on each node of its tree. The expression define_task_block defines the task block. In this region, you have a task block tb at your disposal to start new tasks. Exactly that happens for the left and right branches of the tree. The end of the task block is the synchronisation point.

What's next?

After giving this overview of the new multithreading features in C++17 and C++20, I will provide the details in the next posts, starting with the parallel STL. I'm quite sure that my post has left more questions open than answered.
