Asynchronous Function Calls

std::async feels like an asynchronous function call. Under the hood, std::async is a task, and one that is extremely easy to use.

std::async

std::async gets a callable as a work package. In this example, the work package is a function, a function object, or a lambda function.

// async.cpp

#include <future>
#include <iostream>
#include <string>

std::string helloFunction(const std::string& s){
  return "Hello C++11 from " + s + ".";
}

class HelloFunctionObject{
  public:
    std::string operator()(const std::string& s) const {
      return "Hello C++11 from " + s + ".";
    }
};

int main(){

  std::cout << std::endl;

  // future with function
  auto futureFunction= std::async(helloFunction,"function");

  // future with function object
  HelloFunctionObject helloFunctionObject;
  auto futureFunctionObject= std::async(helloFunctionObject,"function object");

  // future with lambda function
  auto futureLambda= std::async([](const std::string& s ){return "Hello C++11 from " + s + ".";},"lambda function");

  std::cout << futureFunction.get() << "\n" 
	    << futureFunctionObject.get() << "\n" 
	    << futureLambda.get() << std::endl;

  std::cout << std::endl;

}

 

The program execution is not so exciting.

 async

 


std::async gets a function, a function object, and a lambda function as a work package and returns a future for each of them. In the end, each future requests its value with its get call.

And again, a little more formally: each std::async call creates a data channel between its two endpoints, a future and a promise. The promise immediately starts to execute its work package. But that is only the default behavior. With the get call, the future requests the result of its work package.
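
To make the two endpoints visible, here is a minimal sketch of mine, not part of the original program, that builds such a data channel by hand with std::promise and std::future:

// promiseFuture.cpp

#include <future>
#include <iostream>
#include <string>
#include <thread>

int main(){

  std::promise<std::string> helloPromise;                          // the sending endpoint
  std::future<std::string> helloFuture= helloPromise.get_future(); // the receiving endpoint

  // the work package fulfils the promise in a separate thread
  std::thread t([&helloPromise]{ helloPromise.set_value("Hello C++11 from promise."); });

  std::cout << helloFuture.get() << std::endl;                     // blocks until the value is set

  t.join();

}

std::async performs exactly this wiring for you and, in addition, manages the execution of the work package.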

Eager or lazy evaluation

Eager and lazy evaluation are two orthogonal strategies for calculating the result of an expression. With eager evaluation, the expression is evaluated immediately; with lazy evaluation, it is evaluated only when its result is needed. Lazy evaluation is therefore often called call-by-need. With lazy evaluation, you save time and compute power because nothing is evaluated on suspicion. An expression can be a mathematical calculation, a function, or a std::async call.

By default, std::async executes its work package immediately. The C++ runtime decides whether the calculation happens in the same or a new thread. With the flag std::launch::async, std::async runs its work package in a new thread. In contrast, the flag std::launch::deferred expresses that std::async runs its work package in the same thread; in this case, the execution is lazy. That implies that an eager evaluation starts immediately, but a lazy evaluation with the policy std::launch::deferred starts only when the future asks for the value with its get call.
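
A small aside, and a sketch of mine rather than part of the original post: you can even ask a future whether its work package was deferred. wait_for with a zero timeout returns std::future_status::deferred as long as a lazy work package has not been started:

// deferredCheck.cpp

#include <chrono>
#include <future>
#include <iostream>

int main(){

  auto lazyFuture= std::async(std::launch::deferred, []{ return 2011; });

  // the zero-timeout wait does not start the deferred work package
  if (lazyFuture.wait_for(std::chrono::seconds(0)) == std::future_status::deferred){
    std::cout << "The work package is deferred." << std::endl;
  }

  std::cout << lazyFuture.get() << std::endl;  // only get (or wait) starts it

}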

The following program shows the different behavior of the two launch policies.

// asyncLazy.cpp

#include <chrono>
#include <future>
#include <iostream>
#include <thread>

int main(){

  std::cout << std::endl;

  auto begin= std::chrono::system_clock::now();

  auto asyncLazy= std::async(std::launch::deferred, []{ return std::chrono::system_clock::now(); });

  auto asyncEager= std::async(std::launch::async, []{ return std::chrono::system_clock::now(); });

  std::this_thread::sleep_for(std::chrono::seconds(1));

  auto lazyStart= asyncLazy.get() - begin;
  auto eagerStart= asyncEager.get() - begin;

  auto lazyDuration= std::chrono::duration<double>(lazyStart).count();
  auto eagerDuration= std::chrono::duration<double>(eagerStart).count();

  std::cout << "asyncLazy evaluated after : " << lazyDuration << " seconds." << std::endl;
  std::cout << "asyncEager evaluated after: " << eagerDuration << " seconds." << std::endl;

  std::cout << std::endl;

}
    

     

Both std::async calls return the current time point, but the first call is lazy and the second eager. The short sleep of one second makes that obvious: only the asyncLazy.get() call at the end of the program triggers its work package, so the result becomes available after the one-second nap. This is not true for asyncEager; asyncEager.get() gets the result of the immediately executed work package.

asyncLazy

A bigger compute job

std::async is quite convenient for putting a bigger compute job on more shoulders. In the following program, the scalar product of two vectors is therefore calculated with four asynchronous function calls.

// dotProductAsync.cpp

#include <chrono>
#include <iostream>
#include <future>
#include <random>
#include <vector>
#include <numeric>

static const int NUM= 100000000;

long long getDotProduct(std::vector<int>& v, std::vector<int>& w){

  // each work package calculates a quarter of the scalar product;
  // v.data() + v.size() is the valid one-past-the-end pointer
  auto future1= std::async([&]{ return std::inner_product(&v[0], &v[v.size()/4], &w[0], 0LL); });
  auto future2= std::async([&]{ return std::inner_product(&v[v.size()/4], &v[v.size()/2], &w[v.size()/4], 0LL); });
  auto future3= std::async([&]{ return std::inner_product(&v[v.size()/2], &v[v.size()*3/4], &w[v.size()/2], 0LL); });
  auto future4= std::async([&]{ return std::inner_product(&v[v.size()*3/4], v.data() + v.size(), &w[v.size()*3/4], 0LL); });

  return future1.get() + future2.get() + future3.get() + future4.get();
}


int main(){

  std::cout << std::endl;

  // get NUM random numbers from 0 .. 100
  std::random_device seed;

  // generator
  std::mt19937 engine(seed());

  // distribution
  std::uniform_int_distribution<int> dist(0, 100);

  // fill the vectors
  std::vector<int> v, w;
  v.reserve(NUM);
  w.reserve(NUM);
  for (int i= 0; i < NUM; ++i){
    v.push_back(dist(engine));
    w.push_back(dist(engine));
  }

  // measure the execution time
  std::chrono::system_clock::time_point start = std::chrono::system_clock::now();
  std::cout << "getDotProduct(v,w): " << getDotProduct(v,w) << std::endl;
  std::chrono::duration<double> dur = std::chrono::system_clock::now() - start;
  std::cout << "Parallel Execution: " << dur.count() << std::endl;

  std::cout << std::endl;

}
    

     

The program uses the functionality of the random and the time library; both are part of C++11. The vectors v and w are created and filled with random numbers in main. Each vector gets a hundred million elements; dist(engine) generates the random numbers, uniformly distributed between 0 and 100. The actual calculation of the scalar product takes place in the function getDotProduct. Each of its four work packages applies the Standard Template Library algorithm std::inner_product to a quarter of the vectors. The return statement sums up the results of the four futures.
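
If std::inner_product is new to you, here is a minimal example of mine, not from the original post: it folds the element-wise products of two ranges into the start value.

// innerProduct.cpp

#include <iostream>
#include <numeric>
#include <vector>

int main(){

  std::vector<int> v{1, 2, 3};
  std::vector<int> w{4, 5, 6};

  // 0 + 1*4 + 2*5 + 3*6 = 32
  std::cout << std::inner_product(v.begin(), v.end(), w.begin(), 0) << std::endl;

}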

It takes about 0.4 seconds to calculate the result on my PC.

dotProductAsync

But now the question is: how fast is the program if I execute it on one core? A slight modification of the function getDotProduct, and we know the truth.


long long getDotProduct(std::vector<int>& v, std::vector<int>& w){
  return std::inner_product(v.begin(), v.end(), w.begin(), 0LL);
}

     

The execution of the program is now about four times slower.

dotProduct

Optimization

But if I compile the program with the maximum optimization level -O3 with my GCC, the performance difference is nearly gone: the parallel execution is only about 10 percent faster.

dotProductComparisonOptimization
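
For reference, an invocation such as the following should reproduce this setting; this is my sketch, assuming GCC on Linux, where std::async additionally requires the pthread library:

g++ -O3 -pthread dotProductAsync.cpp -o dotProductAsync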

What’s next?

In the next post, I will show you how to parallelize a big compute job by using std::packaged_task. (Proofreader Alexey Elymanov)

     

     

     

     

     

     
