{"id":10386,"date":"2024-12-02T10:30:52","date_gmt":"2024-12-02T10:30:52","guid":{"rendered":"https:\/\/www.modernescpp.com\/?p=10386"},"modified":"2025-07-04T15:46:41","modified_gmt":"2025-07-04T15:46:41","slug":"stdexecution-inclusive-scan","status":"publish","type":"post","link":"https:\/\/www.modernescpp.com\/index.php\/stdexecution-inclusive-scan\/","title":{"rendered":"std::execution: Inclusive Scan"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Inclusive scan solves problems related to range queries, such as calculating the sum of a range of elements in an array. It is also used in range minimum queries and various other algorithms.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"711\" height=\"491\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2024\/11\/Timeexecution-1.png\" alt=\"\" class=\"wp-image-10389\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2024\/11\/Timeexecution-1.png 711w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2024\/11\/Timeexecution-1-300x207.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2024\/11\/Timeexecution-1-705x487.png 705w\" sizes=\"auto, (max-width: 711px) 100vw, 711px\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Before I present the asynchronous inclusive scan, I introduce the inclusive scan, aka prefix sum.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Prefix Sum<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In <a href=\"https:\/\/en.wikipedia.org\/wiki\/Computer_science\">computer science<\/a>, the <strong>prefix sum<\/strong>, <strong>cumulative sum<\/strong>, <strong>inclusive scan<\/strong>, or simply <strong>scan<\/strong> of a sequence of numbers <em>x<\/em><sub>0<\/sub>, <em>x<\/em><sub>1<\/sub>, <em>x<\/em><sub>2<\/sub>, &#8230; is a second sequence of numbers <em>y<\/em><sub>0<\/sub>, <em>y<\/em><sub>1<\/sub>, <em>y<\/em><sub>2<\/sub>, &#8230;, the <a href=\"https:\/\/en.wikipedia.org\/wiki\/Summation\">sums<\/a> of <a href=\"https:\/\/en.wikipedia.org\/wiki\/Prefix_(computer_science)\">prefixes<\/a> (<a href=\"https:\/\/en.wikipedia.org\/wiki\/Running_total\">running totals<\/a>) of the input sequence:<\/p>\n\n\n\n<!-- HTML generated using hilite.me --><div style=\"background: #f0f3f3; overflow:auto;width:auto;gray;border-width:.1em .1em .1em .8em\"><pre style=\"margin: 0; line-height: 125%\">    y0 <span style=\"color: #555555\">=<\/span> x0\n    y1 <span style=\"color: #555555\">=<\/span> x0 <span style=\"color: #555555\">+<\/span> x1\n    y2 <span style=\"color: #555555\">=<\/span> x0 <span style=\"color: #555555\">+<\/span> x1<span style=\"color: #555555\">+<\/span> x2\n    ...\n<\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">(<a href=\"https:\/\/en.wikipedia.org\/wiki\/Prefix_sum\">https:\/\/en.wikipedia.org\/wiki\/Prefix_sum<\/a>)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Proposal <a href=\"https:\/\/www.open-std.org\/jtc1\/sc22\/wg21\/docs\/papers\/2024\/p2300r10.html\">P2300R10<\/a> has an asynchronous implementation of inclusive scan.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Asynchronous Inclusive Scan<\/h2>\n\n\n\n<!-- HTML generated using hilite.me --><div style=\"background: #f0f3f3; overflow:auto;width:auto;gray;border-width:.1em .1em .1em .8em\"><pre style=\"margin: 0; line-height: 125%\"><span style=\"color: #006699; font-weight: bold\">using<\/span> <span style=\"color: #006699; font-weight: bold\">namespace<\/span> std<span style=\"color: #555555\">::<\/span>execution;\n\nsender <span style=\"color: #006699; font-weight: bold\">auto<\/span> <span style=\"color: #CC00FF\">async_inclusive_scan<\/span>(scheduler <span style=\"color: #006699; font-weight: bold\">auto<\/span> sch,                          <span style=\"color: #0099FF; font-style: italic\">\/\/ 2<\/span>\n                                 std<span style=\"color: #555555\">::<\/span>span<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #006699; font-weight: bold\">const<\/span> <span style=\"color: #007788; font-weight: bold\">double<\/span><span style=\"color: #555555\">&gt;<\/span> input,               <span style=\"color: #0099FF; font-style: italic\">\/\/ 1<\/span>\n                                 std<span style=\"color: #555555\">::<\/span>span<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">double<\/span><span style=\"color: #555555\">&gt;<\/span> output,                    <span style=\"color: #0099FF; font-style: italic\">\/\/ 1<\/span>\n                                 <span style=\"color: #007788; font-weight: bold\">double<\/span> init,                                 <span style=\"color: #0099FF; font-style: italic\">\/\/ 1<\/span>\n                                 std<span style=\"color: #555555\">::<\/span><span style=\"color: #007788; font-weight: bold\">size_t<\/span> tile_count)                      <span style=\"color: #0099FF; font-style: italic\">\/\/ 3<\/span>\n{\n  std<span style=\"color: #555555\">::<\/span><span style=\"color: #007788; font-weight: bold\">size_t<\/span> <span style=\"color: #006699; font-weight: bold\">const<\/span> tile_size <span style=\"color: #555555\">=<\/span> (input.size() <span style=\"color: #555555\">+<\/span> tile_count <span style=\"color: #555555\">-<\/span> <span style=\"color: #FF6600\">1<\/span>) <span style=\"color: #555555\">\/<\/span> tile_count;\n\n  std<span style=\"color: #555555\">::<\/span>vector<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">double<\/span><span style=\"color: #555555\">&gt;<\/span> partials(tile_count <span style=\"color: #555555\">+<\/span> <span style=\"color: #FF6600\">1<\/span>);                               <span style=\"color: #0099FF; font-style: italic\">\/\/ 4<\/span>\n  partials[<span style=\"color: #FF6600\">0<\/span>] <span style=\"color: #555555\">=<\/span> init;                                                         <span style=\"color: #0099FF; font-style: italic\">\/\/ 4<\/span>\n\n  <span style=\"color: #006699; font-weight: bold\">return<\/span> just(std<span style=\"color: #555555\">::<\/span>move(partials))                                            <span style=\"color: #0099FF; font-style: italic\">\/\/ 5<\/span>\n       <span style=\"color: #555555\">|<\/span> continues_on(sch)\n       <span style=\"color: #555555\">|<\/span> bulk(tile_count,                                                     <span style=\"color: #0099FF; font-style: italic\">\/\/ 6<\/span>\n           [ <span style=\"color: #555555\">=<\/span> ](std<span style=\"color: #555555\">::<\/span><span style=\"color: #007788; font-weight: bold\">size_t<\/span> i, std<span style=\"color: #555555\">::<\/span>vector<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">double<\/span><span style=\"color: #555555\">&gt;&amp;<\/span> partials) {              <span style=\"color: #0099FF; font-style: italic\">\/\/ 7<\/span>\n             <span style=\"color: #006699; font-weight: bold\">auto<\/span> start <span style=\"color: #555555\">=<\/span> i <span style=\"color: #555555\">*<\/span> tile_size;                                      <span style=\"color: #0099FF; font-style: italic\">\/\/ 8<\/span>\n             <span style=\"color: #006699; font-weight: bold\">auto<\/span> end   <span style=\"color: #555555\">=<\/span> std<span style=\"color: #555555\">::<\/span>min(input.size(), (i <span style=\"color: #555555\">+<\/span> <span style=\"color: #FF6600\">1<\/span>) <span style=\"color: #555555\">*<\/span> tile_size);        <span style=\"color: #0099FF; font-style: italic\">\/\/ 8<\/span>\n             partials[i <span style=\"color: #555555\">+<\/span> <span style=\"color: #FF6600\">1<\/span>] <span style=\"color: #555555\">=<\/span> <span style=\"color: #555555\">*--<\/span>std<span style=\"color: #555555\">::<\/span>inclusive_scan(begin(input) <span style=\"color: #555555\">+<\/span> start,   <span style=\"color: #0099FF; font-style: italic\">\/\/ 9<\/span>\n                                                      begin(input) <span style=\"color: #555555\">+<\/span> end,     <span style=\"color: #0099FF; font-style: italic\">\/\/ 9<\/span>\n                                                      begin(output) <span style=\"color: #555555\">+<\/span> start); <span style=\"color: #0099FF; font-style: italic\">\/\/ 9<\/span>\n           })                                                                 <span style=\"color: #0099FF; font-style: italic\">\/\/ 10<\/span>\n       <span style=\"color: #555555\">|<\/span> then(                                                                <span style=\"color: #0099FF; font-style: italic\">\/\/ 11<\/span>\n           [](std<span style=\"color: #555555\">::<\/span>vector<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">double<\/span><span style=\"color: #555555\">&gt;&amp;&amp;<\/span> partials) {\n             std<span style=\"color: #555555\">::<\/span>inclusive_scan(begin(partials), end(partials),              <span style=\"color: #0099FF; font-style: italic\">\/\/ 12<\/span>\n                                 begin(partials));                            <span style=\"color: #0099FF; font-style: italic\">\/\/ 12<\/span>\n             <span style=\"color: #006699; font-weight: bold\">return<\/span> std<span style=\"color: #555555\">::<\/span>move(partials);                                      <span style=\"color: #0099FF; font-style: italic\">\/\/ 13<\/span>\n           })\n       <span style=\"color: #555555\">|<\/span> bulk(tile_count,                                                     <span style=\"color: #0099FF; font-style: italic\">\/\/ 14<\/span>\n           [ <span style=\"color: #555555\">=<\/span> ](std<span style=\"color: #555555\">::<\/span><span style=\"color: #007788; font-weight: bold\">size_t<\/span> i, std<span style=\"color: #555555\">::<\/span>vector<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">double<\/span><span style=\"color: #555555\">&gt;&amp;<\/span> partials) {              <span style=\"color: #0099FF; font-style: italic\">\/\/ 14<\/span>\n             <span style=\"color: #006699; font-weight: bold\">auto<\/span> start <span style=\"color: #555555\">=<\/span> i <span style=\"color: #555555\">*<\/span> tile_size;                                      <span style=\"color: #0099FF; font-style: italic\">\/\/ 14<\/span>\n             <span style=\"color: #006699; font-weight: bold\">auto<\/span> end   <span style=\"color: #555555\">=<\/span> std<span style=\"color: #555555\">::<\/span>min(input.size(), (i <span style=\"color: #555555\">+<\/span> <span style=\"color: #FF6600\">1<\/span>) <span style=\"color: #555555\">*<\/span> tile_size);        <span style=\"color: #0099FF; font-style: italic\">\/\/ 14<\/span>\n             std<span style=\"color: #555555\">::<\/span>for_each(begin(output) <span style=\"color: #555555\">+<\/span> start, begin(output) <span style=\"color: #555555\">+<\/span> end,        <span style=\"color: #0099FF; font-style: italic\">\/\/ 14<\/span>\n               [<span style=\"color: #555555\">&amp;<\/span>] (<span style=\"color: #007788; font-weight: bold\">double<\/span><span style=\"color: #555555\">&amp;<\/span> e) { e <span style=\"color: #555555\">=<\/span> partials[i] <span style=\"color: #555555\">+<\/span> e; }                       <span style=\"color: #0099FF; font-style: italic\">\/\/ 14<\/span>\n             );\n           })\n       <span style=\"color: #555555\">|<\/span> then(                                                                <span style=\"color: #0099FF; font-style: italic\">\/\/ 15<\/span>\n           [ <span style=\"color: #555555\">=<\/span> ](std<span style=\"color: #555555\">::<\/span>vector<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">double<\/span><span style=\"color: #555555\">&gt;&amp;&amp;<\/span> partials) {                            <span style=\"color: #0099FF; font-style: italic\">\/\/ 15<\/span>\n             <span style=\"color: #006699; font-weight: bold\">return<\/span> output;                                                   <span style=\"color: #0099FF; font-style: italic\">\/\/ 15<\/span>\n           });                                                                <span style=\"color: #0099FF; font-style: italic\">\/\/ 15<\/span>\n}\n<\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\"> Here&#8217;s the explanation from the proposal <a href=\"https:\/\/www.open-std.org\/jtc1\/sc22\/wg21\/docs\/papers\/2024\/p2300r10.html\">P2300R10<\/a>:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>It scans a sequence of <code>double<\/code>s (represented as the <code>std::span&lt;const double&gt;<\/code> <code>input<\/code>) and stores the result in another sequence of <code>double<\/code>s (represented as <code>std::span&lt;double&gt;<\/code> <code>output<\/code>).<\/li>\n\n\n\n<li>It takes a scheduler, which specifies what execution resource the scan should be launched on.<\/li>\n\n\n\n<li>It also takes a <code>tile_count<\/code> parameter that controls the number of execution agents that will be spawned.<\/li>\n\n\n\n<li>First we need to allocate temporary storage needed for the algorithm, which we\u2019ll do with a <code>std::vector<\/code>, <code>partials<\/code>. We need one <code>double<\/code> of temporary storage for each execution agent we create.<\/li>\n\n\n\n<li>Next we\u2019ll create our initial sender with <code>execution::just <\/code>and <code>execution::continues_on<\/code>. These senders will send the temporary storage, which we\u2019ve moved into the sender. The sender has a completion scheduler of <code>sch<\/code>, which means the next item in the chain will use <code>sch<\/code>.<\/li>\n\n\n\n<li>Senders and sender adaptors support composition via <code>operator|<\/code>, similar to C++ ranges. We\u2019ll use <code>operator|<\/code> to attach the next piece of work, which will spawn <code>tile_count<\/code> execution agents using <code>execution::bulk<\/code>s.<\/li>\n\n\n\n<li>Each agent will call a <code>std::invocable<\/code>, passing it two arguments. The first is the agent\u2019s index (<code>i<\/code>) in the execution::bulk operation, in this case a unique integer in <code>[0, tile_count)<\/code>. The second argument is what the input sender sent &#8211; the temporary storage.<\/li>\n\n\n\n<li>We start by computing the start and end of the range of input and output elements that this agent is responsible for, based on our agent index.<\/li>\n\n\n\n<li>Then we do a sequential <code>std::inclusive_scan<\/code> over our elements. We store the scan result for our last element, which is the sum of all of our elements, in our temporary storage <code>partials<\/code>.<\/li>\n\n\n\n<li>After all computation in that initial <code>bulk <\/code>pass has completed, every one of the spawned execution agents will have written the sum of its elements into its slot in <code>partials<\/code>.<\/li>\n\n\n\n<li>Now we need to scan all of the values in <code>partials<\/code>. We\u2019ll do that with a single execution agent which will execute after the<a href=\"https:\/\/www.open-std.org\/jtc1\/sc22\/wg21\/docs\/papers\/2024\/p2300r10.html#design-sender-adaptor-bulk\"> <\/a><code>execution::bulk<\/code> completes. We create that execution agent with <code>execution::then<\/code>.<\/li>\n\n\n\n<li><code>execution::then<\/code> takes an input sender and an <code>std::invocable<\/code> and calls the <code>std::invocable<\/code> with the value sent by the input sender. Inside our <code>std::invocable<\/code>, we call <code>std::inclusive_scan<\/code> on <code>partials<\/code>, which the input senders will send to us.<\/li>\n\n\n\n<li>Then we return <code>partials<\/code>, which the next phase will need.<\/li>\n\n\n\n<li>Finally we do another <code>execution::bulk <\/code>of the same shape as before. In this <code>execution::bulk,<\/code> we will use the scanned values in <code>partials<\/code> to integrate the sums from other tiles into our elements, completing the inclusive scan.<\/li>\n\n\n\n<li><code>async_inclusive_scan<\/code> returns a sender that sends the output <code>std::span&lt;double&gt;<\/code>. A consumer of the algorithm can chain additional work that uses the scan result. At the point at which <code>async_inclusive_scan<\/code> returns, the computation may not have completed. In fact, it may not have even started.<\/li>\n<\/ol>\n\n\n\n<p class=\"wp-block-paragraph\">Sender<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>bulk(input, shape, call)<\/code>:  Returns a sender which describes the callable <code>call<\/code> invoked on <code>input <\/code>according to <code>shape.<\/code><\/li>\n\n\n\n<li><code>continues_on(input, scheduler<\/code>):  Returns a sender describing the transition from the execution agent of the <code>input <\/code>sender to the execution agent of the target <code>scheduler<\/code>.<\/li>\n\n\n\n<li><code>just(values)<\/code>: Returns a sender with no completion schedulers, which sends the provided <code>values<\/code>. <code>just <\/code>is a sender factory.<\/li>\n\n\n\n<li><code>then(input, call)<\/code>:  <code>then<\/code> returns a sender describing the continuation of the task of the <code>input<\/code> sender on an added node of invoking the provided function <code>call<\/code>.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Would it not be nice to see this program in action? Currently (December 2024), no compiler supports <code>std::execution<\/code> or the concepts <code>sender <\/code>and <code>scheduler<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Here comes the reference implementation <a href=\"https:\/\/github.com\/NVIDIA\/stdexec\">stdexec<\/a> to our rescue:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I changed the data type of the processed elements from <code>double <\/code>to <code>int<\/code>.<\/p>\n\n\n\n<!-- HTML generated using hilite.me --><div style=\"background: #f0f3f3; overflow:auto;width:auto;gray;border-width:.1em .1em .1em .8em\"><pre style=\"margin: 0; line-height: 125%\"><span style=\"color: #0099FF; font-style: italic\">\/\/ inclusiveScanExecution.cpp<\/span>\n\n<span style=\"color: #009999\">#include &lt;algorithm&gt;<\/span>\n<span style=\"color: #009999\">#include &lt;exec\/static_thread_pool.hpp&gt;<\/span>\n<span style=\"color: #009999\">#include &lt;iostream&gt;<\/span>\n<span style=\"color: #009999\">#include &lt;numeric&gt;<\/span>\n<span style=\"color: #009999\">#include &lt;span&gt;<\/span>\n<span style=\"color: #009999\">#include &lt;stdexec\/execution.hpp&gt;<\/span>\n<span style=\"color: #009999\">#include &lt;vector&gt;<\/span>\n\n<span style=\"color: #006699; font-weight: bold\">auto<\/span> <span style=\"color: #CC00FF\">async_inclusive_scan<\/span>(<span style=\"color: #006699; font-weight: bold\">auto<\/span> sch,                   <span style=\"color: #0099FF; font-style: italic\">\/\/ 2<\/span>\n                          std<span style=\"color: #555555\">::<\/span>span<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #006699; font-weight: bold\">const<\/span> <span style=\"color: #007788; font-weight: bold\">int<\/span><span style=\"color: #555555\">&gt;<\/span> input, <span style=\"color: #0099FF; font-style: italic\">\/\/ 1<\/span>\n                          std<span style=\"color: #555555\">::<\/span>span<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">int<\/span><span style=\"color: #555555\">&gt;<\/span> output,      <span style=\"color: #0099FF; font-style: italic\">\/\/ 1<\/span>\n                          <span style=\"color: #007788; font-weight: bold\">int<\/span> init,                   <span style=\"color: #0099FF; font-style: italic\">\/\/ 1<\/span>\n                          std<span style=\"color: #555555\">::<\/span><span style=\"color: #007788; font-weight: bold\">size_t<\/span> tile_count)     <span style=\"color: #0099FF; font-style: italic\">\/\/ 3<\/span>\n{\n  std<span style=\"color: #555555\">::<\/span><span style=\"color: #007788; font-weight: bold\">size_t<\/span> <span style=\"color: #006699; font-weight: bold\">const<\/span> tile_size <span style=\"color: #555555\">=<\/span> (input.size() <span style=\"color: #555555\">+<\/span> tile_count <span style=\"color: #555555\">-<\/span> <span style=\"color: #FF6600\">1<\/span>) <span style=\"color: #555555\">\/<\/span> tile_count;\n\n  std<span style=\"color: #555555\">::<\/span>vector<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">int<\/span><span style=\"color: #555555\">&gt;<\/span> partials(tile_count <span style=\"color: #555555\">+<\/span> <span style=\"color: #FF6600\">1<\/span>); <span style=\"color: #0099FF; font-style: italic\">\/\/ 4<\/span>\n  partials[<span style=\"color: #FF6600\">0<\/span>] <span style=\"color: #555555\">=<\/span> init;                        <span style=\"color: #0099FF; font-style: italic\">\/\/ 4<\/span>\n\n  <span style=\"color: #006699; font-weight: bold\">return<\/span> stdexec<span style=\"color: #555555\">::<\/span>just(std<span style=\"color: #555555\">::<\/span>move(partials)) <span style=\"color: #0099FF; font-style: italic\">\/\/ 5<\/span>\n         <span style=\"color: #555555\">|<\/span> stdexec<span style=\"color: #555555\">::<\/span>continues_on(sch) <span style=\"color: #555555\">|<\/span>\n         stdexec<span style=\"color: #555555\">::<\/span>bulk(tile_count,                                      <span style=\"color: #0099FF; font-style: italic\">\/\/ 6<\/span>\n                       [<span style=\"color: #555555\">=<\/span>](std<span style=\"color: #555555\">::<\/span><span style=\"color: #007788; font-weight: bold\">size_t<\/span> i, std<span style=\"color: #555555\">::<\/span>vector<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">int<\/span><span style=\"color: #555555\">&gt;<\/span> <span style=\"color: #555555\">&amp;<\/span>partials) { <span style=\"color: #0099FF; font-style: italic\">\/\/ 7<\/span>\n                         <span style=\"color: #006699; font-weight: bold\">auto<\/span> start <span style=\"color: #555555\">=<\/span> i <span style=\"color: #555555\">*<\/span> tile_size;                    <span style=\"color: #0099FF; font-style: italic\">\/\/ 8<\/span>\n                         <span style=\"color: #006699; font-weight: bold\">auto<\/span> end <span style=\"color: #555555\">=<\/span>\n                             std<span style=\"color: #555555\">::<\/span>min(input.size(), (i <span style=\"color: #555555\">+<\/span> <span style=\"color: #FF6600\">1<\/span>) <span style=\"color: #555555\">*<\/span> tile_size); <span style=\"color: #0099FF; font-style: italic\">\/\/ 8<\/span>\n                         partials[i <span style=\"color: #555555\">+<\/span> <span style=\"color: #FF6600\">1<\/span>] <span style=\"color: #555555\">=<\/span>\n                             <span style=\"color: #555555\">*--<\/span>std<span style=\"color: #555555\">::<\/span>inclusive_scan(begin(input) <span style=\"color: #555555\">+<\/span> start,   <span style=\"color: #0099FF; font-style: italic\">\/\/ 9<\/span>\n                                                    begin(input) <span style=\"color: #555555\">+<\/span> end,     <span style=\"color: #0099FF; font-style: italic\">\/\/ 9<\/span>\n                                                    begin(output) <span style=\"color: #555555\">+<\/span> start); <span style=\"color: #0099FF; font-style: italic\">\/\/ 9<\/span>\n                       }) <span style=\"color: #0099FF; font-style: italic\">\/\/ 10<\/span>\n         <span style=\"color: #555555\">|<\/span> stdexec<span style=\"color: #555555\">::<\/span>then( <span style=\"color: #0099FF; font-style: italic\">\/\/ 11<\/span>\n               [](std<span style=\"color: #555555\">::<\/span>vector<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">int<\/span><span style=\"color: #555555\">&gt;<\/span> <span style=\"color: #555555\">&amp;&amp;<\/span>partials) {\n                 std<span style=\"color: #555555\">::<\/span>inclusive_scan(begin(partials), end(partials), <span style=\"color: #0099FF; font-style: italic\">\/\/ 12<\/span>\n                                     begin(partials));               <span style=\"color: #0099FF; font-style: italic\">\/\/ 12<\/span>\n                 <span style=\"color: #006699; font-weight: bold\">return<\/span> std<span style=\"color: #555555\">::<\/span>move(partials);                         <span style=\"color: #0099FF; font-style: italic\">\/\/ 13<\/span>\n               }) <span style=\"color: #555555\">|<\/span>\n         stdexec<span style=\"color: #555555\">::<\/span>bulk(\n             tile_count,                                                 <span style=\"color: #0099FF; font-style: italic\">\/\/ 14<\/span>\n             [<span style=\"color: #555555\">=<\/span>](std<span style=\"color: #555555\">::<\/span><span style=\"color: #007788; font-weight: bold\">size_t<\/span> i, std<span style=\"color: #555555\">::<\/span>vector<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">int<\/span><span style=\"color: #555555\">&gt;<\/span> <span style=\"color: #555555\">&amp;<\/span>partials) {            <span style=\"color: #0099FF; font-style: italic\">\/\/ 14<\/span>\n               <span style=\"color: #006699; font-weight: bold\">auto<\/span> start <span style=\"color: #555555\">=<\/span> i <span style=\"color: #555555\">*<\/span> tile_size;                               <span style=\"color: #0099FF; font-style: italic\">\/\/ 14<\/span>\n               <span style=\"color: #006699; font-weight: bold\">auto<\/span> end <span style=\"color: #555555\">=<\/span> std<span style=\"color: #555555\">::<\/span>min(input.size(), (i <span style=\"color: #555555\">+<\/span> <span style=\"color: #FF6600\">1<\/span>) <span style=\"color: #555555\">*<\/span> tile_size);   <span style=\"color: #0099FF; font-style: italic\">\/\/ 14<\/span>\n               std<span style=\"color: #555555\">::<\/span>for_each(begin(output) <span style=\"color: #555555\">+<\/span> start, begin(output) <span style=\"color: #555555\">+<\/span> end, <span style=\"color: #0099FF; font-style: italic\">\/\/ 14<\/span>\n                             [<span style=\"color: #555555\">&amp;<\/span>](<span style=\"color: #007788; font-weight: bold\">int<\/span> <span style=\"color: #555555\">&amp;<\/span>e) { e <span style=\"color: #555555\">=<\/span> partials[i] <span style=\"color: #555555\">+<\/span> e; }        <span style=\"color: #0099FF; font-style: italic\">\/\/ 14<\/span>\n               );\n             }) <span style=\"color: #555555\">|<\/span>\n         stdexec<span style=\"color: #555555\">::<\/span>then(                         <span style=\"color: #0099FF; font-style: italic\">\/\/ 15<\/span>\n             [<span style=\"color: #555555\">=<\/span>](std<span style=\"color: #555555\">::<\/span>vector<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">int<\/span><span style=\"color: #555555\">&gt;<\/span> <span style=\"color: #555555\">&amp;&amp;<\/span>partials) { <span style=\"color: #0099FF; font-style: italic\">\/\/ 15<\/span>\n               <span style=\"color: #006699; font-weight: bold\">return<\/span> output;                   <span style=\"color: #0099FF; font-style: italic\">\/\/ 15<\/span>\n             });                                <span style=\"color: #0099FF; font-style: italic\">\/\/ 15<\/span>\n}\n\n<span style=\"color: #007788; font-weight: bold\">int<\/span> <span style=\"color: #CC00FF\">main<\/span>() {\n\n  std<span style=\"color: #555555\">::<\/span>cout <span style=\"color: #555555\">&lt;&lt;<\/span> <span style=\"color: #CC3300\">&#39;\\n&#39;<\/span>;\n\n  std<span style=\"color: #555555\">::<\/span>vector<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">int<\/span><span style=\"color: #555555\">&gt;<\/span> input(<span style=\"color: #FF6600\">30<\/span>);\n  std<span style=\"color: #555555\">::<\/span>iota(begin(input), end(input), <span style=\"color: #FF6600\">0<\/span>);\n  <span style=\"color: #006699; font-weight: bold\">for<\/span> (<span style=\"color: #006699; font-weight: bold\">auto<\/span> e <span style=\"color: #555555\">:<\/span> input) {\n    std<span style=\"color: #555555\">::<\/span>cout <span style=\"color: #555555\">&lt;&lt;<\/span> e <span style=\"color: #555555\">&lt;&lt;<\/span> <span style=\"color: #CC3300\">&#39; &#39;<\/span>;\n  }\n  std<span style=\"color: #555555\">::<\/span>cout <span style=\"color: #555555\">&lt;&lt;<\/span> <span style=\"color: #CC3300\">&#39;\\n&#39;<\/span>;\n\n  std<span style=\"color: #555555\">::<\/span>vector<span style=\"color: #555555\">&lt;<\/span><span style=\"color: #007788; font-weight: bold\">int<\/span><span style=\"color: #555555\">&gt;<\/span> output(input.size());\n\n  exec<span style=\"color: #555555\">::<\/span>static_thread_pool pool(<span style=\"color: #FF6600\">8<\/span>);\n  <span style=\"color: #006699; font-weight: bold\">auto<\/span> sch <span style=\"color: #555555\">=<\/span> pool.get_scheduler();\n\n  <span style=\"color: #006699; font-weight: bold\">auto<\/span> [out] <span style=\"color: #555555\">=<\/span>\n      stdexec<span style=\"color: #555555\">::<\/span>sync_wait(async_inclusive_scan(sch, input, output, <span style=\"color: #FF6600\">0<\/span>, <span style=\"color: #FF6600\">4<\/span>))\n          .value();\n\n  <span style=\"color: #006699; font-weight: bold\">for<\/span> (<span style=\"color: #006699; font-weight: bold\">auto<\/span> e <span style=\"color: #555555\">:<\/span> out) {\n    std<span style=\"color: #555555\">::<\/span>cout <span style=\"color: #555555\">&lt;&lt;<\/span> e <span style=\"color: #555555\">&lt;&lt;<\/span> <span style=\"color: #CC3300\">&#39; &#39;<\/span>;\n  }\n\n  std<span style=\"color: #555555\">::<\/span>cout <span style=\"color: #555555\">&lt;&lt;<\/span> <span style=\"color: #CC3300\">&#39;\\n&#39;<\/span>;\n}\n<\/pre><\/div>\n\n\n\n<p class=\"wp-block-paragraph\">Here is an explanation of the main program:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In the <code>main<\/code> a <code>std::vector&lt;int&gt;<\/code> <code>input<\/code> is then created with 30 elements. The <code>std::iota<\/code> function fills the <code>input<\/code> vector with sequential integers starting from 0. The program then prints the contents of the <code>input<\/code> vector to the console.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Next, a <code>std::vector&lt;int&gt;<\/code> <code>output<\/code> is created with the same size as the <code>input<\/code> vector to store the results of the inclusive scan operation. The <code>exec::static_thread_pool<\/code> <code>pool<\/code> has 8 threads, which will be used to execute tasks concurrently. The <code>get_scheduler<\/code> member function of the thread pool creates a scheduler object <code>sch<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The <code>async_inclusive_scan<\/code> function takes the scheduler <code>sch<\/code>, the <code>input<\/code> vector, the <code>output<\/code> vector, an initial value of 0, and a tile count of 4. This function performs the inclusive scan operation asynchronously using the specified scheduler and returns a future-like object. The <code>stdexec::sync_wait<\/code> function synchronously wait for the completion of the <code>async_inclusive_scan<\/code> operation, and the result is unpacked into the variable <code>out<\/code>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Finally, the program prints the contents of the <code>out<\/code> vector to the console:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1030\" height=\"60\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2024\/11\/inclusiveScanExecution-1030x60.png\" alt=\"\" class=\"wp-image-10397\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2024\/11\/inclusiveScanExecution-1030x60.png 1030w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2024\/11\/inclusiveScanExecution-300x17.png 300w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2024\/11\/inclusiveScanExecution-768x44.png 768w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2024\/11\/inclusiveScanExecution-705x41.png 705w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2024\/11\/inclusiveScanExecution.png 1089w\" sizes=\"auto, (max-width: 1030px) 100vw, 1030px\" \/><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\">What&#8217;s Next?<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In my next blog post, I will step one step back and explain the composition of senders via operator<code> |<\/code>.<br><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Inclusive scan solves problems related to range queries, such as calculating the sum of a range of elements in an array. It is also used in range minimum queries and various other algorithms. Before I present the asynchronous inclusive scan, I introduce the inclusive scan, aka prefix sum. Prefix Sum In computer science, the prefix [&hellip;]<\/p>\n","protected":false},"author":21,"featured_media":10389,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[559],"tags":[561],"class_list":["post-10386","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-c26-blog","tag-execution"],"_links":{"self":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/10386","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/comments?post=10386"}],"version-history":[{"count":13,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/10386\/revisions"}],"predecessor-version":[{"id":10857,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/10386\/revisions\/10857"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media\/10389"}],"wp:attachment":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media?parent=10386"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/categories?post=10386"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/tags?post=10386"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}