C++ Core Guidelines: std::array and std::vector are your Friends

Contents[Show]

In 99 % of your use-cases for a sequential container, you are totally fine with a std::array or a std::vector. What? If you don't believe me, read this post.

 rules 1752626 1280

Okay, I can make it short today. Here is a rule of thumb: If you want to add elements to your container or remove elements from your container, use a std::vector; if not, use a std::array.

If you are busy, you can stop to read, if not, continue.

The Details

Here is the reason for the rule of thumb from the guideline : SL.con.2: Prefer using STL vector by default unless you have a reason to use a different container

std::array and std::vector offer the following advantages:

  1. the fastest general-purpose access (random access, including being vectorization-friendly);
  2. the fastest default access pattern (begin-to-end or end-to-begin is prefetcher-friendly);
  3. the lowest space overhead (contiguous layout has zero per-element overhead, which is cache-friendly).

I already wrote in my last post C++ Core Guidelines: The Standard Library about the third point. The first point of random access via the index operator is apparent. So, if you don't like proof by authority, let me talk about the second point. To get the full picture, here are the sequential containers of the STL.

 sequentialOverview

You see, we have five sequential containers in the standard template library. Depending on your use-case, std::vector maybe a 95% fit, because most of the times you have to add or remove elements to your std::vector. Let me add a few additional remarks to the table.

O(i) stands for the complexity (runtime) of an operation. So O(1) means that the runtime of an operation on a container is constant and is independent of the size of the container. Opposite to that, O(n) means, that the runtime depends linear on the number of the elements of the container. What does that mean for a std::vector or a std::array. The access time on an element is independent of the size of the std::vector or a std::array, but the insertion or deletion of an arbitrary element with k-times more elements is k-times slower. Of course, the modification is only possible for a std::vector.

std::array and std::vector provide similar access time guarantees, but there is one big difference between them, which many developers ignore. The std::array is typically created on the stack and the elements of a std::vector are created on the heap. This means, that a std::array can only have a limited number of elements but a std::vector an infinite number of elements.

Although the random access on the elements of a std::vector has the same complexity O(1) as the random access on the element of a std::deque, that doesn’t mean, that both operations are equally fast. I will come to this point later.

std::vector and std::deque support since C++11 the new method shrink_to_fit. The number of elements a std::vector or a std:.deque have (size) are usually smaller than the number of elements for which memory is already reserved (capacity). That is for a simple reason. The size of the std::vector or a std::deque can increase without an expensive allocation of new memory. The new method shrink_to_fit allows it to reduce the capacitiy of a std::vector a std::deque to its size. This call is not binding. That means the runtime can ignore it. But on popular platforms, I always observed the desired behaviour.

The complexity guarantee O(1) for the insertion or deletion into a double (std::list) or single linked list (std::forward_list) is only guaranteed if the iterator points to the right element. std::list and std::forward_list provide an exclusive guarantee, which may sometimes be necessary. When you modify a std::vector or a std::deque, the iterators become invalid. This will not hold for a std::list or a std::forward::list.

You must have an excellent reason to use the very special std::forward_list as your sequential container. std::forward_list is optimised for memory requirements and performance and is applicable if the insertion, extraction or movement of elements only affect adjacent elements. The reason for this special behaviour is quite obvious. As a single linked list, std::forward_list only supports a forward iterator and even don't know its size. This is the reason, why you can not use a std::forward_list ist many algorithms of the STL.

Memory Predictability

I said O(1) for the access time of an element in a std::vector and for an element in a std::deque does not mean the same. Here is my simple experiment, which I already provided in the post C++ Core Guidelines: The Remaining Rules to Performance. This is the reason I make my explanation quite short.

If you read an int from memory more than the size of one int is read from memory. An entire cache line is read from memory and stored in a cache. On modern architectures, a cache line has typically 64 bytes. If you now request an additional variable from memory and this variable is in the previous cache, the read directly uses this cache, and the operation is much faster.

Let's see what this means for a std::vector, a std::deque, std::list, and std::forward_list. I intentionally ignore in my performance test a std::array because of its limited size.

This was the theory of cache lines. Now I'm curious. Does it make a difference to read and accumulate all elements from std::vector, a std::deque, std::list, and std::forward_list. The small program should give an answer.

// memoryAcess.cpp

#include <forward_list>
#include <chrono>
#include <deque>
#include <iomanip>
#include <iostream>
#include <list>
#include <string>
#include <vector>
#include <numeric>
#include <random>

const int SIZE = 100'000'000; 

template <typename T>
void sumUp(T& t, const std::string& cont){            // (6)
  
  std::cout << std::fixed << std::setprecision(10);

  auto begin= std::chrono::steady_clock::now();
  std::size_t res = std::accumulate(t.begin(), t.end(), 0LL);
  std::chrono::duration<double> last=  std::chrono::steady_clock::now() - begin;
  std::cout << cont <<  std::endl;
  std::cout << "time: " << last.count() << std::endl;
  std::cout << "res: " << res << std::endl;
  std::cout << std::endl;
 
  std::cout << std::endl;
     
}

int main(){
    
    std::cout << std::endl;
    
    std::random_device seed;                            // (1)
    std::mt19937 engine(seed());
    std::uniform_int_distribution<int> dist(0, 100);
    std::vector<int> randNumbers;
    randNumbers.reserve(SIZE);
    for (int i=0; i < SIZE; ++i){
        randNumbers.push_back(dist(engine));
    }
    
    {
      std::vector<int> myVec(randNumbers.begin(), randNumbers.end());
      sumUp(myVec,"std::vector<int>");                 // (2)
    }

    
    {
      std::deque<int>myDec(randNumbers.begin(), randNumbers.end());
      sumUp(myDec,"std::deque<int>");                 // (3)
    }
    
    {
      std::list<int>myList(randNumbers.begin(), randNumbers.end());
      sumUp(myList,"std::list<int>");                 // (4)
    }
    
    {
      std::forward_list<int>myForwardList(randNumbers.begin(), randNumbers.end());
      sumUp(myForwardList,"std::forward_list<int>");  // (5)
    } 
    
}

 

The program memoryAccess.cpp creates first 100 Million random numbers between 0 and 100 (1). Then it accumulate the elements using a std::vector (2), a std::deque (3), a std::list (4), and a std::forward_list (5). The actual work is done in the function sumUp (6). 

I compiled the program with maximum optimisation and executed it on Linux and Windows. I'm not interested in the comparison between Linux and Windows because that would be a comparison between a desktop PC and a Laptop. I'm interested in the read performance of the four containers. Here it is:memoryAccessWin

memoryAccessLinux

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

 

To make my performance comparison easy to digest, here is a graphic.

performaneSequential

I don't want to overestimate these performance numbers, but one key observation is obvious. The more cache line aware the container is, the faster is the access time of the elements: std::vector > std::deque > (std::list, std::forward_list).

What's next?

I think I should write a similar post to the associative containers in the standard template library. From my perspective, they are underrepresented in the C++ core guidelines.  My next post is about associative containers such as std::map and std::unordered_map.

 

Thanks a lot to my Patreon Supporters: Paul Baxter,  Meeting C++, Matt Braun, Avi Lachmish, Roman Postanciuc, Venkata Ramesh Gudpati, Tobias Zindl, Marko, Ramesh Jangama, G Prvulovic, Reiner Eiteljörge, Benjamin Huth, Reinhold Dröge, Timo, Abernitzke, Richard Ohnemus , Frank Grimm, and Sakib.

Thanks in particular to:  TakeUpCode 450 60     crp4

 

Get your e-book at Leanpub:

The C++ Standard Library

 

Concurrency With Modern C++

 

Get Both as one Bundle

cover   ConcurrencyCoverFrame   bundle
With C++11, C++14, and C++17 we got a lot of new C++ libraries. In addition, the existing ones are greatly improved. The key idea of my book is to give you the necessary information to the current C++ libraries in about 200 pages.  

C++11 is the first C++ standard that deals with concurrency. The story goes on with C++17 and will continue with C++20.

I'll give you a detailed insight in the current and the upcoming concurrency in C++. This insight includes the theory and a lot of practice with more the 100 source files.

 

Get my books "The C++ Standard Library" (including C++17) and "Concurrency with Modern C++" in a bundle.

In sum, you get more than 600 pages full of modern C++ and more than 100 source files presenting concurrency in practice.

 

 

Get your interactive course

 

Modern C++ Concurrency in Practice

C++ Standard Library including C++14 & C++17

educative CLibrary

Based on my book "Concurrency with Modern C++" educative.io created an interactive course.

What's Inside?

  • 140 lessons
  • 110 code playgrounds => Runs in the browser
  • 78 code snippets
  • 55 illustrations

Based on my book "The C++ Standard Library" educative.io created an interactive course.

What's Inside?

  • 149 lessons
  • 111 code playgrounds => Runs in the browser
  • 164 code snippets
  • 25 illustrations
 

 

 

 

 

 

Add comment


My Newest E-Books

Course: Modern C++ Concurrency in Practice

Course: C++ Standard Library including C++14 & C++17

Course: Embedded Programming with Modern C++

Course: Generic Programming (Templates)

Subscribe to the newsletter (+ pdf bundle)

Blog archive

Source Code

Visitors

Today 5795

All 2701812

Currently are 176 guests and no members online

Kubik-Rubik Joomla! Extensions

Latest comments