Copy versus Move Semantic: A few Numbers

Contents[Show]

A lot was written about the advantages of the move semantic to the copy semantic. Instead of an expensive copy operation you can use a cheap move operation. But, what does that mean? I will compare in this post the performance of the copy and move semantic for the containers the Standard Template Library (STL). 

 

Before I show the number, I will provide a little bit of background information.

Copy versus move semantic

The subtle difference is, if you create with copy or move semantic a new object based on an existing one, that the copy semantic will copy the elements of the resource, that the move semantic will move the elements of the resource. Of course, copying is expensive, moving is cheap. But there are additional serious consequences.

  1. With copy semantic it can happen, that a std::bad_alloc will be thrown because your program is out of memory.
  2. The resource of the move operation is afterwards in a "valid but unspecified state".

The second point is very nice to show with std::string.

At first, the classical copy semantic.

Copy semantic

std::string("ABCDEF");
std::string str2;
str2= str1;

copy

Both strings str1 and str2 have after the copy operation the same content "ABCDEF". So, what's the difference to the move semantic.

Move semantic

 

std::string("ABCDEF");
std::string str3;
str2= std::move(str1);

move

The string str1 is in opposite to the copy semantic afterwards empty "". This is not guaranteed but often the case. I explicitly requested the move semantic with the function std::move. The compiler will automatically perform the move semantic if it is sure that the source of the move semantic is not needed any more.

I will explicitly request the move semantic in my program by using std::move.

The performance differences

I will take the naive position in my post and compare, what is the performance difference between the copy and move semantic of the STL containers. My comparison will include the std::string. I will ignore the associative containers, which can have more equal keys. I'm in particular interested in the performance ratio between the copy and move semantic of the containers.

The boundary conditions

The differences were not so dramatically between the program with maximum optimization and without optimization therefore I will for simplicity reasons only provide the results for the executable with maximum optimization. I use an GCC 4.9.2 compiler and the cl.exe compiler, which is part of Microsoft Visual Studio 2015. Both platforms are 64-bit. Therefore, the executables are build for 64-bit. 

The program

We have a lot of containers in the STL. Therefore, the program is a little bit lengthy.

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
// movePerformance.cpp
 
#include <array>
#include <forward_list>
#include <chrono>
#include <deque>
#include <iomanip>
#include <iostream>
#include <list>
#include <map>
#include <numeric>
#include <set>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

const int SIZE = 10000000; 

template <typename T>
void measurePerformance(T& t, const std::string& cont){
  
  std::cout << std::fixed << std::setprecision(10);

  auto begin= std::chrono::system_clock::now();
  T t1(t);
  auto last=  std::chrono::system_clock::now() - begin;
  std::cout << cont << std::endl;
  auto copyTime=  std::chrono::duration<double>(last).count();
  std::cout <<  "    Copy: " << copyTime << " sec" << std::endl;

  begin= std::chrono::system_clock::now();
  T t2(std::move(t));
  last=  std::chrono::system_clock::now() - begin;
  auto moveTime= std::chrono::duration<double>(last).count();
  std::cout <<  "    Move: " << moveTime << " sec" << std::endl;
  
  std::cout << std::setprecision(2);
  std::cout << "    Ratio (copy time/move time): " << (copyTime/moveTime) << std::endl;
  
  std::cout << std::endl;
     
}

int main(){
    
    std::cout << std::endl;
    
    {
      std::array<int,SIZE/1000> myArray;
      measurePerformance(myArray,"std::array<int,SIZE/1000>");   
    }
    
    {
      std::vector<int> myVec(SIZE);
      measurePerformance(myVec,"std::vector<int>(SIZE)");
    }

    {
      std::deque<int>myDec(SIZE);
      measurePerformance(myDec,"std::deque<int>(SIZE)");
    }
    
    {
      std::list<int>myList(SIZE);
      measurePerformance(myList,"std::list<int>(SIZE)");
    }
    
    {
      std::forward_list<int>myForwardList(SIZE);
      measurePerformance(myForwardList,"std::forward_list<int>(SIZE)");
    } 
       
    {
      std::string myString(SIZE,' ');
      measurePerformance(myString,"std::string(SIZE,' ')");
    }
    
    std::vector<int> tmpVec(SIZE);
    std::iota(tmpVec.begin(),tmpVec.end(),0);
    
    {
      std::set<int>mySet(tmpVec.begin(),tmpVec.end());
      measurePerformance(mySet,"std::set<int>");
    }
    
    {
      std::unordered_set<int>myUnorderedSet(tmpVec.begin(),tmpVec.end());
      measurePerformance(myUnorderedSet,"std::unordered_set<int>");
    }
    
    {
      std::map<int,int>myMap;
      for (auto i= 0; i <= SIZE; ++i) myMap[i]= i;
      measurePerformance(myMap,"std::map<int,int>");
    }
    
    {
      std::unordered_map<int,int>myUnorderedMap;
      for (auto i= 0; i <= SIZE; ++i) myUnorderedMap[i]= i;
      measurePerformance(myUnorderedMap,"std::unordered_map<int,int>");
    }  
    
}

 

The idea of the program is it to initialize the containers with 10 million elements. Of course, the initialization will happen with copy and move semantic. The performance measurement takes place in the function template measurePerformane (line 21 - 44). The function takes as argument the container and the name of the container. Thanks to the chrono library I can measure how long the copy initialization (line 27) and the move initialization (line 34) takes. At the end I'm interested in the ratio between the copy and move semantic (line 40).

What's happening in the main function? I create for each container an own scope so that it will be automatically released. Therefore, myArray (line 51) will automatically be released and the end of its scope (line 53). Because the containers are quite big, releasing their memory is a must. I claimed that each container has 10 millions elements. That will not hold for myArray. Because myArray will not be allocated on the heap, I have to dramatically reduce its size. But now to the remaining containers. With std::vector, std::deque, std::list, and std::forward_list there are in line 55 - 73 the remaining sequential containers. In line 75 - 78 std::string follows. The rest are the associative containers. I have to pay attention to one characteristic of the associative container. In order to have unique keys and therefore the size 10 millions, I use the numbers 0 to 9999999 as keys. The function std::iota does the job.

The numbers

movemoveWin

 

The results of std::array are not so meaningful. At one hand, std::array is not so big; at the other hand, the time difference on Windows is not measurable with the clock std::system_clock

What insight can I derive for the numbers?

  • Sequential container: std::vector is as expected the fastest container in case of copying or moving.
  • Sequential versus associative container: Copying of the sequential container on Linux and Windows is faster.
  • Copy versus move semantic: The differences between the copy and move semantic are huge. That holds in particular true for the associative containers. 
  • std::string: The std::string on Linux behaves strange. At one hand, copying is very fast; on the other hand, moving is only 16 times faster than copying. The becomes even more strange if I compile and execute the program without optimization. I get the result on Linux that move semantic is only 1.5 times faster than the copy semantic. But these numbers are in strong contradiction to the numbers on Windows. On Windows, the move semantic is 15000 times faster than the copy semantic. 

The riddle around std::string

The performance difference on Linux and Windows of the copy and move semantic is quickly explained. My GCC implements the std::string according to copy-on-write (cow). This is not conformant to the C++11 standard. But cl.exe implements std::string according to the C++11 standard. If I compile the program with a GCC 6.1 and enable C++11, I will get totally different numbers. GCC's std::string implementation is since 5.1 conformant to the C++11 standard.

Here are the numbers with the online compiler on en.cppreference.com.

 string

Now, there is a big difference between the copy and move semantic. 

What's next?

I hope that was motivation for the move semantic. In the next post I will pick two nice characteristics of the move semantic.

 

 

 

 

 

 

 

 

 

 

title page smalltitle page small Go to Leanpub/cpplibrary "What every professional C++ programmer should know about the C++ standard library".   Get your e-book. Support my blog.

 

Tags: memory

Comments   

0 #1 Maran 2016-12-22 08:03
Thanks for the illustrative post.
When I add const to the type of t as below,
template
void measurePerformance(const T& t, const std::string& cont)
The ratio of copy vs move is almost one for all type of containers.

But when I remove const as in your original code, I get similar results as yours. I could not reason about this. Am I missing something obvious?
Quote
+1 #2 Rainer 2016-12-23 06:26
Quoting Maran:
Thanks for the illustrative post.
When I add const to the type of t as below,
template
void measurePerformance(const T& t, const std::string& cont)
The ratio of copy vs move is almost one for all type of containers.

But when I remove const as in your original code, I get similar results as yours. I could not reason about this. Am I missing something obvious?

Quoting Maran:
Thanks for the illustrative post.
When I add const to the type of t as below,
template
void measurePerformance(const T& t, const std::string& cont)
The ratio of copy vs move is almost one for all type of containers.

But when I remove const as in your original code, I get similar results as yours. I could not reason about this. Am I missing something obvious?

If T is const, copy semantic kicks in because you can not move. Therefore you always get 1. Copy semantic is the fallback for move semantic.
Quote

Add comment


My Newest E-Book

Latest comments

Subscribe to the newsletter (+ pdf bundle)

Blog archive

Source Code

Visitors

Today 728

All 332852

Currently are 215 guests and no members online