book 159880 1280

More special Friends with std::map and std::unordered_map

Modern C++ has eight associative containers, but your special friends should be std::map and std::unordered_map. Why? Let me explain it in this post.

 

book 159880 1280

In my last post, C++ Core Guidelines:  std::array and std::vector are your friends; I stated: In 99 % of your use-cases, you are lovely with a std::array or a std::vector. A similar statement exists for associative containers: In 95 % of your use cases, you are lovely with a std::map or std::unordered_map.  In seldom cases, you don’t need the value associated with the key. These are the missing 5 %. Before I begin this post and give an overview and numbers to both associative containers, here is my rule of thumb for today: If you want a container with a key/value association and the keys should be ordered, use std::map; if not, use a std::unordered_map.

Here is the first overview. For more details, please read my previous posts about associative containers.

The eight Variations

OrderedVersusUnorderedContainers

To get an ordering into the eight variations of associative containers, you have to answer three questions. Each question can be answered by yes or no. 2 ^ 3 == 8. Here are the three questions:

 

Rainer D 6 P2 500x500Modernes C++ Mentoring

Be part of my mentoring programs:

  • "Fundamentals for C++ Professionals" (open)
  • "Design Patterns and Architectural Patterns with C++" (open)
  • "C++20: Get the Details" (open)
  • "Concurrency with Modern C++" (starts March 2024)
  • Do you want to stay informed: Subscribe.

     

    1. Is the container ordered?
    2. Does the key have an associated value?
    3. Are several identical keys possible?

    And here are the answers.

    1. When the container is not ordered, it’s called unordered.
    2. When the key does have a value associated, it’s called map; if not set.
    3. When the container can have more than one identical key, it’s called multi.

    When I speak of the ordered container, I mean the ordering of the keys.

    Maybe this taxonomy was too complicated. Let me give you a more straightforward picture.

    A Phone Book

    The eight variations are just different versions of a phone book. What is a phone book? A phone book is a sequence of key/value pairs. You use the keys (family names) to get the values (phone numbers).

    The family names of a phone book can be ordered or unordered; the phone book can have a phone number associated with the family name or not and can have only one or more identical family names. If you want to store your mobile and landline numbers in a phone book, you are pretty happy that you can use two identical keys.

    The reason for this post is not to explain the associative containers: The reason is a different one. The access time to an ordered associative container is logarithmic, but the access time to an unordered associative container is amortized constant. 

    Performance of a std::map and a std::unordered::map

    What does amortized constant access time mean for an unordered associative container such as std::unordered_map? It means that your query for a phone number is independent of the size of the phone book. Don’t you believe me? Let me show you a performance test.

    I have a phone book with roughly 89,000 entries. I will increase its size successively by ten until it has almost 89,000,000 entries. After each step, I will ask for all its phone numbers. This means I randomly use all family names. 

    The following image shows you a part of the initial phone book. You can see the name/number pairs separated by a colon and the name separated from the number by a comma.

    telebook

     The program should be pretty easy to read.

     

    // telephoneBook.cpp
    
    #include <chrono>
    #include <fstream>
    #include <iostream>
    #include <map>
    #include <random>
    #include <regex>
    #include <sstream>
    #include <string>
    #include <unordered_map>
    #include <vector>
    
    using map = std::unordered_map<std::string, int>;                   // (1)
    
    std::ifstream openFile(const std::string& myFile){                  
    
      std::ifstream file(myFile, std::ios::in);
      if ( !file ){
        std::cerr << "Can't open file "+ myFile + "!" << std::endl;
        exit(EXIT_FAILURE);
      }
      return file;
      
    }
    
    std::string readFile(std::ifstream file){                        
        
        std::stringstream buffer;
        buffer << file.rdbuf();
        
        return buffer.str();
        
    }
    
    map createTeleBook(const std::string& fileCont){                 
        
        map teleBook; 
        
        std::regex regColon(":");
        
        std::sregex_token_iterator fileContIt(fileCont.begin(), fileCont.end(), regColon, -1);
        const std::sregex_token_iterator fileContEndIt;
        
        std::string entry;
        std::string key;
        int value;
        while (fileContIt != fileContEndIt){                               // (2)
            entry = *fileContIt++;
            auto comma = entry.find(",");                                  // (3)
            key = entry.substr(0, comma);
            value = std::stoi(entry.substr(comma + 1, entry.length() -1));
            teleBook[key] = value;                                         // (4)
        }
        return teleBook;
        
    }
    
    std::vector<std::string> getRandomNames(const map& teleBook){    
        
        std::vector<std::string> allNames;
        for (const auto& pair: teleBook) allNames.push_back(pair.first);   // (5)
        
        std::random_device randDev;
        std::mt19937 generator(randDev());
     
        std::shuffle(allNames.begin(), allNames.end(), generator);         // (6) 
        
        return allNames;
    }
        
    void measurePerformance(const std::vector<std::string>& names, map& m){   
            
        auto start = std::chrono::steady_clock::now();
        for (const auto& name: names) m[name];                              // (7)
        std::chrono::duration<double> dur= std::chrono::steady_clock::now() - start;
        std::cout << "Access time: " << dur.count() << " seconds" << std::endl;
        
    }
        
    int main(int argc, char* argv[]){
    
        std::cout << std::endl;
      
        // get the filename
        std::string myFile;
        if ( argc == 2 ){
            myFile= {argv[1]};
        }
        else{
            std::cerr << "Filename missing !" << std::endl;
            exit(EXIT_FAILURE);
        } 
      
        std::ifstream file = openFile(myFile);
      
        std::string fileContent = readFile(std::move(file));
      
        map teleBook = createTeleBook(fileContent);
      
        std::cout << "teleBook.size(): " <<  teleBook.size() << std::endl;
      
        std::vector<std::string> randomNames = getRandomNames(teleBook);
      
        measurePerformance(randomNames, teleBook); 
        
        std::cout << std::endl;
        
    }
    

     

    Let me start with the main program. I open the file, read the content, create a phone book (std::map or std::unordered_map), get an arbitrary permutation of the family names, and make the performance test finally. Okay, this was too concise.

    Line 1 is the most interesting one. A std::unordered_map supports a superset of the interface of a std::map. This makes it quite convenient for me to make my performance test. I first did it by using map = std::map<std::string, int>; and then changed the line to using map = std::unordered_map<std::string, int>;.  The according to relations holds for the pairs (std::set/std::unordered_set),(std::mulitset, std::unordered_multiset), and (std::multimap, std::unordered_multimap). I assume the following functions are also quite interesting for you:

    • createTeleBook
      • the while loop iterates over all name/number tokens created by the regular expression regColon (line 2)
      • each token is separated by a comma (line 3)
      • in the end, the name/number pair is added to the phone book (line 4)
    • getRandomNames
      • puts all names onto a vector (line 5)
      • shuffles the names (line 6)
    • measurePerformance
      • asks for each name in the phone book (line 7)
       

    And now, finally, the performance numbers for a std::map and a std::unordered_map.

    std::map

    telebookMap

    std::unordered_map

     telebookUnordered

    The screenshots show precisely how big the phone books are. The numbers confirm the access time I showed in the first table: The access time of a std::map depends logarithmic on its size, and the access time of a std::unordered_map is amortised constant. The following plot shows the performance relation between a std::map and a std::unordered_map.

    comparison

    For 100,000 entries, the std::map is three times slower than the std::unordered_map, and for 100,000,000 entries 7 1/2 times slower.

    What’s next?

    After this slight detour from the C++ core guidelines, I will write in my next post about bounds errors and how to avoid them.

     

     

    Thanks a lot to my Patreon Supporters: Matt Braun, Roman Postanciuc, Tobias Zindl, G Prvulovic, Reinhold Dröge, Abernitzke, Frank Grimm, Sakib, Broeserl, António Pina, Sergey Agafyin, Андрей Бурмистров, Jake, GS, Lawton Shoemake, Jozo Leko, John Breland, Venkat Nandam, Jose Francisco, Douglas Tinkham, Kuchlong Kuchlong, Robert Blanch, Truels Wissneth, Kris Kafka, Mario Luoni, Friedrich Huber, lennonli, Pramod Tikare Muralidhara, Peter Ware, Daniel Hufschläger, Alessandro Pezzato, Bob Perry, Satish Vangipuram, Andi Ireland, Richard Ohnemus, Michael Dunsky, Leo Goodstadt, John Wiederhirn, Yacob Cohen-Arazi, Florian Tischler, Robin Furness, Michael Young, Holger Detering, Bernd Mühlhaus, Stephen Kelley, Kyle Dean, Tusar Palauri, Dmitry Farberov, Juan Dent, George Liao, Daniel Ceperley, Jon T Hess, Stephen Totten, Wolfgang Fütterer, Matthias Grün, Phillip Diekmann, Ben Atakora, Ann Shatoff, Rob North, Bhavith C Achar, Marco Parri Empoli, moon, Philipp Lenk, Hobsbawm, and Charles-Jianye Chen.

    Thanks, in particular, to Jon Hess, Lakshman, Christian Wittenhorst, Sherhy Pyton, Dendi Suhubdy, Sudhakar Belagurusamy, Richard Sargeant, Rusty Fleming, John Nebel, Mipko, Alicja Kaminska, Slavko Radman, and David Poole.

    My special thanks to Embarcadero
    My special thanks to PVS-Studio
    My special thanks to Tipi.build 
    My special thanks to Take Up Code
    My special thanks to SHAVEDYAKS

    Seminars

    I’m happy to give online seminars or face-to-face seminars worldwide. Please call me if you have any questions.

    Standard Seminars (English/German)

    Here is a compilation of my standard seminars. These seminars are only meant to give you a first orientation.

    • C++ – The Core Language
    • C++ – The Standard Library
    • C++ – Compact
    • C++11 and C++14
    • Concurrency with Modern C++
    • Design Pattern and Architectural Pattern with C++
    • Embedded Programming with Modern C++
    • Generic Programming (Templates) with C++
    • Clean Code with Modern C++
    • C++20

    Online Seminars (German)

    Contact Me

    Modernes C++ Mentoring,

     

     

    0 replies

    Leave a Reply

    Want to join the discussion?
    Feel free to contribute!

    Leave a Reply

    Your email address will not be published. Required fields are marked *