The hash function is responsible for the constant access time (best case) of the unordered associative containers. As ever, C++ offers many ways to adjust the behavior of the hash functions. On the one hand, C++ has a lot of different hash functions; on the other hand, you can define your own hash function. You can even adjust the number of buckets.
Before I write about the hash functions, I want to look closer at the declaration of the unordered associative containers, so that we do not lose the big picture. I take std::unordered_map as the most prominent unordered associative container.
The declaration of the hash table std::unordered_map reveals a lot of exciting details.
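As a reminder, here is a sketch of the shape of that declaration, written as a comment and followed by compile-time checks of the defaulted parameters. The parameter name Val follows this post (the C++ standard calls it T), the alias DefaultMap is only an illustrative name, and the checks assume C++17.

```cpp
#include <functional>
#include <memory>
#include <type_traits>
#include <unordered_map>
#include <utility>

// Shape of the declaration in <unordered_map> (parameter names are illustrative):
//
// template<
//     class Key,
//     class Val,
//     class Hash = std::hash<Key>,
//     class KeyEqual = std::equal_to<Key>,
//     class Allocator = std::allocator<std::pair<const Key, Val>>
// > class unordered_map;

// Hypothetical alias, used only to inspect the defaulted parameters.
using DefaultMap = std::unordered_map<char, int>;

static_assert(std::is_same_v<DefaultMap::key_type, char>);
static_assert(std::is_same_v<DefaultMap::mapped_type, int>);
static_assert(std::is_same_v<DefaultMap::hasher, std::hash<char>>);
static_assert(std::is_same_v<DefaultMap::key_equal, std::equal_to<char>>);
static_assert(std::is_same_v<DefaultMap::allocator_type,
                             std::allocator<std::pair<const char, int>>>);
```

If one of the defaults were different, the translation unit would not compile.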
Let’s have a closer look at the template parameters. The std::unordered_map associates the key (Key) with its value (Val). The remaining three template parameters have defaults that are derived from the key and the value type. This statement holds for the hash function (Hash), the equality function (KeyEqual), and the allocator (Allocator). Therefore, it’s quite easy to instantiate a std::unordered_map char2int: only the key and the value type have to be specified.
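A minimal sketch of such an instantiation could look like this. The helper makeChar2Int and the stored values are made up for illustration.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical helper that builds and returns a small char -> int table.
std::unordered_map<char, int> makeChar2Int() {
    // Only Key and Val are spelled out; Hash, KeyEqual, and Allocator use their defaults.
    std::unordered_map<char, int> char2int{{'a', 1}, {'b', 2}};
    char2int['c'] = 3;  // the index operator inserts the key if it is missing
    assert(char2int.size() == 3);
    return char2int;
}
```

A call such as makeChar2Int().at('b') then returns the value stored for the key 'b'.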
Now it gets more interesting. By default, the hash function std::hash<Key> and the equality function std::equal_to<Key> are used. Instead of these defaults, you can instantiate a std::unordered_map with your own hash or equality function. But stop. Why do we need an equality function? The hash function maps the key to the hash value. The hash value determines the bucket. This mapping can cause a collision. That means different keys go into the same bucket. The std::unordered_map has to deal with these collisions. To do that, it uses the equality function: it compares the searched key with the keys in the bucket. Only for completeness reasons: you can adjust the allocation strategy of the container with the Allocator.
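The role of the equality function becomes visible with a deliberately poor hash function. The following sketch assumes a made-up ConstantHash that gives every key the same hash value, so all keys collide; the names ConstantHash, CollidingMap, and makeCollidingMap are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Deliberately poor hash (illustration only): every key gets the same hash value,
// so all keys end up in one bucket.
struct ConstantHash {
    std::size_t operator()(const std::string&) const { return 42; }
};

// Hypothetical alias; KeyEqual stays at its default std::equal_to<std::string>.
using CollidingMap = std::unordered_map<std::string, int, ConstantHash>;

CollidingMap makeCollidingMap() {
    CollidingMap m{{"one", 1}, {"two", 2}, {"three", 3}};
    // Identical hash values imply an identical bucket.
    assert(m.bucket("one") == m.bucket("two"));
    assert(m.bucket("two") == m.bucket("three"));
    return m;
}
```

Although all three keys share a bucket, makeCollidingMap().at("two") still finds the right value, because the container compares the keys inside the bucket with the equality function. The price is a linear search in that bucket.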
Which requirements do the key and the value of an unordered associative container have to fulfill?
The key has to be
- comparable with the equality function,
- hashable, meaning the hash function can map it to a hash value,
- copyable and moveable.
The value has to be
- default constructible (needed for the index operator),
- copyable and moveable.
The hash function
A hash function is good if its mapping from the keys to the hash values produces few collisions and the hash values are uniformly distributed among the buckets. Because the execution time of the hash function is constant, the access time of the elements can also be constant. In contrast, the access time within a bucket is linear. Therefore, the overall access time of a value depends on the number of collisions in its bucket.
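To get a feeling for the cost of collisions, you can ask the container about its buckets with the member functions bucket_count and bucket_size. The sketch below assumes a made-up ParityHash that produces only two distinct hash values; the helper names longestBucket and worstCaseWithParityHash are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: size of the fullest bucket, i.e. the largest number of
// elements a single lookup may have to compare with the equality function.
template <typename Container>
std::size_t longestBucket(const Container& c) {
    std::size_t longest = 0;
    for (std::size_t b = 0; b < c.bucket_count(); ++b) {
        longest = std::max(longest, c.bucket_size(b));
    }
    return longest;
}

// Deliberately poor hash: only the hash values 0 and 1 exist.
struct ParityHash {
    std::size_t operator()(int v) const { return static_cast<std::size_t>(v & 1); }
};

std::size_t worstCaseWithParityHash() {
    std::unordered_set<int, ParityHash> s;
    for (int i = 0; i < 100; ++i) s.insert(i);
    assert(s.size() == 100);
    return longestBucket(s);
}
```

With only two hash values, the 100 elements can occupy at most two buckets, so the fullest bucket holds at least 50 of them. A lookup in that bucket is a linear search.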
The hash function
- is available for fundamental data types like booleans, integers, and floating-point numbers,
- is available for the data types std::string and std::wstring,
- creates a hash value of the pointer address for a C string const char*, not of the characters it points to,
- can be defined for user-defined data types.
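The difference between hashing a std::string and hashing a const char* can be sketched as follows. The helper names are hypothetical, and the concrete hash values are implementation-defined, so the sketch only compares them.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

// Hashes the characters of the string: equal contents give equal hash values.
std::size_t hashContents(const char* s) {
    return std::hash<std::string>{}(s);
}

// Hashes the pointer itself: the characters it points to are never read.
std::size_t hashAddress(const char* s) {
    return std::hash<const char*>{}(s);
}

bool contentHashesMatch() {
    const char first[] = "hash";
    const char second[] = "hash";  // same characters, different address
    assert(static_cast<const void*>(first) != static_cast<const void*>(second));
    return hashContents(first) == hashContents(second);
}

// std::hash is specialized for the fundamental types; only
// "equal input, equal hash value" is guaranteed.
bool fundamentalHashesAreStable() {
    return std::hash<bool>{}(true) == std::hash<bool>{}(true)
        && std::hash<int>{}(2011) == std::hash<int>{}(2011)
        && std::hash<double>{}(3.14) == std::hash<double>{}(3.14);
}
```

Because hashAddress depends only on the address, two C strings with the same characters will usually get different hash values. A const char* is therefore rarely a good key when you want to look up by content.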
When I apply the theory to my own data types, which I want to use as the key of an unordered associative container, my data type must fulfill two requirements: it needs a hash function and an equality function.
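The two requirements can be sketched for a small key type. The names MyInt, MyHash, and MyIntMap follow the discussion in this post, but the code is a reconstruction under my own assumptions, so its layout and line numbers do not match the listing the text refers to. The helper makeMyIntMap is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_map>

// Key type: wraps an int and brings its own equality operator.
struct MyInt {
    MyInt(int v) : val(v) {}
    bool operator==(const MyInt& other) const { return val == other.val; }
    int val;
};

// Hash function for MyInt: delegates to the standard hash of the wrapped int.
struct MyHash {
    std::size_t operator()(const MyInt& m) const {
        return std::hash<int>{}(m.val);
    }
};

// KeyEqual is left at its default std::equal_to<MyInt>, which calls operator==.
using MyIntMap = std::unordered_map<MyInt, int, MyHash>;

// Hypothetical helper that fills a map with a few keys.
MyIntMap makeMyIntMap() {
    MyIntMap myMap{{MyInt(-2), -2}, {MyInt(-1), -1}, {MyInt(0), 0}, {MyInt(1), 1}};
    assert(myMap.size() == 4);
    return myMap;
}
```

Without MyHash as the third template argument, the instantiation would not compile, because std::hash has no specialization for MyInt.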
My analysis of the program starts with the main function. The easiest way to understand the program is to examine the output closely. I create in line 44 the hash function hasVal and use it to calculate the hash values in line 48. hasVal returns a hash value of type std::size_t. std::size_t stands for a sufficiently large unsigned integer type. MyIntMap in line 53 defines a new name for a type. This type uses MyInt (lines 7 – 13) as a key. Now, MyIntMap needs a hash function and an equality function. It uses MyHash (lines 15 – 20) as a hash function. The hash function uses the hash function of the data type int internally. I have already overloaded the equality operator for MyInt.
MyAbsMap follows a different strategy. According to its name, MyAbsMap creates its hash value based on the absolute value of the integer (line 25). I use the class MyEq with the overloaded call operator as an equality function. MyEq is only interested in the absolute value of the integer. The output shows that the hash function of MyAbsMap returns the same hash value for different keys. The result is that the hash value for MyInt(-2) (line 70) is identical to the hash value of MyInt(2), and because MyEq treats both keys as equal, a lookup with MyInt(2) finds the entry of MyInt(-2). This holds although I didn’t initialize MyAbsMap with the pair (MyInt(2),2).
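The behavior of MyAbsMap can be reproduced with the following sketch. MyInt, MyEq, and MyAbsMap are the names used in the text; MyAbsHash and lookupInAbsMap are hypothetical names, and the initial entries are my own assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <functional>
#include <unordered_map>

struct MyInt {
    MyInt(int v) : val(v) {}
    int val;
};

// Hypothetical name: hashes the absolute value, so MyInt(-2) and MyInt(2)
// get the same hash value on purpose.
struct MyAbsHash {
    std::size_t operator()(const MyInt& m) const {
        return std::hash<int>{}(std::abs(m.val));
    }
};

// Equality function that only looks at the absolute value.
struct MyEq {
    bool operator()(const MyInt& l, const MyInt& r) const {
        return std::abs(l.val) == std::abs(r.val);
    }
};

using MyAbsMap = std::unordered_map<MyInt, int, MyAbsHash, MyEq>;

// Hypothetical helper: looks up a key in a map that was filled without MyInt(2).
int lookupInAbsMap(int key) {
    MyAbsMap myAbsMap{{MyInt(-2), -2}, {MyInt(-1), -1}, {MyInt(0), 0}};
    assert(myAbsMap.size() == 3);
    // The index operator inserts a value-initialized int (0) if no equivalent key exists.
    return myAbsMap[MyInt(key)];
}
```

lookupInAbsMap(2) returns -2, the value stored under MyInt(-2), because both keys have the same hash value and MyEq regards them as equal. lookupInAbsMap(3) returns 0, because no equivalent key exists and the index operator creates a new entry.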
One piece of the puzzle is still missing to understand hash tables better. The hash function maps the key onto the hash value, and the hash value determines the bucket. Therefore, the hash function maps a key of type int or std::string to its bucket. How is that possible? After all, we have an almost infinite number of keys but only a finite number of buckets. But that is not the only question I have. How many elements go into one bucket? Or, to say it differently: how often does a collision occur? The next post will answer these questions.