Hash Functions

Contents[Show]

The hash function is responsible for the unordered associative containers' constant access time (best cast). As ever, C++ offers many ways to adjust the behavior of the hash functions. On the one hand, C++ has a lot of different hash functions; on the other hand, you can define your hash function. You can even adjust the number of buckets.

 

Before I write about the hash functions, I want to look closer at the declaration of the unordered associative containers. So we will not lose the big picture. I take std::unordered_map as the most prominent unordered associative container.

std::unordered_map

The declaration of the hash table std::unordered_map reveals a lot of exciting details.

template<
    class Key,
    class Val,
    class Hash = std::hash<Key>,
    class KeyEqual = std::equal_to<Key>,
    class Allocator = std::allocator< std::pair<const Key, Val> >
> class unordered_map;

 

Let's have a closer look at the template parameters. The std::unordered_map associates the key (Key) with its value (Val). The remaining three template parameters are derived from the key and the value type. This statement holds for the hash function (Hash), the equality function (KeyEqual), and the allocator (Allocator). Therefore, it's quite easy to instantiate a std::unordered_map char2int.

std::unordered_map<char,int> char2int;

Now it gets more interesting. By default, the hash function std::hash<Key> and the equality function std::equal_to<Key> are used. Therefore, you can instantiate a std::unordered_map with a unique hash or equality function. But stop. Why do we need an equality function? The hash function maps the key to the hash value. The hash value determines the bucket. This mapping can cause a collision. That means different keys go into the same bucket. The std::unorderd_map has to deal with these collisions. To do that, it uses the equality function only for completeness reasons. You can adjust the allocation strategy of the container with the Allocator.

Which requirements have the key and the value of an unordered associative container to fulfill?

The key has to be

  • comparable with the equality function,
  • available as a hash value,
  • copyable and moveable.

The value has to be

  • default constructible,
  • copyable and moveable.

 

Rainer D 6 P2 540x540Modernes C++ Mentoring

Be part of my mentoring programs:

 

 

 

 

Do you want to stay informed about my mentoring programs: Subscribe via E-Mail.

The hash function

A hash function is good if their mapping from the keys to the values produces few collisions and the hash values are uniformly distributed among the buckets. Because the execution time of the hash function is constant, the access time of the elements can also be constant. Instead of that, the access time in the bucket is linear. Therefore, the overall access time of a value depends on the number of collisions in the bucket, respectively.

The hash function

  • is available for fundamental data types like booleans, integers, and floating points.
  • is available for the data types std::string and std::wstring,
  • creates for C string const char* a hash value of the pointer address,
  • can be defined for user-defined data types.

By applying the theory to my own data types, which I want to use as the key of an unordered associative container, my data type must fulfill two requirements: a hash function and an equality function.

 
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
// hashFunction.cpp

#include <iostream>
#include <ostream>
#include <unordered_map>

struct MyInt{
  MyInt(int v):val(v){}
  bool operator== (const MyInt& other) const {
    return val == other.val;
  }
  int val;
};

struct MyHash{
  std::size_t operator()(MyInt m) const {
    std::hash<int> hashVal;
    return hashVal(m.val);
  }
};

struct MyAbsHash{
  std::size_t operator()(MyInt m) const {
    std::hash<int> hashVal;
    return hashVal(abs(m.val));
  }
};

struct MyEq{
  bool operator() (const MyInt& l, const MyInt& r) const {
    return abs(l.val) ==  abs(r.val);
  }
};

std::ostream& operator << (std::ostream& strm, const MyInt& myIn){
  strm << "MyInt(" << myIn.val << ")";
  return strm;
}

int main(){

  std::cout << std::endl;

  std::hash<int> hashVal;

  // a few hash values
  for ( int i= -2; i <= 1 ; ++i){
    std::cout << "hashVal(" << i << "): " << hashVal(i) << std::endl;
  }

  std::cout << std::endl;

  typedef std::unordered_map<MyInt,int,MyHash> MyIntMap;

  std::cout << "MyIntMap: ";
  MyIntMap myMap{{MyInt(-2),-2},{MyInt(-1),-1},{MyInt(0),0},{MyInt(1),1}};

  for(auto m : myMap) std::cout << '{' << m.first << ',' << m.second << '}';

  std::cout << std::endl << std::endl;

  typedef std::unordered_map<MyInt,int,MyAbsHash,MyEq> MyAbsMap;
  std::cout << "MyAbsMap: ";
  MyAbsMap myAbsMap{{MyInt(-2),-2},{MyInt(-1),-1},{MyInt(0),0},{MyInt(1),1}};

  for(auto m : myAbsMap) std::cout << '{' << m.first << ',' << m.second << '}';

  std::cout << std::endl << std::endl;

  std::cout << "myAbsMap[MyInt(-2)]: " << myAbsMap[MyInt(-2)] << std::endl;
  std::cout << "myAbsMap[MyInt(2)]: " << myAbsMap[MyInt(2)] << std::endl;
  std::cout << "myAbsMap[MyInt(3)]: " << myAbsMap[MyInt(3)] << std::endl;

  std::cout << std::endl;

}
 

 

My analysis of the program starts with the main function. The easiest way to get the program is to examine the output closely. I create in line 44 the hash function hasVal and use them to calculate the hash values in line 48. hasVal returns a hash value of type std::size_t. std::size_t stands for a sufficiently big enough unsigned integer.  MyIntMap in line 53 defines a new name for a type. This type uses MyInt (lines 7 - 13) as a key. Now, MyIntMap needs a hash function and an equality function. It uses MyHash (lines 15 -20) as a hash function. The hash function uses the hash function of the data type int internally. I already overload the equality function for MyInt.

MyAbsMap follows a different strategy. According to its name, MyAbsMap creates its hash value based on the absolute value of the integer (line 25). I use the class MyEq with the overloaded call operator as an equality function. MyEq is only interested in the absolute value of the integer. The output shows that the hash function of MyAbsMap returns the same hash value for different keys. The result is that the hash value for MyInt(-2) (line 70) is identical to the hash value of MyInt(2). This holds although I didn't initialize MyAbsMap with the pair (MyInt(2),2).

 

 hashfunction

 

What's next?

One piece of the puzzle is still missing to understand hash tables better. The hash function maps the key onto the value. Therefore, the hash functions map the key of type int or std::string to its bucket. How is that possible? On the one hand, we have an almost infinite number of keys but only a finite number of buckets. But that is not the only question I have. How many elements go into one bucket? Or to say it differently. How often does a collision occur? Question to which the next post will answer.

 

 

 

 

 

 

Thanks a lot to my Patreon Supporters: Matt Braun, Roman Postanciuc, Tobias Zindl, G Prvulovic, Reinhold Dröge, Abernitzke, Frank Grimm, Sakib, Broeserl, António Pina, Sergey Agafyin, Андрей Бурмистров, Jake, GS, Lawton Shoemake, Animus24, Jozo Leko, John Breland, Venkat Nandam, Jose Francisco, Douglas Tinkham, Kuchlong Kuchlong, Robert Blanch, Truels Wissneth, Kris Kafka, Mario Luoni, Friedrich Huber, lennonli, Pramod Tikare Muralidhara, Peter Ware, Daniel Hufschläger, Alessandro Pezzato, Bob Perry, Satish Vangipuram, Andi Ireland, Richard Ohnemus, Michael Dunsky, Leo Goodstadt, John Wiederhirn, Yacob Cohen-Arazi, Florian Tischler, Robin Furness, Michael Young, Holger Detering, Bernd Mühlhaus, Matthieu Bolt, Stephen Kelley, Kyle Dean, Tusar Palauri, Dmitry Farberov, Juan Dent, George Liao, Daniel Ceperley, Jon T Hess, Stephen Totten, Wolfgang Fütterer, Matthias Grün, Phillip Diekmann, Ben Atakora, Ann Shatoff, and Rob North.

 

Thanks, in particular, to Jon Hess, Lakshman, Christian Wittenhorst, Sherhy Pyton, Dendi Suhubdy, Sudhakar Belagurusamy, Richard Sargeant, Rusty Fleming, John Nebel, Mipko, Alicja Kaminska, and Slavko Radman.

 

 

My special thanks to Embarcadero CBUIDER STUDIO FINAL ICONS 1024 Small

 

My special thanks to PVS-Studio PVC Logo

 

My special thanks to Tipi.build tipi.build logo

 

My special thanks to Take Up Code TakeUpCode 450 60

 

Seminars

I'm happy to give online seminars or face-to-face seminars worldwide. Please call me if you have any questions.

Bookable (Online)

German

Standard Seminars (English/German)

Here is a compilation of my standard seminars. These seminars are only meant to give you a first orientation.

  • C++ - The Core Language
  • C++ - The Standard Library
  • C++ - Compact
  • C++11 and C++14
  • Concurrency with Modern C++
  • Design Pattern and Architectural Pattern with C++
  • Embedded Programming with Modern C++
  • Generic Programming (Templates) with C++

New

  • Clean Code with Modern C++
  • C++20

Contact Me

Modernes C++,

RainerGrimmDunkelBlauSmall

Comments   

+2 #1 internet businesses 2016-12-12 21:06
A person necessarily assist to make critically posts I'd state.
That is the very first time I frequented your website page and so far?

I amazed with the research you made to make this actual publish extraordinary.
Fantastic process!
Quote
+1 #2 Thanks 2017-11-26 17:08
Thanks for the great info.
Quote
+1 #3 Edwardo 2017-12-27 13:32
Hello there I am so grateful I found your blog, I really found you by
error, while I was looking on Aol for something else, Nonetheless I am here now
and would just like to say many thanks for a marvelous post
and a all round entertaining blog (I also love the theme/design), I don’t have time to
look over it all at the moment but I have saved it and also added in your
RSS feeds, so when I have time I will be back to read much
more, Please do keep up the excellent b.
Quote

Stay Informed about my Mentoring

 

Mentoring

English Books

Course: Modern C++ Concurrency in Practice

Course: C++ Standard Library including C++14 & C++17

Course: Embedded Programming with Modern C++

Course: Generic Programming (Templates)

Course: C++ Fundamentals for Professionals

Course: The All-in-One Guide to C++20

Course: Master Software Design Patterns and Architecture in C++

Subscribe to the newsletter (+ pdf bundle)

All tags

Blog archive

Source Code

Visitors

Today 3459

Yesterday 5555

Week 33667

Month 55341

All 12133550

Currently are 185 guests and no members online

Kubik-Rubik Joomla! Extensions

Latest comments