{"id":5049,"date":"2016-11-27T20:26:06","date_gmt":"2016-11-27T20:26:06","guid":{"rendered":"https:\/\/www.modernescpp.com\/index.php\/hash-functions\/"},"modified":"2023-06-26T12:34:54","modified_gmt":"2023-06-26T12:34:54","slug":"hash-functions","status":"publish","type":"post","link":"https:\/\/www.modernescpp.com\/index.php\/hash-functions\/","title":{"rendered":"Hash Functions"},"content":{"rendered":"<p>The hash function is responsible for the unordered associative containers&#8217; constant access time (best cast). As ever, C++ offers many ways to adjust the behavior of the hash functions. On the one hand, C++ has a lot of different hash functions; on the other hand, you can define your hash function. You can even adjust the number of buckets.<\/p>\n<p><!--more--><\/p>\n<p>&nbsp;<\/p>\n<p>Before I write about the hash functions, I want to look closer at the declaration of the unordered associative containers. So we will not lose the big picture. I take <span style=\"font-family: courier new,courier;\">std::unordered_map<\/span> as the most prominent unordered associative container.<\/p>\n<h2>std::unordered_map<\/h2>\n<p>The declaration of the hash table <span style=\"font-family: courier new,courier;\">std::unordered_map<\/span> reveals a lot of exciting details.<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #0000ff;\">template<\/span>&lt;\r\n    <span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">Key<\/span>,\r\n    <span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">Val<\/span>,\r\n    <span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">Hash<\/span> = std::hash&lt;Key&gt;,\r\n    <span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">KeyEqual<\/span> = std::equal_to&lt;Key&gt;,\r\n    <span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">Allocator<\/span> = std::allocator&lt; std::pair&lt;<span style=\"color: #0000ff;\">const<\/span> Key, Val&gt; &gt;\r\n&gt; <span style=\"color: #0000ff;\">class<\/span> <span style=\"color: #2b91af;\">unordered_map<\/span>;<\/pre>\n<\/div>\n<p>&nbsp;<\/p>\n<p>Let&#8217;s have a closer look at the template parameters. The<span style=\"font-family: courier new,courier;\"> std::unordered_map<\/span> associates the key (<span style=\"font-family: courier new,courier;\">Key<\/span>) with its value (<span style=\"font-family: courier new,courier;\">Val<\/span>). The remaining three template parameters are derived from the key and the value type. This statement holds for the hash function (<span style=\"font-family: courier new,courier;\">Hash<\/span>), the equality function (<span style=\"font-family: courier new,courier;\">KeyEqual<\/span>), and the allocator (<span style=\"font-family: courier new,courier;\">Allocator<\/span>). Therefore, it&#8217;s quite easy to instantiate a <span style=\"font-family: courier new,courier;\">std::unordered_map char2int.<\/span><\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<pre style=\"margin: 0; line-height: 125%;\">std::unordered_map&lt;<span style=\"color: #2b91af;\">char<\/span>,<span style=\"color: #2b91af;\">int<\/span>&gt; char2int;<\/pre>\n<\/div>\n<p>Now it gets more interesting. By default, the hash function <span style=\"font-family: courier new,courier;\">std::hash&lt;Key&gt;<\/span> and the equality function <span style=\"font-family: courier new,courier;\">std::equal_to&lt;Key&gt;<\/span> are used. Therefore, you can instantiate a&nbsp;<span style=\"font-family: courier new,courier;\">std::unordered_map<\/span> with a unique hash or equality function. But stop. Why do we need an equality function? The hash function maps the key to the hash value. The hash value determines the bucket. This mapping can cause a collision. That means different keys go into the same bucket. The<span style=\"font-family: courier new,courier;\"> std::unorderd_map<\/span> has to deal with these collisions. To do that, it uses the equality function only for completeness reasons. You can adjust the allocation strategy of the container with the <span style=\"font-family: courier new,courier;\">Allocator.<\/span><\/p>\n<p>Which requirements have the key and the value of an unordered associative container to fulfill?<\/p>\n<p>The key has to be<\/p>\n<ul>\n<li>comparable with the equality function,<\/li>\n<li>available as a hash value,<\/li>\n<li>copyable and moveable.<\/li>\n<\/ul>\n<p>The value has to be<\/p>\n<ul>\n<li>default constructible,<\/li>\n<li>copyable and moveable.<\/li>\n<\/ul>\n<p>{module title=&#8221;Mentoring&#8221;}<\/p>\n<h2>The hash function<\/h2>\n<p>A hash function is good if their mapping from the keys to the values produces few collisions and the hash values are uniformly distributed among the buckets. Because the execution time of the hash function is constant, the access time of the elements can also be constant. Instead of that, the access time in the bucket is linear. Therefore, the overall access time of a value depends&nbsp;on the number of collisions in the bucket, respectively.<\/p>\n<p>The hash function<\/p>\n<ul>\n<li>is available for fundamental data types like booleans, integers, and floating points.<\/li>\n<li>is available for the data types <span style=\"font-family: courier new,courier;\">std::string<\/span> and <span style=\"font-family: courier new,courier;\">std::wstring,<\/span><\/li>\n<li>creates for C string <span style=\"font-family: courier new,courier;\">const char*<\/span> a hash value of the pointer address,<\/li>\n<li>can be defined for user-defined data types.<\/li>\n<\/ul>\n<p>By applying the theory to my own data types, which I want to use as the key of an unordered associative container, my data type must fulfill two requirements: a hash function and an equality function.<\/p>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div>&nbsp;<\/div>\n<p><!-- HTML generated using hilite.me --><\/p>\n<div style=\"background: #ffffff; overflow: auto; width: auto; gray;border-width: .1em .1em .1em .8em;\">\n<table>\n<tbody>\n<tr>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"> 1\r\n 2\r\n 3\r\n 4\r\n 5\r\n 6\r\n 7\r\n 8\r\n 9\r\n10\r\n11\r\n12\r\n13\r\n14\r\n15\r\n16\r\n17\r\n18\r\n19\r\n20\r\n21\r\n22\r\n23\r\n24\r\n25\r\n26\r\n27\r\n28\r\n29\r\n30\r\n31\r\n32\r\n33\r\n34\r\n35\r\n36\r\n37\r\n38\r\n39\r\n40\r\n41\r\n42\r\n43\r\n44\r\n45\r\n46\r\n47\r\n48\r\n49\r\n50\r\n51\r\n52\r\n53\r\n54\r\n55\r\n56\r\n57\r\n58\r\n59\r\n60\r\n61\r\n62\r\n63\r\n64\r\n65\r\n66\r\n67\r\n68\r\n69\r\n70\r\n71\r\n72\r\n73\r\n74\r\n75\r\n76<\/pre>\n<\/td>\n<td>\n<pre style=\"margin: 0; line-height: 125%;\"><span style=\"color: #008000;\">\/\/ hashFunction.cpp<\/span>\r\n\r\n<span style=\"color: #0000ff;\">#include &lt;iostream&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;ostream&gt;<\/span>\r\n<span style=\"color: #0000ff;\">#include &lt;unordered_map&gt;<\/span>\r\n\r\n<span style=\"color: #0000ff;\">struct<\/span> MyInt{\r\n  MyInt(<span style=\"color: #2b91af;\">int<\/span> v):val(v){}\r\n  <span style=\"color: #2b91af;\">bool<\/span> <span style=\"color: #0000ff;\">operator<\/span>== (<span style=\"color: #0000ff;\">const<\/span> MyInt&amp; other) <span style=\"color: #0000ff;\">const<\/span> {\r\n    <span style=\"color: #0000ff;\">return<\/span> val == other.val;\r\n  }\r\n  <span style=\"color: #2b91af;\">int<\/span> val;\r\n};\r\n\r\n<span style=\"color: #0000ff;\">struct<\/span> MyHash{\r\n  std::<span style=\"color: #2b91af;\">size_t<\/span> <span style=\"color: #0000ff;\">operator<\/span>()(MyInt m) <span style=\"color: #0000ff;\">const<\/span> {\r\n    std::hash&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; hashVal;\r\n    <span style=\"color: #0000ff;\">return<\/span> hashVal(m.val);\r\n  }\r\n};\r\n\r\n<span style=\"color: #0000ff;\">struct<\/span> MyAbsHash{\r\n  std::<span style=\"color: #2b91af;\">size_t<\/span> <span style=\"color: #0000ff;\">operator<\/span>()(MyInt m) <span style=\"color: #0000ff;\">const<\/span> {\r\n    std::hash&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; hashVal;\r\n    <span style=\"color: #0000ff;\">return<\/span> hashVal(abs(m.val));\r\n  }\r\n};\r\n\r\n<span style=\"color: #0000ff;\">struct<\/span> MyEq{\r\n  <span style=\"color: #2b91af;\">bool<\/span> <span style=\"color: #0000ff;\">operator<\/span>() (<span style=\"color: #0000ff;\">const<\/span> MyInt&amp; l, <span style=\"color: #0000ff;\">const<\/span> MyInt&amp; r) <span style=\"color: #0000ff;\">const<\/span> {\r\n    <span style=\"color: #0000ff;\">return<\/span> abs(l.val) ==  abs(r.val);\r\n  }\r\n};\r\n\r\nstd::ostream&amp; <span style=\"color: #0000ff;\">operator<\/span> &lt;&lt; (std::ostream&amp; strm, <span style=\"color: #0000ff;\">const<\/span> MyInt&amp; myIn){\r\n  strm &lt;&lt; <span style=\"color: #a31515;\">\"MyInt(\"<\/span> &lt;&lt; myIn.val &lt;&lt; <span style=\"color: #a31515;\">\")\"<\/span>;\r\n  <span style=\"color: #0000ff;\">return<\/span> strm;\r\n}\r\n\r\n<span style=\"color: #2b91af;\">int<\/span> main(){\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n  std::hash&lt;<span style=\"color: #2b91af;\">int<\/span>&gt; hashVal;\r\n\r\n  <span style=\"color: #008000;\">\/\/ a few hash values<\/span>\r\n  <span style=\"color: #0000ff;\">for<\/span> ( <span style=\"color: #2b91af;\">int<\/span> i= -2; i &lt;= 1 ; ++i){\r\n    std::cout &lt;&lt; <span style=\"color: #a31515;\">\"hashVal(\"<\/span> &lt;&lt; i &lt;&lt; <span style=\"color: #a31515;\">\"): \"<\/span> &lt;&lt; hashVal(i) &lt;&lt; std::endl;\r\n  }\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n  <span style=\"color: #0000ff;\">typedef<\/span> std::unordered_map&lt;MyInt,<span style=\"color: #2b91af;\">int<\/span>,MyHash&gt; MyIntMap;\r\n\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"MyIntMap: \"<\/span>;\r\n  MyIntMap myMap{{MyInt(-2),-2},{MyInt(-1),-1},{MyInt(0),0},{MyInt(1),1}};\r\n\r\n  <span style=\"color: #0000ff;\">for<\/span>(<span style=\"color: #0000ff;\">auto<\/span> m : myMap) std::cout &lt;&lt; <span style=\"color: #a31515;\">'{'<\/span> &lt;&lt; m.first &lt;&lt; <span style=\"color: #a31515;\">','<\/span> &lt;&lt; m.second &lt;&lt; <span style=\"color: #a31515;\">'}'<\/span>;\r\n\r\n  std::cout &lt;&lt; std::endl &lt;&lt; std::endl;\r\n\r\n  <span style=\"color: #0000ff;\">typedef<\/span> std::unordered_map&lt;MyInt,<span style=\"color: #2b91af;\">int<\/span>,MyAbsHash,MyEq&gt; MyAbsMap;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"MyAbsMap: \"<\/span>;\r\n  MyAbsMap myAbsMap{{MyInt(-2),-2},{MyInt(-1),-1},{MyInt(0),0},{MyInt(1),1}};\r\n\r\n  <span style=\"color: #0000ff;\">for<\/span>(<span style=\"color: #0000ff;\">auto<\/span> m : myAbsMap) std::cout &lt;&lt; <span style=\"color: #a31515;\">'{'<\/span> &lt;&lt; m.first &lt;&lt; <span style=\"color: #a31515;\">','<\/span> &lt;&lt; m.second &lt;&lt; <span style=\"color: #a31515;\">'}'<\/span>;\r\n\r\n  std::cout &lt;&lt; std::endl &lt;&lt; std::endl;\r\n\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"myAbsMap[MyInt(-2)]: \"<\/span> &lt;&lt; myAbsMap[MyInt(-2)] &lt;&lt; std::endl;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"myAbsMap[MyInt(2)]: \"<\/span> &lt;&lt; myAbsMap[MyInt(2)] &lt;&lt; std::endl;\r\n  std::cout &lt;&lt; <span style=\"color: #a31515;\">\"myAbsMap[MyInt(3)]: \"<\/span> &lt;&lt; myAbsMap[MyInt(3)] &lt;&lt; std::endl;\r\n\r\n  std::cout &lt;&lt; std::endl;\r\n\r\n}\r\n<\/pre>\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div>&nbsp;<\/div>\n<p>&nbsp;<\/p>\n<p>My analysis of the program starts with the&nbsp;<span style=\"font-family: courier new,courier;\">main<\/span> function. The easiest way to get the program is to examine the output closely. I create in line 44 the hash function <span style=\"font-family: courier new,courier;\">hasVal<\/span> and use them to calculate the hash values in line 48. <span style=\"font-family: courier new,courier;\">hasVal<\/span> returns a hash value of type <span style=\"font-family: courier new,courier;\">std::size_t. std::size_t<\/span> stands for a sufficiently big enough unsigned integer.&nbsp; <span style=\"font-family: courier new,courier;\">MyIntMap<\/span> in line 53 defines a new name for a type. This type uses <span style=\"font-family: courier new,courier;\">MyInt<\/span> (lines 7 &#8211; 13) as a key. Now, <span style=\"font-family: courier new,courier;\">MyIntMap<\/span> needs a hash function and an equality function. It uses <span style=\"font-family: courier new,courier;\">MyHash<\/span> (lines 15 -20) as a hash function. The hash function uses the hash function of the data type <code>int<\/code> internally. I already overload the equality function for MyInt.<\/p>\n<p><span style=\"font-family: courier new,courier;\">MyAbsMap<\/span> follows a different strategy. According to its name, <span style=\"font-family: courier new,courier;\">MyAbsMap<\/span> creates its hash value based on the absolute value of the integer (line 25). I use the class <span style=\"font-family: courier new,courier;\">MyEq<\/span> with the overloaded call operator as an equality function. <span style=\"font-family: courier new,courier;\">MyEq<\/span> is only interested in the absolute value of the integer. The output shows that the hash function of <span style=\"font-family: courier new,courier;\">MyAbsMap<\/span> returns the same hash value for different keys. The result is that the hash value for <span style=\"font-family: courier new,courier;\">MyInt(-2)<\/span> (line 70) is identical to the hash value of <span style=\"font-family: courier new,courier;\">MyInt(2)<\/span>. This holds although I didn&#8217;t initialize<span style=\"font-family: courier new,courier;\"> MyAbsMap<\/span> with the pair <span style=\"font-family: courier new,courier;\">(MyInt(2),2)<\/span>.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<img loading=\"lazy\" decoding=\"async\" class=\" size-full wp-image-5048\" src=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/hashfunction.png\" alt=\"hashfunction\" width=\"601\" height=\"344\" srcset=\"https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/hashfunction.png 601w, https:\/\/www.modernescpp.com\/wp-content\/uploads\/2016\/11\/hashfunction-300x172.png 300w\" sizes=\"auto, (max-width: 601px) 100vw, 601px\" \/><\/p>\n<p>&nbsp;<\/p>\n<h2>What&#8217;s next?<\/h2>\n<p>One piece of the puzzle is still missing to understand hash tables better. The hash function maps the key onto the value. Therefore, the hash functions map the key of type <span style=\"font-family: courier new,courier;\">int<\/span> or <span style=\"font-family: courier new,courier;\">std::string<\/span> to its bucket. How is that possible? On the one hand, we have an almost infinite number of keys but only a finite number of buckets. But that is not the only question I have. How many elements go into one bucket? Or to say it differently. How often does a collision occur? Question to which the <a href=\"https:\/\/www.modernescpp.com\/index.php\/buckets-capacity-and-load-factor\">next post<span id=\"transmark\"><\/span><\/a> will answer.<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>The hash function is responsible for the unordered associative containers&#8217; constant access time (best cast). As ever, C++ offers many ways to adjust the behavior of the hash functions. On the one hand, C++ has a lot of different hash functions; on the other hand, you can define your hash function. You can even adjust [&hellip;]<\/p>\n","protected":false},"author":21,"featured_media":5048,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[364],"tags":[472],"class_list":["post-5049","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-embedded","tag-associative-containers"],"_links":{"self":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5049","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/users\/21"}],"replies":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/comments?post=5049"}],"version-history":[{"count":1,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5049\/revisions"}],"predecessor-version":[{"id":6921,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/posts\/5049\/revisions\/6921"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media\/5048"}],"wp:attachment":[{"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/media?parent=5049"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/categories?post=5049"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.modernescpp.com\/index.php\/wp-json\/wp\/v2\/tags?post=5049"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}