strchr.com commentshttp://www.strchr.comPerfectionistic and minimalistic programming.1440Georgi 'Sanmayce' on Hash functions: An empirical comparisonTue, 15 Oct 2019 07:48:19 +0700http://www.strchr.com/hash_functions#comment_778<p>Peter, I see no ways to better the 32bit code hashers, so the fastest known to me 32bit hasher in 64bit code is:</p> <p><a href="https://forum.thefreedictionary.com/postsm1118964_MASAKARI--The-people-s-choice--General-Purpose-Grade--English-wordlist.aspx#1118964" rel=nofollow>https://forum.thefreedictionary.com/postsm1118964_MASAKARI--The-people-s-choice--General-Purpose-Grade--English-wordlist.aspx#1118964</a></p> Georgi 'Sanmayce' on Hash functions: An empirical comparisonFri, 04 Oct 2019 02:28:34 +0700http://www.strchr.com/hash_functions#comment_777<pre>Hashing Faster than SSE4.2 iSCSI-CRC </pre><p>The feed:</p> <p><a href="https://github.com/wangyi-fudan/wyhash/issues/29#issuecomment-538078396" rel=nofollow>https://github.com/wangyi-fudan/wyhash/issues/29#issuecomment-538078396</a></p> <p><a href="https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/824947#comment-1946227" rel=nofollow>https://software.intel.com/en-us/forums/intel-moderncode-for-parallel-architectures/topic/824947#comment-1946227</a></p> <p></p> <p>Dummy me, had to fix v2, now everything is OK, my excuse - yesterday, have been distracted the whole day.</p> <p>So, here comes v3:</p> <p></p> <p></p> <pre>#define _rotl_KAZE(x, n) (((x) &lt;&lt; (n)) | ((x) &gt;&gt; (32-(n)))) #define _PADr_KAZE(x, n) ( ((x) &lt;&lt; (n))&gt;&gt;(n) ) #define ROLInBits 27 // 5 in r.1; Caramba: it should be ROR by 5 not ROL, from the very beginning the idea was to mix two bytes by shifting/masking the first 5 'noisy' bits (ASCII 0-31 symbols). // CAUTION: Add 8 more bytes to the buffer being hashed, usually malloc(...+8) - to prevent out of boundary reads! uint32_t FNV1A_Hash_Yorikke_v3(const char *str, uint32_t wrdlen) { const uint32_t PRIME = 591798841; uint32_t hash32 = 2166136261; uint64_t PADDEDby8; const char *p = str; for(; wrdlen &gt; 2*sizeof(uint32_t); wrdlen -= 2*sizeof(uint32_t), p += 2*sizeof(uint32_t)) { hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ (*(uint32_t *)(p+0)) ) * PRIME; hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ (*(uint32_t *)(p+4)) ) * PRIME; } // Here 'wrdlen' is 1..8 PADDEDby8 = _PADr_KAZE(*(uint64_t *)(p+0), (8-wrdlen)&lt;&lt;3); // when (8-8) the QWORD remains intact hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ *(uint32_t *)((char *)&amp;PADDEDby8+0) ) * PRIME; hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ *(uint32_t *)((char *)&amp;PADDEDby8+4) ) * PRIME; return hash32 ^ (hash32 &gt;&gt; 16); } // Last touch: 2019-Oct-03, Kaze </pre>Georgi 'Sanmayce' on Hash functions: An empirical comparisonSun, 29 Sep 2019 17:58:57 +0700http://www.strchr.com/hash_functions#comment_776<p>Hi Peter,</p> <p>glad to share the latest-n-fastest FNV1A variant.</p> <p>For a long time I knew how much more is out there, many coders shared very nice etudes, but my 'Yorikke' has something special, the ... Zennish approach embedded :P</p> <p>Currently I am writing an insane matchfinder using B-trees while hashing millions of keys of order 4,6,8,10,12,14,16,18,36,64, thus a hasher of superhigh speed (FOR SMALL KEYS) is needed since the B-trees are constructed in multi-passes and billions of hash invocations of Yorikke are to be used. Latency is crucial, throughput is meh.</p> <p></p> <pre>#define _rotl_KAZE(x, n) (((x) &lt;&lt; (n)) | ((x) &gt;&gt; (32-(n)))) #define _PADr_KAZE(x, n) ( ((x) &lt;&lt; (n))&gt;&gt;(n) ) #define ROLInBits 27 // 5 in r.1; Caramba: it should be ROR by 5 not ROL, from the very beginning the idea was to mix two bytes by shifting/masking the first 5 'noisy' bits (ASCII 0-31 symbols). UINT Hash_Yorikke(const char *str, SIZE_T wrdlen) { const UINT PRIME = 591798841; UINT hash32 = 2166136261; const char *p = str; long long PADDEDby8; for(; wrdlen &gt;= 2*sizeof(DWORD); wrdlen -= 2*sizeof(DWORD), p += 2*sizeof(DWORD)) { //hash32 = (hash32 ^ (_rotl(*(DWORD *)p,ROLInBits) ^ *(DWORD *)(p+4))) * PRIME; hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ *(DWORD *)(p+0) ) * PRIME; hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ *(DWORD *)(p+4) ) * PRIME; } PADDEDby8 = _PADr_KAZE(*(long long *)(p+0), (8/1-(wrdlen&amp;(8/1-1)))&lt;&lt;3); hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ *(DWORD *)((char *)&amp;PADDEDby8+0) ) * PRIME; hash32 = ( _rotl_KAZE(hash32,ROLInBits) ^ *(DWORD *)((char *)&amp;PADDEDby8+(8/1)/2) ) * PRIME; return hash32 ^ (hash32 &gt;&gt; 16); } // The very instrumental and informative page of Peter Kankowski, first column is time (smaller-better), last one is collisions (smaller-better): /* dic_common_words.txt </pre><p>500 lines read</p> <p>1024 elements in the table (10 bits)</p> <pre> Jesteress: 55 [ 110] Meiyan: 56 [ 102] Yorikke: 54 [ 98] ! Best Speed, Best Dispersion ! on Core 2, 32bit executable FNV-1a: 69 [ 124] Larson: 68 [ 99] CRC-32: 65 [ 101] Murmur2: 71 [ 103] Murmur3: 68 [ 101] XXHfast32: 80 [ 110] XXHstrong32: 80 [ 109] dic_fr.txt </pre><p>13408 lines read</p> <p>32768 elements in the table (15 bits)</p> <pre> Jesteress: 1757 [ 2427] Meiyan: 1775 [ 2377] Yorikke: 1672 [ 2413] ! Best Speed, - ! on Core 2, 32bit executable FNV-1a: 2097 [ 2446] Larson: 2033 [ 2447] CRC-32: 2140 [ 2400] Murmur2: 2266 [ 2399] Murmur3: 2116 [ 2376] XXHfast32: 2428 [ 2494] XXHstrong32: 2431 [ 2496] dic_ip.txt </pre><p>3925 lines read</p> <p>8192 elements in the table (13 bits)</p> <pre> Jesteress: 436 [ 819] Meiyan: 451 [ 807] Yorikke: 486 [ 789] ! - , Best Dispersion ! on Core 2, 32bit executable FNV-1a: 614 [ 796] Larson: 587 [ 789] CRC-32: 589 [ 802] Murmur2: 566 [ 825] Murmur3: 549 [ 818] XXHfast32: 704 [ 829] XXHstrong32: 704 [ 829] dic_numbers.txt </pre><p>500 lines read</p> <p>1024 elements in the table (10 bits)</p> <pre> Jesteress: 40 [ 300] Meiyan: 32 [ 125] Yorikke: 37 [ 82] ! - , - ! on Core 2, 32bit executable FNV-1a: 35 [ 108] Larson: 26 [ 16] CRC-32: 34 [ 64] Murmur2: 45 [ 104] Murmur3: 42 [ 104] XXHfast32: 53 [ 102] XXHstrong32: 53 [ 102] dic_postfix.txt </pre><p>500 lines read</p> <p>1024 elements in the table (10 bits)</p> <pre> Jesteress: 70 [ 106] Meiyan: 74 [ 112] Yorikke: 76 [ 99] ! - , - ! on Core 2, 32bit executable FNV-1a: 159 [ 105] Larson: 160 [ 105] CRC-32: 129 [ 94] Murmur2: 99 [ 111] Murmur3: 98 [ 105] XXHfast32: 76 [ 106] XXHstrong32: 82 [ 112] dic_prefix.txt </pre><p>500 lines read</p> <p>1024 elements in the table (10 bits)</p> <pre> Jesteress: 73 [ 102] Meiyan: 77 [ 106] Yorikke: 79 [ 94] ! - , Best Dispersion ! on Core 2, 32bit executable FNV-1a: 165 [ 94] Larson: 161 [ 99] CRC-32: 135 [ 107] Murmur2: 103 [ 106] Murmur3: 101 [ 103] XXHfast32: 77 [ 103] XXHstrong32: 82 [ 102] dic_Shakespeare.txt </pre><p>3228 lines read</p> <p>8192 elements in the table (13 bits)</p> <pre> Jesteress: 357 [ 585] Meiyan: 366 [ 588] Yorikke: 349 [ 536] ! Best Speed, - ! on Core 2, 32bit executable FNV-1a: 419 [ 555] Larson: 404 [ 583] CRC-32: 433 [ 563] Murmur2: 471 [ 566] Murmur3: 443 [ 555] XXHfast32: 493 [ 491] XXHstrong32: 493 [ 491] dic_variables.txt </pre><p>1842 lines read</p> <p>4096 elements in the table (12 bits)</p> <pre> Jesteress: 249 [ 366] Meiyan: 256 [ 350] Yorikke: 240 [ 351] ! Best Speed, - ! on Core 2, 32bit executable FNV-1a: 318 [ 374] Larson: 313 [ 366] CRC-32: 309 [ 338] Murmur2: 318 [ 383] Murmur3: 299 [ 334] XXHfast32: 336 [ 347] XXHstrong32: 339 [ 355] */ </pre>Anon on Implementing strcmp, strlen, and strstr using SSE 4.2 instructionsTue, 30 Jul 2019 11:23:30 +0700http://www.strchr.com/strcmp_and_strlen_using_sse_4.2#comment_775<pre>&gt;&gt;RHyde </pre><p>Note that as memory is only protected in page sized units, as long as you ensure proper alignment before you start using SSE/AVX instructions you will be safe. But yes, any strcmp/strlen style function that uses SSE/AVX/etc. without a buffer size parameter, must first read in smaller units until it gets to a 16/32 byte aligned address, after which it can start using SSE/AVX safely without having to worry about faulting even if it reads past the end of the string.</p> Sandis on Using DAWG for predictive text inputTue, 16 Jul 2019 12:59:17 +0700http://www.strchr.com/dawg_predictive#comment_774<p>&quot;I'm very embarrassed that I cannot understand how branching in git works&quot; - you are not alone, my friend :D</p>