strchr.com comments
http://www.strchr.com
Perfectionistic and minimalistic programming.

Peter Kankowski on Hash functions: An empirical comparison
Wed, 06 Feb 2019 04:05:48 +0700
http://www.strchr.com/hash_functions#comment_771
<p>Thanks, I will test your function.</p>

Leonid Yuriev on Hash functions: An empirical comparison
Tue, 05 Feb 2019 22:23:24 +0700
http://www.strchr.com/hash_functions#comment_770
<p>It seems that t1ha is superior to all of the above functions, both in speed and in quality.</p>
<p>Of course, it could be used with Folly's F14 hash table.</p>
<p><a href="https://github.com/PositiveTechnologies/t1ha" rel=nofollow>https://github.com/PositiveTechnologies/t1ha</a></p>

Peter Kankowski on Fast strlen function
Fri, 28 Dec 2018 02:15:47 +0700
http://www.strchr.com/optimized_strlen_function#comment_769
<p>Thank you, it's something I missed in the blog post.</p>

RHyde on Fast strlen function
Thu, 27 Dec 2018 12:33:03 +0700
http://www.strchr.com/optimized_strlen_function#comment_768
<p>FWIW, here is an implementation that prevents crossing MMU pages using SSE instructions. String addresses are passed in RSI and RDI.</p>
<pre>
; strcmp2-
;
; String comparison using pcmpistri instruction
; and computing string lengths ahead of time.

strcmp2     proc
xmm0Save    textequ &lt;[rsp]&gt;
xmm1Save    textequ &lt;[rsp+16]&gt;

            push    rsi
            push    rdi
            push    rcx
            sub     rsp, 32
            movdqu  xmm0Save, xmm0
            movdqu  xmm1Save, xmm1

; Make RDI dependent on RSI so we can
; step through RDI by incrementing RSI:

            sub     rdi, rsi

; Okay, 16-byte align string 1 (pointed
; at by RSI):

paraAlign:  mov     al, [rsi]
            cmp     al, [rdi][rsi*1]
            jne     different

; Move on to next character

            inc     rsi

; Check for end of string:

            cmp     al, 0
            je      same

; Now we need to see if we've aligned RSI on
; a 16-byte boundary

            test    rsi, 0fh
            jnz     paraAlign

; RSI is now paragraph aligned. We can fetch blocks of
; 16 bytes at [RSI] without fear of a general protection
; fault.
; However, we don't know what RDI's alignment is,
; so we have to test to see if it's legal to fetch 16 bytes
; at a time from RDI (which is really [rdi][rsi*1] at this
; point).

            sub     rsi, 16
scLoop:     add     rsi, 16             ;On to next block.
scLoop2:    lea     rcx, [rdi][rsi*1]   ;Check the src2
            and     rcx, 0fffh          ; block to see if
            cmp     rcx, 0ff0h          ; there are at
            ja      lt16inPage          ; least 16 bytes
                                        ; left on MMU page.

; If we have at least 16 bytes left on the MMU page for the
; src2 block, then use pcmpistri to compare the 16 bytes
; at src1 (which we know is completely on an MMU page as
; RSI is 16-byte aligned) against the 16 bytes at src2
; (RDI+RSI). Load src1 bytes into XMM1 and use src2 as
; the pcmpistri operand (because we can use movdqa for
; src1, as it is aligned, and pcmpistri allows non-aligned
; accesses).

isAligned:  movdqa  xmm1, [rsi]
            pcmpistri xmm1, [rsi][rdi*1], scFlags
            ja      scLoop              ;Equal, no zero bytes
            jc      different2          ;Not equal

; At this point, the zero flag must be set, so there
; must have been a zero byte in src1 or src2. As the
; characters also match, the strings must be equal.

same:       xor     rax, rax
            jmp     exit

; lt16inPage-
;
; Code transfers to this label when there are fewer
; than 16 characters left in the src2 memory page.
; We must compare byte-by-byte until we hit a zero
; byte or cross the MMU page boundary.
;
; On entry, RCX holds the offset of the src2 block within
; its page (0ff1h..0fffh), so 1000h - RCX is the number of
; bytes left on that page (1..15).

lt16inPage: neg     rcx
            add     rcx, 1000h          ;RCX = bytes left on page
cmpUpTo16:  mov     al, [rsi]
            cmp     al, [rdi][rsi*1]
            jne     different
            inc     rsi
            cmp     al, 0
            je      same
            dec     rcx
            jnz     cmpUpTo16
            jmp     paraAlign

; Branch to this point from the code where we were
; aligning RSI to a 16-byte boundary and found a
; different character (between RSI and RDI).
different2: add     rsi, rcx
different:  mov     al, [rsi]
            sub     al, [rdi][rsi*1]
            movsx   rax, al
exit:       movdqu  xmm0, xmm0Save      ;movdqu, matching the
            movdqu  xmm1, xmm1Save      ; unaligned saves above
            add     rsp, 32
            pop     rcx
            pop     rdi
            pop     rsi
            ret
strcmp2     endp
</pre>

RHyde on Implementing strcmp, strlen, and strstr using SSE 4.2 instructions
Thu, 27 Dec 2018 11:59:25 +0700
http://www.strchr.com/strcmp_and_strlen_using_sse_4.2#comment_767
<p>Just recently came across this thread.</p>
<p>There is one issue with the pcmpistri (and most SSE-based) algorithms: they *always* fetch 16 bytes, even if the zero byte is among them. In the (admittedly very) rare case where some of the bytes beyond the terminating zero byte lie across an MMU page boundary, and the next page is not mapped, it is possible to get a page fault (an access violation).</p>
<p>Granted, the possibility is quite low. However, you would not want to use such an algorithm for a mission-critical application (say, nuclear reactor or life-critical stuff) unless you can guarantee that there are at least 15 bytes of real memory after the zero byte.</p>
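The page-boundary concern described in the comment above can be sketched in C. This is a minimal illustration, not code from the thread: it assumes 4 KiB MMU pages (typical on x86-64, but not guaranteed everywhere), and the names `PAGE_SIZE_ASSUMED`, `bytes_left_in_page`, `block_load_crosses_page`, and `guarded_strlen` are hypothetical. The inner byte loop stands in for the place where a real implementation would issue a 16-byte SSE load.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB MMU pages. Query the OS if this must be exact. */
#define PAGE_SIZE_ASSUMED 4096u
#define BLOCK_SIZE        16u

/* Number of bytes from p to the end of the page containing p (1..4096). */
size_t bytes_left_in_page(const void *p)
{
    uintptr_t offset = (uintptr_t)p & (PAGE_SIZE_ASSUMED - 1u);
    return (size_t)(PAGE_SIZE_ASSUMED - offset);
}

/* Nonzero if a 16-byte load starting at p would touch the next page. */
int block_load_crosses_page(const void *p)
{
    return bytes_left_in_page(p) < BLOCK_SIZE;
}

/*
 * strlen sketch that never examines a full 16-byte block unless the
 * whole block lies on the current page. Near the end of a page it
 * scans only the bytes that remain on that page, then continues.
 */
size_t guarded_strlen(const char *s)
{
    assert(s != NULL);
    const char *p = s;
    for (;;) {
        /* A full block is safe only if it does not cross the page. */
        size_t n = block_load_crosses_page(p) ? bytes_left_in_page(p)
                                              : BLOCK_SIZE;
        /* Scalar stand-in for a 16-byte SSE compare on n bytes. */
        for (size_t i = 0; i < n; i++) {
            if (p[i] == '\0')
                return (size_t)(p - s) + i;
        }
        p += n;
    }
}
```

A pointer whose page offset is greater than 0xFF0 has fewer than 16 bytes left on its page, so `block_load_crosses_page` reports that a block load there would spill into the next page; an offset of exactly 0xFF0 is still safe.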