The gcc backend of rustc currently emits the following for a function that
implements popcount for a u32 (x86_64 targeting AVX2, using the standard Unix
calling convention):
    popcount:
        mov     eax, edi
        and     edi, 1431655765
        shr     eax
        and     eax, 1431655765
        add     edi, eax
        mov     edx, edi
        and     edi, 858993459
        shr     edx, 2
        and     edx, 858993459
        add     edx, edi
        mov     eax, edx
        and     edx, 252645135
        shr     eax, 4
        and     eax, 252645135
        add     eax, edx
        mov     edx, eax
        and     eax, 16711935
        shr     edx, 8
        and     edx, 16711935
        add     edx, eax
        movzx   eax, dx
        shr     edx, 16
        add     eax, edx
        ret
Rather than using this implementation, gcc could be told to use Wegner's
algorithm. This would give the same function the following implementation:
    popcount:
        xor     eax, eax
        xor     edx, edx
        popcnt  eax, edi
        test    edi, edi
        cmove   eax, edx
        ret
This patch implements the popcount operation in terms of Wegner's algorithm in
all cases.
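
For reference, the algorithm named above is the bit-counting loop usually
attributed to Peter Wegner: it clears the lowest set bit on each iteration and
counts the iterations. The sketch below shows the loop in plain Rust. The
function name popcount_clear_lowest is hypothetical, and this is only an
illustration of the technique, not the code the backend itself generates.

```rust
/// Counts the set bits of `x` by clearing the lowest set bit until none remain.
/// Illustrative helper only; the name is not taken from the patch.
pub fn popcount_clear_lowest(mut x: u32) -> u32 {
    let mut count = 0;
    while x != 0 {
        // `x - 1` flips the lowest set bit and every bit below it,
        // so the AND removes exactly that one bit. `x` is non-zero here,
        // so the subtraction cannot underflow.
        x &= x - 1;
        count += 1;
    }
    count
}
```

The loop runs once per set bit, so it returns 0 for an input of 0 without
entering the body. Its result should agree with u32::count_ones for every
input.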
Signed-off-by: Andy Sadler <[email protected]>