I've implemented only four ways to do this:
- using look up table 0..255 (no iterations, oen sum and a few shifting by 8 bits)
- naive (number of iterations == number of bits)
- Brian Kernighan way (number of iterations == number of bits set)
- pythonic by ugly (count "1"s after calling bin()...)
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/bin/env python | |
""" :mod: popcnt | |
============ | |
.. module: popcnt | |
:synopsis: couting bits set ("1") in int or long | |
.. moduleauthor:: Krzysztof.Ciba@NOSPAMgmail.com | |
""" | |
## bit count look up table in range 0..255 | |
__bcLUT = [ 0 ] * 256 | |
for i in range(256): | |
__bcLUT[i] = ( i & 1 ) + __bcLUT[i/2] | |
def popcntLUT( arg ): | |
""" popcnt using look up table """ | |
assert type(arg) in ( int, long ) | |
e = lambda x, y: x + int( bool( y ) ) << 3 | |
return sum( [ __bcLUT[ ( arg >> i ) & 255 ] | |
for i in range( 0, e( *divmod( arg.bit_length(), 8) ), 8 ) ] ) | |
def popcntBK( arg ): | |
""" Brian Kernighan way - loop only for bits set """ | |
assert type(arg) in ( int, long ) | |
pop = 0 | |
while arg: | |
arg &= arg - 1 | |
pop += 1 | |
return pop | |
def popcntNaive( arg ): | |
""" naive - loop for every bit in arg """ | |
assert type(arg) in ( int, long ) | |
pop = 0 | |
while arg: | |
pop += arg & 1 | |
arg >>= 1 | |
return pop | |
def popcntUgly( arg ): | |
""" ugly but pythonic way? """ | |
assert type(arg) in ( int, long ) | |
return bin(arg).count("1") | |
# # eep! | |
if __name__ == "__main__": | |
# dummy testing | |
for i in range(1024): | |
ugly = popcntUgly(i) | |
lut = popcntLUT(i) | |
bk = popcntBK(i) | |
naive = popcntNaive(i) | |
assert ugly == lut | |
assert ugly == bk | |
assert ugly == naive |
All function were run on 1000 samples made by 100 random numbers:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
# --- testBK.py | |
import sys | |
import random | |
from popcnt import popcntBK | |
up = 2**256 | |
for i in range(1, 1001): | |
if not i % 10: | |
print ".", | |
sys.stdout.flush() | |
if not i % 100: | |
sys.stdout.flush() | |
for j in range(1000): | |
popcntBK( random.randint( 0, up ) ) | |
# --- testLUT.py | |
import sys | |
import random | |
from popcnt import popcntLUT | |
up = 2**256 | |
for i in range(1, 1001): | |
if not i % 10: | |
print ".", | |
sys.stdout.flush() | |
if not i % 100: | |
sys.stdout.flush() | |
for j in range(1000): | |
popcntLUT( random.randint( 0, up ) ) | |
# --- testNaive.py | |
import sys | |
import random | |
from popcnt import popcntNaive | |
up = 2**256 | |
for i in range(1, 1001): | |
if not i % 10: | |
print ".", | |
sys.stdout.flush() | |
if not i % 100: | |
sys.stdout.flush() | |
for j in range(1000): | |
popcntNaive( random.randint( 0, up ) ) | |
# --- testUgly.py | |
import sys | |
import random | |
from popcnt import popcntUgly | |
up = 2**256 | |
for i in range(1, 1001): | |
if not i % 10: | |
print ".", | |
sys.stdout.flush() | |
if not i % 100: | |
sys.stdout.flush() | |
for j in range(1000): | |
popcntUgly( random.randint( 0, up ) ) | |
And now testing results:
- naive (Zzzzz...)
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
175.885u 0.055s 2:56.35 99.7% 0+0k 0+0io 0pf+0w
- BK way (much better!)
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
75.853u 0.028s 1:16.03 99.7% 0+0k 0+0io 0pf+0w
- look up table (niiiice!)
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
42.828u 0.030s 0:42.93 99.8% 0+0k 0+0io 0pf+0w
- ugly but pythonic (wut?)
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
40.880u 0.040s 0:41.06 99.6% 0+0k 0+0io 0pf+0w
And guess what? Ugly (but pythonic) won! Coincidence? I don't think so... ;)
EDIT:
And of course I can completely remove lines with lambda and all this stuff within popcntLUT and it could be coded just like that:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
## bit count look up table in range 0..255 | |
__bcLUT = [ 0 ] * 256 | |
for i in range(256): | |
__bcLUT[i] = ( i & 1 ) + __bcLUT[i/2] | |
def popcntLUT( arg ): | |
""" popcnt using look up table """ | |
assert type(arg) in ( int, long ) | |
return sum( [ __bcLUT[ ( arg >> i ) & 255 ] | |
for i in range( 0, arg.bit_length(), 8 ) ] ) |
But it doesn't help to much.
Also one can check say 16 bits look up table, say:
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
__bc16b = [ 0 ] * (256*256) | |
for i in range(256*256): | |
__bc16b[i] = ( i & 1 ) + __bc16b[i/2] | |
def popcnt16b( arg ): | |
""" popcnt using 16bits loom up table """ | |
return sum( [ __bc16b[( arg >> i ) & 65535 ] for i in range( 0, arg.bit_length(), 16 ) ] ) |
This should run at least 2 times faster that original popcntLUT.
What if LUT would be coded without lambda function? They are famous for code slowdown.
OdpowiedzUsuńIt wouldn't helped too much (~0.5-0.6 second faster). IMHO it's a divmod which is slowing down in LUT case, but I need to recheck.
OdpowiedzUsuń