Strony

środa, 5 czerwca 2013

python popcnt

popcnt - count bits set ("1") in a integer, of course using python

I've implemented only four ways to do this:
  • using look up table 0..255 (no iterations, oen sum and a few shifting by 8 bits)
  • naive (number of iterations ==  number of bits)
  • Brian Kernighan way (number of iterations == number of bits set)
  • pythonic by ugly (count "1"s after calling bin()...)
#!/bin/env python
""" :mod: popcnt
============
.. module: popcnt
:synopsis: couting bits set ("1") in int or long
.. moduleauthor:: Krzysztof.Ciba@NOSPAMgmail.com
"""
## bit count look up table in range 0..255
__bcLUT = [ 0 ] * 256
for i in range(256):
__bcLUT[i] = ( i & 1 ) + __bcLUT[i/2]
def popcntLUT( arg ):
""" popcnt using look up table """
assert type(arg) in ( int, long )
e = lambda x, y: x + int( bool( y ) ) << 3
return sum( [ __bcLUT[ ( arg >> i ) & 255 ]
for i in range( 0, e( *divmod( arg.bit_length(), 8) ), 8 ) ] )
def popcntBK( arg ):
""" Brian Kernighan way - loop only for bits set """
assert type(arg) in ( int, long )
pop = 0
while arg:
arg &= arg - 1
pop += 1
return pop
def popcntNaive( arg ):
""" naive - loop for every bit in arg """
assert type(arg) in ( int, long )
pop = 0
while arg:
pop += arg & 1
arg >>= 1
return pop
def popcntUgly( arg ):
""" ugly but pythonic way? """
assert type(arg) in ( int, long )
return bin(arg).count("1")
# # eep!
if __name__ == "__main__":
# dummy testing
for i in range(1024):
ugly = popcntUgly(i)
lut = popcntLUT(i)
bk = popcntBK(i)
naive = popcntNaive(i)
assert ugly == lut
assert ugly == bk
assert ugly == naive
view raw popcnt.py hosted with ❤ by GitHub


All function were run on 1000 samples made by 100 random numbers:

# --- testBK.py
import sys
import random
from popcnt import popcntBK
up = 2**256
for i in range(1, 1001):
if not i % 10:
print ".",
sys.stdout.flush()
if not i % 100:
print
sys.stdout.flush()
for j in range(1000):
popcntBK( random.randint( 0, up ) )
# --- testLUT.py
import sys
import random
from popcnt import popcntLUT
up = 2**256
for i in range(1, 1001):
if not i % 10:
print ".",
sys.stdout.flush()
if not i % 100:
print
sys.stdout.flush()
for j in range(1000):
popcntLUT( random.randint( 0, up ) )
# --- testNaive.py
import sys
import random
from popcnt import popcntNaive
up = 2**256
for i in range(1, 1001):
if not i % 10:
print ".",
sys.stdout.flush()
if not i % 100:
print
sys.stdout.flush()
for j in range(1000):
popcntNaive( random.randint( 0, up ) )
# --- testUgly.py
import sys
import random
from popcnt import popcntUgly
up = 2**256
for i in range(1, 1001):
if not i % 10:
print ".",
sys.stdout.flush()
if not i % 100:
print
sys.stdout.flush()
for j in range(1000):
popcntUgly( random.randint( 0, up ) )
view raw gistfile1.py hosted with ❤ by GitHub


And now testing results:
  • naive (Zzzzz...)
[Wed, 05 Jun 22:07 E0][cibak@localhost:~/popcnt]> time python testNaive.py
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
175.885u 0.055s 2:56.35 99.7%    0+0k 0+0io 0pf+0w

  • BK way (much better!)
[Wed, 05 Jun 22:04 E0][cibak@localhost:~/popcnt]> time python testBK.py
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
75.853u 0.028s 1:16.03 99.7%    0+0k 0+0io 0pf+0w
 

  • look up table (niiiice!)
[Wed, 05 Jun 22:06 E0][cibak@localhost:~/popcnt]> time python testLUT.py
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
42.828u 0.030s 0:42.93 99.8%    0+0k 0+0io 0pf+0w 

  • ugly but pythonic (wut?)
[Wed, 05 Jun 22:04 E1][cibak@localhost:~/popcnt]> time python testUgly.py
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
. . . . . . . . . .
40.880u 0.040s 0:41.06 99.6%    0+0k 0+0io 0pf+0w



And guess what? Ugly (but pythonic) won! Coincidence? I don't think so... ;)

EDIT:
And of course I can completely remove lines with lambda and all this stuff within popcntLUT  and it could be coded just like that:

## bit count look up table in range 0..255
__bcLUT = [ 0 ] * 256
for i in range(256):
__bcLUT[i] = ( i & 1 ) + __bcLUT[i/2]
def popcntLUT( arg ):
""" popcnt using look up table """
assert type(arg) in ( int, long )
return sum( [ __bcLUT[ ( arg >> i ) & 255 ]
for i in range( 0, arg.bit_length(), 8 ) ] )
view raw popcntLUT.py hosted with ❤ by GitHub


But it doesn't help to much.

Also one can check say 16 bits look up table, say:

__bc16b = [ 0 ] * (256*256)
for i in range(256*256):
__bc16b[i] = ( i & 1 ) + __bc16b[i/2]
def popcnt16b( arg ):
""" popcnt using 16bits loom up table """
return sum( [ __bc16b[( arg >> i ) & 65535 ] for i in range( 0, arg.bit_length(), 16 ) ] )
view raw popcnt16b.py hosted with ❤ by GitHub


 This should run at least 2 times faster that original popcntLUT.

 

2 komentarze:

  1. What if LUT would be coded without lambda function? They are famous for code slowdown.

    OdpowiedzUsuń
  2. It wouldn't helped too much (~0.5-0.6 second faster). IMHO it's a divmod which is slowing down in LUT case, but I need to recheck.

    OdpowiedzUsuń