`_max_hash = np.uint64((1 << 32) - 1)`
This implementation of MinHash appears to use a 32-bit hash function (`sha1_hash32`).
Why, then, is `_max_hash = np.uint64((1 << 32) - 1)` declared as `np.uint64`?
I ran experiments with `np.uint32` and the Mersenne prime `np.uint64((1 << 31) - 1)`, and there does not seem to be much difference in the results.
If I understand correctly, this would also halve memory consumption.
Is there a reason to insist on `np.uint64`?
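To make the trade-off concrete, here is a minimal sketch in plain NumPy. It does not use datasketch itself. The value `num_perm = 128` and the multiplier `a` are assumptions chosen for illustration, not values taken from the library. The first part confirms the memory halving. The second shows one thing that may be worth checking before switching: with a 31-bit prime, the product `a * h` can need up to 62 bits, so it wraps around in `np.uint32` but not in `np.uint64`.

```python
import numpy as np

num_perm = 128  # assumed signature length, for illustration only

# Signature storage: the same number of slots in each dtype.
sig64 = np.full(num_perm, (1 << 32) - 1, dtype=np.uint64)
sig32 = np.full(num_perm, (1 << 32) - 1, dtype=np.uint32)
print(sig64.nbytes, sig32.nbytes)  # 1024 512

# Intermediate product of a universal-hash style permutation (a * h) % p.
p = (1 << 31) - 1          # the 31-bit Mersenne prime from the question
a, h = p - 1, p - 1        # hypothetical worst-case multiplier and hash value
exact = a * h              # Python int, arbitrary precision

# Array arithmetic wraps modulo 2**32 in uint32; uint64 has room for 62 bits.
narrow = np.array([a], dtype=np.uint32) * np.array([h], dtype=np.uint32)
wide = np.array([a], dtype=np.uint64) * np.array([h], dtype=np.uint64)

print(exact % p, int(wide[0]) % p, int(narrow[0]) % p)
```

The wrapped result is still deterministic, so it may well behave acceptably as a hash in practice, which would be consistent with the experiment above. It is simply no longer the exact value of `(a * h) % p`.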
datasketch/datasketch/minhash.py
Line 12 in ebe4ca4