Skip to content

NodeId class - hash collisions with integer (UInteger) node identifiers #1459

@peterfranciscook

Description

@peterfranciscook

Describe the bug
I noticed some unpredictable behavior in my app while using NodeId as Map keys. I'm not sure if this is actually supported/encouraged, but didn't see anything in the docs to indicate otherwise. At a glance, many of Milo's internals (e.g. DataTypeTree) seem to use NodeId as a hashtable key. I tracked it down to the hashCode method of the NodeId class - the hashCode computation yields a very high likelihood of hash collisions for a server with several active namespaces & number of nodes. For my particular project, there were 38 hash collisions out of 1433 monitored/relevant nodes across 3 namespaces (2.6% collision rate).

Expected behavior
I expect NodeId to be reliably hashable, perhaps not completely uniquely, but at least "unique-enough" across the range of namespaces and nodes for my server, so that 2 NodeIds don't resolve to the same hash.

Logs and Packet Captures
I wrote this small test to see hash collisions for what I figured to be a feasible range of namespaces (8) and node identifiers (32767) and compared it to a couple alternatives to Java's .hashCode()

    void checkMiloNodeIdHashes() {
        checkNodeIdHashes(NodeId::hashCode, "NodeId::hashCode");
        checkNodeIdHashes((nodeId) -> hashNodeId(nodeId, Hashing.goodFastHash(32)), "Hashing.goodFastHash(32)");
        checkNodeIdHashes((nodeId) -> hashNodeId(nodeId, Hashing.murmur3_32_fixed()), "Hashing.murmur3_32_fixed()");
        checkNodeIdHashes((nodeId) -> hashNodeId(nodeId, Hashing.murmur3_128()), "Hashing.murmur3_128()");
        checkNodeIdHashes((nodeId) -> hashNodeId(nodeId, Hashing.sha256()), "Hashing.sha256()");
        checkNodeIdHashes((nodeId) -> hashNodeId(nodeId, Hashing.crc32()), "Hashing.crc32()");
    }

    void checkNodeIdHashes(Function<NodeId, Integer> hashMethod, String method) {
        Multimap<Integer, NodeId> seen = ArrayListMultimap.create();

        // hash node ids for namespaces 0-7 & node identifiers 0-32767
        int nNamespace = 8;
        int nIndex = 32768;
        for (int ns = 0; ns < nNamespace; ns++) {
            for (int nid = 0; nid < nIndex; nid++) {
                NodeId nodeId = new NodeId(ns, nid);
                int hash = hashMethod.apply(nodeId);
                seen.put(hash, nodeId);
            }
        }

        Logger.debug("hash method: {}", method);
        checkCollisions(seen, nNamespace * nIndex);

        seen = null; // much RAM, garbage collect
    }

    int hashNodeId(NodeId nodeId, HashFunction hashFunction) {
        return hashFunction.newHasher()
            .putInt(nodeId.getNamespaceIndex().intValue())
            .putInt(((UInteger) nodeId.getIdentifier()).intValue())
            .hash()
            .asInt();
    }

    void checkCollisions(Multimap<Integer, NodeId> seen, int nNode) {
        int collisions = 0;
        int maxCollisions = 0;
        for (var entry : seen.asMap().entrySet()) {
            int key = entry.getKey();
            Collection<NodeId> values = entry.getValue();
            int nCollisionForKey = values.size();
            if (nCollisionForKey > 1) {
                Logger.trace("Hash Collision: {} <- {}", key, values);
                collisions += nCollisionForKey;
                maxCollisions = Math.max(maxCollisions, nCollisionForKey);
            }
        }
        displayCollisionStats(collisions, nNode, maxCollisions);
    }

    void displayCollisionStats(int collisions, int nNode, int maxCollisions) {
        float collisionPercent = (float) collisions / nNode * 100;
        Logger.debug("Total collisions: {}", collisions);
        Logger.debug("Collision percentage: {}%", collisionPercent);
        Logger.debug("Max collisions: {}\n", maxCollisions);
    }

outputs:

DEBUG: hash method: NodeId::hashCode
DEBUG: Total collisions: 262082
DEBUG: Collision percentage: 99.97635%
DEBUG: Max collisions: 8

DEBUG: hash method: Hashing.goodFastHash(32)
DEBUG: Total collisions: 0
DEBUG: Collision percentage: 0.0%
DEBUG: Max collisions: 0

DEBUG: hash method: Hashing.murmur3_32_fixed()
DEBUG: Total collisions: 12
DEBUG: Collision percentage: 0.0045776367%
DEBUG: Max collisions: 2

DEBUG: hash method: Hashing.murmur3_128()
DEBUG: Total collisions: 14
DEBUG: Collision percentage: 0.005340576%
DEBUG: Max collisions: 2

DEBUG: hash method: Hashing.sha256()
DEBUG: Total collisions: 22
DEBUG: Collision percentage: 0.008392334%
DEBUG: Max collisions: 2

DEBUG: hash method: Hashing.crc32()
DEBUG: Total collisions: 0
DEBUG: Collision percentage: 0.0%
DEBUG: Max collisions: 0

Additional context
artifacts = ["org.eclipse.milo:sdk-client:0.6.15", "org.eclipse.milo:stack-core:0.6.15",]

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions