python - Efficiently mapping unhashable objects to their index in a list

Question

A Python list

f = [x0, x1, x2]

may be seen as an efficient representation of a mapping from [0, 1, ..., len(f) - 1] to the set of its elements. By "efficient" I mean that f[i] returns the element associated with i in O(1) time.

The inverse mapping may be defined as follows:

class Inverse:
    def __init__(self, f):
        self.f = f

    def __getitem__(self, x):
        return self.f.index(x)

This works, but Inverse(f)[x] takes O(n) time on average.

Alternatively, one may use a dict:

f_inv = {x: i for i, x in enumerate(f)}

This has O(1) average time complexity, but it requires the objects in the list to be hashable.

Is there a way to define an inverse mapping that provides equality-based lookups, in O(1) average time, with unhashable objects?

Edit: sample input and expected output:

>>> f = [x0, x1, x2]
>>> f_inv = Inverse(f)  # this is to be defined
>>> f_inv[x0]  # in O(1) time
0
>>> f_inv[x2]  # in O(1) time
2

score 1 · Accepted Answer

You can create an associated dictionary mapping the object ID's back to the list index.

The obvious disadvantage is that you will have to search the index for the identity object, not for on eobject that is merely equal.

On the upside, by creating a custom MutableSequence class using collections.abc, you can, with minimal code, write a class that keeps your data both as a sequence and as the reverse dictionary.

from collections.abc import MutableSequence
from threading import RLock


class MD(dict):
    # No need for a full MutableMapping subclass, as the use is limited
    def __getitem__(self, key):
        return super().__getitem__(id(key))


class Reversible(MutableSequence):
    def __init__(self, args):
        self.seq = list()
        self.reverse = MD()
        self.lock = RLock()
        for element in args:
            self.append(element)

    def __getitem__(self, index):
        return self.seq[index]

    def __setitem__(self, index, value):
        with self.lock:
            del self.reverse[id(self.seq[index])]
            self.seq[index] = value
            self.reverse[id(value)] = index

    def __delitem__(self, index):
        if index < 0:
            index += len(self)
        with self.lock:
            # Increase all mapped indexes
            for obj in self.seq[index:]:
                self.reverse[obj] -= 1
            del self.reverse[id(self.seq[index])]
            del self.seq[index]

    def __len__(self):
        return len(self.seq)

    def insert(self, index, value):
        if index < 0:
            index += len(self)
        with self.lock:
            # Increase all mapped indexes
            for obj in self.seq[index:]:
                self.reverse[obj] += 1
            self.seq.insert(index, value)
            self.reverse[id(value)] = index

And voilá: just use this object in place of your list, and the public attribute "reverse" to get the index of identity objects. Perceive you can augment the "intelligence" of the "MD" class by trying to use different strategies, like to use the objects themselves, if they are hashable, and only resort to id, or other custom key based on other object attributes, when needed. That way you could mitigate the need for the search to be for the same object.

So, for ordinary operations on the list, this class maintain the reverted dictionary synchronized. There is no support for slice indexing, though. For more information, check the docs at https://docs.python.org/3/library/collections.abc.html

score 0 · Accepted Answer

Unfortunately you're stuck with an algorithm limitation here. Fast lookup structures, like hash tables or binary trees, are efficient because they put objects in particular buckets or order them based on their values. This requires them to be hashable or comparable consistently for the entire time you are storing them in this structure, otherwise a lookup is very likely to fail.

If the objects you need are mutable (usually the reason they are not hashable) then any time an object you are tracking changes you need to update the data structure. The safest way to do this is to create immutable objects. If you need to change an object, then create a new one, remove the old one from the dictionary, and insert the new object as a key with the same value.

The operations here are still O(1) with respect to the size of the dictionary, you just need to consider whether the cost of copying objects on every change is worth it.

python - Efficiently mapping unhashable objects to their index in a list

2 回答 2

Related

Reference