我有以下输入:
input = [(dog, dog, cat, mouse), (cat, ruby, python, mouse)]
并尝试获得以下输出:
outputlist = [[0, 0, 1, 2], [1, 3, 4, 2]]
outputmapping = {0:dog, 1:cat, 2:mouse, 3:ruby, 4:python, 5:mouse}
考虑到可扩展性,有关如何处理给定的任何提示(var 输入可能会变得非常大)。
我有以下输入:
input = [(dog, dog, cat, mouse), (cat, ruby, python, mouse)]
并尝试获得以下输出:
outputlist = [[0, 0, 1, 2], [1, 3, 4, 2]]
outputmapping = {0:dog, 1:cat, 2:mouse, 3:ruby, 4:python, 5:mouse}
考虑到可扩展性,有关如何处理给定的任何提示(var 输入可能会变得非常大)。
你可能想要这样的东西:
import collections
import itertools
def build_catalog(L):
counter = itertools.count().next
names = collections.defaultdict(counter)
result = []
for t in L:
new_t = [ names[item] for item in t ]
result.append(new_t)
catalog = dict((name, idx) for idx, name in names.iteritems())
return result, catalog
使用它:
>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> outputlist, outputmapping = build_catalog(input)
>>> outputlist
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> outputmapping
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
此类将自动将对象映射到递增的整数值:
class AutoMapping(object):
def __init__(self):
self.map = {}
self.objects = []
def __getitem__(self, val):
if val not in self.map:
self.map[val] = len(self.objects)
self.objects.append(val)
return self.map[val]
示例用法,供您输入:
>>> input = [('dog', 'dog', 'cat', 'mouse'), ('cat', 'ruby', 'python', 'mouse')]
>>> map = AutoMapping()
>>> [[map[x] for x in y] for y in input]
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> map.objects
['dog', 'cat', 'mouse', 'ruby', 'python']
>>> dict(enumerate(map.objects))
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}
这是一种可能的解决方案,尽管它不是最好的。如果您通过预先分配它们预先知道列表中的每个条目将有多少元素,则可以稍微提高效率。
labels=[];
label2index={};
outputlist=[];
for group in input:
current=[];
for label in group:
if label not in label2index:
label2index[label]=len(labels);
labels.append(label);
current.append(label2index[label]);
outputlist.append(current);
outputmapping={};
for idx, val in enumerate(labels):
outputmapping[idx]=val;
我在我的项目中经常遇到同样的问题,所以前段时间我完成了一个课程,它就是这样做的:
class UniqueIdGenerator(object):
"""A dictionary-like class that can be used to assign unique integer IDs to
names.
Usage:
>>> gen = UniqueIdGenerator()
>>> gen["A"]
0
>>> gen["B"]
1
>>> gen["C"]
2
>>> gen["A"] # Retrieving already existing ID
0
>>> len(gen) # Number of already used IDs
3
"""
def __init__(self, id_generator=None):
"""Creates a new unique ID generator. `id_generator` specifies how do we
assign new IDs to elements that do not have an ID yet. If it is `None`,
elements will be assigned integer identifiers starting from 0. If it is
an integer, elements will be assigned identifiers starting from the given
integer. If it is an iterator or generator, its `next` method will be
called every time a new ID is needed."""
if id_generator is None:
id_generator = 0
if isinstance(id_generator, int):
import itertools
self._generator = itertools.count(id_generator)
else:
self._generator = id_generator
self._ids = {}
def __getitem__(self, item):
"""Retrieves the ID corresponding to `item`. Generates a new ID for `item`
if it is the first time we request an ID for it."""
try:
return self._ids[item]
except KeyError:
self._ids[item] = self._generator.next()
return self._ids[item]
def __len__(self):
"""Retrieves the number of added elements in this UniqueIDGenerator"""
return len(self._ids)
def reverse_dict(self):
"""Returns the reversed mapping, i.e., the one that maps generated IDs to their
corresponding items"""
return dict((v, k) for k, v in self._ids.iteritems())
def values(self):
"""Returns the list of items added so far. Items are ordered according to
the standard sorting order of their keys, so the values will be exactly
in the same order they were added if the ID generator generates IDs in
ascending order. This hold, for instance, to numeric ID generators that
assign integers starting from a given number."""
return sorted(self._ids.keys(), key = self._ids.__getitem__)
使用示例:
>>> input = [(dog, dog, cat, mouse), (cat, ruby, python, mouse)]
>>> gen = UniqueIdGenerator()
>>> outputlist = [[gen[x] for x in y] for y in input]
[[0, 0, 1, 2], [1, 3, 4, 2]]
>>> print outputlist
>>> outputmapping = gen.reverse_dict()
>>> print outputmapping
{0: 'dog', 1: 'cat', 2: 'mouse', 3: 'ruby', 4: 'python'}