python - Data processing in lists with duplicates in Python

Question

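I have two lists: one containing products and the other containing their associated prices. The lists can contain an undefined number of products. An example of the lists would be:

Products: ['Apple', 'Apple', 'Apple', 'Orange', 'Banana', 'Banana', 'Peach', 'Pineapple', 'Pineapple']
Prices: ['1.00', '2.00', '1.50', '3.00', '0.50', '1.50', '2.00', '1.00', '1.00']

I would like to remove all duplicates from the products list and keep only the cheapest price associated with each unique product in the prices list. Note that some duplicated products may have the same price (Pineapple in our example).

The desired final lists would look like:

Products: ['Apple', 'Orange', 'Banana', 'Peach', 'Pineapple']
Prices: ['1.00', '3.00', '0.50', '2.00', '1.00']

I would like to know the most efficient way to do this in Python. Thanks.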

score 3 · Accepted Answer

from collections import OrderedDict
products = ['Apple', 'Apple', 'Apple', 'Orange', 'Banana', 'Banana', 'Peach', 'Pineapple', 'Pineapple']
prices = ['1.00', '2.00', '1.50', '3.00', '0.50', '1.50', '2.00', '1.00', '1.00']

min_prices = OrderedDict()
for prod, price in zip(products, prices):
    min_prices[prod] = min(float(price), min_prices.get(prod, float('inf')))

>>> print(list(min_prices.keys()), list(min_prices.values()))
['Apple', 'Orange', 'Banana', 'Peach', 'Pineapple'] [1.0, 3.0, 0.5, 2.0, 1.0]
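If the result should come back as two lists with the prices still in their original string form, as in the question's desired output, the same idea can be wrapped in a small function. This is only a sketch: the name cheapest_per_product is made up for illustration, and it assumes Python 3.7+, where a plain dict preserves insertion order.

```python
def cheapest_per_product(products, prices):
    """Return (unique_products, cheapest_prices), keeping first-seen order."""
    cheapest = {}  # plain dicts preserve insertion order on Python 3.7+
    for prod, price in zip(products, prices):
        # Compare numerically, but store the original string so '1.00' stays '1.00'.
        if prod not in cheapest or float(price) < float(cheapest[prod]):
            cheapest[prod] = price
    return list(cheapest), list(cheapest.values())


products = ['Apple', 'Apple', 'Apple', 'Orange', 'Banana', 'Banana', 'Peach', 'Pineapple', 'Pineapple']
prices = ['1.00', '2.00', '1.50', '3.00', '0.50', '1.50', '2.00', '1.00', '1.00']

unique_products, min_price_strings = cheapest_per_product(products, prices)
print(unique_products)    # ['Apple', 'Orange', 'Banana', 'Peach', 'Pineapple']
print(min_price_strings)  # ['1.00', '3.00', '0.50', '2.00', '1.00']
```

This makes a single pass over the data, so the cost grows linearly with the length of the lists.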

score 1 · Accepted Answer

Probably the simplest way is to take advantage of the fact that a dictionary enforces unique keys:

Products = ['Apple', 'Apple', 'Apple', 'Orange', 'Banana', 'Banana', 'Peach', 'Pineapple', 'Pineapple']
Prices = ['1.00', '2.00', '1.50', '3.00', '0.50', '1.50', '2.00', '1.00', '1.00']

# Sort by numeric price, highest first; dict() keeps the last (cheapest) value per product.
final = dict(sorted(zip(Products, Prices), key=lambda pair: float(pair[1]), reverse=True))
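This trick works because dict() keeps the last value it sees for a repeated key, so after a descending sort the cheapest entry wins. The prices have to be compared as numbers, though: compared as strings, '10.00' sorts below '9.00'. A small sketch with made-up data (the pairs list is hypothetical, not from the question):

```python
pairs = [('Apple', '9.00'), ('Apple', '10.00'), ('Pear', '2.50')]

# Sorting on the raw strings compares character by character, so '10.00'
# lands after '9.00' in descending order and wrongly becomes the survivor.
by_string = dict(sorted(pairs, key=lambda pair: pair[1], reverse=True))

# Sorting on float(price) puts the genuinely cheapest entry last.
by_number = dict(sorted(pairs, key=lambda pair: float(pair[1]), reverse=True))

print(by_string)  # {'Apple': '10.00', 'Pear': '2.50'}
print(by_number)  # {'Apple': '9.00', 'Pear': '2.50'}
```

Note also that the resulting dict is ordered by price rather than by the products' original order, and that the sort makes this approach O(n log n).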

score 1 · Accepted Answer

How about this:

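prices = list(map(float, prices))  # compare prices as numbers, not strings
r = {}
for k, v in zip(products, prices):
    # setdefault stores infinity the first time a product is seen
    if v < r.setdefault(k, float('inf')):
        r[k] = v
products, prices = list(r.keys()), list(map(str, r.values()))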
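One thing to watch: str(1.0) gives '1.0', not '1.00'. If the two-decimal format from the question matters, the floats can be formatted explicitly. A minimal sketch, where min_prices is a hypothetical stand-in for the dictionary r built above:

```python
# Stand-in for the dictionary of minimum prices built by the loop.
min_prices = {'Apple': 1.0, 'Orange': 3.0, 'Banana': 0.5}

as_str = [str(value) for value in min_prices.values()]
formatted = ['{:.2f}'.format(value) for value in min_prices.values()]

print(as_str)     # ['1.0', '3.0', '0.5']
print(formatted)  # ['1.00', '3.00', '0.50']
```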

score 0 · Accepted Answer

You can use a dictionary to do this:

Products = ['Apple', 'Apple', 'Apple', 'Orange', 'Banana', 'Banana', 'Peach', 'Pineapple', 'Pineapple']
Prices = ['1.00', '2.00', '1.50', '3.00', '0.50', '1.50', '2.00', '1.00', '1.00']

Prices=[float(price) for price in Prices]

di={}
for prod,price in zip(Products,Prices):
    di.setdefault(prod,[]).append(price)

for key,val in di.items():
    di[key]=min(val)

print(di)

On Python versions before 3.7, where dict key order is arbitrary, this prints something like {'Orange': 3.0, 'Pineapple': 1.0, 'Apple': 1.0, 'Peach': 2.0, 'Banana': 0.5}

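If you want the two lists in the same order as the original, you can do this:

from collections import OrderedDict

# fromkeys drops duplicates while keeping first-seen order
new_prod=list(OrderedDict.fromkeys(Products))
new_prices=[di[item] for item in new_prod]

The two lists are then: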

['Apple', 'Orange', 'Banana', 'Peach', 'Pineapple']
[1.0, 3.0, 0.5, 2.0, 1.0]

score 0 · Accepted Answer

>>> from collections import OrderedDict
>>> products = ['Apple', 'Apple', 'Apple', 'Orange', 'Banana', 'Banana', 'Peach', 'Pineapple', 'Pineapple']
>>> prices =  ['1.00', '2.00', '1.50', '3.00', '0.50', '1.50', '2.00', '1.00', '1.00']
>>> dic = OrderedDict()
>>> for x,y in zip(products,prices):
...     dic.setdefault(x, []).append(y)
...     
>>> list(dic.keys())
['Apple', 'Orange', 'Banana', 'Peach', 'Pineapple']
>>> [min(val, key = float) for val in dic.values()]
['1.00', '3.00', '0.50', '2.00', '1.00']

score 0 · Accepted Answer

Not the shortest solution, but it illustrates the point: suppose your lists are products and prices, respectively. Then:

lookup = dict()
for prod, price in zip(products, prices):
    if prod not in lookup:
        lookup[prod] = price
    else:
        # key=float compares the prices numerically even if they are strings
        lookup[prod] = min(price, lookup[prod], key=float)

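At this point, the lookup dict contains each of your products and its minimum price. A dict is definitely a better data structure for this than two lists; if you really want the result as two separate lists, you can do this: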

new_prods = []
new_prices = []
for prod, price in lookup.items():
    new_prods.append(prod)
    new_prices.append(price)
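The same split can be written without an explicit loop. A short sketch, using a small hand-written lookup dict as sample data:

```python
lookup = {'Apple': '1.00', 'Orange': '3.00', 'Banana': '0.50'}

# Keys and values of the same dict iterate in matching order.
new_prods = list(lookup)
new_prices = list(lookup.values())

# Equivalent one-liner; note that it raises ValueError if lookup is empty.
prods_alt, prices_alt = map(list, zip(*lookup.items()))

print(new_prods)   # ['Apple', 'Orange', 'Banana']
print(new_prices)  # ['1.00', '3.00', '0.50']
```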
