
Problem

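I have a list of objects. Each object has two attributes: "score" and "coordinates". I need to find the N largest objects in the list based on the score attribute. My main problem is sorting the objects by their score attribute. The sort can be partial, because I'm only interested in the N largest objects.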

Current solution

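My current approach is neither the most elegant nor the most efficient. The idea is to create a dictionary of object indices and their scores, then sort the list of scores and use the dictionary to look up the objects that produced the largest scores.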

These are the steps:

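  1. Create a list of scores. Each element of the list corresponds to one object: the first entry is the score of the first object, the second entry is the score of the second object, and so on.

  2. Create a dictionary using the objects' scores as keys and the object indices as values.

  3. Sort the list of scores using heapq to get the N largest scores.

  4. Use the dictionary to get the objects with the largest scores.

  5. Create a new list containing only the N highest-scoring objects.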

Code snippet

This is my sorting function:

import random
import heapq


# Gets the N objects with the largest score:
def getLargest(N, objects):
    # Set output objects:
    outobjects = objects

    # Get the total of objects in list:
    totalobjects = len(objects)

    # Check if the total number of objects is bigger than the N requested
    # largest objects:

    if totalobjects > N:

        # Get the "score" attributes from all the objects:
        objectScores = [o.score for o in objects]

        # Create a dictionary with the index of the objects and their score.
        # I'm using a dictionary to keep track of the largest scores and
        # the objects that produced them:
        objectIndices = range(totalobjects)
        objectDictionary = dict(zip(objectIndices, objectScores))

        # Get the N largest objects based on score:
        largestObjects = heapq.nlargest(N, objectScores)
        print(largestObjects)

        # Prepare the output list of objects:
        outobjects = [None] * N

        # Look for those objects that produced the
        # largest score:
        for k in range(N):
            # Get current largest object:
            currentLargest = largestObjects[k]
            # Get its original position on the keypoint list:
            position = objectScores.index(currentLargest)
            # Index the corresponding keypoint and store it
            # in the output list:
            outobjects[k] = objects[position]

    # Done:
    return outobjects
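A side note on getLargest as written: objectScores.index(currentLargest) always returns the first position holding that score, so when two objects share a score, the same object ends up in the output twice. objectDictionary is also built but never read. A minimal sketch that keeps the index-based idea while avoiding this has heapq.nlargest choose the indices directly, using each index's score as the comparison key. It assumes objects is a list whose elements have a numeric score attribute, and get_largest_indices is a hypothetical name:

```python
import heapq
from types import SimpleNamespace


def get_largest_indices(n, objects):
    """Return the objects with the n largest .score values, highest first.

    heapq.nlargest runs over the indices 0..len(objects)-1 and compares them
    by the score of the object each index points to, so every index (and
    therefore every object) can be selected at most once, even with ties.
    """
    indices = heapq.nlargest(n, range(len(objects)), key=lambda i: objects[i].score)
    return [objects[i] for i in indices]


# Usage: two objects share the top score; both are returned, not one twice.
objs = [SimpleNamespace(score=s) for s in (0.5, 0.9, 0.9, 0.1)]
top = get_largest_indices(2, objs)
print([o.score for o in top])  # [0.9, 0.9]
print(top[0] is top[1])        # False
```

Unlike getLargest, this sketch also returns the objects sorted by score when N >= len(objects), instead of handing back the original unsorted list.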

This snippet generates 100 random objects to test my approach. The last loop prints the N = 3 objects with the largest score.

# Create a list with random objects:
totalObjects = 100
randomObjects = []


# Test object class:
class Object(object):
    pass


# Generate a list of random objects
for i in range(totalObjects):
    # Instance of objects:
    tempObject = Object()
    # Set the object's random score
    random.seed()
    tempObject.score = random.random()
    # Set the object's random coordinates:
    tempObject.coordinates = (random.randint(0, 5), random.randint(0, 5))
    # Store object into list:
    randomObjects.append(tempObject)

# Get the 3 largest objects sorted by score:
totalLargestObjects = 3
largestObjects = getLargest(totalLargestObjects, randomObjects)

# Print the filtered objects:
for i in range(len(largestObjects)):
    # Get the current object in the list:
    currentObject = largestObjects[i]
    # Get its score:
    currentScore = currentObject.score
    # Get its coordinates as a tuple (x,y)
    currentCoordinates = currentObject.coordinates
    # Print the info:
    print("object: " + str(i) + " score: " + str(currentScore) + " x: " + str(
        currentCoordinates[0]) + " y: " + str(currentCoordinates[1]))

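My current approach gets the job done, but there must be a more Pythonic (more vectorized) way to achieve the same result. My background is mainly C++, and I'm still learning Python. Any feedback is welcome.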

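Additional information

Originally, I was looking for something similar to C++'s std::nth_element. NumPy seems to provide this functionality in Python through partition. Unfortunately, std::nth_element accepts a custom ordering predicate, but NumPy's partition does not. I ended up using heapq, which works well and returns the results in the desired order, but I don't know the best way to order objects by one of their attributes.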
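On the NumPy idea: np.partition does take no predicate. A common workaround, sketched here and not taken from the question, is to copy the sort key into an array and call np.argpartition on it. That function returns positions rather than values, so the original objects can still be looked up. The sketch assumes NumPy (a third-party package) is installed and that 1 <= N <= len(objects). get_largest_np is a hypothetical name, and SimpleNamespace stands in for the question's Object class:

```python
from types import SimpleNamespace

import numpy as np


def get_largest_np(n, objects):
    """Return the n objects with the largest .score, highest first.

    np.partition has no predicate argument, so the sort key is copied into a
    NumPy array and np.argpartition is applied to that array, yielding the
    positions of the n largest scores (assumes 1 <= n <= len(objects)).
    """
    scores = np.fromiter((o.score for o in objects), dtype=float, count=len(objects))
    # Like std::nth_element: the last n positions hold the n largest, unordered.
    top = np.argpartition(scores, -n)[-n:]
    # Fully sort only those n positions, by descending score.
    top = top[np.argsort(scores[top])[::-1]]
    return [objects[i] for i in top]


objs = [SimpleNamespace(score=s, coordinates=(0, 0)) for s in (0.2, 0.9, 0.5, 0.7, 0.1)]
print([o.score for o in get_largest_np(3, objs)])  # [0.9, 0.7, 0.5]
```

The argpartition step plays the role of std::nth_element, and only the N winners are then fully sorted. The generator that extracts the scores still visits every object in Python, so any speed advantage over heapq is likely to show up mainly when the scores are already stored in an array.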


2 Answers


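Tuples are what you need. Instead of storing only the scores in the heap, store (score, object) tuples. The tuples are compared by score first, and you get back a list of tuples from which you can retrieve the original objects. This saves you the extra step of looking up the objects by their scores: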

heapq.nlargest(3, ((obj.score, obj) for obj in randomObjects))
# [(0.9996643881256989, <__main__.Object object at 0x155f730>), (0.9991398955041872, <__main__.Object object at 0x119e928>), (0.9858047551444177, <__main__.Object object at 0x15e38c0>)]

For a real-world example: https://akuiper.com/console/g6YuNa_1WClp
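One caveat with (score, obj) tuples: when two scores are equal, tuple comparison moves on to the second element and tries to compare the objects themselves. For plain class instances such as the question's Object class, that raises TypeError. A minimal sketch of the usual workaround puts a unique tie-breaker, here the list position from enumerate, between the score and the object so the objects are never compared. make_object is a hypothetical helper for building test data:

```python
import heapq


class Object(object):
    pass


def make_object(score):
    # Hypothetical helper: build a test object with the given score.
    o = Object()
    o.score = score
    return o


objs = [make_object(0.5), make_object(0.9), make_object(0.9)]

# Plain (score, obj) tuples: the tied 0.9 scores force an Object < Object comparison.
try:
    heapq.nlargest(2, ((o.score, o) for o in objs))
    tied_compare_failed = False
except TypeError as exc:
    tied_compare_failed = True
    print("TypeError:", exc)

# (score, index, obj): indices are unique, so comparison never reaches obj.
top = heapq.nlargest(2, ((o.score, i, o) for i, o in enumerate(objs)))
print([(score, i) for score, i, _ in top])  # [(0.9, 2), (0.9, 1)]
```

With i as the tie-breaker, the later object wins a tie. Using -i instead makes earlier objects win.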

Or, as @shriakhilc commented, use the key parameter of heapq.nlargest to specify that the objects should be compared by score:

heapq.nlargest(3, randomObjects, key=lambda o: o.score)
answered 2021-12-27T23:56:37.667

I suggest using Python's built-in sorting together with a lambda function. See here: https://docs.python.org/3/howto/sorting.html#sortinghowto

Basically, this is what you could have:

myList = [
  {'score': 32, 'coordinates': [...]},
  {'score': 12, 'coordinates': [...]},
  {'score': 20, 'coordinates': [...]},
  {'score': 8, 'coordinates': [...]},
  {'score': 40, 'coordinates': [...]},
]

# Sort by score DESCENDING
mySortedList = sorted(myList, key=lambda element: element['score'], reverse=True)

# Retrieve top 3 results
myTopResults = mySortedList[0:3]
answered 2021-12-28T00:00:48.153
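The answer above looks up element['score'], which works for dictionaries. The objects in the question store score as an attribute, so the key needs attribute access instead. The following minimal sketch uses operator.attrgetter, a standard-library alternative to the lambda, and checks that the full sort and heapq.nlargest select the same top three. SimpleNamespace objects stand in for the question's Object class, and the data is randomly generated for illustration only:

```python
import heapq
import random
from operator import attrgetter
from types import SimpleNamespace

# Illustrative test data shaped like the question's objects.
random.seed(0)
objects = [
    SimpleNamespace(score=random.random(),
                    coordinates=(random.randint(0, 5), random.randint(0, 5)))
    for _ in range(100)
]

by_score = attrgetter("score")  # equivalent to lambda o: o.score

# Full sort, then slice the first three.
top_sorted = sorted(objects, key=by_score, reverse=True)[:3]

# Partial selection: heapq keeps only three candidates while scanning.
top_heap = heapq.nlargest(3, objects, key=by_score)

for rank, obj in enumerate(top_heap):
    x, y = obj.coordinates
    print(f"object: {rank} score: {obj.score} x: {x} y: {y}")
```

The heapq documentation describes nlargest(n, iterable, key) as equivalent to sorted(iterable, key=key, reverse=True)[:n], and notes that it performs best for small n, while sorted() is more efficient for larger n.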