python - Combine two lists and sort them by referencing a dictionary in Python

Question

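I have what is (in my opinion) a pretty complicated problem. I'll try to be brief, although to fully understand it you may need to click on my profile and look at the (only other) two questions I've posted on StackOverflow. In short: I have two lists. One consists of email strings that contain a facility name and a date of incident. The other consists of the facility ID for each email (I use one of the regex functions below to get this list). I use regex to search each string for this information. The three regex functions are: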

import re


def find_facility_name(incident):
    # Facility name: the text after "for" on the Subject line.
    pattern = re.compile(r'Subject:.*?for\s(.+?)\n')
    findPat1 = pattern.search(incident)
    facility_name = findPat1.group(1)

    return facility_name


def find_date_of_incident(incident):
    # Date: the text after "Date of Incident:" up to the end of that line.
    pattern = re.compile(r'Date of Incident:\s(.+?)\n')
    findPat2 = pattern.search(incident)
    incident_date = findPat2.group(1)

    return incident_date


def find_facility_id(incident):
    # Facility ID: the first run of three digits that ends a line.
    # Raw string so that \d is not treated as a string escape.
    pattern = re.compile(r'(\d{3})\n')
    findPat3 = pattern.search(incident)
    f_id = findPat3.group(1)

    return f_id
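To make the expected input concrete, here is a sketch that runs the three functions (restated compactly) on a single email. The email layout below, with a Subject line ending in "for <facility name>", a "Facility ID:" line and a "Date of Incident:" line, is a hypothetical guess based on the regexes; the real emails may look different.

```python
import re

# Hypothetical email layout, guessed from the regexes; the real emails may differ.
SAMPLE_EMAIL = (
    "Subject: Incident report for Facility #1\n"
    "Facility ID: 001\n"
    "Date of Incident: 2013-05-02\n"
)


def find_facility_name(incident):
    return re.search(r'Subject:.*?for\s(.+?)\n', incident).group(1)


def find_date_of_incident(incident):
    return re.search(r'Date of Incident:\s(.+?)\n', incident).group(1)


def find_facility_id(incident):
    return re.search(r'(\d{3})\n', incident).group(1)


print(find_facility_name(SAMPLE_EMAIL))     # Facility #1
print(find_date_of_incident(SAMPLE_EMAIL))  # 2013-05-02
print(find_facility_id(SAMPLE_EMAIL))       # 001
```

Note that re.search returns None when a pattern does not match, so calling .group(1) on an email that lacks one of these lines raises an AttributeError. Each field also has to end with a newline for these patterns to match.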

I also have a dictionary in the following format:

d = {'001': 'Facility #1', '002': 'Another Facility'}  # ...etc.

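I'm trying to combine the two lists and sort them by the dictionary's key values, then by the date of incident. Since the key values are tied to the facility names, this should automatically group emails from the same facility together. To do this, I tried the following functions: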

def get_facility_ids(incident_list):
    '''(lst) -> lst

    Return a new list from incident_list that inserts the facility IDs from the
    get_facilities dictionary into each incident.

    '''
    f_id = []
    for incident in incident_list:
        name = find_facility_name(incident)
        # Scan every entry of d for a value equal to this facility name.
        for k in d:
            if name == d[k]:
                f_id.append(k)

    return f_id

id_list = get_facility_ids(incident_list)

def combine_lists(L1, L2):
    combo_list = []
    for i in range(len(L1)):
        combo_list.append(L1[i] + L2[i])

    return combo_list

combination = combine_lists(id_list, incident_list)

def get_sort_key(incident):
    '''(str) -> tup

    Return a tuple from incident containing the facility id as the first
    value and the date of the incident as the second value.

    '''
    return (find_facility_id(incident), find_date_of_incident(incident))

final_list = sorted(combination, key=get_sort_key)

Here is an example of what my input might look like and the output I want:

d = {'001' : 'Facility #1', '002' : 'Another Facility'...etc.}
input: first_list = ['email_1', 'email_2', etc.]
first output: next_list = ['facility_id_for_1+email_1', 'facility_id_for_2 + email_2', etc.]
DESIRED OUTPUT: FINAL_LIST = sorted(next_list, key=facility_id, date of incident)

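The only problem is that the key values don't correctly match what is found in each individual email string. Some do, and some are completely random. I have no idea why this happens, but I feel it has something to do with the way I'm combining the two lists. Can anyone help this humble n00b? Thanks!!!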

score 0 · Accepted Answer

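First, I'd suggest reversing your ID-to-name dictionary. Looking up a value by its key is very fast, but looking up a key by its value is very slow, because every entry has to be checked.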

rd = { name: id_num for id_num, name in d.items() }
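A quick check of what the reversed dictionary looks like, using the two entries from the question's example (the real dictionary has more). The second part uses hypothetical IDs to show one caveat: if two IDs ever map to the same name, only the last one survives in the reversed dictionary.

```python
# The two example entries from the question; the real dictionary is longer.
d = {'001': 'Facility #1', '002': 'Another Facility'}

# Swap keys and values so a facility name maps back to its ID.
rd = {name: id_num for id_num, name in d.items()}

print(rd)                      # {'Facility #1': '001', 'Another Facility': '002'}
print(rd['Another Facility'])  # 002

# Hypothetical IDs sharing one name: the later ID overwrites the earlier one.
dup = {'003': 'Facility #1', '004': 'Facility #1'}
rdup = {name: id_num for id_num, name in dup.items()}
print(rdup)                    # {'Facility #1': '004'}
```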

Then your first function can be replaced with a list comprehension:

id_list = [rd[find_facility_name(incident)] for incident in incident_list]

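This may also reveal why the values in your results get mixed up. If an incident's facility name isn't in your dictionary, this code raises a KeyError (whereas your old function would just skip it). Skipping an incident makes id_list shorter than incident_list, so every ID after that point gets paired with the wrong email.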
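Here is a small, made-up reproduction of that misalignment. The names list stands in for the output of find_facility_name on three emails, and the second name contains a hypothetical typo so that it matches nothing in d.

```python
d = {'001': 'Facility #1', '002': 'Another Facility'}
rd = {name: id_num for id_num, name in d.items()}

# Hypothetical facility names pulled from three emails; 'Facilty #1' is a typo.
names = ['Another Facility', 'Facilty #1', 'Facility #1']
emails = ['email_A', 'email_B', 'email_C']

# Old approach: the nested loop silently skips names that match no value in d.
id_list = [k for name in names for k in d if d[k] == name]
print(id_list)                     # ['002', '001']

# Pairing by position now shifts: email_B receives the ID meant for email_C,
# and email_C is dropped entirely.
print(list(zip(id_list, emails)))  # [('002', 'email_A'), ('001', 'email_B')]

# Reversed-dictionary approach: the bad name fails loudly instead.
try:
    [rd[name] for name in names]
except KeyError as exc:
    print('unknown facility:', exc.args[0])  # unknown facility: Facilty #1
```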

Your combine function is very similar to Python's built-in zip function. I'd replace it with:

combination = [f_id + incident for f_id, incident in zip(id_list, incident_list)]
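One thing zip does silently: it stops at the shorter input, so a length mismatch (for example, when get_facility_ids skips an incident) truncates the result instead of raising an error. Below is a sketch of a guard; the helper name combine and the data are hypothetical. On Python 3.10 and later, zip(id_list, incident_list, strict=True) performs the same check by raising ValueError.

```python
def combine(id_list, incident_list):
    """Prefix each incident with its facility ID, refusing mismatched lengths."""
    if len(id_list) != len(incident_list):
        raise ValueError(
            f'{len(id_list)} IDs for {len(incident_list)} incidents'
        )
    return [f_id + incident for f_id, incident in zip(id_list, incident_list)]


print(combine(['001', '002'], ['email_A', 'email_B']))  # ['001email_A', '002email_B']

# Plain zip silently drops the unmatched third email.
print(list(zip(['001', '002'], ['email_A', 'email_B', 'email_C'])))
# [('001', 'email_A'), ('002', 'email_B')]
```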

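However, since you build the first list from the second, it may make more sense to build the combined version directly instead of making separate lists and combining them in another step. Here is an update to the list comprehension above that produces the combination result directly: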

combination = [rd[find_facility_name(incident)] + incident
               for incident in incident_list]

To sort, you can use the ID string we just prepended to each email message instead of parsing the message again to find the ID:

combination.sort(key=lambda x: (x[0:3], find_date_of_incident(x)))

The 3 in the slice is based on your example, where "001" and "002" are the ID values. If the real IDs are longer or shorter, you'll need to adjust it.
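A self-contained sketch of that sort. The combined strings are made up (a three-character ID prefix followed by a guessed email layout), and find_date_of_incident is the question's regex restated. Sorting by a tuple compares the ID first and only falls back to the date when the IDs are equal.

```python
import re


def find_date_of_incident(incident):
    return re.search(r'Date of Incident:\s(.+?)\n', incident).group(1)


# Made-up combined strings: a 3-character facility ID followed by the email text.
combination = [
    '002Subject: Report for Another Facility\nDate of Incident: 2013-04-01\n',
    '001Subject: Report for Facility #1\nDate of Incident: 2013-05-02\n',
    '001Subject: Report for Facility #1\nDate of Incident: 2013-01-15\n',
]

# Sort by (ID prefix, date); the date only breaks ties between equal IDs.
combination.sort(key=lambda x: (x[0:3], find_date_of_incident(x)))

print([(x[0:3], find_date_of_incident(x)) for x in combination])
# [('001', '2013-01-15'), ('001', '2013-05-02'), ('002', '2013-04-01')]
```

These dates sort correctly as plain strings only because they are written year-month-day.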

score 0 · Accepted Answer

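So here's what I think is happening; please correct me if I'm wrong. "incident_list" is a list of email strings. You go through each email and find the facility name, because you have an external dictionary where (key: value) = (facility ID: facility name). From that dictionary you pull the facility IDs into "id_list".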

You combine the lists to get a list of strings [facility ID + email, ...], and then you want it sorted by the tuple (facility ID, date of incident).

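It looks like you're searching for the facility twice, once by ID and once by name. If the two identify the same facility, you can skip a step. The best approach is then to do everything in one pass using tuples: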

event_list = ['email1', 'email2', ...]

unsorted_list = []
for email in event_list:
    f_id = find_facility_id(email)
    date = find_date_of_incident(email)
    # (sort field 1, sort field 2, the string you actually want to keep)
    mytuple = (f_id, date, f_id + email)
    unsorted_list.append(mytuple)

final_list = sorted(unsorted_list, key=lambda mytup:(mytup[0], mytup[1]))

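You then get a simple list of tuples, sorted by the first element (the ID, as a string) and then by the second element (the date, as a string). If you only need the list of strings (id + email), build a list containing the last element of each tuple: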

FINALLIST = [ tup[-1] for tup in final_list ]
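One caveat with sorting on the date as a string: it is only chronological when the dates are written year-first (for example 2013-05-02). With a format like 05/02/2013, strings are compared character by character, so 12/01/2012 lands after 05/02/2013. The sketch below parses the date with datetime.strptime inside the sort key; the %m/%d/%Y format and the sample tuples are assumptions, so adjust them to the real emails.

```python
from datetime import datetime

# Assumed date format in the emails; change it to match the real data.
DATE_FORMAT = '%m/%d/%Y'

# Hypothetical (id, date, id + email) tuples, shaped like those built above.
unsorted_list = [
    ('001', '05/02/2013', '001 email B'),
    ('002', '01/10/2013', '002 email C'),
    ('001', '12/01/2012', '001 email A'),
]

# Plain tuple sort compares the date strings character by character.
naive = [tup[-1] for tup in sorted(unsorted_list)]
print(naive)      # ['001 email B', '001 email A', '002 email C']


def sort_key(tup):
    """Sort by facility ID, then by the actual calendar date."""
    f_id, date_text, _ = tup
    return (f_id, datetime.strptime(date_text, DATE_FORMAT))


FINALLIST = [tup[-1] for tup in sorted(unsorted_list, key=sort_key)]
print(FINALLIST)  # ['001 email A', '001 email B', '002 email C']
```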
