python - Compare two lists of tuples by the first value of each tuple (but return all tuple values)

Question

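I am trying to compare the output of two different systems by finding the devices that are unique to System A, the devices that are unique to System B, and finally the devices that exist in both systems.

Right now I have the data from both systems as lists of tuples. My sample data looks like this: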

system_a_devices = [("host1.test.local", "Test 1 Group"), ("host5.testing.lan", "LAN Test Group"), ("server5.hello.local", "Hello Corporation, Inc."), ("desktop1.corp.tld", "Corporate TLD, Ltd.")]

system_b_devices = [("desktop1.corp.tld", "Corporate TLD, Ltd."), ("host1.test.local", "Test One Group"), ("server6.hello.local", "Hello Corporation, Inc.")]

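The first value in each tuple is the host's FQDN and the second is a descriptive name for the device (in this particular example, the customer name). Although the customer names are needed in the final result, they do not have to match (see "Test 1 Group" and "Test One Group", which share the same FQDN). The final result may therefore contain either "Test 1 Group" or "Test One Group", since both work for what I am trying to accomplish (though System B most likely has the most accurate customer name data).

Only the FQDN (the first value in the tuple) should be considered when determining what is unique to each system. In addition, each of the two systems may return its list in any random order, and the number of tuples (FQDN/customer name pairs) in each list will vary.

My final result should look something like this: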

system_a_unique = [("host5.testing.lan", "LAN Test Group"), ("server5.hello.local", "Hello Corporation, Inc.")]

system_b_unique = [("server6.hello.local", "Hello Corporation, Inc.")]

both_systems = [("host1.test.local", "Test One Group"), ("desktop1.corp.tld", "Corporate TLD, Ltd.")]

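As I mentioned earlier, the description/customer name in the "both_systems" list can come from either system, but System B probably has better/cleaner data, so I would use it if that does not take much extra effort.

How would I accomplish this efficiently? Would the better question to ask be how should I structure my data output from System A and System B to better accomplish this (i.e. list of tuples is a bad idea)?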

score 1 · Accepted Answer

Would the better question to ask be how should I structure my data output from System A and System B to better accomplish this (i.e. list of tuples is a bad idea)?

I have to say that, yes, a simple move to dicts would make this trivial.

system_a_devices = {"host1.test.local": "Test 1 Group", "host5.testing.lan": "LAN Test Group", "server5.hello.local": "Hello Corporation, Inc.", "desktop1.corp.tld": "Corporate TLD, Ltd."}
system_b_devices = {"desktop1.corp.tld": "Corporate TLD, Ltd.", "host1.test.local": "Test One Group", "server6.hello.local": "Hello Corporation, Inc."}

Now you can just do straightforward list comps:

system_a_unique = [tup for tup in system_a_devices.items() if tup[0] not in system_b_devices]
system_b_unique = [tup for tup in system_b_devices.items() if tup[0] not in system_a_devices]
both_systems = [tup for tup in system_b_devices.items() if tup[0] in system_a_devices]
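If the two systems keep returning lists of tuples, the dicts do not have to be typed by hand: one `dict()` call per list does the conversion. A minimal sketch using the sample data from the question (the `_list` names are made up here to keep the raw lists apart from the dicts, and it assumes each FQDN appears at most once per system, since `dict()` keeps only the last value for a repeated key):

```python
# Raw output as returned by each system (sample data from the question).
system_a_list = [
    ("host1.test.local", "Test 1 Group"),
    ("host5.testing.lan", "LAN Test Group"),
    ("server5.hello.local", "Hello Corporation, Inc."),
    ("desktop1.corp.tld", "Corporate TLD, Ltd."),
]
system_b_list = [
    ("desktop1.corp.tld", "Corporate TLD, Ltd."),
    ("host1.test.local", "Test One Group"),
    ("server6.hello.local", "Hello Corporation, Inc."),
]

# dict() accepts any iterable of (key, value) pairs.
system_a_devices = dict(system_a_list)
system_b_devices = dict(system_b_list)

# "key in dict" is a hash lookup, so each membership test is cheap.
system_a_unique = [tup for tup in system_a_devices.items() if tup[0] not in system_b_devices]
system_b_unique = [tup for tup in system_b_devices.items() if tup[0] not in system_a_devices]
# Iterating over system B's items means shared hosts carry system B's names.
both_systems = [tup for tup in system_b_devices.items() if tup[0] in system_a_devices]

print(system_a_unique)
print(system_b_unique)
print(both_systems)
```

On Python 3.7 and later dicts keep insertion order, so each result list follows the order in which the corresponding system returned its devices.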

score 0 · Accepted Answer

You can use set operations on the FQDNs to find which are unique to each system and which are on both, and then use dicts to look up the device names based on FQDNs:

# create FQDN -> device name dicts for each system
devices_a = dict(system_a_devices)
devices_b = dict(system_b_devices)

# create a set of FQDNs for each system
# (taken from the dicts; the original lists of tuples have no .keys() method)
fqdn_set_a = set(devices_a.keys())
fqdn_set_b = set(devices_b.keys())

# compute FQDNs unique to each system and those which are not unique
unique_fqdns_a = fqdn_set_a - fqdn_set_b
unique_fqdns_b = fqdn_set_b - fqdn_set_a
non_unique_fqdns = fqdn_set_a & fqdn_set_b

# now add device names using the FQDN -> device name dicts
system_a_unique = [(fqdn, devices_a[fqdn]) for fqdn in unique_fqdns_a]
system_b_unique = [(fqdn, devices_b[fqdn]) for fqdn in unique_fqdns_b]
# note: for FQDNs found on both systems, use the device name from system B
both_systems = [(fqdn, devices_b[fqdn]) for fqdn in non_unique_fqdns]
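The set-based steps can also be wrapped in one reusable function. The sketch below is only one possible arrangement: the name `compare_devices` is hypothetical, it relies on dict key views supporting the `-` and `&` set operators directly, and it sorts the results because sets have no defined order. It assumes each FQDN appears at most once per system.

```python
def compare_devices(system_a, system_b):
    """Split two lists of (fqdn, name) tuples into A-only, B-only and shared lists."""
    # FQDN -> device name lookup for each system
    devices_a = dict(system_a)
    devices_b = dict(system_b)

    # dict key views behave like sets, so no explicit set() call is needed
    only_a = devices_a.keys() - devices_b.keys()
    only_b = devices_b.keys() - devices_a.keys()
    shared = devices_a.keys() & devices_b.keys()

    # sort the FQDNs so the output order is reproducible
    a_unique = [(fqdn, devices_a[fqdn]) for fqdn in sorted(only_a)]
    b_unique = [(fqdn, devices_b[fqdn]) for fqdn in sorted(only_b)]
    # for hosts found on both systems, take the name from system B
    both = [(fqdn, devices_b[fqdn]) for fqdn in sorted(shared)]
    return a_unique, b_unique, both


# Sample data from the question
sample_a = [
    ("host1.test.local", "Test 1 Group"),
    ("host5.testing.lan", "LAN Test Group"),
    ("server5.hello.local", "Hello Corporation, Inc."),
    ("desktop1.corp.tld", "Corporate TLD, Ltd."),
]
sample_b = [
    ("desktop1.corp.tld", "Corporate TLD, Ltd."),
    ("host1.test.local", "Test One Group"),
    ("server6.hello.local", "Hello Corporation, Inc."),
]

a_only, b_only, on_both = compare_devices(sample_a, sample_b)
print(a_only)
print(b_only)
print(on_both)
```

Because the input order is discarded when the sets are built, this version works the same however each system happens to order its list.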
