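I have a list of hostnames. Each one is identified by a region code:

  1. AP- : Asia Pacific
  2. EM- : Europe, Middle East, and Africa
  3. AM- : Americas

The actual list has about 1000 hostnames, and the idea is to pick out the hostnames that have no region code. I could do this with string manipulation, but I would like to know how to write a working regular expression that selects the hostnames without a region code (like the last 4 items in the list below).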

import re

host_name = ["XXX_Guangzhou_AP-CN-BEI-7517","XXX_Jakarta_AP-ID-JAK-0001","XXX_TaiPei_AP-TW-TPE-0002","XXX_Dubai_EM-AE-DUB-1012",
"XXX_Viladecans_EM-ES-VIL-1002","XXX_Ringsted_EM-DK-RIN-0001","XXX_Bogota_AM-CO-BOG-1033","XXX_Hamburg_EM-DE-HAM-1004",
"XXX_Bangkok_TH127","XXX_Bangkok_TH124","XXX_Eagan_6231","XXX_Martinez_AR218"]

hostRegex = re.compile(r"[^(AP\-|EM\-|AM\-)]")
mo = list(filter(hostRegex.findall,host_name))
print(mo)

1 Answer


You can use:

hostRegex = re.compile(r"_(A[PM]|EM)-")
mo = list(filter(lambda x: not hostRegex.search(x),host_name))

The regular expression _(A[PM]|EM)- matches an underscore (_), then AP, AM, or EM, and then a hyphen (-).
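To see why the pattern in the question keeps every hostname, it may help to compare the two patterns on a pair of hostnames from the list. This is only a small sketch: the names region_re, char_class_re, with_region, and without_region are made up for illustration. Square brackets create a character class, so [^(AP\-|EM\-|AM\-)] matches any single character that is not one of (, ), A, P, E, M, -, or |, and nearly every hostname contains such a character.

```python
import re

# Pattern from the answer: underscore, then AP, AM, or EM, then a hyphen.
region_re = re.compile(r"_(A[PM]|EM)-")

# Pattern from the question: a negated character class. It matches any
# single character other than ( ) A P E M - |, not the strings "AP-" etc.
char_class_re = re.compile(r"[^(AP\-|EM\-|AM\-)]")

with_region = "XXX_Dubai_EM-AE-DUB-1012"
without_region = "XXX_Eagan_6231"

# The answer's pattern tells the two hostnames apart.
print(region_re.search(with_region).group(1))   # EM
print(region_re.search(without_region))         # None

# The question's pattern finds matches in both, so filter() keeps both.
print(bool(char_class_re.findall(with_region)))     # True
print(bool(char_class_re.findall(without_region)))  # True
```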

The filter(lambda x: not hostRegex.search(x),host_name) part returns every item of the host_name list that the pattern does not match.
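If you prefer, the same selection can be written as a list comprehension instead of filter with a lambda. This is just an equivalent sketch on a shortened list; host_regex, sample_hosts, and no_region are illustrative names, not part of the original code.

```python
import re

host_regex = re.compile(r"_(A[PM]|EM)-")

# A shortened sample of the hostnames from the question.
sample_hosts = [
    "XXX_Jakarta_AP-ID-JAK-0001",
    "XXX_Bangkok_TH127",
    "XXX_Bogota_AM-CO-BOG-1033",
    "XXX_Eagan_6231",
]

# Keep only the hostnames in which the region pattern is not found.
no_region = [host for host in sample_hosts if not host_regex.search(host)]
print(no_region)  # ['XXX_Bangkok_TH127', 'XXX_Eagan_6231']
```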

See the Python demo:

import re

host_name = [
    "XXX_Guangzhou_AP-CN-BEI-7517","XXX_Jakarta_AP-ID-JAK-0001","XXX_TaiPei_AP-TW-TPE-0002",
    "XXX_Dubai_EM-AE-DUB-1012","XXX_Viladecans_EM-ES-VIL-1002","XXX_Ringsted_EM-DK-RIN-0001",
    "XXX_Bogota_AM-CO-BOG-1033","XXX_Hamburg_EM-DE-HAM-1004", "XXX_Bangkok_TH127","XXX_Bangkok_TH124",
    "XXX_Eagan_6231","XXX_Martinez_AR218"]

hostRegex = re.compile(r"_(A[PM]|EM)-")
mo = list(filter(lambda x: not hostRegex.search(x),host_name))
print(mo)

Output:

['XXX_Bangkok_TH127', 'XXX_Bangkok_TH124', 'XXX_Eagan_6231', 'XXX_Martinez_AR218']
Answered 2022-01-28T16:18:42.113