python - Ignoring strings in a list using regular expressions in Python

Question

I am scraping some information from a website; for example, I am fetching some customers' addresses:

address = ['Mr Thomas',
 '+(91)-9849633132, 9959455935',
 '+(91)-9849633132',
 '9196358485',
 '8846853128',
 '8-4-236/2']

From the list above, I want to ignore the strings that start like phone numbers, i.e. with +(91), 9, or 8, so I used the following regular expression:

import re


result = [i for i in address if not re.match(r"[98]\B", i)]

Result:

['Mr Thomas','+(91)-9849633132, 9959455935','+(91)-9849633132','8-4-236/2']

That is, the strings starting with 9 and 8 are ignored, but I also want to ignore the strings starting with +(91). Can anyone tell me how to do that?

score 1 · Accepted Answer

Just add another check for +(91), using the | (or) operator. Like this:

>>> [i for i in address if not re.match(r"[98]\B|\+\(91\)\B", i)]
['Mr Thomas', '8-4-236/2']

Note that you have to escape +, ( and ), because they are special characters in a regular expression.
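If escaping by hand feels error-prone, the escaping can be delegated to re.escape(). The following is only a sketch of that idea, not part of the original answer; the names country_prefix and phone_start are made up for illustration.

```python
import re

address = ['Mr Thomas',
           '+(91)-9849633132, 9959455935',
           '+(91)-9849633132',
           '9196358485',
           '8846853128',
           '8-4-236/2']

# re.escape() backslash-escapes the regex metacharacters in a literal string,
# so '+', '(' and ')' do not have to be escaped manually.
country_prefix = re.escape('+(91)')

# Same alternation as in the answer, compiled once and reused for every string.
phone_start = re.compile(r"[98]\B|" + country_prefix + r"\B")

result = [s for s in address if not phone_start.match(s)]
print(result)  # ['Mr Thomas', '8-4-236/2']
```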

By the way, using filter might be more efficient than a list comprehension:

>>> list(filter(lambda x: not re.match(r"[98]\B|\+\(91\)\B", x), address))
['Mr Thomas', '8-4-236/2']

Though I can't be sure.

Edit: It turns out it is not more efficient. Still, I find it more self-documenting; feel free to use whichever you prefer.
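To check the efficiency question on your own machine, something like the sketch below could be used. It assumes Python 3, where filter() returns a lazy iterator and needs list() around it; the function names are hypothetical, and the timings depend entirely on the interpreter and hardware, so none are shown here.

```python
import re
import timeit

address = ['Mr Thomas',
           '+(91)-9849633132, 9959455935',
           '+(91)-9849633132',
           '9196358485',
           '8846853128',
           '8-4-236/2']

pattern = re.compile(r"[98]\B|\+\(91\)\B")

def with_comprehension():
    # List comprehension variant from the answer.
    return [s for s in address if not pattern.match(s)]

def with_filter():
    # filter() variant; list() materializes the lazy iterator in Python 3.
    return list(filter(lambda s: not pattern.match(s), address))

# Both variants should select exactly the same strings.
assert with_comprehension() == with_filter()

# Print the total time in seconds for 1000 calls of each variant.
print("comprehension:", timeit.timeit(with_comprehension, number=1000))
print("filter:       ", timeit.timeit(with_filter, number=1000))
```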

score 0 · Accepted Answer

result = [i for i in address if not re.match(r"\+[98]\B", i)]

answered 2012-08-07T09:27:24.613

score 0 · Accepted Answer

This does work:

result = [i for i in address if not re.match(r'[+89][-()+0-9/\s]+', i)]

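Why? The '\B' switch is harmful here, because the match cannot occur at the beginning of the string. Also, the suggested search pattern allows whitespace within the numbers.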
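To see how the patterns discussed in this thread behave on the sample data, they can be run side by side. This is a small comparison sketch rather than anything taken from the answers; the dictionary labels are made up for illustration.

```python
import re

address = ['Mr Thomas',
           '+(91)-9849633132, 9959455935',
           '+(91)-9849633132',
           '9196358485',
           '8846853128',
           '8-4-236/2']

# The three patterns that appear in the question and answers.
patterns = {
    "digits only": r"[98]\B",
    "digits or +(91)": r"[98]\B|\+\(91\)\B",
    "character class": r"[+89][-()+0-9/\s]+",
}

# For each pattern, collect the strings that are kept (i.e. not matched).
kept = {
    label: [s for s in address if not re.match(rx, s)]
    for label, rx in patterns.items()
}

for label, strings in kept.items():
    print(label, "->", strings)
# digits only -> ['Mr Thomas', '+(91)-9849633132, 9959455935', '+(91)-9849633132', '8-4-236/2']
# digits or +(91) -> ['Mr Thomas', '8-4-236/2']
# character class -> ['Mr Thomas']
```

In the first two patterns, \B after [98] requires the next character to be another word character, which is why '8-4-236/2' is kept. The character-class pattern also matches '8-4-236/2', so it is worth checking whether that entry should be dropped before choosing it.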
