python - 在 Python 中删除数字（正则表达式）

Question

我正在尝试从字符串中删除所有数字。但是，下一个代码也会删除任何单词中包含的数字。显然，我不想那样。我一直在尝试许多正则表达式，但没有成功。

谢谢！

s = "This must not be deleted, but the number at the end yes 134411"
s = re.sub("\d+", "", s)
print s

结果：

这个一定不能b删，但是最后的数字yes

score 45 · Accepted Answer

在 \d+ 之前添加一个空格。

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> s = re.sub(" \d+", " ", s)
>>> s
'This must not b3 delet3d, but the number at the end yes '

编辑：查看评论后，我决定形成一个更完整的答案。我认为这说明了所有情况。

s = re.sub("^\d+\s|\s\d+\s|\s\d+$", " ", s)

score 21 · Accepted Answer

21

试试这个：

"\b\d+\b"

这将只匹配那些不属于另一个单词的数字。

于 2009-05-03T14:12:44.600 回答

score 7 · Accepted Answer

使用\s不是很好，因为它不处理标签等。更好的解决方案的第一步是：

re.sub(r"\b\d+\b", "", s)

请注意，该模式是一个原始字符串，因为\b通常是字符串的退格转义，我们想要特殊的单词边界正则表达式转义。一个稍微花哨的版本是：

re.sub(r"$\d+\W+|\b\d+\b|\W+\d+$", "", s)

当字符串的开头/结尾有数字时，它会尝试删除前导/尾随空格。我说“尝试”是因为如果最后有多个数字，那么你仍然有一些空格。

score 6 · Accepted Answer

6

还要处理行首的数字字符串：

s = re.sub(r"(^|\W)\d+", "", s)

于 2009-05-03T14:23:58.863 回答

score 4 · Accepted Answer

你可以试试这个

s = "This must not b3 delet3d, but the number at the end yes 134411"
re.sub("(\s\d+)","",s)

结果：

'This must not b3 delet3d, but the number at the end yes'

同样的规则也适用于

s = "This must not b3 delet3d, 4566 but the number at the end yes 134411" 
re.sub("(\s\d+)","",s)

结果：

'This must not b3 delet3d, but the number at the end yes'

score 3 · Accepted Answer

要仅匹配字符串中的纯整数：

\b(?<![0-9-])(\d+)(?![0-9-])\b

它对此做了正确的事情，仅匹配百万之后的所有内容：

max-3 cvd-19 agent-007 8-zoo 2ab c3d ef4 55g h66i jk77 
8m9n o0p2     million     0 22 333  4444

此页面上的所有其他 8 个正则表达式答案都因该输入而以各种方式失败。

第一个 0-9 ... [0-9-] ... 末尾的破折号保留 -007，第二组中的破折号保留 8-。

或 \d 代替 0-9 如果您愿意

在正则表达式101

可以简化吗？

score 2 · Accepted Answer

如果您的号码始终位于字符串的末尾，请尝试：re.sub("\d+$", "", s)

否则，您可以尝试 re.sub("(\s)\d+(\s)", "\1\2", s)

您可以调整反向引用以仅保留一两个空格（\s 匹配任何白色分隔符）

score 2 · Accepted Answer

我不知道你的真实情况是什么样的，但大多数答案看起来他们不会处理负数或小数，

re.sub(r"(\b|\s+\-?|^\-?)(\d+|\d*\.\d+)\b","")

以上还应该处理类似的事情，

“这一定不是b3 delet3d，而是末尾的数字是-134.411”

但这仍然不完整——您可能需要更完整地定义您可以在需要解析的文件中找到的内容。

编辑：还值得注意的是 '\b' 会根据您使用的语言环境/字符集而变化，因此您需要小心一点。

score 1 · Accepted Answer

非正则表达式解决方案：

>>> s = "This must not b3 delet3d, but the number at the end yes 134411"
>>> " ".join([x for x in s.split(" ") if not x.isdigit()])
'This must not b3 delet3d, but the number at the end yes'

按拆分" "，并通过执行检查块是否为数字str().isdigit()，然后将它们重新连接在一起。更详细（不使用列表理解）：

words = s.split(" ")
non_digits = []
for word in words:
    if not word.isdigit():
        non_digits.append(word)

" ".join(non_digits)

score 0 · Accepted Answer

我有一个灯泡时刻，我尝试过并且它有效：

sol = re.sub(r'[~^0-9]', '', 'aas30dsa20')

输出：

aasdsa

score -1 · Accepted Answer

>>>s = "This must not b3 delet3d, but the number at the end yes 134411"
>>>s = re.sub(r"\d*$", "", s)
>>>s

“这一定不是b3 delete3d，而是末尾的数字yes”

这将删除字符串末尾的数字。

python - 在 Python 中删除数字（正则表达式）

11 回答 11

Related

Reference