python-3.x - How to find acronyms that contain digits in a string

Question

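I need to write a function that finds uppercase acronyms, including ones that contain digits, but so far I can only detect acronyms made up of letters alone.

An example:

s = "the EU needs to contribute part of their GDP to improve the IC3 plan"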

I tried

import re

def acronym(s):
    return re.findall(r"\b[A-Z]{2,}\b", s)
print(acronym(s))

But I only get

['EU', 'GDP']

What can I add or change to get

['EU', 'GDP', 'IC3']

Thanks

score 2 · Accepted Answer

Try:

import re

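def acronym(s):
    # The outer (?:...) makes both \b anchors apply to either alternative;
    # without it, each \b would bind to only one side of the |.
    return re.findall(r"\b(?:(?:[0-9]+[A-Z][A-Z0-9]*)|(?:[A-Z][A-Z0-9]+))\b", s)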

print(acronym('3I 33 I3 A GDP W3C'))

Output:

['3I', 'I3', 'GDP', 'W3C']

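This regex means:

Find any word (delimited by \b, which matches a "word boundary") that either

starts with one or more digits, then has at least one uppercase letter, and may continue with further uppercase letters and digits, or
starts with an uppercase letter followed by at least one more uppercase letter or digit.

The ?: makes each group non-capturing, so re.findall returns the whole match as a single string instead of the contents of the two groups (()|()).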
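
To make the effect of ?: concrete, here is a minimal sketch that runs the same kind of pattern twice, once with capturing groups and once with non-capturing groups. The variable names and the outer (?:...) wrapper are assumptions of this sketch, not part of the answer.

```python
import re

text = "3I 33 I3 A GDP W3C"

# Two capturing groups: findall returns one tuple per match,
# with an empty string for the group that did not take part.
with_groups = re.findall(
    r"\b(?:([0-9]+[A-Z][A-Z0-9]*)|([A-Z][A-Z0-9]+))\b", text
)
print(with_groups)
# [('3I', ''), ('', 'I3'), ('', 'GDP'), ('', 'W3C')]

# Same alternatives, non-capturing: findall returns each full match as a string.
without_groups = re.findall(
    r"\b(?:(?:[0-9]+[A-Z][A-Z0-9]*)|(?:[A-Z][A-Z0-9]+))\b", text
)
print(without_groups)
# ['3I', 'I3', 'GDP', 'W3C']
```

With capturing groups you would have to pick the non-empty element out of each tuple yourself, which is why the non-capturing form is more convenient here.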

score 0 · Accepted Answer

This regex will not match bare numbers (e.g. 123):

import re

s = "the EU needs to contribute part of their GDP to improve the IC3 plan"

def acronym(s):
    return re.findall(r"\b([A-Z]{2,}\d*)\b", s)

print(acronym(s))

Prints:

['EU', 'GDP', 'IC3']
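
As a rough check of what this pattern accepts, the sketch below runs it on the question's sentence and on a second string with digits in other positions. The second sample string and the name TRAILING_DIGITS are illustrative choices, not part of the answer.

```python
import re

# Two or more uppercase letters, optionally followed by digits at the end.
TRAILING_DIGITS = re.compile(r"\b([A-Z]{2,}\d*)\b")

def acronym(s):
    return TRAILING_DIGITS.findall(s)

question_text = "the EU needs to contribute part of their GDP to improve the IC3 plan"
print(acronym(question_text))         # ['EU', 'GDP', 'IC3']

# Tokens that start with a digit, or have a digit between letters, are skipped.
print(acronym("3I 33 I3 A GDP W3C"))  # ['GDP']
```

So this form fits acronyms shaped like IC3, but it would need to be loosened if inputs such as W3C or 3I should count as well.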


score 0 · Accepted Answer

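Try this.

It is similar to the answers by Andrej and S. Pellegrino, but it does not capture digit-only strings such as '123', and it accepts digits at any position, not just at the end.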

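Pattern explanation:

\b - matches a word boundary (the start of the word).

(?=[A-Z\d]*[A-Z]) - asserts that the word contains at least one uppercase letter, looking only at the letters and digits of the current word. This is called a positive lookahead. Written as (?=.*[A-Z]), the lookahead would also be satisfied by an uppercase letter anywhere later in the string, so the digit-only word '123' in '123 ABC' would be matched.

[A-Z\d]{2,} - matches two or more uppercase letters or digits.

\b - matches another word boundary (the end of the word).

import re

def acronym(s):
    pattern = r'\b(?=[A-Z\d]*[A-Z])[A-Z\d]{2,}\b'
    return re.findall(pattern, s)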
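
For a quick sanity check, the sketch below runs a lookahead-based pattern on the inputs used in this thread. It assumes the lookahead is limited to the current word ([A-Z\d]*[A-Z]), so that an uppercase letter elsewhere in the string cannot validate a digit-only word; the name ACRONYM and the '123 ABC' sample are illustrative.

```python
import re

# Assumption: the lookahead scans only the letters/digits of the current word.
ACRONYM = re.compile(r"\b(?=[A-Z\d]*[A-Z])[A-Z\d]{2,}\b")

def acronym(s):
    return ACRONYM.findall(s)

question_text = "the EU needs to contribute part of their GDP to improve the IC3 plan"
print(acronym(question_text))         # ['EU', 'GDP', 'IC3']
print(acronym("3I 33 I3 A GDP W3C"))  # ['3I', 'I3', 'GDP', 'W3C']
print(acronym("123 ABC"))             # ['ABC']
```

Single letters such as 'A' are rejected by the {2,} quantifier, and mixed-case words are rejected because lowercase letters are word characters that fall outside [A-Z\d], so the closing \b cannot match inside them.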

Edit: added an explanation of the regex pattern.
