python - Python regex to extract tokens

Question

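I'm trying to find all tokens that look like abc_rty or abc_45 or abc09_23k or abc09-K34 or 4535. A token should not start with _ or - or a digit (a token made up entirely of digits, such as 4535, is fine).

I'm not making any progress, and I've even lost the progress I had made. This is what I have now: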

r'(?<!0-9)[(a-zA-Z)+]_(?=a-zA-Z0-9)|(?<!0-9)[(a-zA-Z)+]-(?=a-zA-Z0-9)\w+'

To make the question clearer, here is an example. If I have a string like this:

    D923-44 43 uou 08*) %%5 89ANB -iopu9 _M89 _97N hi_hello

then it should accept

    D923-44 and 43 and uou and hi_hello

and it should ignore

    08*) %%5 89ANB -iopu9 _M89 _97N

I may have missed some cases, but I think the description is enough. Sorry if it isn't.

score 2 · Accepted Answer

This seems to work as required:

import re

regex = re.compile(r"""
    (?<!\S)   # Assert there is no non-whitespace before the current character
    (?:       # Start of non-capturing group:
     [^\W\d_] # Match either a letter
     [\w-]*   # followed by any number of the allowed characters
    |         # or
     \d+      # match a string of digits.
    )         # End of group
    (?!\S)    # Assert there is no non-whitespace after the current character""",
    re.VERBOSE)

See it on regex101.com.
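As a quick check, the pattern can be run against the example string from the question with `re.findall`. This is only a minimal sketch: the names `TOKEN_RE`, `sample` and `tokens` are illustrative and not part of the original answer.

```python
import re

# Same pattern as in the answer, written in verbose mode.
TOKEN_RE = re.compile(r"""
    (?<!\S)       # no non-whitespace character right before the token
    (?:
        [^\W\d_]  # a letter
        [\w-]*    # then any number of word characters or hyphens
      |
        \d+       # or a run of digits
    )
    (?!\S)        # no non-whitespace character right after the token
    """, re.VERBOSE)

# Example string taken from the question.
sample = "D923-44 43 uou 08*) %%5 89ANB -iopu9 _M89 _97N hi_hello"

tokens = TOKEN_RE.findall(sample)
print(tokens)  # ['D923-44', '43', 'uou', 'hi_hello']
```

Because the pattern has no capturing groups, `findall` returns the full text of each match, so no post-processing is needed.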

score 2 · Accepted Answer

^(\d+|[A-Za-z][\w_-]*)$

Regular expression visualization

Edit live on Debuggex

Split the line on spaces, then run each resulting token through this regex to filter it.

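^ is the start of the line (here, the start of each token)
\d means a digit [0-9]
+ means one or more
| means or
[A-Za-z] the first character must be a letter
[\w_-]* it can be followed by any alphanumeric, _ or - characters, or by nothing at all (\w already includes _, so listing _ again is redundant but harmless)
$ means the end of the line (here, the end of each token)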

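The flow of the regex is shown in the diagram I provided, which goes some way toward explaining how it works.

However, I'll explain it anyway: basically, it checks whether the token is all digits, or whether it starts with a letter (upper or lower case) and is then followed only by alphanumeric, _ or - characters up to the end of the line.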
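The split-then-filter approach described above could be sketched as follows. The helper name `extract_tokens` and the variable names are hypothetical, chosen for illustration; only the pattern itself comes from the answer.

```python
import re

# Pattern from the answer: either all digits, or a letter followed by
# any number of word characters or hyphens.
TOKEN_RE = re.compile(r"^(\d+|[A-Za-z][\w_-]*)$")


def extract_tokens(line):
    """Split the line on whitespace and keep only tokens matching TOKEN_RE."""
    return [token for token in line.split() if TOKEN_RE.match(token)]


# Example string taken from the question.
sample = "D923-44 43 uou 08*) %%5 89ANB -iopu9 _M89 _97N hi_hello"

filtered = extract_tokens(sample)
print(filtered)  # ['D923-44', '43', 'uou', 'hi_hello']
```

One difference from the lookaround-based pattern in the other answer: `[A-Za-z]` accepts only ASCII letters as the first character, while `[^\W\d_]` also accepts non-ASCII letters when matching Python 3 strings.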
