python - Regex to filter duplicate items with digits

Question

I have a list of items as below:

list1=['test_input_1','test_input_2','test_input_3','test_input_10','test_input_11']

I need the following output - test_input_1

Code

import re

for each in list1:
    string1 = each
    pattern = r'test_.*[1].*'
    match = re.search(pattern, string1)
    if match:
        print('matched=', match.group())

Output-
matched= test_input_1
matched= test_input_10
matched= test_input_11

Expected Output-
matched= test_input_1

Also, what is the difference between "r" and "u" in front of a pattern?

score 2 · Accepted Answer

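I'm not sure what exactly your use case is or what you are trying to do. The code you wrote does exactly what it is supposed to do.

It seems you have not quite understood how the regular expression works.

Let me break down test_.*[1].* for you: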

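test_ : matches the literal text "test_".
.* : any character except a newline (.) repeated any number of times (*), which includes zero times.
[1] : any single character from the set in brackets; here the only character in the set is 1.
.* : again, any character (.) any number of times (*), including zero.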

So it makes sense that you get test_input_1, test_input_10, and test_input_11: they all follow this pattern.
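To see the breakdown above in action, here is a minimal sketch that runs the original pattern next to an end-anchored variant. The names loose and strict and the pattern test_.*_1$ are assumptions made for illustration; they are not part of the question.

```python
import re

list1 = ['test_input_1', 'test_input_2', 'test_input_3',
         'test_input_10', 'test_input_11']

# Original pattern: the trailing .* allows anything to follow the 1.
loose = re.compile(r'test_.*[1].*')
loose_hits = [s for s in list1 if loose.search(s)]

# Assumed variant: $ requires the string to end right after "_1".
strict = re.compile(r'test_.*_1$')
strict_hits = [s for s in list1 if strict.search(s)]

print(loose_hits)   # ['test_input_1', 'test_input_10', 'test_input_11']
print(strict_hits)  # ['test_input_1']
```

The only difference is what may come after the 1: the original pattern accepts further characters, while the anchored one does not.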

Since you only want to match test_input_1 itself, there is no point in using a regular expression. You can simply compare each string in the list with test_input_1.

for item in list1:
    if item == 'test_input_1':
        # you found it!
        print ("Found: test_input_1")

I'm not sure what you want to accomplish with this...

Maybe something like this will help you more:

for idx, item in enumerate(list1):
    if item == 'test_input_1':
        print ('Found "test_input_1" at index %s' % idx)

But if you need to do the same thing with a regular expression, it would look like this:

import re

def find_pattern(pattern, lst):
    regex = re.compile(pattern)
    for idx, item in enumerate(lst):
        match = regex.match(item)
        if not match:
            continue
        yield match.group(1), idx

list1=['test_input_1','test_input_2','test_input_3','test_input_10','test_input_11']
pat = r'(test_.*_1)\b'

for r in find_pattern(pat, list1):
    print('found %s at index %s' % r)

>>> 
found test_input_1 at index 0
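The question also asks about the "r" and "u" prefixes, which the answer above does not cover. As a rough sketch, assuming Python 3: r makes a raw string, so backslashes are kept as written, while u is accepted but has no effect because every str is already Unicode (in Python 2, u produced a unicode object instead of a byte string). The variable names below are illustrative only.

```python
import re

raw = r'\b'     # two characters: a backslash followed by the letter b
cooked = '\b'   # one character: an ASCII backspace (0x08)
lengths = (len(raw), len(cooked))

# In Python 3 the u prefix changes nothing.
same_text = (u'test_input_1' == 'test_input_1')

# Only the raw form reaches the regex engine as a word-boundary assertion;
# the non-raw form asks for a literal backspace character instead.
raw_match = bool(re.search(r'_1\b', 'test_input_1'))
cooked_match = bool(re.search('_1\b', 'test_input_1'))

print(lengths, same_text, raw_match, cooked_match)
```

This is why patterns containing \b, \d and similar escapes are usually written with the r prefix.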
