1

Script:

import re

matches = ['hello', 'hey', 'hi', 'hiya']

def check_match(string):
    for item in matches:
        if re.search(item, string):
            print('Match found: ' + string)
        else:
            print('Match not found: ' + string)

check_match('hey')
check_match('hello there')
check_match('this should not match')
check_match('oh, hiya')

Output:

Match not found: hey
Match found: hey
Match not found: hey
Match not found: hey
Match found: hello there
Match not found: hello there
Match not found: hello there
Match not found: hello there
Match not found: this should not match
Match not found: this should not match
Match found: this should not match
Match not found: this should not match
Match not found: oh, hiya
Match not found: oh, hiya
Match found: oh, hiya
Match found: oh, hiya

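There is a lot I don't understand here. For starters, every string is searched four times in this output, and the number of "Match found" lines varies from string to string. I'm not sure what in my code is causing this. Can someone take a look and see what is wrong?

The expected output is this: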

Match found: hey
Match found: hello there
Match not found: this should not match
Match found: oh, hiya

4 Answers

5

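It isn't behaving incorrectly; what is off is your understanding of what re.search(...) does. It reports a match whenever the pattern occurs anywhere in the string, including inside a longer word.

See the comments after the output: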

Match not found: hey                    # because 'hello' is not in 'hey'
Match found: hey                        # because 'hey' is in 'hey'
Match not found: hey                    # because 'hi' is not in 'hey'
Match not found: hey                    # because 'hiya' is not in 'hey'

Match found: hello there                # because 'hello' is in 'hello there'
Match not found: hello there            # because 'hey' is not in 'hello there'
Match not found: hello there            # because 'hi' is not in 'hello there'
Match not found: hello there            # because 'hiya' is not in 'hello there'

Match not found: this should not match  # because 'hello' is not in 'this should not match'
Match not found: this should not match  # because 'hey' is not in 'this should not match'
Match found: this should not match      # because 'hi' is in 'this should not match'
Match not found: this should not match  # because 'hiya' is not in 'this should not match'

Match not found: oh, hiya               # because 'hello' is not in 'oh, hiya'
Match not found: oh, hiya               # because 'hey' is not in 'oh, hiya'
Match found: oh, hiya                   # because 'hi' is in 'oh, hiya'
Match found: oh, hiya                   # because 'hiya' is in 'oh, hiya'

If you don't want the pattern hi to match for the input oh, hiya, wrap the pattern in word boundaries:

\bhi\b

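This makes it match only occurrences of hi that are not surrounded by other word characters (well hiya there will not match the pattern \bhi\b, but well hi there will).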
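
A small sketch of the difference, written for Python 3. The helper name contains_word is made up for illustration, and re.escape is an added precaution in case a word ever contains regex metacharacters:

```python
import re

def contains_word(word, text):
    """Return True if `word` occurs in `text` as a whole word."""
    pattern = r'\b' + re.escape(word) + r'\b'
    return re.search(pattern, text) is not None

# Plain substring search: 'hi' is found inside 'this' and inside 'hiya'.
print(bool(re.search('hi', 'this should not match')))  # True
print(bool(re.search('hi', 'oh, hiya')))               # True

# With word boundaries, only a standalone 'hi' counts.
print(contains_word('hi', 'well hiya there'))  # False
print(contains_word('hi', 'well hi there'))    # True
```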

answered 2012-05-27T19:37:49.317
2

Try this. It is more concise, and it reports multiple matches:

import re

matches = ['hello', 'hey', 'hi', 'hiya']

def check_match(string):
    results = [item for item in matches if re.search(r'\b%s\b' % (item), string)]
    print('Found %s' % results if results else 'No match found')

check_match('hey')
check_match('hello there')
check_match('this should not match')
check_match('oh, hiya')
check_match('xxxxx xxx')
check_match('hello and hey')

Gives:

Found ['hey']
Found ['hello']
No match found
Found ['hiya']
No match found
Found ['hello', 'hey']
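
A possible variation, sketched for Python 3 and not part of the answer above: join all the words into a single alternation and compile it once, so each string is scanned with one pattern instead of four. The names WORDS and find_words are hypothetical.

```python
import re

matches = ['hello', 'hey', 'hi', 'hiya']

# One pattern of the form \b(?:hello|hey|hi|hiya)\b, built once up front.
WORDS = re.compile(r'\b(?:' + '|'.join(re.escape(m) for m in matches) + r')\b')

def find_words(text):
    """Return every listed word that appears in `text` as a whole word."""
    return WORDS.findall(text)

print(find_words('hello and hey'))
print(find_words('oh, hiya'))
```

Unlike the list comprehension, findall returns hits in the order they appear in the text and repeats a word if it occurs more than once.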
answered 2012-05-27T19:46:47.743
0

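The for loop checks the string against each of the "matches" and prints found or not found for every one of them. What you really want is to see whether any of the matches match, and then print a single "found" or "not found". I don't actually know Python, so the syntax may be incorrect.

found = False  # becomes True once any pattern matches
for item in matches:
    if re.search(item, string):
        found = True
        break  # one hit is enough, stop searching
# Print a single result line per string
if found:
    print('Match found: ' + string)
else:
    print('Match not found: ' + string)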
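
The same idea can be written with the built-in any(). This is a sketch for Python 3, not the answerer's code. It also assumes the word-boundary pattern from the other answer is wanted, because a plain 'hi' would still be found inside 'this'. Returning the line instead of printing it inside the function is a choice made here to keep the function easy to check.

```python
import re

matches = ['hello', 'hey', 'hi', 'hiya']

def check_match(string):
    """Return one result line per input string."""
    # any() stops at the first word that matches as a whole word.
    found = any(
        re.search(r'\b' + re.escape(item) + r'\b', string)
        for item in matches
    )
    return ('Match found: ' if found else 'Match not found: ') + string

for s in ['hey', 'hello there', 'this should not match', 'oh, hiya']:
    print(check_match(s))
```

Run against the four strings from the question, this prints the four lines listed there as the expected output.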

answered 2012-05-27T19:41:08.550
0

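You get 4 searches and 4 lines of output per string because you are looping over a list and, for each element in it, running a search and printing something...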

answered 2012-05-27T19:32:22.200