>>> line = '\xc2d sdsdfdslkfsdkfjdsf'
>>> pattern_strings = ['\xc2d', '\xe9']
>>> pattern = '|'.join(pattern_strings)
>>> pattern
'\xc2d|\xe9'
>>> import re
>>> re.findall(pattern, line)
['\xc2d']

When I put that line in a file and run the same regex over it, nothing matches:

import re

def find_pattern(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            pattern_strings = ['\xc2d', '\xe9'] # or using ['\\xc2d', '\xe9'] doesn't help
            pattern = '|'.join(pattern_strings)
            print re.findall(pattern, line)

where path is a file that looks like this:
\xc2d sdsdfdslkfsdkfjdsf

and I see the following output:

\xc2d
[]
d\xa0
[]
\xe7
[]
\xc3\ufffdd
[]
\xc3\ufffdd
[]
\xc2\xa0
[]
\xc3\xa7
[]
\xa0\xa0
[]
'619d813\xa03697'
[]

2 Answers


line = "\xc2d bla" is a Python string in which "\xc2d" is a substring of 2 characters: \xc2 (a single character written as a hex escape) followed by d.
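A quick check of that character count. It is written for Python 3 (an assumption; the session in the question is Python 2), where "\xc2" is the code point U+00C2 rather than a raw byte, but the count is the same:

```python
# In a string literal, \xc2 is an escape: it becomes ONE character (U+00C2).
s = "\xc2d"
print(len(s))                          # 2
print(s[0] == "\u00c2", s[1] == "d")   # True True

# A raw string keeps the backslash, so r"\xc2d" is five separate characters.
raw = r"\xc2d"
print(len(raw))                        # 5
print(list(raw))                       # ['\\', 'x', 'c', '2', 'd']
```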

It sounds like your file contains the literal string "\xc2d" (the five characters \, x, c, 2, d), which does not match that pattern.
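A minimal reproduction in Python 3, assuming the file really holds the backslash sequence as plain text. A temporary file stands in for the asker's path:

```python
import os
import re
import tempfile

# Write the literal text \xc2d (backslash, x, c, 2, d) to a temporary file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(r"\xc2d sdsdfdslkfsdkfjdsf" + "\n")
    path = tmp.name

# The escapes in these literals ARE interpreted, so the pattern is "Âd|é".
pattern = "|".join(["\xc2d", "\xe9"])

with open(path) as f:
    line = f.readline().strip()
os.remove(path)

print(len(line.split()[0]))        # 5: text read from a file is taken as-is
print(re.findall(pattern, line))   # []
```

Escape sequences are only interpreted in string literals in source code; characters read from a file are never converted.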

If you want to match the literal string, you need to match each of its characters, so escape the backslash:

pattern = r"\\xc2d" 
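A sketch of this fix in Python 3. The second half, which builds the alternation from raw strings and lets re.escape() escape the backslashes, is a generalisation that is not part of the original answer:

```python
import re

# The literal text as it would be read from the file.
line = r"\xc2d sdsdfdslkfsdkfjdsf"

# \\ in the regex matches one literal backslash, followed by "xc2d".
print(re.findall(r"\\xc2d", line))   # ['\\xc2d']

# Generalisation: raw strings keep the backslashes, re.escape() escapes them.
pattern_strings = [r"\xc2d", r"\xe9"]
pattern = "|".join(re.escape(p) for p in pattern_strings)
print(re.findall(pattern, line))     # ['\\xc2d']
```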
answered 2012-07-27T20:30:43.070

You need to read the file in binary mode, f = open("myfile", "rb"), to prevent the \x conversion, because in Python \xhh denotes a hexadecimal escape character.

Reading it in non-binary mode will fail; check here.
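A sketch of the binary-mode approach in Python 3. It assumes the file contains the actual byte 0xC2 followed by d (not the literal characters \xc2), and a temporary file stands in for "myfile":

```python
import os
import re
import tempfile

# Assumption: the file holds the real byte 0xC2, then the byte for "d".
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"\xc2d sdsdfdslkfsdkfjdsf\n")
    path = tmp.name

# A bytes pattern: inside the regex, \xc2 matches the single byte 0xC2.
pattern = re.compile(rb"\xc2d|\xe9")

matches = []
with open(path, "rb") as f:        # binary mode: lines are bytes, nothing is decoded
    for line in f:
        matches.append(pattern.findall(line.strip()))
os.remove(path)

print(matches)                     # [[b'\xc2d']]
```

This particular byte sequence is not valid UTF-8, so reading the same file in text mode with a UTF-8 default encoding raises UnicodeDecodeError.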

answered 2012-07-27T21:00:35.360