
How can I split the following string on every '\x000', '\x001', '\x002', and so on? I tried the regular expression below, but it doesn't work!

z = re.compile(r'[\x000\x001\x002\x003\x004\x005]:')

line = '114.37.114.95 - - [16/Jul/2012:03:22:37 -0700] "GET /query?dest=adjustable_layout&from_url=http%3A%2F%2Fwww.nownews.com%2F&referer=&width=300&height=330&api_version=1 HTTP/1.1" 200 10481 "http://www.nownews.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Foxy/1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; yie8)"\x000:1342434156.712809 get_cache http://www.nownews.com/\x000:1342434156.717942 Cache Hits agtzfnRhZ3Rvby1lY3IjCxIGTmV3c0FkIhdodHRwOi8vd3d3Lm5vd25ld3MuY29tLww\x000:1342434156.731564 new version\x001:1342434156.732352 display:[(u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'26\'), (u\'1\', u\'114\', u\'13\')]'
z.split(line)

Edit 1

The string contains \x000, \x001, \x002, and so on. I want to split the string on these markers.

The expected output is:

['114.37.114.95 - - [16/Jul/2012:03:22:37 -0700] "GET /query?dest=adjustable_layout&from_url=http%3A%2F%2Fwww.nownews.com%2F&referer=&width=300&height=330&api_version=1 HTTP/1.1" 200 10481 "http://www.nownews.com/" "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; Foxy/1; .NET CLR 1.1.4322; .NET CLR 2.0.50727; .NET CLR 3.0.04506.648; .NET CLR 3.5.21022; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729; yie8)"', '\x000:1342434156.712809 get_cache http://www.nownews.com/', '\x000:1342434156.717942 Cache Hits agtzfnRhZ3Rvby1lY3IjCxIGTmV3c0FkIhdodHRwOi8vd3d3Lm5vd25ld3MuY29tLww', '\x000:1342434156.731564 new version', '\x001:1342434156.732352 display:[(u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'29\'), (u\'1\', u\'114\', u\'13\'), (u\'1\', u\'114\', u\'21\'), (u\'1\', u\'114\', u\'24\'), (u\'1\', u\'114\', u\'26\'), (u\'1\', u\'114\', u\'13\')]']

1 Answer


\x000 is a two-character string, consisting of \x00 (hex 0x00) and 0 (hex 0x30), because the \x escape takes exactly two hex digits.
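A quick way to check this is to look at the length and code points of the literal. The sketch below also shows what the original character class matches; the short timestamp string is only an illustrative fragment taken from the log line in the question.

```python
import re

# \x00 consumes exactly two hex digits, so the trailing 0 is a separate character.
s = '\x000'

length = len(s)
code_points = [hex(ord(c)) for c in s]
print(length)       # 2
print(code_points)  # ['0x0', '0x30']

# Inside [...], every character is its own alternative, so the original class
# means "NUL or any digit 0-5", followed by a colon.
original = re.compile(r'[\x000\x001\x002\x003\x004\x005]:')
stray = original.findall('16/Jul/2012:03:22:37')
print(stray)        # ['2:', '3:', '2:']
```

This is why the original pattern splits inside the timestamp instead of at the \x00n: markers.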

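Therefore, you can't use it in a character class like that, since each character inside [...] is a separate alternative. But

z = re.compile(r'(\x00[0-5]:)')

works. Because the regex is enclosed in parentheses, the delimiters also become part of the result list, although they are not directly attached to the parts of the string they separate (as they are in your edited question).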
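To make the shape of that result concrete, here is a minimal sketch on a short stand-in string (`sample` is a hypothetical substitute for the long `line` in the question). The last step, which glues each delimiter back onto the text that follows it, is one possible approach and not part of the original answer.

```python
import re

z = re.compile(r'(\x00[0-5]:)')

# Hypothetical short stand-in for the log line in the question.
sample = 'head\x000:first\x001:second'

# With a capturing group, split() returns the delimiters as separate items.
parts = z.split(sample)
print(parts)     # ['head', '\x000:', 'first', '\x001:', 'second']

# Odd indexes hold delimiters, even indexes hold text, so they can be paired up.
rejoined = [parts[0]] + [d + t for d, t in zip(parts[1::2], parts[2::2])]
print(rejoined)  # ['head', '\x000:first', '\x001:second']
```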

If you do want to keep the delimiters as part of the result strings, .split() with this pattern won't do it. Instead, use .findall():

>>> z = re.compile(r'(?:\x00[0-5]:)?(?:(?!\x00[0-5]:).)*', re.S)
>>> z.findall(line)

Explanation:

(?:\x00[0-5]:)? # Match an optional leading \x000:, \x001: etc.
(?:             # Match...
 (?!\x00[0-5]:) #  as long as we're not at the start of another \x00n:
 .              #  any character (including newlines: re.S)
)*              # any number of times.
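A small runnable sketch of the findall() approach, again on a hypothetical short `sample` string. Note that the pattern can match the empty string, so findall() returns a trailing '' that you will probably want to filter out. The lookahead split at the end is an alternative that is not from the answer above, and it assumes Python 3.7 or later, where re.split() accepts patterns that match an empty string.

```python
import re

z = re.compile(r'(?:\x00[0-5]:)?(?:(?!\x00[0-5]:).)*', re.S)

# Hypothetical short stand-in for the log line in the question.
sample = 'head\x000:first\x001:second'

matches = z.findall(sample)
print(matches)  # ['head', '\x000:first', '\x001:second', '']

# Drop the empty match produced at the end of the string.
pieces = [m for m in matches if m]
print(pieces)   # ['head', '\x000:first', '\x001:second']

# Alternative (assumes Python 3.7+): split at the zero-width position
# just before each marker, so the marker stays with the text after it.
lookahead_parts = re.split(r'(?=\x00[0-5]:)', sample)
print(lookahead_parts)  # ['head', '\x000:first', '\x001:second']
```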
answered 2012-07-16T10:52:22.300