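I've read the documentation and looked through other questions, but I haven't found an answer yet.

Is it possible to use a lookahead inside a character set, or to use a lookahead as the negation inside a set?

I want to create a set that matches every character except a dash that is preceded by whitespace. However, a space on its own, not followed by a dash, should still match.

I thought something like this would work, but it doesn't: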

r'[^\s(?=\-)]'

Do lookaheads not work inside sets? If not, how can I get around this?

Edit to provide examples:

I've been trying to find a more accurate replacement for

r'([^\-]*)\-(.*)'

It is meant to read a line and separate the artist from the title.

Applying re.match(r'([^\-]*)\-(.*)', "artist - title") is meant to produce:

group(1) = "artist"
group(2) = "title"

However, if the artist's name contains a dash, the wrong part of the string is captured.

Example:

re.match(r'([^\-]*)\-(.*)', "jay-z - title")

produces:

group(1) = "jay"
group(2) = "z - title"
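To reproduce the problem, the short sketch below runs the original pattern on both sample lines. Printing with repr() also shows that the pattern keeps the spaces next to the dash, even in the first case.

```python
import re

# The original pattern: everything up to the first dash, then the rest.
pattern = r'([^\-]*)\-(.*)'

for line in ["artist - title", "jay-z - title"]:
    m = re.match(pattern, line)
    # repr() makes the spaces kept on either side of the dash visible.
    print(repr(m.group(1)), repr(m.group(2)))
# 'artist ' ' title'
# 'jay' 'z - title'
```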

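I want the capture group to capture spaces and dashes, but not a dash that is preceded by a space (or, thinking in terms of lookahead rather than lookbehind, not a space that is followed by a dash).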

1 Answer

There are two problems.

  1. A character class specifies the possibilities for a single character in the text being searched. Lookahead and lookbehind assertions test conditions on the text around the position being matched. They consume no characters themselves, so they cannot stand in for that character.

  2. The lookahead syntax is not special inside a character class: its characters are treated as literals. Your character class r'[^\s(?=\-)]' is equivalent to r'[^\-)(?\s=]' and means "match any single character except whitespace, (, ?, =, - and )". The \- is an escaped, literal dash, not part of a range.
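Both points can be checked quickly. The sketch below runs the question's class on a made-up sample string; the string and variable names are illustrative only. It then shows one common way to apply a lookahead outside a class: put it before a single "." and repeat the pair. This is the so-called tempered pattern (?:(?!\s-).)*, swapped in here as an illustration rather than taken from the question.

```python
import re

# Point 2: inside [...], "(", "?", "=" and ")" are ordinary characters and
# "\-" is a literal dash, so these two classes accept the same characters.
asker = re.compile(r'[^\s(?=\-)]')
reordered = re.compile(r'[^\-)(?\s=]')

sample = "a(b?c=d-e)f g"
print(asker.findall(sample))                               # ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print(asker.findall(sample) == reordered.findall(sample))  # True

# Point 1: the lookahead belongs outside the class. (?!\s-) checks the text
# at the current position without consuming it; "." then consumes one
# character. Repeating the pair stops just before the first space-plus-dash.
before_separator = re.compile(r'(?:(?!\s-).)*')

print(before_separator.match("jay-z - title").group(0))    # jay-z
```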

For what you seem to be trying to do, match any character other than a dash, and use alternation to also accept a dash that is not preceded by whitespace:

r'([^-]|(?<!\s)-)'
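Plugged into the artist/title pattern, a dash-not-after-whitespace alternation might look like the sketch below. It assumes the separator is always whitespace, a dash, then whitespace; the name SPLIT is hypothetical.

```python
import re

# Group 1: a run of characters that are either not a dash, or a dash that is
# not immediately preceded by whitespace (so "jay-z" stays intact).
# Then the separator (whitespace, dash, whitespace). Group 2: the rest.
SPLIT = re.compile(r'((?:[^-]|(?<!\s)-)*)\s-\s(.*)')

print(SPLIT.match("jay-z - title").groups())
# ('jay-z', 'title')
print(SPLIT.match("prince - purple rain - a love-song").groups())
# ('prince', 'purple rain - a love-song')
```

The greedy group first takes the space before the separator dash, stops at that dash, and then gives the space back so that \s-\s can match.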

(Edited after the question was updated with examples.)

If you can trust that ' - ' always separates an artist from a song title, and will always do so on its first occurrence, you can just use the split method on each string, as follows:

>>> "jay-z - title".split(' - ', 1)
['jay-z', 'title']
>>> 'prince - purple rain'.split(' - ', 1)
['prince', 'purple rain']
>>> 'prince - purple rain - a love-song'.split(' - ', 1)
['prince', 'purple rain - a love-song']

split takes a separator string and an optional maximum number of splits (maxsplit). It returns a list of the pieces of the source string between occurrences of the separator, with the separators themselves removed.

Specifying a maximum of N splits returns a list of at most N+1 substrings. Only the first N occurrences of the separator are removed; any later occurrences are left in place.

split works through the string from left to right. rsplit works from right to left and also accepts the optional maxsplit argument:
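A small sketch of the "at most N+1" behaviour: with fewer occurrences than N you get fewer pieces, so code that unpacks two values should check the length first. The helper split_artist_title is hypothetical, and returning an empty title when there is no ' - ' is an assumed convention, not something the question specifies.

```python
# maxsplit=N yields at most N+1 pieces; occurrences after the Nth stay put.
line = "a - b - c - d"
print(line.split(" - ", 2))            # ['a', 'b', 'c - d']
print("no separator".split(" - ", 1))  # ['no separator'], only one piece


def split_artist_title(text):
    """Hypothetical helper: return (artist, title).

    Assumption: a line without ' - ' is treated as an artist with an
    empty title.
    """
    parts = text.split(" - ", 1)
    if len(parts) == 1:
        return parts[0], ""
    return parts[0], parts[1]


print(split_artist_title("jay-z - title"))  # ('jay-z', 'title')
```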

>>> 'prince - purple rain - a love-song'.split(' - ', 1)
['prince', 'purple rain - a love-song']
>>> 'prince - purple rain - a love-song'.rsplit(' - ', 1)
['prince - purple rain', 'a love-song']

The built-in string type has many more methods like these. They are listed under "String Methods" in the Python standard library documentation.

answered 2013-05-18T15:13:56.777