There are two problems.
A character class specifies a set of alternatives for matching a single character in the text being searched. Lookahead and lookbehind are zero-width assertions: they test the text around the current position but consume no characters, so they cannot be one of the alternatives inside a class.
Lookahead syntax is not special inside a character class; its characters are treated as literals. Your character class r'[^\s(?=\-)]'
is equivalent to r'[^\-)(?\s=]'
and means "match any single character except whitespace, (, ?, =, ), and the dash". The \- is an escaped literal dash, not a range.
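A quick way to check this, as a minimal sketch using sample characters I picked for illustration: the class rejects whitespace and each of the literal characters listed, and accepts ordinary characters.

```python
import re

# The character class from the question. Inside [...] the lookahead
# syntax has no special meaning; each character is a literal.
pattern = re.compile(r'[^\s(?=\-)]')

# Every one of these single characters is excluded by the negated class.
excluded = [' ', '\t', '(', '?', '=', '-', ')']
rejected = [ch for ch in excluded if pattern.fullmatch(ch) is None]

# Ordinary characters still match.
accepted = [ch for ch in 'a9_' if pattern.fullmatch(ch)]

print(rejected == excluded)  # True
print(accepted)              # ['a', '9', '_']
```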
For what you seem to be trying to do, match every character except the dash, and use alternation to also accept the dashes that are not preceded by whitespace. Note that the dash goes outside the lookbehind, so that it is actually consumed:
r'([^-]|(?<!\s)-)'
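Here is a sketch of that idea, assuming the lookbehind is written as (?<!\s)- so that the dash itself is part of the match. The helper name leading_run and the sample strings are only for illustration.

```python
import re

# Any non-dash character, or a dash that is not preceded by whitespace.
# The group is repeated so the pattern consumes a whole run of such characters.
pattern = re.compile(r'(?:[^-]|(?<!\s)-)*')

def leading_run(text):
    """Return the longest prefix of text made of characters the pattern accepts."""
    return pattern.match(text).group(0)

# The dash in "jay-z" is kept; the run stops at the dash that follows a space.
print(repr(leading_run('jay-z - title')))         # 'jay-z '
print(repr(leading_run('prince - purple rain')))  # 'prince '

# Positions of the dashes that are NOT preceded by whitespace.
print([m.start() for m in re.finditer(r'(?<!\s)-', 'jay-z - title')])  # [3]
```

The run still ends with the space before the separator, so you would need to strip it; that is one reason the plain string method further down is simpler for this job.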
(Edited after question added examples)
If you can trust that ' - ' always separates an artist from a song title, and will always do so on its first occurrence, you can just use the split method on each string, as follows:
>>> "jay-z - title".split(' - ', 1)
['jay-z', 'title']
>>> 'prince - purple rain'.split(' - ', 1)
['prince', 'purple rain']
>>> 'prince - purple rain - a love-song'.split(' - ', 1)
['prince', 'purple rain - a love-song']
The split method takes a separator substring and an optional maximum number of splits. It returns a list of the pieces of the source string, with the separator removed.
Specifying a maximum number of splits N returns a list of at most N+1 substrings, with the first N instances of the separator removed. Any subsequent instances of the separator are left in place in the last substring.
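To make the effect of the maximum concrete, here is a small sketch on a made-up string with three separators. Asking for more splits than there are separators simply returns every piece, so N+1 is an upper bound on the list length.

```python
s = 'a - b - c - d'

# Map each maximum number of splits to the resulting list.
results = {n: s.split(' - ', n) for n in (0, 1, 2, 3, 10)}

for n, parts in results.items():
    print(n, parts)
# 0 ['a - b - c - d']
# 1 ['a', 'b - c - d']
# 2 ['a', 'b', 'c - d']
# 3 ['a', 'b', 'c', 'd']
# 10 ['a', 'b', 'c', 'd']
```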
The split method reads the string left to right by default. You can get right-to-left reading with rsplit, which also supports the optional maxsplit argument:
>>> 'prince - purple rain - a love-song'.split(' - ', 1)
['prince', 'purple rain - a love-song']
>>> 'prince - purple rain - a love-song'.rsplit(' - ', 1)
['prince - purple rain', 'a love-song']
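If you are processing many lines, you could wrap the first-occurrence split in a small function. This is only a sketch: split_artist_title is a hypothetical helper name, and raising ValueError when the separator is missing is my assumption about how you would want that case handled.

```python
def split_artist_title(line, sep=' - '):
    """Split 'artist - title' on the first occurrence of sep (hypothetical helper)."""
    parts = line.split(sep, 1)
    if len(parts) != 2:
        # Assumption: a line without the separator is treated as an error.
        raise ValueError(f'no {sep!r} separator in {line!r}')
    artist, title = parts
    return artist.strip(), title.strip()

print(split_artist_title('jay-z - title'))
# ('jay-z', 'title')
print(split_artist_title('prince - purple rain - a love-song'))
# ('prince', 'purple rain - a love-song')
```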
The built-in string type has a lot of functionality, which you can find in the Python documentation.