
I want to split a string using Python. I have managed to do it for a single delimiter, but I am finding it hard to do for more than one.

The string:

Paragraph 4-2 says. i am going home$ early- Yes.

The output I need:

Paragraph 4-2 says
i am going home 
early
Yes

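The sentence should be split on ., $ and - (but it should not be split when the delimiter sits between two digits, as in 4-2).

How can I do this? So far I only have: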

text.split('.')

Update

The new output should look like this (each piece keeps its delimiter):

Paragraph 4-2 says.
i am going home$ 
early-
Yes.

3 Answers

5
>>> import re
>>> s = 'Paragraph 4-2 says. i am going home$ early- Yes'
>>>
>>> re.split(r'(?<!\d)\s*[.$-]\s*(?!\d)', s)
['Paragraph 4-2 says', 'i am going home', 'early', 'Yes']
  • \s*[.$-]\s* matches any of ., $ or - surrounded by zero or more whitespace characters (\s*).
  • (?<!\d) is a negative-lookbehind to ensure that the match is not preceded by a digit.
  • (?!\d) is a negative-lookahead to ensure that the match is not followed by a digit.

You can read more about lookarounds here.
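One thing to watch for: the string in the question ends with a delimiter, and in that case re.split returns a trailing empty string. A minimal sketch of filtering it out follows; the helper name split_sentences is made up for illustration.

```python
import re

# Same pattern as above: a delimiter (., $ or -) with optional surrounding
# whitespace, rejected when a digit sits directly before or after the match.
PATTERN = re.compile(r'(?<!\d)\s*[.$-]\s*(?!\d)')

def split_sentences(text):
    """Hypothetical helper: split on the delimiters and drop empty pieces."""
    return [part for part in PATTERN.split(text) if part]

s = 'Paragraph 4-2 says. i am going home$ early- Yes.'
print(PATTERN.split(s))    # the final '.' leaves an empty string at the end
print(split_sentences(s))  # ['Paragraph 4-2 says', 'i am going home', 'early', 'Yes']
```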

answered 2013-07-27T15:54:24.853
4
>>> import re
>>> s = 'Paragraph 4-2 says. i am going home$ early- Yes'
>>> re.split(r'(?<=\D)[.$-](?=\D|$)', s)
['Paragraph 4-2 says', ' i am going home', ' early', ' Yes']

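(?<=\D)[.$-](?=\D|$) matches ., $ or - only when it is preceded by a non-digit and followed by a non-digit or the end of the string. The lookbehind and lookahead do not consume any characters, so the string is split on the delimiter alone and the characters around it are kept. Note that the whitespace after each delimiter is kept too, which is why the pieces start with a space.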
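To make the "won't consume" point concrete, here is a small sketch that lists what the pattern actually matches. It reuses the sample string from the first answer; the variable names are only for illustration.

```python
import re

s = 'Paragraph 4-2 says. i am going home$ early- Yes'
pattern = re.compile(r'(?<=\D)[.$-](?=\D|$)')

# Each match is exactly one character long: the lookbehind and lookahead
# inspect the neighbouring characters without making them part of the match.
matches = [(m.start(), m.group()) for m in pattern.finditer(s)]
print(matches)

# The '-' in '4-2' is not matched, because it is preceded by a digit.
hyphen_pos = s.index('4-2') + 1
print(hyphen_pos in [pos for pos, _ in matches])
```

Only the three delimiter characters are matched, so re.split removes just those and leaves the neighbouring letters and spaces in the pieces.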

Edit:

>>> s = 'Paragraph 4-2 says. i am going home$ early- Yes.'
>>> re.findall(r'.*?(?<=\D)[.$-](?=[\D]|$)', s)
['Paragraph 4-2 says.', ' i am going home$', ' early-', ' Yes.']
answered 2013-07-27T15:54:49.900
1

You can do it like this:

>>> import re
>>> st='Paragraph 4-2 says. i am going home$ early- Yes.'
>>> [m.group(1) for m in re.finditer(r'(.*?[.$\-])(?:\s+|$)',st)]
['Paragraph 4-2 says.', 'i am going home$', 'early-', 'Yes.']

If you do not plan to modify the matched group at all (with strip or anything similar), you can also use findall with the same regex:

>>> re.findall(r'(.*?[.$\-])(?:\s+|$)',st)
['Paragraph 4-2 says.', 'i am going home$', 'early-', 'Yes.']

The regex is explained here, but in summary:

(.*?[.$\-])  is the capture group containing:
 .*?          Any character (except newline) 0 to infinite times [lazy] 
    [.$\-]   Character class matching .$- one time

(?:\s+|$)    Non-capturing Group containing:
   \s+        First alternate: Whitespace [\t \r\n\f] 1 to infinite times [greedy] 
      |        or
       $      Second alternate: end of string

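Depending on your strings, you may prefer (.*?[.$\-])(?:[ ]+|$) if you do not want the separator to include the other whitespace characters that \s matches (\t, \r, \n, \f).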
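As a rough illustration of that difference, the sketch below runs both variants on a made-up string in which the first delimiter is followed by a tab rather than a space. The sample text and variable names are assumptions for the example, not part of the question.

```python
import re

# Hypothetical input: a tab after the first delimiter, a space after the second.
text = 'one.\ttwo$ three-'

# \s+ treats any whitespace (including the tab) as a separator.
with_s = re.findall(r'(.*?[.$\-])(?:\s+|$)', text)

# [ ]+ accepts only literal spaces, so the tab does not end a piece.
with_space = re.findall(r'(.*?[.$\-])(?:[ ]+|$)', text)

print(with_s)      # ['one.', 'two$', 'three-']
print(with_space)  # ['one.\ttwo$', 'three-']
```

With [ ]+ the first two pieces stay joined, because the tab is not accepted as a separator and the lazy .*? simply keeps going until a delimiter followed by a space or the end of the string.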

answered 2013-07-27T16:38:47.527