I have this regular expression: (?<![A-Z])(?<=[.!?])\s(?=[A-Z])
It splits a paragraph into sentences at the whitespace between them.
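As a quick sanity check of the intended behaviour, here is a minimal sketch that runs the pattern through Python's `re.split()` on a short made-up string. The sample text is an assumption for illustration only. Because every part of the pattern except `\s` is a zero-width lookaround, only the single whitespace character is consumed and the punctuation stays attached to each sentence.

```python
import re

# The pattern from the question, unchanged.
pattern = r"(?<![A-Z])(?<=[.!?])\s(?=[A-Z])"

# Hypothetical sample text with no abbreviations in it.
sample = "Hi there. How are you? Fine."

parts = re.split(pattern, sample)
for part in parts:
    print(part)
```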
I used it on this paragraph: Did he know that J. Smith is a name? The term is most commonly applied to the placing of a warship in active duty with its country's military forces. The ceremonies involved are Often rooted in centuries old naval tradition. I.D. is a wonderful word.
It splits "J. Smith" apart, because it treats the "." as the end of a sentence.
I'm using re.split() and printing the resulting list, with the values separated by newlines.
This is the output for the paragraph above:
Did he know that J.
Smith is a name?
The term is most commonly applied to the placing of a warship in active duty with its country's military forces.
The ceremonies involved are Often rooted in centuries old naval tradition.
I.D. is a wonderful word.
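A minimal reproduction, assuming the paragraph is held in a single Python string and split exactly as described (the variable names are illustrative):

```python
import re

# The pattern from the question, unchanged.
pattern = r"(?<![A-Z])(?<=[.!?])\s(?=[A-Z])"

text = ("Did he know that J. Smith is a name? The term is most commonly applied "
        "to the placing of a warship in active duty with its country's military forces. "
        "The ceremonies involved are Often rooted in centuries old naval tradition. "
        "I.D. is a wonderful word.")

sentences = re.split(pattern, text)
# Print one list element per line, as in the output above.
print("\n".join(sentences))
```

This yields the same five pieces shown above, including the unwanted break after "J.".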
It works for "I.D.", but why doesn't it work for "J. Smith"? Logically it should...
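One likely explanation, sketched below: both lookbehinds are evaluated at the same position, immediately before the whitespace, so `(?<![A-Z])` only ever inspects the punctuation mark itself, which can never be an uppercase letter. The negative lookbehind is therefore redundant, and the pattern behaves exactly like `(?<=[.!?])\s(?=[A-Z])`. "I.D." survives only because the next word, "is", starts with a lowercase letter, so `(?=[A-Z])` fails there. The shortened test string is an assumption chosen to show both cases.

```python
import re

text = "Did he know that J. Smith is a name? I.D. is a wonderful word."

original = r"(?<![A-Z])(?<=[.!?])\s(?=[A-Z])"
# Same pattern with the negative lookbehind removed.
without_negative = r"(?<=[.!?])\s(?=[A-Z])"

# Show the character just before and just after each matched whitespace.
for m in re.finditer(original, text):
    print(repr(text[m.start() - 1]), "->", repr(text[m.end()]))

# The two patterns split the text identically.
print(re.split(original, text) == re.split(without_negative, text))
```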
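I want it to detect this structure in the string:
a character that is not an uppercase letter + a period/?/! + a space + an uppercase letter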
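A possible adjustment, offered as a sketch rather than a definitive fix: widen the negative lookbehind so that it checks the two characters before the whitespace (an uppercase letter followed by the punctuation mark). Python's `re` module accepts this because the lookbehind still has a fixed width of two characters.

```python
import re

# Hypothetical fix: reject a split when the punctuation mark is directly
# preceded by an uppercase letter (e.g. "J. "), then require . ! or ?
# before the whitespace and an uppercase letter after it.
pattern = r"(?<![A-Z][.!?])(?<=[.!?])\s(?=[A-Z])"

text = ("Did he know that J. Smith is a name? The term is most commonly applied "
        "to the placing of a warship in active duty with its country's military forces. "
        "The ceremonies involved are Often rooted in centuries old naval tradition. "
        "I.D. is a wonderful word.")

sentences = re.split(pattern, text)
for s in sentences:
    print(s)
```

On this paragraph it produces four sentences and keeps "J. Smith" together. Note the trade-off: a real sentence ending in a capitalised abbreviation (for example "... in the U.S. The next ...") will no longer be split, while "Mr. Smith" still will be, because "r" is lowercase.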