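I have a list of well-parsed, multi-paragraph documents (paragraphs are separated by \n\n and sentences by "."). I want to split each document into sentences, with each sentence paired with the number of the paragraph it belongs to. For example, a (two-paragraph) input is: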
First sentence of the 1st paragraph. Second sentence of the 1st paragraph. \n\n
First sentence of the 2nd paragraph. Second sentence of the 2nd paragraph. \n\n
Ideally, the output should be:
1 First sentence of the 1st paragraph.
1 Second sentence of the 1st paragraph.
2 First sentence of the 2nd paragraph.
2 Second sentence of the 2nd paragraph.
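I am familiar with the Lingua::Sentences package in Perl, which can split a document into sentences. However, it does not keep track of paragraph numbers. So I am wondering whether there is another way to achieve the above (the documents contain no abbreviations). Any help is greatly appreciated. Thanks!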
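
To make the desired transformation concrete, here is a minimal sketch of one possible approach. It is written in Python rather than Perl, and it is only an illustration under the stated assumptions: paragraphs are separated by a blank line, every sentence ends with "." followed by whitespace or the end of the paragraph, and there are no abbreviations. The function name `number_sentences` is hypothetical and does not come from any library.

```python
import re


def number_sentences(document: str) -> list[tuple[int, str]]:
    """Return (paragraph_number, sentence) pairs; paragraphs are numbered from 1."""
    rows = []
    # Split on blank lines and drop empty chunks, such as the one left
    # after a trailing "\n\n" at the end of the document.
    paragraphs = [p for p in re.split(r"\n\s*\n", document) if p.strip()]
    for number, paragraph in enumerate(paragraphs, start=1):
        # Assumption: a sentence ends at "." followed by whitespace.
        # The lookbehind keeps the period attached to its sentence.
        for sentence in re.split(r"(?<=\.)\s+", paragraph.strip()):
            if sentence:
                rows.append((number, sentence))
    return rows


if __name__ == "__main__":
    doc = (
        "First sentence of the 1st paragraph. Second sentence of the 1st paragraph. \n\n"
        "First sentence of the 2nd paragraph. Second sentence of the 2nd paragraph. \n\n"
    )
    for number, sentence in number_sentences(doc):
        print(number, sentence)
```

The same two-level split (first on blank lines, then on sentence boundaries within each paragraph) should carry over to Perl by splitting each document into paragraphs first and running the sentence splitter on each paragraph separately.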