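I have reached the end of my limited knowledge on this one. I am currently parsing diff results. Here is an example of the result I am trying to manipulate: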
[
[[0, 0, '\xe2\x80\x9cWe are returning again statement. He depicted the attacks as part of a battle launched by Sunnis against the country\xe2\x80\x99s Shia leaders.\r\n\r\nThe first attack came about 5 a.m. on Monday when gunmen stormed onto an Iraqi '],
[-1, 1, 'military base near the town of Duluiyah in S'],
[0, 2, 'alahuddin Province and killed 15 Iraqi soldiers, according to security officials. Four soldiers, including a high-ranking was taken prisoner by the insurgents, who escaped with him.\r\n\r\nThe insurgents also attacked the home of a police official in Balad, seriously wounding ']],
[[0, 4, 'eckpoint near Baquba, killing one policeman. In all, attacks were reported in at least five provinces.\r\n\r\nEight attacks were launched in Kirkuk Province, mostly targeting police patrols, with five people killed and 42 wounded.\r\n\r\nThe offensive started on the third day of the Islamic holy month of Ramadan, and '],
[-1, 5, 'apparently took advantage of the wi'],
[1, 6, 'll and the other.']]
]
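I am building a diff summarizer. Here is how the data breaks down:
The list is a list of diff results (two in the example above).
Each sublist contains three elements:
- the text before the change;
- the text that makes up the change; and
- the text after the change.
Each sub-sublist also has three elements: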
- a number indicating whether that part is a deletion, unaffected, or an addition (-1, 0, and 1 respectively);
- a position number (its order); and
- the string itself.
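To make the layout concrete, here is a minimal sketch that walks the structure and labels each entry. The names `OP_NAMES`, `describe`, and `sample` are hypothetical, and the mapping assumes -1 marks a deletion, 0 unaffected text, and 1 an addition, which is how the sample data above reads.

```python
# Hypothetical labels for the operation codes (assumed: -1, 0, 1).
OP_NAMES = {-1: "deletion", 0: "unaffected", 1: "addition"}

# The three slots of each sublist, in order.
ROLES = ("before", "change", "after")


def describe(diffs):
    """Return one (diff index, role, operation, position, text length) row per entry."""
    rows = []
    for i, diff in enumerate(diffs):
        # Each sublist holds exactly three sub-sublists: before, change, after.
        for role, (op, pos, text) in zip(ROLES, diff):
            rows.append((i, role, OP_NAMES[op], pos, len(text)))
    return rows


# Shortened stand-in for the real diff output.
sample = [
    [[0, 0, "context before "], [-1, 1, "removed text"], [0, 2, " context after"]],
]

for row in describe(sample):
    print(row)
```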
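What I need to do is slice the strings in the sub-sublists, but how I slice depends on which sub-sublist they are in.
- For element 1 of a sublist, I need to cut away everything except the last 4 characters of the string.
- For element 2 of a sublist, I need no slicing.
- For element 3 of a sublist, I need to cut away everything except the first 4 characters of the string.
Here is an example of why I need to slice this way. Simplified tText before the solution: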
[[[...]], [[this is a],[sentence],[to demonstrate.]], [[...]]]
Text after the solution:
[[[...]], [[is a],[sentence],[to d]], [[...]]]
The other difficulty is that I want to preserve the structure of the list.
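For what it's worth, here is a rough sketch of the behaviour I am after, not a finished solution. The names `trim_context`, `keep`, and `sample` are made up for illustration, and it assumes every sublist has exactly three sub-sublists of the form [operation, position, string].

```python
def trim_context(diffs, keep=4):
    """Return a new list with the same nesting, trimming only the context strings.

    `keep` is assumed to be a positive integer.
    """
    trimmed = []
    for before, change, after in diffs:
        trimmed.append([
            [before[0], before[1], before[2][-keep:]],  # keep the last `keep` characters
            list(change),                               # the change itself is copied as-is
            [after[0], after[1], after[2][:keep]],      # keep the first `keep` characters
        ])
    return trimmed


# Simplified stand-in for one diff result.
sample = [
    [[0, 0, "this is a"], [-1, 1, "sentence"], [0, 2, "to demonstrate."]],
]

print(trim_context(sample))
# [[[0, 0, 'is a'], [-1, 1, 'sentence'], [0, 2, 'to d']]]
```

Building a new list rather than editing in place leaves the original diff output untouched, and the result keeps the same three levels of nesting.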
It has been a tough day. I apologize for the mind-bending nature of this question, but that is what Overflow is for...
Thoughts?