
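I have two versions of a piece of text and I want to generate an HTML view of the revisions, similar to what Google Docs or Stack Overflow displays. I need to do this in Python. I don't know what this technique is called, but I assume it has a name, and hopefully there is a Python library that can do it.

Version 1:

William Henry "Bill" Gates III (born October 28, 1955)[2] is an American business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.

Version 2:

William Henry "Bill" Gates III (born October 28, 1955)[2] is a business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen. He is American.

Desired output:

William Henry "Bill" Gates III (born October 28, 1955)[2] is an American business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen. He is American.

Using the diff command doesn't work, because it tells me which lines differ but not which columns or words differ.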

$ echo 'William Henry "Bill" Gates III (born October 28, 1955)[2] is an American business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.' > oldfile
$ echo 'William Henry "Bill" Gates III (born October 28, 1955)[2] is a business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.  He is American.' > newfile
$ diff -u oldfile newfile
--- oldfile 2010-04-30 13:32:43.000000000 -0700
+++ newfile 2010-04-30 13:33:09.000000000 -0700
@@ -1 +1 @@
-William Henry "Bill" Gates III (born October 28, 1955)[2] is an American business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.
+William Henry "Bill" Gates III (born October 28, 1955)[2] is a business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.  He is American.

3 Answers


Google's Diff Match Patch library (diff-match-patch) has a very good diff implementation in pure Python.

answered 2010-04-30T21:01:18.150

You can use wdiff. I don't know whether there is a Python implementation:

$ wdiff oldfile newfile
William Henry "Bill" Gates III (born October 28, 1955)[2] is [-an American-] {+a+} business magnate, philanthropist, and chairman[3] of Microsoft, the software company he founded with Paul Allen.  {+He is American.+}
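
For comparison, wdiff-style output can be approximated in Python with the standard library alone. This is only a sketch, not a port of wdiff: the function name `wdiff` is made up for illustration, tokens are whitespace-separated words, and runs of whitespace are collapsed to single spaces, so the double space before "He" above would not be preserved.

```python
import difflib


def wdiff(old: str, new: str) -> str:
    """Return a wdiff-style, word-level comparison of two strings."""
    old_words, new_words = old.split(), new.split()
    # autojunk=False turns off difflib's "popular element" heuristic.
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            parts.append(" ".join(old_words[i1:i2]))
            continue
        # A "replace" opcode is rendered as a deletion followed by an insertion.
        if tag in ("delete", "replace"):
            parts.append("[-" + " ".join(old_words[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            parts.append("{+" + " ".join(new_words[j1:j2]) + "+}")
    return " ".join(parts)


print(wdiff("is an American business magnate.",
            "is a business magnate. He is American."))
# prints: is [-an American-] {+a+} business magnate. {+He is American.+}
```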
answered 2010-04-30T20:56:33.520

The difflib module in Python's standard library may help with this.

answered 2010-04-30T21:02:23.233
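
As a rough sketch of how that might look for the question's case, `difflib.SequenceMatcher` can compare the two versions word by word and the resulting opcodes can be turned into inline HTML. The function name `html_word_diff` and the choice of `<del>`/`<ins>` tags are assumptions made for this example, not something difflib provides. Whitespace is collapsed to single spaces.

```python
import difflib
import html


def html_word_diff(old: str, new: str) -> str:
    """Render a word-level diff of two strings as inline HTML."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Escape the text so characters such as < and " are safe inside HTML.
        removed = html.escape(" ".join(old_words[i1:i2]))
        added = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            parts.append(removed)
        else:
            # "delete", "insert" and "replace" all reduce to these two cases.
            if removed:
                parts.append(f"<del>{removed}</del>")
            if added:
                parts.append(f"<ins>{added}</ins>")
    return " ".join(parts)


print(html_word_diff("Gates is an American magnate.",
                     "Gates is a magnate. He is American."))
# prints: Gates is <del>an American</del> <ins>a</ins> magnate. <ins>He is American.</ins>
```

Browsers render `<del>` as strikethrough and `<ins>` as underlined by default, which is close to the revision view described in the question. Styling with CSS (for example red and green backgrounds) is left to the page.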