Questions tagged "difflib" - Stack Overflow (Chinese site)

0 votes

1 answer

3660 views

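python - Using the difflib.HtmlDiff class - single characters displayed

I am using the difflib.HtmlDiff class and calling its function with two sets of text (HTML from websites), but when it makes the table

However, it seems to compare character by character (one character per table row), and I end up with a 4.3 MB txt file for two sets of HTML that are only 100k each.

The docs say,

However, this does not appear to be the case.

Any suggestions?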
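One common cause of one-character rows is passing whole strings where HtmlDiff expects lists of lines: a str is iterated character by character. The question's own call is not shown, so the sketch below only assumes that is what happened. The sample strings and variable names are made up for illustration.

```python
import difflib

# Hypothetical stand-ins for the two downloaded HTML documents.
old_html = "<html>\n<body>\n<p>Hello</p>\n</body>\n</html>\n"
new_html = "<html>\n<body>\n<p>Hello, world</p>\n</body>\n</html>\n"

differ = difflib.HtmlDiff()

# Passing the strings directly makes difflib iterate over characters,
# so every character ends up in its own table row.
per_char_table = differ.make_table(old_html, new_html)

# Splitting into lines first yields one table row per line of input.
per_line_table = differ.make_table(old_html.splitlines(), new_html.splitlines())

print(len(per_char_table), len(per_line_table))
```

If the inputs were already lists of lines, this is not the cause and the size must come from somewhere else.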

python difflib

2011-06-13T10:24:32.787

0 votes

2 answers

82385 views

python - High performance fuzzy string comparison in Python, use Levenshtein or difflib

I am doing clinical message normalization (spell check) in which I check each given word against a 900,000-word medical dictionary. I am mostly concerned about time complexity/performance.

I want to do fuzzy string comparison, but I'm not sure which library to use.

Option 1:

Option 2:

In this example both give the same answer. Do you think both perform alike in this case?
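The two options are not shown above, so the following is only a sketch of the kind of comparison being asked about. It puts difflib.get_close_matches next to a small pure-Python edit distance, used here as a stand-in for the third-party Levenshtein package. The dictionary, the misspelled word and the levenshtein helper are all assumptions for illustration.

```python
import difflib

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, O(len(a) * len(b))."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

# Hypothetical miniature dictionary; the real one has about 900,000 words.
dictionary = ["aspirin", "insulin", "ibuprofen", "penicillin"]
word = "asprin"

by_difflib = difflib.get_close_matches(word, dictionary, n=1, cutoff=0.6)
by_edit_distance = min(dictionary, key=lambda w: levenshtein(word, w))
print(by_difflib, by_edit_distance)
```

Both approaches scan the whole dictionary for every word here, so at 900,000 entries the relative speed would have to be measured rather than assumed.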

2011-07-14T08:56:37.713

0 votes

3 answers

1029 views

python - Python difflib with regular expressions

Can I use regular expressions in difflib?

Specifically, I want to do:

Actual is:

Gold is:
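difflib compares sequence elements by equality, and it has no documented hook for regular expressions. One possible workaround, sketched below, is to normalize the volatile parts of both inputs with re.sub before diffing. The "gold" and "actual" lines, the TIMESTAMP pattern and the normalize helper are invented for illustration, since the question's own data is not shown.

```python
import difflib
import re

# Hypothetical "gold" and "actual" outputs: the timestamps differ on the
# first line, and the second line contains a real change.
gold = ["started at 2011-07-18 15:16:30", "result: 42"]
actual = ["started at 2011-07-19 09:01:05", "result: 43"]

TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}")

def normalize(lines):
    """Replace every timestamp with a fixed placeholder before comparing."""
    return [TIMESTAMP.sub("<TIMESTAMP>", line) for line in lines]

diff = list(difflib.unified_diff(normalize(gold), normalize(actual),
                                 fromfile="gold", tofile="actual", lineterm=""))
print("\n".join(diff))
```

Only the "result" line shows up as changed, because the timestamp lines become identical after normalization.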

python regex difflib

2011-07-18T15:16:30.263

0 votes

1 answer

1147 views

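python - Worst-case behavior of Python's HtmlDiff.make_table()

I am using Python 2.7's difflib.HtmlDiff.make_table() function to generate diffs between expected and actual files for an internal test case runner. They end up in an HTML test report.

So far this has worked fine, until I added a test case with larger files (~400 KiB), with a lot of differences, that typically contain no newlines. Almost all of my test cases execute in under 2 seconds, with a few of the more complex ones going as high as 4 seconds. This new one is just as fast when it passes, but takes 13 minutes (!) to fail. All of that time is spent generating the report. I hope you can see how this is a problem.

An attempt to demonstrate this (probably not the best way, I know):

Results:

difflib.ndiff() (which, as I understand it, make_table() uses internally) does not seem to have this problem:

Gives me this:

That looks reasonable, i.e. it is proportional: four times the size takes four times as long.

Not sure where to go from here. I am guessing the HTML generator does a lot of backtracking when there are differences (although you would think ndiff() had already dealt with that). Can I tell it to abort early, give up, and just mark the entire section as "different"?

I know there are many different algorithms for generating diffs. In this case, I do not need it to do a very deep analysis and try to resynchronize everywhere. I just need it to tell me roughly where in the file the difference is, and to terminate in a reasonable time frame.

Alternatively, are there other Python diff libraries that generate HTML and do not have this worst-case problem?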
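If a rough location is all that is needed, a linear scan for the first differing offset sidesteps the diff algorithm entirely. The sketch below is not a replacement for HtmlDiff and is not taken from the question. The function names first_difference and render_snippet are hypothetical, and it assumes Python 3 rather than the 2.7 mentioned above.

```python
import html

def first_difference(expected: str, actual: str):
    """Return the index of the first differing character, or None if equal."""
    for index, (e, a) in enumerate(zip(expected, actual)):
        if e != a:
            return index
    if len(expected) != len(actual):
        # One string is a strict prefix of the other.
        return min(len(expected), len(actual))
    return None

def render_snippet(text: str, index: int, context: int = 20) -> str:
    """Build a small HTML fragment showing the text around the mismatch."""
    start = max(0, index - context)
    before = html.escape(text[start:index])
    after = html.escape(text[index:index + context])
    return f"<code>{before}<b>{after}</b></code>"

# Made-up inputs with a single differing character and no newlines.
expected = "a" * 1000 + "X" + "b" * 1000
actual = "a" * 1000 + "Y" + "b" * 1000
position = first_difference(expected, actual)
print(position, render_snippet(actual, position))
```

The scan costs time proportional to the input length, at the price of reporting only the first mismatch and never resynchronizing.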

python diff time-complexity difflib

2011-08-10T15:27:12.287

0 votes

2 answers

804 views

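python - Python difflib GNU patch compatibility

Is it possible to create patches with the Python module difflib that are compatible with GNU patch? I tried using unified_diff and context_diff, and also tried specifying lineterm as "\n", but I still get this error:

I write the patch to a file using file.writelines(diff) (a piece of the code: http://pastebin.com/3HAWfwVf)

File test.txt:

File test2.txt:

And the generated patch:

Thanks for your help.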
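The error message and file contents are not shown, so the cause cannot be pinned down from here. As a point of comparison, the sketch below shows a typical unified_diff setup in which every input line keeps its trailing newline and lineterm is left at its default. The file contents are invented, and only the file names come from the question.

```python
import difflib

# Hypothetical file contents; the question's test.txt and test2.txt are not shown.
old_text = "one\ntwo\nthree\n"
new_text = "one\n2\nthree\n"

# keepends=True preserves each line's trailing newline, which matches the
# default lineterm="\n" used for the header and hunk-marker lines.
old_lines = old_text.splitlines(keepends=True)
new_lines = new_text.splitlines(keepends=True)

patch_lines = list(difflib.unified_diff(old_lines, new_lines,
                                        fromfile="test.txt", tofile="test2.txt"))
patch_text = "".join(patch_lines)
print(patch_text, end="")
```

A file whose last line has no trailing newline is a likely trouble spot, because the resulting patch line then runs into the next one when written with writelines.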

python compatibility patch difflib

2011-08-27T22:11:43.923

0 votes

1 answer

1440 views

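python - How to highlight more than two characters per line in difflib's HTML output

I am using difflib.HtmlDiff to compare two files. I want the differences to be highlighted in the output HTML.

This already works when there are at most two different characters on a line:

But when there are more different characters on a line, the whole line is marked red (on the left side) or green (on the right side of the table) in the output:

Is this behavior configurable? That is, can I set the number of differing characters at which a line is marked as deleted/added?

EDIT:

Example:

Gives me this output:

Line 2 is the output I want. It highlights the difference in yellow. Line 3 is strange to me, because it does not detect the one-character change but shows it as a deletion/addition instead. Line 4 is the same as line 3, but the whole line is marked.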
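difflib marks changes inside a line only when it judges the old and new lines similar enough to pair up, so the relative similarity matters more than the raw count of changed characters. The threshold is reportedly hard-coded at about 0.75 in CPython's Differ. Treat that number as an implementation detail and an assumption here, not as documented API. The sketch below computes the similarity for two invented line pairs, and the helper name line_similarity is hypothetical.

```python
import difflib

def line_similarity(old: str, new: str) -> float:
    """Similarity in [0, 1] as computed by SequenceMatcher.ratio()."""
    return difflib.SequenceMatcher(None, old, new).ratio()

# Hypothetical line pairs: a small edit in a long line, and a small edit
# in a very short line.
pairs = [
    ("the quick brown fox jumps", "the quick brown fax jumps"),
    ("abcd", "abXY"),
]
for old, new in pairs:
    print(f"{line_similarity(old, new):.2f}  {old!r} -> {new!r}")
```

The short pair scores much lower even though only two characters changed, which is consistent with short lines being shown as a full deletion plus addition.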

python html difflib

2011-10-05T12:06:48.597

0 votes

2 answers

2144 views

python - difflib returns different ratios depending on the order of the sequences

Does anyone know why these two return different ratios?
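The two calls in question are not shown, but the effect is easy to reproduce with the pair of strings used in the difflib documentation for ratio(). The matcher greedily picks a longest matching block and then recurses to its left and right, so swapping the arguments can change which blocks are found. The symmetric_ratio helper is a hypothetical workaround, not part of difflib.

```python
import difflib

a, b = "tide", "diet"

forward = difflib.SequenceMatcher(None, a, b).ratio()
backward = difflib.SequenceMatcher(None, b, a).ratio()
print(forward, backward)

def symmetric_ratio(x: str, y: str) -> float:
    """Order-independent score: the larger of the two directional ratios."""
    return max(difflib.SequenceMatcher(None, x, y).ratio(),
               difflib.SequenceMatcher(None, y, x).ratio())

print(symmetric_ratio(a, b) == symmetric_ratio(b, a))
```

Taking the maximum doubles the work per comparison, so it is a trade-off rather than a free fix.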

python difflib

2012-02-17T01:25:38.760

0 votes

1 answer

3304 views

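python - In Python, generate HTML highlighting the differences between two simple strings

I need to highlight the differences between two simple strings with Python, enclosing the differing substrings in an HTML span element. So I am looking for a simple way to implement the function illustrated by the following example:

hightlight_diff('Hello world','HeXXo world','red')

...it should return the string:

'He<span style="color:red">XX</span>o world'

I have googled and seen difflib mentioned, but it is supposed to be obsolete, and I have not found any good, simple demo.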
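difflib is part of the standard library, and SequenceMatcher.get_opcodes() gives enough information for this. The following is a minimal sketch of one possible implementation, not the only way to do it. It is named highlight_diff here (the question spells the name hightlight_diff), and the HTML escaping is an addition the question did not ask for.

```python
import difflib
import html

def highlight_diff(old: str, new: str, color: str) -> str:
    """Return `new` as HTML, wrapping the parts that differ from `old` in a span."""
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    pieces = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        segment = html.escape(new[j1:j2])
        if tag == "equal":
            pieces.append(segment)
        elif tag in ("replace", "insert"):
            pieces.append(f'<span style="color:{color}">{segment}</span>')
        # "delete" contributes nothing: that text is absent from `new`.
    return "".join(pieces)

print(highlight_diff("Hello world", "HeXXo world", "red"))
```

Text that exists only in the first string is dropped silently in this sketch. Showing deletions would need a second span style.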

python html diff difflib

2012-02-22T13:59:02.253

0 votes

2 answers

3033 views

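python - Making difflib's SequenceMatcher ignore "junk" characters

I have a lot of strings that I want to match for similarity (each string is 30 characters on average). I found difflib's SequenceMatcher great for this task, as it is simple and gives good results. But if I compare hellboy and hell-boy like this

I would expect such a pair to give a 100% match, i.e. a ratio of 1.0. I understand that the junk characters specified in the function above are not used for comparison, but for finding the longest contiguous matching subsequence. Is there some way I can make SequenceMatcher ignore some "junk" characters for comparison purposes?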
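Since the isjunk argument only influences how matching blocks are found and junk characters still count toward the ratio, one simple approach is to delete the unwanted characters from both strings before comparing. This is a sketch of that preprocessing step. The IGNORED character set and the similarity_ignoring helper are assumptions for illustration.

```python
import difflib

# Characters treated as noise; this set is an assumption for illustration.
IGNORED = "- _"
STRIP_TABLE = str.maketrans("", "", IGNORED)

def similarity_ignoring(a: str, b: str) -> float:
    """Compare two strings after deleting the ignored characters from both."""
    cleaned_a = a.translate(STRIP_TABLE)
    cleaned_b = b.translate(STRIP_TABLE)
    return difflib.SequenceMatcher(None, cleaned_a, cleaned_b).ratio()

print(similarity_ignoring("hellboy", "hell-boy"))
```

The same table can be reused for every pair, so the extra cost per comparison is a single pass over each string.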

python difflib sequencematcher

2012-04-02T20:53:36.383

0 votes

3 answers

11514 views

python - How does the python difflib.get_close_matches() function work?

The following are two arrays:

output:

Shouldn't '198.124.252.102' be the closest match for '198.124.252.101'?

I looked at the documentation, which mentions some floating-point weights but gives no information on the algorithm used.

I need to find out whether the absolute difference between the two addresses' last octets is 1 (provided the first three octets are the same).

So I am finding the closest string first and then checking that closest string for the above condition.

Is there any other function or way to achieve this? Also, how does get_close_matches() behave?

ipaddr doesn't seem to have such an operation for IPs.
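get_close_matches() ranks candidates by SequenceMatcher similarity (by default at most n=3 results with a cutoff of 0.6), which is character-level and knows nothing about numeric closeness. The condition described above can be checked directly instead. This is a sketch using the standard-library ipaddress module (Python 3.3+), not the ipaddr package mentioned in the question. The candidate list and the helper name last_octet_differs_by_one are made up, since the original arrays are not shown.

```python
import difflib
import ipaddress

def last_octet_differs_by_one(first: str, second: str) -> bool:
    """True if the first three octets match and the last octets differ by 1."""
    a = ipaddress.IPv4Address(first).packed
    b = ipaddress.IPv4Address(second).packed
    return a[:3] == b[:3] and abs(a[3] - b[3]) == 1

# Hypothetical candidates; the arrays from the question are not shown.
candidates = ["198.124.252.130", "198.124.252.102", "198.124.253.101"]
target = "198.124.252.101"

neighbours = [ip for ip in candidates if last_octet_differs_by_one(target, ip)]
print(neighbours)

# For comparison: string similarity ranking, which ignores numeric distance.
print(difflib.get_close_matches(target, candidates))
```

Parsing the addresses also rejects malformed input, which a pure string comparison would not notice.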

python string ip difflib

2012-06-04T09:41:29.983
