python - Splitting a diff file with regular expressions in Python

Question

I'm trying to use Python's re module to split a diff (unified format) into its individual sections. The diff looks like this...

diff --git a/src/core.js b/src/core.js
index 9c8314c..4242903 100644
--- a/src/core.js
+++ b/src/core.js
@@ -801,7 +801,7 @@ jQuery.extend({
        return proxy;
    },

-   // Mutifunctional method to get and set values to a collection
+   // Multifunctional method to get and set values of a collection
    // The value/s can optionally be executed if it's a function
    access: function( elems, fn, key, value, chainable, emptyGet, pass ) {
        var exec,
diff --git a/src/sizzle b/src/sizzle
index fe2f618..feebbd7 160000
--- a/src/sizzle
+++ b/src/sizzle
@@ -1 +1 @@
-Subproject commit fe2f618106bb76857b229113d6d11653707d0b22
+Subproject commit feebbd7e053bff426444c7b348c776c99c7490ee
diff --git a/test/unit/manipulation.js b/test/unit/manipulation.js
index 18e1b8d..ff31c4d 100644
--- a/test/unit/manipulation.js
+++ b/test/unit/manipulation.js
@@ -7,7 +7,7 @@ var bareObj = function(value) { return value; };
 var functionReturningObj = function(value) { return (function() { return value; }); };

 test("text()", function() {
-   expect(4);
+   expect(5);
    var expected = "This link has class=\"blog\": Simon Willison's Weblog";
    equal( jQuery("#sap").text(), expected, "Check for merged text of more then one element." );

@@ -20,6 +20,10 @@ test("text()", function() {
        frag.appendChild( document.createTextNode("foo") );

    equal( jQuery( frag ).text(), "foo", "Document Fragment Text node was retreived from .text().");
+
+   var $newLineTest = jQuery("<div>test<br/>testy</div>").appendTo("#moretests");
+   $newLineTest.find("br").replaceWith("\n");
+   equal( $newLineTest.text(), "test\ntesty", "text() does not remove new lines (#11153)" );
 });

 test("text(undefined)", function() {
diff --git a/version.txt b/version.txt
index 0a182f2..0330b0e 100644
--- a/version.txt
+++ b/version.txt
@@ -1 +1 @@
-1.7.2
\ No newline at end of file
+1.7.3pre
\ No newline at end of file

I've tried various combinations of patterns but can't quite get it right. This is the closest I've got so far...

re.compile(r'(diff.*?[^\rdiff])', flags=re.S|re.M)

But this produces

['diff ', 'diff ', 'diff ', 'diff ']

How would I match all of the sections in this diff?
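A likely explanation of the output above: [^\rdiff] is a character class, so it matches exactly one character that is not \r, d, i or f. It does not mean "not followed by the word diff". Because .*? is lazy, the shortest possible match is "diff" plus the space after it. The short check below reproduces this on a hypothetical two-file sample.

```python
import re

# Hypothetical two-file sample, just enough to show the behaviour
sample = (
    "diff --git a/x.txt b/x.txt\n"
    "index 1111111..2222222 100644\n"
    "diff --git a/y.txt b/y.txt\n"
)

# The pattern from the question: [^\rdiff] is a character class that
# matches ONE character other than \r, d, i or f
pattern = re.compile(r'(diff.*?[^\rdiff])', flags=re.S | re.M)

# The lazy .*? matches nothing, then the class matches the space after "diff"
print(pattern.findall(sample))  # ['diff ', 'diff ']
```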

score 1 · Accepted Answer

You don't need a regular expression, just split the file:

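with open('diff.txt', 'r') as diff_file:
    diff_str = diff_file.read()
# Drop the empty piece before the first header and put back the
# 'diff --git' prefix that str.split removed from each section
diff_split = ['diff --git%s' % x for x in diff_str.split('diff --git')
              if x.strip()]
print(diff_split)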
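The same split-based idea wrapped in a small function, so it can be tried on a string without a diff.txt file. The name `split_diff` and the sample text are hypothetical. One caveat: `str.split` also splits wherever "diff --git" appears in the middle of a line, so this relies on that text occurring only in the file headers.

```python
def split_diff(diff_str):
    """Split a git diff into per-file sections, keeping each 'diff --git' header."""
    marker = 'diff --git'
    # The piece before the first header is empty (or whitespace), so drop it,
    # then put back the marker that str.split removed from each section
    return [marker + part for part in diff_str.split(marker) if part.strip()]


sample = (
    "diff --git a/x.txt b/x.txt\n"
    "-old\n"
    "+new\n"
    "diff --git a/y.txt b/y.txt\n"
    "-1\n"
    "+2\n"
)

for section in split_diff(sample):
    print(section.splitlines()[0])
# diff --git a/x.txt b/x.txt
# diff --git a/y.txt b/y.txt
```

Joining the returned sections gives back the original text, since nothing but the empty leading piece is discarded.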

score 1 · Accepted Answer

Do it like this:

import re

r = re.compile(r'^(diff.*?)(?=^diff|\Z)', re.M | re.S)
for m in r.findall(s):  # s holds the diff text
    print('====')
    print(m)
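How the pattern above works, as a runnable sketch on a hypothetical sample: with re.M the ^ anchor matches at every line start, with re.S the lazy .*? can run across newlines, and the lookahead (?=^diff|\Z) stops each match just before the next line that starts with diff, or at the very end of the text.

```python
import re

# ^ matches at each line start (re.M); . also matches newlines (re.S)
section_re = re.compile(r'^(diff.*?)(?=^diff|\Z)', re.M | re.S)

sample = (
    "diff --git a/x.txt b/x.txt\n"
    "@@ -1 +1 @@\n"
    "-old\n"
    "+new\n"
    "diff --git a/y.txt b/y.txt\n"
    "@@ -1 +1 @@\n"
    "-1\n"
    "+2\n"
)

sections = section_re.findall(sample)
for section in sections:
    print('====')
    print(section, end='')  # each section already ends with a newline
```

Because the lookahead consumes nothing, each section keeps its trailing newline and the sections concatenate back to the input.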

score 0 · Accepted Answer

Why are you using a regular expression? How about just iterating over the lines and starting a new section whenever a line starts with diff?

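list_of_diffs = []
temp_diff = []
for line in patch:  # patch: an iterable of lines, e.g. an open file
    # A "diff" line starts a new section: save the one collected so far
    if line.startswith('diff') and temp_diff:
        list_of_diffs.append(''.join(temp_diff))
        temp_diff = []
    temp_diff.append(line)
if temp_diff:  # the last section has no following "diff" line
    list_of_diffs.append(''.join(temp_diff))

Disclaimer: the code above is only meant to illustrate the approach.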

A regular expression is a hammer, but your problem is not a nail.
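A minimal sketch of the line-by-line idea as a generator, so sections can be produced one at a time while a file is being read. The function name `iter_diff_sections` and the sample are hypothetical; it assumes each line keeps its line ending, as lines from a file object or from `text.splitlines(keepends=True)` do. In unified diff output, lines inside a hunk start with a space, +, - or \, so a line beginning with diff should only be a file header.

```python
def iter_diff_sections(lines):
    """Yield one string per file section of a diff, header line included."""
    current = []
    for line in lines:
        # A 'diff' line opens a new section; emit the one collected so far
        if line.startswith('diff') and current:
            yield ''.join(current)
            current = []
        current.append(line)
    # The last section has no following header, so emit it explicitly
    if current:
        yield ''.join(current)


sample = (
    "diff --git a/x.txt b/x.txt\n"
    "-old\n"
    "+new\n"
    "diff --git a/y.txt b/y.txt\n"
    "-1\n"
    "+2\n"
)

for section in iter_diff_sections(sample.splitlines(keepends=True)):
    print(repr(section))
```

With a file, list(iter_diff_sections(f)) on an open text file f should work the same way, since iterating a file yields its lines with their endings.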

score 0 · Accepted Answer

Just split on any newline that is followed by the word diff:

result = re.split(r"\n(?=diff\b)", subject)

To be safe, though, you should probably also match \r or \r\n:

result = re.split(r"(?:\r\n|[\r\n])(?=diff\b)", subject)
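A quick check of the line-ending-tolerant pattern on a hypothetical sample with Windows \r\n endings. re.split consumes the line break it splits on, so every section except the last loses its final line ending.

```python
import re

# Split on a line break (\r\n, \r or \n) that is directly followed by "diff"
splitter = re.compile(r"(?:\r\n|[\r\n])(?=diff\b)")

subject = (
    "diff --git a/x.txt b/x.txt\r\n"
    "-old\r\n"
    "+new\r\n"
    "diff --git a/y.txt b/y.txt\r\n"
    "-1\r\n"
    "+2\r\n"
)

result = splitter.split(subject)
for part in result:
    print(repr(part))
# 'diff --git a/x.txt b/x.txt\r\n-old\r\n+new'
# 'diff --git a/y.txt b/y.txt\r\n-1\r\n+2\r\n'
```

Listing \r\n first in the alternation means a Windows line ending is removed as a whole instead of leaving a stray \r on the previous section.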
