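I am working on a project that compares two versions of a large text file (roughly 5,000+ lines). The newer version may contain new content and may have had content removed. The goal is to detect changes between versions early, because a team relies on the information in this text.
To solve this, I am using the diff-match-patch library, which lets me identify deleted and new content. As a first step, I search for the changes.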

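    // Holds the result of the last comparison so it can be filtered later.
    private LinkedList<Diff> diffs;

    public void compareStrings(String oldText, String newText) {
        DiffMatchPatch dmp = new DiffMatchPatch();
        // Compare the method parameters and keep the result in the field.
        diffs = dmp.diffMain(oldText, newText, false);
    }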

I then filter the list by the INSERT/DELETE operation to get only the new or deleted content.

    public String showAddedElements() {
        StringBuilder insertions = new StringBuilder();
        for (Diff elem : diffs) {
            // Keep only the fragments that were added in the new text.
            if (elem.operation == Operation.INSERT) {
                insertions.append(elem.text).append(System.lineSeparator());
            }
        }
        return insertions.toString();
    }

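However, when I print the result, I sometimes get only fragments such as (o, contr, ler), because only individual characters were deleted or added. Instead, I would like to output the whole sentence in which the change occurred. Is there a way to retrieve the line numbers of the changes from DiffMatchPatch?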
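
For illustration, line numbers can in principle be recovered from a character-level diff by counting line breaks while walking the diff list. The following is only a minimal sketch under stated assumptions: `Change` and `Operation` are hypothetical stand-ins for the library's `Diff` class and its operation enum (the sketch does not call diff-match-patch itself), and the text is assumed to use `\n` line breaks.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

class ChangedLines {
    enum Operation { EQUAL, INSERT, DELETE }

    // Hypothetical stand-in for the library's Diff class (an operation plus a text fragment).
    static final class Change {
        final Operation operation;
        final String text;

        Change(Operation operation, String text) {
            this.operation = operation;
            this.text = text;
        }
    }

    /** Returns the 1-based line numbers of the new text that contain inserted characters. */
    static SortedSet<Integer> insertedLineNumbers(List<Change> diffs) {
        SortedSet<Integer> lineNumbers = new TreeSet<>();
        int line = 1; // current line in the new text
        for (Change change : diffs) {
            if (change.operation == Operation.DELETE) {
                continue; // deleted text is not part of the new text
            }
            for (char ch : change.text.toCharArray()) {
                if (change.operation == Operation.INSERT) {
                    lineNumbers.add(line);
                }
                if (ch == '\n') {
                    line++; // assumes "\n" line breaks
                }
            }
        }
        return lineNumbers;
    }

    /** Returns the full changed lines of newText, each prefixed with its line number. */
    static List<String> insertedLines(String newText, List<Change> diffs) {
        String[] lines = newText.split("\n", -1);
        List<String> result = new ArrayList<>();
        for (int n : insertedLineNumbers(diffs)) {
            result.add(n + ": " + lines[n - 1]);
        }
        return result;
    }

    public static void main(String[] args) {
        String newText = "alpha\nbetta\ngamma\ndelta\n";
        List<Change> diffs = List.of(
                new Change(Operation.EQUAL, "alpha\nbet"),
                new Change(Operation.INSERT, "t"),
                new Change(Operation.EQUAL, "a\ngamma\n"),
                new Change(Operation.INSERT, "delta\n"));
        insertedLines(newText, diffs).forEach(System.out::println);
    }
}
```

In this example a single inserted `t` is reported as the whole line `2: betta` rather than as an isolated character. Deleted content would need the same walk over the old text, skipping INSERT fragments instead of DELETE ones.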

1 Answer

I found a solution by using another library to extract the changed lines. DiffUtils (the DiffUtils class by Dmitry Naumenko) helped me achieve the desired result.

    /**
     * Converts a String to a list of lines by splitting it at line breaks.
     * @param text The text to be converted to a list of lines
     * @return The lines of the text, in order
     */
    private List<String> fileToLines(String text) {
        List<String> lines = new LinkedList<>();
        // try-with-resources closes the scanner; hasNextLine() pairs with
        // nextLine() and also keeps trailing blank lines
        try (Scanner scanner = new Scanner(text)) {
            while (scanner.hasNextLine()) {
                lines.add(scanner.nextLine());
            }
        }
        return lines;
    }

    /**
     * Starts a line-by-line comparison between two strings. The results are stored
     * in an internal list for further processing.
     *
     * @param firstText The first string to be compared
     * @param secondText The second string to be compared
     */
    public void startLineByLineComparison(String firstText, String secondText) {
        List<String> firstLines = fileToLines(firstText);
        List<String> secondLines = fileToLines(secondText);
        // changes is a field that receives the deltas of the resulting patch
        changes = DiffUtils.diff(firstLines, secondLines).getDeltas();
    }

Once the list is populated, the changes can be extracted with the code below, where elem.getType() indicates the type of difference between the texts:

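    /**
     * Returns a String with all removed content, including line positions.
     * @return String with removed content
     */
    public String returnRemovedContent() {
        StringBuilder deletions = new StringBuilder();
        for (Delta elem : changes) {
            // DELETE deltas cover lines that exist only in the first text
            if (elem.getType() == Delta.TYPE.DELETE) {
                // appendLines is a helper (not shown here) that formats the original chunk
                deletions.append(appendLines(elem.getOriginal())).append(System.lineSeparator());
            }
        }
        return deletions.toString();
    }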
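
The `appendLines` helper called above is not shown in the answer. A minimal sketch of what it might look like follows, assuming it simply prefixes each line of the chunk with its line number. To stay free of library dependencies, this hypothetical version takes the start position and the lines as plain parameters; in the real code they would presumably come from the chunk (`getPosition()` and `getLines()` in java-diff-utils, with the position assumed to be zero-based).

```java
import java.util.List;

class ChunkFormatter {
    /**
     * Formats the given lines, one per row, each prefixed with its 1-based line number.
     *
     * @param position zero-based index of the first line in the original text (assumption)
     * @param lines    the lines belonging to the chunk
     */
    static String appendLines(int position, List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lines.size(); i++) {
            if (i > 0) {
                sb.append(System.lineSeparator());
            }
            // position + i is the zero-based index; add 1 for a human-readable line number
            sb.append("Line ").append(position + i + 1).append(": ").append(lines.get(i));
        }
        return sb.toString();
    }
}
```

For example, `ChunkFormatter.appendLines(2, List.of("foo", "bar"))` yields two rows labelled `Line 3` and `Line 4`.
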
Answered 2022-02-22T10:19:12.243