java - Regular expression matching algorithm in Java

Question

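This article says that regular expression matching in Java is slow because regular expressions with "back references" cannot be matched efficiently. The article explains the efficient Thompson NFA-based matching algorithm (invented in 1968), which works for regular expressions without "back references". However, the Pattern javadoc says that Java regular expressions use an NFA-based approach.

Now I am wondering how efficient Java regular expression matching is and which algorithm it uses.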

score 1 · Accepted Answer

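java.util.regex.Pattern uses the Boyer–Moore string search algorithm when the root of the compiled pattern is a literal slice, as this comment and the compile() excerpt from its source show: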

/* Attempts to match a slice in the input using the Boyer-Moore string
 * matching algorithm. The algorithm is based on the idea that the
 * pattern can be shifted farther ahead in the search text if it is
 * matched right to left.
 */

private void compile() {
    // ... (code elided in the original answer) ...

    if (matchRoot instanceof Slice) {
        root = BnM.optimize(matchRoot);
        if (root == matchRoot) {
            root = hasSupplementary ? new StartS(matchRoot) : new Start(matchRoot);
        }
    } else if (matchRoot instanceof Begin || matchRoot instanceof First) {
        root = matchRoot;
    } else {
        root = hasSupplementary ? new StartS(matchRoot) : new Start(matchRoot);
    }
}
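To make the distinction concrete, here is a minimal sketch using only the public API. The node classes in the excerpt above (Slice, BnM, Start) are private to Pattern, so the strategy chosen for a given pattern cannot be observed from outside. It is an assumption, based on that excerpt, that a purely literal pattern such as the first one below ends up as a Slice root, and that a pattern with a back-reference such as the second one does not. The class and method names (RegexDemo, firstIndex, repeatsAroundB) are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexDemo {

    // Returns the index of the first occurrence of a literal string, or -1.
    // Pattern.LITERAL makes every character of the pattern match itself.
    // Assumption: this is the kind of pattern the Slice branch applies to.
    static int firstIndex(String text, String literal) {
        Matcher m = Pattern.compile(literal, Pattern.LITERAL).matcher(text);
        return m.find() ? m.start() : -1;
    }

    // Matches one or more 'a', then 'b', then the same run of 'a' again.
    // \1 is a back-reference to whatever group 1 captured.
    static boolean repeatsAroundB(String text) {
        return Pattern.compile("(a+)b\\1").matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(firstIndex("haystack with needle", "needle")); // 14
        System.out.println(repeatsAroundB("aabaa"));  // true
        System.out.println(repeatsAroundB("aabaaa")); // false
    }
}
```

The second pattern is an example of the "back references" the question mentions: what \1 must match depends on what the group captured earlier in the same match attempt.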
