java - Why is String.equalsIgnoreCase so slow

Question

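In an interview I was asked to write a method that checks whether two words are the same regardless of character case.

I answered it by using the difference between the ASCII values of each pair of characters. But back home, when I went through the actual implementation in String.class, I was disturbed: why is it implemented that way?

I tried to compare the built-in method with my custom method like this -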

public class EqualsIgnoreCase {

    public static void main(String[] args) {
        String str1 = "Srimant @$ Sahu 959s";
        String str2 = "sriMaNt @$ sAhu 959s";

        System.out.println("Avg millisecs with inbuilt () - " + averageOfTenForInbuilt(str1, str2));
        System.out.println("\nAvg millisecs with custom () - " + averageOfTenForCustom(str1, str2));
    }

    public static int averageOfTenForInbuilt(String str1, String str2) {
        int avg = 0;
        for (int itr = 0; itr < 10; itr++) {
            long start1 = System.currentTimeMillis();
            for (int i = 0; i < 100000; i++) {
                str1.equalsIgnoreCase(str2);
            }
            avg += System.currentTimeMillis() - start1;
        }
        return avg / 10;
    }

    public static int averageOfTenForCustom(String str1, String str2) {
        int avg = 0;
        for (int itr = 0; itr < 10; itr++) {
            long start2 = System.currentTimeMillis();
            for (int i = 0; i < 100000; i++) {
                isEqualsIgnoreCase(str1, str2);
            }
            avg += System.currentTimeMillis() - start2;
        }
        return avg / 10;
    }

    public static boolean isEqualsIgnoreCase(String str1, String str2) {
        int length = str1.length();
        if (str2.length() != length) {
            return false;
        }

        for (int i = 0; i < length; i++) {
            char ch1 = str1.charAt(i);
            char ch2 = str2.charAt(i);

            int val = Math.abs(ch1 - ch2);
            if (val != 0) {
                if (isInAlphabetsRange(ch1, ch2)) {
                    if (val != 32) {
                        return false;
                    }
                } else {
                    return false;
                }
            }
        }
        return true;
    }

    public static boolean isInAlphabetsRange(char ch1, char ch2) {
        return (((ch1 <= 122 && ch1 >= 97) || (ch1 <= 90 && ch1 >= 65)) && ((ch2 <= 122 && ch2 >= 97) || (ch2 <= 90 && ch2 >= 65)));
    }

}

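Output -

Avg millisecs with inbuilt () - 14

Avg millisecs with custom () - 5

I find that the built-in method takes a hit on efficiency, since it performs lots of checks and method calls. Is there a specific reason behind such an implementation? Or am I missing something in my logic?

Any suggestions will be heartily appreciated!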

score 64 · Accepted Answer

Your routine only handles ASCII characters. The built-in one handles all Unicode characters.

Consider the following example, where the upper-case and lower-case forms differ by 1 rather than 32:

public class Test {

    public static void main(String[] args) {
        System.out.println((int) 'ě'); // => 283
        System.out.println((int) 'Ě'); // => 282 
    }

}
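To make the consequence concrete, here is a minimal sketch that runs an ASCII-only comparison, written in the spirit of the question's method, against the built-in one on this pair. The class name `AsciiVsUnicode` and the helpers `asciiEqualsIgnoreCase` and `isAsciiLetter` are made up for illustration; the characters are written as `\u` escapes so the file compiles regardless of source encoding.

```java
public class AsciiVsUnicode {

    // Hypothetical ASCII-only comparison, modelled on the question's approach:
    // two characters match if they are identical, or if both are ASCII letters
    // whose values differ by exactly 32.
    public static boolean asciiEqualsIgnoreCase(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            char c1 = a.charAt(i);
            char c2 = b.charAt(i);
            if (c1 == c2) {
                continue;
            }
            boolean bothAsciiLetters = isAsciiLetter(c1) && isAsciiLetter(c2);
            if (!bothAsciiLetters || Math.abs(c1 - c2) != 32) {
                return false;
            }
        }
        return true;
    }

    public static boolean isAsciiLetter(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }

    public static void main(String[] args) {
        String lower = "\u011B"; // ě, value 283
        String upper = "\u011A"; // Ě, value 282

        System.out.println(asciiEqualsIgnoreCase(lower, upper)); // false
        System.out.println(lower.equalsIgnoreCase(upper));       // true
    }
}
```

The ASCII-only version rejects the pair because neither character falls in the A-Z or a-z range, while the built-in method consults the Unicode case mappings.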

score 56 · Accepted Answer

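Your method is incorrect in many ways. For instance, it considers that "!" is equal to "B" and that "B" is equal to "1", but that "!" is not equal to "1" (so it is not transitive, as we would expect an equals method to be).

Yes, it is quite easy to write an incorrect implementation of that method that is faster and simpler. A fair challenge would be to write a correct one, that is, one that correctly handles every argument the JDK implementation does.

You might also want to look at "How do I write a correct micro-benchmark in Java?" to get more reliable performance measurements.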
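As a rough illustration of the measurement point, the sketch below adds a warm-up phase, uses `System.nanoTime()` instead of `System.currentTimeMillis()`, and consumes each result so the calls are harder for the JIT to discard. It is a hand-rolled approximation under those assumptions, not a substitute for a proper harness such as JMH; the class `RoughBenchmark`, its methods, and the iteration counts are all hypothetical choices.

```java
import java.util.function.BooleanSupplier;

public class RoughBenchmark {

    // Accumulates results so the measured calls have an observable effect.
    static long sink;

    // Runs the task 'iterations' times and returns the average nanoseconds per call.
    public static double averageNanos(BooleanSupplier task, int iterations) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            if (task.getAsBoolean()) {
                sink++;
            }
        }
        return (System.nanoTime() - start) / (double) iterations;
    }

    public static double measure(BooleanSupplier task) {
        // Warm-up rounds give the JIT a chance to compile the hot path
        // before any timing is recorded.
        for (int round = 0; round < 5; round++) {
            averageNanos(task, 200_000);
        }
        return averageNanos(task, 1_000_000);
    }

    public static void main(String[] args) {
        String str1 = "Srimant @$ Sahu 959s";
        String str2 = "sriMaNt @$ sAhu 959s";

        double builtIn = measure(() -> str1.equalsIgnoreCase(str2));
        System.out.println("Avg nanosecs per call, built-in: " + builtIn);
        System.out.println("(sink = " + sink + ")");
    }
}
```

Even with these precautions the numbers vary between runs and JVMs, so they are best read as a rough comparison rather than a precise cost.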

score 11 · Accepted Answer

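This might not be the only reason, but the fact that your solution does not actually work for all possible strings is definitely a factor.

In some (annoying) locales, two characters can have the same upper-case form but different lower-case forms. For this reason, in order to work (most of the time, see Turkish), the canonical implementation must compare the strings in both their upper-case and lower-case forms.

Your implementation is probably perfectly fine 99% of the time, especially if you only have to deal with the English locale, but the core library implementation unfortunately cannot make that assumption.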
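The Turkish case can be seen directly with the locale-independent `Character` mappings. The following is a small sketch, with a made-up class name `TurkishCase`, showing one pair that agrees only in upper case and another that agrees only in lower case.

```java
public class TurkishCase {

    public static void main(String[] args) {
        char dotlessLowerI = '\u0131'; // ı, LATIN SMALL LETTER DOTLESS I
        char dottedUpperI  = '\u0130'; // İ, LATIN CAPITAL LETTER I WITH DOT ABOVE

        // 'i' and dotless 'ı' share the upper-case form 'I' but differ in lower case.
        System.out.println(Character.toUpperCase('i') == Character.toUpperCase(dotlessLowerI)); // true
        System.out.println(Character.toLowerCase('i') == Character.toLowerCase(dotlessLowerI)); // false

        // 'I' and dotted 'İ' share the lower-case form 'i' but differ in upper case.
        System.out.println(Character.toLowerCase('I') == Character.toLowerCase(dottedUpperI)); // true
        System.out.println(Character.toUpperCase('I') == Character.toUpperCase(dottedUpperI)); // false

        // The built-in comparison treats both pairs as equal.
        System.out.println("i".equalsIgnoreCase("\u0131")); // true
        System.out.println("I".equalsIgnoreCase("\u0130")); // true
    }
}
```

A comparison that normalised to only one case form would accept one of these pairs and reject the other.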

score 4 · Accepted Answer

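I think the check

String1.equalsIgnoreCase(String2)

that the library provides accepts a much wider range of characters: it handles every kind of character value included in Unicode, whereas your custom code only compares English alphabetic characters.

So I think, along the lines of Pavel Horel, who commented on your post, that it may take more time because it has the sophistication to compare all kinds of Unicode characters.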

score 2 · Accepted Answer

I think this excerpt from String.java is relevant:

if (ignoreCase) {
    // If characters don't match but case may be ignored,
    // try converting both characters to uppercase.
    // If the results match, then the comparison scan should
    // continue.
    char u1 = Character.toUpperCase(c1);
    char u2 = Character.toUpperCase(c2);
    if (u1 == u2) {
        continue;
    }
    // Unfortunately, conversion to uppercase does not work properly
    // for the Georgian alphabet, which has strange rules about case
    // conversion.  So we need to make one last check before
    // exiting.
    if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
        continue;
    }
}
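For context, here is a self-contained sketch of how such a per-character check could sit inside a full comparison loop. It is not the JDK source: the class `CaseInsensitiveSketch` and the method `equalsIgnoringCase` are hypothetical, and null handling is simplified.

```java
public class CaseInsensitiveSketch {

    // Simplified stand-in for a case-insensitive comparison; not the JDK code.
    public static boolean equalsIgnoringCase(String a, String b) {
        if (a == null || b == null || a.length() != b.length()) {
            return false;
        }
        for (int i = 0; i < a.length(); i++) {
            char c1 = a.charAt(i);
            char c2 = b.charAt(i);
            if (c1 == c2) {
                continue; // fast path: identical characters
            }
            char u1 = Character.toUpperCase(c1);
            char u2 = Character.toUpperCase(c2);
            if (u1 == u2) {
                continue; // same upper-case form
            }
            if (Character.toLowerCase(u1) == Character.toLowerCase(u2)) {
                continue; // upper-case forms differ, lower-case forms agree
            }
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(equalsIgnoringCase("Srimant @$ Sahu 959s", "sriMaNt @$ sAhu 959s")); // true
        System.out.println(equalsIgnoringCase("abc", "abd")); // false
    }
}
```

Each mismatching character costs up to four case-mapping calls here, which is the kind of extra work the question's ASCII arithmetic avoids.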
