java - 检测字符串是否具有唯一字符：将我的解决方案与“Cracking the Coding Interview？”进行比较

Question

我正在阅读“Cracking the Coding Interview”一书，在这里我遇到了一些寻求答案的问题，但我需要帮助来比较我的答案与解决方案。我的算法有效，但我很难理解书中的解决方案。主要是我不明白一些运营商到底在做什么。

任务是：“实现一个算法来确定一个字符串是否包含所有唯一字符。如果你不能使用额外的数据结构怎么办？”

这是我的解决方案：

public static boolean checkForUnique(String str){
    boolean containsUnique = false;

    for(char c : str.toCharArray()){
        if(str.indexOf(c) == str.lastIndexOf(c)){
            containsUnique = true;
        } else {
            containsUnique = false;
        }
    }

    return containsUnique;
}

它有效，但效率如何？我看到Java中String的索引函数的复杂度是O(n*m)

这是书中的解决方案：

public static boolean isUniqueChars(String str) {
    if (str.length() > 256) {
        return false;
    }
    int checker = 0;
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

我对解决方案不太了解的几件事。首先，“|=”运算符是做什么的？为什么要从字符串中的当前字符中减去“a”以获得“val”的值？我知道“<<”是按位左移，但是有什么作用(checker & (1<<val))呢？我知道它是按位的，但我不理解它，因为我不理解 checker 获取值的行。

我只是不熟悉这些操作，不幸的是这本书没有给出解决方案的解释，可能是因为它假设你已经理解了这些操作。

score 110 · Accepted Answer

这里有两个单独的问题：您的解决方案的效率如何，参考解决方案在做什么？让我们独立对待每个人。

首先，您的解决方案：

public static boolean checkForUnique(String str){
    boolean containsUnique = false;

    for(char c : str.toCharArray()){
        if(str.indexOf(c) == str.lastIndexOf(c)){
            containsUnique = true;
        } else {
            containsUnique = false;
        }
    }

    return containsUnique;
}

您的解决方案基本上包括对字符串中所有字符的循环（假设有 n 个），在每次迭代时检查字符的第一个和最后一个索引是否相同。和方法每个都需要 O(n) 时间indexOf，lastIndexOf因为它们必须扫描字符串的所有字符以确定它们是否与您要查找的字符匹配。因此，由于您的循环运行 O(n) 次并且每次迭代执行 O(n) 工作，因此其运行时间为 O(n ² )。

但是，您的代码有些问题。尝试在字符串上运行它aab。它在此输入上是否正常工作？作为提示，一旦您确定有两个或多个重复字符，就可以保证存在重复字符，并且您可以返回并非所有字符都是唯一的。

现在，让我们看一下参考：

public static boolean isUniqueChars(String str) {
    if (str.length() > 256) { // NOTE: Are you sure this isn't 26?
        return false;
    }
    int checker = 0;
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i) - 'a';
        if ((checker & (1 << val)) > 0) return false;
        checker |= (1 << val);
    }
    return true;
}

这个解决方案很可爱。基本思想如下：假设您有一个由 26 个布尔值组成的数组，每个布尔值都跟踪特定字符是否已经出现在字符串中。你一开始都是假的。然后，您遍历字符串的字符，每次看到一个字符时，您都会查看该字符的数组槽。如果是false，这是您第一次看到该角色，您可以将插槽设置为true。如果是true，您已经看过这个角色，您可以立即报告有重复。

请注意，此方法不分配布尔数组。相反，它选择了一个聪明的把戏。由于可能只有 26 个不同的字符，并且中有 32 位int，因此该解决方案创建了一个int变量，其中变量的每个位对应于字符串中的一个字符。该解决方案不是读取和写入数组，而是读取和写入数字的位。

例如，看看这一行：

if ((checker & (1 << val)) > 0) return false;

做什么checker & (1 << val)？好吧，1 << val创建一个除第 th 位int外所有位都为零的值。val然后它使用按位 AND 将此值与checker. 如果位置valin的位checker已经设置，那么它的计算结果是一个非零值（意味着我们已经看到了这个数字）并且我们可以返回 false。否则，它的计算结果为 0，我们还没有看到这个数字。

下一行是这样的：

checker |= (1 << val);

这使用“按位或赋值”运算符，相当于

checker = checker | (1 << val);

此 ORchecker与仅在 position 设置 1 位的值进行OR 运算val，这将打开该位。相当于将val数字的第 th 位设置为 1。

这种方法比你的快得多。首先，由于该函数首先检查字符串的长度是否大于 26（我假设 256 是错字），因此该函数永远不必测试任何长度为 27 或更大的字符串。因此，内循环最多运行 26 次。每次迭代都在按位运算中完成 O(1) 工作，因此完成的整体工作是 O(1)（O(1) 次迭代乘以 O(1) 每次迭代的工作量），这比您的实现要快得多。

如果您还没有看到以这种方式使用按位运算，我建议您在 Google 上搜索“按位运算符”以了解更多信息。

希望这可以帮助！

score 14 · Accepted Answer

这本书的解决方案是我不喜欢的，我认为它功能失调..... templatetypedef 发布了一个全面的答案，表明该解决方案是一个好的解决方案。我不同意，因为本书的答案假设字符串只有小写字符（ascii）并且没有进行验证来确保这一点。

public static boolean isUniqueChars(String str) {
    // short circuit - supposed to imply that
    // there are no more than 256 different characters.
    // this is broken, because in Java, char's are Unicode,
    // and 2-byte values so there are 32768 values
    // (or so - technically not all 32768 are valid chars)
    if (str.length() > 256) {
        return false;
    }
    // checker is used as a bitmap to indicate which characters
    // have been seen already
    int checker = 0;
    for (int i = 0; i < str.length(); i++) {
        // set val to be the difference between the char at i and 'a'
        // unicode 'a' is 97
        // if you have an upper-case letter e.g. 'A' you will get a
        // negative 'val' which is illegal
        int val = str.charAt(i) - 'a';
        // if this lowercase letter has been seen before, then
        // the corresponding bit in checker will have been set and
        // we can exit immediately.
        if ((checker & (1 << val)) > 0) return false;
        // set the bit to indicate we have now seen the letter.
        checker |= (1 << val);
    }
    // none of the characters has been seen more than once.
    return true;
}

考虑到 templatedef 的答案，底线是实际上没有足够的信息来确定这本书的答案是否正确。

我不相信它。

templatedef 对复杂性的回答是我同意的…… ;-)

编辑：作为练习，我将本书的答案转换为可以使用的答案（尽管比本书的答案慢 - BigInteger 很慢）....这个版本与本书的逻辑相同，但没有相同的验证和假设问题（但速度较慢）。显示逻辑也很有用。

public static boolean isUniqueChars(String str) {
    if (str.length() > 32768) {
        return false;
    }
    BigInteger checker = new BigInteger(0);
    for (int i = 0; i < str.length(); i++) {
        int val = str.charAt(i);
        if (checker.testBit(val)) return false;
        checker = checker.setBit(val);
    }
    // none of the characters has been seen more than once.
    return true;
}

score 4 · Accepted Answer

由于一个char值只能包含 256 个不同值之一，因此任何长度超过 256 个字符的字符串都必须包含至少一个重复项。

代码的其余部分checker用作位序列，每个位代表一个字符。它似乎将每个字符转换为一个整数，从a= 1 开始。然后它检查checker. 如果已设置，则表示该字符已被看到，因此我们知道该字符串包含至少一个重复字符。如果尚未看到该字符，则代码设置相应的位checker并继续。

具体来说，(1<<val)生成一个在 position 中具有单个1位的整数val。例如，(1<<3)将是 binary1000或 8。如果 position中的位未在中设置（即值为 0），则表达式checker & (1<<val)将返回零，如果设置了该位，则始终为非零。该表达式将在中设置该位。valchecker(1<<val)checker |= (1<<val)checker

然而，该算法似乎有缺陷：它似乎没有考虑大写字符和标点符号（通常在字典上排在小写字母之前）。它似乎还需要一个 256 位整数，这不是标准的。

正如rolfl在下面的评论中提到的那样，我更喜欢您的解决方案，因为它有效。false您可以通过在识别非唯一字符后立即返回来优化它。

score 0 · Accepted Answer

书中的解决方案不区分大小写。根据实现，“A”和“a”被认为是重复的。

说明：对于带有 char 'A' 的输入字符串，'A' - 'a' 为 -32，因此 '1 << val' 将被评估为 1 << -32。对任何负数进行移位都会以相反的方向移动位。因此 1 << -32 将是 1 >> 32。这会将第一位设置为 1。char 'a' 也是如此。因此，“A”和“a”被视为重复字符。同样，对于“B”和“b”，第二位设置为 1，依此类推。

score 0 · Accepted Answer

第六版更新

    public static void main(String[] args) {
        System.out.println(isUniqueChars("abcdmc")); // false
        System.out.println(isUniqueChars("abcdm")); // true
        System.out.println(isUniqueChars("abcdm\u0061")); // false because \u0061 is unicode a
    }


    public static boolean isUniqueChars(String str) {
        /*
         You should first ask your interviewer if the string is an ASCII string or a Unicode string.
         Asking this question will show an eye for detail and a solid foundation in computer science.
         We'll assume for simplicity the character set is ASCII.
         If this assumption is not valid, we would need to increase the storage size.
         */
        // at 6th edition of the book, there is no pre condition on string's length
        /*
         We can reduce our space usage by a factor of eight by using a bit vector.
         We will assume, in the below code, that the string only uses the lowercase letters a through z.
         This will allow us to use just a single int.
          */
        // printing header to provide nice csv format log, you may uncomment
//        System.out.println("char,val,valBinaryString,leftShift,leftShiftBinaryString,checker");
        int checker = 0;
        for (int i = 0; i < str.length(); i++) {
            /*
                Dec Binary Character
                97  01100001    a
                98  01100010    b
                99  01100011    c
                100 01100100    d
                101 01100101    e
                102 01100110    f
                103 01100111    g
                104 01101000    h
                105 01101001    i
                106 01101010    j
                107 01101011    k
                108 01101100    l
                109 01101101    m
                110 01101110    n
                111 01101111    o
                112 01110000    p
                113 01110001    q
                114 01110010    r
                115 01110011    s
                116 01110100    t
                117 01110101    u
                118 01110110    v
                119 01110111    w
                120 01111000    x
                121 01111001    y
                122 01111010    z
             */
            // a = 97 as you can see in ASCII table above
            // set val to be the difference between the char at i and 'a'
            // b = 1, d = 3.. z = 25
            char c = str.charAt(i);
            int val = c - 'a';
            // means "shift 1 val numbers places to the left"
            // for example; if str.charAt(i) is "m", which is the 13th letter, 109 (g in ASCII) minus 97 equals 12
            // it returns 1 and 12 zeros = 1000000000000 (which is also the number 4096)
            int leftShift = 1 << val;
            /*
                An integer is represented as a sequence of bits in memory.
                For interaction with humans, the computer has to display it as decimal digits, but all the calculations
                are carried out as binary.
                123 in decimal is stored as 1111011 in memory.

                The & operator is a bitwise "And".
                The result is the bits that are turned on in both numbers.

                1001 & 1100 = 1000, since only the first bit is turned on in both.

                It will be nicer to look like this

                1001 &
                1100
                =
                1000

                Note that ones only appear in a place when both arguments have a one in that place.

             */
            int bitWiseAND = checker & leftShift;
            String leftShiftBinaryString = Integer.toBinaryString(leftShift);
            String checkerBinaryString = leftPad(Integer.toBinaryString(checker), leftShiftBinaryString.length());
            String leftShiftBinaryStringWithPad = leftPad(leftShiftBinaryString, checkerBinaryString.length());
//            System.out.printf("%s &\n%s\n=\n%s\n\n", checkerBinaryString, leftShiftBinaryStringWithPad, Integer.toBinaryString(bitWiseAND));
            /*
            in our example with string "abcdmc"
            0 &
            1
            =
            0

            01 &
            10
            =
            0

            011 &
            100
            =
            0

            0111 &
            1000
            =
            0

            0000000001111 &
            1000000000000
            =
            0

            1000000001111 &
            0000000000100
            =
            100
             */
//            System.out.println(c + "," + val + "," + Integer.toBinaryString(val) + "," + leftShift + "," + Integer.toBinaryString(leftShift) + "," + checker);
            /*
            char val valBinaryString leftShift leftShiftBinaryString checker
            a   0       0               1       1                       0
            b   1       1               2       10                      1
            c   2       10              4       100                     3
            d   3       11              8       1000                    7
            m   12      1100            4096    1000000000000           15
            c   2       10              4       100                     4111
             */
            if (bitWiseAND > 0) {
                return false;
            }
            // setting 1 on on the left shift
            /*
            0000000001111 |
            1000000000000
            =
            1000000001111
             */
            checker = checker | leftShift;
        }
        return true;
        /*
        If we can't use additional data structures, we can do the following:
        1. Compare every character of the string to every other character of the string.
            This will take 0( n 2 ) time and 0(1) space
        2. If we are allowed to modify the input string, we could sort the string in O(n log(n)) time and then linearly
            check the string for neighboring characters that are identical.
            Careful, though: many sorting algorithms take up extra space.

        These solutions are not as optimal in some respects, but might be better depending on the constraints of the problem.
         */
    }

    private static String leftPad(String s, int i) {
        StringBuilder sb = new StringBuilder(s);
        int charsToGo = i - sb.length();
        while (charsToGo > 0) {
            sb.insert(0, '0');
            charsToGo--;
        }
        return sb.toString();
    }

score 0 · Accepted Answer

正如'Cracking the Coding Interview'中所引用的那样，存在另一种解决方案：

boolean isUniqueChars(String str) {
  if(str.length() > 128) return false;

  boolean[] char_set = new boolean[128];
  for(int i = 0; i < str.length(); i++) {
    int val = str.charAt(i);

    if(char_set[val]) {
      return false;
    }
    char_set[val] = true;
  }
  return true;
}

当然，要达到更好的空间复杂度，请参考上面@templatetypedef的例子

score -1 · Accepted Answer

这是对本书代码的必要更正：

public static boolean checkForUnique(String str){
    boolean containsUnique = false;

    for(char c : str.toCharArray()){
        if(str.indexOf(c) == str.lastIndexOf(c)){
            containsUnique = true;
        } else {
            return false;
        }
    }

    return containsUnique;
}

java - 检测字符串是否具有唯一字符：将我的解决方案与“Cracking the Coding Interview？”进行比较

7 回答 7

Related

Reference