java - Java - What is the best way to find first duplicate character in a string

Question

I have written below code for detecting first duplicate character in a string.

public static int detectDuplicate(String source) {
    boolean found = false;
    int index = -1;
    final long start = System.currentTimeMillis();
    final int length = source.length();
    for(int outerIndex = 0; outerIndex < length && !found; outerIndex++) {
        boolean shiftPointer = false;
        for(int innerIndex = outerIndex + 1; innerIndex < length && !shiftPointer; innerIndex++ ) {
            if ( source.charAt(outerIndex) == source.charAt(innerIndex)) {
                found = true;
                index = outerIndex;
            } else {
                shiftPointer = true;
            }
        }
    }
    System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
    return index;
}

I need help on two things:

What is the worst case complexity of this algorithm? - my understanding is O(n).
Is it the best way to do this? Can somebody provide a better solution (if any)?

Thanks, NN

score 13 · Accepted Answer

正如其他人所提到的，您的算法是 O(n^2)。这是一个 O(N) 算法，因为 HashSet#add 在恒定时间内运行（散列函数将元素正确地分散在桶中） - 请注意，我最初将散列集大小设置为最大大小以避免调整大小/重新散列：

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    Set<Character> uniqueChars = new HashSet<Character> (chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        if (!uniqueChars.add(chars[i])) return i;
    }
    return -1;
}

注意：这将返回第一个重复的索引（即，与前一个字符重复的第一个字符的索引）。要返回该字符第一次出现的索引，您需要将索引存储在 a Map<Character, Integer>（Map#put在这种情况下也是 O(1)）：

public static int findDuplicate(String s) {
    char[] chars = s.toCharArray();
    Map<Character, Integer> uniqueChars = new HashMap<Character, Integer> (chars.length, 1);
    for (int i = 0; i < chars.length; i++) {
        Integer previousIndex = uniqueChars.put(chars[i], i);
        if (previousIndex != null) {
            return previousIndex;
        }
    }
    return -1;
}

score 1 · Accepted Answer

这是 O(n**2)，而不是 O(n)。考虑案例abcdefghijklmnopqrstuvwxyzz。outerIndex程序终止前的范围为 0 到 25，每次递增时，innerIndex范围为outerIndex到 26。

要达到 O(n)，您需要对列表进行一次遍历，并在每个位置进行 O(1) 工作。由于在每个位置要做的工作是检查角色之前是否见过（如果是，在哪里），这意味着你需要一个 O(1) 的地图实现。哈希表为您提供；由字符代码索引的数组也是如此。

assylias展示了如何使用 hashing 来做到这一点，所以这里是如何使用数组来做到这一点（只是为了笑，真的）：

public static int detectDuplicate(String source) {
    int[] firstOccurrence = new int[1 << Character.SIZE];
    Arrays.fill(firstOccurrence, -1);
    for (int i = 0; i < source.length(); i++) {
        char ch = source.charAt(i);
        if (firstOccurrence[ch] != -1) return firstOccurrence[ch];
        else firstOccurrence[ch] = i;
    }
    return -1;
}

score 1 · Accepted Answer

复杂度大致为O(M^2)，其中M是字符串长度和可能字符集大小之间的最小值K。

你可以通过简单O(M)地O(K)记住你第一次遇到每个独特角色的位置来记住它。

score 0 · Accepted Answer

我可以大大改进你的算法。应该这样做：

StringBuffer source ...
char charLast = source.charAt( source.len()-1 );
int xLastChar = source.len()-1;
source.setCharAt( xLastChar, source.charAt( xLastChar - 1 ) );
int i = 1;
while( true ){
    if( source.charAt(i) == source.charAt(i-1) ) break;
    i += 1;
}
source.setCharAt( xLastChar, charLast );
if( i == xLastChar && source.charAt( xLastChar-1 ) != charLast ) return -1;
return i;

对于大字符串，此算法的速度可能是您的两倍。

score 0 · Accepted Answer

好的，我发现下面的逻辑可以减少O(N^2)到O(N).

public static int detectDuplicate(String source) {
    int index = -1;
    boolean found = false;
    final long start = System.currentTimeMillis();

    for(int i = 1; i <= source.length() && !found; i++) {
        if(source.charAt(i) == source.charAt(i-1)) {
            index = (i - 1);
            found = true;
        }
    }

    System.out.println("Time taken --> " + (System.currentTimeMillis() - start) + " ms. for string of length --> " + source.length());
    return index;
}

这也显示了与我以前的算法相比的性能改进，该算法具有 2 个嵌套循环。这需要从末尾存在重复130ms.字符的字符中检测第一个重复字符。63million

我不确定这是否是最好的解决方案。如果有人找到更好的，请分享。

谢谢，

神经网络

score 0 · Accepted Answer

您可以尝试：

 public static char firstRecurringChar(String s)
    {
    char x=' ';
    System.out.println("STRING : "+s);
    for(int i =0;i<s.length();i++)
    {
        System.out.println("CHAR AT "+i+" = " +s.charAt(i));
        System.out.println("Last index of CHAR AT "+i+" = " +s.lastIndexOf(s.charAt(i)));
        if(s.lastIndexOf(s.charAt(i)) >i){
            x=s.charAt(i);
            break;
        }
    }
    return x;
    }

score -1 · Accepted Answer

O(1)算法

由于两个嵌套循环，您的解决方案是 O(n^2)。

最快的算法是O(1)（恒定时间）：

public static int detectDuplicate(String source) {
    boolean[] foundChars = new boolean[Character.MAX_VALUE+1];
    for(int i = 0; i < source.length(); i++) {
        if(i >= Character.MAX_VALUE) return Character.MAX_VALUE;
        char currentChar = source.charAt(i);
        if(foundChars[currentChar]) return i;
        foundChars[currentChar] = true;
    }
    return -1;
}

不过，这只是对大快而言哦。

java - Java - What is the best way to find first duplicate character in a string

7 回答 7

Related

Reference