java - How do I remove duplicates?

Question

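I can't remove the duplicates from an array filled with random integers. I wrote a Java class that generates random numbers, and my main program calls it and writes those random numbers to a .txt file. I then read the data back from this .txt file, store it in a new array and remove all duplicates. Next, I have to write the new set of random numbers to a new .txt file, with the smallest number on the first line and the largest on the last line. So the order of the deduplicated numbers before sorting doesn't matter.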

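My problem is that I'm not sure how to remove the duplicates. In other questions people say to use a Set or a HashSet, and I have looked into those, but is there another way to remove them, for example by looping through the array?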

import java.io.*;

class MainProg {

    public static void main(String[] args) {

        GenKeys keys = new GenKeys();

        //System.out.println(keys.getrandom());
        //System.out.println(keys.getrandom());

        try {
            String f = "keys.txt";
            FileWriter fw = new FileWriter(f);
            BufferedWriter bw = new BufferedWriter(fw);

            for (int i = 1; i <= 500; i++) {
                //bw.write(i + System.getProperty("line.separator"));
                bw.write(keys.getrandom() + "\r\n");
            }

            // close the file after all the writing has taken place
            bw.close();
        } catch (IOException e) {
            System.out.println("Error writing to file" + e.toString());
        }

        // declare a place to store each line as it is read in
        String str;
        String myArray[] = new String[500];
        int i = 0;

        try {
            FileReader fr = new FileReader("keys.txt");
            BufferedReader in = new BufferedReader(fr);

            // read in the first line from the file
            str = in.readLine();
            while (str != null) {
                myArray[i] = str;
                str = in.readLine();
                i++;
            }

            // close the file
            in.close();
        } catch (IOException e) {
            System.out.print(e.toString());
            System.out.print("Non-existent File");
        }

        // my attempt at removing the duplicates (this part does not work)
        int[] mySortedArray = new int[500];
        for (int k = 0; k < mySortedArray.length; k++) {
            for (int j = 0; j < mySortedArray.length; j++) {
                if (mySortedArray[j] != k) {
                    mySortedArray[k] = j;
                    System.out.print(mySortedArray[k]);
                }
            }
        }
    }
}

score 3 · Accepted Answer

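Since the result also has to be sorted, O(n log n) time is the best you can do. You get it by converting the array to a TreeSet and then back again: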

import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

Integer[] withDups = {1, 5, 2, 6, 3, 4, 2, 6, 3, 7};
Set<Integer> set = new TreeSet<Integer>(Arrays.asList(withDups));
Integer[] withoutDups = set.toArray(new Integer[set.size()]);
System.out.println(Arrays.toString(withoutDups));

Output:

[1, 2, 3, 4, 5, 6, 7]

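A Set, like a set in mathematics, is a data structure that does not allow duplicate elements. A TreeSet additionally keeps its elements in ascending order, which is why the output above comes out sorted.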
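One way to see the "no duplicates" property directly is that `Set.add` returns `false` when an equal element is already present. The sketch below (class and method names are hypothetical, and the input array is made up) feeds a small array into a TreeSet and reports each rejected value:

```java
import java.util.Set;
import java.util.TreeSet;

class SetAddDemo {
    // Adds every value to a TreeSet and reports which additions were rejected as duplicates.
    static Set<Integer> collect(int[] values) {
        Set<Integer> set = new TreeSet<>();
        for (int v : values) {
            boolean added = set.add(v); // false if an equal element is already present
            if (!added) {
                System.out.println("duplicate ignored: " + v);
            }
        }
        return set;
    }

    public static void main(String[] args) {
        Set<Integer> set = collect(new int[] {4, 1, 4, 3, 1});
        System.out.println(set); // TreeSet iterates in ascending order: prints [1, 3, 4]
    }
}
```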

If you have trouble converting back and forth between int[] and Integer[], use a loop:

int[] intArray = ...;

// box each int into an Integer
Integer[] integerArray = new Integer[intArray.length];
int i = 0;
for (int value : intArray) {
    integerArray[i++] = Integer.valueOf(value);
}
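Putting both directions together: thanks to autoboxing you can often skip the intermediate Integer[] and loop straight from int[] into the set and back out into a new int[]. A minimal sketch (the class and method names are hypothetical) that deduplicates and sorts in one go with a TreeSet:

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class RoundTrip {
    // int[] -> TreeSet<Integer> -> int[]: removes duplicates and sorts in one go.
    static int[] uniqueSorted(int[] intArray) {
        Set<Integer> set = new TreeSet<>();
        for (int value : intArray) {
            set.add(value); // autoboxing turns each int into an Integer
        }
        int[] result = new int[set.size()];
        int i = 0;
        for (int value : set) { // auto-unboxing back to int, in ascending order
            result[i++] = value;
        }
        return result;
    }

    public static void main(String[] args) {
        int[] withDups = {1, 5, 2, 6, 3, 4, 2, 6, 3, 7};
        System.out.println(Arrays.toString(uniqueSorted(withDups)));
        // prints [1, 2, 3, 4, 5, 6, 7]
    }
}
```

Using a TreeSet rather than a HashSet is what makes the values come out already sorted, so no separate sort step is needed.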

score 2 · Accepted Answer

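If you must use an array, the simplest approach is to check whether a number is a duplicate before adding it: loop through the array, check whether the newly generated random number equals any element already in it, and only if it does not, add it to the end of the array.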
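A minimal sketch of that check-before-adding idea using only arrays. It applies the check to values that have already been read (for example from keys.txt) rather than at generation time, and the class and method names are hypothetical. The linear search makes it O(n²), which is unproblematic for 500 keys:

```java
import java.util.Arrays;

class ArrayOnlyDedup {
    // Returns true if value occurs in the first `count` slots of arr (linear search).
    static boolean contains(int[] arr, int count, int value) {
        for (int i = 0; i < count; i++) {
            if (arr[i] == value) {
                return true;
            }
        }
        return false;
    }

    // Copies each value into `unique` only if it is not already there.
    static int[] removeDuplicates(int[] input) {
        int[] unique = new int[input.length];
        int count = 0;
        for (int value : input) {
            if (!contains(unique, count, value)) {
                unique[count++] = value; // append at the end of the filled part
            }
        }
        return Arrays.copyOf(unique, count); // trim the unused slots
    }

    public static void main(String[] args) {
        int[] keys = {7, 3, 7, 1, 3, 9};
        int[] unique = removeDuplicates(keys);
        Arrays.sort(unique); // smallest first, as the question requires
        System.out.println(Arrays.toString(unique)); // prints [1, 3, 7, 9]
    }
}
```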

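However, the others are right to suggest a HashSet here: it prevents duplicates by design, so you get that check for free. It isn't complicated; basic usage might look like this: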

import java.util.HashSet;

HashSet<Integer> set = new HashSet<>();
set.add(1);
set.add(3);
set.add(5);
set.add(3); // already present: add returns false and the set is unchanged
for (int num : set) {
    System.out.println(num);
}

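...which prints 1, 3 and 5 (a HashSet does not guarantee any particular iteration order). You would do well to read up on HashSet, since hash sets are a basic, very commonly used data structure (probably second only to lists).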
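Applied to the task in the question, a HashSet-based version might look like the sketch below. It assumes keys.txt holds one integer per line (as the question's writer loop produces) and uses the hypothetical output file name sortedKeys.txt; the class and method names are also hypothetical. Because a HashSet has no order, the unique values are copied into a list and sorted before writing:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class KeysPipeline {
    // Parses one integer per line, drops duplicates with a HashSet, then sorts ascending.
    static List<Integer> uniqueSorted(List<String> lines) {
        Set<Integer> set = new HashSet<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (!trimmed.isEmpty()) { // skip blank lines
                set.add(Integer.parseInt(trimmed));
            }
        }
        List<Integer> sorted = new ArrayList<>(set); // HashSet has no order, so sort a copy
        Collections.sort(sorted);
        return sorted;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(new FileReader("keys.txt"))) {
            String line;
            while ((line = in.readLine()) != null) {
                lines.add(line);
            }
        }
        // "sortedKeys.txt" is a hypothetical name for the new output file
        try (BufferedWriter out = new BufferedWriter(new FileWriter("sortedKeys.txt"))) {
            for (int key : uniqueSorted(lines)) { // smallest key on the first line
                out.write(Integer.toString(key));
                out.newLine();
            }
        }
    }
}
```

The try-with-resources blocks close both files even if an exception is thrown, which the question's manual close() calls do not guarantee.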

score 2 · Accepted Answer

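The fastest way to remove the dupes is to use a LinkedHashSet. Because this kind of Set uses hashing to jump straight to where a value would be stored, it can quickly tell that an equal value is already present and will not store it a second time.

Basically, when you try to add the same item n times, every add after the first does nothing (Set.add returns false). Copy what is left back into an array and you have a duplicate-free array; the "Linked" part keeps the elements in the order they were first added.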

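import java.util.LinkedHashSet;
import java.util.Set;

public static int[] removeDuplicates(int[] arr) {
    // LinkedHashSet drops repeats and keeps first-occurrence order
    Set<Integer> tmp = new LinkedHashSet<Integer>();
    for (Integer item : arr) {
        tmp.add(item);
    }
    // copy the unique values back into a right-sized int[]
    int[] output = new int[tmp.size()];
    int i = 0;
    for (Integer item : tmp) {
        output[i++] = item;
    }
    return output;
}

// usage, e.g. inside main:
mySortedArray = removeDuplicates(mySortedArray);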
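For comparison, on Java 8 or later the same first-occurrence-order deduplication can be written with `IntStream.distinct()`. This is a swapped-in alternative, not what the answer above uses, and the class and method names are hypothetical. Adding `sorted()` yields the smallest-first order the question needs:

```java
import java.util.Arrays;

class StreamDedup {
    // distinct() keeps the first occurrence of each value, in encounter order,
    // much like copying through a LinkedHashSet.
    static int[] removeDuplicates(int[] arr) {
        return Arrays.stream(arr).distinct().toArray();
    }

    // Adding sorted() gives ascending order, smallest value first.
    static int[] removeDuplicatesSorted(int[] arr) {
        return Arrays.stream(arr).distinct().sorted().toArray();
    }

    public static void main(String[] args) {
        int[] keys = {42, 7, 42, 19, 7};
        System.out.println(Arrays.toString(removeDuplicates(keys)));       // prints [42, 7, 19]
        System.out.println(Arrays.toString(removeDuplicatesSorted(keys))); // prints [7, 19, 42]
    }
}
```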

score 1 · Accepted Answer

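To sort and remove duplicates using only arrays, assuming the array is not empty (if it is empty, the correct answer is to return another empty array):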

import java.util.Arrays;

// sort the input (note: this sorts the caller's array in place)
Arrays.sort(input);

// count unique elements in input
int unique = 1;
for (int i = 1; i < input.length; i++) {
    if (input[i] != input[i-1]) unique++;
}

// create an output array of that size
int[] output = new int[unique];

// store unique copies of the (sorted) input elements
output[0] = input[0];
for (int i = 1, j = 1; i < input.length; i++) {
    if (input[i] != input[i-1]) output[j++] = input[i];
}

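If we were free to use an ArrayList, the code would be more concise: there would be no need for a first pass to find the size and a second pass to fill the array. Unless there are very many duplicates, this code is much faster than using a Set of any kind, because no lookups are involved.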
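With an ArrayList, the single-pass version this answer alludes to might look like the sketch below (the class and method names are hypothetical). Because the list grows as needed, there is no counting pass, and an empty input simply yields an empty list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedUniqueList {
    // Sorts the input in place, then keeps each value that differs from its predecessor.
    static List<Integer> sortedUnique(int[] input) {
        Arrays.sort(input);
        List<Integer> output = new ArrayList<>();
        for (int i = 0; i < input.length; i++) {
            if (i == 0 || input[i] != input[i - 1]) {
                output.add(input[i]);
            }
        }
        return output;
    }

    public static void main(String[] args) {
        int[] keys = {5, 1, 5, 3, 1};
        System.out.println(sortedUnique(keys)); // prints [1, 3, 5]
    }
}
```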
