java - Searching one text file using another text file

Question

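I have been trying to get this to work for a while. Let me start by saying that I am not a programmer; this is more of a hobby I picked up recently. I am trying to search two text files line by line: one contains a handful of words (about 10, one per line), and the other contains many more (close to 500), also one per line. I want my program to report how many times each word in the smaller file appears in the larger file. What I have so far is: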

   import java.util.Scanner;
   import java.io.File;
   import java.util.regex.Pattern;

   public class StringSearch
   {
       public static void main (String args[]) throws java.io.IOException
       {
           int tot = 0;
           Scanner scan = null;
           Scanner scan2 = null;
           String str = null;
           String str2 = null;

           File file = new File("C:\\sample2.txt");
           File file2 = new File("C:\\sample3.txt");
           scan = new Scanner(file);
           scan2 = new Scanner(file2);
           while (scan.hasNextLine())
           {
               str = scan.nextLine();
               tot = 0;
               while (scan2.hasNextLine())
               {
                   str2 = scan2.nextLine();
                   if (str.equals(str2))
                   {
                       tot++;
                   }
               }
               System.out.println("The String = " + str + " and it occurred " + tot + " times");
           }
       }
   }

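I am not sure why this does not work. It reads the first word from the first text file and correctly counts how many times it appears in the second file, but then it stops and never moves on to the second word in the first file. I hope that makes sense. I think something is wrong with the second while loop, but I cannot tell what.

So any help would be greatly appreciated. I hope to get this working and move on to more complex projects in the future. You have to start somewhere, right?

Cheers, guys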

score 0 · Accepted Answer

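Your nested-loop approach scans the second file once for every word in the first file, which is very inefficient. I suggest loading the first file into a HashMap instead.

This not only gives you fast lookups, it also makes updating the occurrence counts easy. On top of that, you scan the second file only once, and any duplicate words in the first file are ignored automatically (since their result would be the same).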

import java.util.HashMap;
import java.util.Map;
import java.util.Scanner;

// Maps each word to look for to the number of times it has been seen
Map<String, Integer> wordCounts = new HashMap<String, Integer>();

// Stand-in for the first (small) file: one word per line
Scanner scanner = new Scanner("one\nfive\nten");
while (scanner.hasNextLine()) {
    wordCounts.put(scanner.nextLine(), 0);
}
scanner.close();

// Stand-in for the second (large) file
scanner = new Scanner("one\n" + // 1 time
                      "two\nthree\nfour\n" +
                      "five\nfive\n" + // 2 times
                      "six\nseven\neight\nnine\n" +
                      "ten\nten\nten"); // 3 times

while (scanner.hasNextLine()) {
    String word = scanner.nextLine();
    Integer integer = wordCounts.get(word);
    // Only words loaded from the first file are counted
    if (integer != null) {
        wordCounts.put(word, integer + 1);
    }
}
scanner.close();

for (String word : wordCounts.keySet()) {
    int count = wordCounts.get(word);
    if (count > 0) {
        System.out.println("'" + word + "' occurs " + count + " times.");
    }
}

Output (HashMap does not guarantee iteration order, so the lines may appear in a different order):

'ten' occurs 3 times.
'five' occurs 2 times.
'one' occurs 1 times.
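
If the results should come out in the same order as the words in the first file, one option is to swap HashMap for LinkedHashMap, which iterates in insertion order. The following is a minimal sketch of that variation, not part of the answer above: the class name OrderedWordCount and the method count are made up for illustration, and the two string arguments are in-memory stand-ins for the two files.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

public class OrderedWordCount {

    // wordsSource and textSource are hypothetical in-memory stand-ins
    // for the small and the large file (one word per line).
    static Map<String, Integer> count(String wordsSource, String textSource) {
        // LinkedHashMap remembers the order in which keys were first inserted
        Map<String, Integer> wordCounts = new LinkedHashMap<String, Integer>();

        Scanner scanner = new Scanner(wordsSource);
        while (scanner.hasNextLine()) {
            wordCounts.put(scanner.nextLine(), 0);
        }
        scanner.close();

        scanner = new Scanner(textSource);
        while (scanner.hasNextLine()) {
            String word = scanner.nextLine();
            Integer current = wordCounts.get(word);
            // Words that were not in the first source are skipped
            if (current != null) {
                wordCounts.put(word, current + 1);
            }
        }
        scanner.close();
        return wordCounts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count(
                "one\nfive\nten",
                "one\ntwo\nfive\nfive\nten\nten\nten");
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            System.out.println("'" + entry.getKey() + "' occurs " + entry.getValue() + " times.");
        }
    }
}
```

With real files, each `new Scanner(...)` call would take a `File` instead of a string, and the enclosing method would need to declare or handle `java.io.FileNotFoundException`.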

score 0 · Accepted Answer

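Create a BufferedReader and read the file into a Map<String, Integer> as follows:

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.HashMap;
import java.util.Map;

String filename = args[0];
BufferedReader words = new BufferedReader(new FileReader(filename));
Map<String, Integer> m = new HashMap<String, Integer>();
// readLine() returns null once the end of the file is reached
for (String word = words.readLine(); word != null; word = words.readLine()) {
    if (word.trim().length() > 0) {
        m.put(word, 0);
    }
}
words.close();

Then read the word list and increment the map value each time a word is found:

String filename2 = args[1];
BufferedReader listOfWords = new BufferedReader(new FileReader(filename2));
for (String word = listOfWords.readLine(); word != null; word = listOfWords.readLine()) {
    if (word.trim().length() > 0) {
        // Only count words that were loaded from the first file
        if (m.get(word) != null) {
            m.put(word, m.get(word) + 1);
        }
    }
}
listOfWords.close();

Then print the results:

for (String word : m.keySet()) {
    if (m.get(word) > 0) {
        System.out.println("The String = " + word + " occurred " + m.get(word) + " times");
    }
}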

score 0 · Accepted Answer

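The problem you are running into is that you are using a scanner inside a scanner. With the scanners nested the way they are now, the inner scanner reads the entire text file for the first word, but after that first pass it has already consumed the whole file, so scan2.hasNextLine() never returns true again.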
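
To see that behavior in isolation, here is a small sketch. It assumes an in-memory string in place of the larger file, and the class name ScannerExhaustionDemo and the helper drain are made up for illustration. The same Scanner is read to the end twice; the second pass finds nothing because a Scanner does not rewind.

```java
import java.util.Scanner;

public class ScannerExhaustionDemo {

    // Reads every remaining line and returns how many there were.
    static int drain(Scanner scanner) {
        int lines = 0;
        while (scanner.hasNextLine()) {
            scanner.nextLine();
            lines++;
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for the contents of the larger file
        Scanner scan2 = new Scanner("one\ntwo\nthree");

        System.out.println("first pass:  " + drain(scan2)); // all three lines are read
        System.out.println("second pass: " + drain(scan2)); // nothing is left to read
        System.out.println("hasNextLine: " + scan2.hasNextLine());
        scan2.close();
    }
}
```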

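A better way to achieve what you want is what remyabel suggested. You should create a list (an ArrayList) containing all the words from your small file, and iterate over it for each word you read from the other file. You also need something to keep track of how many times each word was hit, and for that you can use something like a HashMap.

It would look something like this: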

import java.io.File;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Scanner;

File file = new File("C:\\sample2.txt");
File file2 = new File("C:\\sample3.txt");
Scanner scan = new Scanner(file);
Scanner scan2 = new Scanner(file2);
// Will contain all of your words to check against
ArrayList<String> dictionary = new ArrayList<String>();
// Contains the number of times each word is hit
HashMap<String, Integer> hits = new HashMap<String, Integer>();
while (scan.hasNextLine())
{
    String str = scan.nextLine();
    dictionary.add(str);
    hits.put(str, 0);
}
scan.close();
while (scan2.hasNextLine())
{
    String str2 = scan2.nextLine();
    // Compare the current line against every word from the small file
    for (String word : dictionary)
    {
        if (word.equals(str2))
        {
            hits.put(word, hits.get(word) + 1);
        }
    }
}
scan2.close();
for (String word : dictionary)
{
    System.out.println("The String = " + word + " and it occurred " + hits.get(word) + " times");
}

score 0 · Accepted Answer

This is just a simple logic problem.

Add the following statement below the System.out.println call:

scan2 = new Scanner(file2);
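
To show where that statement ends up, here is a minimal sketch of the nested-loop version with the inner Scanner re-created after each word. It is an illustration under assumptions rather than the asker's exact program: the class name StringSearchFixed and the method countOccurrences are made up, and in-memory strings stand in for sample2.txt and sample3.txt so the example runs without those files.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

public class StringSearchFixed {

    // wordsSource and textSource are hypothetical stand-ins for the two files.
    static Map<String, Integer> countOccurrences(String wordsSource, String textSource) {
        Map<String, Integer> totals = new LinkedHashMap<String, Integer>();
        Scanner scan = new Scanner(wordsSource);
        Scanner scan2 = new Scanner(textSource);
        while (scan.hasNextLine()) {
            String str = scan.nextLine();
            int tot = 0;
            while (scan2.hasNextLine()) {
                String str2 = scan2.nextLine();
                if (str.equals(str2)) {
                    tot++;
                }
            }
            System.out.println("The String = " + str + " and it occurred " + tot + " times");
            totals.put(str, tot);
            // The inner scanner is now exhausted; close it and start again
            // from the top of the text for the next word.
            scan2.close();
            scan2 = new Scanner(textSource);
        }
        scan.close();
        scan2.close();
        return totals;
    }

    public static void main(String[] args) {
        countOccurrences("one\nfive\nten", "one\ntwo\nfive\nfive\nten\nten\nten");
    }
}
```

With the real files, `new Scanner(textSource)` would become `new Scanner(file2)`. This keeps the original structure working, but it still rereads the second file once per word, so the map-based answers scale better for large inputs.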
