java - How do I find the words in a text file and print the most common word using arrays?

Question

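I can't figure out how to make my program find the most common word and the most common case-insensitive word. I have a Scanner that reads the text file and a while loop, but I still don't know how to implement what I'm looking for. Do I need different String functions to read and print the words?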

Here is my code so far:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class letters {
    public static void main(String[] args) throws FileNotFoundException {
        FileInputStream fis = new FileInputStream("input.txt");
        Scanner scanner = new Scanner(fis);
        String word[] = new String[500];
        while (scanner.hasNextLine()) {
            String s = scanner.nextLine();
            for (int i = 0; i < s.length(); i++) {
                char ch = s.charAt(i);
            }
            // s is only in scope inside the while loop, so the split belongs here
            String[] roll = s.split("\\s");
            for (int i = 0; i < roll.length; i++) {
                String lin = roll[i];
                //System.out.println(lin);
            }
        }
        scanner.close();
    }
}

That's all I have so far. I need the output to say:

   Word:
   6 roll

  Case-insensitive word:
  18 roll

Here is my input file:

@
roll tide roll!
Roll Tide Roll!
ROLL TIDE ROLL!
ROll tIDE ROll!
 roll  tide  roll! 
 Roll  Tide  Roll! 
 ROLL  TIDE  ROLL! 
   roll    tide    roll!   
    Roll Tide Roll  !   
@
65-43+21= 43
65.0-43.0+21.0= 43.0
 65 -43 +21 = 43 
 65.0 -43.0 +21.0 = 43.0 
 65 - 43 + 21 = 43 
 65.00 - 43.0 + 21.000 = +0043.0000 
    65   -  43  +   21  =   43

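I just need it to find the word that occurs most often (a word being a maximal run of consecutive letters), i.e. roll, and print how many times it was found (i.e. 6). If anyone can help me with this, that would be great! Thanks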
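Under the definition above (a word is a maximal run of consecutive letters), one minimal sketch extracts words with the regular expression [A-Za-z]+ and counts them twice: once as written and once lowercased. The class and method names, and the restriction to ASCII letters, are assumptions for illustration. A LinkedHashMap combined with a strict > comparison breaks ties in favour of the word seen first, which matters here because roll and Roll both occur 6 times.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordFrequency {
    // Assumption: a "word" is a maximal run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[A-Za-z]+");

    // Counts every word in the lines; lowercases first when ignoreCase is true.
    public static Map<String, Integer> count(List<String> lines, boolean ignoreCase) {
        Map<String, Integer> counts = new LinkedHashMap<>(); // keeps first-seen order
        for (String line : lines) {
            Matcher m = WORD.matcher(line);
            while (m.find()) {
                String w = ignoreCase ? m.group().toLowerCase(Locale.ROOT) : m.group();
                counts.merge(w, 1, Integer::sum); // 1 if new, otherwise old + 1
            }
        }
        return counts;
    }

    // Returns the entry with the highest count (null if there are no words).
    // On a tie the first-seen word wins because the comparison is strict.
    public static Map.Entry<String, Integer> mostCommon(Map<String, Integer> counts) {
        Map.Entry<String, Integer> best = null;
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (best == null || e.getValue() > best.getValue()) {
                best = e;
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Paths.get("input.txt"));

        Map.Entry<String, Integer> word = mostCommon(count(lines, false));
        System.out.println("Word:");
        System.out.println(word.getValue() + " " + word.getKey());

        Map.Entry<String, Integer> noCase = mostCommon(count(lines, true));
        System.out.println("Case-insensitive word:");
        System.out.println(noCase.getValue() + " " + noCase.getKey());
    }
}
```

With the input file above this should print the requested "6 roll" and "18 roll" (6 + 6 + 4 + 2 for roll, Roll, ROLL and ROll); the number lines contain no letters and contribute nothing.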

score 4 · Accepted Answer

Consider using a Map<String,Integer> for the words; you can use it to count them, and it works for any number of words. See the Map documentation.

Something like this (it still needs modifying to be case-insensitive):

import java.util.HashMap;
import java.util.Map;

public Map<String,Integer> words_count = new HashMap<String,Integer>();

//read your line (you will have to determine if this line should be split or is equations)
//also just noticed that the trailing '!' would need to be removed

String[] words = line.split("\\s+");
for(int i=0;i<words.length;i++)
{
     String s = words[i];
     if(words_count.containsKey(s))
     {
          Integer count = words_count.get(s) + 1;
          words_count.put(s, count);
     }
     else
          words_count.put(s, 1);
}
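On Java 8 or later, the if/else above can be collapsed into a single Map.merge call, which stores 1 for a new key and otherwise adds 1 to the existing count. The sketch below also strips non-letters from each token, which deals with the trailing '!' mentioned in the comment. The class and method names are hypothetical, and stripping characters inside a token (e.g. "a-b" becoming "ab") is slightly different from the question's letter-run definition.

```java
import java.util.HashMap;
import java.util.Map;

public class MergeCount {
    // Adds the words of one line to wordsCount, after removing non-letter characters.
    public static void countWords(String line, Map<String, Integer> wordsCount) {
        for (String token : line.split("\\s+")) {
            String s = token.replaceAll("[^a-zA-Z]", ""); // "roll!" -> "roll"
            if (s.isEmpty()) {
                continue; // skips a lone "!" and the "" produced by leading spaces
            }
            // put 1 if absent, otherwise replace the old count with old + 1
            wordsCount.merge(s, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> wordsCount = new HashMap<>();
        countWords("roll tide roll!", wordsCount);
        countWords("    Roll Tide Roll  !   ", wordsCount);
        System.out.println(wordsCount);
    }
}
```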

Then you have the number of occurrences of each word, and you can find the most frequent one, for example:

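Integer frequency = null;
String mostFrequent = null;
for(String s : words_count.keySet())
{
    Integer i = words_count.get(s);
    // set both variables on the first word too, otherwise mostFrequent
    // stays null when the first word visited is the most frequent one
    if(frequency == null || i > frequency)
    {
         frequency = i;
         mostFrequent = s;
    }
}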

Then print it:

System.out.println("The word "+ mostFrequent +" occurred "+ frequency +" times");
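The search loop can also be written with Collections.max and Map.Entry.comparingByValue() (Java 8+). This is a sketch with hypothetical names, not the answer's code; with a HashMap, which word is returned on a tie depends on iteration order, which HashMap does not specify.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class MostFrequent {
    // Returns the entry with the largest count.
    // Collections.max throws NoSuchElementException if the map is empty.
    public static Map.Entry<String, Integer> mostFrequent(Map<String, Integer> wordsCount) {
        return Collections.max(wordsCount.entrySet(), Map.Entry.comparingByValue());
    }

    public static void main(String[] args) {
        Map<String, Integer> wordsCount = new HashMap<>();
        wordsCount.put("roll", 18);
        wordsCount.put("tide", 9);
        Map.Entry<String, Integer> e = mostFrequent(wordsCount);
        System.out.println("The word " + e.getKey() + " occurred " + e.getValue() + " times");
        // prints "The word roll occurred 18 times"
    }
}
```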

score 0 · Accepted Answer

Try using a HashMap for better results. You need BufferedReader and FileReader to read the input file, like this:

FileReader text = new FileReader("file.txt");
BufferedReader textFile = new BufferedReader(text);

The BufferedReader object textFile needs to be passed as an argument to the following method:

public HashMap<String, Integer> countWordFrequency(BufferedReader textFile) throws IOException
{
/*This method finds the frequency of words in a text file
 * and saves each word and its corresponding frequency in
 * a HashMap.
 */
    HashMap<String, Integer> mapper = new HashMap<String, Integer>();
    String line = null;
    try
    {
        // no ready() guard: readLine() returning null is the reliable end-of-file test
        while((line = textFile.readLine()) != null)
        {
            String[] words = line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
            for(String word : words)
            {
                if(!word.isEmpty())   // runs of spaces produce empty tokens
                {
                    Integer freq = mapper.get(word);
                    if(freq == null)
                    {
                        mapper.put(word, 1);
                    }
                    else
                    {
                        mapper.put(word, freq+1);
                    }
                }
            }
        }
    }
    finally
    {
        textFile.close();   // closed even if reading throws
    }
    return mapper;
}

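The line line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" "); replaces every character that is not a letter with a space, converts all the words to lowercase (which solves your case-insensitivity problem), and then splits the words on spaces.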
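Because split(" ") splits on every single space, a line with leading spaces or runs of spaces yields empty strings, which is why the if(!word.isEmpty()) check in the method is needed. The small demo below (class and method names are hypothetical) shows the raw tokens for one line of the asker's input and the result after dropping the empty ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitDemo {
    // Same pipeline as in the answer: non-letters -> spaces, lowercase, split on " ".
    public static String[] rawTokens(String line) {
        return line.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
    }

    // Drops the empty strings produced by leading or repeated spaces.
    public static List<String> words(String line) {
        List<String> result = new ArrayList<>();
        for (String t : rawTokens(line)) {
            if (!t.isEmpty()) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String line = " roll  tide  roll! ";
        System.out.println(Arrays.toString(rawTokens(line))); // [, roll, , tide, , roll]
        System.out.println(words(line));                      // [roll, tide, roll]
    }
}
```

Trailing empty strings are discarded by split, so a line made only of non-letters, such as "65-43+21= 43", produces an empty array.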

/*This method finds the highest value in the HashMap
 * and returns it.
 */
public int maxFrequency(HashMap<String, Integer> mapper)
{
    int maxValue = Integer.MIN_VALUE;
    for(int value : mapper.values())
    {
        if(value > maxValue)
        {
            maxValue = value;
        }
    }
    return maxValue;
}

The code above returns the highest value in the HashMap.

/*This method prints every HashMap key that has the given value.
 * Requires: import java.util.Map.Entry;
 */
public void printWithValue(HashMap<String, Integer> mapper, Integer value)
{
    for (Entry<String, Integer> entry : mapper.entrySet()) 
    {
        if (entry.getValue().equals(value)) 
        {
            System.out.println("Word : " + entry.getKey() + " \nFrequency : " + entry.getValue());
        }
    }
}

You can now print the most frequent word and its frequency using the methods above.
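Because printWithValue prints every key whose count equals the maximum, it reports ties, and ties do occur here: with case-sensitive counting, roll and Roll each appear 6 times in the asker's file. A stream-based sketch of the same idea (Java 8+; the class and method names are hypothetical) collects all tied words and sorts them so the result does not depend on HashMap iteration order.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class TiedWords {
    // Returns every word whose count equals the maximum count, sorted.
    public static List<String> mostFrequentWords(Map<String, Integer> counts) {
        if (counts.isEmpty()) {
            return Collections.emptyList();
        }
        int max = Collections.max(counts.values());
        return counts.entrySet().stream()
                .filter(e -> e.getValue() == max) // unboxes the Integer before comparing
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }
}
```

For the case-sensitive counts {roll=6, Roll=6, ROLL=4, ...} this returns [Roll, roll]; uppercase letters sort before lowercase ones in String's natural order.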

score 0 · Accepted Answer

First, accumulate all the words in a Map, like this:

...
String[] roll = s.split("\\s+");
for (final String word : roll) {
    Integer qty = words.get(word);
    if (qty == null) {
        qty = 1;
    } else {
        qty = qty + 1;
    }
    words.put(word, qty);
}
...

Then you need to find out which word has the highest count:

String bestWord = null;   // must be initialized, or the println below does not compile
int maxQty = 0;
for(final String word : words.keySet()) {
    if(words.get(word) > maxQty) {
        maxQty = words.get(word);
        bestWord = word;
    }
}
System.out.println("Word:");
System.out.println(Integer.toString(maxQty) + " " + bestWord);

Finally, you need to merge all forms of the same word together:

Map<String, Integer> wordsNoCase = new HashMap<String, Integer>();
for(final String word : words.keySet()) {
    Integer qty = wordsNoCase.get(word.toLowerCase());
    if(qty == null) {
        qty = words.get(word);
    } else {
        qty += words.get(word);
    }
    wordsNoCase.put(word.toLowerCase(), qty);
}
words = wordsNoCase;
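The merging step can also be written with Map.merge. In this sketch the names are hypothetical, and Locale.ROOT is used so that lowercasing does not depend on the machine's default locale (in a Turkish locale, for example, "I" would not lowercase to "i"). Applied to the case-sensitive counts from the asker's file, it folds roll, Roll, ROLL and ROll into 6 + 6 + 4 + 2 = 18.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class CaseFold {
    // Sums the counts of all spellings that are equal ignoring case.
    public static Map<String, Integer> foldCase(Map<String, Integer> words) {
        Map<String, Integer> wordsNoCase = new HashMap<>();
        for (Map.Entry<String, Integer> e : words.entrySet()) {
            // add this spelling's count to the lowercase key (or start it)
            wordsNoCase.merge(e.getKey().toLowerCase(Locale.ROOT), e.getValue(), Integer::sum);
        }
        return wordsNoCase;
    }
}
```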

Then rerun the previous snippet to find the word with the highest count.

score -1 · Accepted Answer

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Map.Entry;

public class MostRepeatedWord {   // class name is illustrative

    /*  I have declared a LinkedHashMap with the word as the key and its occurrences as the value.
     * Create a BufferedReader object.
     * Read the first line into currentLine.
     * In a while loop, split currentLine into words and
     * iterate over them with a for loop. Inside the for loop, an if/else statement
     * increments the word's count by 1 if it is already in the map, otherwise sets it to 1.
     * Read the next line into currentLine.
     */
    public static void main(String[] args) {

        Map<String, Integer> map = new LinkedHashMap<String, Integer>();

        // try-with-resources closes the reader, and avoids calling close() on null
        // if the file cannot be opened
        try (BufferedReader reader = new BufferedReader(new FileReader("F:\\chidanand\\javaIO\\Student.txt"))) {
            String currentLine = reader.readLine();
            while (currentLine != null) {
                String[] input = currentLine.replaceAll("[^a-zA-Z]", " ").toLowerCase().split(" ");
                for (int i = 0; i < input.length; i++) {
                    if (input[i].isEmpty()) {
                        continue;   // runs of spaces produce empty tokens; don't count them
                    }
                    if (map.containsKey(input[i])) {
                        int count = map.get(input[i]);
                        map.put(input[i], count + 1);
                    } else {
                        map.put(input[i], 1);
                    }
                }
                currentLine = reader.readLine();
            }

            String mostRepeatedWord = null;
            int count = 0;
            for (Entry<String, Integer> m : map.entrySet()) {
                if (m.getValue() > count) {
                    mostRepeatedWord = m.getKey();
                    count = m.getValue();
                }
            }

            System.out.println("The most repeated word in input file is : " + mostRepeatedWord);
            System.out.println("Number Of Occurrences : " + count);

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
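For comparison, the same case-insensitive count can be expressed with Java 8 streams. This is a sketch with hypothetical names: it splits each line on runs of non-letters with Pattern.splitAsStream, filters out the empty token that a leading non-letter produces, lowercases with Locale.ROOT, and groups into a LinkedHashMap so the words stay in first-seen order. Note that Collectors.counting() yields Long values rather than Integer.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class StreamWordCount {
    private static final Pattern NON_LETTERS = Pattern.compile("[^a-zA-Z]+");

    // Counts case-insensitive words across all lines, in first-seen order.
    public static Map<String, Long> count(List<String> lines) {
        return lines.stream()
                .flatMap(NON_LETTERS::splitAsStream)
                .filter(w -> !w.isEmpty()) // a leading non-letter yields an empty first token
                .map(w -> w.toLowerCase(Locale.ROOT))
                .collect(Collectors.groupingBy(Function.identity(), LinkedHashMap::new, Collectors.counting()));
    }
}
```

A file's lines can be supplied with Files.readAllLines(Paths.get("input.txt")); for the asker's input the result should be {roll=18, tide=9}.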
