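I am trying to count the occurrences of each letter in a URL.

I found this code, which seems to do the job, but I would like a few things explained.

1) I am using the Norwegian alphabet, so I need to add three more letters. I changed the array size to 29, but it does not work.

2) Can you explain what %c%7d\n means?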

import java.io.FileReader;
import java.io.IOException;

public class FrequencyAnalysis {
    public static void main(String[] args) throws IOException {
        FileReader reader = new FileReader("PlainTextDocument.txt");

        System.out.println("Letter Frequency");

        int nextChar;
        char ch;

        // Declare 26 char counting
        int[] count = new int[26];

        //Loop through the file char
        while ((nextChar = reader.read()) != -1) {
            ch = Character.toLowerCase((char) nextChar);

            if (ch >= 'a' && ch <= 'z')
                count[ch - 'a']++;
        }

        // Print out
        for (int i = 0; i < 26; i++) {
            System.out.printf("%c%7d\n", i + 'A', count[i]);
        }

        reader.close();
    }
}

2 Answers

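You haven't said how you are checking for the three extra letters. Simply increasing the size of the count array is not enough. You need to take the Unicode code point values of the new characters into account. Those values do not conveniently follow 'z' in sequence, so ch - 'a' no longer gives a usable index. In that case you can use a Map<Integer, Integer> to store the frequencies.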
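
As a rough illustration of the map idea, here is a minimal sketch, not a definitive implementation. The class name CodePointFrequency and the method count are made up for this example, a StringReader stands in for the FileReader from the question, and the choice of TreeMap (to get output sorted by character value) is an assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, named for this example only.
final class CodePointFrequency {
    // Counts the letters read from any Reader, keyed by lower-case character value.
    static Map<Integer, Integer> count(Reader reader) {
        Map<Integer, Integer> counts = new TreeMap<>(); // sorted by character value
        try {
            int next;
            while ((next = reader.read()) != -1) {
                int ch = Character.toLowerCase(next);
                if (Character.isLetter(ch)) {
                    // Insert 1 for a new key, otherwise add 1 to the existing count.
                    counts.merge(ch, 1, Integer::sum);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample text with U+00F8 (o with stroke) written as an escape.
        Map<Integer, Integer> counts =
                count(new StringReader("R\u00f8d gr\u00f8t med fl\u00f8te"));
        for (Map.Entry<Integer, Integer> e : counts.entrySet()) {
            System.out.printf("%c%7d%n", Character.toUpperCase(e.getKey()), e.getValue());
        }
    }
}
```

Because nothing depends on the letters being contiguous, æ, ø and å need no special handling. Note that a FileReader decodes the file with the platform default charset, so the file's encoding has to match for these letters to be read correctly.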

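%c is the format specifier for a Unicode character. %7d is the specifier for a decimal integer, right-aligned in a field at least 7 characters wide and padded with spaces on the left. \n is the newline character.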

This is documented in the Formatter javadoc (java.util.Formatter).
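
To see the padding in isolation, here is a small sketch using String.format with the same specifiers. The class FormatDemo and the method row are hypothetical names for this example; the brackets are only there to make the spaces visible.

```java
// Hypothetical demo class, named for this example only.
final class FormatDemo {
    // Formats one output row the way the question's printf call does, minus the newline.
    static String row(int letter, int count) {
        return String.format("%c%7d", letter, count);
    }

    public static void main(String[] args) {
        System.out.println("[" + row('A', 42) + "]");      // [A     42]
        System.out.println("[" + row('B', 1234567) + "]"); // [B1234567]
    }
}
```

The width 7 is a minimum: a number with more than 7 digits is printed in full and simply pushes the column out of alignment.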

answered 2013-09-12T21:31:06.107

The important point here is that when you increment the occurrence count in the array, you are implicitly using the character's ASCII code:

// Here, ch is a char.
ch = Character.toLowerCase((char) nextChar);

// I dislike if statements without curly brackets, but that is off-topic :)
if (ch >= 'a' && ch <= 'z')

    /*
     * Here, ch is implicitly promoted to an int.
     * The int value of a char is its character code, which equals
     * the ASCII code for 'a' to 'z'.
     * For example, the value of 'a' is 97.
     * So if ch is 'a', (ch - 'a') = (97 - 97) = 0.
     * That's why count[0] is incremented in this case.
     *
     * Now, what happens if ch = 'ø'? What is the code of ø?
     * Something quite high, so ch - 'a' would be out of bounds
     * for an array of size 26 + 3. (With the range check above,
     * ø is simply skipped and never counted.)
     *
     * EDIT: after a quick test, 'ø' = 248.
     *
     * Widening the array would only work if the Norwegian characters
     * had codes 123 to 125, directly after 'z' (122).
     */
    count[ch - 'a']++;
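
To make the arithmetic concrete, here is a small sketch that prints the index the question's code would compute for a few characters. IndexArithmetic and index are hypothetical names, and the Norwegian letters are written as Unicode escapes so the file compiles regardless of source encoding.

```java
// Hypothetical demo class, named for this example only.
final class IndexArithmetic {
    // Index the question's code would compute for a given character.
    static int index(char ch) {
        return ch - 'a'; // both operands are promoted to int before subtracting
    }

    public static void main(String[] args) {
        System.out.println(index('a'));      // 0
        System.out.println(index('z'));      // 25
        System.out.println(index('\u00e6')); // 133 (ae ligature, 230 - 97)
        System.out.println(index('\u00f8')); // 151 (o with stroke, 248 - 97)
        System.out.println(index('\u00e5')); // 132 (a with ring, 229 - 97)
    }
}
```

None of the three indices falls in the range 26 to 28, which is why resizing the array to 29 does not help by itself.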

You should rewrite the algorithm to use a HashMap<Character, Integer> instead:

// Maps each character to its number of occurrences.
HashMap<Character, Integer> map = new HashMap<Character, Integer>();

while ((nextChar = reader.read()) != -1) {
  // reader.read() returns an int; cast it so the key is boxed as a Character.
  char key = (char) nextChar;
  if (!map.containsKey(key)) {
    map.put(key, 0);
  }
  map.put(key, map.get(key) + 1);
}
answered 2013-09-12T22:14:19.240