I'm using the American National Corpus to get the frequency of a word in English. The file structure is the following (it's a big file, ~8 MB):

Word1   Lemma1  Pos1    Frequency1
Word2   Lemma2  Pos2    Frequency2
Word3   Lemma3  Pos3    Frequency3

Here is my Java method, but it's extremely slow. How can I change it to speed it up? (I want to find the frequency for a specific word.)

    import java.io.File;
    import java.util.Scanner;

    public static int frequency(String word) throws Exception {

        try (Scanner fscan = new Scanner(new File("./ANC-all-lemma.data"))) {
            fscan.useDelimiter("\n");

            while (fscan.hasNext()) {
                String frow = fscan.next();
                // Columns: word, lemma, POS, frequency.
                // Assumes they are separated by tabs or runs of spaces.
                String[] separated = frow.trim().split("\\s+");
                if (separated.length < 4) {
                    continue; // skip blank or malformed rows
                }

                String fwordC = separated[0]; // current word

                if (fwordC.equalsIgnoreCase(word)) {
                    System.out.println("Found!!!");
                    return Integer.parseInt(separated[3]); // frequency column
                }
            }
        }

        return -1; // word not found
    }

Thanks a bunch!

1 Answer

You should definitely try reading the file with a BufferedReader. Scanner is meant for parsing data. BufferedReader also has a larger buffer, about 8 KB.
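
A minimal sketch of what a BufferedReader-based lookup could look like. The class name `FrequencyLookup`, the `path` parameter, the `-1` "not found" value, and the assumption that columns are separated by tabs or runs of spaces are all illustrative and not taken from the question:

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

// Hypothetical helper class; the name is an assumption for this sketch.
final class FrequencyLookup {

    /**
     * Returns the frequency of the first row whose word column matches
     * (ignoring case), or -1 if no row matches.
     */
    static int frequency(String path, String word) throws IOException {
        // BufferedReader reads the file in large chunks and hands back whole lines.
        try (BufferedReader reader = new BufferedReader(new FileReader(path))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Assumption: word, lemma, POS, frequency separated by whitespace.
                String[] columns = line.trim().split("\\s+");
                if (columns.length >= 4 && columns[0].equalsIgnoreCase(word)) {
                    return Integer.parseInt(columns[3]);
                }
            }
        }
        return -1; // word not found
    }
}
```

It would be called as `FrequencyLookup.frequency("./ANC-all-lemma.data", "word")`. Each call still scans the file from the top, so this only addresses the per-line reading cost.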

Answered 2013-07-13T14:56:04.027