I'm using the American National Corpus to get the frequency of a word in English. The file structure is the following (it's a big file, ~8 MB):
Word1 Lemma1 Pos1 Frequency1
Word2 Lemma2 Pos2 Frequency2
Word3 Lemma3 Pos3 Frequency3
Here is my Java method, but it's extremely slow. How can I change it to speed it up? (I want to find the frequency for a specific word.)
import java.io.File;
import java.util.Scanner;

public static int frequency(String word) throws Exception {
    // Scans the corpus file from the start on every call
    try (Scanner fscan = new Scanner(new File("./ANC-all-lemma.data"))) {
        fscan.useDelimiter("\n");
        while (fscan.hasNext()) {
            String frow = fscan.next();
            // Columns: word, lemma, POS, frequency
            String[] separated = frow.split(" ");
            String fwordC = separated[0].replaceAll("(\\r|\\n)", ""); // current word
            if (fwordC.equalsIgnoreCase(word)) {
                System.out.println("Found!!!");
                // The frequency is the fourth column; trim() drops a trailing \r
                return Integer.parseInt(separated[3].trim());
            }
        }
    }
    return 0; // word not found
}
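One direction I'm considering is to read the file once into a HashMap and then answer each lookup from memory instead of rescanning the file. This is only a minimal sketch and rests on some assumptions: the class name FrequencyIndex is hypothetical, I'm assuming the file is UTF-8 with whitespace-separated columns and an integer in the fourth column, and lower-casing the key only approximates equalsIgnoreCase.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper: loads the corpus once, then serves lookups from memory.
public class FrequencyIndex {
    private final Map<String, Integer> frequencies = new HashMap<>();

    // Assumption: UTF-8 file, columns "word lemma pos frequency" per line.
    public FrequencyIndex(Path corpusFile) throws IOException {
        try (BufferedReader reader =
                 Files.newBufferedReader(corpusFile, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] columns = line.trim().split("\\s+");
                if (columns.length < 4) {
                    continue; // skip lines that do not have all four columns
                }
                int count;
                try {
                    count = Integer.parseInt(columns[3]);
                } catch (NumberFormatException e) {
                    continue; // skip lines whose frequency is not an integer
                }
                // Keep the first entry for a word, like the original first-match loop.
                frequencies.putIfAbsent(columns[0].toLowerCase(Locale.ROOT), count);
            }
        }
    }

    // Returns the stored frequency, or 0 if the word is not in the file.
    public int frequency(String word) {
        return frequencies.getOrDefault(word.toLowerCase(Locale.ROOT), 0);
    }
}
```

The idea would be to build it once, e.g. new FrequencyIndex(Paths.get("./ANC-all-lemma.data")), and then call frequency(word) as often as needed. The load still reads the whole file, but only once instead of on every lookup.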
Thanks a bunch!