I am developing a program that reads Wikipedia page view statistics from a .txt file. So far I have a load method that reads the file, shown below:
    public void loadPVSF(String x) throws FileNotFoundException, IOException {
        FileInputStream f = new FileInputStream(x); // obtains bytes from the input file
        DataInputStream in = new DataInputStream(f); // reads primitive Java types
        BufferedReader br = new BufferedReader(new InputStreamReader(in));
        while ((temp = br.readLine()) != null) {
            // readLine() already returns a single line without its terminator,
            // so this split yields a one-element array
            tempArray = temp.split("\n");
            for (String st : tempArray) { // puts each element of tempArray through String st
                MainArray = st.split(" "); // splits the line on spaces into MainArray
                for (String str : MainArray) {
                    if (linecounter < 5) {
                        linecounter++;
                        System.out.println(linecounter + ": " + str);
                    }
                }
            }
        }
    }
Running it produces command-line output like the following sample:
1: commons.m
2: Category:Gracie_Gold
3: 1
4: 7406
1: commons.m
2: Category:Grad_Maribor
3: 1
4: 7324
1: commons.m
2: Category:Grade_II*_listed_houses_in_Cheshire
3: 1
4: 7781
Basically, each group of four lines is:
1 - Language/Project
2 - Article Title
3 - Number of Page views
4 - Size of the Page (bytes)
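The four fields listed above could be captured in a small value class, so that each token gets a name instead of a counter position. This is only a minimal sketch: `PageViewEntry` and `parse` are hypothetical names that do not appear in the code above, and it assumes every line holds exactly four space-separated fields.

```java
// Hypothetical helper class (not part of the original code): one parsed line.
final class PageViewEntry {
    final String project; // field 1: language/project, e.g. "commons.m"
    final String title;   // field 2: article title
    final long views;     // field 3: number of page views
    final long bytes;     // field 4: size of the page in bytes

    PageViewEntry(String project, String title, long views, long bytes) {
        this.project = project;
        this.title = title;
        this.views = views;
        this.bytes = bytes;
    }

    // Assumption: exactly four space-separated fields per line.
    // Long.parseLong throws NumberFormatException if field 3 or 4 is not numeric.
    static PageViewEntry parse(String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 4) {
            throw new IllegalArgumentException(
                "Expected 4 fields but found " + parts.length + ": " + line);
        }
        return new PageViewEntry(
            parts[0], parts[1], Long.parseLong(parts[2]), Long.parseLong(parts[3]));
    }
}
```

For example, `PageViewEntry.parse("commons.m Category:Gracie_Gold 1 7406").title` would give `Category:Gracie_Gold`.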
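I need to know how to correctly assign each of these values as the lines are read in. Essentially, what I ultimately need is a hash table that stores the article titles along with their corresponding view counts, so that I can determine which article has the most views.
Any hints or suggestions would be greatly appreciated.
Sample of the input .txt file: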
nl Andreas_(apostel) 7 103145
nl Andreas_Baader 4 46158
nl Andreas_Bjelland 2 28288
nl Andreas_Burnier 2 11545
nl Andreas_Charles_van_Braam_Houckgeest 1 10373
nl Andreas_Eschbach 1 365
nl Andreas_Grassl 1 36
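
One possible way to get from input like this to the hash table of titles and view counts is sketched below. It is not a definitive implementation: `PageViewCounter`, `addLine`, and `mostViewed` are hypothetical names, and the sketch assumes one record per line with four space-separated fields. It also keys the map on the title alone, so the same title under two different projects would have its counts summed.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class (not part of the original code).
final class PageViewCounter {

    // Adds one line's view count to the map, keyed by article title.
    // Lines without exactly four fields are skipped.
    static void addLine(Map<String, Long> viewsByTitle, String line) {
        String[] parts = line.trim().split(" ");
        if (parts.length != 4) {
            return;
        }
        // merge() inserts the count, or sums it with an existing count for the title.
        viewsByTitle.merge(parts[1], Long.parseLong(parts[2]), Long::sum);
    }

    // Returns the title with the highest view count, or null for an empty map.
    static String mostViewed(Map<String, Long> viewsByTitle) {
        String bestTitle = null;
        long bestViews = -1;
        for (Map.Entry<String, Long> entry : viewsByTitle.entrySet()) {
            if (entry.getValue() > bestViews) {
                bestTitle = entry.getKey();
                bestViews = entry.getValue();
            }
        }
        return bestTitle;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the file here; for a real file,
        // wrap a java.io.FileReader in the BufferedReader instead.
        String sample = "nl Andreas_(apostel) 7 103145\n"
                      + "nl Andreas_Baader 4 46158\n"
                      + "nl Andreas_Bjelland 2 28288\n";
        Map<String, Long> viewsByTitle = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(sample))) {
            String line;
            while ((line = reader.readLine()) != null) {
                addLine(viewsByTitle, line);
            }
        }
        String top = mostViewed(viewsByTitle);
        System.out.println(top + ": " + viewsByTitle.get(top));
    }
}
```

Reading line by line keeps memory use proportional to the number of distinct titles rather than to the file size, which may matter for large statistics files.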