3

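I have this long string here, and my text file contains about 1,000 lines like it. I want to count how often each date occurs in that text file. Any idea how I can do this?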

{"interaction":{"author":{"id":"53914918","link":"http:\/\/twitter.com\/53914918","name":"ITTIA","username":"s8c"},"content":"RT @fubarista: After thousands of years of wars I am not an optimist about peace. The US economy is totally reliant on war. It is the on ...","created_at":"Sun, 10 Jul 2011 08:22:16 +0100","id":"1e0aac556a44a400e07497f48f024000","link":"http:\/\/twitter.com\/s8c\/statuses\/89957594197803008","schema":{"version":2},"source":"oauth:258901","type":"twitter","tags":["attretail"]},"language":{"confidence":100,"tag":"en"},"salience":{"content":{"sentiment":4}},"twitter":{"created_at":"Sun, 10 Jul 2011 08:22:16 +0100","id":"89957594197803008","mentions":["fubarista"],"source":"oauth:258901","text":"RT @fubarista: After thousands of years of wars I am not an optimist about peace. The US economy is totally reliant on war. It is the on ...","user":{"created_at":"Mon, 05 Jan 2009 14:01:11 +0000","geo_enabled":false,"id":53914918,"id_str":"53914918","lang":"en","location":"Mouth of the abyss","name":"ITTIA","screen_name":"s8c","time_zone":"London","url":"https:\/\/thepiratebay.se"}}}

4

5 Answers

1

Each date follows a stable pattern, such as \d\d (Jan|Feb|...) 20\d\d, so you can extract the dates with regular expressions (the Pattern class in Java). Then use a HashMap whose key is the date found and whose value is a counter, and increment the counter for every match. Sorry for no code, but I hope that helps :)
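
For illustration, the idea above could be sketched roughly as follows. The class name RegexDateCounter and the method countDates are made up for this example, and the input is assumed to be already loaded into a String. Note that each record in the question contains several created_at fields, so this counts every occurrence of a date, not one per record.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class RegexDateCounter {
    // Matches dates such as "10 Jul 2011": two digits, a month abbreviation, a year 20xx.
    private static final Pattern DATE = Pattern.compile(
            "\\d\\d (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) 20\\d\\d");

    // Returns a map from each matched date string to the number of times it occurs in text.
    static Map<String, Integer> countDates(String text) {
        Map<String, Integer> counts = new HashMap<>();
        Matcher m = DATE.matcher(text);
        while (m.find()) {
            String date = m.group();
            Integer old = counts.get(date);
            counts.put(date, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two timestamps in the format used by the question's data.
        String sample = "\"created_at\":\"Sun, 10 Jul 2011 08:22:16 +0100\" ... "
                + "\"created_at\":\"Mon, 05 Jan 2009 14:01:11 +0000\"";
        System.out.println(countDates(sample));
    }
}
```

A HashMap has no defined iteration order, so the dates may print in any order; a TreeMap would sort the keys as strings.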

Answered 2013-05-28T06:33:58.217
1

Use the RandomAccessFile and BufferedReader classes to read the data in parts, then use string parsing to count the frequency of each date.
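
A minimal sketch of the line-by-line variant, using only BufferedReader (RandomAccessFile is left out here) and plain indexOf/substring parsing. The names LineDateCounter, extractDate and countDates are invented for the example. It assumes one JSON record per line, takes only the first created_at value in each line, and assumes the date is the first 16 characters of that value (e.g. "Sun, 10 Jul 2011"). Map.merge needs Java 8 or later.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class, not part of the original answer.
class LineDateCounter {
    private static final String KEY = "\"created_at\":\"";
    private static final int DATE_LENGTH = 16; // length of "Sun, 10 Jul 2011"

    // Returns the date part of the first created_at value in the line, or null if there is none.
    static String extractDate(String line) {
        int start = line.indexOf(KEY);
        if (start < 0) {
            return null;
        }
        start += KEY.length();
        int end = line.indexOf('"', start);
        if (end < 0 || end - start < DATE_LENGTH) {
            return null;
        }
        return line.substring(start, start + DATE_LENGTH);
    }

    // Streams the input line by line, so the whole file never has to fit in memory.
    static Map<String, Integer> countDates(BufferedReader reader) {
        Map<String, Integer> counts = new HashMap<>();
        reader.lines().forEach(line -> {
            String date = extractDate(line);
            if (date != null) {
                counts.merge(date, 1, Integer::sum);
            }
        });
        return counts;
    }

    public static void main(String[] args) {
        String twoRecords = "{\"created_at\":\"Sun, 10 Jul 2011 08:22:16 +0100\"}\n"
                + "{\"created_at\":\"Sun, 10 Jul 2011 09:15:00 +0100\"}\n";
        System.out.println(countDates(new BufferedReader(new StringReader(twoRecords))));
    }
}
```

For a real file, the StringReader would be swapped for a FileReader on the data file, opened in a try-with-resources block so it is closed afterwards.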

Answered 2013-05-28T06:32:48.440
0

I think it's a JSON string, so you should parse it instead of pattern matching. See this example here.

Answered 2013-05-28T06:35:19.137
0

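Your input string is in JSON format, so I suggest using a JSON parser, which makes parsing easier and, more importantly, robust! Getting into JSON parsing may take a few minutes, but it is worth it.

After that, read the "created_at" field. Create a Map with your dates as keys and the counts as values, and write something like this: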

int estimatedSize = 500; // best practice to avoid some HashMap resizing
Map<String, Integer> myMap = new HashMap<>(estimatedSize);
String[] dates = {}; // here comes your parsed data, draw it into the loop later
for (String nextDate : dates) {
    Integer oldCount = myMap.get(nextDate);
    if (oldCount == null) { // not in yet
        myMap.put(nextDate, Integer.valueOf(1));
    }
    else { // already in
        myMap.put(nextDate, Integer.valueOf(oldCount.intValue() + 1));
    }
}
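
One thing to watch: the created_at values include a time of day, so using the raw string as the key counts timestamps, not dates. A possible sketch for reducing each value to its calendar day first, assuming Java 8 or later and that the values have already been pulled out by a JSON parser (DayCounter and countByDay are names made up for this example):

```java
import java.time.LocalDate;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of the original answer.
class DayCounter {
    // Counts how many created_at values fall on each calendar day.
    // The day is taken in the timestamp's own UTC offset, not converted to a common zone.
    static Map<LocalDate, Integer> countByDay(Iterable<String> createdAtValues) {
        Map<LocalDate, Integer> counts = new TreeMap<>(); // TreeMap keeps the days sorted
        for (String value : createdAtValues) {
            // The sample values ("Sun, 10 Jul 2011 08:22:16 +0100") follow the RFC 1123 layout.
            LocalDate day = OffsetDateTime
                    .parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toLocalDate();
            counts.merge(day, 1, Integer::sum); // insert 1 or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByDay(Arrays.asList(
                "Sun, 10 Jul 2011 08:22:16 +0100",
                "Sun, 10 Jul 2011 21:40:03 +0100")));
    }
}
```

Map.merge does the same job as the get/put pair in the loop above, in a single call.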
Answered 2013-05-28T09:25:54.203
0

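Copy the required strings into test.txt and put the file on the C drive. Working code follows; I used the Pattern and Matcher classes.

In the pattern I have given the date format you asked about. You can see the pattern here:

"(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[,] \d\d (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d\d\d\d"

Check the code: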

import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Test {
    public static void main(String[] args) throws Exception {
        // Read the whole file into memory; try-with-resources closes the reader.
        // StringBuilder avoids the quadratic cost of repeated String concatenation.
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new FileReader("c:\\test.txt"))) {
            int i;
            while ((i = br.read()) != -1) {
                sb.append((char) i);
            }
        }
        String s = sb.toString();
        System.out.println(s);

        // Day name, comma, day of month, month abbreviation, four-digit year,
        // e.g. "Sun, 10 Jul 2011"
        Pattern p = Pattern.compile(
                "(Sun|Mon|Tue|Wed|Thu|Fri|Sat)[,] \\d\\d (Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \\d\\d\\d\\d");

        // Print every date found, numbered in order of appearance
        Matcher m = p.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
            System.out.println("Match number " + count);
            System.out.println(m.group());
        }
    }
}

Very good descriptions can be found here: Link 1, Link 2

Answered 2013-05-28T07:09:45.947