
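I need to parse this input file and can't figure out how to do it. I tried using Scanner.useDelimiter(), but I still run into problems. Does anyone have an idea how to do this correctly?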

Here is one line from the input file:

200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] "GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1" 200 52464 "http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album" "Opera/6.01 (Windows 98; U) [en]"

It is supposed to be split into the following parts:

  1. address = 200.88.223.98

  2. date = 01/Feb/2007:04:02:22 -0500

  3. request = GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1

  4. status = 200

  5. bytes = 52464

  6. refer = http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album

  7. agent = Opera/6.01 (Windows 98; U) [en]

Here is the part of my code that tries to parse it:

Scanner scan = new Scanner(input);
scan.useDelimiter("[-']+");
while (scan.hasNextLine()) 
{
    String address = scan.next();
    String date = scan.next();
    String request = scan.next();
    int status = scan.nextInt();
    int bytes = scan.nextInt();
    String refer = scan.next();
    String agent = scan.next(); 
}

It fails with the following error:

Exception in thread "main" java.util.InputMismatchException      
  at java.util.Scanner.throwFor(Scanner.java:840) 
  at java.util.Scanner.next(Scanner.java:1461) 
  at java.util.Scanner.nextInt(Scanner.java:2091) 
  at java.util.Scanner.nextInt(Scanner.java:2050) 
  at Analyzer.start(Unknown Source) 
  at Driver.main(Unknown Source) 
Java Result: 1

1 Answer


Consider this: split the line on spaces and extract the data from the resulting array.

String s = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] \"GET /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852 HTTP/1.1\" 200 52464 \"http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album\" \"Opera/6.01 (Windows 98; U) [en]\"";

String[] arr = s.split(" ");

// Print every token together with its index
for (int i = 0; i < arr.length; i++) {
    System.out.println(i + " : " + arr[i]);
}

The output is:

0 : 200.88.223.98
1 : -
2 : -
3 : [01/Feb/2007:04:02:22
4 : -0500]
5 : "GET
6 : /gallery/v/events/album02/contests/programmingContest05/?g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_fromNavId=x332be852
7 : HTTP/1.1"
8 : 200
9 : 52464
10 : "http://cs.tcnj.edu/gallery/main.php?g2_view=comment.AddComment&g2_itemId=664&g2_return=http%3A%2F%2Fcs.tcnj.edu%2Fgallery%2Fv%2Fevents%2Falbum02%2Fcontests%2FprogrammingContest05%2F%3Fg2_GALLERYSID%3D3be9666f9c07e16b7f33e2ea8acb8dd2&g2_GALLERYSID=3be9666f9c07e16b7f33e2ea8acb8dd2&g2_returnName=album"
11 : "Opera/6.01
12 : (Windows
13 : 98;
14 : U)
15 : [en]"

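So element 0 gives you the IP address, elements 3 and 4 give the date, elements 5 through 7 give the request, elements 8 and 9 give the status and byte count, element 10 gives the referrer, and elements 11 onward give the user agent. Strip the surrounding brackets and quotes and you have your data.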
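
To make the index bookkeeping concrete, here is a minimal sketch of how the fields could be reassembled from the split array. The class name `LogLineSplitter` and the helpers `parse` and `unwrap` are made up for illustration and are not part of any library. The sketch assumes every line has exactly the layout shown above (a three-token request and a referrer without spaces), and the sample line in `main` is a shortened version of the one from the question.

```java
import java.util.Arrays;

class LogLineSplitter {

    // Hypothetical helper: rebuilds the seven fields from the space-split
    // tokens, assuming the token positions printed in the output above.
    static String[] parse(String line) {
        String[] t = line.split(" ");
        if (t.length < 12) {
            throw new IllegalArgumentException("Unexpected log line: " + line);
        }
        String address = t[0];
        // Tokens 3 and 4 are "[date" and "zone]": join them, drop the brackets.
        String date = unwrap(t[3] + " " + t[4]);
        // Tokens 5..7 are the method, the path and the protocol, wrapped in quotes.
        String request = unwrap(t[5] + " " + t[6] + " " + t[7]);
        String status = t[8];
        String bytes = t[9];
        String refer = unwrap(t[10]);
        // The user agent itself contains spaces, so take every remaining token.
        String agent = unwrap(String.join(" ", Arrays.copyOfRange(t, 11, t.length)));
        return new String[] {address, date, request, status, bytes, refer, agent};
    }

    // Drops the first and last character when the value starts with a quote
    // or an opening bracket; any other value is returned unchanged.
    static String unwrap(String s) {
        if (s.length() >= 2 && (s.startsWith("\"") || s.startsWith("["))) {
            return s.substring(1, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        // Shortened sample line (assumption: same layout as the real log).
        String line = "200.88.223.98 - - [01/Feb/2007:04:02:22 -0500] "
                + "\"GET /gallery/ HTTP/1.1\" 200 52464 "
                + "\"http://cs.tcnj.edu/gallery/main.php\" "
                + "\"Opera/6.01 (Windows 98; U) [en]\"";
        for (String field : parse(line)) {
            System.out.println(field);
        }
    }
}
```

The status and byte count are kept as strings here; `Integer.parseInt` can convert them once the fields are separated. Lines that deviate from the assumed layout would need a more robust approach, such as a regular expression.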

Answered 2012-11-02T16:28:42.300