java - Finding the right regular expression or method to split a large text file into chunks

Question

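I want to split a large text file into chunks while reading it with FileReader/BufferedReader. Each chunk will be processed separately in my downstream code (i.e. a HashMap of information is generated for each chunk). To do that, I first need a pattern that defines what a chunk looks like. Maybe someone here can help me.

This is what the general structure of my file looks like (the "//" and "\\" markers are not part of the file):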

//

    Car: Oldtimer

     Ford Model T - 1908    
     Chevrolet Bel-Air - 1956
     Mercedes-Benz W 198 - 1954 

    Car: Compact Car

     Toyota iQ - 2008
     Volkswagen Polo V - 2009
     Audi A1 - 2010 

    Car: Special Car

     Bat Mobile - 1966
     Black Beauty - 1966    
     K.I.T.T. - 1982

   Total: 3

                       //

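A chunk should start with "Car: ABC" and end before the next "Car: XYZ" entry. There is always a blank line before and after each "Car: ABC" entry. The file ends with "Total: n". Just to illustrate, the first chunk of my example file would be: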

//

Car: Oldtimer

 Ford Model T - 1908    
 Chevrolet Bel-Air - 1956
 Mercedes-Benz W 198 - 1954 

                           //

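So far I have tried regex matching to capture everything between the "Car:" tags, using Pattern.compile("Car:\\s(.*)Car:\\s"). However, this approach misses every even-numbered chunk, e.g. the one starting with "Car: Compact Car". Maybe you know another or better way to identify each chunk. Thanks in advance.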

score 3 · Accepted Answer

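Setting the delimiter to the regex "Car:|Total:" could be a solution; each .next() call then returns one chunk (without the "Car:" label itself, since the delimiter is consumed).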

Scanner sc = new Scanner(new File("file.txt"));
sc.useDelimiter("Car:|Total:");
while (sc.hasNext()) {
  System.out.println(sc.next());
}
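A variation on the snippet above, as a minimal sketch: if the "Car:" label should stay part of each chunk, the delimiter can be a zero-width lookahead, so the Scanner splits before the label instead of consuming it. The class name `ScannerChunks`, the helper `chunks`, and the in-memory sample string are assumptions made for illustration; for a real file, pass a `File` to the `Scanner` constructor as in the answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerChunks {
  // Splits before each "Car:" label (zero-width lookahead, so the label is kept)
  // and treats the trailing "Total: n" line as a delimiter that is discarded.
  static List<String> chunks(String text) {
    List<String> result = new ArrayList<>();
    try (Scanner sc = new Scanner(text)) {
      sc.useDelimiter("(?=Car:)|Total:.*");
      while (sc.hasNext()) {
        String chunk = sc.next().trim();
        // Whitespace-only tokens (before the first label, after "Total:") are skipped.
        if (!chunk.isEmpty()) result.add(chunk);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical sample in the question's format
    String sample = "\nCar: Oldtimer\n\n Ford Model T - 1908\n\n"
        + "Car: Compact Car\n\n Toyota iQ - 2008\n\nTotal: 2\n";
    for (String chunk : chunks(sample)) {
      System.out.println("[" + chunk + "]");
    }
  }
}
```

Each returned string then starts with its own "Car: ..." header line, which makes it easier to key a per-chunk HashMap on the car type.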

score 0 · Accepted Answer

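A Car is an object, so create a class for it. Once you have seen a "Car:" line you know the type of the entries that follow, and a simple String.split is enough to parse the file.

Code: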

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class Main {
  public static class Car {
    public String type;
    public String name;
    public int year;
    
    public Car(String type, String name, int year) {
      this.type = type;
      this.name = name;
      this.year = year;
    }
    
    @Override
    public String toString() {
      return String.format("Car: %s\tName: %s\tYear: %d", type, name, year);
    }
  }
  
  public static void main(String... args) throws Exception {
    Map<String, Set<Car>> carMap;
    // try-with-resources closes the reader even if parsing fails
    try (BufferedReader br = new BufferedReader(new FileReader(new File("cars.txt")))) {
      carMap = makeCars(br);
    }
    
    for(String key : carMap.keySet()) {
      System.out.println(key);
      for(Car car : carMap.get(key)) {
        System.out.println(car);
      }
    }
  }
  
  public static Map<String, Set<Car>> makeCars(BufferedReader br) throws Exception {
    String carType = null;
    Set<Car> carList = null;
    Map<String, Set<Car>> carMap = new HashMap<String, Set<Car>>();
    int linecounter = 0;
    
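    String line;
    // readLine() returns null at end of input; ready() only reports whether a read would block
    while((line = br.readLine()) != null) {
      linecounter++;
      
      if(line.contains("Car:")) {
        String[] typeSplit = line.split(": ");
        if (typeSplit.length != 2) {
          // Malformed header: report it and ignore entries until the next valid header
          System.err.format("Error reading file on line %d%n", linecounter);
          carList = null;
          continue;
        }
        carType = typeSplit[1].trim();
        carList = new HashSet<Car>();
        carMap.put(carType, carList);
        continue;
      }
      String[] carSplit = line.split(" - ");
      // Skip blank lines, the "Total:" line, and entries that appear before any valid header
      if (carSplit.length != 2 || carList == null) continue;
      
      Car newCar = new Car(carType, carSplit[0].trim(), Integer.parseInt(carSplit[1].trim()));
      carList.add(newCar);
    }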
    
    return carMap;
  }
}
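If only the chunk boundaries matter, as in the question, the same line-by-line idea can be reduced to a small chunker that never holds more than the collected lines in memory. This is a sketch under the question's stated format (a chunk starts at a "Car:" line, the file ends at "Total:"); the class name `LineChunker`, the method `readChunks`, and the sample text are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class LineChunker {
  // Reads line by line and opens a new chunk at every line that starts with "Car:".
  static List<List<String>> readChunks(BufferedReader br) throws IOException {
    List<List<String>> chunks = new ArrayList<>();
    List<String> current = null;
    String line;
    while ((line = br.readLine()) != null) {
      String trimmed = line.trim();
      if (trimmed.startsWith("Total:")) break;   // end-of-file marker from the question
      if (trimmed.startsWith("Car:")) {
        current = new ArrayList<>();
        chunks.add(current);
      }
      // Blank lines and anything before the first header are ignored.
      if (current != null && !trimmed.isEmpty()) current.add(trimmed);
    }
    return chunks;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical sample; a real file would use new BufferedReader(new FileReader(...))
    String sample = "\nCar: Oldtimer\n\n Ford Model T - 1908\n\n"
        + "Car: Compact Car\n\n Toyota iQ - 2008\n\nTotal: 2\n";
    try (BufferedReader br = new BufferedReader(new StringReader(sample))) {
      for (List<String> chunk : readChunks(br)) {
        System.out.println(chunk);
      }
    }
  }
}
```

For a very large file, the body of the loop could hand each finished chunk to the downstream code as soon as the next "Car:" line is seen, instead of collecting all chunks in a list first.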

score 0 · Accepted Answer

This worked for me:

(Car:).*?(?=Car:|Total:)

answered 2013-04-29T17:57:27.543
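A minimal sketch of how this pattern might be applied in Java. The lookahead only checks for the next "Car:" or "Total:" without consuming it, so the next match can start at that label; the pattern in the question consumes the following "Car:", which is why every second chunk was lost. The `Pattern.DOTALL` flag is an assumption added here so that `.` can cross line breaks; the class name `RegexChunks` and the sample text are hypothetical. This approach needs the whole text in memory as a single string, which may be a drawback for very large files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexChunks {
  // DOTALL lets '.' match line terminators; the reluctant .*? stops at the
  // first position where the lookahead sees "Car:" or "Total:".
  private static final Pattern CHUNK =
      Pattern.compile("(Car:).*?(?=Car:|Total:)", Pattern.DOTALL);

  static List<String> chunks(String text) {
    List<String> result = new ArrayList<>();
    Matcher m = CHUNK.matcher(text);
    while (m.find()) {
      result.add(m.group().trim());
    }
    return result;
  }

  public static void main(String[] args) {
    // Hypothetical sample in the question's format
    String sample = "\nCar: Oldtimer\n\n Ford Model T - 1908\n\n"
        + "Car: Compact Car\n\n Toyota iQ - 2008\n\n"
        + "Car: Special Car\n\n Bat Mobile - 1966\n\nTotal: 3\n";
    for (String chunk : chunks(sample)) {
      System.out.println("[" + chunk + "]");
    }
  }
}
```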
