java - Extracting statistics from CSV data in Java using OpenCSV

Question

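I am using OpenCSV to parse a CSV file in Java and would like to count, for example, how many rows have "UDP" or "TCP" as their third element. How do I pick out that specific data from where I am now and store it in a separate variable? (That is, if the third element is "UDP" in 20 rows across the whole file, I want an integer count of 20.) So far I can only print out the entire contents of the file, which I parse as follows: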

try {
    CSVReader reader = new CSVReader(new FileReader(filePath), ',');

    // Read the complete file into a list of token arrays, one array per row.
    List<String[]> rowsAsTokens = reader.readAll();
    reader.close();

    // Print every token of every row.
    for (String[] row : rowsAsTokens) {
        for (String token : row) {
            System.out.print(token + " ");
        }
        System.out.println();
    }
} catch (IOException e) {
    // Covers both a missing file and read errors.
    e.printStackTrace();
}

score 0 · Accepted Answer

What you are asking about is very, very basic Java (or C, or C++); I suggest you read up on Java.

double sum = 0;
for(String[] row: rowsAsTokens) {
    // check whether the first column is HELLO
    if(row[0].equals("HELLO")) {
        // parse the second column as a double
        sum += Double.parseDouble(row[1]);
    }
}
// print the grand total once at the end
System.out.println(sum);
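
Adapted to the question, the same loop can count matching rows instead of summing values. The following is only a minimal sketch: it assumes the rows have the `List<String[]>` shape that `readAll()` returns, and the class name `ColumnMatchCount`, the helper `countMatches`, and the sample rows are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

public class ColumnMatchCount {

    // Counts the rows whose cell at columnIndex equals the expected value.
    // Rows too short to contain that column are skipped.
    public static int countMatches(List<String[]> rows, int columnIndex, String expected) {
        int count = 0;
        for (String[] row : rows) {
            if (row.length > columnIndex && expected.equals(row[columnIndex])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for reader.readAll()
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "01:23", "UDP"},
            new String[] {"2", "02:34", "TCP"},
            new String[] {"3", "03:45", "TCP"});

        int udpCount = countMatches(rows, 2, "UDP");
        int tcpCount = countMatches(rows, 2, "TCP");
        System.out.println("UDP count: " + udpCount);
        System.out.println("TCP count: " + tcpCount);
    }
}
```

Each count ends up in its own `int` variable, which is what the question asks for; the length check guards against blank or truncated lines.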

score 0 · Accepted Answer

I would maintain the counts in a HashMap. I would also avoid readAll(), so that you don't have to iterate over the data twice.

Just declare the map

Map<String, Integer> countMap = new HashMap<String, Integer>();

Then keep a count for each value you encounter in the third column

String [] row;
while ((row = reader.readNext()) != null) {
  String value = row[2]; // value in 3rd column

  // default count to 0 if not in map
  Integer count = countMap.get(value) != null ? countMap.get(value) : 0;

  // increment count in map
  countMap.put(value, count + 1);

}

System.out.println("UDP count: " + countMap.get("UDP"));
System.out.println("TCP count: " + countMap.get("TCP"));
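
On Java 8 or later, the lookup-then-put pair can be collapsed into a single `Map.merge` call. The sketch below assumes that; the class name `ColumnValueCounts`, the helper `countColumn`, and the in-memory sample rows (standing in for successive `reader.readNext()` results) are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ColumnValueCounts {

    // Builds a map from each distinct value in one column to its number of occurrences.
    public static Map<String, Integer> countColumn(Iterable<String[]> rows, int columnIndex) {
        Map<String, Integer> countMap = new HashMap<>();
        for (String[] row : rows) {
            if (row.length <= columnIndex) {
                continue; // skip blank or truncated lines
            }
            // Stores 1 for a new key, otherwise adds 1 to the existing count.
            countMap.merge(row[columnIndex], 1, Integer::sum);
        }
        return countMap;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows; in the real program these come from the CSV reader.
        List<String[]> rows = Arrays.asList(
            new String[] {"1", "01:23", "UDP"},
            new String[] {"2", "02:34", "TCP"},
            new String[] {"3", "03:45", "TCP"});

        Map<String, Integer> counts = countColumn(rows, 2);
        // getOrDefault prints 0 rather than null for a value that never appeared.
        System.out.println("UDP count: " + counts.getOrDefault("UDP", 0));
        System.out.println("TCP count: " + counts.getOrDefault("TCP", 0));
        System.out.println("ICMP count: " + counts.getOrDefault("ICMP", 0));
    }
}
```

Using `getOrDefault` when reading the totals avoids printing `null` for a protocol that is absent from the file.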

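As an alternative, you could use the highly flexible and configurable Super CSV. The solution above works for trivial scenarios (such as keeping a count for one column), but it can easily become unreadable as you keep adding more functionality. Super CSV has a powerful cell processor API that automates conversions and constraints, which simplifies this considerably.

For example, you can write a custom cell processor that maintains a count for each unique column value it encounters.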

package example;

import java.util.Map;

import org.supercsv.cellprocessor.CellProcessorAdaptor;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.util.CsvContext;

public class Counter extends CellProcessorAdaptor {

  private final Map<Object, Integer> countMap;

  public Counter(final Map<Object, Integer> countMap) {
    super();
    if (countMap == null){
      throw new IllegalArgumentException("countMap should not be null");
    }
    this.countMap = countMap;
  }

  public Counter(final Map<Object, Integer> countMap, final CellProcessor next) {
    super(next);
    if (countMap == null){
      throw new IllegalArgumentException("countMap should not be null");
    }
    this.countMap = countMap;
  }

  @Override
  public Object execute(Object value, CsvContext context) {

    validateInputNotNull(value, context);

    // get count from map (default to 0 if doesn't exist)
    Integer count = countMap.get(value) != null ? countMap.get(value) : 0;

    countMap.put(value, count + 1);

    return next.execute(value, context);
  }

}

Then apply the processor to the third column

package example;

import java.io.IOException;
import java.io.StringReader;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.supercsv.cellprocessor.ParseDate;
import org.supercsv.cellprocessor.constraint.NotNull;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvListReader;
import org.supercsv.io.ICsvListReader;
import org.supercsv.prefs.CsvPreference;

public class Counting {

  private static final String CSV = "id,time,protocol\n" + "1,01:23,UDP\n"
      + "2,02:34,TCP\n" + "3,03:45,TCP\n" + "4,04:56,UDP\n"
      + "5,05:01,TCP";

  public static void main(String[] args) throws IOException {

    final Map<Object, Integer> countMap = new HashMap<Object, Integer>();

    final CellProcessor[] processors = new CellProcessor[] { 
        new NotNull(), // id
        new ParseDate("hh:mm"), // time
        new NotNull(new Counter(countMap)) // protocol
    };

    ICsvListReader listReader = null;
    try {
      listReader = new CsvListReader(new StringReader(CSV),
          CsvPreference.STANDARD_PREFERENCE);

      listReader.getHeader(true);

      List<Object> row;
      while ((row = listReader.read(processors)) != null) {
        System.out.println(row);
      }

    } finally {
      // listReader is still null if the constructor threw
      if (listReader != null) {
        listReader.close();
      }
    }

    System.out.println("Protocol count = " + countMap);

  }

}

Output:

[1, Thu Jan 01 01:23:00 EST 1970, UDP]
[2, Thu Jan 01 02:34:00 EST 1970, TCP]
[3, Thu Jan 01 03:45:00 EST 1970, TCP]
[4, Thu Jan 01 04:56:00 EST 1970, UDP]
[5, Thu Jan 01 05:01:00 EST 1970, TCP]
Protocol count = {UDP=2, TCP=3}
