这可能是由于 TableRow 格式包含列的字符串名称,这增加了大小。
考虑使用以下内容来创建对象的 PCollection 而不是 TableRows。这允许您直接读入与架构匹配的对象,这应该会稍微减少数据大小。
/**
* Reads from a BigQuery table or query and returns a {@link PCollection} with one element per
* each row of the table or query result, parsed from the BigQuery AVRO format using the specified
* function.
*
* <p>Each {@link SchemaAndRecord} contains a BigQuery {@link TableSchema} and a
* {@link GenericRecord} representing the row, indexed by column name. Here is a
* sample parse function that parses click events from a table.
*
* <pre>{@code
* class ClickEvent { long userId; String url; ... }
*
* p.apply(BigQueryIO.read(new SerializableFunction<SchemaAndRecord, ClickEvent>() {
* public ClickEvent apply(SchemaAndRecord record) {
* GenericRecord r = record.getRecord();
* return new ClickEvent((Long) r.get("userId"), (String) r.get("url"));
* }
* }).from("...");
* }</pre>
*/
public static <T> TypedRead<T> read(
SerializableFunction<SchemaAndRecord, T> parseFn) {