我采用的方法(也适用于流模式):
根据传入的数据设置窗口,可以忽略 End Instant,因为起始值用于设置分区:
public class TablePartitionWindowFn extends NonMergingWindowFn<Object, IntervalWindow> {
private IntervalWindow assignWindow(AssignContext context) {
TableRow source = (TableRow) context.element();
String dttm_str = (String) source.get("DTTM");
DateTimeFormatter formatter = DateTimeFormat.forPattern("yyyy-MM-dd").withZoneUTC();
Instant start_point = Instant.parse(dttm_str,formatter);
Instant end_point = start_point.withDurationAdded(1000, 1);
return new IntervalWindow(start_point, end_point);
};
@Override
public Coder<IntervalWindow> windowCoder() {
return IntervalWindow.getCoder();
}
@Override
public Collection<IntervalWindow> assignWindows(AssignContext c) throws Exception {
return Arrays.asList(assignWindow(c));
}
@Override
public boolean isCompatible(WindowFn<?, ?> other) {
return false;
}
@Override
public IntervalWindow getSideInputWindow(BoundedWindow window) {
if (window instanceof GlobalWindow) {
throw new IllegalArgumentException(
"Attempted to get side input window for GlobalWindow from non-global WindowFn");
}
return null;
}
动态设置表分区:
public class DayPartitionFunc implements SerializableFunction<BoundedWindow, String> {
String destination = "";
public DayPartitionFunc(String dataset, String table) {
this.destination = dataset + "." + table+ "$";
}
@Override
public String apply(BoundedWindow boundedWindow) {
// The cast below is safe because CalendarWindows.days(1) produces IntervalWindows.
String dayString = DateTimeFormat.forPattern("yyyyMMdd")
.withZone(DateTimeZone.UTC)
.print(((IntervalWindow) boundedWindow).start());
return destination + dayString;
}}
有没有更好的方法来达到同样的结果?