我有一个日志文件如下
Begin ... 12-07-2008 02:00:05 ----> record1
incidentID: inc001
description: blah blah blah
owner: abc
status: resolved
end .... 13-07-2008 02:00:05
Begin ... 12-07-2008 03:00:05 ----> record2
incidentID: inc002
description: blah blah blahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblahblah
owner: abc
status: resolved
end .... 13-07-2008 03:00:05
我想使用 mapreduce 来处理这个。我想提取事件 ID、状态以及事件花费的时间
如何处理这两个记录,因为它们具有可变的记录长度,以及如果输入拆分发生在记录结束之前怎么办。