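I have 2 data sources. One contains a list of API calls, and the other contains all the related authentication events. Each API call can have multiple auth events, and I want to find the auth event that:
a) contains the same "identifier" as the API call
b) occurs within one second after the API call
c) is the closest to the API call, after the filtering above.
I had planned to loop over each ApiCall event in a FOREACH and then use FILTER statements on authEvents to find the right one. However, this does not appear to be possible (using FILTER on another relation inside a nested FOREACH in Pig).
Can anyone suggest another way to achieve this? If it helps, here is the Pig script I tried to use: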
apiRequests = LOAD '/Documents/ApiRequests.txt' AS (api_fileName:chararray, api_requestTime:long, api_timeFromLog:chararray, api_call:chararray, api_leadString:chararray, api_xmlPayload:chararray, api_sourceIp:chararray, api_username:chararray, api_identifier:chararray);
authEvents = LOAD '/Documents/AuthEvents.txt' AS (auth_fileName:chararray, auth_requestTime:long, auth_timeFromLog:chararray, auth_call:chararray, auth_leadString:chararray, auth_xmlPayload:chararray, auth_sourceIp:chararray, auth_username:chararray, auth_identifier:chararray);
specificApiCall = FILTER apiRequests BY api_call == 'CSGetUser'; -- Get all events for this specific call
match = foreach specificApiCall { -- Now try to get the closest matching auth event
    filtered1 = filter authEvents by auth_identifier == api_identifier; -- Only use auth events that have the same identifier (this will return several)
    filtered2 = filter filtered1 by (auth_requestTime-api_requestTime)<1000; -- Further refine by using auth events within a second of the API call's time
    sorted = order filtered2 by auth_requestTime; -- Get the auth event that's closest to the API call
    limited = limit sorted 1;
    generate limited;
};
dump match;
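
For reference, one direction that might work is to avoid touching authEvents inside the nested block at all: JOIN the two relations on the identifier first, FILTER on the time difference, then GROUP per API call and use ORDER/LIMIT inside the nested FOREACH on the grouped bag. This is only a sketch, not a tested solution. It assumes the request times are in milliseconds (as the `1000` in the script above suggests), that an auth event must not precede the API call (criterion b), and that `(api_identifier, api_requestTime)` is enough to tell API calls apart. The aliases `joined`, `withDelta`, `inWindow`, `grouped` and `closestAuth` are made up for illustration.

```pig
-- Pair each selected API call with every auth event sharing its identifier.
joined = JOIN specificApiCall BY api_identifier, authEvents BY auth_identifier;

-- Compute the time difference once; project more fields here if needed.
withDelta = FOREACH joined GENERATE
    api_identifier, api_requestTime, auth_requestTime, auth_call,
    (auth_requestTime - api_requestTime) AS delta:long;

-- Assumption: times are in ms, and the auth event happens at or after the call.
inWindow = FILTER withDelta BY delta >= 0L AND delta < 1000L;

-- Assumption: identifier + request time uniquely identifies one API call.
grouped = GROUP inWindow BY (api_identifier, api_requestTime);

-- ORDER and LIMIT are allowed here because they act on the grouped bag,
-- not on a separate relation.
closestAuth = FOREACH grouped {
    sorted = ORDER inWindow BY delta ASC;
    nearest = LIMIT sorted 1;
    GENERATE FLATTEN(nearest);
};

DUMP closestAuth;
```

API calls with no auth event inside the window drop out of the result with this approach. If they need to be kept, a LEFT OUTER join followed by null handling would be one option to try.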