I am trying to load Apache access logs, split each line into fields, and store the result in HCatalog.

apache_log = LOAD 'httpd-www01-access.log.2014-02-09-*' USING TextLoader AS (line:chararray);

apache_row = FOREACH apache_log GENERATE FLATTEN (
REGEX_EXTRACT_ALL
(line,'^"(\\S+)" \\[(\\d{2}\\/\\w+\\/\\d{4}:\\d{2}:\\d{2}:\\d{2} \\+\\d{4}]) (\\S+) (\\S+) "(.+?)" (\\S+) (\\S+) "([^"]*)" "([^"]*)" "([^"]*)"'))
AS (ip: chararray, datetime: chararray, session_id: chararray, time_of_request:chararray, request: chararray, status: chararray, size: chararray, referer : chararray, cookie: chararray, user_agent: chararray);

If I run:

a = sample apache_row 0.001;
dump a

that works. However, storing the full relation:

 store apache_row into 'stage.apache_log' using org.apache.hcatalog.pig.HCatStorer();

does not work.

Error:

2014-02-17 08:17:13,812 [main] WARN  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2014-02-17 08:17:13,812 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_201402120751_0117 has failed! Stop running all dependent jobs
2014-02-17 08:17:13,812 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2014-02-17 08:17:13,814 [main] ERROR org.apache.pig.tools.pigstats.PigStatsUtil - 1 map reduce job(s) failed!
2014-02-17 08:17:13,815 [main] INFO  org.apache.pig.tools.pigstats.SimplePigStats - Script Statistics:

HadoopVersion   PigVersion      UserId  StartedAt       FinishedAt      Features
1.2.0.1.3.2.0-111       0.11.1.1.3.2.0-111      pig     2014-02-17 08:16:24     2014-02-17 08:17:13     UNKNOWN

Failed!

Failed Jobs:
JobId   Alias   Feature Message Outputs
job_201402120751_0117   apache_log,apache_row   MAP_ONLY        Message: Job failed! Error - # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201402120751_0117_m_000000    stage.atg_apache_log,

Input(s):
Failed to read data from "hdfs://hadoop1:8020/user/pig/httpd-www01-access.log.2014-02-09-*"

Output(s):
Failed to produce result in "stage.apache_log"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_201402120751_0117

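Where can I find any details about what went wrong? There is a message saying that more details are available at:
hadoop1:50030/jobdetails.jsp?jobid=job_201402120751_0117
but that page does not work once the job has finished...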
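
One way to narrow this down, offered as a sketch and not a confirmed diagnosis: a 0.001 sample can easily miss the few lines that fail the regex, while the STORE has to process every line. The snippet below assumes that REGEX_EXTRACT_ALL yields null for a non-matching line, so that `ip` ends up null on those rows; the aliases `unmatched`, `unmatched_sample`, and `matched` are hypothetical names introduced only for illustration.

```pig
-- Sketch only: reuses the apache_row relation defined above.
-- Print the schema Pig hands to HCatStorer, to compare against the HCatalog table definition.
DESCRIBE apache_row;

-- Assumption: rows whose line did not match the regex have a null ip.
unmatched = FILTER apache_row BY ip IS NULL;
unmatched_sample = LIMIT unmatched 10;
DUMP unmatched_sample;

-- Hypothetical workaround: store only the rows that parsed.
matched = FILTER apache_row BY ip IS NOT NULL;
STORE matched INTO 'stage.apache_log' USING org.apache.hcatalog.pig.HCatStorer();
```

If `unmatched_sample` comes back empty, the regex is probably not the cause, and the schema printed by DESCRIBE is the next thing to compare with the target table.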

Regards,
Pavel
