I have a MapReduce program in Eclipse and I want to run it. I followed the steps at the URL below:
http://www.orzota.com/step-by-step-mapreduce-programming/
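I did everything the page says and ran the program, but it shows an error and my job fails. The program creates the output folder, but the folder is empty. Here is my code: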
package org.orzota.bookx.mappers;
import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class MyHadoopMapper extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    public void map(LongWritable _key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        String st = value.toString();
        // Split the line on the ";" separator between double-quoted fields
        String[] bookdata = st.split("\";\"");
        // Emit the 4th field (index 3) as the key, with a count of 1
        output.collect(new Text(bookdata[3]), one);
    }
}
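package org.orzota.bookx.mappers;
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
public class MyHadoopReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text _key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
        Text key = _key;
        int freq = 0;
        // Sum the counts emitted by the mapper for this key
        while (values.hasNext()) {
            IntWritable value = (IntWritable) values.next();
            freq += value.get();
        }
        output.collect(key, new IntWritable(freq));
    }
}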
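package org.orzota.bookx.mappers;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;
public class MyHadoopDriver {
    public static void main(String[] args) {
        JobClient client = new JobClient();
        JobConf conf = new JobConf(
                org.orzota.bookx.mappers.MyHadoopDriver.class);
        conf.setJobName("BookCrossing1.0");
        // Output types: Text key (the field value), IntWritable value (its count)
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        // Mapper and reducer classes
        conf.setMapperClass(org.orzota.bookx.mappers.MyHadoopMapper.class);
        conf.setReducerClass(org.orzota.bookx.mappers.MyHadoopReducer.class);
        // Read input as lines of text, write output as key/value text lines
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        // args[0] is the input path, args[1] is the output path
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        client.setConf(conf);
        try {
            JobClient.runJob(conf);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}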
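For comparison, here is a minimal plain-Java sketch (no Hadoop) of what the job is meant to compute, assuming every input line uses the double-quoted, semicolon-separated format that the mapper's split expects: the map step takes field index 3 of each line, and the reduce step adds 1 per line for each distinct value. LocalCountSketch, its methods, and the sample lines are hypothetical, and the code assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory sketch of the job's map and reduce steps (no Hadoop).
public class LocalCountSketch {

    // Map step: same split and index as MyHadoopMapper.map.
    // Throws ArrayIndexOutOfBoundsException when a line has fewer than 4 fields.
    static String mapKey(String line) {
        String[] fields = line.split("\";\"");
        return fields[3];
    }

    // Shuffle + reduce step: group lines by key and add 1 per line, like MyHadoopReducer.
    static Map<String, Integer> countByKey(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            counts.merge(mapKey(line), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical 5-field sample lines; index 3 holds the value being counted.
        List<String> lines = Arrays.asList(
            "\"0001\";\"Title A\";\"Author A\";\"2002\";\"Publisher A\"",
            "\"0002\";\"Title B\";\"Author B\";\"2002\";\"Publisher B\"",
            "\"0003\";\"Title C\";\"Author C\";\"1999\";\"Publisher C\"");
        System.out.println(countByKey(lines)); // prints {1999=1, 2002=2}
    }
}
```

The TreeMap is used only so the printed order is deterministic; Hadoop likewise sorts map output by key before it reaches the reducer.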
Here is the error:
13/09/03 12:19:11 INFO util.ProcessTree: setsid exited with exit code 0
13/09/03 12:19:11 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3c2378
13/09/03 12:19:11 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclip/Runs/input/BX-Books.csv:0+33554432
13/09/03 12:19:11 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:12 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:12 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:12 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:12 INFO mapred.JobClient: map 0% reduce 0%
13/09/03 12:19:13 INFO mapred.MapTask: Starting flush of map output
13/09/03 12:19:14 INFO mapred.MapTask: Finished spill 0
13/09/03 12:19:14 INFO mapred.Task: Task:attempt_local1379860058_0001_m_000000_0 is done. And is in the process of commiting
13/09/03 12:19:14 INFO mapred.LocalJobRunner: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:0+33554432
13/09/03 12:19:14 INFO mapred.Task: Task 'attempt_local1379860058_0001_m_000000_0' done.
13/09/03 12:19:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local1379860058_0001_m_000000_0
13/09/03 12:19:14 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000001_0
13/09/03 12:19:14 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@15dd910
13/09/03 12:19:14 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:33554432+33554432
13/09/03 12:19:14 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:14 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:14 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:14 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:14 INFO mapred.JobClient: map 20% reduce 0%
13/09/03 12:19:15 INFO mapred.MapTask: Starting flush of map output
13/09/03 12:19:15 INFO mapred.MapTask: Finished spill 0
13/09/03 12:19:15 INFO mapred.Task: Task:attempt_local1379860058_0001_m_000001_0 is done. And is in the process of commiting
13/09/03 12:19:15 INFO mapred.LocalJobRunner: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:33554432+33554432
13/09/03 12:19:15 INFO mapred.Task: Task 'attempt_local1379860058_0001_m_000001_0' done.
13/09/03 12:19:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local1379860058_0001_m_000001_0
13/09/03 12:19:15 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000002_0
13/09/03 12:19:15 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7c3885
13/09/03 12:19:15 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Book-Ratings.csv:0+30682276
13/09/03 12:19:15 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:15 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:16 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:16 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:16 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000003_0
13/09/03 12:19:16 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@11d2572
13/09/03 12:19:16 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Users.csv:0+12284157
13/09/03 12:19:16 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:16 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:16 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:16 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:16 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000004_0
13/09/03 12:19:16 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@164b09c
13/09/03 12:19:16 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:67108864+10678575
13/09/03 12:19:16 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:16 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:16 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:16 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:16 INFO mapred.JobClient: map 40% reduce 0%
13/09/03 12:19:17 INFO mapred.MapTask: Starting flush of map output
13/09/03 12:19:17 INFO mapred.MapTask: Finished spill 0
13/09/03 12:19:17 INFO mapred.Task: Task:attempt_local1379860058_0001_m_000004_0 is done. And is in the process of commiting
13/09/03 12:19:17 INFO mapred.LocalJobRunner: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:67108864+10678575
13/09/03 12:19:17 INFO mapred.Task: Task 'attempt_local1379860058_0001_m_000004_0' done.
13/09/03 12:19:17 INFO mapred.LocalJobRunner: Finishing task: attempt_local1379860058_0001_m_000004_0
13/09/03 12:19:17 INFO mapred.LocalJobRunner: Map task executor complete.
13/09/03 12:19:17 WARN mapred.LocalJobRunner: job_local1379860058_0001
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 3
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
	at org.orzota.bookx.mappers.MyHadoopMapper.map(MyHadoopMapper.java:17)
	at org.orzota.bookx.mappers.MyHadoopMapper.map(MyHadoopMapper.java:1)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
13/09/03 12:19:17 INFO mapred.JobClient: map 60% reduce 0%
13/09/03 12:19:17 INFO mapred.JobClient: Job complete: job_local1379860058_0001
13/09/03 12:19:17 INFO mapred.JobClient: Counters: 16
13/09/03 12:19:17 INFO mapred.JobClient: File Input Format Counters
13/09/03 12:19:17 INFO mapred.JobClient: Bytes Read=77795631
13/09/03 12:19:17 INFO mapred.JobClient: FileSystemCounters
13/09/03 12:19:17 INFO mapred.JobClient: FILE_BYTES_READ=178484057
13/09/03 12:19:17 INFO mapred.JobClient: FILE_BYTES_WRITTEN=6981917
13/09/03 12:19:17 INFO mapred.JobClient: Map-Reduce Framework
13/09/03 12:19:17 INFO mapred.JobClient: Map output materialized bytes=2971356
13/09/03 12:19:17 INFO mapred.JobClient: Map input records=271380
13/09/03 12:19:17 INFO mapred.JobClient: Spilled Records=271380
13/09/03 12:19:17 INFO mapred.JobClient: Map output bytes=2428578
13/09/03 12:19:17 INFO mapred.JobClient: Total committed heap usage (bytes)=883687424
13/09/03 12:19:17 INFO mapred.JobClient: CPU time spent (ms)=0
13/09/03 12:19:17 INFO mapred.JobClient: Map input bytes=77787439
13/09/03 12:19:17 INFO mapred.JobClient: SPLIT_RAW_BYTES=306
13/09/03 12:19:17 INFO mapred.JobClient: Combine input records=0
13/09/03 12:19:17 INFO mapred.JobClient: Combine output records=0
13/09/03 12:19:17 INFO mapred.JobClient: Physical memory (bytes) snapshot=0
13/09/03 12:19:17 INFO mapred.JobClient: Virtual memory (bytes) snapshot=0
13/09/03 12:19:17 INFO mapred.JobClient: Map output records=271380
13/09/03 12:19:17 INFO mapred.JobClient: Job Failed: NA java.io.IOException: Job failed!
	at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
	at org.orzota.bookx.mappers.MyHadoopDriver.main(MyHadoopDriver.java:44)
I think the error comes from this line (MyHadoopMapper.java:17 in the stack trace):
output.collect(new Text(bookdata[3]), one);
but I don't understand what the error is telling me. Can someone help me? Thanks.
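An ArrayIndexOutOfBoundsException: 3 on bookdata[3] means that split returned an array with at most three elements for at least one input line, so index 3 does not exist. The log shows that the input directory also contained BX-Book-Ratings.csv and BX-Users.csv, and the map tasks for those splits (m_000002 and m_000003) start but never report done; that the lines in those files have only three fields is an assumption, not something the post states. A minimal sketch of the failure and of a guarded lookup follows; SplitIndexDemo, fourthFieldOrNull, and the sample line are hypothetical.

```java
// Hypothetical demo: how bookdata[3] fails on a line with only 3 fields,
// plus a guarded variant that returns null instead of throwing.
public class SplitIndexDemo {

    // Same separator as MyHadoopMapper: a double quote, a semicolon, a double quote.
    static final String SEPARATOR = "\";\"";

    // Returns the 4th field (index 3), or null when the line has fewer than 4 fields.
    static String fourthFieldOrNull(String line) {
        String[] fields = line.split(SEPARATOR);
        return fields.length > 3 ? fields[3] : null;
    }

    public static void main(String[] args) {
        // Hypothetical 3-field line, assumed to resemble a row from BX-Book-Ratings.csv.
        String threeFields = "\"12345\";\"0000000000\";\"0\"";
        String[] parts = threeFields.split(SEPARATOR);
        System.out.println(parts.length); // prints 3, so the valid indexes are 0, 1 and 2

        try {
            System.out.println(parts[3]); // the same kind of access as bookdata[3]
        } catch (ArrayIndexOutOfBoundsException e) {
            // The message text differs between Java versions; the log above shows just "3".
            System.out.println("caught ArrayIndexOutOfBoundsException: " + e.getMessage());
        }

        System.out.println(fourthFieldOrNull(threeFields)); // prints null
    }
}
```

Inside map, a helper like this would let the mapper skip output.collect when it returns null. Whether silently skipping such lines is appropriate, or whether the input path should point at BX-Books.csv alone, depends on what the job is meant to count.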