1

a 在 Eclipse 中有一个 mapreduce 程序。我想运行它..我从下面的 url 关注程序:

http://www.orzota.com/step-by-step-mapreduce-programming/

我做页面上说的所有事情并运行程序。但它向我显示错误并且我的工作失败..程序创建输出文件夹但它是空的..这是我的鳕鱼:

package org.orzota.bookx.mappers;

import java.io.IOException;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MyHadoopMapper extends MapReduceBase implements Mapper <LongWritable,  Text, Text, IntWritable>{
private final static IntWritable one = new IntWritable(1);

public void map(LongWritable _key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    String st = value.toString();
    String[] bookdata = st.split("\";\"");
    output.collect(new Text(bookdata[3]), one);
  }

   }

public class MyHadoopReducer extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable>{

public void reduce(Text _key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
    Text key = _key;
    int freq = 0;
    while (values.hasNext()){
        IntWritable value = (IntWritable) values.next();
        freq += value.get();
    }
    output.collect(key, new IntWritable(freq));
  }
  }


public class MyHadoopDriver {

public static void main(String[] args) {
    JobClient client = new JobClient();
    JobConf conf = new JobConf(
            org.orzota.bookx.mappers.MyHadoopDriver.class);
    conf.setJobName("BookCrossing1.0");


    // TODO: specify output types
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);


    // TODO: specify a mapper
    conf.setMapperClass(org.orzota.bookx.mappers.MyHadoopMapper.class);

    // TODO: specify a reducer
    conf.setReducerClass(org.orzota.bookx.mappers.MyHadoopReducer.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);


    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));


    client.setConf(conf);
    try {
        JobClient.runJob(conf);
    } catch (Exception e) {
        e.printStackTrace();
    }
  }

   }

这是错误:

13/09/03 12:19:11 INFO util.ProcessTree: setsid exited with exit code 0
13/09/03 12:19:11 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3c2378
13/09/03 12:19:11 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclip/Runs/input/BX-Books.csv:0+33554432
13/09/03 12:19:11 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:12 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:12 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:12 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:12 INFO mapred.JobClient:  map 0% reduce 0%
13/09/03 12:19:13 INFO mapred.MapTask: Starting flush of map output
13/09/03 12:19:14 INFO mapred.MapTask: Finished spill 0
13/09/03 12:19:14 INFO mapred.Task: Task:attempt_local1379860058_0001_m_000000_0 is done. And is in the process of commiting
13/09/03 12:19:14 INFO mapred.LocalJobRunner: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:0+33554432
13/09/03 12:19:14 INFO mapred.Task: Task 'attempt_local1379860058_0001_m_000000_0' done.
13/09/03 12:19:14 INFO mapred.LocalJobRunner: Finishing task: attempt_local1379860058_0001_m_000000_0
13/09/03 12:19:14 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000001_0
13/09/03 12:19:14 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@15dd910
13/09/03 12:19:14 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:33554432+33554432
13/09/03 12:19:14 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:14 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:14 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:14 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:14 INFO mapred.JobClient:  map 20% reduce 0%
13/09/03 12:19:15 INFO mapred.MapTask: Starting flush of map output
13/09/03 12:19:15 INFO mapred.MapTask: Finished spill 0
13/09/03 12:19:15 INFO mapred.Task: Task:attempt_local1379860058_0001_m_000001_0 is done. And is in the process of commiting
13/09/03 12:19:15 INFO mapred.LocalJobRunner: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:33554432+33554432
13/09/03 12:19:15 INFO mapred.Task: Task 'attempt_local1379860058_0001_m_000001_0' done.
13/09/03 12:19:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local1379860058_0001_m_000001_0
13/09/03 12:19:15 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000002_0
13/09/03 12:19:15 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7c3885
13/09/03 12:19:15 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Book-Ratings.csv:0+30682276
13/09/03 12:19:15 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:15 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:16 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:16 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:16 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000003_0
13/09/03 12:19:16 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@11d2572
13/09/03 12:19:16 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Users.csv:0+12284157
13/09/03 12:19:16 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:16 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:16 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:16 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:16 INFO mapred.LocalJobRunner: Starting task: attempt_local1379860058_0001_m_000004_0
13/09/03 12:19:16 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@164b09c
13/09/03 12:19:16 INFO mapred.MapTask: Processing split: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:67108864+10678575
13/09/03 12:19:16 INFO mapred.MapTask: numReduceTasks: 1
13/09/03 12:19:16 INFO mapred.MapTask: io.sort.mb = 100
13/09/03 12:19:16 INFO mapred.MapTask: data buffer = 79691776/99614720
13/09/03 12:19:16 INFO mapred.MapTask: record buffer = 262144/327680
13/09/03 12:19:16 INFO mapred.JobClient:  map 40% reduce 0%
13/09/03 12:19:17 INFO mapred.MapTask: Starting flush of map output
13/09/03 12:19:17 INFO mapred.MapTask: Finished spill 0
13/09/03 12:19:17 INFO mapred.Task: Task:attempt_local1379860058_0001_m_000004_0 is done. And is in the process of commiting
13/09/03 12:19:17 INFO mapred.LocalJobRunner: file:/home/ubuntu/Eclipse/Runs/input/BX-Books.csv:67108864+10678575
13/09/03 12:19:17 INFO mapred.Task: Task 'attempt_local1379860058_0001_m_000004_0' done.
13/09/03 12:19:17 INFO mapred.LocalJobRunner: Finishing task: attempt_local1379860058_0001_m_000004_0
13/09/03 12:19:17 INFO mapred.LocalJobRunner: Map task executor complete.
13/09/03 12:19:17 WARN mapred.LocalJobRunner: job_local1379860058_0001
java.lang.Exception: java.lang.ArrayIndexOutOfBoundsException: 3
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 3
at org.orzota.bookx.mappers.MyHadoopMapper.map(MyHadoopMapper.java:17)
at org.orzota.bookx.mappers.MyHadoopMapper.map(MyHadoopMapper.java:1)
at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
13/09/03 12:19:17 INFO mapred.JobClient:  map 60% reduce 0%
13/09/03 12:19:17 INFO mapred.JobClient: Job complete: job_local1379860058_0001
13/09/03 12:19:17 INFO mapred.JobClient: Counters: 16
13/09/03 12:19:17 INFO mapred.JobClient:   File Input Format Counters 
13/09/03 12:19:17 INFO mapred.JobClient:     Bytes Read=77795631
13/09/03 12:19:17 INFO mapred.JobClient:   FileSystemCounters
13/09/03 12:19:17 INFO mapred.JobClient:     FILE_BYTES_READ=178484057
13/09/03 12:19:17 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=6981917
13/09/03 12:19:17 INFO mapred.JobClient:   Map-Reduce Framework
13/09/03 12:19:17 INFO mapred.JobClient:     Map output materialized bytes=2971356
13/09/03 12:19:17 INFO mapred.JobClient:     Map input records=271380
13/09/03 12:19:17 INFO mapred.JobClient:     Spilled Records=271380
13/09/03 12:19:17 INFO mapred.JobClient:     Map output bytes=2428578
13/09/03 12:19:17 INFO mapred.JobClient:     Total committed heap usage (bytes)=883687424
13/09/03 12:19:17 INFO mapred.JobClient:     CPU time spent (ms)=0
13/09/03 12:19:17 INFO mapred.JobClient:     Map input bytes=77787439
13/09/03 12:19:17 INFO mapred.JobClient:     SPLIT_RAW_BYTES=306
13/09/03 12:19:17 INFO mapred.JobClient:     Combine input records=0
13/09/03 12:19:17 INFO mapred.JobClient:     Combine output records=0
13/09/03 12:19:17 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
13/09/03 12:19:17 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
13/09/03 12:19:17 INFO mapred.JobClient:     Map output records=271380
13/09/03 12:19:17 INFO mapred.JobClient: Job Failed: NA  java.io.IOException: Job failed!
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1357)
at org.orzota.bookx.mappers.MyHadoopDriver.main(MyHadoopDriver.java:44)

我认为错误来自这一行:

  output.collect(new Text(bookdata[3]), one);

但我不知道它说什么..有人可以帮帮我吗?谢谢..

4

2 回答 2

3

我检查了您提供的链接。我认为你能做的最好的事情就是对你的输入键值对(在你的输入数据集的一个小子集上)做一个 system.out.println() ,只是为了确定。如果您使用的输入文件包含“\n”,那么 csv 记录可能会被分成 2 个单独的记录,其中包含少于 8 个子字符串。ArrayOutOfBoundsException似乎指向了这个方向。我不认为这是一个 mapreduce 错误。您还可以将以下行添加到您的地图功能:

if (bookdata.length!=8){
  System.out.println("Warning, bad entry");
  return; 
}

如果模拟幸存下来,您已经隔离了问题..

于 2013-09-03T11:12:07.340 回答
1

您正在阅读的输入文件很可能有一行没有 4 列。

因此,当您将行拆分为数组时,

String[] bookdata = st.split("\";\"");

你想访问第四个元素

output.collect(new Text(bookdata[3]), one);

它失败。

于 2013-09-03T08:12:40.757 回答