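We are trying to design a simple program that reads patent data from files and checks whether a patent has been cited from another country. The exercise comes from the textbook 'Hadoop in Action' by 'Chuck Lam', which we are working through to learn advanced map/reduce programming.
The Hadoop distribution we set up is Local Node, and we run the program in a Windows environment using cygwin.
This is the URL we downloaded the files from: http://www.nber.org/patents/ (the files are apat63_99.txt and cite75_99.txt).
We use 'apat63_99.txt' as the distributed cache file, and 'cite75_99.txt' is in the input folder, which we pass as a command-line argument.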
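To make the intended join concrete, here is a minimal sketch of the lookup table we want to build from the cached file, written without any Hadoop classes so it can be run on its own. The class name `CountryLookupSketch`, the method `fromLines`, and the sample rows are made up for illustration; the only thing taken from our job is the assumption that column 0 holds the patent number and column 4 holds the country.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CountryLookupSketch {
    // Builds patent -> country from comma-separated rows.
    // Assumption: column 0 is the patent number and column 4 is the country,
    // the same columns our mapper reads via tokens[0] and tokens[4].
    static Map<String, String> fromLines(List<String> lines) {
        Map<String, String> countryByPatent = new HashMap<>();
        for (String line : lines) {
            String[] tokens = line.split(",");
            // Skip rows that are too short instead of throwing
            // ArrayIndexOutOfBoundsException on tokens[4].
            if (tokens.length > 4) {
                countryByPatent.put(tokens[0], tokens[4]);
            }
        }
        return countryByPatent;
    }

    public static void main(String[] args) {
        // Illustrative rows only, not copied from apat63_99.txt.
        List<String> sample = Arrays.asList(
                "1000001,1963,1096,,\"BE\"",
                "1000002,1963,1096,,\"US\"",
                "short,row");
        Map<String, String> lookup = fromLines(sample);
        System.out.println(lookup.get("1000001"));
        System.out.println(lookup.size());
    }
}
```

If the country column is quoted in the real file, the quotes end up in the stored value, so comparisons between two looked-up countries still work but the output will carry the quotes.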
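The problem is that the program produces no output: the output files we see contain no data.
We have tried writing output from both the mapper stage and the reducer stage, and both are blank.
This is the code we developed for this task: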
package com.sample.patent;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.Hashtable;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class country_cite {

    private static Hashtable<String, String> joinData
            = new Hashtable<String, String>();

    public static class Country_Citation_Class extends
            Mapper<Text, Text, Text, Text> {

        Path[] cacheFiles;

        public void configure(JobConf conf) {
            try {
                cacheFiles = DistributedCache.getLocalCacheArchives(conf);
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
        }

        public void map(Text key, Text value, Context context)
                throws IOException, InterruptedException {
            if (cacheFiles != null && cacheFiles.length > 0) {
                String line;
                String[] tokens;
                BufferedReader joinReader = new BufferedReader(new FileReader(
                        cacheFiles[0].toString()));
                try {
                    while ((line = joinReader.readLine()) != null) {
                        tokens = line.split(",");
                        joinData.put(tokens[0], tokens[4]);
                    }
                } finally {
                    joinReader.close();
                }
            }
            if (joinData.get(key) != null)
                context.write(key, new Text(joinData.get(key)));
        }
    }

    public static class MyReduceClass extends Reducer<Text, Text, Text, Text> {

        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            String patent_country = joinData.get(key);
            if (patent_country != null) {
                for (Text val : values) {
                    String cited_country = joinData.get(val);
                    if (cited_country != null
                            && !cited_country.equals(patent_country)) {
                        context.write(key, new Text(cited_country));
                    }
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // TODO Auto-generated method stub
        Configuration conf = new Configuration();
        DistributedCache.addCacheFile(new Path(args[0]).toUri(),
                conf);
        String[] otherArgs = new GenericOptionsParser(conf, args)
                .getRemainingArgs();
        if (otherArgs.length != 3) {
            System.err.println("Usage: country_cite <in> <out>");
            System.exit(2);
        }
        Job job = new Job(conf,"country_cite");
        job.setJarByClass(country_cite.class);
        job.setMapperClass(Country_Citation_Class.class);
        job.setInputFormatClass(org.apache.hadoop.mapreduce.lib.input.KeyValueTextInputFormat.class);
        // job.setReducerClass(MyReduceClass.class);
        job.setNumReduceTasks(0);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[1]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[2]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
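One detail in the code above that we have not ruled out: `joinData` is keyed by `String`, but the mapper and reducer call `joinData.get(key)` with a `Text` key. `Map.get` accepts any `Object`, so this compiles, yet a `String` key is never equal to a key of another type. The sketch below reproduces the effect in plain Java. `WrappedKey` is a hypothetical stand-in for a wrapper type such as `Text`, not the Hadoop class, and the table contents are made up.

```java
import java.util.Hashtable;

class KeyTypeLookupSketch {
    // Hypothetical stand-in for a wrapper type such as Hadoop's Text.
    // It does not override equals/hashCode to match String.
    static final class WrappedKey {
        private final String value;

        WrappedKey(String value) {
            this.value = value;
        }

        @Override
        public String toString() {
            return value;
        }
    }

    // Looks up with the wrapper object itself, as joinData.get(key) does.
    static String lookupByObject(Hashtable<String, String> table, Object key) {
        return table.get(key);
    }

    // Looks up with the wrapper's String form instead.
    static String lookupByString(Hashtable<String, String> table, Object key) {
        return table.get(key.toString());
    }

    public static void main(String[] args) {
        Hashtable<String, String> table = new Hashtable<>();
        table.put("1000001", "BE"); // illustrative entry only

        WrappedKey key = new WrappedKey("1000001");
        System.out.println(lookupByObject(table, key)); // null
        System.out.println(lookupByString(table, key)); // BE
    }
}
```

If the same thing applies to `Text`, then calling `joinData.get(key.toString())` would be the corresponding change in our mapper and reducer, although that alone may not explain the empty output.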
The tool is Eclipse, and the Hadoop version we are using is 1.2.1.
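Since our mapper extends `org.apache.hadoop.mapreduce.Mapper` but declares `configure(JobConf)`, which belongs to the older `org.apache.hadoop.mapred` API, we are not sure that method is ever invoked; the `Mapper` base class we extend exposes `setup(Context)` as its initialization hook. The sketch below shows the general effect in plain Java. `BaseTask` is a hypothetical stand-in for a framework base class, not Hadoop code, and all names in it are made up.

```java
import java.util.ArrayList;
import java.util.List;

class LifecycleHookSketch {
    // Hypothetical stand-in for a framework base class:
    // run() calls setup() once and then map() for every record.
    static class BaseTask {
        final List<String> calls = new ArrayList<>();

        protected void setup() {
            calls.add("base setup");
        }

        protected void map(String record) {
            calls.add("base map");
        }

        final void run(String... records) {
            setup();
            for (String record : records) {
                map(record);
            }
        }
    }

    // Declares an extra method the base class knows nothing about.
    static class TaskWithUnusedHook extends BaseTask {
        // Not an override, so run() never calls it.
        public void configure() {
            calls.add("configure");
        }

        @Override
        protected void map(String record) {
            calls.add("map " + record);
        }
    }

    // Overrides the hook that run() actually invokes.
    static class TaskWithSetup extends BaseTask {
        @Override
        protected void setup() {
            calls.add("setup");
        }

        @Override
        protected void map(String record) {
            calls.add("map " + record);
        }
    }

    public static void main(String[] args) {
        TaskWithUnusedHook unused = new TaskWithUnusedHook();
        unused.run("a");
        System.out.println(unused.calls); // [base setup, map a]

        TaskWithSetup used = new TaskWithSetup();
        used.run("a");
        System.out.println(used.calls); // [setup, map a]
    }
}
```

Adding `@Override` to an initialization method is a cheap way to check this, because the compiler rejects the annotation when the method does not override anything.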
These are the command-line arguments used to run the job:
/cygdrive/c/cygwin64/usr/local/hadoop
$ bin/hadoop jar PatentCitation.jar country_cite apat63_99.txt input output
This is the trace generated when the program runs:
/cygdrive/c/cygwin64/usr/local/hadoop
$ bin/hadoop jar PatentCitation.jar country_cite apat63_99.txt input output
Patch for HADOOP-7682: Instantiating workaround file system
14/06/22 12:39:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging to 0700
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001 to 0700
14/06/22 12:39:21 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
14/06/22 12:39:21 INFO input.FileInputFormat: Total input paths to process : 1
14/06/22 12:39:21 WARN snappy.LoadSnappy: Snappy native library not loaded
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.split": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.split to 0644
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.splitmetainfo": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.splitmetainfo to 0644
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "file:/tmp/hadoop-RaoSa/mapred/staging/RaoSa1277400315/.staging/job_local1277400315_0001/job.xml": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\staging\RaoSa1277400315\.staging\job_local1277400315_0001\job.xml to 0644
14/06/22 12:39:23 INFO filecache.TrackerDistributedCacheManager: Creating fileapat63_99.txt in /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498-work-5016028422992714806 with rwxr-xr-x
Patch for HADOOP-7682: Ignoring IOException setting persmission for path "/tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498-work-5016028422992714806": Failed to set permissions of path: \tmp\hadoop-RaoSa\mapred\local\archive\7067728792316735217_-679065598_1881640498-work-5016028422992714806 to 0755
14/06/22 12:40:06 INFO filecache.TrackerDistributedCacheManager: Cached apat63_99.txt as /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498/fileapat63_99.txt
14/06/22 12:40:08 INFO filecache.TrackerDistributedCacheManager: Cached apat63_99.txt as /tmp/hadoop-RaoSa/mapred/local/archive/7067728792316735217_-679065598_1881640498/fileapat63_99.txt
14/06/22 12:40:09 INFO mapred.JobClient: Running job: job_local1277400315_0001
14/06/22 12:40:10 INFO mapred.LocalJobRunner: Waiting for map tasks
14/06/22 12:40:10 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000000_0
14/06/22 12:40:10 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:10 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:0+33554432
14/06/22 12:40:10 INFO mapred.JobClient: map 0% reduce 0%
14/06/22 12:40:15 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000000_0 is done. And is in the process of commiting
14/06/22 12:40:15 INFO mapred.LocalJobRunner:
14/06/22 12:40:15 INFO mapred.Task: Task attempt_local1277400315_0001_m_000000_0 is allowed to commit now
14/06/22 12:40:15 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000000_0' to output
14/06/22 12:40:15 INFO mapred.LocalJobRunner:
14/06/22 12:40:15 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000000_0' done.
14/06/22 12:40:15 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000000_0
14/06/22 12:40:15 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000001_0
14/06/22 12:40:15 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:15 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:33554432+33554432
14/06/22 12:40:16 INFO mapred.JobClient: map 12% reduce 0%
14/06/22 12:40:21 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000001_0 is done. And is in the process of commiting
14/06/22 12:40:21 INFO mapred.LocalJobRunner:
14/06/22 12:40:21 INFO mapred.Task: Task attempt_local1277400315_0001_m_000001_0 is allowed to commit now
14/06/22 12:40:21 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000001_0' to output
14/06/22 12:40:21 INFO mapred.LocalJobRunner:
14/06/22 12:40:21 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000001_0' done.
14/06/22 12:40:21 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000001_0
14/06/22 12:40:21 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000002_0
14/06/22 12:40:21 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:21 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:67108864+33554432
14/06/22 12:40:21 INFO mapred.JobClient: map 25% reduce 0%
14/06/22 12:40:26 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000002_0 is done. And is in the process of commiting
14/06/22 12:40:26 INFO mapred.LocalJobRunner:
14/06/22 12:40:26 INFO mapred.Task: Task attempt_local1277400315_0001_m_000002_0 is allowed to commit now
14/06/22 12:40:26 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000002_0' to output
14/06/22 12:40:26 INFO mapred.LocalJobRunner:
14/06/22 12:40:26 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000002_0' done.
14/06/22 12:40:26 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000002_0
14/06/22 12:40:26 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000003_0
14/06/22 12:40:26 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:26 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:100663296+33554432
14/06/22 12:40:26 INFO mapred.JobClient: map 37% reduce 0%
14/06/22 12:40:29 INFO mapred.LocalJobRunner:
14/06/22 12:40:29 INFO mapred.JobClient: map 42% reduce 0%
14/06/22 12:40:29 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000003_0 is done. And is in the process of commiting
14/06/22 12:40:29 INFO mapred.LocalJobRunner:
14/06/22 12:40:29 INFO mapred.Task: Task attempt_local1277400315_0001_m_000003_0 is allowed to commit now
14/06/22 12:40:29 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000003_0' to output
14/06/22 12:40:29 INFO mapred.LocalJobRunner:
14/06/22 12:40:29 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000003_0' done.
14/06/22 12:40:29 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000003_0
14/06/22 12:40:29 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000004_0
14/06/22 12:40:29 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:29 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:134217728+33554432
14/06/22 12:40:30 INFO mapred.JobClient: map 50% reduce 0%
14/06/22 12:40:30 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000004_0 is done. And is in the process of commiting
14/06/22 12:40:30 INFO mapred.LocalJobRunner:
14/06/22 12:40:30 INFO mapred.Task: Task attempt_local1277400315_0001_m_000004_0 is allowed to commit now
14/06/22 12:40:30 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000004_0' to output
14/06/22 12:40:30 INFO mapred.LocalJobRunner:
14/06/22 12:40:30 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000004_0' done.
14/06/22 12:40:30 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000004_0
14/06/22 12:40:30 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000005_0
14/06/22 12:40:30 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:30 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:167772160+33554432
14/06/22 12:40:31 INFO mapred.JobClient: map 62% reduce 0%
14/06/22 12:40:31 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000005_0 is done. And is in the process of commiting
14/06/22 12:40:31 INFO mapred.LocalJobRunner:
14/06/22 12:40:31 INFO mapred.Task: Task attempt_local1277400315_0001_m_000005_0 is allowed to commit now
14/06/22 12:40:31 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000005_0' to output
14/06/22 12:40:31 INFO mapred.LocalJobRunner:
14/06/22 12:40:31 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000005_0' done.
14/06/22 12:40:31 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000005_0
14/06/22 12:40:31 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000006_0
14/06/22 12:40:31 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:31 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:201326592+33554432
14/06/22 12:40:32 INFO mapred.JobClient: map 75% reduce 0%
14/06/22 12:40:32 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000006_0 is done. And is in the process of commiting
14/06/22 12:40:32 INFO mapred.LocalJobRunner:
14/06/22 12:40:32 INFO mapred.Task: Task attempt_local1277400315_0001_m_000006_0 is allowed to commit now
14/06/22 12:40:32 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000006_0' to output
14/06/22 12:40:32 INFO mapred.LocalJobRunner:
14/06/22 12:40:32 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000006_0' done.
14/06/22 12:40:32 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000006_0
14/06/22 12:40:32 INFO mapred.LocalJobRunner: Starting task: attempt_local1277400315_0001_m_000007_0
14/06/22 12:40:32 INFO mapred.Task: Using ResourceCalculatorPlugin : null
14/06/22 12:40:33 INFO mapred.MapTask: Processing split: file:/C:/cygwin64/usr/local/hadoop/input/cite75_99.txt:234881024+29194407
14/06/22 12:40:33 INFO mapred.JobClient: map 87% reduce 0%
14/06/22 12:40:35 INFO mapred.Task: Task:attempt_local1277400315_0001_m_000007_0 is done. And is in the process of commiting
14/06/22 12:40:35 INFO mapred.LocalJobRunner:
14/06/22 12:40:35 INFO mapred.Task: Task attempt_local1277400315_0001_m_000007_0 is allowed to commit now
14/06/22 12:40:35 INFO output.FileOutputCommitter: Saved output of task 'attempt_local1277400315_0001_m_000007_0' to output
14/06/22 12:40:35 INFO mapred.LocalJobRunner:
14/06/22 12:40:35 INFO mapred.Task: Task 'attempt_local1277400315_0001_m_000007_0' done.
14/06/22 12:40:35 INFO mapred.LocalJobRunner: Finishing task: attempt_local1277400315_0001_m_000007_0
14/06/22 12:40:35 INFO mapred.LocalJobRunner: Map task executor complete.
14/06/22 12:40:35 INFO mapred.JobClient: map 100% reduce 0%
14/06/22 12:40:35 INFO mapred.JobClient: Job complete: job_local1277400315_0001
14/06/22 12:40:35 INFO mapred.JobClient: Counters: 9
14/06/22 12:40:35 INFO mapred.JobClient: File Output Format Counters
14/06/22 12:40:35 INFO mapred.JobClient: Bytes Written=64
14/06/22 12:40:35 INFO mapred.JobClient: FileSystemCounters
14/06/22 12:40:35 INFO mapred.JobClient: FILE_BYTES_READ=5009033659
14/06/22 12:40:35 INFO mapred.JobClient: FILE_BYTES_WRITTEN=3820489832
14/06/22 12:40:35 INFO mapred.JobClient: File Input Format Counters
14/06/22 12:40:35 INFO mapred.JobClient: Bytes Read=264104103
14/06/22 12:40:35 INFO mapred.JobClient: Map-Reduce Framework
14/06/22 12:40:35 INFO mapred.JobClient: Map input records=16522439
14/06/22 12:40:35 INFO mapred.JobClient: Spilled Records=0
14/06/22 12:40:35 INFO mapred.JobClient: Total committed heap usage (bytes)=708313088
14/06/22 12:40:35 INFO mapred.JobClient: Map output records=0
14/06/22 12:40:35 INFO mapred.JobClient: SPLIT_RAW_BYTES=952
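The counters above show `Map input records=16522439` but `Map output records=0`, so the mapper is receiving records and writing none. One thing we still need to check is how each input line is split into key and value: `KeyValueTextInputFormat` splits at the first tab by default, while the rows of cite75_99.txt appear to be comma-separated. The sketch below mimics that kind of split in plain Java. It is not Hadoop's implementation, and `splitKeyValue` and the sample line are made up for illustration.

```java
class KeyValueSplitSketch {
    // Splits a line at the first occurrence of the separator.
    // Assumption: when the separator is absent, the whole line becomes
    // the key and the value is empty.
    static String[] splitKeyValue(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            return new String[] { line, "" };
        }
        return new String[] { line.substring(0, pos), line.substring(pos + 1) };
    }

    public static void main(String[] args) {
        String line = "1000002,1000001"; // illustrative citing,cited pair

        String[] byTab = splitKeyValue(line, '\t');
        System.out.println("[" + byTab[0] + "] [" + byTab[1] + "]");

        String[] byComma = splitKeyValue(line, ',');
        System.out.println("[" + byComma[0] + "] [" + byComma[1] + "]");
    }
}
```

With a tab separator the key would be the entire line, which could never match a patent number in the lookup table even if the table were loaded correctly.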
Please let us know where we are going wrong, and tell us if we have left out any important information.
Thanks and regards