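I have created a multi-node Hadoop cluster using my two laptops and tested it successfully. After that, I installed RHadoop in the Hadoop environment, installed all the necessary packages, and set the path variables.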
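Because "No such file or directory" can also mean that R cannot find the Hadoop binaries, it may be worth confirming the path variables from inside the R session itself. The sketch below assumes the two variables the RHadoop setup instructions ask for, HADOOP_CMD (the hadoop executable) and HADOOP_STREAMING (the streaming jar); `check_hadoop_env` is a hypothetical helper name, and the sketch only checks that each variable is set and points to an existing file.

```r
# Hypothetical helper: for each environment variable, report TRUE only if it is
# set in this R session AND points to a file that exists on disk.
# HADOOP_CMD and HADOOP_STREAMING are assumed to be the variables rmr2 reads.
check_hadoop_env <- function(vars = c("HADOOP_CMD", "HADOOP_STREAMING")) {
  vapply(vars, function(v) {
    path <- Sys.getenv(v, unset = "")
    nzchar(path) && file.exists(path)
  }, logical(1))
}

# Run this in the same session (R or RStudio) that calls mapreduce().
check_hadoop_env()
```

Variables exported in a shell profile are not always visible to RStudio, so a FALSE here would suggest setting them with `Sys.setenv()` before loading rmr2.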
Then I tried to run a wordcount example as follows:
library(rmr2)

map <- function(k, lines) {
  # Split each line on whitespace and emit (word, 1) pairs
  words.list <- strsplit(lines, "\\s")
  words <- unlist(words.list)
  return(keyval(words, 1))
}

reduce <- function(word, counts) {
  # Sum the 1s emitted for each word
  keyval(word, sum(counts))
}

wordcount <- function(input, output = NULL) {
  mapreduce(input = input, output = output, input.format = "text",
            map = map, reduce = reduce)
}

hdfs.root <- "wordcount"
hdfs.data <- file.path(hdfs.root, "data")
hdfs.out <- file.path(hdfs.root, "out")
out <- wordcount(hdfs.data, hdfs.out)
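To separate the word-count logic from the cluster setup, the same map and reduce steps can be mimicked in plain base R with no Hadoop or rmr2 involved. This is only a local sketch: `local_wordcount` is a hypothetical name, `split()` stands in for the shuffle that groups values by key before the reduce step, and, unlike the map function above, it drops the empty strings that `strsplit(lines, "\\s")` produces on consecutive whitespace.

```r
# Base R sketch of the wordcount logic, run on an in-memory character vector.
local_wordcount <- function(lines) {
  # Map step: split every line on single whitespace characters.
  words <- unlist(strsplit(lines, "\\s"))
  # Consecutive whitespace yields "" entries; discard them.
  words <- words[nzchar(words)]
  # Shuffle step: group a count of 1 per occurrence by word.
  grouped <- split(rep(1, length(words)), words)
  # Reduce step: sum the counts for each word (named numeric vector).
  vapply(grouped, sum, numeric(1))
}

local_wordcount(c("hello world", "hello hadoop"))
```

If this gives sensible counts on a sample of the uploaded text, the problem is more likely in the job submission than in the map/reduce functions.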
I get the following error:
15/05/24 21:09:20 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
15/05/24 21:09:20 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
15/05/24 21:09:20 INFO jvm.JvmMetrics: Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
15/05/24 21:09:21 INFO mapreduce.JobSubmitter: Cleaning up the staging area file:/app/hadoop/tmp/mapred/staging/master91618435/.staging/job_local91618435_0001
15/05/24 21:09:21 ERROR streaming.StreamJob: Error Launching job : No such file or directory
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce, :
hadoop streaming failed with error code 5
Called from: mapreduce(input = input, output = output, input.format = "text",
map = map, reduce = reduce)
Before running this, I created two HDFS folders, wordcount/data and wordcount/out, and uploaded some text to the first one from the command line.
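Another issue: there are two users on my computer, hduser and master. The first one was created for the Hadoop installation. I think that when I open R/RStudio, I run it as master, and because Hadoop was set up for hduser, there is some permission problem that causes this error. As can be seen in line 4 of the output, the system tries to find master91618435, which I suspect should be hduser.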
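One way to test the permission theory is to check, from inside R/RStudio, which operating-system account the session actually runs as. The sketch below uses only base R's `Sys.info()`; the expected account name "hduser" is taken from the setup described above, and the variable names are illustrative.

```r
# Which OS user is this R session running as?
current_user <- Sys.info()[["user"]]

# Compare with the account Hadoop was installed under in this setup.
runs_as_hadoop_user <- identical(current_user, "hduser")

cat("R is running as:", current_user, "\n")
cat("Same account as the Hadoop install:", runs_as_hadoop_user, "\n")
```

If this reports master, the job is submitted as master, which is consistent with the master-prefixed staging path in the log.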
My question is: how can I get rid of this error?
PS: This is a similar question, but it has no answers that are useful to me.