1

我正在运行以下 Rscript gdp.R

#!/usr/bin/env Rscript

Sys.getenv(c("HADOOP_HOME", "HADOOP_CMD", "HADOOP_STREAMING", "HADOOP_CONF_DIR"))

library(rmr2)
library(rhdfs)

setwd("/root/somnath/GDP_data/")
gdp <- read.csv("GDP.csv")
head(gdp)

hdfs.init()
gdp.values <- to.dfs(gdp)

aaplRevenue = 156508

gdp.map.fn <- function(k,v) {
key <- ifelse(v[4] < aaplRevenue, "less", "greater")
keyval(key, 1)
}

count.reduce.fn <- function(k,v) {
keyval(k, length(v))
}

count <- mapreduce(input=gdp.values, map=gdp.map.fn, reduce=count.reduce.fn)

from.dfs(count)

$val

并且无法克服 mapreduce 函数中的以下错误:

流式传输命令失败!mr 中的错误(map = map,reduce = reduce,combine = combine,vectorized.reduce,:hadoop 流式传输失败,错误代码为 1 调用:mapreduce -> mr

[root@kkws029 RHadoop_scripts]# Rscript gdp.R
Loading required package: methods
Loading required package: rJava

HADOOP_CMD=/opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop

Be sure to run hdfs.init()
  CountryCode Number   CountryName     GDP
1         USA      1  UnitedStates  168000
2         CHN      2         China 9240270
3         JPN      3         Japan 4901530
4         DEU      4       Germany 3634823
5         FRA      5        France 2734949
6         GBR      6 UnitedKingdom 2522261
14/07/08 16:57:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
14/07/08 16:57:02 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
14/07/08 16:57:02 INFO compress.CodecPool: Got brand-new compressor [.deflate]
Warning messages:
1: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("hdfs.tempdir") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
3: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
packageJobJar: [/tmp/hadoop-root/hadoop-unjar4163972761639211537/] [] /tmp/streamjob5583656452134995821.jar tmpDir=null
14/07/08 16:57:04 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/07/08 16:57:13 INFO mapred.FileInputFormat: Total input paths to process : 1
14/07/08 16:57:20 INFO streaming.StreamJob: getLocalDirs(): [/tmp/hadoop-root/mapred/local]
14/07/08 16:57:20 INFO streaming.StreamJob: Running job: job_201407071547_0024
14/07/08 16:57:20 INFO streaming.StreamJob: To kill this job, run:
14/07/08 16:57:20 INFO streaming.StreamJob: /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=kkws030.mara-ison.com:8021 -kill job_201407071547_0024
14/07/08 16:57:20 INFO streaming.StreamJob: Tracking URL: http://kkws030.mara-ison.com:50030/jobdetails.jsp?jobid=job_201407071547_0024
14/07/08 16:57:21 INFO streaming.StreamJob:  map 0%  reduce 0%
14/07/08 16:57:26 INFO streaming.StreamJob:  map 50%  reduce 0%
14/07/08 16:57:49 INFO streaming.StreamJob:  map 100%  reduce 100%
14/07/08 16:57:49 INFO streaming.StreamJob: To kill this job, run:
14/07/08 16:57:49 INFO streaming.StreamJob: /opt/cloudera/parcels/CDH/lib/hadoop/bin/hadoop job  -Dmapred.job.tracker=kkws030.mara-ison.com:8021 -kill job_201407071547_0024
14/07/08 16:57:49 INFO streaming.StreamJob: Tracking URL: http://kkws030.mara-ison.com:50030/jobdetails.jsp?jobid=job_201407071547_0024
14/07/08 16:57:49 ERROR streaming.StreamJob: Job not successful. Error: NA
14/07/08 16:57:49 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  :
  hadoop streaming failed with error code 1
Calls: mapreduce -> mr
In addition: Warning messages:
1: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("hdfs.tempdir") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
3: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
4: In rmr.options("backend.parameters") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
Execution halted
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 24: /opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 140: cygpath: command not found
/opt/cloudera/parcels/CDH-4.3.0-1.cdh4.3.0.p0.22/lib/hadoop/libexec/../../hadoop-hdfs/bin/hdfs: line 172: exec: : not found
Warning messages:
1: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)
2: In rmr.options("backend") :
  Please set an HDFS temp directory with rmr.options(hdfs.tempdir = ...)

我的标准错误日志如下:

Loading objects:
  .Random.seed
  aaplRevenue
  count.reduce.fn
  gdp
  gdp.map.fn
  gdp.values
Loading objects:
  backend.parameters
  combine
  combine.file
  combine.line
  debug
  default.input.format
  default.output.format
  in.folder
  in.memory.combine
  input.format
  libs
  map
  map.file
  map.line
  out.folder
  output.format
  pkg.opts
  postamble
  preamble
  profile.nodes
  reduce
  reduce.file
  reduce.line
  rmr.global.env
  rmr.local.env
  save.env
  tempfile
  vectorized.reduce
  verbose
  work.dir
Loading required package: rhdfs
Loading required package: methods
Loading required package: rJava
Error : .onLoad failed in loadNamespace() for 'rhdfs', details:
  call: fun(libname, pkgname)
error: Environment variable HADOOP_CMD must be set before loading package rhdfs
Warning in FUN(c("rhdfs", "rJava", "methods", "rmr2", "stats", "graphics",  :
  can't load rhdfs
Loading required package: rmr2
Loading objects:
  backend.parameters
  combine
  combine.file
  combine.line
  debug
  default.input.format
  default.output.format
  in.folder
  in.memory.combine
  input.format
  libs
  map
  map.file
  map.line
  out.folder
  output.format
  pkg.opts
  postamble
  preamble
  profile.nodes
  reduce
  reduce.file
  reduce.line
  rmr.global.env
  rmr.local.env
  save.env
  tempfile
  vectorized.reduce
  verbose
  work.dir
Warning in Ops.factor(left, right) : < not meaningful for factors
Error in split.default(a.list, ceiling(seq_along(a.list)/every.so.many),  : 
  first argument must be a vector
Calls: <Anonymous> ... <Anonymous> -> do.call -> mapply -> split -> split.default
No traceback available 
Error during wrapup: 
Execution halted
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 1
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeMapper.close(PipeMapper.java:136)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
    at org.apache.hadoop.streaming.PipeMapRunner.run(PipeMapRunner.java:34)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)

任何建议将不胜感激

谢谢-S

注意:我已经在 stderr 日志中引用了错误,它指定了系统变量 HADOOP_CMD 未找到。有没有办法可以将 HADOOP 系统环境变量导出到 R?另请注意,我在脚本的开头使用 Sys.getenv(c("HADOOP_HOME", ...)) ,但这似乎不像 stderr 建议的那样工作

请注意,我已经在 ~/.bash_profile 中为 HADOOP 环境变量添加了以下导出命令

# .bash_profile

# Get the aliases and functions
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi

# User specific environment and startup programs

export JRI_PATH=/usr/lib64/R
export R_HOME=/usr/lib64/R
export R_SHARE_DIR=/usr/share/R
#export JRI_LD_PATH=${R_HOME}/library/rJava/jri:${R_HOME}/lib:${R_HOME}/bin
export LD_LIBRARY_PATH=${R_HOME}/library/rJava/jri

export HADOOP_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export HADOOP_CMD=${HADOOP_HOME}/bin/hadoop
export HADOOP_STREAMING=/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce/contrib/streaming/hadoop-streaming-2.0.0-mr1-cdh4.3.0.jar
export HADOOP_CONF_DIR=${HADOOP_HOME}/etc/hadoop

export JAVA_HOME=/var/jdk1.7.0_25
#PATH=$PATH:$HOME/bin:$JAVA_HOME/bin

PATH=$PATH:$R_HOME/bin:$JAVA_HOME/bin:$LD_LIBRARY_PATH:/opt/cloudera/parcels/CDH/lib/mahout:/opt/cloudera/parcels/CDH/lib/hadoop:/opt/cloudera/parcels/CDH/lib/hadoop-hdfs:/opt/cloudera/parcels/CDH/lib/hadoop-mapreduce:/opt/cloudera/parcels/CDH/lib/hadoop-0.20-mapreduce:/var/lib/storm-0.9.0-rc2/lib:$HADOOP_CMD:$HADOOP_STREAMING:$HADOOP_CONF_DIR

export PATH
4

2 回答 2

0

count <- mapreduce(input=gdp.values, map = gdp.map.fn, reduce = count.reduce.fn, #output = output, input.format = "text",
combine = T)

我正在处理相同的代码......但没有给我想要的答案......所以通过在地图减少的每个黑色上应用每件事来进行一些练习......最后我发现作者犯的错误......听到是修改代码..你可以运行它

于 2015-04-10T11:21:33.513 回答
0

流式传输命令失败!

Error in mr(map = map, reduce = reduce, combine = combine, vectorized.reduce,  : 
  hadoop streaming failed with error code 1
于 2014-09-29T10:39:47.393 回答