I am new to R and have a question about MapReduce with rmr2. I have a file to read in which every line contains a date followed by some words (A, B, C, ...):
2016-05-10, A, B, C, A, R, E, F, E
2016-05-18, A, B, F, E, E
2016-06-01, A, B, K, T, T, E, G, E, A, N
2016-06-03, A, B, K, T, T, E, F, E, L, T
I would like to get output similar to this:
2016-05: A 3
2016-05: E 4
2016-05: E 4
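To pin down the target, the expected tallies can be checked on the four sample lines in plain R, without Hadoop. This is only a sketch: `sample_lines`, `month` and `word` are illustrative names, and `table()` stands in for the MapReduce job.

```r
# The four sample lines from above, held in memory (no Hadoop involved).
sample_lines <- c(
  "2016-05-10, A, B, C, A, R, E, F, E",
  "2016-05-18, A, B, F, E, E",
  "2016-06-01, A, B, K, T, T, E, G, E, A, N",
  "2016-06-03, A, B, K, T, T, E, F, E, L, T"
)

# Split every line on commas and trim the surrounding blanks.
fields <- lapply(strsplit(sample_lines, ","), trimws)

# The first field is the date: keep "YYYY-MM" and repeat it once per word.
month <- unlist(lapply(fields, function(f) rep(substr(f[1], 1, 7), length(f) - 1)))
word <- unlist(lapply(fields, function(f) f[-1]))

# Cross-tabulate: rows are months, columns are words.
counts <- table(month, word)
print(counts)
```

For 2016-05 this gives E 4 and A 3, with B and F tied at 2; for 2016-06 it gives T 5, E 4 and A 3.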
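I already solved the same problem with a Java implementation, and now I have to do the same thing in R, but I still have to figure out how to write my reducer. Also, is there a way to print from inside my mapper and reducer code? When I use the print command in the mapper or the reducer, I get an error in RStudio.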
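On the printing question, one likely cause (an assumption, since the error message is not shown) is that with the Hadoop backend the mapper and reducer run as streaming scripts whose standard output carries the key-value data, so `print` can corrupt that stream. A minimal sketch of a workaround is to write diagnostics to standard error instead; `debug_log` is a hypothetical helper, not part of rmr2.

```r
# Hypothetical helper: writes a label and a compact view of x to stderr,
# leaving stdout untouched, and returns x unchanged.
debug_log <- function(label, x) {
  cat(label, ": ", paste(utils::head(x, 5), collapse = " | "), "\n",
      sep = "", file = stderr())
  invisible(x)
}

# Example: inspect the first values a mapper would receive.
sample_input <- c("2016-05-10, A, B", "2016-05-18, A, F")
checked <- debug_log("mapper input", sample_input)
```

Inside `customMapper` this could be called as `debug_log("v", v)`. With the Hadoop backend the messages should end up in the task's stderr log rather than in the RStudio console. Calling `rmr.options(backend = "local")` before `mapreduce` is another option for debugging, since the job then runs inside the current R session.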
Sys.setenv(HADOOP_STREAMING = "/usr/local/hadoop/share/hadoop/tools/lib/hadoop-streaming-2.8.0.jar")
Sys.setenv(HADOOP_HOME = "/usr/local/hadoop")  # Hadoop installation directory, not the hadoop binary
Sys.setenv(HADOOP_CMD = "/usr/local/hadoop/bin/hadoop")
library(stringr)
library(rmr2)
library(stringi)
customMapper = function(k, v) {
  # rmr2 can pass several input lines at once in v, so every step is vectorized.
  # Text before the first comma is the date; the rest is the word list.
  dates = sub(",.*$", "", v)
  rest = sub("^[^,]*,?", "", v)
  # Keep only "YYYY-MM" as the key.
  yearMonth = substr(trimws(dates), 1, 7)
  # One trimmed word vector per input line.
  words = lapply(strsplit(rest, ","), trimws)
  # Emit one (year-month, word) pair per word.
  keyval(rep(yearMonth, lengths(words)), unlist(words))
}
customReducer = function(k, v) {
  # Are all the values with the same date available here?
  elementsWithSameDate = unlist(v)
  # Define something similar to a Java Map to count the elements for the same date
  # myMap
  for (elWithSameDate in elementsWithSameDate) {
    words = unlist(strsplit(elWithSameDate, ","))
    for (word in words) {
      compositeNewK = paste(k, ":", word)
      # if myMap contains compositeNewK
      #   myMap(compositeNewK, 1 + myMap.getValue(compositeNewK))
      # else
      #   myMap(compositeNewK, 1)
    }
  }
  # Here I want to transform myMap into a string containing the 3 words with the most occurrences
  # fromMapToString = convert(myMap)
  # fromMapToString is not defined yet; this is the part I am stuck on
  keyval(k, fromMapToString)
}
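The counting that the pseudocode above describes might be done with `table()` instead of a hand-built map. The sketch below is plain R so it can be tried without Hadoop; `top_words` is a hypothetical helper, and it assumes the reducer receives the words for one year-month key as a character vector.

```r
# Hypothetical helper: count the words seen for one key and keep the n most frequent.
top_words <- function(words, n = 3) {
  counts <- table(trimws(unlist(words)))
  # Sort by descending count; ties are broken alphabetically by word.
  counts <- counts[order(-as.integer(counts), names(counts))]
  counts[seq_len(min(n, length(counts)))]
}

# Words that the mapper would emit for the key "2016-05" (taken from the sample lines).
words_2016_05 <- c("A", "B", "C", "A", "R", "E", "F", "E", "A", "B", "F", "E", "E")
top <- top_words(words_2016_05)

# One output line per word, in the "key: word count" shape shown earlier.
formatted <- paste0("2016-05", ": ", names(top), " ", as.integer(top))
print(formatted)
```

In `customReducer` the same helper could be applied to `v`, with the result returned through something like `keyval(k, paste(names(top), as.integer(top)))`. How ties such as B and F should be resolved is left open by the question; the alphabetical rule here is just one choice.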
wordcount = function(inputData, outputData = NULL) {
  mapreduce(input = inputData, output = outputData, input.format = "text",
            map = customMapper, reduce = customReducer)
}
hdfs.data = file.path("/user/hduser", "folder2")
hdfs.out = file.path("/user/hduser", "output1")
result = wordcount(hdfs.data, hdfs.out)