
What do you think of the answers to question 4 mentioned on this site?

Is the answer right or wrong?

Question: 4

In the standard word count MapReduce algorithm, why might using a combiner reduce the overall job running time?

A. Because combiners perform local aggregation of word counts, thereby allowing the mappers to process input data faster.
B. Because combiners perform local aggregation of word counts, thereby reducing the number of mappers that need to run.
C. Because combiners perform local aggregation of word counts, and then transfer that data to reducers without writing the intermediate data to disk.
D. Because combiners perform local aggregation of word counts, thereby reducing the number of key-value pairs that need to be shuffled across the network to the reducers.

Answer: A

Question: 3

What happens in a MapReduce job when you set the number of reducers to one?

A. A single reducer gathers and processes all the output from all the mappers. The output is written in as many separate files as there are mappers.
B. A single reducer gathers and processes all the output from all the mappers. The output is written to a single file in HDFS.
C. Setting the number of reducers to one creates a processing bottleneck, and since the number of reducers as specified by the programmer is used as a reference value only, the MapReduce runtime provides a default setting for the number of reducers.
D. Setting the number of reducers to one is invalid, and an exception is thrown.
Answer: A

Based on my understanding of the questions above:

Question 4: D
Question 3: B

Update

You have user profile records in your OLTP database that you want to join with weblogs you have already ingested into HDFS. How will you obtain these user records?
Options
A. HDFS commands
B. Pig load
C. Sqoop import
D. Hive
Answer: B

For the question in the update, I suspect the answer is either B or C.

Edit

Correct answer: Sqoop


2 Answers


As far as I understand, both of those answers are wrong.

A combiner does not make the mappers any faster. The answer to question 4 should be D.
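To make the combiner's role concrete, here is a minimal word-count driver along the lines of the canonical Hadoop example, reusing the reducer class as the combiner. The key line is job.setCombinerClass(...): it merges counts on the map side, so far fewer (word, count) pairs cross the network to the reducers; nothing in it makes the mappers themselves read input any faster.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);  // one (word, 1) pair per token
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    // The combiner pre-aggregates counts on the map side, so far fewer
    // (word, count) pairs are shuffled across the network -- option D.
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```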

Again, from practical experience I have found that the number of output files is always equal to the number of reducers. So the answer to question 3 should be B. This may not be the case when MultipleOutputs is used, but that is not common.
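As a sketch of that observation, continuing the driver above: forcing a single reduce task makes the whole job output land in one HDFS file.

```java
// Fragment, placed in the driver above before job submission.
// All map output is sent to the one reduce task, and the job writes its
// entire result to a single file, part-r-00000, in the output directory.
job.setNumReduceTasks(1);
// With N reduce tasks the same job would instead produce N files:
// part-r-00000, part-r-00001, ..., one per reducer.
```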

Finally, I don't think Apache would lie about MapReduce (exceptions do happen :)). The answers to both questions can be found on their wiki pages. Take a look.

By the way, I love the "100% Pass-Guaranteed or Your Money Back!!!" quote on the link you provided ;-)

Edit

Not sure about the question in the update, since I know very little about Pig & Sqoop. But the same thing can certainly be achieved with Hive, by creating an external table over the HDFS data and then joining.

Update

After the comments from user milk3422 and the owner, I did some searching and found that my assumption of Hive being the answer to the last question was wrong, since another OLTP database is involved. The correct answer should be C, as Sqoop is designed to transfer data between HDFS and relational databases.

Answered 2014-09-29T11:43:41.367

The answers to questions 4 and 3 both seem correct to me. For question 4 it is quite plausible, because when a combiner is used, the map output is kept in an in-memory collection and processed first, and the buffer is flushed when it fills up. To support this I will add this link: http://wiki.apache.org/hadoop/HadoopMapReduce

Here it clearly explains why a combiner speeds up processing.
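To illustrate that buffering in isolation, here is a plain-Java sketch of the local aggregation a combiner performs; this is a conceptual toy, not Hadoop's actual in-memory buffer or spill code.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of map-side combining: instead of emitting one
// ("word", 1) pair per occurrence, counts are merged in an in-memory
// buffer, and only the merged pairs are written out when it is flushed.
public class LocalAggregationSketch {
  public static void main(String[] args) {
    String[] tokens = {"to", "be", "or", "not", "to", "be"};

    Map<String, Integer> buffer = new HashMap<>();
    for (String token : tokens) {
      buffer.merge(token, 1, Integer::sum);  // combine locally
    }

    // Without combining, 6 pairs would be shuffled; merged, only 4.
    buffer.forEach((word, count) -> System.out.println(word + "\t" + count));
  }
}
```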

Also, I think the answer to q.3 is correct as well, since that is generally the basic default configuration. To support it I will add another informative link: https://www.inkling.com/read/hadoop-definitive-guide-tom-white-3rd/chapter-7/mapreduce-types

Answered 2014-09-30T16:04:39.487