For writing to Accumulo (data ingest), it makes sense to run a MapReduce job where the mapper input is your input files on HDFS. You would basically follow this example from the Accumulo documentation:
http://accumulo.apache.org/1.4/examples/mapred.html
(Section 4 of this paper gives more background on techniques for ingesting data into Accumulo: http://ieee-hpec.org/2012/index_htm_files/byun.pdf)
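If you go the MapReduce ingest route, the mapper boils down to emitting Mutations keyed by table name, and AccumuloOutputFormat does the actual writing. A minimal sketch in that spirit (assuming the 1.4-era AccumuloOutputFormat; the table name, column family/qualifier, and tab-separated input format here are hypothetical):

//minimal ingest mapper sketch (Accumulo 1.4 style); AccumuloOutputFormat must be
//configured in the job driver (instance/ZooKeeper info and credentials)
import java.io.IOException;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class IngestMapper extends Mapper<LongWritable, Text, Text, Mutation> {
  private static final Text TABLE = new Text("myTable"); //hypothetical table name

  @Override
  public void map(LongWritable offset, Text line, Context context)
      throws IOException, InterruptedException {
    //hypothetical input record format: rowId<TAB>payload
    String[] parts = line.toString().split("\t", 2);
    Mutation m = new Mutation(new Text(parts[0]));
    m.put(new Text("cf"), new Text("cq"), new Value(parts[1].getBytes()));
    //the output key names the destination table; AccumuloOutputFormat writes the Mutation
    context.write(TABLE, m);
  }
}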
For reading from Accumulo (data query), I would not use MapReduce. Accumulo/Zookeeper will automatically distribute your query across the tablet servers. If you are using rows as atomic records, use (or extend) the WholeRowIterator and launch a Scanner (or BatchScanner) on the range of rows you are interested in. The scan is served by your tablet servers (a BatchScanner queries multiple tablet servers in parallel). You really don't want to hit the Accumulo data directly from HDFS or MapReduce.
Here is some sample code to get you started:
//some of the classes you'll need (in no particular order)...
import java.util.Map.Entry;
import java.util.SortedMap;
import org.apache.accumulo.core.Constants;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Instance;
import org.apache.accumulo.core.client.IteratorSetting;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Range;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.user.WholeRowIterator;
import org.apache.hadoop.io.Text;
//Accumulo client code...
//Accumulo connection
Instance instance = new ZooKeeperInstance(instanceName, zooKeepers); //your installation info: instance name + comma-separated ZooKeeper hosts
Connector connector = instance.getConnector(username, password);
//setup a Scanner or BatchScanner
Scanner scanner = connector.createScanner(tableName, Constants.NO_AUTHS);
Range range = new Range(new Text("rowA"), new Text("rowB"));
scanner.setRange(range);
//use a WholeRowIterator to keep rows atomic
IteratorSetting itSettings = new IteratorSetting(1, WholeRowIterator.class);
scanner.addScanIterator(itSettings);
//now read some data!
for (Entry<Key, Value> entry : scanner) {
  //decodeRow rebuilds the encoded whole-row Value into its key/value pairs
  //(note: decodeRow throws IOException, so handle or declare it)
  SortedMap<Key, Value> wholeRow = WholeRowIterator.decodeRow(entry.getKey(), entry.getValue());
  //do something with your data!
}
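If you want the read explicitly parallelized, the BatchScanner variant is nearly identical. A sketch (numQueryThreads is a tuning knob; you'll also need org.apache.accumulo.core.client.BatchScanner and java.util.Collections):

//BatchScanner: fetches from multiple tablet servers in parallel
int numQueryThreads = 10; //tune to your cluster
BatchScanner batchScanner = connector.createBatchScanner(tableName, Constants.NO_AUTHS, numQueryThreads);
batchScanner.setRanges(Collections.singleton(range)); //takes a Collection<Range>
batchScanner.addScanIterator(new IteratorSetting(1, WholeRowIterator.class));
for (Entry<Key, Value> entry : batchScanner) {
  //same WholeRowIterator.decodeRow(...) pattern as above
  //(note: BatchScanner returns entries in no particular order)
}
batchScanner.close(); //a BatchScanner holds threads, so always close it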