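I'm looking into writing an Accumulo iterator that returns a random sample covering a given percentage of a table.

I'd appreciate any suggestions.

Thanks,

Chris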

2 Answers

Expanding a bit on Ben Tse's answer to allow a configurable selection ratio:

import java.io.IOException;
import java.util.Map;
import java.util.Random;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;

public class RandomAcceptFilter extends Filter {
    private Random rand = new Random();
    // Fraction of entries to keep, e.g. 0.05 for roughly 5%.
    private double percentToAllow;
    public static final String RATIO = "ratio";
    public static final String DEFAULT = "0.05";

    @Override
    public void init(SortedKeyValueIterator<Key, Value> source, Map<String, String> options, IteratorEnvironment env) throws IOException {
        super.init(source, options, env);
        // Fall back to the default ratio when the option is not set.
        String option = options.containsKey(RATIO) ? options.get(RATIO) : DEFAULT;
        this.percentToAllow = Double.parseDouble(option);
    }

    @Override
    public boolean accept(Key k, Value v) {
        // Each entry is kept independently with probability percentToAllow.
        return rand.nextDouble() < this.percentToAllow;
    }
}

Then, when you attach the iterator from your client code, you would do this:

IteratorSetting itr = new IteratorSetting(15, "myIterator", RandomAcceptFilter.class);
itr.addOption(RandomAcceptFilter.RATIO, "0.20");
myScanner.addScanIterator(itr);

Obviously, you'd want to add bounds checking and the like, but you get the idea.
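One way the bounds checking mentioned above might look is sketched here in plain Java, with no Accumulo dependency so it can be run on its own. The `RatioOption` class and its `parse` method are hypothetical names introduced for illustration; the option key and default mirror the `RATIO` and `DEFAULT` constants from the filter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not part of Accumulo): parses and range-checks the "ratio" option.
final class RatioOption {
    static final String RATIO = "ratio";
    static final String DEFAULT = "0.05";

    private RatioOption() {}

    static double parse(Map<String, String> options) {
        // Use the default when the option is missing.
        String raw = options.get(RATIO);
        if (raw == null) {
            raw = DEFAULT;
        }
        double ratio;
        try {
            ratio = Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("ratio is not a number: " + raw, e);
        }
        // The negated range test also rejects NaN.
        if (!(ratio >= 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException("ratio must be between 0.0 and 1.0, got: " + raw);
        }
        return ratio;
    }

    public static void main(String[] args) {
        Map<String, String> options = new HashMap<>();
        options.put(RATIO, "0.20");
        System.out.println(parse(options));
    }
}
```

In the filter, a call like this could replace the bare `Double.parseDouble(option)` in `init`, so that a bad option fails when the iterator is set up rather than silently accepting everything or nothing.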

answered 2014-02-05T15:30:02.293


You can extend org.apache.accumulo.core.iterators.Filter and randomly accept x% of the entries. The following iterator returns a random selection of roughly 5% of the entries.

import java.util.Random;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;

public class RandomAcceptFilter extends Filter {
    private Random rand = new Random();

    @Override
    public boolean accept(Key k, Value v) {
        return rand.nextDouble() < .05;
    }
}
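
Because each entry is accepted independently, the number of entries returned is only approximately 5% of the total and will differ from scan to scan. A minimal plain-Java sketch of that behavior follows; the `SampleSizeDemo` class, the entry count, and the fixed seed are assumptions for illustration and are not part of Accumulo.

```java
import java.util.Random;

// Hypothetical demo: simulates the accept() test over n entries without Accumulo.
final class SampleSizeDemo {
    private SampleSizeDemo() {}

    // Counts how many of n entries pass an independent accept-with-probability-p test.
    static int countAccepted(int n, double p, Random rand) {
        int accepted = 0;
        for (int i = 0; i < n; i++) {
            if (rand.nextDouble() < p) {
                accepted++;
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        Random rand = new Random(42L); // fixed seed only so runs are repeatable
        // Each simulated scan keeps a slightly different number of entries.
        for (int run = 0; run < 3; run++) {
            System.out.println(countAccepted(100_000, 0.05, rand));
        }
    }
}
```

If an exact sample size is required, a filter of this kind is not sufficient on its own; it only controls the expected fraction.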
answered 2014-02-05T01:49:39.257