
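I started benchmarking ways to look up a key by its value in a Guava cache, and I noticed strange behavior related to the concurrency level. I'm not sure whether this is a bug, undefined behavior, or perhaps expected but unspecified behavior.

My benchmark is supposed to find a key by its value in a Guava Cache, which I know is not the usual way to use a cache.

Here is my complete benchmark class: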

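import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import java.util.Hashtable;
import java.util.Map;
import java.util.Objects;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import org.openjdk.jmh.annotations.*;

@Fork(4)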
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 1, time = 100, timeUnit = TimeUnit.MILLISECONDS)
@Measurement(iterations = 4, time = 100, timeUnit = TimeUnit.MILLISECONDS)
public class ValueByKey {

    private Long counter = 0L;

    private final int MAX = 2500;

    private final LoadingCache<String, Long> stringToLong = CacheBuilder.newBuilder()
        .concurrencyLevel(1)
        .maximumSize(MAX + 5)
        .build(new CacheLoader<String, Long>() {
            public Long load(String mString) {
                return generateIdByString(mString);
            }
        });

    private final Map<String, Long> mHashMap = new Hashtable<>(MAX);
    private final Map<String, Long> concurrentHashMap = new ConcurrentHashMap<>(MAX);

    @Setup(Level.Trial)
    public void setup() {
        // Populate guava cache
        for(int i = 0; i <= MAX; i++) {
            try {
                stringToLong.get(UUID.randomUUID().toString());
            } catch (ExecutionException e) {
                e.printStackTrace();
                System.exit(1);
            }
        }
    }

    @Benchmark
    public String stringToIdByIteration() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : stringToLong.asMap().entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("Returning null as value not found " + randomNum);
        return null;
    }

    @Benchmark
    public String stringToIdByIterationHashTable() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : mHashMap.entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("Returning null as value not found " + randomNum);
        return null;
    }

    @Benchmark
    public String stringToIdByIterationConcurrentHashMap() {
        Long randomNum = ThreadLocalRandom.current().nextLong(1L, MAX);

        for(Map.Entry<String, Long> entry : concurrentHashMap.entrySet()) {
            if(Objects.equals(randomNum, entry.getValue())) {
                return entry.getKey();
            }
        }
        System.out.println("concurrentHashMap Returning null as value not found " + randomNum);
        return null;
    }

    private Long generateIdByString(final String mString) {
        // Use a single id for all three containers so they hold identical values.
        final Long id = ++counter;
        mHashMap.put(mString, id);
        concurrentHashMap.put(mString, id);
        return id;
    }

}

What I noticed is that when I change .concurrencyLevel(1) to a number other than 1, I start to lose data. The following output is from concurrency level 4:

Iteration   1: Returning null as value not found 107
Returning null as value not found 43
Returning null as value not found 20
Returning null as value not found 77
Returning null as value not found 127
Returning null as value not found 35
Returning null as value not found 83
Returning null as value not found 43
Returning null as value not found 127
Returning null as value not found 107
Returning null as value not found 83
Returning null as value not found 82
Returning null as value not found 40
Returning null as value not found 58
Returning null as value not found 127
Returning null as value not found 114
Returning null as value not found 119
Returning null as value not found 43
Returning null as value not found 114
Returning null as value not found 18
Returning null as value not found 58
66.778 us/op

I also noticed that I never lose any data when I run the same code against a HashMap or Hashtable instead, and it performs much better as well:

Benchmark                                          Mode  Cnt   Score    Error  Units
ValueByKey.stringToIdByIteration                   avgt   16  58.637 ± 15.094  us/op
ValueByKey.stringToIdByIterationConcurrentHashMap  avgt   16  16.148 ±  2.046  us/op
ValueByKey.stringToIdByIterationHashTable          avgt   16  11.705 ±  1.095  us/op

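Is my code wrong, or is Guava unable to properly handle a partitioned hash table when the concurrency level is higher than 1?

  • The concurrency level option is used to partition the table internally so that updates can occur without contention.
  • The ideal setting is the maximum number of threads that could potentially access the cache at one time.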

1 Answer


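No cache guarantees a cache hit every time

Whether data is present in a cache is determined by its eviction policy (and by the data having been loaded into the cache in the first place).

Since you constructed your cache with CacheBuilder.maximumSize(MAX + 5), it uses size-based eviction and may start removing elements before it reaches the preset maximum size.

With the concurrency level set to 4, Guava Cache plays it safe and sets the eviction threshold a bit lower, which makes sense because elements can keep arriving while others are being evicted.

This is why your elements start to "disappear".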
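One way to picture the lower threshold: as far as I can tell from the Javadoc of CacheBuilder.maximumSize, the cache may evict before the limit is exceeded, and in the current implementation each segment limits its own size to approximately maximumSize / concurrencyLevel. The stand-alone simulation below is only a sketch of that idea and is not Guava's code. The class name SegmentedEvictionSketch, the method countEvictions, and the choice of segment by floorMod of the hash are all assumptions made for illustration (Guava rehashes keys and handles the remainder differently).

```java
import java.util.UUID;

class SegmentedEvictionSketch {

    /**
     * Counts insertions that overflow their segment when the total size
     * budget is split evenly across segments (a simplification, not Guava's logic).
     */
    static int countEvictions(int[] hashes, int segments, long maxSize) {
        long perSegmentCap = maxSize / segments; // assumption: remainder is ignored
        long[] sizes = new long[segments];
        int evictions = 0;
        for (int hash : hashes) {
            int segment = Math.floorMod(hash, segments); // assumption: simple modulo routing
            if (sizes[segment] < perSegmentCap) {
                sizes[segment]++;
            } else {
                // The segment is full, so an insert here pushes one entry out.
                evictions++;
            }
        }
        return evictions;
    }

    public static void main(String[] args) {
        int max = 2500;
        int[] hashes = new int[max + 1];
        for (int i = 0; i < hashes.length; i++) {
            hashes[i] = UUID.randomUUID().toString().hashCode();
        }
        // One segment: 2501 entries always fit under a cap of 2505, so this prints 0.
        System.out.println("1 segment:  " + countEvictions(hashes, 1, max + 5));
        // Four segments: the result depends on how unevenly the random keys spread.
        System.out.println("4 segments: " + countEvictions(hashes, 4, max + 5));
    }
}
```

With a single segment the whole budget is shared, so nothing overflows. With four segments, any segment that receives slightly more than its share of the random keys overflows even though the cache as a whole is below its maximum size, which would match the handful of missing values in the question.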

To test this, make your class implement the RemovalListener interface:

public class ValueByKey implements RemovalListener<String, Long> { 
    // ... (the listener only fires once it is registered on the builder, e.g. .removalListener(this))
    @Override
    public void onRemoval(RemovalNotification<String, Long> notification) {
        System.out.println("removed: " + notification.getKey() + " -> " + notification.getValue());
    }
    //...
}

...and when you run the test, you'll notice evictions that match the missing values:

# Warmup Iteration   1: 
removed: 110c0a73-1dc3-40ee-8909-969e6dee0ea0 -> 3
removed: 6417015a-f154-467f-b3bf-3b95831ac5b7 -> 6
removed: 5bc206f9-67ec-49a2-8471-b386ffc03988 -> 14
removed: 3c0a33e1-1fe1-4e42-b262-bf6a3e8c53f7 -> 21
Returning null as value not found 14
Returning null as value not found 14
Returning null as value not found 3
64.778 us/op
Iteration   1: 
Returning null as value not found 21
Returning null as value not found 21
Returning null as value not found 6
37.719 us/op
[...]
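The same check can be done outside JMH. The sketch below registers a listener through CacheBuilder.removalListener and counts only notifications for which wasEvicted() is true. The class EvictionCountSketch and the method evictionsFor are hypothetical names introduced here, and the loader is a simplified stand-in for the one in the question. It needs Guava on the classpath.

```java
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.cache.RemovalListener;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicInteger;

class EvictionCountSketch {

    /** Loads max + 1 random keys and returns how many entries the cache evicted. */
    static int evictionsFor(int concurrencyLevel, long maximumSize, int max) {
        AtomicInteger evicted = new AtomicInteger();
        RemovalListener<String, Long> listener = notification -> {
            // wasEvicted() is true for automatic removals such as size-based eviction.
            if (notification.wasEvicted()) {
                evicted.incrementAndGet();
            }
        };
        LoadingCache<String, Long> cache = CacheBuilder.newBuilder()
            .concurrencyLevel(concurrencyLevel)
            .maximumSize(maximumSize)
            .removalListener(listener)
            .build(new CacheLoader<String, Long>() {
                private long counter = 0L; // single-threaded use only in this sketch

                @Override
                public Long load(String key) {
                    return ++counter;
                }
            });
        for (int i = 0; i <= max; i++) {
            cache.getUnchecked(UUID.randomUUID().toString());
        }
        cache.cleanUp(); // run any pending maintenance before reading the count
        return evicted.get();
    }

    public static void main(String[] args) {
        int max = 2500;
        System.out.println("level 1: " + evictionsFor(1, max + 5, max));
        System.out.println("level 4: " + evictionsFor(4, max + 5, max));
    }
}
```

Because the keys are random UUIDs, the count for level 4 varies from run to run, so no specific number is claimed here. For level 1 the count should stay at zero, since 2501 entries fit under the limit of 2505.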

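I can imagine that the eviction threshold calculation is complex, but on my machine, increasing the maximum size by 5% (CacheBuilder.maximumSize(Math.round(MAX * 1.05))) prevented all evictions while running the benchmark.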

Answered 2018-01-26T12:27:23.667