amazon-web-services - How to determine the shard ID for a specific partition key with KCL?

Question

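The PutRecord API uses the partition key to determine a record's shard ID. Even though the PutRecord response includes the shard ID, it is not reliable, because shards can be split and records may therefore end up in a new shard. I cannot find a way to determine the shard ID for a specific partition key on the consumer side.

It seems that AWS maps the partition key to a 128-bit integer key, but I could not find the hashing algorithm explained in the documentation. What I want to do is process the records in a Kinesis stream that have a specific partition key. Those records will live in one specific shard, so I could fetch data from just that shard, but I cannot find a suitable API in the documentation.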

score 7 · Accepted Answer

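According to the documentation, the hashing algorithm used is MD5.

An MD5 hash function is used to map partition keys to 128-bit integer values and to map associated data records to shards using the hash key ranges of the shards.

See http://docs.aws.amazon.com/kinesis/latest/APIReference/API_PutRecord.html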
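To make the mapping concrete, here is a minimal, SDK-free sketch. It assumes the partition key is hashed as UTF-8 bytes and that the digest is read as an unsigned 128-bit integer. The class name HashKeyDemo, the helper names, and the evenly split two-shard layout are hypothetical and only serve the illustration.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashKeyDemo {
    // 2^128 - 1, the largest value a 128-bit hash key can take.
    static final BigInteger MAX_HASH_KEY =
            BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);

    // MD5 of the UTF-8 partition key, read as an unsigned 128-bit integer.
    static BigInteger hashKey(String partitionKey) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(partitionKey.getBytes(StandardCharsets.UTF_8));
            return new BigInteger(1, digest); // signum 1 keeps the value non-negative
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 is not available", e);
        }
    }

    // True if hash lies in the inclusive range [start, end].
    static boolean inRange(BigInteger hash, BigInteger start, BigInteger end) {
        return start.compareTo(hash) <= 0 && end.compareTo(hash) >= 0;
    }

    public static void main(String[] args) {
        // Hypothetical layout: two shards splitting the key space in half.
        BigInteger mid = BigInteger.ONE.shiftLeft(127);
        BigInteger hash = hashKey("YourKnownKey");
        boolean lower = inRange(hash, BigInteger.ZERO, mid.subtract(BigInteger.ONE));
        System.out.println(hash + " falls in the " + (lower ? "lower" : "upper") + " half");
    }
}
```

Whatever the key, the hash lands somewhere in 0 to 2^128 - 1, so it always falls inside exactly one range of a set of ranges that tiles the key space without overlap.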

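In your case, if you know the partition key for which you want to identify the appropriate shard, you need to do two things:

Compute the MD5 hash of the partition key.
Iterate over the list of shards and find the one whose hash key range contains the hash value computed in the first step.

Here are some code snippets to get you started:

MD5 hash as a BigInteger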

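// Needs java.math.BigInteger, java.nio.charset.StandardCharsets, java.security.MessageDigest
String partitionKey = "YourKnownKey";
byte[] partitionBytes = partitionKey.getBytes(StandardCharsets.UTF_8);
// getInstance throws the checked NoSuchAlgorithmException
byte[] hashBytes = MessageDigest.getInstance("MD5").digest(partitionBytes);
// Signum 1 reads the 16-byte digest as an unsigned 128-bit integer
BigInteger biPartitionKey = new BigInteger(1, hashBytes);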

Finding the shard for the partition key

// client is assumed to be an AWS SDK for Java Kinesis client; biPartitionKey is the hash computed above
Shard shardYouAreAfter = null;
String streamName = "YourStreamName";
StreamDescription streamDesc = client.describeStream(streamName).getStreamDescription();
List<Shard> shards = streamDesc.getShards();
for (Shard shard : shards) {
    // The hash key bounds are returned as decimal strings
    BigInteger startingHashKey = new BigInteger(shard.getHashKeyRange().getStartingHashKey());
    BigInteger endingHashKey = new BigInteger(shard.getHashKeyRange().getEndingHashKey());
    // Both bounds are inclusive
    if (startingHashKey.compareTo(biPartitionKey) <= 0 &&
            endingHashKey.compareTo(biPartitionKey) >= 0) {
        shardYouAreAfter = shard;
        break;
    }
}

Things can get more complicated if you have been splitting and/or merging shards. The above assumes that only active shards exist.
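One way to cope with split or merged shards is to skip closed shards during the lookup. The sketch below is SDK-free: ShardInfo is a hypothetical stand-in for the SDK's Shard type, and the sample shard IDs and sequence number are made up. It relies on the documented behaviour that an open shard has a null ending sequence number, which in the SDK would be read from the shard's sequence number range.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;

public class OpenShardLookup {
    // Hypothetical stand-in for the SDK's Shard; not an AWS type.
    static final class ShardInfo {
        final String id;
        final BigInteger start;
        final BigInteger end;
        final String endingSequenceNumber; // null while the shard is still open

        ShardInfo(String id, BigInteger start, BigInteger end, String endingSequenceNumber) {
            this.id = id;
            this.start = start;
            this.end = end;
            this.endingSequenceNumber = endingSequenceNumber;
        }
    }

    // Returns the ID of the open shard whose inclusive range covers hash, or null.
    static String findOpenShard(List<ShardInfo> shards, BigInteger hash) {
        for (ShardInfo s : shards) {
            boolean open = s.endingSequenceNumber == null;
            boolean covers = s.start.compareTo(hash) <= 0 && s.end.compareTo(hash) >= 0;
            if (open && covers) {
                return s.id;
            }
        }
        return null;
    }

    // Made-up stream: one parent that was split into two children.
    static List<ShardInfo> sampleShards() {
        BigInteger mid = BigInteger.ONE.shiftLeft(127);
        BigInteger max = BigInteger.ONE.shiftLeft(128).subtract(BigInteger.ONE);
        return Arrays.asList(
                new ShardInfo("parent", BigInteger.ZERO, max, "12345"), // closed by the split
                new ShardInfo("child-0", BigInteger.ZERO, mid.subtract(BigInteger.ONE), null),
                new ShardInfo("child-1", mid, max, null));
    }

    public static void main(String[] args) {
        // The parent also covers this hash but is closed, so a child is chosen.
        System.out.println(findOpenShard(sampleShards(), BigInteger.valueOf(5)));
    }
}
```

Records written before a split or merge stay in the closed parent shard, so a consumer that needs the full history for a key may have to read the matching closed shards as well as the open one.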
