amazon-kinesis - Kinesis shard with many producers

Question

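I need to collect data from a large number of sources (e.g., mobile phones): say 1,000 phones, each uploading a 1 MB batch every 20 minutes. I am considering a Kinesis stream with a single shard to ingest the data (total throughput is roughly 1 MB/s). Does it make sense for each phone to call the Kinesis API directly, or should I put my own front end (e.g., a web server) in front of it? What are the main limitations/considerations to keep in mind when making this decision?

PS: The alternative of using the AWS IoT infrastructure would be more expensive.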

score 3 · Accepted Answer

You should have a web service that receives the data from your clients and forwards it to Kinesis. This web service can use the Kinesis Producer Library (KPL), which offers the best performance in terms of message delivery rate, timeouts, retry policy and scalability. The KPL can create many workers and can be tuned to optimize the message rate without exceeding the write limits imposed by Kinesis shards.

Having every single client send data directly to Kinesis could be overkill in terms of performance, maintenance cost and delivery. What happens if a client starts sending data at a high rate? A shard has write limits (up to 1,000 records/s and up to 1 MB/s of data). An 'aggressive' client could generate excessive traffic, causing the shard to reject writes for a while and blocking all the other clients whose records should be stored in the same shard.
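To put numbers on this, here is a rough back-of-the-envelope sketch in Python using the figures from the question (1,000 phones, 1 MB every 20 minutes) and the shard limits quoted above. The function names are made up for illustration, and the "everyone uploads at once" burst is a worst-case assumption, not something the question states.

```python
# Shard write limits as quoted in the answer above.
SHARD_MAX_MB_PER_S = 1.0
SHARD_MAX_RECORDS_PER_S = 1000


def average_write_rate_mb_s(clients, batch_mb, interval_s):
    """Average ingest rate if uploads are spread evenly over the interval."""
    return clients * batch_mb / interval_s


def seconds_to_drain(burst_mb, limit_mb_s=SHARD_MAX_MB_PER_S):
    """Minimum time a single shard needs to absorb a burst of this size."""
    return burst_mb / limit_mb_s


if __name__ == "__main__":
    avg = average_write_rate_mb_s(clients=1000, batch_mb=1.0, interval_s=20 * 60)
    print(f"average load: {avg:.2f} MB/s (shard limit {SHARD_MAX_MB_PER_S} MB/s)")
    # Hypothetical worst case: all phones upload in the same instant.
    print(f"synchronized burst needs at least {seconds_to_drain(1000 * 1.0):.0f} s")
```

The average sits just under the 1 MB/s limit, so there is very little headroom: uploads that are not evenly spread will hit throttling, which is one more reason to put something in front of the stream that can buffer and retry.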

Moreover, think about the cost of rolling out changes across thousands of clients. What happens if you want to change the stream name? Or change the access ID/key? Or just switch from Kinesis to Kafka? You would have to manage the update of thousands of clients.

With a web server, you can hide this complexity and make any change transparent to the clients. You can run the web service directly on EC2; having the producer inside AWS should reduce network latency. Moreover, you can take advantage of all the scalability, resiliency and fault-tolerance features offered by AWS.
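As a minimal sketch of what the forwarding side of such a web service might look like, here is a Python version that uses boto3's `put_records` call instead of the KPL (the KPL itself is a Java library, so this is a swapped-in illustration, not the KPL). The names `forward_batch`, `make_kinesis_client` and `partition_key_fn`, and the retry parameters, are assumptions for this example. It retries only the entries Kinesis reports as failed and does not deal with per-record or per-request size limits, so check the PutRecords documentation before sending large payloads.

```python
import time


def make_kinesis_client(region_name):
    """Create a real Kinesis client; boto3 is imported lazily so the rest
    of this sketch can be exercised with a stand-in client."""
    import boto3
    return boto3.client("kinesis", region_name=region_name)


def forward_batch(client, stream_name, payloads, partition_key_fn,
                  max_attempts=3, backoff_s=0.2):
    """Send payloads (bytes) with PutRecords, retrying only failed entries.

    Returns the payloads that still failed after max_attempts.
    """
    pending = list(payloads)
    for attempt in range(max_attempts):
        if not pending:
            break
        entries = [{"Data": p, "PartitionKey": partition_key_fn(p)}
                   for p in pending]
        response = client.put_records(StreamName=stream_name, Records=entries)
        # Result entries come back in request order; failed ones carry
        # an ErrorCode (for example when the shard is throttling).
        pending = [p for p, result in zip(pending, response["Records"])
                   if "ErrorCode" in result]
        if pending and attempt < max_attempts - 1:
            time.sleep(backoff_s * (2 ** attempt))  # simple exponential backoff
    return pending
```

Because the stream name and credentials live only in the service's configuration, renaming the stream or rotating keys means changing one deployment rather than every phone.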
