azure-eventhub - Azure Event Hub - Process data through scala script

Question

I need to send the data in a flat file from a Unix VM to Azure Event Hub and then publish it to Azure Blob Storage.

I am able to do this using the code below:

val producer: EventHubProducerClient = new EventHubClientBuilder().connectionString(connectionString, eventHubName).buildProducerClient
val batch: EventDataBatch = producer.createBatch()

I read the content of my file line by line and pass each line to the tryAdd method:
for (fileLine <- fileContent.getLines) {
  batch.tryAdd(new EventData(fileLine))
}

// send the batch of events to the event hub
//producer.send(batch)

// close the producer
producer.close()

My file has about 1000 records. For these, Event Hub created about 12 requests (the number seems random).

I am trying to understand on what basis Event Hub creates the requests, and whether there is a way I can control it.

Any information on this would be very helpful.

score 1 · Accepted Answer

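Each publish operation to the Event Hubs service is limited to a certain number of bytes, which is governed by the tier of the Event Hubs namespace. You can see the quotas for each tier in the Event Hubs documentation.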

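Each event that you add to the batch is measured against that limit when tryAdd is called. If the event cannot safely fit in the batch, tryAdd returns false. At that point the batch may be full, or it may have some capacity remaining; any remaining capacity is simply not enough to hold the full size of the specific event that was passed.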

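In addition to the size of the payload (fileLine in this case), diagnostic metadata and the packaging of the batch add some size overhead, which affects the final size of each event and the capacity of the batch. Depending on how consistent the size of your fileLine is once serialized for transport, you may see batches of a consistent size, or you may see variation in the number of events that fit into a single batch.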

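The number of send calls needed is proportional to the number of batches needed to hold all of your fileLine events. Each call to send can publish one batch, because the traffic for that call is bound by the byte size limit enforced by the service.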
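To make the relationship between event size and the number of requests concrete, here is a rough simulation in plain Scala that does not touch the SDK. It greedily packs lines into batches the way a tryAdd/send loop would. The names countBatches, maxBatchBytes and perEventOverhead are made up for illustration, and the numbers used are not the real service limit or the real per-event overhead.

```scala
import java.nio.charset.StandardCharsets

object BatchCountSketch {
  // Greedily packs lines into batches, mimicking a tryAdd / send cycle.
  // maxBatchBytes and perEventOverhead are hypothetical illustration values,
  // not the actual service limit or the actual serialization overhead.
  def countBatches(lines: Seq[String], maxBatchBytes: Int, perEventOverhead: Int): Int = {
    var batches = 0 // number of "send" calls so far
    var used = 0    // bytes consumed in the current batch
    for (line <- lines) {
      val size = line.getBytes(StandardCharsets.UTF_8).length + perEventOverhead
      require(size <= maxBatchBytes, "event is too large even for an empty batch")
      if (used + size > maxBatchBytes) {
        // This is where tryAdd would return false:
        // send the current batch and start a new one.
        batches += 1
        used = 0
      }
      used += size
    }
    if (used > 0) batches += 1 // send the final, partially filled batch
    batches
  }
}
```

For example, with 1000 lines of 100 bytes each, an assumed overhead of 20 bytes per event and an assumed limit of 10000 bytes, 83 events fit per batch, so BatchCountSketch.countBatches(Seq.fill(1000)("x" * 100), 10000, 20) gives 13. Shorter or longer lines change that count, which is why the number of requests can look arbitrary.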

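I realize that the snippet in your question may be for illustration only, but I do want to mention that you are ignoring the return value of tryAdd, which I would strongly advise against. If the batch is full, the tryAdd call does not fail; it returns false. If you ignore the return value, you may not realize that the event was not accepted into the batch. This often leads to data loss, because the event is not in the batch but the application believes it is and moves on.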
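A minimal sketch of a loop that checks the return value might look like the following. It assumes the azure-messaging-eventhubs Java library that the question already uses and a producer built as in the question. The SendAllLines object and the sendAll helper are hypothetical names of mine, and the error handling is only one possible choice.

```scala
import com.azure.messaging.eventhubs.{EventData, EventDataBatch, EventHubProducerClient}

object SendAllLines {
  // Sends every line as one event and returns how many send calls were made.
  // sendAll is a hypothetical helper, not part of the SDK.
  def sendAll(producer: EventHubProducerClient, lines: Iterator[String]): Int = {
    var batch: EventDataBatch = producer.createBatch()
    var sends = 0
    for (fileLine <- lines) {
      val event = new EventData(fileLine)
      if (!batch.tryAdd(event)) {
        // The current batch cannot hold this event: publish what it has
        // (if anything) and start a new batch.
        if (batch.getCount > 0) {
          producer.send(batch)
          sends += 1
        }
        batch = producer.createBatch()
        if (!batch.tryAdd(event)) {
          // Even an empty batch is too small for this single event.
          throw new IllegalArgumentException(
            s"Event exceeds the maximum batch size of ${batch.getMaxSizeInBytes} bytes")
        }
      }
    }
    // Publish whatever is left in the last, partially filled batch.
    if (batch.getCount > 0) {
      producer.send(batch)
      sends += 1
    }
    sends
  }
}
```

Called as SendAllLines.sendAll(producer, fileContent.getLines) before producer.close(), this publishes every line, and the returned count should correspond to the number of requests you see for the file.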
