
A month ago I tried to use F# agents to process and record Twitter Streaming API data here. As a little exercise, I am trying to move the code to Windows Azure.

So far I have two roles:

  • One worker role (Publisher) that puts messages (a message being the JSON of a tweet) onto a queue.

  • One worker role (Processor) that reads messages from the queue, decodes the JSON, and dumps the data into a cloud table.

Which leads to lots of questions:

  • Is it okay to think of a worker role as an agent?
  • In practice the message can be larger than 8 KB, so I am going to need to use blob storage and pass the reference to the blob as the message (or is there another way?). Will that impact performance?
  • Is it correct to say that, if needed, I can increase the number of instances of the Processor worker role and the queue will magically be processed faster?

Sorry for piling on all these questions; I hope you don't mind.

Thanks a lot!


3 Answers


There is an open-source library called Lokad.Cloud that handles large messages transparently. You can check it out at http://code.google.com/p/lokad-cloud/

Answered 2011-02-14T06:07:14.703
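Is it okay to think of a worker role as an agent?

Yes, absolutely.

In practice the message can be larger than 8 KB, so I am going to need to use blob storage and pass the reference to the blob as the message (or is there another way?). Will that impact performance?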

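Yes. The technique you describe (saving the JSON to blob storage under a name such as "JSONMessage-1", then sending a queue message whose content is "JSONMessage-1") seems to be the standard way of passing messages larger than 8 KB in Azure. It will be slower, because you make 4 calls to Azure storage instead of 2 (1 to get the queue message, 1 to get the blob contents, 1 to delete from the queue, 1 to delete the blob). Will it be noticeably slower? Probably not. If a large share of your messages stay under 8 KB once Base64 encoded (the encoding is a gotcha in the StorageClient library), you can add some logic that decides per message how to send it.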
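As a rough illustration of that per-message decision, here is a minimal sketch in Python. It does not call Azure: `blob_store` and `message_queue` are in-memory stand-ins, and the envelope format, `send`, `receive`, and the UUID-based blob names are all assumptions made for this example. Only the 8 KB limit and the Base64 step come from the discussion above.

```python
import base64
import json
import uuid

QUEUE_LIMIT_BYTES = 8 * 1024  # the 8 KB queue message limit discussed above

# In-memory stand-ins for Azure storage (hypothetical, for illustration only).
blob_store = {}       # blob name -> raw bytes
message_queue = []    # queue messages, oldest first


def send(payload: bytes) -> None:
    """Send inline when the Base64 form fits, otherwise via a blob reference."""
    encoded = base64.b64encode(payload).decode("ascii")
    envelope = json.dumps({"kind": "inline", "body": encoded})
    if len(envelope.encode("utf-8")) > QUEUE_LIMIT_BYTES:
        # Too large once encoded: store the payload and queue only its name.
        blob_name = "JSONMessage-" + uuid.uuid4().hex
        blob_store[blob_name] = payload
        envelope = json.dumps({"kind": "blob", "name": blob_name})
    message_queue.append(envelope)


def receive() -> bytes:
    """Take the next message, resolving and deleting a referenced blob if any."""
    envelope = json.loads(message_queue.pop(0))  # get + delete the queue message
    if envelope["kind"] == "inline":
        return base64.b64decode(envelope["body"])
    return blob_store.pop(envelope["name"])      # get + delete the blob
```

Note that the size check runs on the encoded form, so a 7,000-byte payload already takes the blob path even though its raw size is under 8 KB.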

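Is it correct to say that, if needed, I can increase the number of instances of the Processor worker role and the queue will magically be processed faster?

As long as you have written your worker role so that it is self-contained and the instances don't get in each other's way, then yes, increasing the instance count will increase throughput. If your role mainly just reads from and writes to storage, you may benefit from multi-threading the worker role first, before increasing the instance count, which will save money.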
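A minimal sketch of the multi-threaded variant, again in Python with an in-process `queue.Queue` standing in for the Azure queue. `run_workers` and `handle` are hypothetical names for this example; a real role would also need retry and poison-message handling, which is left out here.

```python
import queue
import threading


def run_workers(messages, handle, worker_count):
    """Drain `messages` with `worker_count` threads; return the handled results."""
    pending = queue.Queue()
    for message in messages:
        pending.put(message)

    results = []
    results_lock = threading.Lock()

    def worker():
        while True:
            try:
                message = pending.get_nowait()  # each message goes to exactly one worker
            except queue.Empty:
                return                          # queue drained, worker exits
            outcome = handle(message)
            with results_lock:                  # workers share no other state
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because the workers only share the queue and the locked result list, the same structure carries over to several role instances reading from one queue.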

Answered 2010-09-13T21:51:18.363

Is it okay to think of a worker role as an agent?

This is the perfect way to think of it. Imagine the workers at McDonald's. Each worker has certain tasks and they communicate with each other via messages (spoken).

In practice the message can be larger than 8 KB so I am going to need to use a blob storage and pass as message the reference to the blob (or is there another way?), will that impact performance?

As long as the message is immutable, this is the best way to do it. Strings can be very large and are therefore allocated on the heap. Since they are immutable, passing references around is not an issue.

Is it correct to say that if needed I can increase the number of instances of the Processor worker role, and the queue will magically be processed faster?

You need to look at what your process is doing and decide whether it is IO bound or CPU bound. Typically, IO-bound processes will see an increase in performance from adding more agents. If you are using the ThreadPool for your agents, the work will be balanced quite well even for CPU-bound processes, but you will hit a limit. That being said, don't be afraid to mess around with your architecture and MEASURE the results of each run. This is the best way to find the right number of agents to use.
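One way to do that measuring, as a minimal sketch in Python: time the same batch of work with different pool sizes and compare. `fake_io_task` and its 20 ms sleep are assumptions standing in for a storage round trip; swap in the real work to get meaningful numbers.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_task(_):
    """Stand-in for one storage round trip (hypothetical 20 ms latency)."""
    time.sleep(0.02)
    return 1


def measure(worker_count, task_count=40):
    """Return (completed tasks, elapsed seconds) for a given pool size."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        completed = sum(pool.map(fake_io_task, range(task_count)))
    return completed, time.perf_counter() - started


if __name__ == "__main__":
    for workers in (1, 2, 4, 8):
        completed, elapsed = measure(workers)
        print(f"{workers} workers: {completed} tasks in {elapsed:.2f}s")
```

For a task that only waits, as this one does, elapsed time should drop as workers are added; a CPU-bound task would level off much sooner.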

Answered 2010-09-13T18:34:31.150