c# - StreamInsight performance issue

Question

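I'm using StreamInsight 2.1 and have run into an unexpected performance problem.

I have a financial data input adapter that feeds in 5,000 to 10,000 events per second. I then have a large number of queries against that input. Each query is bound to the exact same pass-through query, so I have 1,000 queries consuming exactly the same input data.

To test whether the system can handle this, I created 1,000 queries that do nothing except pass events through (from d in fullStream select d) to an output adapter that simply releases the events.

When I run 1,000 queries this way, the system can't keep up with the data stream; it falls further and further behind. If I cut it down to 100 queries, the system runs perfectly.

Have I simply hit a performance wall with StreamInsight? Can it not handle the kind of solution I'm building? Or am I doing something stupid here... Any help would be great; I don't know what else I can do to make it faster. I need it to run more than 1,000 queries, and I need to run queries that are more complex than this.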

score 0 · Accepted Answer

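This really does sound like a scale-out problem. You have established that you can run 100 queries on one server without any problems. Then, in your comments on other answers, you talk about tens of thousands of customers adding thousands of queries. With that many customers, I suspect you will be able to afford new servers to serve that customer base.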

So increase throughput by spreading the load, perhaps through (I don't know) some form of distributed computing?
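A minimal sketch of what spreading the load could look like, assuming (hypothetically) that each server comfortably hosts about 100 queries, the figure the asker observed with pass-through queries, and that all of one client's queries are routed to the same server. ScaleOutPlanner, ServersNeeded and ServerFor are hypothetical names, not StreamInsight APIs.

```csharp
using System;

// Hypothetical helper for planning a scale-out; nothing here is a StreamInsight API.
public static class ScaleOutPlanner
{
    // Number of servers needed if each one hosts at most queriesPerServer queries
    // (assumption: ~100, based on the pass-through test in the question).
    public static int ServersNeeded(int totalQueries, int queriesPerServer)
    {
        if (queriesPerServer <= 0) throw new ArgumentOutOfRangeException(nameof(queriesPerServer));
        return (totalQueries + queriesPerServer - 1) / queriesPerServer; // ceiling division
    }

    // Picks a server for a client so that all of that client's queries land together.
    // Uses a stable FNV-1a hash instead of string.GetHashCode(), which is
    // randomized per process on .NET Core and would reshuffle clients on restart.
    public static int ServerFor(string clientId, int serverCount)
    {
        if (serverCount <= 0) throw new ArgumentOutOfRangeException(nameof(serverCount));
        uint hash = 2166136261u; // FNV-1a offset basis
        foreach (char c in clientId)
        {
            hash ^= c;
            hash *= 16777619u; // FNV prime; wraps on overflow (unchecked by default)
        }
        return (int)(hash % (uint)serverCount);
    }
}
```

With 1,000 queries at roughly 100 per server, ServersNeeded(1000, 100) gives 10 servers; a real deployment would also have to balance by actual query cost, not just query count.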

score 0 · Accepted Answer

I think you may be having performance issues because of your current approach.

First off, let's cover the differences between the editions of StreamInsight. Standard edition has only one scheduler thread, while Premium has one per core. The Evaluation edition is equivalent to Premium.

I think the way to fix this is to reduce the number of queries you have. If you are creating 1000 queries (each with its own instance of an output adapter), I can see where you are going to have issues. On a quad-core machine, you will have 4 scheduler threads trying to run 1000 queries.
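The load described in the question can be put into rough numbers. This is a back-of-the-envelope sketch, assuming queries are spread evenly across scheduler threads (StreamInsight does not promise that) and that every pass-through query sees every input event; SchedulerLoad and its methods are hypothetical helpers.

```csharp
using System;

// Rough load figures for the setup in the question; not a StreamInsight API.
public static class SchedulerLoad
{
    // Standard edition: 1 scheduler thread. Premium (and Evaluation): one per core.
    public static int SchedulerThreads(bool premium, int cores) => premium ? cores : 1;

    // Queries each scheduler thread must service, assuming an even spread.
    public static int QueriesPerThread(int queries, int threads) =>
        (queries + threads - 1) / threads; // ceiling division

    // Each pass-through query receives every input event,
    // so total event deliveries grow linearly with the number of queries.
    public static long DeliveriesPerSecond(long eventsPerSecond, int queries) =>
        eventsPerSecond * queries;
}
```

For Premium on four cores at the question's peak of 10,000 events per second, this gives 250 queries per scheduler thread and 10,000,000 event deliveries per second with 1,000 queries, against 1,000,000 with 100 queries.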

Are the queries you have arranged "horizontally" all doing the same thing? If so, see if you can consolidate them. For instance, if I needed to run a query like "Price>5 Vol<2k" for 5 different stocks, I would write it so that a single standing query handles all 5 stocks and sends all the results to one output adapter. If a client is "subscribing" to results from a query, that's something that can (and should) be handled by your output adapter. You could also turn results on and off for individual stocks by streaming in reference data.
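A sketch of that split in plain C#, outside StreamInsight: one consolidated filter runs once over the ticks of every symbol, and a router standing in for the output adapter forwards each result only to the clients subscribed to that symbol. StockTick, SubscriptionRouter and ConsolidatedQuery are hypothetical names, and reading "Price>5 Vol<2k" as price above 5 and volume below 2,000 is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event payload, not a StreamInsight type.
public sealed record StockTick(string StockSymbol, decimal Price, long Volume);

// Stand-in for the subscription logic an output adapter could own:
// it maps each symbol to the clients that want results for it.
public sealed class SubscriptionRouter
{
    private readonly Dictionary<string, List<string>> _subscribers = new();

    public void Subscribe(string clientId, string symbol)
    {
        if (!_subscribers.TryGetValue(symbol, out var clients))
        {
            clients = new List<string>();
            _subscribers[symbol] = clients;
        }
        if (!clients.Contains(clientId)) clients.Add(clientId);
    }

    // Lazily yields (client, tick) pairs; a real adapter would send them instead.
    public IEnumerable<(string Client, StockTick Tick)> Route(StockTick tick) =>
        _subscribers.TryGetValue(tick.StockSymbol, out var clients)
            ? clients.Select(c => (Client: c, Tick: tick))
            : Enumerable.Empty<(string Client, StockTick Tick)>();
}

public static class ConsolidatedQuery
{
    // One filter applied to ticks of all symbols, instead of one query per stock or per client.
    public static IEnumerable<StockTick> Price5Vol2k(IEnumerable<StockTick> ticks) =>
        ticks.Where(t => t.Price > 5m && t.Volume < 2000);
}
```

Adding a client then means one more dictionary entry rather than one more standing query.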

Take a look at the sample below. "sourceStream" is my raw stock data coming from the data source. "referenceStream" is configuration streamed in from a reference data source (e.g. SQL). Whether the join succeeds or fails determines which events get passed on for further processing.

// Only ticks whose symbol is present in the reference data get through the join.
var myPrice5Vol2kSourceStream = from s in sourceStream
                                join r in referenceStream
                                    on s.StockSymbol equals r.StockSymbol
                                select s;
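The throttling effect of that join can be tried out with LINQ to Objects. This is only an in-memory analogue, assuming simple Tick and ReferenceRow records (hypothetical stand-ins for the real payload types); it ignores the event lifetimes that a StreamInsight join also matches on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical payload types standing in for the real stock and reference events.
public sealed record Tick(string StockSymbol, decimal Price);
public sealed record ReferenceRow(string StockSymbol);

public static class ReferenceFilter
{
    // Same join shape as the StreamInsight query above, run over in-memory sequences:
    // a tick is kept only if its symbol appears in the reference data.
    public static List<Tick> Apply(IEnumerable<Tick> sourceStream, IEnumerable<ReferenceRow> referenceStream) =>
        (from s in sourceStream
         join r in referenceStream on s.StockSymbol equals r.StockSymbol
         select s).ToList();
}
```

In this LINQ version a tick is emitted once per matching reference row, so each symbol should appear only once in the reference data.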

score 0 · Accepted Answer

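Every query needs a thread to execute. You have 1,000 queries. So how many threads do you need? Right. In practice, StreamInsight uses a thread pool to limit the number of threads it creates. So... you will have a limited number of threads to execute your queries, and you end up spending more time context switching than actually executing queries.

I don't see why you would even need 1,000 queries. We have built applications that take 100+ sensors from multiple sources and analyze them together... and get over 100,000 events/second. Ultimately, it is your application's poor design, not poor StreamInsight performance, that is causing the problem.

You really need to spend some time rethinking how you're approaching this. However you slice it, your current approach is going to cause you problems. And... think about this... does each input adapter create its own thread to listen for inbound events and enqueue them? How many threads do you think that adds?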
