I am working on a project in which I want to perform data acquisition, data processing, and GUI visualization (using PyQt with pyqtgraph), all in Python. Each part is implemented in principle, but the parts are not well separated, which makes it difficult to benchmark and improve performance. So the question is:

Is there a good way to pass large amounts of data between different parts of a program?

I think of something like the following scenario:

  • Acquisition: get data from one or more devices and store it in a data container that can be accessed from elsewhere. (This part should be able to run without the processing and visualization parts. It is time critical, as I don't want to lose data points!)
  • Processing: take data from the data container, process it, and store the results in another data container. (This part should also be able to run without the GUI, and with a delay after the acquisition, e.g. to process data that I recorded last week.)
  • GUI/visualization: take acquired and processed data from the containers and visualize it.
  • Saving data: I want to be able to store/stream certain parts of the data to disk.

When I say "large amounts of data", I mean that I get arrays with approximately 2 million data points (16-bit) per second that need to be processed and possibly also stored.

Is there a framework for Python that I can use to handle this amount of data properly? Maybe in the form of a data server that I can connect to.

1 Answer

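How much data?

In other words, are you acquiring so much data that you cannot keep all of it in memory while you need it?

For example, some measurements generate so much data that the only way to handle them is after the fact:

  1. Acquire the data to storage (usually RAID0)
  2. Post-process the data
  3. Analyze the results
  4. Select and archive subsets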
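To judge which case applies, a back-of-the-envelope estimate helps. The sketch below uses the figures from the question (about 2 million 16-bit points per second); the 8 GB memory budget and the helper names are hypothetical and only there for illustration.

```python
# Figures taken from the question.
SAMPLES_PER_SECOND = 2_000_000  # roughly 2 million points per second
BYTES_PER_SAMPLE = 2            # 16-bit samples

def data_rate_bytes(samples_per_second, bytes_per_sample):
    """Raw data rate in bytes per second."""
    return samples_per_second * bytes_per_sample

def seconds_until_full(ram_budget_bytes, rate_bytes_per_second):
    """How long raw data can accumulate before a memory budget is used up."""
    return ram_budget_bytes / rate_bytes_per_second

rate = data_rate_bytes(SAMPLES_PER_SECOND, BYTES_PER_SAMPLE)
print(f"{rate / 1e6:.1f} MB/s")                # 4.0 MB/s
print(f"{rate * 3600 / 1e9:.1f} GB per hour")  # 14.4 GB per hour

# Hypothetical budget: 8 GB of RAM reserved for raw data.
RAM_BUDGET = 8 * 10**9
print(f"{seconds_until_full(RAM_BUDGET, rate) / 60:.0f} min until full")  # 33 min until full
```

At this rate, short recordings fit in memory, while long ones probably need to be streamed to disk.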

Small data

If your computer system can keep pace with the generation of data, you can use a separate Python queue between each stage.
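A minimal sketch of that idea, assuming each stage runs in its own thread and hands blocks to the next stage through a `queue.Queue`. The function names, the `STOP` sentinel, and the placeholder "device read" are made up for illustration; real acquisition code would go where the dummy block is created.

```python
import queue
import threading

STOP = object()  # hypothetical sentinel that tells the next stage to shut down

def acquire(out_q, n_blocks):
    """Acquisition stage: push raw blocks into the queue."""
    for i in range(n_blocks):
        block = [i] * 4          # placeholder for a real device read
        out_q.put(block)
    out_q.put(STOP)

def process(in_q, out_q):
    """Processing stage: consume raw blocks, emit one result per block."""
    while True:
        block = in_q.get()       # blocks until data is available
        if block is STOP:
            out_q.put(STOP)      # pass the shutdown signal downstream
            break
        out_q.put(sum(block))    # placeholder for real processing

def run_pipeline(n_blocks):
    raw_q = queue.Queue()        # acquisition -> processing
    result_q = queue.Queue()     # processing -> GUI / storage
    threads = [
        threading.Thread(target=acquire, args=(raw_q, n_blocks)),
        threading.Thread(target=process, args=(raw_q, result_q)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    results = []
    while True:
        item = result_q.get()
        if item is STOP:
            break
        results.append(item)
    return results

print(run_pipeline(3))  # [0, 4, 8]
```

Because each stage only sees its input and output queue, it can be run and benchmarked on its own. If the processing is CPU-bound, threads may not be enough because of the GIL; `multiprocessing.Queue` offers a similar `put`/`get` interface across processes.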

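Big data

If your measurements create more data than your system can consume, you should start by defining a few tiers (maybe just two) of how important your data is:

  • lossless: if a single point is missing, you might as well start over
  • lossy: if points or a set of data are missing, it is no big deal; just wait for the next update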

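One analogy might be a video stream:

  • lossless: gold masters for archival
  • lossy: YouTube, Netflix, or Hulu might drop a few frames, but your experience does not suffer significantly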

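From your description, the acquisition and processing must be lossless, while the GUI/visualization can be lossy.

For lossless data, you should use queues. For lossy data, you can use deques.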
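The difference can be sketched with the standard library, assuming "deque" here means a bounded `collections.deque` (the `maxlen` of 3 is an arbitrary value chosen for illustration). An unbounded `queue.Queue` keeps every item until it is consumed, whereas a bounded deque silently discards the oldest items once it is full.

```python
import queue
from collections import deque

lossless = queue.Queue()   # unbounded: every item stays until it is consumed
lossy = deque(maxlen=3)    # bounded: appending to a full deque drops the oldest item

for sample in range(10):
    lossless.put(sample)
    lossy.append(sample)

print(lossless.qsize())    # 10
print(list(lossy))         # [7, 8, 9]
```

For a plot that only needs the most recent data, the deque behaviour is usually what you want: a slow GUI never makes the acquisition side wait or pile up memory.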

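Design

Regardless of your data container, here are three different ways to connect your stages:

  1. Producer-consumer: P-C mimics a FIFO. One actor generates data and another consumes it. You can build a chain of producers/consumers to accomplish your goal.
  2. Observer: while P-C is typically one-to-one, the observer pattern can also be one-to-many. If you need multiple actors to react when one source changes, the observer pattern gives you that capability.
  3. Mediator: mediators are usually many-to-many. If each actor can cause the others to react, then all of them can coordinate through the mediator.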
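To show how the one-to-many case differs from a plain queue hand-off, here is a minimal observer sketch. The `Source` class and its `subscribe`/`publish` methods are hypothetical names, not part of any library; the two subscribers stand in for a plot and a disk writer.

```python
class Source:
    """Hypothetical data source that notifies every subscriber of each new block."""

    def __init__(self):
        self._observers = []

    def subscribe(self, callback):
        self._observers.append(callback)

    def publish(self, block):
        # One-to-many: every registered observer receives the same block.
        for callback in self._observers:
            callback(block)

plotted = []  # stands in for the GUI
saved = []    # stands in for a disk writer

source = Source()
source.subscribe(plotted.append)
source.subscribe(lambda block: saved.append(sum(block)))

source.publish([1, 2, 3])
source.publish([4, 5])

print(plotted)  # [[1, 2, 3], [4, 5]]
print(saved)    # [6, 9]
```

Note that the callbacks here run synchronously in the publisher, so a slow observer would stall the source; that is one reason a queue between stages tends to fit time-critical acquisition better.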

It seems you only need a 1-1 relationship between each stage, so a producer-consumer design looks like a good fit for your application.

Answered 2015-01-24T02:21:38.970