9

Apache Apex看起来与Apache Storm相似。

  • 用户在两个平台上将应用程序/拓扑构建为有向无环图 (DAG)。Apex 使用操作符/流,Storm 使用 spouts/streams/bolts。
  • 它们都实时处理数据,而不是批处理。
  • 两者似乎都具有高吞吐量和低延迟

所以,乍一看,两者看起来很相似,我并没有完全理解差异。有人可以解释一下主要区别是什么吗?换句话说,我什么时候应该使用一个而不是另一个?

4

2 回答 2

3

架构存在根本差异,这使得每个平台在延迟、扩展和状态管理方面都非常不同。

在最基本的层面上,

  1. Apache Storm 使用记录确认来保证消息传递。
  2. Apache Apex 使用检查点来保证消息传递。

您可以在以下博客中了解更多差异,其中还包括其他主流处理平台。

https://databaseline.wordpress.com/2016/03/12/an-overview-of-apache-streaming-technologies/

于 2016-04-14T23:21:00.917 回答
3

架构和特点

+-------------------+---------------------------+---------------------+
|                   |           Storm           |         Apex        |
+-------------------+---------------------------+---------------------+
| Model             | Native Streaming          | Native Streaming    |
|                   | Micro batch (Trident      |                     |
+-------------------+---------------------------+---------------------+
| Language          | Java.                     | Java (Scala)        |
|                   | Ability to use non        |                     |
|                   | JVM languages support     |                     |
+-------------------+---------------------------+---------------------+
| API               | Compositional             | Compositional (DAG) |
|                   | Declarative (Trident)     | Declarative         |
|                   | Limited SQL               |                     |
|                   | support (Trident)         |                     |
+-------------------+---------------------------+---------------------+
| Locality          | Data Locality             | Advance Processing  |
+-------------------+---------------------------+---------------------+
| Latency           | Low                       | Very Low            |
|                   | High (Trident)            |                     |
+-------------------+---------------------------+---------------------+
| Throughput        | Limited in Ack mode       | Very high           |
+-------------------+---------------------------+---------------------+
| Scalibility       | Limited due to Ack        | Horizontal          |
+-------------------+---------------------------+---------------------+
| Partitioning      | Standard                  | Advance             |
|                   | Set parallelism at work,  | Parallel pipes,     |
|                   | executor and task level   | unifiers            |
+-------------------+---------------------------+---------------------+
| Connector Library | Limited (certification)   | Rich library of     |
|                   |                           | connectors in       |
|                   |                           | Apex Malhar         |
+-------------------+---------------------------+---------------------+

可操作性

+------------+--------------------------+---------------------+
|            |           Storm          |         Apex        |
+------------+--------------------------+---------------------+
| State      | External store           | Checkpointing       |
| Management | Limited checkpointing    | Local checkpointing |
|            | Difficult to exploit     |                     |
|            | local state              |                     |
+------------+--------------------------+---------------------+
| Recovery   | Cumbersome API to        | Incremental         |
|            | store and retrieve state | (buffer server)     |
|            | Require user code        |                     |
+------------+--------------------------+---------------------+
| Processing | At least once            |                     |
| Semantic   | Exactly once require     | At least once       |
|            | user code and affect     | End to end          |
|            | latency                  |                     |
|            |                          | exactly once        |
+------------+--------------------------+---------------------+
| Back       | Watermark on queue       | Automatic           |
| Pressure   | size for spout and bolt  | Buffer server       |
|            | Does not scale           | memory and disk     |
+------------+--------------------------+---------------------+
| Elasticity | Through CLI only         | Yes w/ full user    |
|            |                          | control             |
+------------+--------------------------+---------------------+
| Dynamic    | No                       | Yes                 |
| topology   |                          |                     |
+------------+--------------------------+---------------------+
| Security   | Kerberos                 | Kerberos, RBAC,     |
|            |                          | LDAP                |
+------------+--------------------------+---------------------+
| Multi      | Mesos, RAS - memory,     | YARN                |
| Tenancy    | CPU, YARN                | full isolation      |
+------------+--------------------------+---------------------+
| DevOps     | REST API                 | REST API            |
| Tools      | Basic UI                 | DataTorrent RTS     |
+------------+--------------------------+---------------------+

资料来源:网络研讨会:Apache Apex(下一代 Hadoop)与 Storm - 比较和迁移大纲https://www.youtube.com/watch?v=sPjyo2HfD_I

于 2017-02-18T02:56:12.927 回答