database-performance - Strange performance problem with UPDATE in ArangoDB

Question

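I am creating a Node.js application that uses ArangoDB as its data store. Basically, my data structure consists of two tables, one for managing so-called instances, the other for entities. What I do is the following:

For every instance I have, there is a document in the instances collection.
Whenever I add an entity to the entities collection, I also want to keep track of the entities that belong to a specific instance.
So every instance document has an array field for entities, and I push the ID of the entity into that array.

The following code shows the general outline: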

// Connect to ArangoDB.
db = new Database(...);
db.useBasicAuth(user, password);

// Use the database.
await db.createDatabase(database);
db.useDatabase(database);

// Create the instance collection.
instanceCollection = db.collection(`instances-${uuid()}`);
await instanceCollection.create();

// Create the entities collection.
entityCollection = db.collection(`entities-${uuid()}`);
await entityCollection.create();

// Setup an instance.
instance = {
  id: uuid(),
  entities: []
};

// Create the instance in the database.
await db.query(aql`
  INSERT ${instance} INTO ${instanceCollection}
`);

// Add lots of entities.
for (let i = 0; i < scale; i++) {
  // Setup an entity.
  const entity = {
    id: uuid()
  };

  // Update the instance.
  instance.entities.push(entity);

  // Insert the entity in the database.
  await db.query(aql`
    INSERT ${entity} INTO ${entityCollection}
  `);

  // Update the instance in the database.
  await db.query(aql`
    FOR i IN ${instanceCollection}
      FILTER i.id == ${instance.id}
      UPDATE i WITH ${instance} IN ${instanceCollection} OPTIONS { mergeObjects: false }
  `);
}

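The problem now is that this becomes extremely slow the more entities I add. It basically grows exponentially, although I would have expected linear growth: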

Running benchmark 'add and update'
  100 Entities:   348.80ms [+0.00%]
 1000 Entities:  3113.55ms [-10.74%]
10000 Entities: 90180.18ms [+158.54%]

Adding an index has an effect, but it does not change anything about the overall problem:

Running benchmark 'add and update with index'
  100 Entities:   194.30ms [+0.00%]
 1000 Entities:  2090.15ms [+7.57%]
10000 Entities: 89673.52ms [+361.52%]

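The problem can be tracked down to the UPDATE statement. If you leave it out and only use the INSERT statement, things scale linearly. So something seems to be wrong with the update itself. However, I don't understand where the problem is.

This is what I would like to understand: why does the UPDATE statement get significantly slower over time? Am I using it wrong? Is this a known problem in ArangoDB? …?

What I am not interested in is discussing this approach: please take it as given. Let's focus on the performance of the UPDATE statement. Any ideas?

Update

As requested in the comments, here is some information on the system setup:

ArangoDB 3.4.6, 3.6.2.1, and 3.7.0-alpha.2 (all running in Docker, on macOS and Linux)
Single-server setup
ArangoJS 6.14.0 (we also saw this with earlier versions, although I cannot tell the exact version)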

score 2 · Accepted Answer

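Discovering the problem

Have you tried explaining or profiling the query?

Arango's explain plan descriptions are excellent. You can get the explain output using the built-in Aardvark web admin interface or using db._explain(query). Here is what yours looks like: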

Execution plan:
 Id   NodeType                  Est.   Comment
  1   SingletonNode                1   * ROOT
  5   CalculationNode              1     - LET #5 = { "_key" : "123", "_id" : "collection/123", "_rev" : "_aQcjewq---", ...instance }   /* json expression */   /* const assignment */
  2   EnumerateCollectionNode      2     - FOR i IN collection   /* full collection scan, projections: `_key`, `id` */   FILTER (i.`id` == "1")   /* early pruning */
  6   UpdateNode                   0       - UPDATE i WITH #5 IN pickups 

Indexes used:
 By   Name      Type      Collection   Unique   Sparse   Selectivity   Fields       Ranges
  6   primary   primary   pickups      true     false       100.00 %   [ `_key` ]   i

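The problem

The key part of the plan is - FOR i IN collection /* full collection scan. A full collection scan will be... slow. Its cost should grow linearly with the size of your collection. So with your for loop of scale iterations, the total running time grows quadratically with the size of the collection (much faster than linear, though not strictly exponential).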
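To make that arithmetic concrete, here is a minimal sketch in plain JavaScript that does not touch a database at all. It only counts "documents visited" under two assumed cost models, and it assumes the scanned collection gains one document per loop iteration. The function names are made up for illustration, and the numbers are not ArangoDB measurements.

```javascript
// Illustrative cost model only: counts documents visited, not real ArangoDB work.
// Assumption: the scanned collection holds one more document after each iteration.

// Full collection scan: every update touches all documents currently stored.
function visitsWithFullScan(scale) {
  let visits = 0;
  for (let size = 1; size <= scale; size++) {
    visits += size;
  }
  return visits; // equals scale * (scale + 1) / 2, i.e. quadratic in scale
}

// Primary index lookup: every update touches exactly one document.
function visitsWithKeyLookup(scale) {
  let visits = 0;
  for (let size = 1; size <= scale; size++) {
    visits += 1;
  }
  return visits; // equals scale, i.e. linear in scale
}

for (const scale of [100, 1000, 10000]) {
  console.log(scale, visitsWithFullScan(scale), visitsWithKeyLookup(scale));
}
```

Under this model, multiplying scale by 10 multiplies the full-scan work by roughly 100, while the key-lookup work only grows by a factor of 10.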

Solution

Indexing id should help, but I think it depends on how you create the index.
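As a rough sketch of one way to create such an index from Node.js, assuming the arangojs 6.x collection API mentioned in the question (createHashIndex) and a hypothetical helper name: other major versions of arangojs expose a different index API, so check the documentation for the version you run. The collection is passed in as a parameter, so nothing here connects to a server by itself.

```javascript
// Sketch only. `instanceCollection` is assumed to be an arangojs 6.x
// collection object; `ensureUniqueIdIndex` is a hypothetical helper name.
async function ensureUniqueIdIndex(instanceCollection) {
  // Hash index on the application-level `id` attribute. Marking it unique
  // assumes that no two instance documents ever share the same id.
  return instanceCollection.createHashIndex(['id'], { unique: true });
}
```

After creating the index, it is worth running explain on the query again to check that the plan no longer reports a full collection scan.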

Using _key instead of an index on id changes the plan to use the primary index:

- FOR i IN pickups   /* primary index scan, index only, projections: `_key` */

This lookup should take constant time, so with your for loop of scale iterations, this should mean linear time overall.
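A minimal sketch of what the keyed variant could look like, assuming the application-level id is stored as the document's _key when the instance is inserted. The helper names are hypothetical; db is assumed to be an arangojs Database and aql its template tag, and both are passed in explicitly so the helpers carry no hidden dependencies.

```javascript
// Sketch only: `db` is assumed to be an arangojs Database, `aql` its template tag.

// Store the application-level id as the document key when creating the instance.
async function insertInstance(db, aql, instanceCollection, instance) {
  const doc = { ...instance, _key: instance.id };
  return db.query(aql`
    INSERT ${doc} INTO ${instanceCollection}
  `);
}

// Address the document directly by key: no FOR loop and no FILTER are needed,
// so the lookup goes through the primary index.
async function updateInstance(db, aql, instanceCollection, instance) {
  return db.query(aql`
    UPDATE ${instance.id} WITH ${instance} IN ${instanceCollection}
    OPTIONS { mergeObjects: false }
  `);
}
```

Inside the loop from the question, the FOR ... FILTER ... UPDATE query would then be replaced by a call such as await updateInstance(db, aql, instanceCollection, instance).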
