opa - Opa: how to efficiently read/write a large number of records

Question

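I need to read and write a large number of records (about 1000). The example below takes as long as 20 minutes to write 1000 records and as long as 12 seconds to read them (when running the "read" test, I commented out the line do create_notes()).

Source

This is a complete example (it builds and runs). It only prints its output to the console (not the browser).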

type User.t =
  { id : int
  ; notes : list(int) // a list of note ids
  }

type Note.t =
  { id : int
  ; uid : int // id of the user this note belongs to
  ; content : string
  }

db /user : intmap(User.t)
db /note : intmap(Note.t)

get_notes(uid:int) : list(Note.t) =
  noteids = /user[uid]/notes
  List.fold(
    (h,acc -> 
      match ?/note[h] with
      | {none} -> acc
      | {some = note} -> [note|acc]
    ), noteids, [])

create_user() =
  match ?/user[0] with
  | {none} -> /user[0] <- {id=0 notes=[]}
  | _ -> void

create_note() =
  key = Db.fresh_key(@/note)
  do /note[key] <- {id = key uid = 0 content = "note"}
  noteids = /user[0]/notes
  /user[0]/notes <- [key|noteids]

create_notes() =
  repeat(1000, create_note)

page() =
  do create_user()
  do create_notes()
  do Debug.alert("{get_notes(0)}")
  <>Notes</>

server = one_page_server("Notes", page)

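One more thing

I also tried fetching the notes via a transaction (shown below). It looks like Db.transaction might be the right tool, but I haven't found a way to use it successfully. I found this get_notes_via_transaction method to be just as slow as get_notes.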

get_notes_via_transaction(uid:int) : list(Note.t) =
  result = Db.transaction( ->
    noteids = /user[uid]/notes
    List.fold(
      (h,acc -> 
        match ?/note[h] with
        | {none} -> acc
        | {some = note} -> [note|acc]
      ), noteids, [])
  )
  match result with
  | {none} -> []
  |~{some} -> some

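Thanks for your help.

Edit: more details

Some additional information that might be useful:

After more testing I noticed that writing the first 100 records takes only 5 seconds. Each record takes longer to write than the previous one. At the 500th record, it takes 5 seconds to write each record.

If I interrupt the program (when it starts to feel slow) and start it again (without clearing the database), it writes records at the same (slow) pace as when I interrupted it.

Does that get us any closer to a solution?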

score 3 · Accepted Answer

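Nic, this is probably not the answer you were hoping for, but here it is:

For this kind of performance experiment I would suggest changing the setup, for instance by not using the client at all. I would replace the code of the create_note function with this: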

counter = Reference.create(0)
create_note() =
  key = Db.fresh_key(@/note)
  do /note[key] <- {id = key uid = 0 content = "note"}
  noteids = /user[0]/notes
  do Reference.update(counter, _ + 1)
  do /user[0]/notes <- [key|noteids]
  cntr = Reference.get(counter)
  do if mod(cntr, 100) == 0 then
       Log.info("notes", "{cntr} notes created")
     else
       void
  void

import stdlib.profiler

create_notes() =
  repeat(1000, -> P.execute(create_note, "create_note"))

P = Server_profiler

_ =
  do P.init()
  do create_user()
  do create_notes()
  do P.execute(-> get_notes(0), "get_notes(0)")
  P.summarize()

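With the intermediate timings printed every 100 insertions, you will quickly see that the total insertion time grows quadratically with the number of inserted items, not linearly. This is because the list update /user[0]/notes <- [key|noteids] apparently causes the whole list to be written again. AFAIK we have optimizations in place to avoid that, but either I'm wrong or for some reason they don't work here. I'll try to look into it and will let you know once I know more.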
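To make the quadratic behaviour concrete, here is a minimal Python sketch (not Opa, and not a description of how the Opa database works internally). It only counts how many list cells get stored under the assumption stated above, namely that every update of /user[0]/notes writes the whole list again. The function names are made up for illustration.

```python
def cells_written_rewriting_whole_list(n):
    """Total list cells stored over n inserts if each insert rewrites the whole list."""
    notes = []
    written = 0
    for key in range(n):
        notes = [key] + notes   # mirrors [key|noteids]
        written += len(notes)   # assumption: the full list is stored again
    return written


def cells_written_storing_one_entry(n):
    """Total cells stored over n inserts if each insert stores only the new entry."""
    return n


for n in (100, 500, 1000):
    print(n, cells_written_rewriting_whole_list(n), cells_written_storing_one_entry(n))
```

Under this assumption, 1000 inserts store 500500 list cells in total (1 + 2 + ... + 1000), against 1000 when only the new entry is stored. That is in line with the observation that each record takes longer to write than the previous one.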

Setting the optimizations mentioned above aside, a better way to model this data in Opa is to use sets, as in the following program:

type Note.t =
{ id : int
; uid : int // id of the user this note belongs to
; content : string
}

db /user_notes[{user_id; note_id}] : { user_id : int; note_id : int }
db /note : intmap(Note.t)

get_notes(uid:int) : list(Note.t) =
  add_note(acc : list(Note.t), user_note) =
    note = /note[user_note.note_id]
    [note | acc]
  noteids = /user_notes[{user_id=uid}] : dbset({user_id:int; note_id:int})
  DbSet.fold(noteids, [], add_note)

counter = Reference.create(0)

create_note() =
  key = Db.fresh_key(@/note)
  do /note[key] <- {id = key uid = 0 content = "note"}
  do DbVirtual.write(@/user_notes[{user_id=0}], {note_id = key})
  do Reference.update(counter, _ + 1)
  cntr = Reference.get(counter)
  do if mod(cntr, 100) == 0 then
       Log.info("notes", "{cntr} notes created")
     else
       void
  void

import stdlib.profiler

create_notes() =
  repeat(1000, -> Server_profiler.execute(create_note, "create_note"))

_ =
  do Server_profiler.init()
  do create_notes()
  do Server_profiler.execute(-> get_notes(0), "get_notes(0)")
  Server_profiler.summarize()

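With this version you'll see that filling the database takes about 2 seconds. Unfortunately, this feature is heavily experimental and therefore undocumented and, as you'll see, it does indeed blow up on this example.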
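The idea behind the set-based layout can be illustrated outside Opa. The following Python sketch is only an in-memory model, not DbSet or DbVirtual: it keeps the user-to-note relation as a separate collection of note ids per user, so adding a note touches one small entry instead of rewriting a growing list. The names notes, user_notes, create_note and get_notes mirror the Opa program, but their Python definitions are assumptions made for this example.

```python
from collections import defaultdict

notes = {}                     # note_id -> note record, plays the role of /note
user_notes = defaultdict(set)  # user_id -> set of note ids, plays the role of /user_notes


def create_note(user_id, content):
    key = len(notes)  # stand-in for Db.fresh_key
    notes[key] = {"id": key, "uid": user_id, "content": content}
    user_notes[user_id].add(key)  # one small write, independent of how many notes exist
    return key


def get_notes(user_id):
    # Look up each note id of the user, skipping ids with no stored note.
    return [notes[k] for k in sorted(user_notes[user_id]) if k in notes]


for _ in range(1000):
    create_note(0, "note")
print(len(get_notes(0)))
```

In this model the cost of one insert does not depend on how many notes the user already has, which is the property the set-based Opa program is after.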

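I'm afraid we don't really plan to improve on (3) and (4), because we realized that providing an in-house database solution that meets industry standards is not very realistic. So at the moment we are concentrating our efforts on tight integration of Opa with existing No-SQL databases. We hope to have some good news on that front in the coming weeks.

I'll try to learn more about this issue from our team and will post a correction if I find that I missed something or got something wrong.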
