
I would like to loop through a database, find the appropriate values, and insert them into the appropriate cell in a separate file. It may be a CSV or any other human-readable format. In pseudo-code:

for item in huge_db:
    for obj in list(list_of_objects_to_match):  # iterate over a copy so removal is safe
        if itemmatch(item, obj):
            obj.matches += 1
            result = performoperationonitem(item)
            write_in_file(result, row=obj.id, col=obj.matches)
            if obj.matches == 3:
                list_of_objects_to_match.remove(obj)  # three matches found, stop looking

Can you think of any way other than going through the whole output file line by line every time? I don't even know what to search for. Even better, is there a better way to find three matching objects in a database and get the results in real time? (The operation will take a while, but I'd like to see the results popping out as they are computed.)
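For illustration, here is a minimal runnable sketch of the loop from the pseudo-code above, under loudly stated assumptions: `itemmatch()` and `performoperationonitem()` are hypothetical stand-ins (a "match" means equal `type` fields, the "operation" doubles a `value`), and the database is just a list of dicts. Instead of revisiting the output file, it keeps partial results in a dict in memory and writes each object's CSV row, then flushes, as soon as that object has three matches.

```python
import csv
import io

# Hypothetical stand-ins: the question does not define these, so a "match"
# here means equal "type" fields and the "operation" doubles a value.
def itemmatch(item, obj):
    return item["type"] == obj["type"]

def performoperationonitem(item):
    return item["value"] * 2

def stream_matches(huge_db, objects_to_match, out, needed=3):
    """Single pass over huge_db; write each object's row as soon as it is complete."""
    writer = csv.writer(out)
    pending = {obj["id"]: obj for obj in objects_to_match}  # objects still looking
    results = {obj["id"]: [] for obj in objects_to_match}   # partial rows, in memory
    for item in huge_db:
        # iterate over a snapshot so finished objects can be removed safely
        for obj_id, obj in list(pending.items()):
            if itemmatch(item, obj):
                results[obj_id].append(performoperationonitem(item))
                if len(results[obj_id]) == needed:
                    writer.writerow([obj_id, *results[obj_id]])
                    out.flush()  # make the finished row visible immediately
                    del pending[obj_id]
        if not pending:
            break  # every object already has its three matches
    # objects that never reached three matches are written at the end
    for obj_id in pending:
        writer.writerow([obj_id, *results[obj_id]])

# Usage with toy data (hypothetical): values 0..7 with types a, b, a, a, b, c, a, b
db = [{"type": t, "value": v} for v, t in enumerate("abaabcab")]
objs = [{"id": 1, "type": "a"}, {"id": 2, "type": "b"}, {"id": 3, "type": "c"}]
buf = io.StringIO()
stream_matches(db, objs, buf)
print(buf.getvalue())
```

With a real file, open it with `open(path, "w", newline="")` as the `csv` module documentation recommends; because each finished row is flushed, it shows up while the scan is still running.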


1 Answer


Assuming itemmatch() is a fairly simple function, this will do what I think you want better than your pseudo-code:

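for match_obj in list_of_objects_to_match:
    # a pymongo Cursor has no len(), so fetch at most three documents into a list
    db_objects = list(query_db_for_matches(match_obj).limit(3))
    if len(db_objects) == 3:
        for col, item in enumerate(db_objects, start=1):
            result = performoperationonitem(item)
            write_in_file(result, row=match_obj.id, col=col)
    else:
        write_blank_line(row=match_obj.id)  # if you want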
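A self-contained sketch of the same per-object loop, assuming a hypothetical in-memory list stands in for the collection and `query_db_for_matches()` returns a lazy generator, the way a database cursor is lazy. `itertools.islice` takes at most three matches, and each CSV row is written exactly once, in order, so the output file never has to be reread.

```python
import csv
import io
from itertools import islice

# Hypothetical in-memory stand-in for the database; in the answer this role
# is played by a MongoDB collection queried on its "type" field.
HUGE_DB = [{"type": t, "value": v} for v, t in enumerate("abaabcab")]

def query_db_for_matches(match_obj):
    # lazy generator, like a cursor: items are only examined when consumed
    return (item for item in HUGE_DB if item["type"] == match_obj["type"])

def performoperationonitem(item):
    return item["value"] * 2  # placeholder for the slow operation

def write_rows(objects_to_match, out, needed=3):
    writer = csv.writer(out)
    for match_obj in objects_to_match:
        # take at most `needed` matches; islice stops consuming the generator early
        db_objects = list(islice(query_db_for_matches(match_obj), needed))
        if len(db_objects) == needed:
            row = [performoperationonitem(item) for item in db_objects]
        else:
            row = []  # the "blank line" case: only the object id is written
        writer.writerow([match_obj["id"], *row])
        out.flush()  # each row appears as soon as it is computed

objs = [{"id": 1, "type": "a"}, {"id": 2, "type": "b"}, {"id": 3, "type": "c"}]
buf = io.StringIO()
write_rows(objs, buf)
print(buf.getvalue())
```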

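The trick then becomes writing the query_db_for_matches() function. Without more details, I'll assume you are looking for objects that match on one particular field; call it type. In pymongo, such a query would look like: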

def query_db_for_matches(match_obj):
    return pymongo_collection.find({"type":match_obj.type})

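To make this run efficiently, make sure your database has an index on the field you are querying, by first calling:

pymongo_collection.create_index([("type", 1)])

(In 2013 this was spelled ensure_index; it was deprecated in PyMongo 3.0 and removed in 4.0.) The first call can take a long time on a large collection, because it has to build the index. After that it returns quickly, since the index already exists: fast enough that you could even put it in query_db_for_matches right before your find, which is nice.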
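The effect of an index can be tried without a MongoDB server. The sketch below swaps in Python's built-in `sqlite3` module purely as a stand-in (the `items` table and `idx_items_type` index are hypothetical names): `CREATE INDEX IF NOT EXISTS` plays the role of the idempotent index call, so it is safe to run before every query, and `LIMIT` stops after three rows.

```python
import sqlite3

# sqlite3 is only a self-contained stand-in for MongoDB here;
# the table, column, and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, type TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO items (type, value) VALUES (?, ?)",
    [(t, v) for v, t in enumerate("abaabcab")],
)

def ensure_type_index(conn):
    # IF NOT EXISTS makes this idempotent: the first call builds the index,
    # later calls find it already there and return almost immediately.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_items_type ON items (type)")

def query_db_for_matches(conn, type_, limit=3):
    ensure_type_index(conn)  # cheap after the first call
    cur = conn.execute(
        "SELECT value FROM items WHERE type = ? ORDER BY id LIMIT ?", (type_, limit)
    )
    return [row[0] for row in cur]

print(query_db_for_matches(conn, "a"))  # -> [0, 2, 3]
```

Running `EXPLAIN QUERY PLAN` on the same `SELECT` should mention `idx_items_type`, which indicates that SQLite searches the index rather than scanning the whole table.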

answered 2013-04-06T04:59:37.080