c# - PLINQ and searching large data sets

Question

I've been looking for confirmation of what I think is an ideal solution.

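I have a list of millions of "entities" from a client. I want to compare each entity against another list (or several other lists), which can also hold millions of entities, and record any hits.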

An entity is usually a person, with a name, number, date of birth, and so on, but it can also be a company name.

I have a project that takes a request as entity XML, performs the search, and saves the request and result XML to a database.

What I need is to run that project on a configurable number of threads, starting new ones as others finish. Is PLINQ the ideal solution for this?

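So say I want 10 threads. I want to take the first 10 entities and spawn 10 threads. When the first thread finishes, the 11th entity should start on a new thread, and so on, until every entity has been searched.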
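The scheduling described above (at most N entities in flight, with the next one starting as soon as any finishes) is what `Parallel.ForEach` does when given `ParallelOptions.MaxDegreeOfParallelism`. Below is a minimal sketch, assuming .NET 6+ with top-level statements; `SearchEntity` is a hypothetical stand-in for the existing search-and-save project, and the entity strings are made-up data. The work runs on thread-pool threads rather than on a brand-new thread per entity, which avoids paying the thread-creation cost for every item.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical input: strings standing in for the client's entity XML requests.
var entities = Enumerable.Range(1, 100).Select(i => $"<entity id=\"{i}\" />").ToList();

int maxThreads = 10;   // the configurable "number of threads"
int running = 0;       // entities currently being processed
int peak = 0;          // highest concurrency observed
var results = new ConcurrentBag<string>();

// Hypothetical stand-in for the existing project: search one entity, persist request/result.
string SearchEntity(string entityXml)
{
    Thread.Sleep(10); // simulate the search plus the database save
    return entityXml.Replace("entity", "result");
}

var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

// At most maxThreads entities are processed at once; when one finishes,
// the loop hands the next entity to the free worker until the list is exhausted.
Parallel.ForEach(entities, options, entity =>
{
    int now = Interlocked.Increment(ref running);
    try
    {
        // Lock-free update of the observed peak concurrency.
        int seen;
        while (now > (seen = Volatile.Read(ref peak)) &&
               Interlocked.CompareExchange(ref peak, now, seen) != seen) { }

        results.Add(SearchEntity(entity));
    }
    finally
    {
        Interlocked.Decrement(ref running);
    }
});

Console.WriteLine($"Processed {results.Count} entities, peak concurrency {peak}");
```

PLINQ's `WithDegreeOfParallelism(maxThreads)` puts a similar cap on a query; `Parallel.ForEach` arguably fits this case a little better because each item's work is mainly a side effect (saving to the database) rather than a value to collect.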

Thanks for any input; I'm not well versed in parallelism.

score 0 · Accepted Answer

If you're going to be saving into a database anyway, why don't you just bulk import your data and use queries to join the two sets of data? That should perform much faster than trying to do it in memory. I would hate to see the memory you are consuming with millions of entities.

If you must do it in memory, using PLINQ MAY prove to be faster. Thread creation and context switching carry overhead, so with PLINQ you should let the engine determine the thread allocation. These days you should rarely need to create threads explicitly.
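To illustrate letting PLINQ decide the thread allocation, here is a sketch of the in-memory comparison using `AsParallel()` and no hand-made threads. Assumptions: entities are reduced to hypothetical `(Name, Dob)` tuples, a "hit" is taken to mean an exact match on both fields, and all data is generated. Each client entity still scans the whole comparison list, so the cost grows with the product of the two list sizes.

```csharp
using System;
using System.Linq;

// Hypothetical data: client entities and a comparison list, as (Name, Dob) tuples.
var clientEntities = Enumerable.Range(0, 10_000)
    .Select(i => (Name: $"Person {i}", Dob: new DateTime(1950, 1, 1).AddDays(i)))
    .ToList();
var watchList = Enumerable.Range(0, 1_000)
    .Select(i => (Name: $"Person {i * 10}", Dob: new DateTime(1950, 1, 1).AddDays(i * 10)))
    .ToList();

// Hypothetical matching rule: same name and same date of birth.
bool IsHit((string Name, DateTime Dob) a, (string Name, DateTime Dob) b) =>
    a.Name == b.Name && a.Dob == b.Dob;

// AsParallel() lets PLINQ partition the input and choose how many worker threads
// to use; no threads are created by hand. Each entity still scans the whole list.
var hits = clientEntities
    .AsParallel()
    .Where(e => watchList.Any(w => IsHit(e, w)))
    .Select(e => e.Name)
    .ToList();

Console.WriteLine($"{hits.Count} hits");
```

PLINQ does not preserve input order by default; adding `.AsOrdered()` after `AsParallel()` keeps the hits in the client list's order at some extra cost.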

However, if the list you are comparing against is relatively static, you may benefit more from turning it into a dictionary and relying on the key for the lookup, since you won't have to scan the entire list for each item you are trying to find.
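The dictionary approach can be sketched as follows, again assuming hypothetical `(Name, Dob)` tuples, an exact-match rule, and .NET 6+ top-level statements. The index is built once from the comparison list; each client entity then costs one hash lookup instead of a scan of the whole list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data: client entities and a comparison list, as (Name, Dob) tuples.
var clientEntities = Enumerable.Range(0, 10_000)
    .Select(i => (Name: $"Person {i}", Dob: new DateTime(1950, 1, 1).AddDays(i)))
    .ToList();
var watchList = Enumerable.Range(0, 1_000)
    .Select(i => (Name: $"Person {i * 10}", Dob: new DateTime(1950, 1, 1).AddDays(i * 10)))
    .ToList();

// Build the index once. Key = the matched fields (assumed: name + date of birth),
// value = position in the comparison list. Duplicate keys keep the first entry.
var index = new Dictionary<(string Name, DateTime Dob), int>();
for (int i = 0; i < watchList.Count; i++)
    index.TryAdd(watchList[i], i);

// Each client entity is now a single hash lookup. The dictionary is only read
// from here on, and concurrent reads of an unmodified Dictionary are safe.
var hits = clientEntities
    .AsParallel()
    .Where(e => index.ContainsKey(e))
    .Select(e => (Entity: e.Name, WatchIndex: index[e]))
    .ToList();

Console.WriteLine($"{hits.Count} hits");
```

The trade-off is memory for the index and the fact that it only serves exact-key matches, which suits a comparison list that, as the answer says, is relatively static.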
