To start with, 16 words to search for does not mean 16 different threads. Every thread brings its own costs, such as synchronization overhead, unequal work distribution, and spawn/teardown time, and if these are not managed carefully they can completely cancel out the benefits of parallelism.
Parallelism pays off most when there are no dependencies within the data being worked on. Your job is data-independent in either of the solutions you describe: looking for some characters within a large pool of characters does not depend on how those characters are arranged (e.g. looking for "++" after every "C" vs. simply looking for "++").
The question now is how to make the best use of memory bandwidth, and how to limit the losses caused by branch misprediction.
"Common data, different threads" works very well in memory-bound situations. The first thread to touch the data loads it into cache, and the other threads then simply reuse that cached chunk, so the data is loaded from memory only once. Splitting the data also loads everything only once: each thread loads its own slice, processes it, and is done with it. So the memory traffic is very similar in both cases.
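As a concrete illustration of the split-data approach, here is a minimal sketch (the names `count_in_range` and `parallel_count` are mine, not from your code): each thread scans its own slice with a private counter, so the only synchronization is the final join. Matches are attributed to the chunk where they start, so a match straddling a chunk boundary is still counted exactly once.

```cpp
#include <cstddef>
#include <string>
#include <thread>
#include <vector>

// Count occurrences of `needle` whose start position lies in
// [begin, end) of `hay`. A match may extend past `end`; counting by
// start position ensures no match is counted twice across chunks.
static std::size_t count_in_range(const std::string& hay,
                                  const std::string& needle,
                                  std::size_t begin, std::size_t end) {
    std::size_t count = 0;
    for (std::size_t pos = hay.find(needle, begin);
         pos != std::string::npos && pos < end;
         pos = hay.find(needle, pos + 1)) {
        ++count;
    }
    return count;
}

// Split-data search: one slice per thread, one private counter per
// thread, summed only after all workers have joined.
std::size_t parallel_count(const std::string& hay,
                           const std::string& needle,
                           unsigned num_threads) {
    std::vector<std::size_t> partial(num_threads, 0);
    std::vector<std::thread> workers;
    const std::size_t chunk = hay.size() / num_threads;
    for (unsigned t = 0; t < num_threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end =
            (t + 1 == num_threads) ? hay.size() : begin + chunk;
        workers.emplace_back([&, begin, end, t] {
            partial[t] = count_in_range(hay, needle, begin, end);
        });
    }
    for (auto& w : workers) w.join();
    std::size_t total = 0;
    for (std::size_t p : partial) total += p;
    return total;
}
```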
Your code is what might be called "logically bound". Branch prediction is an important aspect to take care of, but a tricky one, and the common-data approach has a higher probability of defeating it. Say you have to find one red ball among a million balls. The predictor will settle on the natural choice of "not present". But if that prediction fails within the first 100 balls, everything speculatively processed after the mispredicted branch is thrown away and redone, and you are back to square one. With split data, the worst-case cost of such a redo is not as bad as with common data, though that cannot be guaranteed.
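One way to sidestep the data-dependent branch entirely, at least for the inner matching loop, is to accumulate the comparison result as an integer instead of branching on it. With no taken/not-taken decision in the loop body, there is nothing for the predictor to miss. A minimal sketch (the function name is illustrative; many compilers generate similar branchless code on their own):

```cpp
#include <cstddef>
#include <string>

// Branchless counting: `c == target` yields 0 or 1, which is added
// directly to the counter instead of guarding an `if`.
std::size_t count_char_branchless(const std::string& data, char target) {
    std::size_t count = 0;
    for (char c : data) {
        count += static_cast<std::size_t>(c == target);
    }
    return count;
}
```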
Choosing between the two approaches therefore depends entirely on the architecture at your disposal.
SMT / logical cores vs. physical cores: using either approach on SMT threads will perform worse than the same number of physical cores. Your idea of one thread splitting into sub-threads is essentially the basis of SMT; each additional logical thread adds contention for the core's resources, on top of synchronization overhead and further chances of branch misprediction. When scaling across physical cores instead, branch prediction becomes the limiting factor.
Split data or common data, your code will not scale indefinitely. The best thing to do is to use the minimum number of physical cores that saturates performance. Common data brings less complexity and should be the easier one to implement. That minimum thread count can be found by experiment: performance stays roughly constant beyond it.