1

我做了一个数值模拟。我有 50,000 个粒子在一个盒子中结晶,其中有几个晶体(具有周期性边界条件)。我试着在盒子里找到最大的水晶。该代码运行良好,但它需要100 [s]在我的计算机上。速率确定步骤是find_neighbors()下面的函数。

有没有办法使用并行计算来缩短计算时间?

我的电脑有12核。也许我们可以将数组(中心)分成 12 个部分?

我的函数的输出是一个二维列表。

这是find_neighbors()没有并行计算编写的函数:

# Calculate if two particles are neighbors (Periodic Boundary Condition)
#              center is the coordinates of the particles
#              d      is the distance threshold for two clusters to be neighbors
#              Maxima_Range and
#              Lattice_Range are for periodic boundary conditions
#              Pos_3d is a list.
# Suppose particle i has neighbors j,k,l.
#              Pos_3d[i] = [j,k,l]

def find_neighbors(center,d,Maxima_Range,Lattice_Range):
    a,b = np.shape(center)          # get the dimension of the array
    flag_3d = np.zeros(a)           # count the number of neighbors for particle i
    Pos_3d = [[] for i in range(a)] # Pos_3d is a two dimensional list. 
    #                               # Pos_3d[i] includes the indices of all the neighbors of particle i

    diff_sqrt = d ** 0.5
    for i in range(a):
        if i % 6 == 5:              # for some reason, when i % 6 == 5,
            #                       #     it is the center coordinates,
            #                       #     this cannot be changed.
            for j in range(i+1,a):
                if j % 6 == 5:

                    diff_x = center[i][0]-center[j][0]
                    diff_y = center[i][1]-center[j][1]
                    diff_z = center[i][2]-center[j][2]

                    if (   diff_x <  diff_sqrt
                       and diff_y <  diff_sqrt
                       and diff_z <  diff_sqrt
                       and diff_x > -diff_sqrt
                       and diff_y > -diff_sqrt
                       and diff_z > -diff_sqrt ):
                           #        # if one of the x,y,z component is already
                           #        #    bigger than d, we don't need
                           #        #    to calculate their distance in 3-D.
                           #        #    They are not neighbors.

                        diff = ( diff_x * diff_x
                               + diff_y * diff_y
                               + diff_z * diff_z
                                 )

                        if diff <= d: # without period boundary condition
                            flag_3d[i] +=1
                            flag_3d[j] +=1
                            Pos_3d[i].append(j)
                            Pos_3d[j].append(i)
                            continue
                            
                       
                        # With periodic boundary condition
                        # in x

                        do something..

    return Pos_3d
4

1 回答 1

0

“有没有办法使用并行计算来缩短计算时间?”

就在这里。

然而,在任何人开始尝试之前,请修复您的原始代码以避免任何和所有开销:

最好避免列表,附加但更喜欢 numpy 可向量化和高效处理数据结构:

aVectorOfDIFFs = center[i] - center[j] # aVectorOfDIFFs.shape  ( 3, ) # _x, _y, _z
if d >= np.dot( aVectorOfDIFFs,        # diff == ( ( _x * _x )
                aVectorOfDIFFs.T       #         + ( _y * _y )
                ):                     #         + ( _z * _z ) )
    # makes sense to do anything ... with ( i, j )
    ...

除了将代码智能自上而下地重构为性能最佳的向numpy量化处理(如果对最终性能感兴趣,我们可以稍后再谈),只是原始循环的简单串联for效率非常低,花费最多在那段时间里,python 多年来都知道这一点,缓慢的循环迭代。

它可以(原样)~ 20x更快地运行:

>>> from zmq import Stopwatch; aClk = Stopwatch()

>>> def aPairOfNaiveFORs( a = 1E6 ):
...     intA = int( a )                        #           a gets fused into (int)
...     for            i in range(      intA ):# original, keep looping all i-s
...         if         i % 6 == 5:             #        if i-th "center":
...             for    j in range( i+1, intA ):#           keep looping all j-s
...                 if j % 6 == 5:             #           if j-th "center":
...                     # NOP,                 #              do some usefull work
...                     # not important here   #
...                     # for the testing, the overhead comparison matters
...                     continue               #               loop next
...     return


>>> aClk.start(); _ = aPairOfNaiveFORs( 1E3 ); aClk.stop()
   14770 [us] for 1E3 ~  13+ [us] per one item of a[:1000]
   14248
   13430
   13252

>>> aClk.start(); _ = aPairOfNaiveFORs( 1E5 ); aClk.stop()
64559866 [us] for 1E5 ~ 645+ [us] per one item of a[:10000]

现在,避免所有不符合的荒谬循环5 == <index> modulo 6

>>> def aPairOfLessNaiveFORs( a = 1E6 ):
...     intA = int( a )
...     for     i in range (  5, intA, 6 ):    # yes, start with the 5-th item of the first matching "center"
...         for j in range( i+6, intA, 6 ):    # yes, keep matching ( modulo 6 ) steps
...               continue

测试显示~ +13x ~ +17x更快的代码(仅避免 5/6 循环):

>>> aClk.start(); _ = aPairOfLessNaiveFORs( 1E3 ); aClk.stop()
    1171 [us] for 1E3 ~  1 [us] per one item of a[:1000]
     976
    1157
    1181
    1185

>>> aClk.start(); _ = aPairOfLessNaiveFORs( 1E5 ); aClk.stop()
 3725167 [us] for 1E5 ~ 37 [us] per one item of a[:10000]

恢复 :

正是这种避免主要开销的精确工程将缩短处理在大约 5E5 结晶中心(即 3E6 坐标)某处缩放的循环。

>>> aClk.start(); _ = aPairOfLessNaiveFORs( 5E5 ); aClk.stop()
 100104334 [us] for 5E5 ~ 17.6 x FASTER
>>> aClk.start(); _ =     aPairOfNaiveFORs( 5E5 ); aClk.stop()
1759420300 [us] for 5E5 ~ ^^^^^^^^^^^^^

即使这样,它也很容易从进入并行计算模型中获得任何“积极”的提升(记住修正后的阿姆达尔定律,这样做的附加开销不会被忽略或忽略)。

于 2020-05-24T13:49:33.643 回答