1

This code fragment is a bottleneck in a project of mine. Are there function calls that could replace the for loops and speed it up?

D = np.zeros((nOcc,nOcc,nVir,nVir))
for i in range(nOcc):
   for j in range(i+1):
      tmp = Ew[i] + Ew[j]
      for a in range(nVir):
         tmp2 = tmp - Ew[a+nOcc]
         for b in range(a+1):
            tmp3 = 1.0/(tmp2 - Ew[b+nOcc])
            D[i,j,a,b] = Iiajb[i,a,j,b]*tmp3
            D[i,j,b,a] = Iiajb[i,b,j,a]*tmp3
            D[j,i,a,b] = D[i,j,b,a]
            D[j,i,b,a] = D[i,j,a,b]
4

2 回答 2

3

To start off lets generate some arbitrary data, thats obeys a few required principles:

nOcc = 30
nVir = 120
Ew = np.random.rand(nOcc+nVir)
Ew[:nOcc]*=-1
Ia = np.random.rand(nOcc)
Ib = np.random.rand(nVir)
I = np.einsum('a,b,c,d->abcd',Ia,Ib,Ia,Ib)

Lets wrap your base code as an example:

def oldcalc_D(Iiajb,nOcc,nVir,Ew):
    D = np.zeros((nOcc,nOcc,nVir,nVir))
    for i in range(nOcc):
       for j in range(i+1):
          tmp = Ew[i] + Ew[j]
          for a in range(nVir):
             tmp2 = tmp - Ew[a+nOcc]
             for b in range(a+1):
                tmp3 = 1.0/(tmp2 - Ew[b+nOcc])
                D[i,j,a,b] = Iiajb[i,a,j,b]*tmp3
                D[i,j,b,a] = Iiajb[i,b,j,a]*tmp3
                D[j,i,a,b] = D[i,j,b,a]
                D[j,i,b,a] = D[i,j,a,b]
    return D

Taking advantage of integral symmetry is typically a good tactic; however, in numpy alone it is not worth the cost so we are going to ignore this and simply vectorize your code:

def newcalc_D(I,nOcc,nVir,Ew):
    O = Ew[:nOcc]
    V = Ew[nOcc:]
    D = O[:,None,None,None] - V[:,None,None] + O[:,None] - V
    return (I/D).swapaxes(1,2)

Some timings:

np.allclose(oldcalc_D(I,nOcc,nVir,Ew),newcalc_D(I,nOcc,nVir,Ew))
True

%timeit newcalc_D(I,nOcc,nVir,Ew)
1 loops, best of 3: 142 ms per loop

%timeit oldcalc_D(I,nOcc,nVir,Ew)
1 loops, best of 3: 15 s per loop

So only about ~100x faster, as I said this is a fairly simple pass to give you an idea what to do. This can be done much better, but should be a trivial part of the calculation as the integral transformation is (O)N^5 vs this at (O)N^4. For these operations I use numba's autojit feature:

from numba import autojit

numba_D = autojit(oldcalc_D)

%timeit numba_D(I,nOcc,nVir,Ew)
10 loops, best of 3: 55.1 ms per loop
于 2013-10-31T04:29:13.940 回答
0

unless this is Python3, you may want to start by replacing range with xrange: the former creates the entire list, while the latter is just an iterator, which is all you need in this case.
For large Ns, speed difference should be noticeable.

Also, seeing as your using numpy, there is probably a vectorized way to implement the algorithm. If that's the case, the vectorized implementation should be orders of magnitude faster. But unless you explain the variables and the algorithm, we can't help you in that direction.

于 2013-10-30T21:40:17.480 回答