一方面,你真的不需要p作为函数的参数:len(qs)只需要调用一次,而且非常便宜。如果您的输入始终是一个 numpy 数组(并且在这种情况下没有理由不应该),qs.size那么也可以。
让我们从重写开始f_map。那里的循环只是qs(但从零开始)的累积和,您可以预先计算一次(或每次调用外部函数至少一次)。
def f_map(i, j, cumsum_q):
return j + cumsum_q[i]
将在哪里cumsum_q定义build_prolongation_operator为
cumsum_q = np.roll(np.cumsum(qs), 1)
cumsum_q[0] = 0
我相信你会体会到f_map在build_prolongation_operator. 为了使它更容易,我们可以完全删除f_map并在您的条件下使用它所代表的表达式:
if i == k + cumsum_q[j]:
I[i, j] = 1
然后循环k意味着“if iis k + cumsum[j]for any k ”,将元素设置为1。如果我们将条件重写为i - cumsum_q[j] == k,您可以看到我们根本不需要循环k。如果它是非负的并且严格小于 ,i - cumsum_q[j]则将等于范围内的一些 。你可以检查k[0, qs[j])qs[j]
if i >= cumsum_q[j] and i - cumsum_q[j] < qs[j]:
I[i, j] = 1
这将您的循环减少到矩阵的每个元素一次迭代。你不能做得比这更好:
def build_prolongation_operator_optimized(qs):
'''
The prolongation operator describes the relationship between
the coarse and fine bases:
V_coarse = np.dot(V_fine, I)
'''
qs = np.asanyarray(qs)
p = qs.size
cumsum_q = np.roll(np.cumsum(qs), 1)
q = cumsum_q[0]
cumsum_q[0] = 0
I = np.zeros([q, p])
for i in range(0, q):
for j in range(0, p):
# if BV i is a child of j, we set I[i, j] = 1
if 0 <= i - cumsum_q[j] < qs[j]:
I[i, j] = 1
return I
现在您已经知道每个单元格的公式,您可以让 numpy 使用广播在基本上一行中为您计算整个矩阵:
def build_prolongation_operator_numpy(qs):
qs = np.asanyarray(qs)
cumsum_q = np.roll(np.cumsum(qs), 1)
q = cumsum_q[0]
cumsum_q[0] = 0
i_ = np.arange(q).reshape(-1, 1) # Make this a column vector
return (i_ >= cumsum_q) & (i_ - cumsum_q < qs)
我运行了一个小型演示,以确保 (A) 建议的解决方案获得与原始解决方案相同的结果,并且 (B) 工作得更快:
In [1]: p = 10
In [2]: q = np.random.randint(3, size=p)
In [3]: ops = (
... build_prolongation_operator(p, qs),
... build_prolongation_operator_optimized(qs),
... build_prolongation_operator_numpy(qs),
... build_prolongation_operator_RaunaqJain(p, qs),
... build_prolongation_operator_gboffi(p, qs),
... )
In [4]: np.array([[(op1 == op2).all() for op1 in ops] for op2 in ops])
Out[4]:
array([[ True, True, True, True, True],
[ True, True, True, True, True],
[ True, True, True, True, True],
[ True, True, True, True, True],
[ True, True, True, True, True]])
In [5]: %timeit build_prolongation_operator(p, qs)
321 µs ± 890 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [6]: %timeit build_prolongation_operator_optimized(qs)
75.1 µs ± 1.85 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [7]: %timeit build_prolongation_operator_numpy(qs)
24.8 µs ± 77.7 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [8]: %timeit build_prolongation_operator_RaunaqJain(p, qs)
28.5 µs ± 1.55 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [9]: %timeit build_prolongation_operator_gboffi(p, qs)
31.8 µs ± 772 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
In [10]: %timeit build_prolongation_operator_gboffi2(p, qs)
26.6 µs ± 768 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
如您所见,最快的选项是完全矢量化的选项,但@RaunaqJain和@gboffi 的选项紧随其后。
笔记
我的矢量化解决方案创建了一个布尔数组。如果您不希望使用I.astype(...)转换为所需的 dtype,或将其视为np.uint8数组:I.view(dtype=np.uint8)。