I'm trying to use ArrayFire in Julia, and I'm finding that performance degrades strangely over time:
using ArrayFire
srand(1)

# GPU version: both arrays are AFArrays, updated in 100-column slices.
function f()
    r = AFArray(zeros(Float32, 100, 100000))
    a = AFArray(rand(Float32, 100, 100000))
    for d in 1:100:90000
        r[:, d:d+99] = a[:, d:d+99] .* a[:, d:d+99]
    end
    nothing
end

# Same loop on plain CPU arrays.
function g()
    r = zeros(Float32, 100, 100000)
    a = ones(Float32, 100, 100000)
    for d in 1:100:90000
        r[:, d:d+99] = a[:, d:d+99] .* a[:, d:d+99]
    end
    nothing
end

# Time the GPU version 15 times in a row.
for _ in 1:15
    @time f()
end
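If you run this code, you'll see that each iteration gets slower and slower. I tried calling finalize on r and a inside f() to try to evict those arrays from GPU memory, in case that was the problem, but it made no difference.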
Here is the output:
0.810842 seconds (114.91 k allocations: 80.216 MB, 0.71% gc time)
0.283941 seconds (79.22 k allocations: 78.561 MB, 3.22% gc time)
0.267405 seconds (79.22 k allocations: 78.561 MB, 2.31% gc time)
0.332186 seconds (79.22 k allocations: 78.561 MB, 1.76% gc time)
0.405174 seconds (79.22 k allocations: 78.561 MB, 1.50% gc time)
0.433224 seconds (79.22 k allocations: 78.561 MB, 2.11% gc time)
0.501358 seconds (79.22 k allocations: 78.561 MB, 1.18% gc time)
0.572704 seconds (79.22 k allocations: 78.561 MB, 1.07% gc time)
0.650663 seconds (79.22 k allocations: 78.561 MB, 1.10% gc time)
0.794873 seconds (79.22 k allocations: 78.561 MB, 1.16% gc time)
0.838882 seconds (79.22 k allocations: 78.561 MB, 1.04% gc time)
1.281940 seconds (79.22 k allocations: 78.561 MB, 0.61% gc time)
1.200713 seconds (79.22 k allocations: 78.561 MB, 0.37% gc time)
1.268786 seconds (79.22 k allocations: 78.561 MB, 0.78% gc time)
1.396851 seconds (79.22 k allocations: 78.561 MB, 0.66% gc time)
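
For comparison, the same loop can be timed on the CPU version g(). The following is only a minimal sketch of how that might be done: collect_timings is a hypothetical helper (it is not part of ArrayFire or Base), and it uses @elapsed instead of @time so that the per-call times end up in an array and the trend is easier to inspect. g() is repeated here so the snippet runs on its own.

```julia
# CPU baseline: same slicing pattern as f(), on plain Julia arrays.
function g()
    r = zeros(Float32, 100, 100000)
    a = ones(Float32, 100, 100000)
    for d in 1:100:90000
        r[:, d:d+99] = a[:, d:d+99] .* a[:, d:d+99]
    end
    nothing
end

# Hypothetical helper (an assumption, not an ArrayFire or Base function):
# call fn() n times and return the elapsed wall-clock seconds of each call.
function collect_timings(fn, n)
    times = Float64[]
    for _ in 1:n
        push!(times, @elapsed fn())
    end
    times
end

times = collect_timings(g, 15)
println(times)
```

Passing f instead of g to the helper would give the GPU timings in the same form, so the two series can be compared side by side.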