arrays - Julia: a fast and elegant way to get a matrix from an array of arrays

Question

I have an array of more than 10,000 pairs of Float64 values, something like this:

v = [[rand(),rand()], ..., [rand(),rand()]]

I want to turn it into a matrix with two columns. I can loop over all the pairs. It looks clunky, but it gives the result in a fraction of a second:

x = Vector{Float64}()
y = Vector{Float64}()
for i = 1:length(v)
    push!(x, v[i][1])
    push!(y, v[i][2])
end
w = hcat(x,y)

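The solution I found for this task,

permutedims(reshape(hcat(v...), (length(v[1]), length(v))))

looks more elegant, but it completely hangs Julia and I have to restart the session. Perhaps it was optimal six years ago, but today it does not work for large arrays. Is there a solution that is both compact and fast?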

score 11 · Accepted Answer

I hope this is short and efficient enough for you:

 getindex.(v, [1 2])

If you want something easier to digest:

[v[i][j] for i in 1:length(v), j in 1:2]
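A comprehension with two comma-separated iterators builds a 2-D array whose (i, j) entry is v[i][j]. The sketch below wraps it in a function (the name comprehension_mat is hypothetical, chosen for illustration) so that Julia can infer the type of v. This is likely why the comprehension looks slow in the benchmark further down, where the inner v[i][j] refers to the untyped global v.

```julia
# Hypothetical helper: the comprehension from the answer, wrapped in a function
# so that the type of `v` is known to the compiler.
comprehension_mat(v) = [v[i][j] for i in eachindex(v), j in 1:2]

v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
M = comprehension_mat(v)   # 3×2 Matrix{Float64}; row i holds v[i]
```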

The hcat solution can also be written as:

permutedims(reshape(reduce(hcat, v), (length(v[1]), length(v))));

It should not hang your Julia session (please confirm; it works for me).
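As far as I understand, the difference is that hcat(v...) splats every pair into a separate argument, so the compiler has to deal with a call of more than 10,000 arguments. Base instead has a dedicated method for reduce(hcat, v) on a vector of vectors. A small sketch of the shapes involved, using a 3-element stand-in for the real input: reduce(hcat, v) already returns a length(v[1]) × length(v) matrix, so the reshape in the one-liner is effectively a no-op and permutedims alone gives the n×2 result.

```julia
# Small stand-in for the 10,000+ element input
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# reduce(hcat, v) puts each pair in its own column: a 2×3 matrix
cols = reduce(hcat, v)

# The reshape in the answer does not change anything here
same = reshape(cols, (length(v[1]), length(v)))

# permutedims turns the columns into rows: the 3×2 matrix from the question
M = permutedims(cols)
```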

@Antonello: To see why this works, consider a simpler example:

julia> string.(["a", "b", "c"], [1 2])
3×2 Matrix{String}:
 "a1"  "a2"
 "b1"  "b2"
 "c1"  "c2"

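Here I am broadcasting a column Vector ["a", "b", "c"] against a 1-row Matrix [1 2]. The key point is that [1 2] is a Matrix, not a Vector. As a result, broadcasting expands along the rows (driven by the vector) and along the columns (driven by the Matrix). For this expansion to work, it is essential that the matrix [1 2] has exactly one row. Is this clearer now?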
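The same reasoning applied to getindex.(v, [1 2]) can be sketched with a 3-element v. A length-3 column broadcast against a 1×2 row produces a 3×2 result whose (i, j) entry is getindex(v[i], j), i.e. v[i][j]. If you pass the Vector [1, 2] instead, both arguments are columns of different lengths (3 and 2), and broadcasting fails with a DimensionMismatch.

```julia
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # column-shaped Vector, length 3
idx = [1 2]                                 # 1×2 Matrix: a single row

# Rows come from v, columns come from idx: entry (i, j) is v[i][j]
M = getindex.(v, idx)

# With a Vector [1, 2] both arguments are columns (lengths 3 and 2),
# so broadcasting cannot find a common shape and throws
threw = try
    getindex.(v, [1, 2])
    false
catch err
    err isa DimensionMismatch
end
```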

score 3 · Accepted Answer

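Your own example is quite close to a good solution, but it is slowed down by creating two separate vectors and repeatedly calling push!, which has to grow them as it goes. The solution below is similar but simpler: it allocates the result matrix once, up front, and fills it in place. It is not as concise as @BogumilKaminski's broadcast getindex, but it is faster: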

function mat(v)
    M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
    for i in eachindex(v)
        M[i, 1] = v[i][1]
        M[i, 2] = v[i][2]
    end
    return M
end

You can simplify it further, without losing performance, like this:

function mat_simpler(v)
    M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
    for (i, x) in pairs(v)
        M[i, 1], M[i, 2] = x
    end
    return M
end
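A quick usage check of mat_simpler, with comments on the two details that make it work. eltype(eltype(v)) first takes the inner vector type (e.g. Vector{Float64}) and then its scalar type (e.g. Float64), so integer input gives an integer matrix. The line M[i, 1], M[i, 2] = x destructures each pair into the two columns of row i. The comparison against the broadcast getindex version is only a sanity check on random data.

```julia
function mat_simpler(v)
    # eltype(v) is the inner vector type; its eltype is the scalar type
    M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
    for (i, x) in pairs(v)
        # destructure the pair x into the two columns of row i
        M[i, 1], M[i, 2] = x
    end
    return M
end

v = [[rand(), rand()] for _ in 1:10_000]
M = mat_simpler(v)            # 10000×2 Matrix{Float64}

vi = [[1, 2], [3, 4]]
Mi = mat_simpler(vi)          # element type follows the input: Matrix{Int}
```

Note that destructuring takes only the first two elements of x, so if an inner vector were longer, the extra elements would be silently ignored.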

score 1 · Accepted Answer

Benchmarks of the various solutions posted so far:

using BenchmarkTools
# Creating the vector
v = [[i, i+0.1] for i in 0.1:0.2:2000]

M1 = @btime vcat([[e[1] e[2]] for e in $v]...)
M2 = @btime getindex.($v, [1 2])
M3 = @btime [v[i][j] for i in 1:length($v), j in 1:2]
M4 = @btime permutedims(reshape(reduce(hcat, $v), (length($v[1]), length($v))))
M5 = @btime permutedims(reshape(hcat($v...), (length($v[1]), length($v))))

function original(v)
    x = Vector{Float64}()
    y = Vector{Float64}()
    for i = 1:length(v)
        push!(x, v[i][1])
        push!(y, v[i][2])
    end
    return hcat(x,y)
end
function mat(v)
    M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
    for i in eachindex(v)
        M[i, 1] = v[i][1]
        M[i, 2] = v[i][2]
    end
    return M
end
function mat_simpler(v)
    M = Matrix{eltype(eltype(v))}(undef, length(v), 2)
    for (i, x) in pairs(v)
        M[i, 1], M[i, 2] = x
    end
    return M
end

M6 = @btime original($v)
M7 = @btime mat($v)
M8 = @btime mat_simpler($v)

M1 == M2 == M3 == M4 == M5 == M6 == M7 == M8 # true

Output:

  1.126 ms   (10010 allocations: 1.53 MiB)    # M1
 54.161 μs   (3 allocations: 156.42 KiB)      # M2
809.000 μs   (38983 allocations: 765.50 KiB)  # M3
 98.935 μs   (4 allocations: 312.66 KiB)      # M4
244.696 μs   (10 allocations: 469.23 KiB)     # M5
219.907 μs   (30 allocations: 669.61 KiB)     # M6
 34.311 μs   (2 allocations: 156.33 KiB)      # M7
 34.395 μs   (2 allocations: 156.33 KiB)      # M8

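Note that the dollar signs in the benchmark code interpolate v into the benchmarked expression, so @btime treats it as a local value rather than as a (slow, untyped) global variable.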
