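I had a similar requirement a few months ago: given a 2D point, I wanted to find the closest line segment in an array of line segments. I struggled to force tensorflowjs to do this efficiently, and eventually stumbled on gpu.js, which is better suited to compiling custom GPU kernel functions.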
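In the example I put together below, I have one pair of arrays representing 11 (X,Y) coordinates and another pair of arrays representing 5 (X,Y) coordinates. The result is an 11x5 matrix holding every distance between the two sets of points. The key function is the "kernel", which gpu.js compiles to run on the GPU cores; each invocation calculates the distance between one point from the 11 coordinates and one point from the 5 coordinates. In theory, this kernel function is farmed out across many GPU cores to speed things up, i.e. in this case all 55 distances are computed at once. (I say "in theory" because, as I understand it, gpu.js leverages WebGL shader mapping functionality, and I'm not entirely sure how the virtualization layers in the stack affect which GPU cores actually end up doing the work...)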
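The result is an 11x5 matrix containing the distance for each point-pair combination. This 11x5 matrix is then pipelined to "kernelMin", which will be a bit slower because each invocation loops through the results looking for the minimum and also captures the index of that minimum. That said, there should still be 11 concurrent GPU cores, each working out which of the 5 coordinates is closest to its point.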
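// Assumes gpu.js is already loaded (e.g. via a script tag) so that GPU is in scope.
const gpu = new GPU();
// One thread per output cell: thread.x walks the 11 points, thread.y the 5 points.
const kernel = gpu.createKernel(function(x0, y0, x1, y1) {
  let dx = x1[this.thread.y][0] - x0[0][this.thread.x];
  let dy = y1[this.thread.y][0] - y0[0][this.thread.x];
  return Math.sqrt(dx * dx + dy * dy);
}).setPipeline(true).setOutput([11,5]);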
const result1 = kernel(
  GPU.input(
    new Float32Array([0,10,20,30,40,50,60,70,80,90,100]),
    [11,1]
  ),
  GPU.input(
    new Float32Array([100,100,100,100,100,100,100,100,100,100,100]),
    [11,1]
  ),
  GPU.input(
    new Float32Array([0,30,50,70,100]),
    [1,5]
  ),
  GPU.input(
    new Float32Array([10,10,10,10,10]),
    [1,5]
  )
);
console.log(result1.toArray());
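// One thread per each of the 11 points; each thread scans the 5 distances in its column.
const kernelMin = gpu.createKernel(function(a) {
  let minVal = 1000000; // sentinel, assumed larger than any real distance
  let minIdx = 0;
  for (let y = 0; y < 5; y++) {
    if (a[y][this.thread.x] < minVal) {
      minVal = a[y][this.thread.x];
      minIdx = y;
    }
  }
  // Returns the pair [smallest distance, index of the closest of the 5 points].
  return [minVal,minIdx];
}).setOutput([11]);
const result2 = kernelMin(result1);
console.log(result2);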
The final output is...
0: Float32Array(2) [90, 0]
1: Float32Array(2) [90.55384826660156, 0]
2: Float32Array(2) [90.55384826660156, 1]
3: Float32Array(2) [90, 1]
4: Float32Array(2) [90.55384826660156, 1]
5: Float32Array(2) [90, 2]
6: Float32Array(2) [90.55384826660156, 2]
7: Float32Array(2) [90, 3]
8: Float32Array(2) [90.55384826660156, 3]
9: Float32Array(2) [90.55384826660156, 4]
10: Float32Array(2) [90, 4]
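As a cross-check on the numbers above, here is a minimal plain-JavaScript sketch of the same two stages running on the CPU. It is not part of gpu.js; the function names `distanceMatrix` and `nearestPerColumn` are made up for illustration. It works in double precision, so values such as 90.5538... may differ from the Float32 output in the last digits.

```javascript
// CPU reference for the two gpu.js kernels (hypothetical helper names).
const x0 = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
const y0 = new Array(11).fill(100);
const x1 = [0, 30, 50, 70, 100];
const y1 = new Array(5).fill(10);

// Stage 1: rows follow the 5 candidate points, columns the 11 query points,
// mirroring the [this.thread.y][this.thread.x] layout of the kernel output.
function distanceMatrix(qx, qy, cx, cy) {
  return cx.map((_, row) =>
    qx.map((_, col) => {
      const dx = cx[row] - qx[col];
      const dy = cy[row] - qy[col];
      return Math.sqrt(dx * dx + dy * dy);
    })
  );
}

// Stage 2: for each column, scan the rows for the smallest distance and
// remember which row it came from (the first match wins on ties).
function nearestPerColumn(matrix) {
  const cols = matrix[0].length;
  const out = [];
  for (let col = 0; col < cols; col++) {
    let minVal = Infinity;
    let minIdx = 0;
    for (let row = 0; row < matrix.length; row++) {
      if (matrix[row][col] < minVal) {
        minVal = matrix[row][col];
        minIdx = row;
      }
    }
    out.push([minVal, minIdx]);
  }
  return out;
}

const distances = distanceMatrix(x0, y0, x1, y1);
const nearest = nearestPerColumn(distances);
console.log(nearest.map(([, idx]) => idx).join(','));
```

The printed index list should read 0,0,1,1,1,2,2,3,3,4,4, matching the second element of each pair in the gpu.js output.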
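Note that I have hardcoded the matrix sizes in the example for clarity; gpu.js obviously accepts variables. Also, depending on the size of your matrices, you might have to break the problem into chunks, based on the amount of GPU RAM needed to hold the complete cross-distance matrix...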
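The chunking idea can be sketched in plain JavaScript as follows. This only illustrates the bookkeeping, under the assumption that the candidate points are split into fixed-size blocks; `nearestChunked` and `chunkSize` are hypothetical names, and in a real setup the inner distance loops would be replaced by a GPU kernel call per block.

```javascript
// Walk the candidate points in blocks and merge each block's minima into a
// running best. A GPU version would materialize one chunkSize x queries
// slice of the distance matrix per block instead of the full matrix.
function nearestChunked(qx, qy, cx, cy, chunkSize) {
  // Running best distance and candidate index for every query point.
  const best = qx.map(() => ({ dist: Infinity, idx: -1 }));
  for (let start = 0; start < cx.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, cx.length);
    for (let c = start; c < end; c++) {
      for (let q = 0; q < qx.length; q++) {
        const dx = cx[c] - qx[q];
        const dy = cy[c] - qy[q];
        const d = Math.sqrt(dx * dx + dy * dy);
        if (d < best[q].dist) {
          best[q] = { dist: d, idx: c }; // idx is global, not chunk-local
        }
      }
    }
  }
  return best;
}

const qx = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100];
const qy = new Array(11).fill(100);
const cx = [0, 30, 50, 70, 100];
const cy = new Array(5).fill(10);
const chunked = nearestChunked(qx, qy, cx, cy, 2);
console.log(chunked.map((b) => b.idx).join(','));
```

Because the comparison is a strict `<` and the blocks are visited in order, the answer does not depend on `chunkSize`; only the peak memory of a GPU version would.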
I realize this isn't tensorflowjs, but I hope it helps.
EDIT - via TensorFlow.JS
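I spent a few minutes porting this to tensorflow.js. The core concept is to build up matrices of the x and y values in preparation for performing the calculations en masse.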
const x0 = tf.tensor1d([0,10,20,30,40,50,60,70,80,90,100]);
const y0 = tf.tensor1d([100,100,100,100,100,100,100,100,100,100,100]);
const x1 = tf.tensor1d([0,30,50,70,100]);
const y1 = tf.tensor1d([10,10,10,10,10]);
const x0mat = x0.tile([5]).reshape([5,11]);
const y0mat = y0.tile([5]).reshape([5,11]);
const x1mat = x1.tile([11]).reshape([11,5]).transpose();
const y1mat = y1.tile([11]).reshape([11,5]).transpose();
x0mat.print();
x1mat.print();
const xDeltas = x1mat.squaredDifference(x0mat);
y0mat.print();
y1mat.print();
const yDeltas = y1mat.squaredDifference(y0mat);
const distance = xDeltas.add(yDeltas).sqrt();
distance.print();
distance.argMin(1).print();
distance.min(1).print();
Results...
Tensor - x0mat
[[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]]
Tensor - x1mat
[[0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 , 0 ],
[30 , 30 , 30 , 30 , 30 , 30 , 30 , 30 , 30 , 30 , 30 ],
[50 , 50 , 50 , 50 , 50 , 50 , 50 , 50 , 50 , 50 , 50 ],
[70 , 70 , 70 , 70 , 70 , 70 , 70 , 70 , 70 , 70 , 70 ],
[100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100]]
Tensor - y0mat
[[100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
[100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
[100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
[100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100],
[100, 100, 100, 100, 100, 100, 100, 100, 100, 100, 100]]
Tensor - y1mat
[[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10],
[10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10]]
Tensor - distance
[[90 , 90.5538483 , 92.1954422 , 94.8683319, 98.4885788 , 102.9562988, 108.1665344, 114.01754 , 120.415947 , 127.2792206, 134.5362396],
[94.8683319 , 92.1954422 , 90.5538483 , 90 , 90.5538483 , 92.1954422 , 94.8683319 , 98.4885788, 102.9562988, 108.1665344, 114.01754 ],
[102.9562988, 98.4885788 , 94.8683319 , 92.1954422, 90.5538483 , 90 , 90.5538483 , 92.1954422, 94.8683319 , 98.4885788 , 102.9562988],
[114.01754 , 108.1665344, 102.9562988, 98.4885788, 94.8683319 , 92.1954422 , 90.5538483 , 90 , 90.5538483 , 92.1954422 , 94.8683319 ],
[134.5362396, 127.2792206, 120.415947 , 114.01754 , 108.1665344, 102.9562988, 98.4885788 , 94.8683319, 92.1954422 , 90.5538483 , 90 ]]
Tensor - argMin of distance
[0, 3, 5, 7, 10]
Tensor - min of distance
[90, 90, 90, 90, 90]
The code is broken down into steps to show the basic concepts. I'm sure it can be condensed and optimized further.
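One detail worth noting: the `distance` tensor has shape [5, 11], so `argMin(1)` answers "for each of the 5 points, which of the 11 is closest", which is the reverse of the gpu.js example. Reducing along axis 0 instead (`distance.argMin(0)`) should give one index for each of the 11 points. The plain-JavaScript sketch below imitates both reductions on a nested array; `argMinAxis` is a hypothetical helper, not a TensorFlow.js function, and ties resolve to the first index here.

```javascript
// Same sample points as in the example.
const px = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]; // 11 points, y = 100
const sx = [0, 30, 50, 70, 100];                         // 5 points, y = 10

// 5 rows x 11 columns, the same layout as the printed distance tensor.
const dist = sx.map((cx) =>
  px.map((qx) => Math.sqrt((cx - qx) * (cx - qx) + 90 * 90))
);

// Index of the minimum along the given axis of a 2D nested array.
// axis 1 reduces across columns (one result per row);
// axis 0 reduces across rows (one result per column).
function argMinAxis(matrix, axis) {
  const rows = matrix.length;
  const cols = matrix[0].length;
  const outer = axis === 1 ? rows : cols;
  const inner = axis === 1 ? cols : rows;
  const result = [];
  for (let o = 0; o < outer; o++) {
    let bestIdx = 0;
    let bestVal = Infinity;
    for (let i = 0; i < inner; i++) {
      const v = axis === 1 ? matrix[o][i] : matrix[i][o];
      if (v < bestVal) {
        bestVal = v;
        bestIdx = i;
      }
    }
    result.push(bestIdx);
  }
  return result;
}

const perRow = argMinAxis(dist, 1);    // one index per each of the 5 points
const perColumn = argMinAxis(dist, 0); // one index per each of the 11 points
console.log(perRow.join(','));
console.log(perColumn.join(','));
```

The first line printed should be 0,3,5,7,10, matching the argMin output above, and the second should be 0,0,1,1,1,2,2,3,3,4,4, matching the indices from the gpu.js version.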