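I've been going through the Armadillo documentation and examples, but there doesn't seem to be a really efficient way to subsample (or resample) a large vector or matrix, so that if you start with N elements you end up with N / k elements. There are a few methods for shuffling and shifting, but that's about it.
So I'm just looping over all the elements sequentially, but surely there's a better way, apart from splitting the work across the available cores?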
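#include <armadillo>
#include <thread>
#include <vector>

// Keeps every (skippCount + 1)-th column of `data`, compacting the matrix in place.
// `config` is my own type; `cfg` is currently unused here.
bool subsample(config& cfg, arma::mat& data, int skippCount)
{
    if (skippCount < 0) return false; // a negative skip would wrap around in size_t
    const size_t processor_count = 1; // currently not using threading because 'inplace'
    const size_t cols = data.n_cols;
    const size_t period = skippCount + 1;
    size_t newCols = cols / period;
    newCols += (0 == (cols % period)) ? 0 : 1; // round up: a trailing partial period still keeps one column
    const size_t blockSize = 256;
    // round up so the last, partially filled block is processed too
    const size_t blockCount = (newCols + blockSize - 1) / blockSize;
    std::vector<std::thread> workers;
    for (size_t blockID = 0; blockID < blockCount; ++blockID) {
        workers.push_back(std::thread([&data, blockID, newCols, period]() {
            // copy up to blockSize columns in place (overwrites other entries)
            size_t c = blockID * blockSize;
            for (size_t b = 0; (c < newCols) && (b < blockSize); c++, b++) {
                arma::vec v = data.col(period * c);
                data.col(c) = v;
            }
        }));
        // with processor_count == 1 each block finishes before the next one starts
        if (workers.size() == processor_count) {
            for (auto& worker : workers) worker.join();
            workers.clear();
        }
    }
    for (auto& worker : workers) worker.join(); // make sure all threads finish
    data.resize(data.n_rows, newCols); // resize keeps the leading newCols columns
    return true;
}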
If you have any suggestions for improving this, I'd be grateful. Also, it would be best to do this "in place" to save memory.
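To make the "in place" idea concrete, here is a minimal sketch of the compaction step on its own. It does not use Armadillo: it assumes only a column-major buffer (the layout Armadillo uses), modelled as a plain `std::vector<double>`. The function name `subsample_columns_inplace` and its parameters are hypothetical, chosen for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Armadillo.
// `buf` holds a column-major matrix with `n_rows` values per column.
// Keeps columns 0, period, 2*period, ... and returns the new column count.
std::size_t subsample_columns_inplace(std::vector<double>& buf,
                                      std::size_t n_rows,
                                      std::size_t period)
{
    assert(period > 0 && n_rows > 0 && buf.size() % n_rows == 0);
    const std::size_t cols = buf.size() / n_rows;
    if (period == 1) return cols;  // nothing to drop

    const std::size_t new_cols = (cols + period - 1) / period;  // ceil(cols / period)

    // Column 0 is already in place. For c >= 1 and period >= 2 the source
    // column (period * c) lies strictly after the destination column c, so a
    // forward pass never overwrites a column that has not been read yet and
    // the two ranges passed to std::copy_n never overlap.
    for (std::size_t c = 1; c < new_cols; ++c) {
        const double* src = buf.data() + period * c * n_rows;
        double* dst = buf.data() + c * n_rows;
        std::copy_n(src, n_rows, dst);
    }

    buf.resize(new_cols * n_rows);  // shrinking a vector keeps its prefix
    return new_cols;
}
```

Because each copy reads from a later position than it writes, the forward order matters, which is also why splitting this loop across threads is not safe without extra care. With an `arma::mat`, the same loop could presumably run over the pointer returned by `data.memptr()`, followed by `data.resize(data.n_rows, new_cols)`. If a temporary copy is acceptable, selecting columns by index, for example `data.cols(arma::regspace<arma::uvec>(0, period, data.n_cols - 1))`, should pick the same columns, though not in place.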