c++ - Subsampling a large Armadillo matrix or vector

Question

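I have been going through the Armadillo documentation and examples, but there does not seem to be a really efficient way to subsample (or resample) a large vector or matrix, so that if you start with N elements you end up with N / k elements. There are a few methods for shuffling and shifting, but that is about it.

So I am just looping over all the elements sequentially, but surely there is a better way, apart from vectorizing across the available cores?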

bool subsample(config& cfg, arma::mat& data, int skippCount)
{
    const size_t processor_count = 1; // single worker for now: the copy is done in place

    const size_t cols = data.n_cols;
    const size_t period = skippCount + 1 ;
    size_t newCols = cols / period;
    newCols += (0 == (cols % period)) ? 0 : 1;
       
    const size_t blockSize = 256;
    std::vector<std::thread> workers;

    // round up so the final, partially filled block of columns is copied as well
    for (size_t blockID = 0; blockID < (newCols + blockSize - 1) / blockSize; ++blockID) {
        workers.push_back(std::thread([&data, blockID, newCols, period]() { 
            // copy up to blockSize columns in place (overwrites columns that are no longer needed)
            size_t c = blockID * blockSize;
            for (size_t b = 0; (c < newCols) && (b < blockSize); c++, b++) {
                arma::vec v = data.col(period * c); 
                data.col(c) = v;
            }
        }));

        if (workers.size()==processor_count) {
            for (auto& thread : workers) thread.join();
            workers.clear();
        }
    }
    for (auto& thread : workers) thread.join(); // make sure all threads finish
    data.resize(data.n_rows, newCols);
    return true;
}

Any suggestions for improving this would be appreciated. Ideally it would also be done "in place" to save memory.
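
For reference, the in-place idea can be sketched without threads or per-column temporaries. This is only a minimal sketch, assuming a dense column-major buffer of doubles (the layout Armadillo uses for arma::mat, reachable through memptr()); the function name subsample_columns_inplace and its parameters are hypothetical, not part of Armadillo.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: keep every `period`-th column (0, period, 2*period, ...)
// of a column-major buffer, compacting the kept columns to the front in place.
// Returns the number of columns kept; the caller trims the storage afterwards.
std::size_t subsample_columns_inplace(double* mem, std::size_t n_rows,
                                      std::size_t n_cols, std::size_t period)
{
    // Nothing to drop when period is 0 or 1, or when there are no columns.
    if (period <= 1 || n_cols == 0) return n_cols;
    assert(mem != nullptr);

    // Ceiling division: a trailing partial period still contributes one column.
    const std::size_t new_cols = (n_cols + period - 1) / period;

    // Column 0 is already in place. For c >= 1 the source column c * period
    // lies strictly after destination column c, so the ranges never overlap
    // and no source is overwritten before it has been read.
    for (std::size_t c = 1; c < new_cols; ++c) {
        const double* src = mem + (c * period) * n_rows;
        double* dst = mem + c * n_rows;
        std::copy(src, src + n_rows, dst);
    }
    return new_cols;
}
```

With an arma::mat, the call would presumably look like subsample_columns_inplace(data.memptr(), data.n_rows, data.n_cols, period), followed by data.resize(data.n_rows, kept) as in the code above. If a temporary copy is acceptable, an out-of-place alternative is to select the columns through an index vector, for example data.cols(arma::regspace<arma::uvec>(0, period, data.n_cols - 1)).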
