c++ - Using Eigen efficiently for repeated sparse matrix assembly in a nonlinear finite element code

Question

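I am trying to assemble the stiffness matrix for a nonlinear finite element computation efficiently with Eigen.

From my finite element discretization I can extract the sparsity pattern exactly. Therefore I can use: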

mat.reserve(nnz);
mat.setFromTriplets(TripletList.begin(), TripletList.end());

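as suggested in http://eigen.tuxfamily.org/dox/group__SparseQuickRefPage.html.

My questions are:

1. Because of the nonlinearity I have to refill my matrix very often. Should I therefore store all contributions in a triplet list and call mat.setFromTriplets(...) again and again?
2. If I reuse mat.setFromTriplets(...), can I somehow exploit the fact that I always evaluate my element matrices in the same order, so that the indices in the triplets never change and only the values do? The "search in memory" could then be avoided, because I could store the position of each entry in a separate array.
3. If mat.coeffRef(i,j) is faster, can I exploit the same fact there?
4. A bonus question (lower priority): is it possible to store and assemble three matrices with the same sparsity pattern efficiently, i.e. if I have to do this in one loop? For example a matrix wrapper around a SparseMatrix, from which I get the matrices as M1=mat[0], M2=mat[1], M3=mat[2], where mat[i] returns the i-th matrix and M1, M2 and M3 are e.g. SparseMatrix<double> M1(1000,1000).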
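
The idea in question 2 (store once where each contribution goes, then only rewrite values) can be sketched without Eigen as follows. This is a minimal illustration under assumptions, not Eigen's own mechanism: CscPattern, Entry, buildSlotMap and refill are hypothetical names, and the two index arrays stand in for what a compressed column-major Eigen::SparseMatrix exposes through outerIndexPtr() and innerIndexPtr(), with the value array corresponding to valuePtr().

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the index arrays of a compressed column-major
// sparse matrix: colPtr has cols+1 entries, rowIdx is sorted within a column.
struct CscPattern {
  std::vector<int> colPtr;
  std::vector<int> rowIdx;
};

// One assembly contribution; only `value` changes between iterations.
struct Entry {
  int row;
  int col;
  double value;
};

// Preprocessing (done once): for every contribution, find the position of
// (row, col) in the value array by a binary search inside its column.
std::vector<int> buildSlotMap(const CscPattern& p, const std::vector<Entry>& entries) {
  std::vector<int> slot(entries.size());
  for (std::size_t t = 0; t < entries.size(); ++t) {
    const auto first = p.rowIdx.begin() + p.colPtr[entries[t].col];
    const auto last = p.rowIdx.begin() + p.colPtr[entries[t].col + 1];
    const auto it = std::lower_bound(first, last, entries[t].row);
    assert(it != last && *it == entries[t].row);  // entry must be in the pattern
    slot[t] = static_cast<int>(it - p.rowIdx.begin());
  }
  return slot;
}

// Per nonlinear iteration: zero the values, then scatter-add every
// contribution into its stored slot. Duplicate (row, col) pairs sum up.
void refill(std::vector<double>& values, const std::vector<int>& slot,
            const std::vector<Entry>& entries) {
  std::fill(values.begin(), values.end(), 0.0);
  for (std::size_t t = 0; t < entries.size(); ++t) values[slot[t]] += entries[t].value;
}
```

With this layout the per-iteration cost is one pass over the contributions with no searching or sorting; the binary searches happen only in the preprocessing step.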

The general setup is as follows (for questions 1-3 only M1 is relevant):

std::vector< Eigen::Triplet<double> > tripletListA; // triplets differ only in the values and not in the indices
std::vector< Eigen::Triplet<double> > tripletListB;
std::vector< Eigen::Triplet<double> > tripletListC;

SparseMatrix<double> M1(1000,1000); 
SparseMatrix<double> M2(1000,1000);
SparseMatrix<double> M3(1000,1000);

//Reserve space in triplets
tripletListA.reserve(nnz);
tripletListB.reserve(nnz);
tripletListC.reserve(nnz);



//Reserve space in matrices
M1.reserve(nnz);
M2.reserve(nnz);
M3.reserve(nnz);

//fill triplet list with zeros

M1.setFromTriplets(tripletListA.begin(), tripletListA.end());
M2.setFromTriplets(tripletListB.begin(), tripletListB.end());
M3.setFromTriplets(tripletListC.begin(), tripletListC.end());
for (int i=0; i<1000; i++) {

   //Fill triplets

   M1.setFromTriplets(tripletListA.begin(), tripletListA.end()); //or use coeffRef?
   M2.setFromTriplets(tripletListB.begin(), tripletListB.end());
   M3.setFromTriplets(tripletListC.begin(), tripletListC.end());

//solve
//update
}
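
For the bonus question, one possible layout is a single shared index structure with three value arrays that are filled in the same loop. The sketch below is an assumption-laden illustration, not an Eigen feature: TripleValues and Contribution are hypothetical names, and `slot` is assumed to be the precomputed position of a contribution in the value array of the shared pattern. Each value array could afterwards be copied into the coefficient array (coeffs()) of a SparseMatrix that was built once with that pattern.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical contribution: one slot in the shared pattern and one value
// for each of the three matrices.
struct Contribution {
  int slot;
  double a;
  double b;
  double c;
};

// Hypothetical container: values[m][k] is the k-th stored value of matrix m.
struct TripleValues {
  std::array<std::vector<double>, 3> values;

  explicit TripleValues(std::size_t nnz) {
    for (auto& v : values) v.assign(nnz, 0.0);
  }

  void setZero() {
    for (auto& v : values) std::fill(v.begin(), v.end(), 0.0);
  }

  // Reset all three value arrays and accumulate every contribution;
  // contributions that share a slot are summed.
  void assemble(const std::vector<Contribution>& contributions) {
    setZero();
    for (const Contribution& c : contributions) {
      assert(c.slot >= 0 && static_cast<std::size_t>(c.slot) < values[0].size());
      values[0][c.slot] += c.a;
      values[1][c.slot] += c.b;
      values[2][c.slot] += c.c;
    }
  }
};
```

The design choice here is that the index lookup is paid once per contribution instead of once per matrix.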

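Thank you and regards, Alex

Update:

Thank you for your answer. Initially, the order in which I access the nonzeros is quite arbitrary. But since I am interested in an iterative scheme, I thought about recording this arbitrary ordering once and constructing an operator that takes care of it. This operator could be constructed from the initially built triplet list (at least in my mind).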

SparseMatrix<double> mat(rows,cols);
std::vector<double> valuevector(nnz);
//Initial construction
std::vector< Eigen::Triplet<double> > tripletList;

//naive fill of tripletList

//Sorting of entries and identifying double entries in tripletList from col and row values
//generating from this information operator P

for (int i=0; i<1000; i++) 
{
  //naive refill of tripletList

  valuevector = P*tripletList.value(); //pseudocode: build the value vector in the efficient ordering from the triplet values (std::vector has no value() member, but I hope it is clear what I have in mind)

  int slot = 0; //running index into valuevector, which is assumed to be in storage order
  for (int k=0; k<mat.outerSize(); ++k)
    for (SparseMatrix<double>::InnerIterator it(mat,k); it; ++it)
          it.valueRef() = valuevector[slot++];
}

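I think of the operator P as a matrix with ones and zeros in the appropriate places.

The question remains whether this is a more efficient procedure.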
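
Since P contains only ones and zeros, it does not have to be stored as a matrix: for every stored nonzero it is enough to keep the list of triplet positions that are summed into it. The following sketch shows one way to build and apply such an operator. GatherOperator, buildGather and applyGather are hypothetical names, and the slots are assumed to be numbered in column-major order so that they match the storage order of a compressed column-major matrix.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical compact form of the 0/1 operator P: row k of P has its ones
// at the columns source[offset[k]] .. source[offset[k + 1] - 1].
struct GatherOperator {
  std::vector<int> offset;  // one entry per stored nonzero, plus one
  std::vector<int> source;  // one entry per triplet
};

// Build P once from the (row, col) index pairs of the triplets.
GatherOperator buildGather(const std::vector<std::pair<int, int>>& rowCol) {
  GatherOperator p;
  p.source.resize(rowCol.size());
  std::iota(p.source.begin(), p.source.end(), 0);
  // Sort triplet positions by (col, row), i.e. column-major storage order.
  auto key = [&](int t) { return std::make_pair(rowCol[t].second, rowCol[t].first); };
  std::stable_sort(p.source.begin(), p.source.end(),
                   [&](int a, int b) { return key(a) < key(b); });
  // A new slot starts wherever the (col, row) key changes.
  p.offset.push_back(0);
  for (std::size_t s = 1; s <= p.source.size(); ++s)
    if (s == p.source.size() || key(p.source[s]) != key(p.source[s - 1]))
      p.offset.push_back(static_cast<int>(s));
  return p;
}

// values = P * tripletValues: each slot sums the triplet values mapped to it.
void applyGather(const GatherOperator& p, const std::vector<double>& tripletValues,
                 std::vector<double>& values) {
  values.assign(p.offset.size() - 1, 0.0);
  for (std::size_t k = 0; k + 1 < p.offset.size(); ++k)
    for (int s = p.offset[k]; s < p.offset[k + 1]; ++s)
      values[k] += tripletValues[p.source[s]];
}
```

Applying the operator costs one addition per triplet, so whether it beats setFromTriplets in practice depends mainly on memory access patterns and would have to be measured.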

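UPDATE-2: Benchmark:

I tried to put my ideas into a code snippet. I first generate a random triplet list. The list is constructed to give roughly 95% sparsity; in addition, some values in the list are duplicated to mimic repeated triplets that write to the same position in the sparse matrix. The values are then inserted using different approaches. The first is the setFromTriplets method; the second and third try to exploit the known structure.

The second and third approaches record the sort order of the triplet list. This information is then used to write the values directly into the plain mat1.coeffs() vector.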

#include <algorithm>
#include <chrono>
#include <cmath>
#include <fstream>
#include <iostream>
#include <random>
#include <utility>
#include <vector>
#include <Eigen/Sparse>

using namespace std::chrono;
using namespace Eigen;
using namespace std;

typedef Eigen::Triplet<double> T;


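// For sorted (col, row) pairs, count how often each distinct pair occurs.
// multiplicity must hold one entry per distinct pair, initialised to 1.
void findDuplicates(const vector<pair<int, int> > &dummypair, Ref<VectorXi> multiplicity) {
  const int numberOfPairs = static_cast<int>(dummypair.size());
  int pairCount = 0;
  for (int i = 0; i < multiplicity.size(); ++i) {
    const pair<int, int> currentPair = dummypair[pairCount];
    // stop at the end of the list instead of reading past the last pair
    while (pairCount + multiplicity[i] < numberOfPairs &&
           currentPair == dummypair[pairCount + multiplicity[i]]) {
      multiplicity[i]++;
    }
    pairCount += multiplicity[i];
  }
}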

typedef Matrix<duration<double, std::milli>, Dynamic, Dynamic> MatrixXtime;

int main() {


  //init random generators
  std::default_random_engine gen;
  std::uniform_real_distribution<double> dist(0.0, 1.0);

  int sizesForTest = 5;
  int measures = 6;
  MatrixXtime timeArray(sizesForTest, measures);
  cout << "TripletTime NestetTime LNestedTime " << endl;
  for (int m = 0; m < sizesForTest; ++m) {


    int rows = pow(10, m + 1);
    int cols = rows;
    std::uniform_int_distribution<int> distentryrow(0, rows - 1);
    std::uniform_int_distribution<int> distentrycol(0, cols - 1);

    std::vector<T> tripletList;
    SparseMatrix<double> mat1(rows, cols);
//  SparseMatrix<double> mat2(rows,cols);
//  SparseMatrix<double> mat3(rows,cols);

    //generate sparsity pattern of matrix with roughly 5% fill-in
    tripletList.emplace_back(3, 0, 15);
    for (int i = 0; i < rows; ++i)
      for (int j = 0; j < cols; ++j) {
        auto value = dist(gen);                         //generate random number
        auto value2 = dist(gen);                         //generate random number
        auto value3 = dist(gen);                         //generate random number
        if (value < 0.05) {
          auto rowindex = distentryrow(gen);
          auto colindex = distentrycol(gen);
          tripletList.emplace_back(rowindex, colindex, value);      //value is below the threshold, insert it

          //duplicate about every third entry to mimic entries which appear more than once
          if (value2 < 0.3333333333333333333333)
            tripletList.emplace_back(rowindex, colindex, value);

          //add another copy for about every fourth entry to mimic entries which appear more than once
          if (value3 < 0.25)
            tripletList.emplace_back(rowindex, colindex, value);
        }
      }
    tripletList.emplace_back(3, 0, 9);

    int numberOfValues = tripletList.size();

    //initially set all matrices from triplet to allocate space and sparsity pattern
    mat1.setFromTriplets(tripletList.begin(), tripletList.end());
//  mat2.setFromTriplets(tripletList.begin(), tripletList.end());
//  mat3.setFromTriplets(tripletList.begin(), tripletList.end());

    int nnz = mat1.nonZeros();
    //reset all entries back to zero to fill in later
    mat1.coeffs().setZero();
//  mat2.coeffs().setZero();
//  mat3.coeffs().setZero();

    //document sorting of entries for repetitive insertion
    VectorXi internalIndex(numberOfValues);
    vector<pair<int, int> > dummypair(numberOfValues);

    VectorXd valuelist(numberOfValues);
    for (int l = 0; l < numberOfValues; ++l) {
      valuelist(l) = tripletList[l].value();
    }

    //init internalindex and dummy pair
    internalIndex = Eigen::VectorXi::LinSpaced(numberOfValues, 0.0, numberOfValues - 1);
    for (int i = 0; i < numberOfValues; ++i) {

      dummypair[i].first = tripletList[i].col();
      dummypair[i].second = tripletList[i].row();
    }

    auto start = high_resolution_clock::now();


// sort the vector  internalIndex based on the dummypair
    sort(internalIndex.begin(), internalIndex.end(), [&](int i, int j) {
        return dummypair[i].first < dummypair[j].first ||
               (dummypair[i].first == dummypair[j].first && dummypair[i].second < dummypair[j].second);
    });

    auto stop = high_resolution_clock::now();
    timeArray(m, 3) = (stop - start) / 1000;


    start = high_resolution_clock::now();
    sort(dummypair.begin(), dummypair.end());
    stop = high_resolution_clock::now();
    timeArray(m, 4) = (stop - start) / 1000;


    start = high_resolution_clock::now();
    VectorXi dublicatecount(nnz);
    dublicatecount.setOnes();
    findDuplicates(dummypair, dublicatecount);
    stop = high_resolution_clock::now();
    timeArray(m, 5) = (stop - start) / 1000;

    dummypair.clear();




    //calculate vector containing all indices of triplet
    //therefore vector[k] is the vectorXi containing the entries of triples which should be written at dof k
    int indextriplet = 0;
    int multiplicity = 0;

    vector<VectorXi> listofentires(mat1.nonZeros());
    for (int k = 0; k < mat1.nonZeros(); ++k) {
      multiplicity = dublicatecount[k];
      listofentires[k] = internalIndex.segment(indextriplet, multiplicity);
      indextriplet += multiplicity;
    }


    //========================================
    //Here the nonlinear analysis should start and everything beforehand is preprocessing

    //Test1 from triplets
    start = high_resolution_clock::now();

    mat1.setFromTriplets(tripletList.begin(), tripletList.end());

    stop = high_resolution_clock::now();
    timeArray(m, 0) = (stop - start) / 1000;

    mat1.coeffs().setZero();


    //Test2 use internalIndex but calculate listofentires on the fly
    indextriplet = 0;
    start = high_resolution_clock::now();

    for (int k = 0; k < mat1.nonZeros(); ++k) {
      multiplicity = dublicatecount[k];
      mat1.coeffs()[k] += valuelist(internalIndex.segment(indextriplet, multiplicity)).sum();
      indextriplet += multiplicity;
    }

    stop = high_resolution_clock::now();
    timeArray(m, 1) = (stop - start) / 1000;
    mat1.coeffs().setZero();

    //Test3 directly use listofentires
    start = high_resolution_clock::now();
    for (int k = 0; k < mat1.nonZeros(); ++k)
      mat1.coeffs()[k] += valuelist(listofentires[k]).sum();

    stop = high_resolution_clock::now();
    timeArray(m, 2) = (stop - start) / 1000;


    std::ofstream file("test.txt");
    if (file.is_open()) {
      file << mat1 << '\n';
    }
    cout << "Size: " << rows << ": ";
    for (int n = 0; n < measures; ++n)
      cout << timeArray(m, n).count() << " ";
    cout << endl;
  }

  return 0;
}

Running this example on an i5-6600K at 3.5 GHz with 16 GB RAM gives the following results. Times are in seconds.

  Size  Triplet    Nested  LessNested  Sort_intIndex  Sort_dum_pair  findDuplicates
    10    1e-06     1e-06       2e-06          1e-06          1e-06           1e-06
   100  2.8e-05     4e-06     1.4e-05          5e-05        4.2e-05           1e-05
  1000    0.003  0.000416    0.001489        0.01012        0.00627        0.000635
 10000    0.426  0.093911     0.48912         1.5389       0.780676        0.061881
100000  337.799   99.0801     37.3656        292.397        87.4488         0.79996

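The first three columns show the computation time of the different approaches; columns 4 to 6 show the time of the different preprocessing steps.

For a size of 100000 rows and columns my RAM fills up rather quickly, so the last table row should be treated with caution. Here the fastest approach changes from the second to the third.

My question is whether this approach goes in the right direction for improving efficiency. Or is it a completely wrong direction, given that e.g. for the size-10000 case an assembly time of 0.48 seconds seems rather high?

Furthermore, the preprocessing steps become very expensive. Is there a better way to construct the ordering of the matrix? A final question: is the benchmark done in the right way?

Thanks for your time, Alex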
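
On the benchmarking question, a single timed run per method is sensitive to cache state and to other load on the machine. A common remedy is to repeat each measurement and report the shortest time. The helper below is a sketch of that idea; bestTimeSeconds is a hypothetical name and the repetition count is an arbitrary choice by the caller. It uses std::chrono::steady_clock, which is monotonic, and std::chrono::duration<double>, which yields seconds directly so that no manual division is needed.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical helper: runs `work` `repetitions` times and returns the
// shortest wall-clock time of a single run, in seconds.
template <typename Work>
double bestTimeSeconds(Work&& work, int repetitions) {
  assert(repetitions > 0);
  using Clock = std::chrono::steady_clock;  // monotonic clock
  double best = std::numeric_limits<double>::infinity();
  for (int r = 0; r < repetitions; ++r) {
    const auto start = Clock::now();
    work();
    // duration<double> has a period of one second, so count() is in seconds
    const std::chrono::duration<double> elapsed = Clock::now() - start;
    best = std::min(best, elapsed.count());
  }
  return best;
}
```

A call could look like bestTimeSeconds([&] { mat1.setFromTriplets(tripletList.begin(), tripletList.end()); }, 10). The timed region should contain only the assembly itself, and the program should be compiled with optimizations enabled, because Eigen relies heavily on inlining.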
