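I am trying to use Eigen to efficiently assemble a stiffness matrix for a nonlinear finite element computation.
From my finite element discretization I can extract my sparsity pattern exactly. Therefore I can use: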
mat.reserve(nnz);
mat.setFromTriplets(TripletList.begin(), TripletList.end());
as suggested in http://eigen.tuxfamily.org/dox/group__SparseQuickRefPage.html.
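My questions are:
1. Due to the nonlinearity I have to refill my matrix very often. Should I therefore store all contributions in a triplet list and call mat.setFromTriplets(...) again and again?
2. If I reuse mat.setFromTriplets(...), can I somehow exploit the fact that I always evaluate my element matrices in the same order during assembly, so that the indices in the triplets never change and only the values do? The "search in memory" could then be avoided, since I can store the position of each entry in a separate array.
3. If mat.coeffRef(i,j) is faster, can I exploit the same fact there?
A bonus question (lower priority): is it possible to efficiently store and assemble 3 matrices that share the same sparsity pattern, for instance when I have to do this inside a loop? I am thinking of a matrix wrapper from which the matrices are available as M1=mat[0], M2=mat[1], M3=mat[2], where mat[i] returns the corresponding matrix and M1, M2 and M3 are each e.g. SparseMatrix<double> M1(1000,1000).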
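To make question 2 concrete, here is a minimal sketch of what "storing where to put it" could look like. It uses plain std::vector storage rather than Eigen, and the names SlotMap, buildSlotMap and scatterAdd are hypothetical helpers, not Eigen API. It assumes the value array is ordered by column and then by row without duplicates, which is the layout I would expect from a compressed column-major matrix (in Eigen the values would presumably be reached through mat.valuePtr() or mat.coeffs()).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Hypothetical helper, not part of Eigen: slot[t] is the position of the
// (row[t], col[t]) entry in a column-major, duplicate-free value array.
struct SlotMap {
    std::vector<std::size_t> slot; // one entry per triplet
    std::size_t nnz = 0;           // number of distinct (row, col) pairs
};

// One-time preprocessing: sort the triplet indices, then number the distinct entries.
inline SlotMap buildSlotMap(const std::vector<int>& row, const std::vector<int>& col) {
    assert(row.size() == col.size());
    const std::size_t n = row.size();
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    // Sort by (col, row), i.e. the assumed column-major storage order.
    std::sort(order.begin(), order.end(), [&](std::size_t a, std::size_t b) {
        return std::make_pair(col[a], row[a]) < std::make_pair(col[b], row[b]);
    });
    SlotMap map;
    map.slot.resize(n);
    for (std::size_t k = 0; k < n; ++k) {
        const std::size_t t = order[k];
        // A new slot starts whenever the (row, col) pair changes.
        if (k == 0 || row[t] != row[order[k - 1]] || col[t] != col[order[k - 1]])
            ++map.nnz;
        map.slot[t] = map.nnz - 1;
    }
    return map;
}

// Per-iteration refill: zero the values, then add every triplet value into its slot.
inline void scatterAdd(const SlotMap& map, const std::vector<double>& tripletValues,
                       std::vector<double>& values) {
    assert(tripletValues.size() == map.slot.size());
    values.assign(map.nnz, 0.0);
    for (std::size_t t = 0; t < tripletValues.size(); ++t)
        values[map.slot[t]] += tripletValues[t];
}
```

With this layout the element loop can stay in its natural order; only scatterAdd runs in each nonlinear iteration, and it performs no searching or sorting. Whether it actually beats setFromTriplets would have to be measured.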
The general setup is as follows (for questions 1-3 only M1 is present):
std::vector< Eigen::Triplet<double> > tripletListA; // triplets differ only in the values and not in the indices
std::vector< Eigen::Triplet<double> > tripletListB;
std::vector< Eigen::Triplet<double> > tripletListC;
SparseMatrix<double> M1(1000,1000);
SparseMatrix<double> M2(1000,1000);
SparseMatrix<double> M3(1000,1000);
//Reserve space in triplets
tripletListA.reserve(nnz);
tripletListB.reserve(nnz);
tripletListC.reserve(nnz);
//Reserve space in matrices
M1.reserve(nnz);
M2.reserve(nnz);
M3.reserve(nnz);
//fill triplet list with zeros
M1.setFromTriplets(tripletListA.begin(), tripletListA.end());
M2.setFromTriplets(tripletListB.begin(), tripletListB.end());
M3.setFromTriplets(tripletListC.begin(), tripletListC.end());
for (int i=0; i<1000; i++) {
//Fill triplets
M1.setFromTriplets(tripletListA.begin(), tripletListA.end()); //or use coeffRef?
M2.setFromTriplets(tripletListB.begin(), tripletListB.end());
M3.setFromTriplets(tripletListC.begin(), tripletListC.end());
//solve
//update
}
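For the bonus question, one possible layout is a single index structure shared by three value arrays. The sketch below is not an Eigen class; SharedPatternTriple and its members are hypothetical names, and the storage is a hand-rolled column-major format (column start offsets plus row indices), chosen only to illustrate the idea.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical wrapper, not an Eigen class: one column-major sparsity pattern
// shared by three value arrays, so the index data exists only once.
struct SharedPatternTriple {
    std::vector<int> outerStart;               // size cols + 1, start of each column
    std::vector<int> innerIndex;               // size nnz, row index of each entry
    std::array<std::vector<double>, 3> values; // values[m][k] belongs to matrix m

    SharedPatternTriple(std::vector<int> outer, std::vector<int> inner)
        : outerStart(std::move(outer)), innerIndex(std::move(inner)) {
        assert(!outerStart.empty());
        assert(static_cast<std::size_t>(outerStart.back()) == innerIndex.size());
        for (auto& v : values) v.assign(innerIndex.size(), 0.0);
    }

    // Add one contribution to all three matrices at storage position k.
    void add(std::size_t k, double a, double b, double c) {
        values[0][k] += a;
        values[1][k] += b;
        values[2][k] += c;
    }

    // Read entry (i, j) of matrix m; returns 0 if (i, j) is not in the pattern.
    double coeff(std::size_t m, int i, int j) const {
        for (int k = outerStart[j]; k < outerStart[j + 1]; ++k)
            if (innerIndex[k] == i) return values[m][k];
        return 0.0;
    }
};
```

The point of add is that a precomputed storage position k can be reused for all three matrices in one pass over the elements. As far as I know, Eigen's Map over sparse storage might allow each value array to be viewed as a SparseMatrix on top of the shared index arrays, but I have not verified that here.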
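Thank you and regards, Alex
Update:
Thank you for your answer. Initially, the order in which I access the nonzeros is fairly arbitrary. But since I am interested in an iterative scheme, I considered recording this arbitrary ordering and building an operator that handles it. This operator could be constructed from the initially built triplet list (at least in my mind).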
SparseMatrix<double> mat(rows,cols);
std::vector<double> valuevector(nnz);
//Initial construction
std::vector< Eigen::Triplet<double> > tripletList;
//naive fill of tripletList
//Sorting of entries and identifying double entries in tripletList from col and row values
//generating from this information operator P
for (int i=0; i<1000; i++)
{
//naive refill of tripletList
valuevector= P*tripletList.value(); //constructing vector in efficient ordering from values of triplets (tripletList.value() does not make sense for std::vector, but I hope it is clear what I have in mind)
for (int k=0; k<mat.outerSize(); ++k)
for (SparseMatrix<double>::InnerIterator it(mat,k); it; ++it)
it.valueRef() =valuevector(it);
}
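I think of the operator P as a matrix with ones and zeros at the appropriate positions.
The question remains whether this is a more efficient procedure.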
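As a rough illustration of the operator P, the sketch below stores it in a row-compressed form without explicit ones: row k lists the triplet indices that are summed into stored entry k. All names are hypothetical, and the input slot (the target storage position of each triplet) is assumed to be known from a one-time preprocessing step.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 0/1 operator P in row-compressed form. The ones are implicit:
// row k holds the triplet indices that are summed into stored entry k.
struct SelectionOperator {
    std::vector<std::size_t> rowStart;     // size nnz + 1
    std::vector<std::size_t> tripletIndex; // column index of every 1 in P
};

// Build P from slot[t], the storage position that triplet t writes to.
inline SelectionOperator buildSelectionOperator(const std::vector<std::size_t>& slot,
                                                std::size_t nnz) {
    SelectionOperator P;
    P.rowStart.assign(nnz + 1, 0);
    for (std::size_t s : slot) {
        assert(s < nnz);
        ++P.rowStart[s + 1]; // count the ones in each row
    }
    for (std::size_t k = 0; k < nnz; ++k)
        P.rowStart[k + 1] += P.rowStart[k]; // prefix sum gives the row offsets
    P.tripletIndex.resize(slot.size());
    std::vector<std::size_t> next(P.rowStart.begin(), P.rowStart.end() - 1);
    for (std::size_t t = 0; t < slot.size(); ++t)
        P.tripletIndex[next[slot[t]]++] = t;
    return P;
}

// valuevector = P * tripletValues, computed as a gather and sum per stored entry.
inline std::vector<double> applySelection(const SelectionOperator& P,
                                          const std::vector<double>& tripletValues) {
    const std::size_t nnz = P.rowStart.size() - 1;
    std::vector<double> out(nnz, 0.0);
    for (std::size_t k = 0; k < nnz; ++k)
        for (std::size_t p = P.rowStart[k]; p < P.rowStart[k + 1]; ++p)
            out[k] += tripletValues[P.tripletIndex[p]];
    return out;
}
```

Applying P this way costs one pass over the triplet values, so forming P as an actual sparse matrix with stored ones is probably unnecessary.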
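UPDATE-2: Benchmark:
I tried to put my idea into a code snippet. I first generate a random triplet list. The list is constructed to give a sparsity of 95%; in addition, some values in the list are duplicated to mimic duplicate triplets that write to the same position in the sparse matrix. The values are then inserted using different approaches. The first is the setFromTriplets method; the second and third try to exploit the known structure.
The second and third approaches record the ordering of the triplet list. This information is then used to write the values directly into the plain mat1.coeffs() vector.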
#include <algorithm>
#include <chrono>
#include <cmath>
#include <fstream>
#include <iostream>
#include <random>
#include <utility>
#include <vector>
#include <Eigen/Sparse>
using namespace std::chrono;
using namespace Eigen;
using namespace std;
typedef Eigen::Triplet<double> T;
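void findDuplicates(const vector<pair<int, int> > &dummypair, Ref<VectorXi> multiplicity) {
// dummypair must be sorted; multiplicity[i] starts at 1 and is raised to the
// number of consecutive equal pairs that make up the i-th distinct entry
const int numberOfPairs = static_cast<int>(dummypair.size());
int pairCount = 0;
for (int i = 0; i < multiplicity.size(); ++i) {
const pair<int, int> currentPair = dummypair[pairCount];
// the bounds check avoids reading past the end for the last entry
while (pairCount + multiplicity[i] < numberOfPairs &&
currentPair == dummypair[pairCount + multiplicity[i]]) {
multiplicity[i]++;
}
pairCount += multiplicity[i];
}
}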
typedef Matrix<duration<double, std::milli>, Dynamic, Dynamic> MatrixXtime;
int main() {
//init random generators
std::default_random_engine gen;
std::uniform_real_distribution<double> dist(0.0, 1.0);
int sizesForTest = 5;
int measures = 6;
MatrixXtime timeArray(sizesForTest, measures);
cout << "TripletTime NestedTime LNestedTime " << endl;
for (int m = 0; m < sizesForTest; ++m) {
int rows = pow(10, m + 1);
int cols = rows;
std::uniform_int_distribution<int> distentryrow(0, rows - 1);
std::uniform_int_distribution<int> distentrycol(0, cols - 1);
std::vector<T> tripletList;
SparseMatrix<double> mat1(rows, cols);
// SparseMatrix<double> mat2(rows,cols);
// SparseMatrix<double> mat3(rows,cols);
//generate sparsity pattern of matrix with about 5% fill-in (threshold 0.05 below)
tripletList.emplace_back(3, 0, 15);
for (int i = 0; i < rows; ++i)
for (int j = 0; j < cols; ++j) {
auto value = dist(gen); //generate random number
auto value2 = dist(gen); //generate random number
auto value3 = dist(gen); //generate random number
if (value < 0.05) {
auto rowindex = distentryrow(gen);
auto colindex = distentrycol(gen);
tripletList.emplace_back(rowindex, colindex, value); //if smaller than threshold, insert it
//duplicate roughly every third entry to mimic entries which appear more than once
if (value2 < 0.3333333333333333333333)
tripletList.emplace_back(rowindex, colindex, value);
//add another copy for roughly every fourth entry to mimic entries which appear more than once
if (value3 < 0.25)
tripletList.emplace_back(rowindex, colindex, value);
}
}
tripletList.emplace_back(3, 0, 9);
int numberOfValues = tripletList.size();
//initially set all matrices from triplet to allocate space and sparsity pattern
mat1.setFromTriplets(tripletList.begin(), tripletList.end());
// mat2.setFromTriplets(tripletList.begin(), tripletList.end());
// mat3.setFromTriplets(tripletList.begin(), tripletList.end());
int nnz = mat1.nonZeros();
//reset all entries back to zero to fill in later
mat1.coeffs().setZero();
// mat2.coeffs().setZero();
// mat3.coeffs().setZero();
//document sorting of entries for repetitive insertion
VectorXi internalIndex(numberOfValues);
vector<pair<int, int> > dummypair(numberOfValues);
VectorXd valuelist(numberOfValues);
for (int l = 0; l < numberOfValues; ++l) {
valuelist(l) = tripletList[l].value();
}
//init internalindex and dummy pair
internalIndex = Eigen::VectorXi::LinSpaced(numberOfValues, 0.0, numberOfValues - 1);
for (int i = 0; i < numberOfValues; ++i) {
dummypair[i].first = tripletList[i].col();
dummypair[i].second = tripletList[i].row();
}
auto start = high_resolution_clock::now();
// sort the vector internalIndex based on the dummypair
sort(internalIndex.begin(), internalIndex.end(), [&](int i, int j) {
return dummypair[i].first < dummypair[j].first ||
(dummypair[i].first == dummypair[j].first && dummypair[i].second < dummypair[j].second);
});
auto stop = high_resolution_clock::now();
timeArray(m, 3) = (stop - start) / 1000;
start = high_resolution_clock::now();
sort(dummypair.begin(), dummypair.end());
stop = high_resolution_clock::now();
timeArray(m, 4) = (stop - start) / 1000;
start = high_resolution_clock::now();
VectorXi dublicatecount(nnz);
dublicatecount.setOnes();
findDuplicates(dummypair, dublicatecount);
stop = high_resolution_clock::now();
timeArray(m, 5) = (stop - start) / 1000;
dummypair.clear();
//calculate vector containing all indices of triplets
//therefore vector[k] is the VectorXi containing the indices of the triplets which should be written at stored entry k
int indextriplet = 0;
int multiplicity = 0;
vector<VectorXi> listofentires(mat1.nonZeros());
for (int k = 0; k < mat1.nonZeros(); ++k) {
multiplicity = dublicatecount[k];
listofentires[k] = internalIndex.segment(indextriplet, multiplicity);
indextriplet += multiplicity;
}
//========================================
//Here the nonlinear analysis should start and everything beforehand is preprocessing
//Test1 from triplets
start = high_resolution_clock::now();
mat1.setFromTriplets(tripletList.begin(), tripletList.end());
stop = high_resolution_clock::now();
timeArray(m, 0) = (stop - start) / 1000;
mat1.coeffs().setZero();
//Test2 use internalIndex but calculate listofentires on the fly
indextriplet = 0;
start = high_resolution_clock::now();
for (int k = 0; k < mat1.nonZeros(); ++k) {
multiplicity = dublicatecount[k];
mat1.coeffs()[k] += valuelist(internalIndex.segment(indextriplet, multiplicity)).sum();
indextriplet += multiplicity;
}
stop = high_resolution_clock::now();
timeArray(m, 1) = (stop - start) / 1000;
mat1.coeffs().setZero();
//Test3 directly use listofentires
start = high_resolution_clock::now();
for (int k = 0; k < mat1.nonZeros(); ++k)
mat1.coeffs()[k] += valuelist(listofentires[k]).sum();
stop = high_resolution_clock::now();
timeArray(m, 2) = (stop - start) / 1000;
std::ofstream file("test.txt");
if (file.is_open()) {
file << mat1 << '\n';
}
cout << "Size: " << rows << ": ";
for (int n = 0; n < measures; ++n)
cout << timeArray(m, n).count() << " ";
cout << endl;
}
return 0;
}
If I run this example on an i5-6600K at 3.5GHz with 16GB of RAM, I get the following results. All times are in seconds.
Size    Triplet  Nested    LessNested  Sort_intIndex  Sort_dum_pair  findDuplica
10      1e-06    1e-06     2e-06       1e-06          1e-06          1e-06
100     2.8e-05  4e-06     1.4e-05     5e-05          4.2e-05        1e-05
1000    0.003    0.000416  0.001489    0.01012        0.00627        0.000635
10000   0.426    0.093911  0.48912     1.5389         0.780676       0.061881
100000  337.799  99.0801   37.3656     292.397        87.4488        0.79996
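The first three columns are the computation times of the different approaches; columns 4 to 6 are the times of the different preprocessing steps.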
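The benchmark stores durations as milliseconds and divides by 1000 to arrive at seconds, and each measurement is taken once. A sketch that makes the unit explicit and repeats every measurement could look like the following; bestSeconds is a hypothetical helper, and taking the minimum over several runs is only one of several reasonable choices.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <limits>

// Hypothetical timing helper: runs `work` `repeats` times and returns the
// smallest wall-clock time in seconds, which damps one-off disturbances.
template <typename Work>
double bestSeconds(Work&& work, int repeats) {
    assert(repeats > 0);
    using clock = std::chrono::steady_clock; // monotonic, suited for intervals
    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto start = clock::now();
        work();
        // duration<double> has a period of one second, so count() is in seconds.
        const std::chrono::duration<double> elapsed = clock::now() - start;
        best = std::min(best, elapsed.count());
    }
    return best;
}
```

A call might look like bestSeconds([&] { mat1.setFromTriplets(tripletList.begin(), tripletList.end()); }, 10). The work being timed should leave the program in the same state after each run, otherwise later repetitions measure something different.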
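For a size of 100000 rows and columns my RAM fills up relatively quickly, so the last table entry should be treated with caution. Here the fastest method changes from 2 to 3.
My question is whether this approach is heading in the right direction for improving efficiency. Or is it a completely wrong direction, since, for example, an assembly time of 0.48 seconds for the size-10000 case seems a bit high?
Furthermore, the preprocessing steps become very expensive; is there a better way to construct the ordering of the matrix? A final question: is the benchmarking done in the right way?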
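One candidate for cheaper preprocessing is to replace the comparison sort on (col, row) pairs with two stable counting sorts, first by row and then by column. The sketch below is an assumption about what might help, not a measured improvement; columnMajorOrder is a hypothetical helper on plain index vectors, and it needs the matrix dimensions up front.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the triplet indices ordered by (col, row) using
// two stable counting sorts. The cost is O(n + rows + cols) instead of O(n log n).
inline std::vector<std::size_t> columnMajorOrder(const std::vector<int>& row,
                                                 const std::vector<int>& col,
                                                 int rows, int cols) {
    const std::size_t n = row.size();
    assert(col.size() == n);
    assert(rows >= 0 && cols >= 0);

    // Pass 1: stable counting sort of the indices 0..n-1 by row.
    std::vector<std::size_t> start(static_cast<std::size_t>(rows) + 1, 0);
    for (std::size_t t = 0; t < n; ++t) {
        assert(row[t] >= 0 && row[t] < rows);
        ++start[static_cast<std::size_t>(row[t]) + 1];
    }
    for (std::size_t r = 0; r + 1 < start.size(); ++r) start[r + 1] += start[r];
    std::vector<std::size_t> byRow(n);
    for (std::size_t t = 0; t < n; ++t)
        byRow[start[static_cast<std::size_t>(row[t])]++] = t;

    // Pass 2: stable counting sort of that sequence by column. Stability keeps
    // the row order from pass 1 inside every column.
    start.assign(static_cast<std::size_t>(cols) + 1, 0);
    for (std::size_t t = 0; t < n; ++t) {
        assert(col[t] >= 0 && col[t] < cols);
        ++start[static_cast<std::size_t>(col[t]) + 1];
    }
    for (std::size_t c = 0; c + 1 < start.size(); ++c) start[c + 1] += start[c];
    std::vector<std::size_t> order(n);
    for (std::size_t t : byRow)
        order[start[static_cast<std::size_t>(col[t])]++] = t;
    return order;
}
```

Because both passes are stable, triplets with identical (row, col) end up adjacent and in their original relative order, so duplicates can be counted in a single sweep over the result.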
Thank you for your time, Alex