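I'm writing C++ code that processes a large stream of data containing information about millions of nodes. I use a vector to store each node's name and a map from name to index.
The problem is that the vectors take up far more memory than expected, and I can't explain how that memory is released.
Suppose a file contains 1 million lines, each longer than 50 characters. I read it in twice, then check both the process's memory usage and my estimate of the vector's memory usage. They differ by about 60 MB. This is only a small-scale version of the bigger problem I'm facing, where the difference can reach the gigabyte range.
I compile the program with VS2010 SP1 for the x86 (32-bit) target, on Windows 7 SP1 Ultimate 64-bit.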
```cpp
#include <cstdlib>   // system
#include <fstream>
#include <iostream>
#include <map>
#include <string>
#include <vector>
#include <Windows.h>
#include <Psapi.h>
#pragma comment(lib, "psapi.lib")  // GetProcessMemoryInfo is exported from psapi.lib
using namespace std;

//#define COUNT 500000
size_t COUNT = 0;
vector<string> namesVector;
map<string,int> namesMap;

// Print the process working set (physical pages currently resident in RAM).
void ProcessStatistics()
{
    PROCESS_MEMORY_COUNTERS memCounter;
    GetProcessMemoryInfo(GetCurrentProcess(), &memCounter, sizeof(memCounter));
    cout << "Mem Usage by Process: " << memCounter.WorkingSetSize * 1.0e-6f << " MB." << endl;
}

// Fixed size of one std::string object (its heap buffer, if any, is not included).
// Defined before VectorMemUsage() so the call there compiles.
int StringOverhead()
{
    int overhead = sizeof(string);
    cout << "String overhead: " << overhead << " Bytes." << endl;
    return overhead;
}

// Estimate: vector object + one string object per element + each string's capacity.
void VectorMemUsage()
{
    COUNT = namesVector.size();
    int overhead = StringOverhead();
    double mem = 0;
    mem += sizeof(vector<string>);
    mem += overhead * COUNT;
    for (size_t i = 0; i < COUNT; i++)
    {
        mem += namesVector[i].capacity();
    }
    cout << "Calculated String Vector Usage: " << mem * 1.0e-6f << " MB of " << COUNT << " strings." << endl;
}

int main()
{
    const std::string infile = "somefile";
    ifstream infstream(infile.c_str());
    string s;
    // First pass: read every line into the vector.
    while (getline(infstream, s))
    {
        namesVector.push_back(s);
        //namesMap.insert(pair<string,int>(s, namesVector.size()));
    }
    // Rewind and read the whole file a second time.
    infstream.clear();
    infstream.seekg(0, ios::beg);
    while (getline(infstream, s))
    {
        namesVector.push_back(s);
        //namesMap.insert(pair<string,int>(s, namesVector.size()));
    }
    // Check process and vector memory usage:
    ProcessStatistics();
    VectorMemUsage();
    system("pause");
    // Release the vector.
    cout << "Now releasing the memory..." << endl;
    //vector<string>(namesVector).swap(namesVector);  // shrink via copy-and-swap
    //vector<string>().swap(namesVector);             // deallocate the vector
    //map<string,int>().swap(namesMap);               // deallocate the map
    cout << "Capacity of vector " << namesVector.capacity() << endl;
    ProcessStatistics();
    return 0;
}
```
The output of the x86 build is:
```
Mem Usage by Process: 336.523 MB.
String overhead: 28 Bytes.
Calculated String Vector Usage: 301.599 MB of 3385108 strings.
Press any key to continue . . .
Now releasing the memory...
Mem Usage by Process: 7.64314 MB.
```
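When I call `namesVector.shrink_to_fit()` or use the `vector<string>(namesVector).swap(namesVector)` idiom, the vector's capacity does shrink, but memory usage goes up. Does anyone know how to fix this? Isn't the swap trick supposed to be just a pointer swap? Why does it involve copying memory, and why does that lead to this result?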
```
Mem Usage by Process: 336.536 MB.
String overhead: 28 Bytes.
Calculated String Usage: 301.599 MB of 3385108 strings.
Vector Capacity is 3543306.
Calculated String Vector Usage: 315.693 MB of 3385108 strings.
Now releasing the memory...
Capacity of vector 3385108
Mem Usage by Process: 434.5 MB.
```
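Unexpected behavior appears when I add the map for the string indices. When I call both `vector<string>().swap(namesVector)` and `map<string,int>().swap(namesMap)`, the result is as follows. That is fine, because the memory is released.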
```
Mem Usage by Process: 534.778 MB.
String overhead: 28 Bytes.
Calculated String Usage: 301.599 MB of 3385108 strings.
Vector Capacity is 3543306.
Calculated String Vector Usage: 315.693 MB of 3385108 strings.
Press any key to continue . . .
Now releasing the memory...
Capacity of vector 0
Mem Usage by Process: 8.2903 MB.
```
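But when I call only `vector<string>().swap(namesVector)`, the memory is only partially released. By "partially" I mean it frees less than in the result above, about 336 MB.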
```
Mem Usage by Process: 534.77 MB.
String overhead: 28 Bytes.
Calculated String Usage: 301.599 MB of 3385108 strings.
Vector Capacity is 3543306.
Calculated String Vector Usage: 315.693 MB of 3385108 strings.
Press any key to continue . . .
Now releasing the memory...
Capacity of vector 0
Mem Usage by Process: 440.459 MB.
```
And when I call only `map<string,int>().swap(namesMap)`, almost no memory is released at all.
```
Mem Usage by Process: 534.774 MB.
String overhead: 28 Bytes.
Calculated String Usage: 301.599 MB of 3385108 strings.
Vector Capacity is 3543306.
Calculated String Vector Usage: 315.693 MB of 3385108 strings.
Press any key to continue . . .
Now releasing the memory...
Capacity of vector 3543306
Mem Usage by Process: 535.441 MB.
```
I can't explain what is happening. Does anyone know what is going on here?
Thanks for your help.
Best regards.