c++ - RapidXML reading from file - what is wrong here?

Question

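What is the difference between these two methods of reading an input file?

1) Using 'ifstream.get()'

and

2) Using a vector<char> with istreambuf_iterator<char> (which I understand less well!)

(other than the obvious answer that the vector approach is nicer)

The input file is XML and, as you can see below, is immediately parsed into a rapidxml document (which is initialized elsewhere; see the example main function).

First, let me show you two ways to write the 'load_config' function, one using ifstream.get() and one using vector<char>.

Method 1, ifstream.get(), gives working code and a safe rapidXML document object: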

rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
   ifstream myfile("inputfile");

   //read in config file
   char ch;
   char buffer[65536];
   size_t chars_read = 0;

   while(myfile.get(ch) && (chars_read < 65535)){
      buffer[chars_read++] = ch;
   }
   buffer[chars_read++] = '\0';

   cout<<"clearing old doc"<<endl;
   doc->clear();

   doc->parse<0>(buffer);

   //debug returns as expected here
   cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";

   return doc;
}

Method 2 results in the rapidXML document being corrupted by another library, specifically by a call to curl_global_init(CURL_GLOBAL_SSL) [see the main code below], but I am not yet blaming curl_global_init.

rapidxml::xml_document<> *load_config(rapidxml::xml_document<> *doc){
   ifstream myfile("inputfile");

   vector<char> buffer((istreambuf_iterator<char>(myfile)), 
                istreambuf_iterator<char>( ));
   buffer.push_back('\0');

   cout<<"file looks like:"<<endl;  //looks fine
   cout<<&buffer[0]<<endl;

   cout<<"clearing old doc"<<endl;
   doc->clear();

   doc->parse<0>(&buffer[0]);

   //debug prints as expected
   cout << "load_config: Name of my first node is: " << doc->first_node()->name() << "\n";

   return doc;
}

Main code:

int main(void){
   rapidxml::xml_document<> *doc;
   doc = new rapidxml::xml_document<>;

   load_config(doc);

   // this works fine:
   cout << "Name of my first node is: " << doc->first_node()->name() << "\n"; 

   curl_global_init(CURL_GLOBAL_SSL);  //Docs say do this first.

   // debug broken object instance:
   // note a trashed 'doc' here if using vector<char> method 
   //  - seems to be because of above line... name is NULL 
   //    and other nodes are now NULL
   //    causing segfaults down stream.
   cout << "Name of my first node is: " << doc->first_node()->name() << "\n";

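I'm fairly sure this all executes in a single thread, but maybe something is going on that is beyond my understanding.

I'm also worried that I have only fixed a symptom, not a cause, by simply changing my file-loading function. Looking to the community for help here!

Question: why does moving from the vector to a character array fix this?

Hint: I'm aware that rapidXML uses some clever memory management that actually accesses the input string directly.

Hint: The main function above creates a dynamic (new) xml_document. This was not in the original code; it is an artifact of debugging changes. The original (failing) code declared the document without allocating it dynamically, but the same problem occurred.

Another hint for full disclosure (though I don't see why it should matter): there is another vector instance in this mess of code, which is populated from the data in the rapidxml::xml_document object.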

score 5 · Accepted Answer

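The only difference between the two is that the vector version reads the whole file, while the char array version stores at most 65535 characters: a longer file is silently truncated, so the parser sees incomplete XML. (With the loop as posted, the terminating \0 lands at index 65535 at most, which is still inside the 65536-element array.)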
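A minimal sketch of the two reading strategies, to make the size limit concrete. The helper names read_fixed and read_all are hypothetical (they are not from the question), and read_fixed checks the bound before reading rather than after, so treat it as an illustration of the idea rather than a copy of the posted loop.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: fills a fixed-size buffer one character at a time and
// always leaves room for the terminating '\0'. Input beyond capacity - 1
// characters is simply not stored.
std::size_t read_fixed(std::istream& in, char* buffer, std::size_t capacity) {
    assert(capacity > 0);  // need at least one slot for the terminator
    std::size_t n = 0;
    char ch;
    while (n + 1 < capacity && in.get(ch)) {
        buffer[n++] = ch;
    }
    buffer[n] = '\0';
    return n;  // number of characters stored, not counting the terminator
}

// Hypothetical helper: reads the whole stream into a vector that grows as
// needed, then appends the '\0' that a zero-terminated parser expects.
std::vector<char> read_all(std::istream& in) {
    std::vector<char> buffer((std::istreambuf_iterator<char>(in)),
                             std::istreambuf_iterator<char>());
    buffer.push_back('\0');
    return buffer;
}
```

With a 70000-character input, read_fixed on a 65536-element array stores 65535 characters, while read_all returns all 70000 plus the terminator.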

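Another problem, common to both versions, is that you read the file into memory that has a shorter lifetime than the xml_document. Read the documentation:

The string must persist for the lifetime of the document.

When load_config exits, the vector is destroyed and its memory is freed. Any later attempt to access the document reads invalid memory (undefined behavior).

In the char array version, the memory is allocated on the stack. It is still "freed" when load_config exits (accessing it is undefined behavior). But you do not see a crash, because that memory has not been overwritten yet.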
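One way to meet that lifetime requirement is to keep the buffer and the document in the same object, so neither can outlive the other. The sketch below is self-contained and therefore uses a hypothetical stand-in, InSituDoc, for the parser; it is not RapidXML and only mimics the relevant behavior (it keeps a pointer into the caller's buffer and modifies the text in place). With RapidXML itself, the member would presumably be a rapidxml::xml_document<> and the call doc_.parse<0>(buffer_.data()). The Config class name is also an assumption.

```cpp
#include <cassert>
#include <cstring>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for an in-situ parser: it stores a pointer INTO the
// caller's buffer instead of copying the text. Not a real XML parser.
struct InSituDoc {
    const char* first_name = nullptr;  // points into the parsed buffer

    // Finds the first "<name" and terminates the name in place.
    void parse(char* text) {
        first_name = nullptr;
        char* open = std::strchr(text, '<');
        if (!open) return;
        char* name = open + 1;
        char* end = name + std::strcspn(name, " \t\r\n/>");
        *end = '\0';
        first_name = name;
    }
};

// Owns both the text and the document that points into it, so the two
// always share one lifetime.
class Config {
public:
    explicit Config(std::istream& in)
        : buffer_((std::istreambuf_iterator<char>(in)),
                  std::istreambuf_iterator<char>()) {
        buffer_.push_back('\0');     // zero-terminate for the parser
        doc_.parse(buffer_.data());  // doc_ now points into buffer_
    }

    // Copying would leave the copy's doc_ pointing at the original's buffer.
    Config(const Config&) = delete;
    Config& operator=(const Config&) = delete;

    const char* first_node_name() const { return doc_.first_name; }

private:
    std::vector<char> buffer_;  // declared first: destroyed after doc_
    InSituDoc doc_;
};
```

The caller then constructs a Config from an open stream and keeps it alive for as long as the parsed data is needed, instead of returning a document whose text has already been destroyed.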
