c++ - Reading from a large text file into an array of structs in Qt?

Question

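I have to read a text file into an array of structs. I have already written a program, but it takes too much time because the file contains about 13 lakh (1.3 million) structs. Please suggest the best and fastest way to do this in C++.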

Here is my code:

std::ifstream input_counter("D:\\cont.txt");

/**********************************************************/
std::string line; // buffer for the current line (declaration was missing from the snippet)
int counter = 0;
while( getline(input_counter,line) )
{
    ReadCont( line,&contract[counter]); // function to read data to structure
    counter++;
    line.clear();
}
input_counter.close();

score 1 · Accepted Answer

Keep your "parsing" as simple as possible: where you know the format of the fields, apply that knowledge. For example

ReadCont("|PE|1|0|0|0|0|1|1||2|0||2|0||3|0|....", ...)

should use a fast character-to-integer conversion, something like

void ReadCont(const char *line, Contract &c) {
   // line[0] is the leading '|'; "PE|" identifies the record type
   if (line[1] == 'P' && line[2] == 'E' && line[3] == '|') {
     line += 4; // skip "|PE|"
     for (int field = 0; field < K_FIELDS_PE; ++field) {
       c.int_field[field] = *line++ - '0'; // single-digit field only
       assert(*line == '|');
       ++line;
     }
   }
}

Well, mind the details, but you get the idea...
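One of the details to mind is that the sample line contains empty fields ("||") and real data may contain multi-digit values, which the single-character conversion above does not handle. The following is a minimal sketch of the same hand-rolled idea that copes with both. The name parse_pipe_fields is hypothetical, and treating an empty field as 0 is an assumption, not something the question specifies.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: parse '|'-separated non-negative integers by hand.
// Assumption: an empty field ("||") is stored as 0.
std::vector<int> parse_pipe_fields(const char* line) {
    assert(line != nullptr);
    std::vector<int> out;
    if (*line == '|') ++line;              // skip the leading separator
    while (*line != '\0') {
        int value = 0;
        while (*line >= '0' && *line <= '9') {
            value = value * 10 + (*line - '0');  // accumulate decimal digits
            ++line;
        }
        out.push_back(value);
        if (*line == '|') ++line;          // consume the field separator
        else break;                        // end of line or unexpected character
    }
    return out;
}
```

For the input "|1|0||23|" this yields the four values 1, 0, 0, 23. It stops at the first character that is neither a digit nor '|', so a non-numeric tag such as "PE" still has to be checked separately, as in the answer's code.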

score 1 · Accepted Answer

In this case I would use Qt entirely.

struct MyStruct {
    int Col1;
    int Col2;
    int Col3;
    int Col4;
    // blabla ...
};

QByteArray Data;
QFile f("D:\\cont.txt");
if (f.open(QIODevice::ReadOnly)) {
    Data = f.readAll();
    f.close();
}

MyStruct* DataPointer = reinterpret_cast<MyStruct*>(Data.data());
// Accessing data
DataPointer[0] = ...
DataPointer[1] = ...

Now you have the data and can access it as an array.
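The cast above only makes sense when the file holds raw binary records laid out exactly like the struct. A more conservative variant copies the bytes into properly typed storage instead of reinterpreting the buffer in place, which avoids alignment and aliasing concerns. This is a sketch in plain standard C++; Record and records_from_bytes are hypothetical names, and with Qt the bytes and size would come from QByteArray::constData() and QByteArray::size().

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical record type mirroring the four int columns above.
struct Record {
    int col1;
    int col2;
    int col3;
    int col4;
};

static_assert(std::is_trivially_copyable<Record>::value,
              "memcpy is only valid for trivially copyable types");

// Copy as many whole records as fit into `size` bytes; trailing bytes are ignored.
std::vector<Record> records_from_bytes(const char* bytes, std::size_t size) {
    std::vector<Record> out(size / sizeof(Record));
    if (!out.empty()) {
        std::memcpy(out.data(), bytes, out.size() * sizeof(Record));
    }
    return out;
}
```

The copy costs one extra pass over the data, but the result owns its memory and does not depend on the lifetime of the byte buffer.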

If your data is not binary and you have to parse it first, you will need a conversion routine. For example, if you read a CSV file with 4 columns:

QVector<MyStruct> MyArray;
QString StringData(Data);
QStringList Lines = StringData.split("\n"); // or whatever new line character is
for (int i = 0; i < Lines.count(); i++) {
    QString Line = Lines.at(i);
    QStringList Parts = Line.split("\t"); // or whatever separator character is
    if (Parts.count() >= 4) {
        MyStruct t;
        t.Col1 = Parts.at(0).toInt();
        t.Col2 = Parts.at(1).toInt();
        t.Col3 = Parts.at(2).toInt();
        t.Col4 = Parts.at(3).toInt();
        MyArray.append(t);
    } else { 
        // Malformed input, do something
    }
}

Now your data is parsed and stored in the MyArray vector.

score 1 · Accepted Answer

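As user2617519 said, this can be sped up with multithreading. I see that you read each line and then parse it. Put those lines into a queue, then let several threads pop them off the queue and parse the data into structs.
A simpler approach (without the complexity of multithreading) is to split the input data file into multiple files and run the same number of processes to parse them. The data can then be merged afterwards.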
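As a rough illustration of the threaded idea, here is a sketch that swaps the shared queue for static chunking: the lines are read first, then each worker parses its own contiguous index range, so no locking is needed. Contract and parse_line are hypothetical stand-ins for the question's struct and ReadCont(), and the one-integer-per-line format is an assumption made only to keep the example short.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for the question's struct.
struct Contract {
    int value = 0;
};

// Hypothetical stand-in for ReadCont(): assumes one integer per line.
void parse_line(const std::string& line, Contract& out) {
    out.value = std::atoi(line.c_str());
}

// Parse lines[i] into result[i]. Each worker owns a disjoint index range,
// so the threads never write to the same element.
std::vector<Contract> parse_parallel(const std::vector<std::string>& lines,
                                     unsigned workers) {
    std::vector<Contract> result(lines.size());
    if (workers == 0) workers = 1;
    const std::size_t chunk = (lines.size() + workers - 1) / workers;
    std::vector<std::thread> pool;
    for (unsigned w = 0; w < workers; ++w) {
        const std::size_t begin = std::min(lines.size(), w * chunk);
        const std::size_t end = std::min(lines.size(), begin + chunk);
        if (begin == end) break;  // no work left for further workers
        pool.emplace_back([&lines, &result, begin, end] {
            for (std::size_t i = begin; i < end; ++i) {
                parse_line(lines[i], result[i]);
            }
        });
    }
    for (std::thread& t : pool) t.join();
    return result;
}
```

A call such as parse_parallel(lines, std::thread::hardware_concurrency()) would use one worker per hardware thread. Whether this helps depends on how expensive the per-line parsing is compared with reading the file, so it is worth measuring before committing to it.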

score 1 · Accepted Answer

QFile::readAll() may cause memory problems, and std::getline() is slow (as is ::fgets()).

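I had a similar problem where I needed to show the data from a very large file in a QTableView. Using a custom model, I scan the file to find the offset of the start of each line. Then, when data needs to be displayed in the table, I read that line and parse it on demand. This results in a lot of parsing, but in practice it is fast enough that there is no noticeable lag when scrolling or updating.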

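It also has the added benefit of low memory usage, because I never read the whole file into memory. With this strategy, files of almost any size can be handled.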

Parsing code:

m_fp = ::fopen(path.c_str(), "rb"); // open in binary mode for faster parsing
if (m_fp != NULL)
{
  // read the file to get the row pointers
  char buf[BUF_SIZE+1];

  long pos = 0;
  m_data.push_back(RowData(pos));
  int nr = 0;
  while ((nr = ::fread(buf, 1, BUF_SIZE, m_fp)))
  {
    buf[nr] = 0; // null-terminate the last line of data
    // find new lines in the buffer
    char *c = buf;
    while ((c = ::strchr(c, '\n')) != NULL)
    {
      m_data.push_back(RowData(pos + c-buf+1));
      c++;
    }
    pos += nr;
  }

  // squeeze any extra memory not needed in the collection
  m_data.squeeze();
}

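RowData and m_data are specific to my implementation, but they are only used to cache information about a row in the file (such as the file position and the number of columns).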

Another performance strategy I used was to parse each line with QByteArray rather than QString. Unless you need Unicode data, this saves both time and memory:

// optimized line reading procedure
QByteArray str;
char buf[BUF_SIZE+1];
::fseek(m_fp, rd.offset, SEEK_SET);
int nr = 0;
while ((nr = ::fread(buf, 1, BUF_SIZE, m_fp)))
{
  buf[nr] = 0; // null-terminate the string
  // find new lines in the buffer
  char *c = ::strchr(buf, '\n');
  if (c != NULL)
  {
    *c = 0;
    str += buf;
    break;
  }
  str += buf;
}

return str.split(',');

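If you need to split each line on a string of delimiters rather than a single character, use ::strtok(). Note that it treats that string as a set of delimiter characters, not as one multi-character separator.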
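A short sketch of that use of strtok follows. The wrapper name split_any is hypothetical. Two properties of std::strtok matter here: it writes terminators into the buffer it scans, so the sketch works on a copy, and it skips empty fields between adjacent delimiters.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Split `line` on any character contained in `delims`.
// `line` is taken by value because std::strtok modifies the buffer it scans.
std::vector<std::string> split_any(std::string line, const char* delims) {
    std::vector<std::string> tokens;
    for (char* tok = std::strtok(&line[0], delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

For example, split_any("a,b;;c", ",;") returns the three tokens a, b and c; the empty field between the two semicolons is dropped. std::strtok also keeps hidden state between calls, so it is not safe to use from several threads at once.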
