c++ - How to read a file whose lines contain a varying number of values

Question

I am trying to read a data file of about 2000 lines that looks like this:

1.1 1.2 1.3 1.4 1.5
1.6     1.7 1.8 1.9 
2.0
2.1 2.2 2.3 2.4 2.5

There really is a blank (whitespace) there, and 1.3 and 1.7 are in the same column.

The way I have set it up for storage is a vector of structs, where

struct num
{
    double d1, d2, d3, d4, d5;
};

What I want to achieve is

num A;
vector<num> data;
for (int i = 0; i < 4; i++)
{
    File >> A.d1 >> A.d2 >> A.d3 >> A.d4 >> A.d5;
    data.push_back(A);
}

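and to find logic that recognizes the blank in the second line and stores d1=1.6, d2=0, d3=1.7, and so on. The third line would be d1=2.0 and d2, d3, d4, d5=0. I am just confused about how to test for this and come up with the logic to implement it, if it is possible at all. I am using C++ in VS2010.

After seeing the first answer, I think I should provide more information: each line in the file belongs to a satellite, and each number represents an observation at a particular wavelength, so a blank means there is no observation at that wavelength.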

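So, to elaborate, the first line represents satellite 1 with observations at all 5 wavelengths, and line 2 represents satellite 2 with observations at wavelengths 1, 3, 4 and 5 and no observation at wavelength 2.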

That is why I am trying to break it down with each line as a separate struct, since each line is a separate satellite.

score 2 · Accepted Answer

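Looking at your data:

Each data point is stored in the pattern: data, space.
If a data point does not exist, it is represented by a space, unless it is the last missing data point on the line, in which case the rest of the output is truncated to a newline.

This is what I came up with: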

#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
#include <sstream>
#include <iomanip>
#include <cctype>
using namespace std;

//note all the lines are stored WITH newlines at the end of them.
//This is merely an artifact of the methodology I am using,
//as the newline is a flag that truncates output (as per your problem)
vector<string> preparse_input(const std::string& filename) {
    vector<string> lines;

    ifstream ifile;

    ifile.open(filename.c_str(), ios::in);
    if (!ifile.is_open()) {
        exit(1);
    }

    string temp, chars, line;
    char ch;

    while(getline(ifile, temp)) {
        temp += "\n";//getline removes the newline: because we need it, reinsert it
        istringstream iss(temp);

        //first read in the line char by char
        while(iss >> noskipws >> ch) {
            chars += ch;
        }

        bool replaced_newline = false;
        int nargs = 0;

        //I could have used iterators here, but IMO, this way is easier to read. Modify if need be.
        for (int i = 0; i < chars.size(); ++i) {
            if (isdigit(chars[i]) && chars[i+1] == ' ') {
                nargs += 1;
            }
            else if(isspace(chars[i]) && isspace(chars[i+1])) {
                if (chars[i+1] == '\n') {
                    replaced_newline = true;
                }
                //this means that there is no value set
                //hence, set the value to 0 for the value part:
                chars[i+1] = '0';
                line += chars[i];
                ++i;//now, skip to the next character since 1 is for spacing, the other is for the value
                nargs += 1;
            }

            //now rebuild the line:
            line += chars[i];

            if(isdigit(chars[i]) && chars[i+1] == '\n') {
                nargs += 1;
                //check nargs:
                for (int i = nargs; i < 5; ++i) {
                    line += " 0";
                    nargs += 1;
                }
            }

            if (replaced_newline) {
                line += '\n';
            }
            replaced_newline = false;
        }

        lines.push_back(line);
        chars.clear();
        line.clear();
    }
    ifile.close();

    return lines;
}

//this way, it's much easier to adapt to any type of input that you may have
template <typename T>
vector< vector<T> > parse_input (const vector<string>& lines) {
    vector< vector<T> > values;
    T val = 0;

    for(vector<string>::const_iterator it = lines.begin(); it != lines.end(); ++it) {
        vector<T> line;
        istringstream iss(*it);
        string temp;

        while(getline(iss, temp, ' ')) {
            if (istringstream(temp) >> val) {
                line.push_back(val);
            }
            else {
                line.push_back(0);//this is the value that badly parsed values will be set to.
                            //you have the option of setting it to some sentinel value, say -1, so you can go back and correct it later on, if need be. Depending on how you want to treat this error - hard or soft (stop program execution vs adapt and continue parsing), then you can adapt it accordingly
                            //I opted to treat it as a soft error but without a sentinel value - so I set it to 0 (-1 as that is probably more applicable in a general case), and informed the user that an error occurred
                            //The flipside of that is that I could have treated this as a hard error and have `exit(2)` (or whatever error code you wish to set).
                cerr << "There was a problem storing:\"" << temp << "\"\n";
            }
        }
        values.push_back(line);
    }
    return values;
}

int main() {
    string filename = "data.dat";
    vector<string> lines = preparse_input(filename);

    vector < vector<double> > values = parse_input<double>(lines);

    for (size_t i = 0; i < values.size(); ++i) {
        for (size_t j = 0; j < values[i].size(); ++j) {
            cout << values[i][j] << " ";
        }
        cout << endl;
    }

    return 0;
}

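In short, I break each line apart by reading it character by character, then rebuild it with the blanks replaced by 0 so that it is easy to parse. Why? Because without such a placeholder value there is no way to tell which argument was stored and which was skipped (using the default ifstream_object >> type approach).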

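That way, when I then use a stringstream object to parse the input, I can correctly determine which argument was set and which was not, store the results, and all is well. This is what you wanted.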
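To make the point about plain extraction concrete, here is a minimal sketch. The function name read_with_extraction is made up for illustration and is not part of the program above. It reads one line with operator>> alone, which skips all whitespace, so the position of a blank column is lost.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: reads every number operator>> can find on one line.
// Whitespace is skipped during extraction, so blank columns leave no trace.
std::vector<double> read_with_extraction(const std::string& line) {
    std::istringstream iss(line);
    std::vector<double> values;
    double v;
    while (iss >> v) {
        values.push_back(v);
    }
    return values;
}
```

With this helper, "1.6   1.7 1.8 1.9" and "1.6 1.7   1.8 1.9" both yield the same four values, even though the blank sits in a different column. That ambiguity is what rewriting the blanks as 0 removes.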

Running the full program (preparse_input plus parse_input) on the following data:

1.1 1.2 1.3 1.4 1.5
1.6   1.7 1.8 1.9
2.0        
2.0
2.1 2.2 2.3 2.4 2.5
2.1     2.4

gives you this output:

1.1 1.2 1.3 1.4 1.5
1.6 0 1.7 1.8 1.9
2 0 0 0 0
2 0 0 0 0
2.1 2.2 2.3 2.4 2.5
2.1 0 0 2.4 0

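Note: line 3 has 8 spaces (for each missing value, 1 for no data and 1 for spacing). Line 4 is the line from your original data. Line 6 contains 5 spaces (following the pattern described above).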

Finally, let me say that this is by far one of the craziest data storage methods I have ever come across.
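The question states that 1.3 and 1.7 sit in the same column. If that means the file is strictly fixed-width, a different technique from the one above may be enough: slice each line at fixed offsets instead of rewriting the blanks. The sketch below assumes every column is 4 characters wide (a 3-character value plus one separator). That fits the sample in the question but not the test data above, where a missing value takes only 2 characters. The names parse_fixed_width, FIELD_WIDTH and NUM_FIELDS are hypothetical.

```cpp
#include <cstddef>
#include <sstream>
#include <string>

struct num {
    double d1, d2, d3, d4, d5;
};

// ASSUMPTION: every column is FIELD_WIDTH characters wide (value plus
// separator), so column k starts at offset k * FIELD_WIDTH.
const std::size_t FIELD_WIDTH = 4;
const std::size_t NUM_FIELDS = 5;

// Hypothetical helper: parses one fixed-width line into a num.
// Blank columns and columns cut off by the end of the line are stored as 0.
num parse_fixed_width(const std::string& line) {
    double fields[NUM_FIELDS] = {0.0, 0.0, 0.0, 0.0, 0.0};
    for (std::size_t k = 0; k < NUM_FIELDS; ++k) {
        std::size_t start = k * FIELD_WIDTH;
        if (start >= line.size()) {
            break;  // line was truncated: remaining columns stay 0
        }
        std::istringstream cell(line.substr(start, FIELD_WIDTH));
        double v;
        if (cell >> v) {
            fields[k] = v;  // an all-blank cell fails to parse and stays 0
        }
    }
    num n = {fields[0], fields[1], fields[2], fields[3], fields[4]};
    return n;
}
```

Calling it once per getline result and pushing the returned num into a vector<num> would give one struct per satellite line, as the question intends. If the real column widths differ, only FIELD_WIDTH should need to change.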

score 1 · Accepted Answer

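Given that your file format is space-delimited, you can extract the columns with a regular expression. I am assuming you can use C++11 or, if not, Boost regex.

You can then use the following function to split a string into tokens.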

std::vector<std::string> split(const std::string& input, const std::regex& regex) {
    // passing -1 as the submatch index parameter performs splitting
    std::sregex_token_iterator
        first(input.begin(), input.end(), regex, -1),
        last;
    return std::vector<std::string>(first, last);
}

For example, assuming your data is in "data.txt", I used it this way to get the values:

#include <iostream>
#include <fstream>
#include <string>
#include <regex>
#include <vector>

using namespace std;

std::vector<std::string> split(const string& input, const regex& regex) {
    // passing -1 as the submatch index parameter performs splitting
    std::sregex_token_iterator
        first(input.begin(), input.end(), regex, -1),
        last;
    return vector<std::string>(first, last);
}

int main()
{
    ifstream f("data.txt");

    string s;
    while (getline(f, s))
    {
        vector<string> values = split(s, regex("\\s"));
        for (unsigned i = 0; i < values.size(); ++i)
        {
            cout << "[" << values[i] << "] ";
        }
        cout << endl;
    }

    return 0;
}

This gives the following result:

[1.1] [1.2] [1.3] [1.4] [1.5]
[1.6] [] [1.7] [1.8] [1.9]
[2.0] [] [] []
[2.1] [2.2] [2.3] [2.4] [2.5]

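Note that the row beginning with [2.0] is missing a column, but that is because I was not sure how many spaces that line has. If you know there are never more than 5 columns, you can correct for this at the output stage.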

I hope you find this approach helpful.
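The correction at the output stage mentioned above could look roughly like the sketch below. It reuses the same split technique and then pads each row to a fixed number of columns. The helper name to_row and its columns parameter are hypothetical. It also assumes the layout implied by the output above: a blank column appears as an empty token between two single whitespace characters.

```cpp
#include <cstddef>
#include <cstdlib>
#include <regex>
#include <string>
#include <vector>

// Same splitting technique as in the answer: passing -1 selects the text
// between matches of the separator regex.
std::vector<std::string> split(const std::string& input, const std::regex& re) {
    std::sregex_token_iterator first(input.begin(), input.end(), re, -1), last;
    return std::vector<std::string>(first, last);
}

// Hypothetical helper: turns one line into exactly `columns` doubles.
// Empty tokens (blank columns) and columns cut off by the end of the line
// are stored as 0.
std::vector<double> to_row(const std::string& line, std::size_t columns) {
    std::vector<std::string> tokens = split(line, std::regex("\\s"));
    std::vector<double> row(columns, 0.0);
    for (std::size_t i = 0; i < tokens.size() && i < columns; ++i) {
        if (!tokens[i].empty()) {
            row[i] = std::atof(tokens[i].c_str());
        }
    }
    return row;
}
```

Calling to_row(s, 5) inside the getline loop would give every row five entries, with 0 standing in for a missing observation. If the file pads blank columns to full width instead, the token positions shift and this mapping no longer holds.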

score 0 · Accepted Answer

Why not just use std::vector to hold an array of floats?

To add a new element to the vector, use:

std::vector::push_back

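As you read each character, check whether it is a digit or a period.

If it is, append it to a std::string, then convert that string to a float using atof with mystring.c_str() as the argument.

This may also help with converting a string to a float: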

std::string to float or double

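So, read characters into a string, push the resulting float onto the vector, and repeat, skipping characters that are not digits or periods.

At the end of the line, your vector contains all the floats. If you want to join them into a string with a custom delimiter, you can look at the answers to this question: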

std::vector to string with custom delimiter
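The steps described in this answer can be sketched as follows. The function name extract_numbers is made up for illustration, and std::vector<double> is used instead of float to match the struct in the question.

```cpp
#include <cctype>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical helper: collects every run of digits and periods in `line`
// and converts each run to a double with atof.
std::vector<double> extract_numbers(const std::string& line) {
    std::vector<double> numbers;
    std::string current;
    for (std::size_t i = 0; i < line.size(); ++i) {
        char c = line[i];
        if (std::isdigit(static_cast<unsigned char>(c)) || c == '.') {
            current += c;  // still inside a number
        } else if (!current.empty()) {
            numbers.push_back(std::atof(current.c_str()));
            current.clear();  // the number ended at this character
        }
    }
    if (!current.empty()) {  // flush a number that ends the line
        numbers.push_back(std::atof(current.c_str()));
    }
    return numbers;
}
```

Note that this only collects the values that are present. It does not record which column was blank, so a line with a missing observation simply produces a shorter vector. As written, it also ignores signs and exponents.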
