c++ - Binning information according to criteria using C++

Question

My information looks like this:

No.      ID        DATE_EVENT   TIME_EVENT    EVENT   CODE
102995   018159871 07/08/2014   09:01:57      9008    1111
20398    018159871 07/08/2014   09:01:58      1000    1402
105541   018159871 07/08/2014   09:01:58      9210    1111
63492    018253609 07/08/2014   09:54:26      9008    905
37552    018253609 07/08/2014   09:54:45      9008    1111
9627     018253609 07/08/2014   09:54:48      9210    1111
112700   018253609 07/08/2014   09:54:48      1000    1402
50555    018253609 07/08/2014   09:55:56      1000    1401
63634    018253609 07/08/2014   09:55:56      9210    1111 
34551    018330948 07/08/2014   09:21:51      9008    905
47252    018330948 07/08/2014   09:22:15      9008    1111
3975     018330948 07/08/2014   09:22:17      1000    1402
24196    018330948 07/08/2014   09:22:17      9210    1111
111150   018342571 07/08/2014   09:40:08      9008    905
17119    018342571 07/08/2014   09:40:19      9008    1111
18658    018342571 07/08/2014   09:40:21      9210    1111
25654    018342571 07/08/2014   09:40:21      1000    1402

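As you can see, the information is sorted by time and ID. What I would like to be able to do is calculate the time spent on 9008 905 & 9008 1111 before moving on to the next step.

I am reading it in like this: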

#include <cstdlib>
#include <fstream>
#include <iostream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>

using namespace std;

vector<string> &SplitString(const string &s, char delim, vector<string> &elems)
{
    stringstream ss(s);
    string item;

    while (getline(ss, item, delim))
    {
        elems.push_back(item);
    }

    return elems;
}

int main(int argc, const char * argv[])
{

    ifstream CustJ("/Users/Rr/p/B/Sample 1.txt");

    string str;
    string elements;

    CustJ.seekg(0, ios::end);
    str.reserve(CustJ.tellg());
    CustJ.seekg(0, ios::beg);

    str.assign((istreambuf_iterator<char>(CustJ)),
               istreambuf_iterator<char>());    

    if (str.length() > 0)
    {

        vector<string> lines;
        SplitString(str, '\n', lines);

        vector<vector<string> > LineElements;

        for (auto it : lines)
        {

            vector<string> elementsInLine;

            SplitString(it, ',', elementsInLine);

            LineElements.push_back(elementsInLine);
         }

        //this displays each element in an organized fashion

        //for each line
        for (const auto &it : LineElements)
        {
            //for each element IN that line
            for (size_t i = 0; i < it.size(); ++i)
            {
                //insert a comma before every element except the first,
                //so the last element does not get a trailing comma
                if (i != 0)
                    std::cout << ',';
                std::cout << it[i];
            }
            //the end of the line
            std::cout << '\n';
        }
    }
    else
    {
        std::cout << "File Is empty" << std::endl;
        return 1;
    }



    system("PAUSE");
    return 0;
}

I'm not sure whether this is the best way to approach this problem.

Thanks.

score 0 · Accepted Answer

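You rephrased the question, which makes it easier to understand. In my opinion, the code is not the most important thing here. What you need to do is break the whole task down into actionable items, which will make the task solvable.

There may be a super elegant answer in languages other than C++, such as Perl, Python or Ruby. I will write an answer in C#, because the typical infrastructure (IDE) may help you, and LINQ (Language Integrated Query) is your friend for this kind of task.

I cannot guarantee that the code is correct, since the answer to your question has too many parts. The code is not robust: it will throw exceptions in many places if the input is not in the expected form. It is up to you to define an error-handling strategy. In any case, you may want to reimplement it in a different language.

The first component is the input from the file. In declarative form: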

// needs: using System.IO; using System.Linq; using System.Text;
var lines = File
    .ReadAllLines("input.txt", Encoding.Default)
    .Skip(1);

We need to compute time spans from adjacent date-times, so we pair each line with the one that follows it:

var event_tuples = lines
    .Zip(lines.Skip(1), (start, end) => new { Start = start, End = end });

Then we can structure the data to allow cleaner queries:

var entries = event_tuples
    .Select(x => {
        var start_data = x.Start.ParseColumns();
        var end_data = x.End.ParseColumns();
        var duration = end_data.ToDateTime() - start_data.ToDateTime();

        return new
        {
            No=start_data[0],
            Id=start_data[1],
            Duration = duration,
            Event = start_data[4],
            Code = start_data[5]
        };
    })
;
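The pairing and structuring steps can also be sketched in C++. The Record and Entry types and the to_entries function are illustrative names, not part of the original answer, and the sketch assumes each row has already been parsed, with its timestamp stored as a number of seconds. Like the C# version, it pairs every row with its successor regardless of ID, so the last event of one ID is paired with the first event of the next.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Assumed input: one already-parsed row, with the time in seconds.
struct Record { std::string no, id, event, code; long time_s; };

// One row plus the time that elapsed until the following row.
struct Entry { std::string no, id, event, code; long duration_s; };

// Pairs each row with its successor (what Zip + Skip(1) does in the C#)
// and keeps the fields of the first row of each pair.
std::vector<Entry> to_entries(const std::vector<Record>& rows) {
    std::vector<Entry> entries;
    for (std::size_t i = 0; i + 1 < rows.size(); ++i) {
        const Record& start = rows[i];
        const Record& end = rows[i + 1];
        entries.push_back({start.no, start.id, start.event, start.code,
                           end.time_s - start.time_s});
    }
    return entries;
}
```

N rows give N - 1 entries, because the last row has no successor to measure against.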

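Here you can see the output of the previous query being used: .Start and .End. More on ParseColumns and ToDateTime later.

Now to your example query:

"calculate the time spent on 9008 905 & 9008 1111"

First, find the corresponding events: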

var query = entries
    .Where(x => x.Event == "9008"
                && new[] { "905", "1111" }.Contains(x.Code))
;

Console.WriteLine("{0} events found",query.Count());

Then, compute the total duration:

var total_duration = query
    .Select(x=>x.Duration)
    .Aggregate((a, b) => a + b);

Console.WriteLine("total duration: {0}", total_duration);
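The same filter-then-sum idea can be expressed with the C++ standard algorithms. This is only a sketch: Entry, is_wanted, select_wanted and total_duration are illustrative names, and durations are assumed to be stored in seconds.

```cpp
#include <algorithm>
#include <iterator>
#include <numeric>
#include <string>
#include <vector>

// Assumed shape of one structured entry (duration in seconds).
struct Entry { std::string event; std::string code; long duration_s; };

// True for the events asked about: 9008 905 and 9008 1111.
bool is_wanted(const Entry& e) {
    return e.event == "9008" && (e.code == "905" || e.code == "1111");
}

// Counterpart of the Where(...) step.
std::vector<Entry> select_wanted(const std::vector<Entry>& entries) {
    std::vector<Entry> out;
    std::copy_if(entries.begin(), entries.end(), std::back_inserter(out), is_wanted);
    return out;
}

// Counterpart of the Select(...).Aggregate(...) step.
long total_duration(const std::vector<Entry>& entries) {
    return std::accumulate(entries.begin(), entries.end(), 0L,
                           [](long sum, const Entry& e) { return sum + e.duration_s; });
}
```

One difference from the C# above: Aggregate without a seed throws on an empty sequence, whereas std::accumulate starts from 0 and simply returns 0 when nothing matches.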

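As you can see, there are many concerns here: file input, string parsing, date-time parsing, querying and aggregation. Each of them needs particular care. What you definitely do not want to do is spend your time on low-level details such as end-of-line handling. Consider using appropriate tools at a sufficiently high level of abstraction.

Back to ParseColumns and ToDateTime. I have written them as extension methods, which are the foundation of LINQ and help in writing declarative code, even though using them here may be speculative. Other languages have other mechanisms to achieve this kind of readability.

Here are example, problem-specific implementations: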

using System;
using System.Globalization;

static class Extensions {
    public static string[] ParseColumns(this String line)
    {
        return line.Split(new char[] { ' ' },
                          StringSplitOptions.RemoveEmptyEntries);
    }

    public static DateTime ToDateTime(this String[] line)
    {
        const string datetime_format = "dd/MM/yyyy H:mm:ss";
        return DateTime.ParseExact(
            line[2] + " " + line[3], 
            datetime_format, 
            CultureInfo.InvariantCulture
        );
    }
}
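A possible C++ counterpart of these two helpers is sketched below. The names parse_columns and to_time are mine. The sketch assumes the same day-first date layout as the C# format string, and it converts through std::mktime, which interprets the timestamp as local time. That is fine for taking differences between nearby events, but it is not a general-purpose time representation.

```cpp
#include <ctime>
#include <iomanip>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Splits a line on runs of whitespace, dropping empty fields.
std::vector<std::string> parse_columns(const std::string& line) {
    std::istringstream iss(line);
    std::vector<std::string> cols;
    std::string col;
    while (iss >> col) cols.push_back(col);
    return cols;
}

// Combines column 2 (date, assumed dd/mm/yyyy) and column 3 (time)
// into a time_t. Throws if the columns are missing or malformed.
std::time_t to_time(const std::vector<std::string>& cols) {
    if (cols.size() < 4) throw std::invalid_argument("expected at least 4 columns");
    std::tm tm = {};
    std::istringstream iss(cols[2] + " " + cols[3]);
    iss >> std::get_time(&tm, "%d/%m/%Y %H:%M:%S");
    if (iss.fail()) throw std::invalid_argument("bad date/time: " + iss.str());
    tm.tm_isdst = -1;  // let mktime work out whether DST applies
    return std::mktime(&tm);
}
```

The duration between two rows a and b is then std::difftime(to_time(b), to_time(a)), in seconds.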

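This part hides some of the "make it work" ugliness of the code. If you are writing software that will be used and extended later, these parts will find their way into other places in the code, preferably behind abstractions.

If you insist on using C++, you may want to take a look at cpplinq.

The code can be run on rextester.

Further reading: Martin Fowler, Collection Pipeline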
