c++ - Extracting data from a string in C++

Question

What is an elegant way to extract data from a string (possibly using the Boost libraries)?

Content-Type: text/plain
Content-Length: 15
Content-Date: 2/5/2013
Content-Request: Save

hello world

Suppose I have the string above and want to extract all of the fields, including the hello world text. Can someone point me in the right direction?

score 4 · Accepted Answer

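Try

http://pocoproject.org/

Comes with HTTPServer and Client implementations
http://cpp-netlib.github.com/

Comes with request/response handling

Boost Spirit demo: http://liveworkspace.org/code/3K5TzT

You have to specify a simple grammar (or a complex one, if you want to "capture" all the subtleties of HTTP)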

#include <iostream>
#include <map>
#include <string>
#include <vector>

#include <boost/fusion/adapted.hpp>
#include <boost/spirit/include/qi.hpp>
#include <boost/spirit/include/karma.hpp>

typedef std::map<std::string, std::string> Headers;
typedef std::pair<std::string, std::string> Header;
struct Request { Headers headers; std::vector<char> content; };

BOOST_FUSION_ADAPT_STRUCT(Request, (Headers, headers)(std::vector<char>, content))

namespace qi    = boost::spirit::qi;
namespace karma = boost::spirit::karma;

template <typename It, typename Skipper = qi::blank_type>
    struct parser : qi::grammar<It, Request(), Skipper>
{
    parser() : parser::base_type(start)
    {
        using namespace qi;

        header = +~char_(":\n") > ": " > *(char_ - eol);
        start = header % eol >> eol >> eol >> *char_;
    }

  private:
    qi::rule<It, Header(),  Skipper> header;
    qi::rule<It, Request(), Skipper> start;
};

bool doParse(const std::string& input)
{
    auto f(begin(input)), l(end(input));

    parser<decltype(f), qi::blank_type> p;
    Request data;

    try
    {
        bool ok = qi::phrase_parse(f,l,p,qi::blank,data);
        if (ok)   
        {
            std::cout << "parse success\n";
            std::cout << "data: " << karma::format_delimited(karma::auto_, ' ', data) << "\n";
        }
        else      std::cerr << "parse failed: '" << std::string(f,l) << "'\n";

        if (f!=l) std::cerr << "trailing unparsed: '" << std::string(f,l) << "'\n";
        return ok;
    } catch(const qi::expectation_failure<decltype(f)>& e)
    {
        std::string frag(e.first, e.last);
        std::cerr << e.what() << "'" << frag << "'\n";
    }

    return false;
}

int main()
{
    const std::string input = 
        "Content-Type: text/plain\n"
        "Content-Length: 15\n"
        "Content-Date: 2/5/2013\n"
        "Content-Request: Save\n"
        "\n"
        "hello world";

    bool ok = doParse(input);

    return ok? 0 : 255;
}

score 4 · Accepted Answer

Here is a very compact implementation written in C: https://github.com/openwebos/nodejs/blob/master/deps/http_parser/http_parser.c

score 2 · Accepted Answer

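There are several solutions. If the format is simple, you can just read the input line by line. If a line starts with a key, split it to get the value. If it does not, the value is the line itself. This can be done easily and quite elegantly with the STL.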
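A minimal sketch of that line-by-line approach, assuming that a "key" is whatever precedes the first ':' on a line. The `ParsedText` type and the `parse_lines` name are made up for this illustration.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical result type: named fields plus any lines that carry no key.
struct ParsedText {
    std::map<std::string, std::string> fields;
    std::vector<std::string> plain_lines;
};

// Read the input line by line. A line containing ':' is treated as
// "key: value"; any other non-empty line is kept as a value on its own.
ParsedText parse_lines(const std::string& input) {
    ParsedText result;
    std::istringstream stream(input);
    std::string line;
    while (std::getline(stream, line)) {
        if (line.empty()) continue;  // skip the blank separator line
        const std::string::size_type colon = line.find(':');
        if (colon == std::string::npos) {
            result.plain_lines.push_back(line);
            continue;
        }
        std::string value = line.substr(colon + 1);
        // Drop the blanks that follow the colon.
        value.erase(0, value.find_first_not_of(" \t"));
        result.fields[line.substr(0, colon)] = value;
    }
    return result;
}
```

Note that a body line that happens to contain a ':' would be mistaken for a field here, so this only suits input as simple as the sample in the question.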

If the grammar is more complex, and since you added the boost tag, you could consider using Boost Spirit to parse it and pull the values out.

score 2 · Accepted Answer

I think the simplest solution is to use regular expressions. C++11 has a standard regex library, and Boost provides one as well.

score 1 · Accepted Answer

You can use string::find to locate the spaces, then copy from that position until you find '\n'.
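A rough sketch of that idea. It searches for the field name followed by ": " rather than for a bare space, which is an assumption on my part about what is wanted; `field_value` is a made-up helper name.

```cpp
#include <string>

// Hypothetical helper: return the text between "<name>: " and the next '\n',
// or an empty string when the field is not present.
std::string field_value(const std::string& text, const std::string& name) {
    const std::string key = name + ": ";
    std::string::size_type start = text.find(key);
    // Only accept a match that sits at the start of a line.
    while (start != std::string::npos && start != 0 && text[start - 1] != '\n') {
        start = text.find(key, start + 1);
    }
    if (start == std::string::npos) return "";
    start += key.size();
    const std::string::size_type end = text.find('\n', start);
    // With no '\n' left, copy through the end of the text.
    return text.substr(start, end == std::string::npos ? std::string::npos : end - start);
}
```

An empty return value cannot distinguish "field missing" from "field present but empty"; a bool or an optional result would be needed if that matters.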

score 1 · Accepted Answer

If you want to write the parsing code yourself, start by looking at the HTTP specification. It gives you the grammar:

    generic-message = start-line
                      *(message-header CRLF)
                      CRLF
                      [ message-body ]
    start-line      = Request-Line | Status-Line

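So the first thing I would do is use split() on CRLF to break the message into its component lines. You can then iterate over the resulting vector. Until you reach an element that is empty (a bare CRLF), you are parsing headers, so you split each one on the first ':' to get the key and the value.

Once you hit the empty element, you are parsing the response body.

Caveat: having done this myself in the past, I can tell you that not all web servers are consistent about line endings (you may find only a CR or only an LF in places), and not all browsers or other abstraction layers are consistent about what they pass on to you. So you may find extra CRLFs where you do not expect them, or missing CRLFs where you do. Good luck.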
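The steps above could be sketched roughly as follows. The standard library has no split(), so `split_lines` here is a hand-written stand-in that also accepts a lone CR or LF, in the spirit of the caveat. `Message` and `parse_message` are hypothetical names, and start-line handling is left out.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Split on CRLF, but also accept a lone LF or a lone CR as a line ending.
std::vector<std::string> split_lines(const std::string& text) {
    std::vector<std::string> lines;
    std::string current;
    for (std::string::size_type i = 0; i < text.size(); ++i) {
        const char c = text[i];
        if (c == '\r' || c == '\n') {
            // Treat "\r\n" as a single line ending.
            if (c == '\r' && i + 1 < text.size() && text[i + 1] == '\n') ++i;
            lines.push_back(current);
            current.clear();
        } else {
            current += c;
        }
    }
    lines.push_back(current);  // whatever follows the last line ending
    return lines;
}

// Hypothetical message type for this sketch.
struct Message {
    std::map<std::string, std::string> headers;
    std::string body;
};

Message parse_message(const std::string& text) {
    Message msg;
    const std::vector<std::string> lines = split_lines(text);
    std::size_t i = 0;
    // Header section: everything up to the first empty element.
    for (; i < lines.size() && !lines[i].empty(); ++i) {
        const std::string::size_type colon = lines[i].find(':');
        if (colon == std::string::npos) continue;  // no ':' on this line, skip it
        std::string value = lines[i].substr(colon + 1);
        value.erase(0, value.find_first_not_of(" \t"));
        msg.headers[lines[i].substr(0, colon)] = value;
    }
    // Body: the elements after the empty one, rejoined with LF.
    bool first = true;
    for (++i; i < lines.size(); ++i) {
        if (!first) msg.body += '\n';
        msg.body += lines[i];
        first = false;
    }
    return msg;
}
```

Because the body is rejoined with LF, it is not byte-for-byte identical to the input when the original used CRLF. For binary bodies it would be safer to remember the offset of the empty line and copy the raw remainder instead.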

score 0 · Accepted Answer

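If you are prepared to unroll the loop by hand, you can use the normal overloads of the extraction operator on a std::istringstream (with appropriate manipulators, such as get_time() for handling dates) to extract the data in a straightforward way.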
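A sketch of the unrolled version for the sample in the question, assuming the fields always arrive in exactly that order. To keep it independent of time-format details, it reads the date as three integers separated by '/' instead of using get_time(). `SampleFields` and `read_sample` are invented names.

```cpp
#include <sstream>
#include <string>

// Hypothetical record for the fields of the sample message.
struct SampleFields {
    std::string type;
    int length = 0;
    int date_first = 0, date_second = 0, year = 0;  // "2/5/2013": field order not assumed
    std::string request;
    std::string body;
};

// Reads the fields in the fixed order of the sample input.
// Returns false if any extraction fails.
bool read_sample(const std::string& input, SampleFields& out) {
    std::istringstream in(input);
    std::string label;  // receives "Content-Type:", "Content-Length:", ...
    char slash1 = 0, slash2 = 0;

    in >> label >> out.type;
    in >> label >> out.length;
    in >> label >> out.date_first >> slash1 >> out.date_second >> slash2 >> out.year;
    in >> label >> out.request;
    in >> std::ws;               // skip the blank line before the body
    std::getline(in, out.body);  // body: the rest of that line
    return !in.fail() && slash1 == '/' && slash2 == '/';
}
```

The labels are read and discarded without being checked, so this is only as robust as the assumption about field order.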

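Another possibility is to use std::regex to match every pattern such as <string>:<string> and iterate over all the matches (the egrep syntax looks promising if you have several lines to process).

Or, if you want to do it the hard way and your string has a specific grammar, you can use Boost.Spirit to define the grammar and generate a parser easily.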

score 0 · Accepted Answer

If you have access to C++11, you can use std::regex (http://en.cppreference.com/w/cpp/regex).

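#include <iostream>
#include <regex>
#include <string>

int main()
{
    std::string input = "Content-Type: text/plain";
    std::regex contentTypeRegex("Content-Type: (.+)");

    std::smatch match;

    // regex_match succeeds only if the whole input matches the pattern
    if (std::regex_match(input, match, contentTypeRegex)) {
        std::ssub_match contentTypeMatch = match[1];  // first capture group
        std::string contentType = contentTypeMatch.str();
        std::cout << contentType;
    }
    //else not found
}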

Compiled and runnable version here: http://ideone.com/QTJrue

This regex is a very simple example, but the principle is the same for multiple fields.
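For several fields at once, one possible sketch is to iterate over all matches with std::sregex_iterator. The pattern and the `extract_fields` name are my own choices for illustration.

```cpp
#include <map>
#include <regex>
#include <string>

// Collect every "name: value" occurrence in the text into a map.
std::map<std::string, std::string> extract_fields(const std::string& text) {
    // Group 1: the name (no ':' or line break). Group 2: the rest of the line.
    static const std::regex field_regex("([^:\r\n]+): ([^\r\n]*)");
    std::map<std::string, std::string> fields;
    for (std::sregex_iterator it(text.begin(), text.end(), field_regex), end;
         it != end; ++it) {
        fields[(*it)[1].str()] = (*it)[2].str();
    }
    return fields;
}
```

The pattern is not anchored to the header section, so a body line containing ": " would be picked up as a field too. Cutting the text at the first blank line before matching would avoid that.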
