c++ - Getting sub-match_results with boost::regex

Question

Hey, let's say I have this regex: (test[0-9])+

And that I match it against: test1test2test3test0

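const std::string input = "test1test2test3test0";
const boost::regex r("(test[0-9])+");
boost::smatch what;

const bool ret = boost::regex_search(input, what, r);

for (size_t i = 0; i < what.size(); ++i)
    cout << i << ':' << string(what[i]) << "\n";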

Now, what[1] will be test0 (the last occurrence). Let's say that I need to get test1, 2 and 3 as well: what should I do?

Note: the real regex is far more complex and has to remain one overall match, so changing the example regex to (test[0-9]) won't work.

score 10 · Accepted Answer

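I think .NET has the ability to make single capture group collections, so that (grp)+ creates a collection object on group 1. Boost's regex_search() behaves like any ordinary match function: you sit in a while() loop, matching the pattern from where the last match left off. The form you used does not take bidirectional iterators, so the function won't start the next match where the last match left off.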

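You can use the iterator form instead. (Edit: you can also use the token iterator, specifying which groups to iterate over. Added to the code below.)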

#include <boost/regex.hpp> 
#include <string> 
#include <iostream> 

using namespace std;
using namespace boost;

int main() 
{ 
    string input = "test1 ,, test2,, test3,, test0,,";
    boost::regex r("(test[0-9])(?:$|[ ,]+)");
    boost::smatch what;

    std::string::const_iterator start = input.begin();
    std::string::const_iterator end   = input.end();

    while (boost::regex_search(start, end, what, r))
    {
        string stest(what[1].first, what[1].second);
        cout << stest << endl;
        // Update the beginning of the range to the character
        // following the whole match
        start = what[0].second;
    }

    // Alternate method using token iterator 
    const int subs[] = {1};  // we just want to see group 1
    boost::sregex_token_iterator i(input.begin(), input.end(), r, subs);
    boost::sregex_token_iterator j;
    while(i != j)
    {
       cout << *i++ << endl;
    }

    return 0;
}

Output (each of the two loops prints these four lines):

test1
test2
test3
test0

score 6 · Accepted Answer

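Boost.Regex offers experimental support for exactly this feature (called repeated captures); however, because it carries a significant performance cost, it is disabled by default.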

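To enable repeated captures, you need to rebuild Boost.Regex and define the macro BOOST_REGEX_MATCH_EXTRA in all translation units; the best way to do this is to uncomment this define in boost/regex/user.hpp (see the reference; it is at the very bottom of the page).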

Once compiled with this define, you can use this feature by calling regex_search, regex_match, or regex_iterator with the match_extra flag.

Check the Boost.Regex reference for more information.

score 3 · Accepted Answer

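It seems to me that you need to create a regex_iterator, using the (test[0-9]) regex as input. You can then use the resulting regex_iterator to enumerate the matching substrings of your original target.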

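If you still need "one overall match", then perhaps that work has to be decoupled from the task of finding the matching substrings. Can you clarify that part of your requirement?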
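One way to read that decoupling, as a rough sketch: find the overall match first, then walk only that matched range with an iterator over the inner pattern. To keep the example compilable with nothing but the standard library, it uses std::regex (C++11) in place of Boost; boost::regex, boost::smatch, boost::regex_search and boost::sregex_iterator are used the same way. The helper name split_repeats is invented for this example.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns each "testN" piece of the first overall
// match of (test[0-9])+, or an empty vector if there is no match.
std::vector<std::string> split_repeats(const std::string& input)
{
    // Step 1: the overall pattern decides what counts as one match.
    static const std::regex whole("(test[0-9])+");
    // Step 2: the inner pattern is what gets enumerated.
    static const std::regex piece("test[0-9]");

    std::vector<std::string> pieces;
    std::smatch what;
    if (!std::regex_search(input, what, whole))
        return pieces;

    // Iterate only over the text covered by the overall match (what[0]).
    // A default-constructed iterator marks the end of the sequence.
    for (std::sregex_iterator it(what[0].first, what[0].second, piece), end;
         it != end; ++it)
    {
        pieces.push_back(it->str());
    }
    return pieces;
}
```

With a far more complex real regex, the inner pattern would have to be whatever sub-expression sits inside the repeated group, which may or may not be easy to lift out on its own.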
