c++ - C++11 VS12 regex search

Question

I'm trying to retrieve numbers from a string. The string has a format such as _0_1_, and I want to get 0 and 1.

Here is my code:

std::tr1::regex rx("_(\\d+)_");
tstring fileName = Utils::extractFileName(docList[i]->c_str());                 
std::tr1::smatch res;
std::tr1::regex_search(fileName, res, rx);

But as a result I get (update: this is strange output from the debugger watch window):

res[0] = 3
res[1] = 1

Where does the 3 come from, and what am I doing wrong?

Update: I printed the results to the screen:

for (std::tr1::smatch::iterator it = res.begin(); it < res.end(); ++it){
    std::cout << *it << std::endl;
}

And the program outputs:

_0_
0

score 2 · Accepted Answer

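A regular expression search normally returns only non-overlapping matches. If you put _ both before and after the number, you will not get all the numbers, because the underscore after the first number has already been consumed and cannot also serve as the underscore before the second number.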

_123_456_
    ^
    This cannot be used twice

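Just use (\\d+) as the expression to get all the numbers (regular expressions are "greedy" by default, so each match will take all the available digits anyway).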
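To make the difference concrete, here is a minimal sketch that collects capture group 1 of every non-overlapping match. The helper name collectGroup1 is hypothetical, and the sketch assumes the C++11 <regex> header with names in std:: rather than std::tr1.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper (not from the question): returns capture group 1
// of every non-overlapping match of `pattern` in `text`.
std::vector<std::string> collectGroup1(const std::string& text,
                                       const std::string& pattern) {
    const std::regex rx(pattern);
    std::vector<std::string> out;
    // A default-constructed sregex_iterator acts as the end marker.
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end;
         it != end; ++it) {
        out.push_back((*it)[1].str());
    }
    return out;
}
```

With the input "_0_1_", the pattern "(\\d+)" should yield both 0 and 1, while "_(\\d+)_" should yield only 0, because the match "_0_" consumes the underscore that the second number would need.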

score 2 · Accepted Answer

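This appears to be the expected output. The first element of the result is the entire matched substring, and the second (and any following) elements are the capture groups.

If you want to see all the matches, you need to call regex_search multiple times, once for each match: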
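As a small illustration of that layout, the sketch below runs a single regex_search and copies every sub-match into a vector. The helper name firstMatchParts is hypothetical, and std:: is assumed in place of std::tr1.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: performs one regex_search and returns all
// sub-matches as strings. Index 0 is the whole match; indices 1 and up
// are the capture groups. Returns an empty vector when nothing matches.
std::vector<std::string> firstMatchParts(const std::string& text,
                                         const std::regex& rx) {
    std::smatch res;
    std::vector<std::string> parts;
    if (std::regex_search(text, res, rx)) {
        for (std::size_t i = 0; i < res.size(); ++i) {
            parts.push_back(res[i].str());
        }
    }
    return parts;
}
```

For the string "_0_1_" and the pattern "_(\\d+)_", this should return "_0_" followed by "0", which is the same pair the question's program printed.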

auto it = fileName.cbegin();
while (std::tr1::regex_search(it, fileName.cend(), res, rx)) {
    std::cout << "Found matching group:" << std::endl;
    for (std::size_t mm = 1; mm < res.size(); ++mm) {
        std::cout << std::string(res[mm].first, res[mm].second) << std::endl;
    }

    it = res[0].second; // resume the search just past the end of this match
}

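If you really do need only the numbers "wrapped" in underscores, you can use a positive lookahead assertion, (?=_), to make sure that is the case: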

// positive assertions are required matches, but are not consumed by the
// matching group.
std::tr1::regex rx("_(\\d+)(?=_)");

When run against "//abc_1_2_3.txt", this retrieves 1 and 2, but not 3.
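Putting the repeated regex_search loop and the lookahead together might look like the sketch below. The function name wrappedNumbers is hypothetical, and std:: is assumed in place of std::tr1.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects every number that has an underscore on
// both sides. The trailing underscore is only checked by the lookahead,
// not consumed, so it can still start the next match.
std::vector<std::string> wrappedNumbers(const std::string& text) {
    const std::regex rx("_(\\d+)(?=_)");
    std::vector<std::string> out;
    std::smatch res;
    auto it = text.cbegin();
    while (std::regex_search(it, text.cend(), res, rx)) {
        out.push_back(res[1].str());  // capture group 1: the digits
        it = res[0].second;           // continue after this match
    }
    return out;
}
```

Because the pattern always consumes at least an underscore and one digit, the loop advances on every iteration and cannot get stuck on an empty match.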

score 1 · Accepted Answer

Solution: thanks, everyone. I rewrote it with the help of regex_token_iterator and (\\d+). Now it works:

std::tr1::regex rx("(\\d+)");
tstring fileName = Utils::extractFileName(docList[i]->c_str());
// A default-constructed token iterator marks the end of the sequence.
std::regex_token_iterator<tstring::iterator> rend;
for (std::regex_token_iterator<tstring::iterator> it(fileName.begin(), fileName.end(), rx); it != rend; ++it) {
    std::cout << " [" << *it << "]";
}
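As a possible variation, regex_token_iterator also accepts a sub-match index, so a pattern that still contains the underscores can return just the digits. The sketch below is an illustration under assumptions, not the poster's code: the helper name tokensOfGroup1 is hypothetical, and std:: is used in place of std::tr1.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: iterates over capture group 1 of each match.
// The final constructor argument (1) selects which sub-match to yield;
// the default of 0 would yield the whole match instead.
std::vector<std::string> tokensOfGroup1(const std::string& text,
                                        const std::regex& rx) {
    std::sregex_token_iterator first(text.begin(), text.end(), rx, 1);
    std::sregex_token_iterator last;  // default-constructed end marker
    std::vector<std::string> out;
    for (; first != last; ++first) {
        out.push_back(first->str());
    }
    return out;
}
```

Called with "_0_1_" and the pattern "_(\\d+)(?=_)", this should produce 0 and 1 without any underscores in the output.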
