c++ - C++ RegEx out of memory

Question

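I'm using a regular expression to retrieve the string between a div's tags in an HTML page, but I'm running into an out-of-memory error. I'm using Visual Studio 2012 and C++.

The regular expression is "class=\"ListingDescription\">((.*|\r|\n)*?(?=</div>))", and RegexBuddy estimates that it takes 242 steps (much better than the original ~5000 steps). The site I'm trying to scrape information from is http://www.trademe.co.nz/Browse/Listing.aspx?id=557211466

Here is the code: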

typedef match_results<const char*> cmatch;
tr1::cmatch results;
try {
    tr1::regex regx("class=\"ListingDescription\">((.*|\\r|\\n)*?(?=</div>))");

    tr1::regex_search(data.c_str(), results, regx);

    cout << results[1];

} 
catch (const std::regex_error& e) {
    std::cout << "regex_error caught: " << e.what() << '\n';
    if (e.code() == std::regex_constants::error_brack) {
        std::cout << "The code was error_brack\n";
    }
}

This is the error I get:

regex_error caught: regex_error(error_stack): There was insufficient memory to determine whether the regular expression could match the specified character sequence.

RegexBuddy works fine, and so do some online regex tools, but my code doesn't :( Please help.

score 2 · Accepted Answer

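You are using . in a place where it can repeat many times, so it will match every <, including the one before </div>, which is probably not what you want.

Now for the obligatory link: RegEx match open tags except XHTML self-contained tags.

Parsing HTML with regular expressions is usually a bad idea. You should use an HTML parser instead.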
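If you do want to stay with a regex for a quick job, here is a minimal sketch of the point about . matching <. It replaces the nested (.*|\r|\n)*? group with a single character class that cannot cross a tag boundary. The function name extract_description is made up for illustration, and the sketch assumes the description contains no nested tags such as <br>; if it does, the match will fail rather than return partial text.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns the text between
// class="ListingDescription"> and the closing </div>, or an empty string if
// the pattern does not match.
std::string extract_description(const std::string& html) {
    // [^<]* matches any character except '<' (newlines included), so there
    // is no nested quantifier for the matcher to backtrack through.
    static const std::regex re("class=\"ListingDescription\">([^<]*)</div>");
    std::smatch m;
    if (std::regex_search(html, m, re)) {
        return m[1].str();
    }
    return std::string();
}
```

Calling extract_description(data) would replace the regex_search call in the question. Some std::regex implementations may still run out of stack on very long matches, so this is not a guarantee against error_stack, only a pattern that gives the matcher much less work.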

score 0 · Accepted Answer

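I get it now. Regular expressions are quite limited in some areas. I'll take a look at parsers and give one a try. Here is what I did in the meantime: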

std::string startstr = "<div id=\"ListingDescription_ListingDescription\" class=\"ListingDescription\">";
// Assumes both markers are present; find() returns std::string::npos otherwise.
std::size_t startpos = data.find(startstr) + startstr.size();
std::size_t endpos = data.find("</div>", startpos);
std::string desc = data.substr(startpos, endpos - startpos);

Lol, I know it's not great, but it works.

Thanks, Clement Bellot
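For anyone reusing the find/substr workaround, here is a slightly more defensive sketch. The helper name extract_between and its signature are made up for illustration; the only change in behaviour is that it reports failure when either marker is missing instead of doing arithmetic on std::string::npos.

```cpp
#include <string>

// Hypothetical helper: copies the text between the first occurrence of
// `open` and the next occurrence of `close` into `out`.
// Returns false (leaving `out` untouched) if either marker is missing.
bool extract_between(const std::string& data,
                     const std::string& open,
                     const std::string& close,
                     std::string& out) {
    const std::string::size_type start = data.find(open);
    if (start == std::string::npos) {
        return false;  // opening marker not found
    }
    const std::string::size_type begin = start + open.size();
    const std::string::size_type end = data.find(close, begin);
    if (end == std::string::npos) {
        return false;  // closing marker not found
    }
    out = data.substr(begin, end - begin);
    return true;
}
```

With the strings from the snippet above, the call would look like extract_between(data, startstr, "</div>", desc). Like the original, it stops at the first </div>, so it assumes the description has no nested div.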
