php - Parsing the content of HTML tags with regular expressions

Question

I want to parse the content of

<td>content</td>
and
<td *?*>content</td>
and 
<td *specific td class*>content</td>

How can I do this with regular expressions, PHP, and preg_match?

score 4 · Accepted Answer

I think this sums it up quite well.

In short, don't use regular expressions to parse HTML. Instead, look at the DOM classes, in particular DOMDocument::loadHTML.
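A minimal sketch of that approach, assuming the cells sit in a small table fragment (the sample markup and the class name "specific" are placeholders, not taken from the question):

```php
<?php
// Sample input modeled on the question; attribute values are made up.
$html = '<table><tr>'
      . '<td>first</td>'
      . '<td id="x">second</td>'
      . '<td class="specific">third</td>'
      . '</tr></table>';

$doc = new DOMDocument();
// Real-world markup often triggers parser warnings; collect them quietly.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Gather the text content of every <td>, whatever attributes it has.
$cells = [];
foreach ($doc->getElementsByTagName('td') as $td) {
    $cells[] = trim($td->textContent);
}

print_r($cells);
```

Because the parser handles attributes, nesting, and line breaks itself, the same loop covers the plain, the attributed, and the multiline cells without separate patterns.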

score 3 · Accepted Answer

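If you have an HTML document, you really shouldn't use regular expressions to parse it: HTML just isn't "regular" enough.

A better solution is to load the HTML document with a DOM parser. For instance, DOMDocument::loadHTML and XPath queries often do a really great job!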
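For the "specific td class" case, an XPath query could look roughly like this. It is only a sketch: the class name "specific" and the sample markup are assumptions for illustration.

```php
<?php
// Sample input; the class names are placeholders.
$html = '<table><tr>'
      . '<td>plain</td>'
      . '<td class="specific other">wanted</td>'
      . '<td class="unrelated">skipped</td>'
      . '</tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // suppress warnings from imperfect markup
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// Match "specific" as a whole token inside a space-separated class attribute.
$query = '//td[contains(concat(" ", normalize-space(@class), " "), " specific ")]';

$matches = [];
foreach ($xpath->query($query) as $td) {
    $matches[] = trim($td->textContent);
}

print_r($matches);
```

The concat/normalize-space idiom avoids matching class names that merely contain the word as a substring; a plain `//td[@class="specific"]` would do if the attribute always holds exactly one class.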

score 0 · Accepted Answer

<td>content</td>: <td>([^<]*)</td>

<td *specific td class*>content</td>: <td[^>]*class=\"specific_class\"[^>]*>([^<]*)<
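As a rough illustration, the two patterns can be passed to preg_match_all like this. The sample input is made up, and the patterns assume double-quoted attributes and cells with no nested tags:

```php
<?php
// Hypothetical input for illustration.
$html = "<td>first</td>\n"
      . "<td class=\"specific_class\">second</td>\n"
      . "<td id=\"x\" class=\"specific_class\">third</td>";

// Pattern 1: only bare <td> tags with no attributes.
preg_match_all('~<td>([^<]*)</td>~', $html, $plain);

// Pattern 2: <td> tags whose attributes include class="specific_class".
preg_match_all('~<td[^>]*class="specific_class"[^>]*>([^<]*)<~', $html, $classed);

print_r($plain[1]);    // captured contents of the bare cells
print_r($classed[1]);  // captured contents of the classed cells
```

Using ~ as the delimiter avoids escaping the slash in </td>. Both patterns stop matching as soon as a cell contains another tag, which is the usual reason to prefer a DOM parser.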

score 0 · Accepted Answer

@OP, here is one way to do it

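<?php
// Sample input from the question, plus a cell that spans two lines.
$str = <<<A
<td>content</td>
<td *?*>content</td>
<td *specific td class*>content</td>
<td *?*> multiline
content </td>
A;

// Split on the closing tag so each piece ends with one cell's content.
$s = explode("</td>", $str);
foreach ($s as $b) {
    // Strip everything up to and including the opening <td ...> tag.
    // [^>]* keeps the match inside the tag even if the content contains ">".
    $b = preg_replace("/.*<td[^>]*>/", "", $b);
    print $b . "\n";
}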

Output

$ php test.php
content

content

content

 multiline
content
