php - Ignore HTML tags in preg_match

Question

I scraped a website that contains the following HTML

<a class="name" href="/link" data-hovercard-id="charshere"><span class="highlighted">War</span> World</a> 

<a class="name" href="/link" data-hovercard-id="charshere"> World of <span class="highlighted">fun</span></a> 

<a class="name" href="/link" data-hovercard-id="charshere">Save the<br>world</a> 

<a class="name" href="/link" data-hovercard-id="charshere">world of warcraft</a>

With this code, I get the text of the links

preg_match_all('/<a class="name" href=".*?" data-hovercard-id=".*?">(.*)<\/a>/i', $file_string, $titles);

But the result is

<span class="highlighted">War</span> World
 World of <span class="highlighted">fun</span>
Save the<br>world
world of warcraft

How can I ignore the HTML tags inside the matches, so that the result looks like this?

 War World
 World of fun
 Save the world
 world of warcraft

DOMDocument would probably be better. Thanks. I have been trying to use DOMDocument, but I am not familiar with how to use its xquery.

score 3 · Accepted Answer

Use strip_tags(). Here is an example:

$html = <<<EOF
<span class="highlighted">War</span> World
 World of <span class="highlighted">fun</span>
Save the<br>world
world of warcraft
EOF;

echo strip_tags($html);

Output:

War World
 World of fun
Save theworld
world of warcraft
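
Note that the output above contains "Save theworld", while the question asks for "Save the world". One possible way to keep the words apart is to replace line-break tags with a space before stripping the rest. This is only a sketch: the helper name tags_to_text is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper (the name is an assumption, not from the thread).
function tags_to_text(string $html): string
{
    // Turn <br>, <br/> and <br /> into a single space so that
    // the words on either side do not run together.
    $withSpaces = preg_replace('/<br\s*\/?>/i', ' ', $html);

    // Strip every remaining tag and trim surrounding whitespace.
    return trim(strip_tags($withSpaces));
}

echo tags_to_text('Save the<br>world'), "\n";
echo tags_to_text('<span class="highlighted">War</span> World'), "\n";
```

This should print "Save the world" and "War World". Other block-level tags (for example p or div) would need the same treatment if they appear inside the links.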

score 0 · Accepted Answer

Just remove the tags after you have extracted the text:

<?php
$string = '<span class="highlighted">War</span> World
 World of <span class="highlighted">fun</span>
Save the<br>world
world of warcraft';
$convert = preg_replace('/<.*?>/','', $string);
print $convert;

Prints:

War World
 World of fun
Save theworld
world of warcraft
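
To connect this back to the question, the same tag removal can be applied to each captured group from preg_match_all. The sketch below reuses the variable names from the question ($file_string, $titles). The sample input is shortened, the capture group is made non-greedy, and the $clean variable is introduced here, so treat those details as assumptions.

```php
<?php
$file_string = <<<EOF
<a class="name" href="/link" data-hovercard-id="charshere"><span class="highlighted">War</span> World</a>
<a class="name" href="/link" data-hovercard-id="charshere"> World of <span class="highlighted">fun</span></a>
EOF;

// Same pattern as in the question, except that the capture group is
// non-greedy (.*?) so it stops at the first closing </a>.
preg_match_all(
    '/<a class="name" href=".*?" data-hovercard-id=".*?">(.*?)<\/a>/i',
    $file_string,
    $titles
);

// $titles[1] holds the captured inner HTML of each link;
// remove the tags from every entry and trim the whitespace.
$clean = array_map(
    function ($title) {
        return trim(preg_replace('/<[^>]*>/', '', $title));
    },
    $titles[1]
);

print_r($clean);
```

With this input, $clean should contain "War World" and "World of fun".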

score 0 · Accepted Answer

After matching the link text, you can remove the HTML tags from it. For example:

$str = preg_replace('/<[^>]+>/', '', $html);
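
Since the question also mentions DOMDocument, here is a rough sketch of that route using DOMXPath instead of regular expressions. The XPath expression, the shortened sample input, and the step that replaces br elements with a space are assumptions chosen for this markup, not something taken from the answers.

```php
<?php
$html = <<<EOF
<a class="name" href="/link" data-hovercard-id="charshere"><span class="highlighted">War</span> World</a>
<a class="name" href="/link" data-hovercard-id="charshere">Save the<br>world</a>
EOF;

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// scraped markup is rarely perfectly valid.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$titles = [];

// Select every <a> element whose class attribute is exactly "name".
foreach ($xpath->query('//a[@class="name"]') as $link) {
    // Replace each <br> inside the link with a space text node,
    // otherwise "the" and "world" would be joined together.
    foreach (iterator_to_array($xpath->query('.//br', $link)) as $br) {
        $br->parentNode->replaceChild($doc->createTextNode(' '), $br);
    }
    // textContent concatenates all descendant text and ignores the tags.
    $titles[] = trim($link->textContent);
}

print_r($titles);
```

With this input, $titles should contain "War World" and "Save the world". The class test only matches links whose class attribute is exactly "name"; links with several classes would need a different expression.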
