I need to replace some invalid links in HTML, like these:

<td><a title="Michel Blanc" href="http://www.mysite.com/index.php?title=Michel_Blanc&amp;action=edit&amp;redlink=1">Michel Blanc</a></td>
<td><a title="Pierre Schöller" href="http://www.mysite.com/index.php?title=Pierre_Sch%C3%B6ller&amp;action=edit&amp;redlink=1">Pierre Schöller</a></td>
<td><a title="Focus Features" href="http://www.mysite.com/w/Focus_Features">Focus Features</a><br />
<a title="Olivier Treiner" href="http://www.mysite.com/index.php?title=Olivier_Treadfadfadfiner&amp;action=edit&amp;redlink=1">Olivier Treiner</a>
<td>1600</td>

I want to remove all <a> tags, but keep the text between <a></a> if the href starts with

    http://www.mysite.com/index.php?title=

and keep the whole <a> tag if the href starts with

    http://www.mysite.com/w/

This is my regular expression:

    (<a title="([\s\S])*?" href="http://www\.mysite\.com/index\.php\?title=([\s\S])*?&amp;action=edit&amp;redlink=1">([\s\S])*?</a>)

But it also matches the third line, which I want to keep. I tested it at http://regexpal.com/.

Can anyone help me?

2 Answers

This one works for me:

(<a title="[^>]*?" href="http://www\.mysite\.com/index\.php\?title=([\s\S])*?&amp;action=edit&amp;redlink=1">([\s\S])*?</a>)
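
To see why restricting the title part to `[^>]` matters, here is a minimal sketch that runs both patterns over two rows of the sample. It is an adaptation, not the exact patterns above: the redundant groups are dropped and a single capture group holds the link text. The variable names and the `~` delimiters are assumptions for illustration.

```php
<?php
// Two rows from the question's sample: a link to keep, then a red link.
$html = '<td><a title="Focus Features" href="http://www.mysite.com/w/Focus_Features">Focus Features</a><br />' . "\n"
      . '<a title="Olivier Treiner" href="http://www.mysite.com/index.php?title=Olivier_Treadfadfadfiner&amp;action=edit&amp;redlink=1">Olivier Treiner</a>';

// Based on the question's pattern: the title part may contain ANY character,
// so a match that starts at the first <a> can run on into the next tag.
$loose = '~<a title="[\s\S]*?" href="http://www\.mysite\.com/index\.php\?title=[\s\S]*?&amp;action=edit&amp;redlink=1">([\s\S]*?)</a>~';

// Based on this answer's pattern: the title part may not contain ">",
// so a match cannot leave the opening tag it started in.
$strict = '~<a title="[^>]*?" href="http://www\.mysite\.com/index\.php\?title=[\s\S]*?&amp;action=edit&amp;redlink=1">([\s\S]*?)</a>~';

// Replace each matched link with its text (capture group 1).
$looseResult  = preg_replace($loose, '$1', $html);
$strictResult = preg_replace($strict, '$1', $html);

// $looseResult is '<td>Olivier Treiner': the Focus Features link is swallowed.
echo htmlspecialchars($looseResult), "\n";
// $strictResult keeps the Focus Features link and unwraps only the red link.
echo htmlspecialchars($strictResult), "\n";
```

A lazy quantifier only limits how far a match extends when a shorter match exists. It does not stop the engine from crossing tag boundaries when that is the only way to succeed from the current starting position, which is what happens on the third row.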
answered 2013-08-16T17:11:47.940
$subject = <<<'LOD'
<td><a title="Michel Blanc" href="http://www.mysite.com/index.php?title=Michel_Blanc&amp;action=edit&amp;redlink=1">Michel Blanc</a></td>
<td><a title="Pierre Schöller" href="http://www.mysite.com/index.php?title=Pierre_Sch%C3%B6ller&amp;action=edit&amp;redlink=1">Pierre Schöller</a></td>
<td><a title="Focus Features" href="http://www.mysite.com/w/Focus_Features">Focus Features</a><br />
<a title="Olivier Treiner" href="http://www.mysite.com/index.php?title=Olivier_Treadfadfadfiner&amp;action=edit&amp;redlink=1">Olivier Treiner</a>
<td>1600</td>
<a href="http://remove.me.com">remove.me</a>
LOD;

The regex way:

$pattern = <<<'LOD'
~
# definitions
(?(DEFINE)
  # all the content from the beginning of the "a" tag until the content
  # of the "href" attribute
  (?<atohref>
      <a\b (?> [^h>]++ | \Bh | h(?!ref) )++ href\s*+=\s*+['"]?+
  )

  # all the content until the closing "a" tag 
  (?<untilclosea>
      (?> [^<]++ | <(?!/a>) )++
  )
)

# pattern
    \g<atohref>
    \Qhttp://www.mysite.com/\E
    (?>
        \Qindex.php?title=\E
        [^>]*+>
        ( \g<untilclosea> ) # third group (because of the two named groups)
        </a>
      |
        w/ \g<untilclosea>
        </a> \K      # reset the match (to preserve it)
    )
  | 
    <a\b \g<untilclosea> </a> # all other "a" tags
~x
LOD;

$replacement = '$3';
$result = preg_replace($pattern, $replacement, $subject);
echo htmlspecialchars($subject).'<br><br>';
echo htmlspecialchars($result);
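
The same three-way rule (keep `/w/` links, unwrap `index.php?title=` links, drop every other link) can also be sketched with a callback, which some readers may find easier to follow than a single pattern with `\K`. This is an alternative technique, not the method used above. The function name `filterLinks` is hypothetical, and the sketch assumes quoted `href` values and no nested `<a>` elements.

```php
<?php
// Hypothetical helper, not part of the answer above.
function filterLinks(string $html): string
{
    // Match each <a ...>...</a> element that has a quoted href.
    // Group 1 = href value, group 2 = inner content of the link.
    $pattern = '~<a\b[^>]*?\bhref\s*=\s*["\']([^"\']*)["\'][^>]*>(.*?)</a>~is';

    return preg_replace_callback($pattern, function (array $m): string {
        $href = $m[1];
        if (strpos($href, 'http://www.mysite.com/w/') === 0) {
            return $m[0];   // keep the whole link
        }
        if (strpos($href, 'http://www.mysite.com/index.php?title=') === 0) {
            return $m[2];   // keep only the link text
        }
        return '';          // remove any other link entirely
    }, $html);
}

$keep   = '<a title="Focus Features" href="http://www.mysite.com/w/Focus_Features">Focus Features</a><br />';
$unwrap = '<a title="Michel Blanc" href="http://www.mysite.com/index.php?title=Michel_Blanc&amp;action=edit&amp;redlink=1">Michel Blanc</a>';
$drop   = '<a href="http://remove.me.com">remove.me</a>';

echo htmlspecialchars(filterLinks($keep . "\n" . $unwrap . "\n" . $drop));
```

Anchors without an `href` attribute are not matched by this pattern and are left unchanged.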
answered 2013-08-16T18:44:07.483