I need to strip the HTML tags from a string while keeping only one kind of tag. I have a string that contains this:
<!-- comment --> <div id="55"> text </div> <span name=annotation value=125> 2 text </span> <p id="55"> text 3</p><span>text 4 <span>
I need this:
text <span name=annotation value=125> 2 text </span> text 3text 4
So I need to remove all HTML tags except those that match this form:
"/(<span[^>]*annotation[^>]*value=.?(\w*).?[^>]*>)(.*?)<\/span>/"
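I use this pattern as part of another expression, but I would like to know how I can do this.
I know it can be done with preg_replace(), but I don't know what pattern I need.
An example: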
$str='<!-- comment --><p><b>Deoxyribonucleic acid</b> (<b>DNA</b>) is
a molecule encoding the <a href="/wiki/Genetics" title="Genetics">genetic</a> instructions
used in the development and functioning of all known living <a href="/wiki/Organism" title="Organism">organi
sms</a> and many <a href="/wiki/Virus" title="Virus">viruses</a>. Along with <a href="/wiki/RNA" title="RNA">RNA</a> and <a href="/wiki/Proteins" title="Proteins" class="mw-redirect">proteins</a>, DNA is one of the three major
<a href="/wiki/Macromolecules" title="Macromolecules" class="mw-redirect">macromolecules</a>
that are essential for all known forms of <a href="/wiki/Life" title="Life">life</a>.
Genetic information is encoded<span id="200120131815150"
class="mymetastasis" value="247" name="annotation"> as a sequence of nucleotides (</span><a href="/wiki/Guanine" title="Guanine"><span id="200120131815151" class="mymetastasis" value="247" name="annotation">
guanine</span></a><span id="200120131815152" class="mymetastasis" value="247" name="annotation">, </span><a href="/wiki/Adenine" title="Adenine"><span id="200120131815153" class="mymetastasis" value="247"
name="annotation">adenine</span></a><span id="200120131815154" class="mymetastasis" value="247" name="annotation">,
</span><a href="/wiki/Thymine" title="Thymine"><span id="200120131815155" class="mymetastasis" value="247"
name="annotation">thymine</span></a><span id="200120131815156" class="mymetastasis" value="247" name="annotation">,
and </span><a href="/wiki/Cytosine" title="Cytosine">
<span id="200120131815157" class="mymetastasis" value="247" name="annotation">cytosine</span></a><span id="200120131815158" class="mymetastasis" value="247" name="annotation">)
recorded using the letters G, A, T, and C. Most DNA molecules are double-strande</span>d helices, consisting of two long <a href="/wiki/Polymers" title="Polymers" class="mw-redirect">polymers</a> of simple units called <a href="/wiki/Nucleotide"
title="Nucleotide">nucleotides</a>, molecules with <a href="/wiki/Backbone_chain" title="Backbone chain">backbones</a>
made of alternating <a href="/wiki/Monosaccharide" title="Monosaccharide">sugars<
/a> (<a href="/wiki/Deoxyribose" title="Deoxyribose">deoxyribose</a>) and <a href="/wiki/Phosphate"
title="Phosphate">phosphate</a> groups (related to phosphoric acid), with the <a href="/wiki/Nucleobases" title="Nucleobases" class="mw-redirect">nucleobases</a> (G, A, T, C) attached to the sugars. DNA is well-suited for biological information storage, since the DNA backbone is resistant to cleavage and the double-stranded structure provides the molecule with a
built-in duplicate of the encoded information.</p>';
P.S. The line breaks, tabs, etc. are not intentional; they are part of the source text.