3

I am using a JavaScript RegExp to highlight search matches in HTML content.

To do that, I am using:

data.replace( new RegExp("("+search+")", 'g'), "<b id='searchHighlight'>$1</b>" );

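where data is the whole HTML content and search is the search string.

For example, when searching for h, it highlights the h in words (the, there, etc.) as well as occurrences inside tags such as "<h1 id="title"> Something </h1>".

I can't take an alternative approach, because I need to highlight the same HTML content with its existing styling.

I have read the following solution: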

var input = "a dog <span class='something'> had a  </span> and a cat";
// Remove anything tag-like
var temp = input.replace(/<.+?>/g, "");
// Perform the search
var matches = new RegExp(exp, "g").exec(temp);

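But since I need to highlight the search text within the same HTML content, I can't simply strip out the existing tags. Is there a way to do an include-and-exclude search with RegExp, so that, for example, searching for h highlights "the" as "t<b id='searchHighlight'>h</b>e"
without breaking "<h1 id="title">Test</h1>" like this: "<<b id='searchHighlight'>h</b>1 id="title">Test</<b id='searchHighlight'>h</b>1>"?

The HTML content is static and looks like this: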

    <h1 id="title">Samples</h1>
        <div id="content">
            <div  class="principle">
        <h2 id="heading">           
            PRINCIPLE</h2>


        <p>
            FDA recognizes that samples are an important part of ensuring that the right drugs are provided to the right patients. Under the Prescription Drug Marketing Act (PDMA), a sales representative is permitted to provide prescription drug samples to eligible healthcare professionals (HCPs). In order for BMS to provide this service, representatives must strictly abide by all applicable compliance standards pertaining to the distribution of samples.</p></div>
<h2 id="heading">           
            WHY DOES IT MATTER?</h2>
        <p>
            The Office of Inspector General (OIG) recognizes that samples can have monetary value to HCPs and, when used improperly, may have implications under the Federal False Claims Act and the Federal Anti-kickback Act. To minimize risk of such liability, the OIG requires the clear and conspicuous labeling of individual samples as units that cannot be sold.&nbsp; BMS and its business partners label every sample package to meet this requirement.&nbsp; Additionally, the HCP signature statement acknowledges that the samples will not be sold, billed or provided to family members or friends.</p>
        <h2 id="heading">

            WHO IS YOUR SMaRT PARTNER?</h2>
        <p>
            SMaRT is an acronym for &ldquo;Samples Management and Representatives Together&rdquo;.&nbsp; A SMaRT Partner has a thorough understanding of BMS sample requirements and is available to assist the field with any day-to-day policy or procedure questions related to sample activity. A SMaRT Partner will also:</p>

        <ul>
            <li style="margin-left:22pt;"> Monitor your adherence to BMS&rsquo;s sample requirements.</li>
            <li style="margin-left:22pt;"> Act as a conduit for sharing sample compliance issues and best practices.</li>
            <li style="margin-left:22pt;"> Respond to day-to-day sample accountability questions within two business days of receipt.</li>
        </ul>
        <p>

            Your SMaRT Partner can be reached at 888-475-2328, Option 3.</p>
        <h2 id="heading">

            BMS SAMPLE ACCOUNTABILITY POLICIES &amp; PROCEDURES</h2>
        <p>
            It is the responsibility of each sales representative to read, understand and follow the BMS Field Sample Accountability Procedures, USPSM-SOP-101. The basic expectations are:</p>
        <ul>
            <li style="margin-left:22pt;"> Transmit all sample activity by communicating your tablet to the host server on a <strong>daily</strong> basis.</li>
            <li style="margin-left:22pt;"> Maintain a four to six week inventory of samples rather than excessive, larger inventories that are more difficult to manage and increase your risk of non-compliance.</li>
            <li style="margin-left:22pt;"> Witness all HCP&rsquo;s signatures to confirm request and receipt of samples.</li>
        </ul>
</div>

The content is scattered across many elements rather than sitting in a single tag, so DOM manipulation is not a solution for me.

4

3 Answers

4

If you can be sure that no tag attribute contains < or >, you could just use

data = data.replace( 
    new RegExp( "(" + search + "(?![^<>]*>))", 'g' ),
        "<b id='searchHighlight'>$1</b>" );

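The negative lookahead (?![^<>]*>) prevents the replacement whenever, further along in the string, a > appears before the next <, which is what happens when the match sits inside a tag.

This is far from foolproof, but it may be good enough.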
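To make the effect of the lookahead concrete, here is a minimal sketch applied to a heading like the one in the question. The function name highlightOutsideTags is made up for this illustration, and the sketch assumes search contains no regex special characters and that no attribute value contains < or >. It uses a class instead of an id, because a global replace can produce several highlights and ids must be unique.

```javascript
// Hypothetical helper: wrap every match of `search` that is not inside a tag.
// Assumes `search` has no regex metacharacters and attributes hold no < or >.
function highlightOutsideTags(data, search) {
  // (?![^<>]*>) rejects a match if the next angle bracket after it is ">",
  // i.e. if the match sits between "<" and ">" of a tag.
  var pattern = new RegExp("(" + search + "(?![^<>]*>))", "g");
  return data.replace(pattern, "<b class='searchHighlight'>$1</b>");
}

var sample = '<h1 id="title">the</h1>';
console.log(highlightOutsideTags(sample, "h"));
// <h1 id="title">t<b class='searchHighlight'>h</b>e</h1>
```

The h in the tag name h1 and in the closing tag is skipped, while the h in the text is wrapped.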

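By the way, since you are matching globally, i.e. making more than one replacement, id='searchHighlight' should probably be class='searchHighlight'.

You also need to make sure that search does not contain any regex special characters.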
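One common way to deal with special characters is to escape them before building the RegExp. The sketch below is an illustration, not part of the original answer: escapeRegExp and highlightLiteral are hypothetical names, and the character list is the one commonly used for this purpose.

```javascript
// Hypothetical helper: backslash-escape characters that have a special
// meaning in a regular expression, so the text is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: same lookahead idea as above, with an escaped term.
function highlightLiteral(data, search) {
  var pattern = new RegExp("(" + escapeRegExp(search) + ")(?![^<>]*>)", "g");
  return data.replace(pattern, "<b class='searchHighlight'>$1</b>");
}

console.log(highlightLiteral("<p>Act (PDMA), a sales rep</p>", "(PDMA)"));
// <p>Act <b class='searchHighlight'>(PDMA)</b>, a sales rep</p>
```

Without the escaping step, the parentheses in "(PDMA)" would be read as a capture group and only PDMA would be highlighted.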

answered 2013-03-13T15:18:28.287
1

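You have probably realized by now that you are trying to use the wrong tool for the job, so this is just for the record (if you haven't, you may find it insightful).

You will probably (certainly?) run into a fundamental problem with HTML attributes that hold essentially arbitrary text, namely title (the tooltip attribute) and data-... (generic user-defined attributes that hold arbitrary data by design): whatever you find in the text portions of your HTML code you may also find there, and replacing it will break tooltips and/or some application logic. Also note that any character of the text content may be encoded as a named or numeric entity (e.g. & -> &amp;, &#x26;, &#38;), which can be handled in principle but complicates the dynamic regex (if your variable search holds the plain text).

Having said all this, you might get along with

data.replace( new RegExp("([>]?)[^><]*("+search+")[^><]*([<]?)", 'g'), "<b id='searchHighlight'>$1$2$3</b>" );

unless the search term may contain characters that have a special meaning in regular expressions, such as . + * | ( [ { } ] ) \ and perhaps -; these you have to escape properly.

To sum up: revise your design to save yourself trouble.

By the way, why not opt for DOM traversal? You don't need to know which HTML tags are actually present to do that.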
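For reference, a DOM traversal along these lines might look like the sketch below. It is only an illustration under assumptions: it expects to run in a browser, the function names collectTextNodes and highlightInDom are made up, and search is treated as a plain string rather than a pattern. Because it only touches text nodes, tag names, attributes such as title, and entity encodings are never modified.

```javascript
// Collect every text node below `root`, whatever elements surround it.
function collectTextNodes(root, found) {
  found = found || [];
  for (var i = 0; i < root.childNodes.length; i++) {
    var child = root.childNodes[i];
    if (child.nodeType === 3) {          // 3 = text node
      found.push(child);
    } else if (child.nodeType === 1) {   // 1 = element node
      collectTextNodes(child, found);
    }
  }
  return found;
}

// Wrap each occurrence of the plain string `search` in a <b> element.
// Browser only: relies on `document`, Text.splitText and Node.replaceChild.
function highlightInDom(root, search) {
  if (!search) { return; }               // an empty term would never advance
  collectTextNodes(root).forEach(function (node) {
    var index;
    while ((index = node.nodeValue.indexOf(search)) !== -1) {
      var match = node.splitText(index);          // `match` starts at the hit
      var rest = match.splitText(search.length);  // `match` is now only the hit
      var wrapper = document.createElement("b");
      wrapper.className = "searchHighlight";
      match.parentNode.replaceChild(wrapper, match);
      wrapper.appendChild(match);
      node = rest;                                // continue after the hit
    }
  });
}
```

In a page this could be called as highlightInDom(document.getElementById("content"), "h"). The text nodes are collected before any of them is split, so the newly created <b> elements are not visited again. Text inside script or style elements below root would also be processed, so the root should be chosen accordingly.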

answered 2013-03-13T14:49:18.157
0

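This isn't a pure RegExp solution, but if you can't traverse the DOM, string manipulation with a function replacement and loops might work for you.

  1. Declare the variables you need and get the innerHTML of the document body.
  2. Go through the data, extracting every tag and saving it in an array. Leave a placeholder in its place so you know where to put it back later.
  3. Once all tags in the string have been replaced by the temporary placeholder, you can use your original code to replace the characters you want, assigning the result back to data.
  4. Then restore the tags by reversing the earlier process.
  5. Assign the new data as the innerHTML of the document body.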

Here is the code:

var data = document.body.innerHTML, // get the DOM as a string
    tagarray = [], // a place to temporarily store all your tags
    tagmatch = /<[^>]+>/g, // for matching tags
    tagplaceholder = '<>', // could be anything but should not match the RegExp above, and not be the same as the search string below
    search = 'h'; // for example; but this could be set dynamically

// a single global replace already visits every tag, so no surrounding loop is needed
data = data.replace(tagmatch, function (str) {
    tagarray.push(str); // store each matched tag in your array
    return tagplaceholder; // whatever your placeholder should be
});

data = data.replace( new RegExp("("+search+")", 'g'), "<b id='searchHighlight'>$1</b>" ); // now search and replace the string of your choice

// replace the placeholders, in order, with the tags you saved earlier to restore them
data = data.replace(new RegExp(tagplaceholder, 'g'), function () {
    return tagarray.shift(); // shift() takes no arguments and returns the oldest saved tag
});

document.body.innerHTML = data; // assign the changed `data` string to the body

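Obviously, it would be better to put all of this inside a function of its own, since you really don't want global variables like the ones above lying around.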
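As a rough sketch of that suggestion, the placeholder technique can be wrapped in a function that takes the markup and the search term and returns the new markup, with no globals. The name highlightWithPlaceholders is hypothetical. Escaping the search term and using a class instead of an id are additions made for this illustration, and the sketch assumes that neither the markup's text nor the search term contains the placeholder "<>".

```javascript
// Hypothetical wrapper around the placeholder technique described above.
function highlightWithPlaceholders(html, search) {
  if (!search) { return html; }          // nothing to highlight
  var tags = [];                         // saved tags, in document order

  // 1. Swap every tag for the placeholder "<>" and remember the tag.
  var stripped = html.replace(/<[^>]+>/g, function (tag) {
    tags.push(tag);
    return "<>";
  });

  // 2. Highlight the (escaped) search term in what is left.
  var escaped = search.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  var marked = stripped.replace(
    new RegExp("(" + escaped + ")", "g"),
    "<b class='searchHighlight'>$1</b>"
  );

  // 3. Put the saved tags back, one placeholder at a time.
  return marked.replace(/<>/g, function () {
    return tags.shift();
  });
}

console.log(highlightWithPlaceholders('<h1 id="title">the</h1>', "h"));
// <h1 id="title">t<b class='searchHighlight'>h</b>e</h1>
```

In the page it would be used as document.body.innerHTML = highlightWithPlaceholders(document.body.innerHTML, search). Entities such as &amp; are still plain text to this function, so a search for a would also match inside them.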

answered 2013-03-13T15:09:04.400