java - String manipulation - Rich text editor

Question

I have a requirement. I have a string whose value looks like this, for example:

<p>We are pleased <a href="http://www.anc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html">to present the new product type</a>. This new product type is the best thing since sliced bread. We are pleased to present the new product type. This new product <a href="mailto:abc@gmail.com">type is the best</a> thing since sliced bread.</p>

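The text above is stored as a single string value. After checking a condition, I need to append certain parameters to the href. Please let me know how to extract only the href, append the parameters, and display the string without corrupting it. (FYI: the string is a value entered through an RTE, a rich text editor.)

I tried this approach, but it did not work.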

String tmpStr = "href=\"http://www.abc.com\">design";

StringBuffer tmpStrBuff = new StringBuffer();
String[] tmpStrSpt = tmpStr.split(">");
if (tmpStrSpt[0].contains("abc.com")) {
    String[] tmpStrSpt1 = tmpStrSpt[0].split("\"");
    tmpStrBuff.append(tmpStrSpt1[0]);
    if (tmpStrSpt1[1].contains("?")) {
        tmpStrBuff.append("\"" + tmpStrSpt1[1] + "&s_cid=abcd_xyz\">");
    } else {
        tmpStrBuff.append("\"" + tmpStrSpt1[1] + "?s_cid=abcd_xyz\">");
    }
    tmpStrBuff.append(tmpStrSpt[1]);
    tmpStrBuff.append("</a>");
    System.out.println(" <p>tmpStr1:::: " + tmpStrBuff.toString() + "</p>");
}

Another approach I tried was:

String[] tmpTxtArr = text.split("\\s+");
StringBuffer tmpStrBuff = new StringBuffer();
for (String tmpTxt : tmpTxtArr) {
    descTxt += (tmpTxt.contains("abc.com") && !tmpTxt.contains("?")) ? tmpTxt
            .replace("\">", "?s_cid=" + trackingCode + "\">" + " ")
            : tmpTxt + " ";
}

score 2 · Accepted Answer

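Description

This regular expression will:

find the href attribute in an anchor tag
require the href to start with http://abc.com; it also allows https and www.abc.com in their respective positions
capture a ? into capture group 3 if the href value contains one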

<a\b[^<]*\bhref=(['"])(https?:\/\/(?:www[.])?abc[.]com[^"'?]*?([?]?)[^"'?]*?)\1[^<]*<\/a>
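As a quick sanity check of the three points above, here is a minimal sketch that runs the expression against a few anchors. The class name HrefPatternCheck, the helper questionMarkGroup, and the sample anchors are made up for illustration; only the pattern itself comes from this answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefPatternCheck {
    // The expression from this answer, escaped for a Java string literal.
    static final Pattern ANCHOR = Pattern.compile(
        "<a\\b[^<]*\\bhref=(['\"])(https?:\\/\\/(?:www[.])?abc[.]com[^\"'?]*?([?]?)[^\"'?]*?)\\1[^<]*<\\/a>",
        Pattern.CASE_INSENSITIVE);

    // Returns the content of group 3 ("?" or the empty string) when the
    // anchor matches, or null when the anchor does not match at all.
    static String questionMarkGroup(String anchor) {
        Matcher m = ANCHOR.matcher(anchor);
        return m.find() ? m.group(3) : null;
    }

    public static void main(String[] args) {
        // Hypothetical sample anchors, not taken from the question.
        String[] anchors = {
            "<a href=\"http://abc.com/page\">plain</a>",
            "<a href=\"https://www.abc.com/page?k=v\">with query</a>",
            "<a href=\"mailto:abc@gmail.com\">mail</a>"
        };
        for (String anchor : anchors) {
            System.out.println(anchor + " -> group 3: " + questionMarkGroup(anchor));
        }
    }
}
```

A null result means the anchor is left alone, which is what you want for mailto links and for other domains.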

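Groups

Group 0 will hold the entire anchor, from the opening <a to the closing </a>. If you find this excessive, or it conflicts with nested anchor tags, simply remove [^<]*<\/a> from the end of the expression.

Group 1 captures the opening quote, which is referenced later by \1 to ensure the closing quote is of the same kind
Group 2 captures the href value
Group 3 captures the question mark, if there is one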
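To illustrate the effect of dropping the trailing [^<]*<\/a>, here is a rough sketch comparing the full and the trimmed expression on a nested anchor. The class name TrimmedPatternDemo, the helper hrefs, and the sample string are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TrimmedPatternDemo {
    // Everything up to and including the closing quote of the href value.
    static final String HEAD =
        "<a\\b[^<]*\\bhref=(['\"])(https?:\\/\\/(?:www[.])?abc[.]com[^\"'?]*?([?]?)[^\"'?]*?)\\1";

    // Full expression: also requires the link text and the closing </a>.
    static final Pattern FULL = Pattern.compile(HEAD + "[^<]*<\\/a>", Pattern.CASE_INSENSITIVE);

    // Trimmed expression: stops right after the href attribute.
    static final Pattern TRIMMED = Pattern.compile(HEAD, Pattern.CASE_INSENSITIVE);

    // Collects group 2 (the href value) of every match.
    static List<String> hrefs(Pattern pattern, String html) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            found.add(m.group(2));
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical nested anchor used only to show the difference.
        String nested = "<a href=\"http://abc.com?attrib=value\">outside"
                + "<a href=\"http://abc.com?attrib=value2\">inside</a></a>";
        System.out.println("full:    " + hrefs(FULL, nested));
        System.out.println("trimmed: " + hrefs(TRIMMED, nested));
    }
}
```

With the trimmed expression, group 0 ends at the closing quote of the href, so it no longer contains the link text or the closing tag.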

Java Code Example:

Given the sample text:

<p>Some <a href="http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html">text</a>. I like kittens <a href="mailto:abc@gmail.com">email us</a>Dogs are nice.</p><a href="http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html?attribute=value">remember to vote</a>

this code

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Module1 {
  public static void main(String[] asd) {
    // Placeholder: substitute the sample text shown above.
    String sourcestring = "source string to match with pattern";
    Pattern re = Pattern.compile("<a\\b[^<]*\\bhref=(['\"])(https?:\\/\\/(?:www[.])?abc[.]com[^\"'?]*?([?]?)[^\"'?]*?)\\1[^<]*<\\/a>", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    Matcher m = re.matcher(sourcestring);
    int mIdx = 0;
    // Walk every match and print each capture group (group 0 is the whole match).
    while (m.find()) {
      for (int groupIdx = 0; groupIdx <= m.groupCount(); groupIdx++) {
        System.out.println("[" + mIdx + "][" + groupIdx + "] = " + m.group(groupIdx));
      }
      mIdx++;
    }
  }
}

yields the following captures (shown here indexed by capture group first, then by match):

$matches Array:
(
    [0] => Array
        (
            [0] => <a href="http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html">text</a>
            [1] => <a href="http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html?attribute=value">remember to vote</a>
        )

    [1] => Array
        (
            [0] => "
            [1] => "
        )

    [2] => Array
        (
            [0] => http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html
            [1] => http://www.abc.com/content/cy-tech/global/en/cq5-reference-materials.s_cid_123.html?attribute=value
        )

    [3] => Array
        (
            [0] => 
            [1] => ?
        )

)

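From here it is a simple matter of iterating over all the matches: if group 3 has a value, insert a & between the href value from group 2 and the new text; if it does not, insert a ? instead.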
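That last step could be sketched roughly as follows, using Matcher.appendReplacement to rebuild the string. The class name HrefParamAppender and the method appendParam are made up for illustration, and the sketch assumes the parameter is already URL-safe and that the hrefs carry no #fragment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HrefParamAppender {
    // Same expression as above: group 2 is the href value, group 3 is "?" if present.
    private static final Pattern ANCHOR = Pattern.compile(
        "<a\\b[^<]*\\bhref=(['\"])(https?:\\/\\/(?:www[.])?abc[.]com[^\"'?]*?([?]?)[^\"'?]*?)\\1[^<]*<\\/a>",
        Pattern.CASE_INSENSITIVE);

    // Appends param (for example "s_cid=abcd_xyz") to every matching href.
    // Assumption: param is already URL-safe; no encoding is done here.
    static String appendParam(String html, String param) {
        Matcher m = ANCHOR.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            // An existing "?" means a query string is present, so join with "&".
            String separator = m.group(3).isEmpty() ? "?" : "&";
            // Rebuild the anchor: everything up to the end of the href value,
            // then the new parameter, then the rest of the anchor unchanged.
            String rebuilt = html.substring(m.start(), m.end(2))
                    + separator + param
                    + html.substring(m.end(2), m.end());
            m.appendReplacement(out, Matcher.quoteReplacement(rebuilt));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical input in the spirit of the question.
        String html = "<p><a href=\"http://www.abc.com/a.html\">x</a> "
                + "<a href=\"mailto:abc@gmail.com\">m</a> "
                + "<a href=\"http://www.abc.com/a.html?k=v\">y</a></p>";
        System.out.println(appendParam(html, "s_cid=abcd_xyz"));
    }
}
```

Anchors that do not match, such as the mailto link, pass through untouched because appendReplacement and appendTail copy the text between matches as is. If the stored markup expects entity-encoded ampersands, the separator may need to be &amp; rather than &.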

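Disclaimer

In the long run, parsing HTML with a regular expression is probably not the most maintainable approach. However, if you control the input text, the text stays fairly simple, and you are willing to accept the occasional edge case where a regular expression fails, then a regular expression will work for you.

Some haters will point out that strings like the following will not match correctly. While true, in HTML these possibilities are either illegal or impractical, and are therefore unlikely to be encountered.

<a href="http://abc.com?attrib=</a>">link</a>: for the additional special symbols <, / and > to work in HTML, they need to be escaped. As shown here, this would violate the HTML standard.
<a href="http://abc.com?attrib=value">outside<a href="http://abc.com?attrib=value2">inside</a></a>: nested links may be legal, but they would force the browser to choose which anchor tag to follow, and I have never seen this format used.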
