java - Match a case-insensitive phrase in a string

Question

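My task is to traverse a tree and add HTML tags around a target word, subject to the following constraints:

A taggable word is a sequence of letters that is not part of another word, and it may have one of the following punctuation marks as its last character: period '.', comma ',', question mark '?', exclamation point '!', colon ':' and semicolon ';'.

Note that a taggable word may be embedded in longer text (e.g. tagging "quick" in "The quick brown fox"), and may occur more than once in the containing text.

As another example, if you are asked to add a bold tag around "cow", you would tag the whole word together with its trailing punctuation character in the following cases: "cow", "cow!", "cow?", "cow.", "cow:", "cow;", "COW", "cOw" (the last two are case-insensitive matches).

But you would not tag "cow" in these: "cows", "cowabunga" (in both cases it is not a word by itself but part of a larger word), "?cows" (not only letters, and the punctuation is not the last character), "cow?!!" (only one punctuation character is accepted), "cow's" (the apostrophe is not a letter).

Traversing the tree is no problem, but I can't come up with a block that determines the right place to add the tags: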

private void inorderAdd(TagNode root, String tag){
      if (root == null){
          return;
      }

      //Test if the tag is in the string at all
      if(root.tag.contains(tag)){
          String text = root.tag;
          String[] pieces =  text.split(" ");

          //check each array item for the target sequence
          for(int i = 0; i < pieces.length; i++){
              if(pieces[i].contains(tag)){

              }
          }
      }

      inorderAdd(root.firstChild, tag);
      inorderAdd(root.sibling, tag);

}

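At this point I know the phrase containing the tag has been split into an array with each word separate. I don't know where to go from here, since at some point I need to account for case as well as some punctuation.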

score 0 · Accepted Answer

You should take a look at Pattern (java.util.regex.Pattern).

Something like:

Pattern reg = Pattern.compile("\\b(" + tag + "[!.:?]?)\\b", Pattern.CASE_INSENSITIVE);

Then you should be able to check the condition like this:

Matcher m = reg.matcher(text);
    ...
if (m.find()) //true if a match was found (matches() would require the whole text to match)

m.start(1) / m.end(1) //get the exact location of the matched word
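
To make this concrete, here is a minimal sketch of how the pattern idea could be applied to the rules in the question. It is not the answerer's exact pattern: with `\\b`, "cow's" and "cow?!!" would still match on the bare "cow", and the character class leaves out ',' and ';', so this version swaps in whitespace lookarounds and assumes words in the text are separated by whitespace (as the `split(" ")` in the question suggests). The class name `WordTagger` and the method `addTag` are hypothetical names used only for illustration, and the word is assumed to consist of ASCII letters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordTagger {
    // Hypothetical helper: wraps every stand-alone occurrence of `word`
    // (plus at most one trailing punctuation mark) in <tag>...</tag>.
    static String addTag(String text, String word, String tag) {
        // (?<!\S) : start of text or preceded by whitespace
        // (?!\S)  : end of text or followed by whitespace
        // Pattern.quote keeps the word from being read as regex syntax.
        Pattern p = Pattern.compile(
                "(?<!\\S)(" + Pattern.quote(word) + "[.,?!:;]?)(?!\\S)",
                Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        StringBuilder out = new StringBuilder();
        int last = 0; // end of the previously copied region
        while (m.find()) { // find() scans for the next match inside the text
            out.append(text, last, m.start(1))
               .append('<').append(tag).append('>')
               .append(m.group(1)) // original casing and punctuation preserved
               .append("</").append(tag).append('>');
            last = m.end(1);
        }
        out.append(text.substring(last)); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "The cow, a COW and cows say cow?!! or cow's cow.";
        System.out.println(addTag(text, "cow", "b"));
        // prints: The <b>cow,</b> a <b>COW</b> and cows say cow?!! or cow's cow.
    }
}
```

Inside `inorderAdd`, something like `addTag(root.tag, tag, "b")` could then replace the manual split-and-check loop, though how the tagged text is turned back into tree nodes depends on the `TagNode` structure, which is not shown here.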
