regex - Check whether the last word of a string is contained (case-insensitively) in another string

Question

I'm using the SPARQL regex function and passing two variables to it like this:

FILTER regex(?x, ?y, "i")

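For example, I want to compare these two strings: Via de' cerretani and via dei Cerretani. I'd like to extract the significant word of the first string, which is usually the last one (cerretani in this case), and check whether it is contained in the second string. As you can see, I pass these two strings as variables. How can I do this?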

score 2 · Accepted Answer

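At first I thought this was a duplicate of your earlier question, Comparing two strings with SPARQL, but that one asks for a function that returns an edit distance. The task here is much more specific: check whether the last word of one string is contained (case-insensitively) in another string. As long as we take your specification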

the significant word of the string ... which is usually the last one

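strictly, and always use just the last word of the string (since in general there is no way to determine what "the significant word of the string" is), we can do this. You won't end up using the regex function, though. Instead, we'll use replace, contains, and lcase (or ucase).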

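The trick is to get the last word of ?x by using replace to remove everything up to and including the last space, which leaves only the last word. We can then use contains to check whether that last word occurs in the other string. By normalizing case (the code below uses lcase, but ucase should work too), we can make the containment check case-insensitive.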

select ?x ?y ?lastWordOfX ?isMatch ?isIMatch where { 
  # Values gives us some test data.  It just means that ?x and ?y
  # will be bound to the specified values.  In your final query, 
  # these would be coming from somewhere else.
  values (?x ?y) {
    ("Via de' cerretani" "via dei Cerretani")
    ("Doctor Who" "Who's on first?")
    ("CaT" "The cAt in the hat")
    ("John Doe" "Don't, John!")
  }

  # For "the significant word of the string which is
  # usually the last one", note that "all but the last word"
  # is matched by the pattern ".* ".  We can replace "all but the
  # last word" with the empty string to leave just the last word.
  # (Note that if the pattern doesn't match, then the original
  # string is returned.  This is good for us, because if there's
  # just a single word, then it's also the last word.)
  bind( replace( ?x, ".* ", "" ) as ?lastWordOfX )

  # When you check whether the second string contains the first, 
  # you can either leave the cases as they are and have a case
  # sensitive check, or you can convert them both to the same 
  # case and have a case insensitive match.
  bind( contains( ?y, ?lastWordOfX ) as ?isMatch )
  bind( contains( lcase(?y), lcase(?lastWordOfX) ) as ?isIMatch )
}

---------------------------------------------------------------------------------
| x                   | y                    | lastWordOfX | isMatch | isIMatch |
=================================================================================
| "Via de' cerretani" | "via dei Cerretani"  | "cerretani" | false   | true     |
| "Doctor Who"        | "Who's on first?"    | "Who"       | true    | true     |
| "CaT"               | "The cAt in the hat" | "CaT"       | false   | true     |
| "John Doe"          | "Don't, John!"       | "Doe"       | false   | false    |
---------------------------------------------------------------------------------

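This may look like a lot of code, but that's because of the comments, because the last word is bound to a separate variable, and because I've included both the case-sensitive and the case-insensitive match. When you actually use it, it will be much shorter. For example, to select only those ?x and ?y that match in this way: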

select ?x ?y {
  values (?x ?y) {
    ("Via de' cerretani" "via dei Cerretani")
    ("Doctor Who" "Who's on first?")
    ("CaT" "The cAt in the hat")
    ("John Doe" "Don't, John!")
  }
  filter( contains( lcase(?y), lcase(replace( ?x, ".* ", "" ))))
}

----------------------------------------------
| x                   | y                    |
==============================================
| "Via de' cerretani" | "via dei Cerretani"  |
| "Doctor Who"        | "Who's on first?"    |
| "CaT"               | "The cAt in the hat" |
----------------------------------------------

It's true that

contains( lcase(?y), lcase(replace( ?x, ".* ", "" )))

is a bit longer than something like

regex(?x, ?y, "some-special-flag")

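but I think it's fairly short. If you're willing to use the last word of ?x as a regular expression (which is probably not a good idea, since you can't be sure it doesn't contain special regex characters), you could even use: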

regex( ?y, replace( ?x, ".* ", "" ), "i" )

but I suspect that contains will be faster, since regex has a lot more to check.
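
If you do want to go the regex route, one way to reduce the special-character risk is to backslash-escape regex metacharacters in the last word before using it as the pattern. The following is only a sketch, not something taken from the question: the variable names ?lastWordOfX and ?pattern and the particular list of escaped characters are my own assumptions, and escaping behaviour can differ slightly between SPARQL engines, so test it on yours. Note that regex takes the text to search first, then the pattern, then the flags.

```sparql
select ?x ?y where {
  values (?x ?y) {
    ("Via de' cerretani" "via dei Cerretani")
    ("John Doe" "Don't, John!")
  }

  # Last word of ?x, obtained the same way as in the queries above.
  bind( replace( ?x, ".* ", "" ) as ?lastWordOfX )

  # Assumption: put a backslash in front of each common regex
  # metacharacter so that the word is matched literally.  Backslashes
  # are doubled once for the SPARQL string and once for the regex.
  bind( replace( ?lastWordOfX, "([.\\\\^$|?*+()\\[\\]{}])", "\\\\$1" ) as ?pattern )

  # Text first, then pattern, then flags ("i" = case insensitive).
  filter( regex( ?y, ?pattern, "i" ) )
}
```

With this test data I would expect only the first row to pass the filter, just as with the contains version. Unless you actually need regex features, contains with lcase is still the simpler choice.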
