sql - Need to search a text field in Teradata SQL based on a list of words and return the word

Question

I am searching a large database for a list of words that are 5-7 characters long. So far I have:

Select *
  from sometable
 Where upper("Description") like any ('%ABC_123%', '%ABC_124%', '%DE_25%')

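I would also like to return the word that was found by the query, but I am stuck on how to do this without duplicating the list of words in a substr function.

There may be a better way to do this, and I would appreciate some direction.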

score 5 · Accepted Answer

As Rob Paller already mentioned, TD14 has regular expressions:

Select sometable.*,
       REGEXP_SUBSTR(Description,'((ABC_)(123|124)|(DE_(25)))') AS match
  from sometable
 Where match <> '';

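This should be more efficient than hundreds of LIKEs.

It can probably also be simplified, for example if you need ABC_ or DE_ followed by any two or three digits: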

       REGEXP_SUBSTR(Description,'(ABC_|DE_)([0-9]{2,3})') AS match
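The question compares against upper("Description"), while the patterns above match case-sensitively by default. A minimal sketch of a case-insensitive variant, assuming the optional position, occurrence and match arguments of REGEXP_SUBSTR are available on your release; the alias found_word is made up for illustration:

```sql
-- Sketch only: sometable and Description come from the question,
-- found_word is a hypothetical alias.
SELECT t.*,
       REGEXP_SUBSTR(t.Description,
                     '(ABC_|DE_)[0-9]{2,3}',  -- prefix followed by 2 or 3 digits
                     1,                        -- start searching at position 1
                     1,                        -- return the first occurrence
                     'i') AS found_word        -- 'i' = case-insensitive matching
  FROM sometable AS t
 WHERE found_word IS NOT NULL;                 -- REGEXP_SUBSTR yields NULL when nothing matches
```

Note that the underscore is a literal character in the regular expression, whereas in a LIKE pattern it is a wildcard for any single character, so the two approaches are not strictly equivalent.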

score 3 · Accepted Answer

Have you tried the POSITION function with multiple OR conditions?

SELECT *
FROM sometable
WHERE POSITION('ABC_123' IN UPPER("Description")) > 0
   OR POSITION('ABC_124' IN UPPER("Description")) > 0
   OR POSITION('DE_25' IN UPPER("Description")) > 0;

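Either way, I think this is going to be an expensive process on Teradata in terms of CPU and I/O. I am not aware of any native functionality in Teradata 13.x or earlier that would facilitate this. Teradata 14.x (I think 14.10) should introduce native regular expression support, which might make for a simpler solution.

How many list words are you talking about?

What if you used a subquery with the LIKE predicate?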

SELECT *
FROM myTable
WHERE UPPER("Description")
 LIKE ANY (SELECT ListWord
           FROM myListWords);

You may have to make the list words appear as patterns in the subquery:

SELECT *
FROM myTable
WHERE UPPER("Description")
 LIKE ANY (SELECT '%' || ListWord || '%' AS ListWordPattern
           FROM myListWords);
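The subquery approach assumes the search words already live in a table. A minimal sketch of one way to set that up for a single session; the volatile table and the VARCHAR(7) size are assumptions based on the 5-7 character words in the question, and only the names myListWords and ListWord come from the queries above:

```sql
-- Session-scoped table holding the search words. They are stored in upper
-- case because the query compares against UPPER("Description").
CREATE VOLATILE TABLE myListWords
(
    ListWord VARCHAR(7) NOT NULL
)
ON COMMIT PRESERVE ROWS;

INSERT INTO myListWords (ListWord) VALUES ('ABC_123');
INSERT INTO myListWords (ListWord) VALUES ('ABC_124');
INSERT INTO myListWords (ListWord) VALUES ('DE_25');
```

With the words in a table, the list is maintained in one place and does not need to be repeated in the query text.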

score 1 · Accepted Answer

Four years late, but anyway... a JOIN with LIKE works nicely (with QUALIFY in case of multiple matches).

For example:

SELECT a.orig_pattern
,b.my_keyword
FROM table_with_data a
JOIN table_with_keywords b
ON a.orig_pattern LIKE '%'||b.my_keyword||'%'
QUALIFY row_number() OVER (partition by orig_pattern ORDER by orig_pattern) = 1
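In the query above, the ORDER BY uses the same column as the PARTITION BY, so which keyword survives for a row with several matches is arbitrary. A sketch of one possible tie-break, preferring the longest keyword and then the alphabetically first one; this ordering rule is an assumption, not part of the original answer:

```sql
-- Same join as above, but with a deterministic choice among multiple matches.
SELECT a.orig_pattern,
       b.my_keyword
  FROM table_with_data a
  JOIN table_with_keywords b
    ON a.orig_pattern LIKE '%' || b.my_keyword || '%'
QUALIFY ROW_NUMBER() OVER (PARTITION BY a.orig_pattern
                           ORDER BY CHARACTER_LENGTH(b.my_keyword) DESC,  -- longest keyword first
                                    b.my_keyword) = 1;                    -- then alphabetical
```

Dropping the QUALIFY clause altogether returns one row per matching keyword, which may be preferable if every match should be reported.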

score 0 · Accepted Answer

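To find the rows you could also use a regular expression. The REGEXP operator is MySQL syntax; in Teradata 14 the equivalent is REGEXP_SIMILAR, which has to match the whole string, hence the leading and trailing .*:

Select *
  from sometable
 Where REGEXP_SIMILAR(Description, '.*(ABC_123|ABC_124|DE_25).*') = 1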

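I can't figure out your part B, returning the word that was found, unless that word is the only word in the "Description" field. If it is the only word, you could simply return the contents of the Description field.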
