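I need to retrieve the rows of a table whose name starts or ends with a [:space:] character or another special character ([:punct:]), not counting a single dot (.) at the end of the name. The idea is to extract names that may be inconsistent.

Examples that must appear:

  1. 'GEORGE & SON ' - there is an extra space at the end.
  2. '-GEORGE & SON' - there is an extra - at the start.
  3. '&GEORGE & SON' - there is an extra & at the start.
  4. '-GEORGE & SON S.A.' - there is an extra - at the start. The final dot . is not a problem.
  5. 'GEORGE & SON..' - the name ends with two dots rather than one. Strings ending with more than one . are the exception to the exception; they are bad names too.

Examples that must not appear:

  1. 'GEORGE & SON.' - there is only a single extra '.' at the end.

I am using the expression: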

REGEXP_LIKE(col, '(^[[:punct:]]|[[:punct:]]$)|(^[[:space:]]|[[:space:]]$)')

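However, while it retrieves the names that start or end with a space or a special character, it also pulls the names that have a dot '.' as the last character.

How can I change it to get the result I need?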

2 Answers

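Just add {2} after the second [[:punct:]]. The end-of-string branch then matches only when the name ends with two punctuation characters, so a single trailing dot is no longer reported. Note that a single trailing punctuation character other than a dot is no longer reported either.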

with tab as(
  select 'GEORGE & SON ' as s from dual union all
  select '-GEORGE & SON'  as s from dual union all
  select '&GEORGE & SON'  as s from dual union all
  select 'GEORGE & SON..'  as s from dual union all
  select 'GEORGE & SON.'  as s from dual union all
  select '-GEORGE & SON S.A.' as s from dual  
)
select * from  tab 
where REGEXP_LIKE(s, '(^[[:punct:]]|[[:punct:]]{2}$)|(^[[:space:]]|[[:space:]]$)') 
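
To see which names this variant does and does not report, the alternation can be tried outside the database. The following is a minimal Python sketch, not Oracle: it assumes that ASCII `string.punctuation` is a fair stand-in for `[[:punct:]]` and `\s` for `[[:space:]]`, and `flagged_two_punct` is a hypothetical helper name.

```python
import re
import string

# Assumption: ASCII punctuation approximates Oracle's [[:punct:]].
PUNCT = "[" + re.escape(string.punctuation) + "]"

# Same shape as the REGEXP_LIKE pattern: leading punctuation, two trailing
# punctuation characters, or a leading/trailing whitespace character.
TWO_PUNCT_PATTERN = re.compile(rf"^{PUNCT}|{PUNCT}{{2}}\Z|^\s|\s\Z")


def flagged_two_punct(name: str) -> bool:
    """Return True when the name would be reported by the pattern."""
    return TWO_PUNCT_PATTERN.search(name) is not None


for name in ["GEORGE & SON ", "-GEORGE & SON", "GEORGE & SON..",
             "GEORGE & SON.", "GEORGE & SON!"]:
    print(repr(name), flagged_two_punct(name))
```

In this sketch 'GEORGE & SON.' is left alone as requested, but so is 'GEORGE & SON!', because one trailing punctuation character of any kind no longer satisfies the end-of-string branch.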
answered 2019-04-04T11:00:11.793

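Since the predefined punctuation character class is not suitable for the end of the string (it contains the dot), a custom character class is used there instead. The dot is left out on purpose. The single quote is added separately (escaping it did not work, and in this case it may be tricky to find a suitable delimiter character for the q operator). The closing square bracket gets an alternative of its own, since Oracle does not seem to handle it correctly when it is escaped. Finally, trailing consecutive dots are added explicitly: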

WITH T (id, col) AS (
  SELECT 1, 'GEORGE & SON ' FROM DUAL UNION ALL
  SELECT 2, '-GEORGE & SON'  FROM DUAL UNION ALL
  SELECT 3, '&GEORGE & SON'  FROM DUAL UNION ALL
  SELECT 4, 'GEORGE & SON..'  FROM DUAL UNION ALL
  SELECT 5, 'GEORGE & SON.'  FROM DUAL UNION ALL
  SELECT 6, '-GEORGE & SON S.A.' FROM DUAL UNION ALL
  SELECT 7, 'GEORGE & SON!' FROM DUAL UNION ALL
  SELECT 8, 'GEORGE & SON"' FROM DUAL UNION ALL
  SELECT 9, 'GEORGE & SON#' FROM DUAL UNION ALL
  SELECT 10, 'GEORGE & SON$' FROM DUAL UNION ALL
  SELECT 11, 'GEORGE & SON%' FROM DUAL UNION ALL
  SELECT 12, 'GEORGE & SON&' FROM DUAL UNION ALL
  SELECT 13, 'GEORGE & SON(' FROM DUAL UNION ALL
  SELECT 14, 'GEORGE & SON)' FROM DUAL UNION ALL
  SELECT 15, 'GEORGE & SON*' FROM DUAL UNION ALL
  SELECT 16, 'GEORGE & SON+' FROM DUAL UNION ALL
  SELECT 17, 'GEORGE & SON,' FROM DUAL UNION ALL
  SELECT 18, 'GEORGE & SON\' FROM DUAL UNION ALL
  SELECT 19, 'GEORGE & SON-' FROM DUAL UNION ALL
  SELECT 20, 'GEORGE & SON\' FROM DUAL UNION ALL
  SELECT 21, 'GEORGE & SON/' FROM DUAL UNION ALL
  SELECT 22, 'GEORGE & SON:' FROM DUAL UNION ALL
  SELECT 23, 'GEORGE & SON;' FROM DUAL UNION ALL
  SELECT 24, 'GEORGE & SON<' FROM DUAL UNION ALL
  SELECT 25, 'GEORGE & SON=' FROM DUAL UNION ALL
  SELECT 26, 'GEORGE & SON>' FROM DUAL UNION ALL
  SELECT 27, 'GEORGE & SON?' FROM DUAL UNION ALL
  SELECT 28, 'GEORGE & SON@' FROM DUAL UNION ALL
  SELECT 29, 'GEORGE & SON[' FROM DUAL UNION ALL
  SELECT 30, 'GEORGE & SON^' FROM DUAL UNION ALL
  SELECT 31, 'GEORGE & SON_' FROM DUAL UNION ALL
  SELECT 32, 'GEORGE & SON`' FROM DUAL UNION ALL
  SELECT 33, 'GEORGE & SON{' FROM DUAL UNION ALL
  SELECT 34, 'GEORGE & SON|' FROM DUAL UNION ALL
  SELECT 35, 'GEORGE & SON}' FROM DUAL UNION ALL
  SELECT 36, 'GEORGE & SON~' FROM DUAL UNION ALL
  SELECT 37, 'GEORGE & SON''' FROM DUAL UNION ALL
  SELECT 38, 'GEORGE & SON]' FROM DUAL)
SELECT
  * FROM T
 WHERE REGEXP_LIKE(col, '(^[[:punct:]]|[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']$)|]$|\.\.$|(^[[:space:]]|[[:space:]]$)')
 ORDER BY id
;
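
The structure of that pattern (any leading punctuation, trailing punctuation other than a dot, two trailing dots, leading or trailing whitespace) can be sketched in Python for a quick check. This is only an approximation under stated assumptions: ASCII `string.punctuation` stands in for `[[:punct:]]`, `\s` for `[[:space:]]`, and `is_suspicious` is a hypothetical name. Python's `re.escape` takes care of the closing bracket and the quote, so the separate alternatives needed in Oracle are not required here.

```python
import re
import string

# Assumption: ASCII punctuation approximates Oracle's [[:punct:]].
PUNCT = "[" + re.escape(string.punctuation) + "]"
# The same set without the dot, for use at the end of the name.
PUNCT_NO_DOT = "[" + re.escape(string.punctuation.replace(".", "")) + "]"

SUSPICIOUS = re.compile(
    rf"^{PUNCT}"            # leading punctuation of any kind
    rf"|{PUNCT_NO_DOT}\Z"   # trailing punctuation other than a dot
    rf"|\.\.\Z"             # two (or more) trailing dots
    rf"|^\s|\s\Z"           # leading or trailing whitespace
)


def is_suspicious(name: str) -> bool:
    """Return True when the name starts or ends inconsistently."""
    return SUSPICIOUS.search(name) is not None


for name in ["GEORGE & SON.", "GEORGE & SON..", "GEORGE & SON!",
             "GEORGE & SON]", "-GEORGE & SON S.A."]:
    print(repr(name), is_suspicious(name))
```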

Updated requirements

Punctuation followed by a dot

Add an optional dot after the special-character class, changing

'[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']$'

to

'[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']\.?$'

as in

WITH T (id, col) AS (
  SELECT 40, 'GEORGE & SON^.' FROM DUAL UNION ALL
  SELECT 41, 'GEORGE & SON_.' FROM DUAL UNION ALL
  SELECT 42, 'GEORGE & SON`.' FROM DUAL UNION ALL
  SELECT 43, 'GEORGE & SON{.' FROM DUAL UNION ALL
  SELECT 44, 'GEORGE & SON|.' FROM DUAL UNION ALL
  SELECT 45, 'GEORGE & SON}.' FROM DUAL UNION ALL
  SELECT 46, 'GEORGE & SON~.' FROM DUAL UNION ALL
  SELECT 47, 'GEORGE & SON''.' FROM DUAL UNION ALL
  SELECT 48, 'GEORGE & SON].' FROM DUAL)
SELECT
  * FROM T
 WHERE REGEXP_LIKE(col, '([-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']\.?$)|]\.?$')
 ORDER BY id
;
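
The effect of the optional dot can be checked in isolation with a small Python sketch. As before this is an approximation, not Oracle: ASCII `string.punctuation` minus the dot is assumed to stand in for the custom class, and `ends_with_special` is a hypothetical name.

```python
import re
import string

# Assumption: ASCII punctuation without the dot mirrors the custom class.
PUNCT_NO_DOT = "[" + re.escape(string.punctuation.replace(".", "")) + "]"

# A special character at the end, optionally followed by one dot.
SPECIAL_THEN_DOT = re.compile(rf"{PUNCT_NO_DOT}\.?\Z")


def ends_with_special(name: str) -> bool:
    """Return True for a trailing special character, with or without a dot."""
    return SPECIAL_THEN_DOT.search(name) is not None


for name in ["GEORGE & SON^.", "GEORGE & SON].", "GEORGE & SON!",
             "GEORGE & SON.", "GEORGE & SON S.A."]:
    print(repr(name), ends_with_special(name))
```

A plain trailing dot after a letter, as in 'GEORGE & SON.' or 'GEORGE & SON S.A.', still does not match, because the character before the dot must come from the special-character class.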

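Repetitions of (combinations of) spaces and special characters inside the string

Originally, only leading and trailing occurrences were asked for... ;-)

A sequence of two or more space/punctuation characters is matched by

[[:space:][:punct:]]{2,}

If you want to catch them explicitly inside the string only, surround them with word characters:

\w[[:space:][:punct:]]{2,}\w

Leading/trailing runs of spaces are already matched as soon as a single space is found, so there is no need to handle them explicitly.
This gives: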

WITH T (id, col) AS (
  SELECT 50, 'GEORGE & SON  ' FROM DUAL UNION ALL
  SELECT 51, 'GEORGE & SON   '  FROM DUAL UNION ALL
  SELECT 52, '  GEORGE & SON'  FROM DUAL UNION ALL
  SELECT 53, '    GEORGE & SON'  FROM DUAL UNION ALL
  SELECT 54, 'GEORGE &  SON'  FROM DUAL UNION ALL
  SELECT 55, 'GEORGE  & SON S.A.' FROM DUAL UNION ALL
  SELECT 56, 'GEORGE & SON    S.A.' FROM DUAL UNION ALL
  SELECT 60, '  GEORGE and SON'  FROM DUAL UNION ALL
  SELECT 61, ' ,GEORGE and SON' FROM DUAL UNION ALL
  SELECT 62, ', GEORGE and SON'  FROM DUAL UNION ALL
  SELECT 63, 'GEORGE -- SON' FROM DUAL UNION ALL
  SELECT 64, 'GEORGE --SON' FROM DUAL UNION ALL
  SELECT 65, 'GEORGE & SON' FROM DUAL UNION ALL
  SELECT 66, 'GEORGE + SON' FROM DUAL UNION ALL
  SELECT 67, 'GEORGE and  , SON' FROM DUAL UNION ALL
  SELECT 68, 'GEORGE and , SON' FROM DUAL UNION ALL
  SELECT 69, 'GEORGE and SON ,'  FROM DUAL UNION ALL
  SELECT 70, 'GEORGE and SON. '  FROM DUAL UNION ALL
  SELECT 71, 'GEORGE and+-SON'  FROM DUAL)
SELECT
  * FROM T
--  WHERE REGEXP_LIKE(col, '(^[[:punct:]]|[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']\.?$)|]$|\.\.$|(^[[:space:]]|[[:space:]]$)|[[:space:][:punct:]]{2,}')
  WHERE REGEXP_LIKE(col, '(^[[:punct:]]|[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']\.?$)|]$|\.\.$|(^[[:space:]]|[[:space:]]$)|\w[[:space:][:punct:]]{2,}\w')
  ORDER BY id
;

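But this produces false positives, most prominently GEORGE & SON itself, where the " & " between two words is a run of three space/punctuation characters. To some degree this can be avoided by replacing [:punct:] with a less inclusive set. The (final) choice will depend on whether false negatives or false positives are the bigger concern.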
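
The trade-off can be made concrete with a minimal Python sketch. It is an approximation under assumptions: ASCII `string.punctuation` stands in for `[[:punct:]]`, `\s` for `[[:space:]]`, and both the helper `has_interior_run` and its `excluded` parameter are hypothetical names introduced only for illustration.

```python
import re
import string


def has_interior_run(name: str, excluded: str = "") -> bool:
    """Return True when two or more space/punctuation characters sit
    between two word characters. Characters listed in `excluded` are
    removed from the punctuation set (the "less inclusive set")."""
    chars = "".join(c for c in string.punctuation if c not in excluded)
    sep = r"[\s" + re.escape(chars) + "]"
    return re.search(rf"\w{sep}{{2,}}\w", name) is not None


# With the full punctuation set, the perfectly valid name is reported.
print(has_interior_run("GEORGE & SON"))
# Dropping & from the set silences it ...
print(has_interior_run("GEORGE & SON", excluded="&"))
# ... but the doubled space after & is then missed as well.
print(has_interior_run("GEORGE &  SON", excluded="&"))
```

Removing a character from the set trades a false positive for a false negative, which is the choice described above.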

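Capturing any sequence of punctuation and space characters, but allowing a single letter followed by a dot and a space

As said before, false positives have to be balanced against false negatives, one way or the other. However, this may be a good moment to consider splitting the overall problem into smaller ones and handling them separately. Even if GEORGE and P. SON is perfectly acceptable, you may still want to look at, for example, -GEORGE and P. SON. So let's focus on stray character sequences in the middle of the string, keeping the earlier ' & ' case in mind and allowing enumerations (and therefore commas):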

WHERE
  REGEXP_LIKE(col, '\w[[:space:][:punct:]]{2,}\w')
  AND
  NOT REGEXP_LIKE(col, ' [[:upper:]]\. \w')
  AND
  NOT INSTR(col, ', ') > 0
  AND
  NOT INSTR(col, ' & ') > 0

presumably followed by

  WHERE
  REGEXP_LIKE(col, '\w[[:space:][:punct:]]{2,}\w')
  AND
  (REGEXP_LIKE(col, ' [[:upper:]]\. \w')
   OR
   INSTR(col, ', ') > 0
   OR
   INSTR(col, ' & ') > 0
  )

in order to find, for example, GEORGE and , SON among the many valid values. INSTR may be faster than REGEX, depending on the overall situation...
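
The two filters partition the names that contain an interior run: those with no accepted explanation, and those that contain an accepted form and may still deserve a look. A minimal Python sketch of that split follows. It is an approximation under assumptions: ASCII `string.punctuation` stands in for `[[:punct:]]`, `[A-Z]` for `[[:upper:]]`, the `in` operator for `INSTR(...) > 0`, and all function names are hypothetical.

```python
import re
import string

SEP = r"[\s" + re.escape(string.punctuation) + "]"
INTERIOR_RUN = re.compile(rf"\w{SEP}{{2,}}\w")
# A single upper-case initial such as " P. " in front of a word.
INITIAL = re.compile(r" [A-Z]\. \w")


def has_accepted_form(name: str) -> bool:
    """Initial with dot, enumeration comma, or ' & ' between words."""
    return bool(INITIAL.search(name)) or ", " in name or " & " in name


def unexplained_run(name: str) -> bool:
    """First filter: interior run and none of the accepted forms."""
    return bool(INTERIOR_RUN.search(name)) and not has_accepted_form(name)


def run_to_review(name: str) -> bool:
    """Second filter: interior run together with an accepted form."""
    return bool(INTERIOR_RUN.search(name)) and has_accepted_form(name)


for name in ["GEORGE -- SON", "GEORGE & SON", "GEORGE and , SON",
             "GEORGE and P. SON", "GEORGE and SON"]:
    print(repr(name), unexplained_run(name), run_to_review(name))
```

The second group still contains valid names such as 'GEORGE & SON', so it is a list to review rather than a list of errors.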

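A few more words on the mechanics

(i) [[:punct:][:space:]] essentially combines [[:punct:]] and [[:space:]] into a single character class. The order of the two inside the brackets does not matter for what the class matches.

(ii)

[-!"#$%&()*+,\/:;<=>?@[^_`{|}~' || '''' || ']

is

[-!"#$%&()*+,\/:;<=>?@[^_`{|}~]

with the single quote added. Written directly, the single quote would be taken by Oracle as the end of the argument value, and escaping it with a backslash does not work. This is what "the single quote is added separately" above refers to.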

Please comment if adjustments or further details are needed.

answered 2019-04-04T14:39:10.567