I am working on a text analytics problem and am trying to remove all numbers between 0 and 999 with a regular expression in Python. I tried the Regex Numeric Range Generator to get the regular expression, but I didn't succeed. So far I have only managed to remove all the numbers.

I have tried several regexes, but none of them worked. Here's what I tried:

# Remove numbers starting from 0 ==> 999
data_to_clean = re.sub('[^[0-9]{1,3}$]', ' ', data_to_clean)

I have tried this also:

# Remove numbers starting from 0 ==> 999
data_to_clean = re.sub('\b([0-9]|[1-8][0-9]|9[0-9]|[1-8][0-9]{2}|9[0-8][0-9]|99[0-9])\b', ' ', data_to_clean)  

this one:

^([0-9]|[1-8][0-9]|9[0-9]|[1-8][0-9]{2}|9[0-8][0-9]|99[0-9])$

and this:

def clean_data(data_to_clean):
    # Remove numbers starting from 0 ==> 999
    data_to_clean = re.sub('[^[0-9]{1,3}$]', ' ', data_to_clean)  
    return data_to_clean

I have a lot of numbers, but I need to delete only the ones with up to 3 digits and keep the others.

Thank you for your help.

3 Answers

1

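You need to prefix the pattern string with an r (a raw string) to prevent escape processing, so that the interpreter does not replace \b with a backspace character. You can also simplify the pattern like this: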

data_to_clean = re.sub(r'\b([0-9]|[1-9][0-9]{1,2})\b', ' ', data_to_clean)
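To see why the r prefix matters, here is a minimal sketch comparing the same pattern with and without it. The sample text is made up for illustration.

```python
import re

# Made-up sample text, for illustration only.
text = "room 7 has 42 seats, 999 tickets and 1000 flyers"

# Without the r prefix, Python turns '\b' into a backspace character (\x08),
# so the pattern looks for literal backspaces and never matches here.
not_raw = re.sub('\b([0-9]|[1-9][0-9]{1,2})\b', ' ', text)

# With the r prefix, the regex engine receives \b as a word boundary.
raw = re.sub(r'\b([0-9]|[1-9][0-9]{1,2})\b', ' ', text)

print(not_raw == text)  # True: nothing was replaced
print(raw)              # 7, 42 and 999 become spaces; 1000 is kept
```

Each removed number leaves a single space behind, so you may want to collapse repeated whitespace afterwards.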
Answered 2019-02-12T14:32:24.103
0

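I think you can combine your attempt that uses word boundaries (\b) with your last attempt ([0-9]{1,3}).

So the resulting regex should look like this: \b[0-9]{1,3}\b

If you look at the demo, regex101.com/r/qDrobh/6, it should replace all 1-, 2-, and 3-digit numbers and ignore longer numbers and other words.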
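In Python, this could be sketched as follows. The helper name remove_short_numbers and the sample string are hypothetical, chosen only for illustration. Note that \b sits between a word character and a non-word character, so digits glued to letters (abc123) are left alone, while the two halves of a decimal such as 3.14 are treated as separate short numbers.

```python
import re

# Pattern from this answer; the constant and function names are hypothetical.
SHORT_NUMBER = re.compile(r'\b[0-9]{1,3}\b')

def remove_short_numbers(text):
    """Replace every standalone 1- to 3-digit number with a space."""
    return SHORT_NUMBER.sub(' ', text)

# Made-up sample text, for illustration only.
sample = "order 12 shipped 1500 units in 2019, ref abc123, price 3.14"

# 12, 3 and 14 are replaced; 1500, 2019 and abc123 are kept.
print(remove_short_numbers(sample))
```

If decimals should survive intact, the pattern would need extra lookarounds, which is beyond what this answer covers.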

Answered 2019-02-12T14:17:05.227
0

The numbers from 0 to 999 are:

  1. A single character: [0-9]
  2. Two characters: [1-9][0-9]
  3. Three characters: [1-9][0-9][0-9]

This gives a naive regex, /\b(?:[0-9]|[1-9][0-9]|[1-9][0-9][0-9])\b/, but the alternatives repeat the same character classes, so we can factor them out:

/(?!\b0[0-9])\b[0-9]{1,3}\b/

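This works by using a negative lookahead, (?!\b0[0-9]), which checks for the start of a word followed by a 0 and then another digit, so that 01 and similar are ignored. It then looks for one to three characters in the range 0-9. Because the negative lookahead needs at least 2 characters to trigger, a single 0 is still matched.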
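The effect of the lookahead is easiest to see next to the plain \b[0-9]{1,3}\b pattern. Here is a short Python sketch; the variable names and sample string are made up for illustration.

```python
import re

# Plain version: any standalone run of 1 to 3 digits.
plain = re.compile(r'\b[0-9]{1,3}\b')

# Version from this answer: same, but rejects numbers with a leading zero.
no_leading_zero = re.compile(r'(?!\b0[0-9])\b[0-9]{1,3}\b')

# Made-up sample text, for illustration only.
sample = "0 7 01 007 42 999 1000"

print(plain.findall(sample))            # ['0', '7', '01', '007', '42', '999']
print(no_leading_zero.findall(sample))  # ['0', '7', '42', '999']
```

Which one you want depends on whether tokens such as 007 count as numbers between 0 and 999 in your data.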

Answered 2019-02-12T14:22:34.217