3

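I think I have the wrong idea about how the {3,5} part works.

As I understand it, it specifies the range of lengths a number must fall in for the search to return it?

For example, 3,5 would return matches on numbers that are 3-5 digits long. After some experimenting, I realized my logic isn't quite right.

It seems to work for 3-5 characters, and then also for 8, 9 and 10 characters.

Am I missing a pattern here? Or, more simply, can someone explain the logic behind it? Is it just multiples of 3 or 5? Including the range 3-5? I'm really confused here. Thanks!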

user@matrix:~> echo 1234567891234 | grep '[0-9]\{3,5\}'

1234567891234

The above matches successfully, even though the input contains 13 characters...

4

4 Answers

2

It's working exactly as you've written it:

grep '[0-9]\{3,5\}'  - Are there 3 to 5 consecutive numeric characters anywhere in this string?

If the string is 1234567891234, there is a sub-string in there that contains 3 - 5 numeric characters.
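To make the substring behaviour concrete, here is a minimal sketch that counts which sample lines match. The sample values and the variable names `samples` and `matched` are made up for illustration.

```shell
# One sample per line; grep -c counts the lines that contain a match.
samples='12
123
12345
1234567891234
ab12cd'

# BRE syntax, so the braces are escaped. Any line containing at least
# three consecutive digits matches, however long the full run of digits is.
matched=$(printf '%s\n' "$samples" | grep -c '[0-9]\{3,5\}')
echo "$matched"
```

Only `12` and `ab12cd` fail to match, because neither contains three digits in a row.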

If you are only interested in runs of 3 to 5 digits that are not part of a longer run of digits, you have to put some boundaries in your regular expression. You should also use the -E flag, which selects extended regular expressions (ERE), the more modern syntax:

$ echo 12345678901234 | grep -E "(^|[^0-9])[0-9]{3,5}([^0-9]|$)"

This will not print anything, but this will:

$ echo 1234 | grep -E "(^|[^0-9])[0-9]{3,5}([^0-9]|$)"

And this:

$ echo 12345aaa6789aaa01234 | grep -E "(^|[^0-9])[0-9]{3,5}([^0-9]|$)"

The first (^|[^0-9]) says either at the beginning of the line (That's the leading ^), or anything besides the characters 0-9. (That's the [^0-9]). Using the (...|...) in an extended regular expression means either the expression on the left or the expression on the right. The same goes for the ending ([^0-9]|$) which says either non numerics or the end of a line.

In the middle is your [0-9]{3,5} (no backslash needed for the extended expression). This says between 3 to 5 digits. And, since it is bound on either side by non-digits, or the beginning or end of the string, this will do what you want.
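As a rough sketch, the bounded expression can be wrapped in a small helper to try several inputs at once. The function name `check` and the sample inputs are hypothetical and only serve the illustration.

```shell
pattern='(^|[^0-9])[0-9]{3,5}([^0-9]|$)'

check() {
    # Prints "match" or "no match" for the single line passed as $1.
    if printf '%s\n' "$1" | grep -Eq "$pattern"; then
        echo "match"
    else
        echo "no match"
    fi
}

check 12               # only 2 digits
check 1234             # a lone run of 4 digits
check 12345678901234   # a single run of 14 digits
check 12345aaa6789     # runs of 5 and 4 digits separated by letters
```

The first and third calls report no match; the second and fourth report a match, because each contains a run of 3 to 5 digits with a non-digit or a line edge on both sides.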

A couple of things:

$ echo 12345aaa6789aaa01234 | grep -E "(^|[^0-9])[0-9]{3,5}([^0-9]|$)"

and

$ grep -E "(^|[^0-9])[0-9]{3,5}([^0-9]|$)" <<<"12345aaa6789aaa01234"

mean pretty much the same thing. The second form uses a here-string (available in bash, zsh and ksh, but not in plain POSIX sh), so it needs no pipe, and it is shorter to type.

Also, you can use (and it's preferred to use) character classes:

$ grep -E "(^|[^[:digit:]])[[:digit:]]{3,5}([^[:digit:]]|$)" <<<"12345aaa6789aaa01234"

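This keeps the expression independent of how the current locale orders the characters inside a range such as [0-9]. Note that \d is not part of the extended regular expression syntax, so it does not work with -E. If your grep supports Perl-compatible regular expressions (the -P option in GNU grep), \d is a shorter way to write the same class:

$ grep -P "(^|[^\d])\d{3,5}([^\d]|$)" <<<"12345aaa6789aaa01234"
answered 2013-08-14T21:37:31.800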
2

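You can use the -o option to visualize how grep works:

echo 1234567891234 | grep -o '[0-9]\{3,5\}'

Output:

12345
67891
234

-o prints only the matched parts of a line, each on its own line. Without it, grep prints the whole line in which a match occurred, which here is just the input string itself, so you cannot see which part of the string grep actually matched.

Now you can see that grep found several matches in the line: two 5-digit strings and one 3-digit string.

Unless you use the -E option, you also need to put a backslash before each of the braces {}.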
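The difference the backslashes make can be checked with a short sketch. The variable names are illustrative, and `|| true` is only there so that a non-matching grep does not abort a script running under `set -e`.

```shell
input=1234567891234

# BRE (the default): unescaped braces are ordinary characters, so this looks
# for a digit followed by the literal text "{3,5}" and finds no matching line.
bre_literal=$(echo "$input" | grep -c '[0-9]{3,5}' || true)

# BRE with escaped braces and ERE with plain braces both mean "3 to 5 digits".
bre_escaped=$(echo "$input" | grep -c '[0-9]\{3,5\}')
ere_plain=$(echo "$input" | grep -Ec '[0-9]{3,5}')

echo "$bre_literal $bre_escaped $ere_plain"
```

The first count is 0 and the other two are 1, since the single input line matches as soon as the braces are treated as a repetition count.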

answered 2013-08-14T21:00:08.757
2

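You are correct in assuming that {3,5} defines how many times the character class before it is repeated: between 3 and 5 times, inclusive. You can also write something like {3,}, meaning "at least 3 times".

Use the -Ex options: E so that you don't have to put backslashes before the braces, and x so that the pattern has to match the whole line: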

[alfasin@otrs ~]$ echo 1234567891234 | grep -Ex '[0-9]{3,5}'
[alfasin@otrs ~]$ echo 1234567891234 | grep -Ex '[0-9]{3,13}'
1234567891234

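From the grep manual:

-E, --extended-regexp Interpret PATTERN as an extended regular expression (ERE, see below). (-E is specified by POSIX.)

-x, --line-regexp Select only those matches that exactly match the whole line. (-x is specified by POSIX.)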
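A small sketch of how -x interacts with the two repetition forms mentioned above. The sample lines and the variable names `exact` and `at_least` are invented for the example.

```shell
# Lines that consist of nothing but 3 to 5 digits.
exact=$(printf '%s\n' 12 123 12345 123456 a1234 | grep -Exc '[0-9]{3,5}')

# {3,} has no upper bound: whole lines made of three or more digits.
at_least=$(printf '%s\n' 12 123 12345 123456 a1234 | grep -Exc '[0-9]{3,}')

echo "$exact $at_least"
```

With -x, `123456` is rejected by {3,5} but accepted by {3,}, and `a1234` is rejected by both because the letter is part of the line.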

answered 2013-08-14T21:05:36.547
1

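When you use that particular regex, it matches the first 5 characters of the input string (see http://regexpal.com/?flags=g&regex=[0-9]{3%2C5}&input=1234567891234%0A for a visualization). Once grep finds a match, it stops processing and returns the matching line. It pays no attention to anything beyond that match.

If you are looking for something that matches only isolated sequences of 3-5 digits, try a regex like this: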

\b[0-9]{3,5}\b

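The \b matches a word boundary, that is, a transition between a word character (letters, digits, etc.) and a non-word character (whitespace, punctuation, etc.). This produces a match on 1234, but not on 12 or 1234567891234.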
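Since \b is not part of POSIX regular expressions, the sketch below uses grep's -w option (provided by GNU and BSD grep) as a stand-in; it is assumed here to behave like wrapping the pattern in word boundaries. The helper name `whole_word` is hypothetical.

```shell
whole_word() {
    # -w accepts a match only if it is not adjacent to another word character.
    if printf '%s\n' "$1" | grep -Ewq '[0-9]{3,5}'; then
        echo "match"
    else
        echo "no match"
    fi
}

whole_word 1234            # isolated run of 4 digits
whole_word 12              # too short
whole_word 1234567891234   # every 3-5 digit slice touches another digit
whole_word abc1234         # letters are word characters too
```

Only the first call reports a match. The last case shows a difference from the [^0-9] boundaries used in another answer: a letter next to the digits also prevents a word-boundary match.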

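You could also use lookarounds as a more robust way of making sure there are no digits before or after your match. However, grep's support for lookarounds appears to be incomplete (GNU grep offers them only through its -P option), so you may have to switch to something like perl instead.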

answered 2013-08-14T22:03:23.470