regex - Regular expression to get minutes rather than seconds

Question

I have a lot of timestamps from Apache logs...

96.99.193.124 - - [10/May/2012:22:59:29 +0000] 0 "GET / " 200 123 "-" "-"
96.29.193.124 - - [10/May/2012:22:59:56 +0000] 0 "GET / " 200 123 "-" "-"
96.29.193.125 - - [10/May/2012:22:59:56 +0000] 0 "GET / " 200 123 "-" "-"
96.29.193.125 - - [10/May/2012:23:00:00 +0000] 0 "GET / " 200 123 "-" "-"
96.29.193.125 - - [10/May/2012:23:00:00 +0000] 0 "GET / " 200 123 "-" "-"

To extract the date-time stamp, I run:

sed -e 's;^.*\(\[.*\]\).*$;\1;' inputFileName > outputFileName

This gives me

[10/May/2012:22:59:29 +0000]
[10/May/2012:22:59:56 +0000]
[10/May/2012:22:59:56 +0000]
[10/May/2012:23:00:00 +0000]
[10/May/2012:23:00:00 +0000]

I would like to drop the seconds, the square brackets, and the timezone offset, and get:

10/May/2012:22:59 
10/May/2012:22:59 
10/May/2012:22:59 
10/May/2012:23:00
10/May/2012:23:00

from the original file... Any hints?

score 2 · Accepted Answer

Why not just

 echo '96.99.193.124 - - [10/May/2012:22:59:29 +0000] 0 "GET / " 200 123 "-" "-"' \
 | sed 's/^.*\[//;s/ .*$//;s/...$//'

Output

10/May/2012:22:59

Explanation

       96.99.193.124 - - [10/May/2012:22:59:29 +0000] 0 "GET / " 200 123 "-" "-"
      ^........pt1.......[                    ...............pt2...............$
                                           :.. (pt3)

Each part removes one big chunk of the unwanted string.

 pt1 s/^.*\[//
     matches/deletes everything up to and including the [.
     I use `\[` to escape the normal meaning of that char in sed
       as the beginning of a character class, e.g. `[a-z]`.
 pt2 s/ .*$//
     matches/deletes everything from the first space char to the end of the line.
 pt3 s/...$//
     matches/deletes the last 3 chars from the end of the line.

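Recall that in sed,

's/matchpattern/replacepattern/', where the leading 's' means substitute, is one of the main tools available.
The ^ char in a regex anchors the match to the beginning of the line.
The $ char anchors the match to the end of the line.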

You should run pt1 on its own first, then add pt2 and pt3, so you can easily see what each step achieves.
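The incremental build-up suggested above could look like the following sketch. The sample line is the first one from the question; the variable names (`line`, `step1`, `step2`, `step3`) are only for illustration.

```shell
# Sample line from the question (variable names are illustrative).
line='96.99.193.124 - - [10/May/2012:22:59:29 +0000] 0 "GET / " 200 123 "-" "-"'

# pt1 only: drop everything up to and including the '['.
step1=$(printf '%s\n' "$line" | sed 's/^.*\[//')

# pt1 + pt2: also drop everything from the first space to the end of the line.
step2=$(printf '%s\n' "$line" | sed 's/^.*\[//;s/ .*$//')

# pt1 + pt2 + pt3: finally drop the last three characters (the ":29").
step3=$(printf '%s\n' "$line" | sed 's/^.*\[//;s/ .*$//;s/...$//')

# Print the three intermediate results, one per line.
printf '%s\n' "$step1" "$step2" "$step3"
```

Each printed line is shorter than the one before it, which makes it easy to see which substitution removed what.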

I hope this helps.

score 2 · Accepted Answer

Try this

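sed -e 's;^.*\[\([^+]*\):[0-9][0-9] .*\].*$;\1;'

Explanation:

1 - I put the brackets outside the group. 2 - I put the +something outside the group. 3 - The `:[0-9][0-9] ` right after the group keeps the seconds and the following space out of it as well; without that, the group would still end in `:29 `.

That's it.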

score 2 · Accepted Answer

This might work for you:

sed 's/.*\[\(.*\):.*/\1/' file

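You can use greediness to your advantage: the `.*` inside the group grabs everything up to the last `:` on the line. This relies on no `:` appearing after the timestamp, which holds for the sample lines.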
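To see the greedy match at work, and where it could go wrong, here is a small sketch. The first input is from the question; the second, with a URL in the referrer field, is made up for illustration.

```shell
# First line is from the question; the second is a hypothetical line
# whose referrer field contains a ':'.
plain='96.99.193.124 - - [10/May/2012:22:59:29 +0000] 0 "GET / " 200 123 "-" "-"'
with_url='96.99.193.124 - - [10/May/2012:22:59:29 +0000] 0 "GET / " 200 123 "http://example.com/" "-"'

# \(.*\): is greedy, so the group runs up to the LAST ':' on the line.
greedy_plain=$(printf '%s\n' "$plain" | sed 's/.*\[\(.*\):.*/\1/')
greedy_url=$(printf '%s\n' "$with_url" | sed 's/.*\[\(.*\):.*/\1/')

printf '%s\n' "$greedy_plain" "$greedy_url"
```

On the question's data the last colon is the one before the seconds, so the first result is the wanted timestamp. On the second line the group runs on into the referrer, so real-world logs may need one of the more tightly anchored patterns.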

score 2 · Accepted Answer

sed -e 's;^.*\[\(.\{17\}\).*\].*$;\1;'

This version locates the opening bracket, then explicitly includes the next 17 characters (the string of interest) in the extracted group.
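As a quick check of where the 17 comes from, the sketch below measures the wanted string and feeds that width into the same substitution. The variable names are illustrative, and it assumes every timestamp has the same fixed-width layout as the sample.

```shell
# The string we want to keep, taken from the question's expected output.
wanted='10/May/2012:22:59'

# ${#var} gives the length of the string: this is the 17 in the pattern.
width=${#wanted}

line='96.99.193.124 - - [10/May/2012:22:59:29 +0000] 0 "GET / " 200 123 "-" "-"'

# Same substitution as above, with the width filled in from the shell.
# Double quotes let $width expand; \$ keeps the end-of-line anchor literal.
fixed=$(printf '%s\n' "$line" | sed -e "s;^.*\[\(.\{$width\}\).*\].*\$;\1;")

printf '%s %s\n' "$width" "$fixed"
```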

score 1 · Accepted Answer

Another way, with grep -oP:

grep -oP "\[\K[^\]\[ ]+(?=:\d\d )" FILE

If your grep doesn't have the -P switch, try pcregrep.

score 1 · Accepted Answer

Here is a pattern:

\[(\d+/\w+/\d+:\d+:\d+)

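The bracket serves as an anchor. The matchers here are quite general. For example, the month is captured with \w+, which matches any word made of letters, digits, or underscores. Still, all of the matchers combined in this order give a robust pattern for this kind of Apache line.

You apply this pattern to the whole line, so there is no need to capture the part inside the brackets first. Just capture the final data you want.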
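One possible way to apply this pattern to a whole file from the shell is sketched below with `sed -E`. Since `\d` and `\w` are Perl-style classes that `sed` does not portably support, the sketch swaps in `[0-9]` and `[[:alnum:]_]`. That translation, the temporary file, and the variable names are assumptions for illustration, not part of the answer.

```shell
# Write the question's sample lines to a scratch file.
log=$(mktemp)
cat > "$log" <<'EOF'
96.99.193.124 - - [10/May/2012:22:59:29 +0000] 0 "GET / " 200 123 "-" "-"
96.29.193.124 - - [10/May/2012:22:59:56 +0000] 0 "GET / " 200 123 "-" "-"
96.29.193.125 - - [10/May/2012:22:59:56 +0000] 0 "GET / " 200 123 "-" "-"
96.29.193.125 - - [10/May/2012:23:00:00 +0000] 0 "GET / " 200 123 "-" "-"
96.29.193.125 - - [10/May/2012:23:00:00 +0000] 0 "GET / " 200 123 "-" "-"
EOF

# Match the whole line, keep only the captured group.
# -n plus the trailing 'p' prints only lines where the pattern matched.
minutes=$(sed -E -n 's;.*\[([0-9]+/[[:alnum:]_]+/[0-9]+:[0-9]+:[0-9]+).*;\1;p' "$log")

printf '%s\n' "$minutes"
rm -f "$log"
```

Because the group stops after the second `:number`, the seconds and everything after them fall outside the capture.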

score 1 · Accepted Answer

sed 's/.*\[//;s/:.. .*//' infile > outfile

Delete everything up to and including the [, then delete from the `:..` that precedes the blank onward. Two commands.
