bash - Print the word between two characters by working backwards in a line

Question

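I'm having trouble extracting a word from a line. I want it to select the first word before the # symbol but after the /. Those are the only delimiters that stand out.

A line looks like this: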

,["https://picasaweb.google.com/111560558537332305125/Programming#5743548966953176786",1,["https://lh6.googleusercontent.com/-Is8rb8G1sb8/T7UvWtVOTtI/AAAAAAAAG68/Cht3FzfHXNc/s0-d/Geek.jpg",1920,1200]

I want the word Programming.

To get that line, I'm using this to narrow things down:

sed -n '/.*picasa.*.jpg/p' 5743548866439293105

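So I'd like it to find the # and then work backwards until it hits the first /, then print what lies in between. In this case the word should be Programming, but it could be anything.

I'd like it to be as short as possible, and have tried: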

sed -n '/.*picasa.*.jpg/p' 5743548866439293105 | awk '$0=$2' FS="/" RS="[$#]"

score 1 · Accepted Answer

You can do this with sed (the input is shortened here for formatting, but it works on your original string too):

pax> echo ',["https://p.g.com/111/Prog#574' | sed 's/^[^#]*\/\([^#]*\)#.*$/\1/'
Prog
pax>

In more detail:

    /---+------------------> greedy capture up to '/'.
   /    |
   |    | /------+---------> capture the stuff between '/' and '#'.
   |    |/       |
   |    ||       | /-+-----> everything from '#' to end of line.
   |    ||       |/  |
   |    ||       ||  |
's/^[^#]*\/\([^#]*\)#.*$/\1/'
                      ||
                      \+---> replace with captured group.

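It basically searches for a whole line that has the pattern you want (the first # following a /), while capturing (with the \( and \) brackets) just the stuff between the / and the #.

The substitution then replaces the entire line with the captured text you're interested in (via \1).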
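
As a quick check that the same substitution handles the full line from the question, here is a minimal sketch. The function name `extract_album` and the variable `line` are illustrative and not part of the original answer. Note that a line containing no # would pass through unchanged, since sed prints unmatched lines by default.

```shell
# Hypothetical helper: print the text between the first '#' and the '/' that precedes it.
extract_album() {
    printf '%s\n' "$1" | sed 's/^[^#]*\/\([^#]*\)#.*$/\1/'
}

# The sample line from the question, single-quoted so the shell leaves it untouched.
line=',["https://picasaweb.google.com/111560558537332305125/Programming#5743548966953176786",1,["https://lh6.googleusercontent.com/-Is8rb8G1sb8/T7UvWtVOTtI/AAAAAAAAG68/Cht3FzfHXNc/s0-d/Geek.jpg",1920,1200]'

extract_album "$line"
```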

score 1 · Accepted Answer

Use grep with some Perl regular expression extensions:

echo $string | grep -P -o "(?<=/)[^/]+(?=#)"

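-P tells grep to use Perl extensions. -o tells grep to print only the matched text. To understand what is matched, break the regular expression into three parts: (?<=/), [^/]+ and (?=#). The first part says the matched text must follow a '/', without including the '/' in the match. The second part matches a run of non-'/' characters. The last part says the matched text must be immediately followed by a '#', without including the '#' in the match.

Another grep, using the \K feature to "throw away" the match up to the last '/' before the '#':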

# Match as much as possible up to a '/', but throw it away, then match as much as you can
# up to the first #
echo $string | grep -oP ".*/\K.+(?=#)"

Use cut and awk to get the first field (splitting on #), then the last field (splitting on /):

echo $string | cut -d# -f1 | awk -F/ '{print $NF}'
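
The same two-step split can also be done in a single awk process. This is only a sketch of an alternative, not something from the original answer; the function name `last_segment_before_hash` and the shortened sample `string` are assumptions made for illustration.

```shell
# Hypothetical one-process variant: split the line on '#', then take the
# last '/'-separated piece of the first field.
last_segment_before_hash() {
    printf '%s\n' "$1" | awk -F'#' '{ n = split($1, parts, "/"); print parts[n] }'
}

string='https://picasaweb.google.com/111560558537332305125/Programming#5743548966953176786'
last_segment_before_hash "$string"
```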

Using some temporary variables and bash's parameter expansion:

$ FOO=["https://picasaweb.google.com/111560558537332305125/Programming#5743548966953176786",1,["https://lh6.googleusercontent.com/-Is8rb8G1sb8/T7UvWtVOTtI/AAAAAAAAG68/Cht3FzfHXNc/s0-d/Geek.jpg",1920,1200]
$ BAR=${FOO%#*}      # Strip the last # and everything after
$ echo $BAR
[https://picasaweb.google.com/111560558537332305125/Programming
$ BAZ=${BAR##*/}     # Strip everything up to and including the last /
$ echo $BAZ
Programming
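
The two expansions can be wrapped in a small function so no external process is needed. This is a sketch under one assumption that differs from the session above: it uses `%%` (strip from the first #) instead of `%` (strip from the last #), which matters only when a line contains more than one #. The name `word_before_hash` is hypothetical.

```shell
# Hypothetical helper built only on POSIX parameter expansion.
word_before_hash() {
    before_hash=${1%%#*}                  # drop the first '#' and everything after it
    printf '%s\n' "${before_hash##*/}"    # drop everything up to and including the last '/'
}

FOO='["https://picasaweb.google.com/111560558537332305125/Programming#5743548966953176786",1,["https://lh6.googleusercontent.com/-Is8rb8G1sb8/T7UvWtVOTtI/AAAAAAAAG68/Cht3FzfHXNc/s0-d/Geek.jpg",1920,1200]'
word_before_hash "$FOO"
```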

score 0 · Accepted Answer

This might work for you:

sed '/.*\/\([^#]*\)#.*/{s//\1/;q};d' file

answered 2012-05-31T06:12:04.643
