perl - extract a substring of 11 characters from a line using sed,awk or perl

Question

I have a file with many lines. Each line contains a substring in one of two forms, either

whatever_blablablalsfjlsdjf;asdfjlds;f/watch?v=yPrg-JN50sw&amp,whatever_blabla

or

whatever_blablabla"/watch?v=yPrg-JN50sw&amp" class=whatever_blablablavwhate

I want to extract a substring such as the "yPrg-JN50sw" above.

The substring to match is the 11 characters that follow the string "/watch?v=".

How can I extract it?

I would prefer a one-line sed or awk command; if that is not possible, a short Perl script is also fine.

score 4 · Accepted Answer

You can do

grep -oP '(?<=/watch\?v=).{11}'

if your grep supports Perl regular expressions, or

sed 's/.*\/watch?v=\(.\{11\}\).*/\1/g'

score 3 · Accepted Answer

$ cat file
/watch?v=yPrg-JN50sw&amp
"/watch?v=yPrg-JN50sw&amp" class=
$
$ awk 'match($0,/\/watch\?v=/) { print substr($0,RSTART+RLENGTH,11) }' file
yPrg-JN50sw
yPrg-JN50sw

score 2 · Accepted Answer

Using only the shell's parameter expansion, extract the 11 characters after "watch?v=":

while IFS= read -r line; do
    tmp=${line##*watch?v=}
    printf '%s\n' "${tmp:0:11}"   # quoted so whitespace and glob characters survive
done < filename
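The `${tmp:0:11}` substring form is a bash extension, and the loop prints the first 11 characters of any line that does not contain "watch?v=". A minimal sketch of a variant that runs in plain POSIX sh and skips such lines could look like this; `extract_ids` is a hypothetical function name, not something from the original answer.

```shell
# Hypothetical helper: reads lines from stdin and prints one ID per matching line.
# The body runs in a subshell ( ... ) so that line and tmp do not leak to the caller.
extract_ids() (
    while IFS= read -r line; do
        case $line in
            *'watch?v='*)
                # The quoted part of the pattern is literal, so ? is not a wildcard.
                tmp=${line##*'watch?v='}
                # %.11s truncates the string to its first 11 characters.
                printf '%.11s\n' "$tmp"
                ;;
        esac
    done
)

printf '%s\n' '"/watch?v=yPrg-JN50sw&amp" class=' 'no link here' | extract_ids
```

The `case` guard is the main design choice here: without it, a line lacking the marker is left untouched by `##` and its first 11 characters are printed as if they were an ID.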

score 1 · Accepted Answer

You can use sed to remove the irrelevant parts:

sed 's/[^=]\+=//; s/&.*$//' file

Or use awk with a suitable field separator:

awk -F '[=&]' '{print $2}' file

Contents of file:

cat <<EOF > file
/watch?v=yPrg-JN50sw&amp
"/watch?v=yPrg-JN50sw&amp" class=
EOF

Output:

yPrg-JN50sw
yPrg-JN50sw

Edit to accommodate the new requirements mentioned in the comments:

cat <<EOF > file
<div id="" yt-grid-box "><div class="yt-lockup-thumbnail"><a href="/watch?v=0_NfNAL3Ffc" class="ux-thumb-wrap yt-uix-sessionlink yt-uix-contextlink contains-addto result-item-thumb" data-sessionlink="ved=CAMQwBs%3D&amp;ei=CPTsy8bhqLMCFRR0fAodowXbww%3D%3D"><span class="video-thumb ux-thumb yt-thumb-default-185 "><span class="yt-thumb-clip"><span class="yt-thumb-clip-inner"><img src="//i1.ytimg.com/vi/0_NfNAL3Ffc/mqdefault.jpg" alt="Miniature" width="185" ><span class="vertical-align"></span></span></span></span><span class="video-time">5:15</span> 
EOF

Use awk with a suitable record separator:

awk -v RS='[=&"]' '/watch/ { getline; print }' file

Note that you should use a proper XML parser for this kind of task.

score 0 · Accepted Answer

grep --perl-regexp --only-matching --regexp="(?<=/watch\\?v=)([^&]{0,11})"

answered 2012-10-30T13:05:55.397

score 0 · Accepted Answer

Assuming your lines have the format you quoted, this should work.

awk '{print substr($0,10,11)}'

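Edit: from the comments on another answer, I gather that your lines are longer and more complex than this, in which case something more thorough is needed: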

# \w does not match "-", so the bracket expression lists the hyphen explicitly
gawk '{if(match($0, "/watch\\?v=([[:alnum:]_-]+)",a)) print a[1]}'
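For long lines that may hold several links, a stricter pattern can be built without Perl regex support in grep. The sketch below assumes the IDs consist only of letters, digits, "_" and "-", which holds for the samples in this thread but is an assumption rather than something the question states; `extract_ids_strict` is a hypothetical helper name.

```shell
# Hypothetical helper: reads stdin and prints every ID found, one per line.
extract_ids_strict() {
    # -o prints each match on its own line, so several links on one input line
    # are all reported; -E enables the {11} interval without backslashes.
    # cut then keeps the field after the first "=", which is the ID itself.
    grep -oE '/watch\?v=[A-Za-z0-9_-]{11}' | cut -d= -f2
}

printf '%s\n' '<a href="/watch?v=0_NfNAL3Ffc" class="x"><a href="/watch?v=yPrg-JN50sw&amp">' |
    extract_ids_strict
```

Because the character class excludes "&" and the double quote, a value shorter than 11 characters is skipped instead of being padded with whatever follows it.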
