unix - Get all instances of a pattern from a single line of text, edit them, and pipe the output to a line-delimited text file

Question

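I have a block of text (a single line) that is a list of URLs separated by tags and a bunch of other junk. I want to parse the whole block for URLs matching "http.*">RSS", edit every instance of that pattern (to remove everything after the glob), and write the result to a file as line-separated entries.

I thought I could do this with grep (and then use sed to edit and add newlines), but grep grabs the matching line, not the matching pattern. Should I be using a different command? I also tried using sed to add a newline (\n) before the pattern wherever it occurs, but that did not work either.

Edit: here is a sample of the data I'm working with: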

OUT</a> (<a href="https://evilcakes.wordpress.com/rss">RSS</a>)</li><li><a href="http://eater.com/" title="">Eater National</a> (<a href="http://feeds.feedburner.com/EaterNational">RSS</a>)</li><li><a href="http://www.foodtechconnect.com" title="">Food+Tech Connect</a> (<a href="http://feeds.feedburner.com/foodtechconnect">RSS</a>)</li><li><a href="http://www.innatthecrossroads.com" title="">Inn at the Crossroads</a> (<a href="http://innatthecrossroads.com/feed/">RSS</a>)</li><li><a href="http://www.seriouseats.com/" title="">Serious Eats</a> (<a href="http://feeds.seriouseats.com/seriouseatsfeaturesvideos">RSS</a>)</li><li><a href="http://www.thatsnerdalicious.com" title="">That's Nerdalicious!</a> (<a href="http://www.thatsnerdalicious.com/feed/">RSS</a>)</li><li><a href="http://thedrunkenmoogle.com/" title="">The Drunken Moogle</a> (<a href="http://www.thedrunkenmoogle.com/rss">RSS</a>)</li></ul></li><li><h2 class="entry-title">Comics</h2><ul class="opmlGroup"><li><a

score 3 · Accepted Answer

This might work for you (GNU sed):

sed '/https\?:[^"]*/!d;s//\n&\n/;s/^[^\n]*\n//;P;D' file
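As written, the command prints every URL on the line, including the site links that are not followed by `">RSS`. Below is a minimal sketch of a variant that keeps only the RSS links. It assumes GNU sed; the two-entry sample line and the `sample` temp file are illustrative stand-ins for the real data. The comments trace each step of the print-and-restart loop.

```shell
# Illustrative one-line sample (two site links, two RSS links).
sample=$(mktemp)
printf '%s\n' '<li><a href="http://eater.com/" title="">Eater National</a> (<a href="http://feeds.feedburner.com/EaterNational">RSS</a>)</li><li><a href="http://www.seriouseats.com/" title="">Serious Eats</a> (<a href="http://feeds.seriouseats.com/seriouseatsfeaturesvideos">RSS</a>)</li>' > "$sample"

# GNU sed only: \? and \n in these expressions are GNU extensions.
rss_urls=$(sed '
  # No RSS link left in the pattern space: delete it and stop.
  /https\?:[^"]*">RSS/!d
  # Reuse the same regex (empty //) and wrap the first match in newlines.
  s//\n&\n/
  # Drop everything before the match.
  s/^[^\n]*\n//
  # Trim the trailing ">RSS so only the URL is left on the first line.
  s/">RSS\n/\n/
  # Print the first line (the URL), delete it, and rerun on the remainder.
  P
  D
' "$sample")

printf '%s\n' "$rss_urls"
rm -f "$sample"
```

The `D` command restarts the script on whatever is left in the pattern space without reading new input, which is what lets a single input line yield many output lines.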

Answered 2012-11-24T07:15:29.300

score 3 · Accepted Answer

Here's one way using GNU grep:

grep -oP 'http[^"]*(?=">RSS)' file

Results:

https://evilcakes.wordpress.com/rss
http://feeds.feedburner.com/EaterNational
http://feeds.feedburner.com/foodtechconnect
http://innatthecrossroads.com/feed/
http://feeds.seriouseats.com/seriouseatsfeaturesvideos
http://www.thatsnerdalicious.com/feed/
http://www.thedrunkenmoogle.com/rss

Options:

-o, --only-matching
    Print only the matched (non-empty) parts of a matching line, with each such 
    part on a separate output line.
-P, --perl-regexp
    Interpret PATTERN as a Perl regular expression. This is highly experimental
    and grep -P may warn of unimplemented features.

Also, you may want to read up on lookaround assertions. HTH.

Edit:

Here's another way using awk:
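For readers new to lookaround, here is a small sketch that combines a lookbehind with the lookahead used above and writes the result to a file, as the question asks. It assumes a GNU grep built with PCRE support; the sample line, `page.html`, and `rss_urls.txt` are hypothetical names chosen for illustration.

```shell
# Illustrative one-line sample; the file names are placeholders.
workdir=$(mktemp -d)
printf '%s\n' '<li><a href="http://eater.com/" title="">Eater National</a> (<a href="http://feeds.feedburner.com/EaterNational">RSS</a>)</li><li><a href="http://thedrunkenmoogle.com/" title="">The Drunken Moogle</a> (<a href="http://www.thedrunkenmoogle.com/rss">RSS</a>)</li>' > "$workdir/page.html"

# (?<=href=")  lookbehind: the match must be preceded by href=" (not printed)
# (?=">RSS)    lookahead:  the match must be followed by ">RSS  (not printed)
grep -oP '(?<=href=")https?://[^"]+(?=">RSS)' "$workdir/page.html" > "$workdir/rss_urls.txt"

saved=$(cat "$workdir/rss_urls.txt")
printf '%s\n' "$saved"
rm -rf "$workdir"
```

Because lookarounds only assert context and consume no characters, `-o` prints the bare URL with nothing to trim afterwards.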

awk -F\" '{ for(i=1;i<NF;i++) if ($(i+1) ~ /RSS/) print $i }' file

Results:

https://evilcakes.wordpress.com/rss
http://feeds.feedburner.com/EaterNational
http://feeds.feedburner.com/foodtechconnect
http://innatthecrossroads.com/feed/
http://feeds.seriouseats.com/seriouseatsfeaturesvideos
http://www.thatsnerdalicious.com/feed/
http://www.thedrunkenmoogle.com/rss
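To see why the awk one-liner works, it can help to print how a double-quote field separator splits a single link. This is only an illustrative sketch using one entry from the sample data:

```shell
# Split one link on double quotes and number the resulting fields.
fields=$(printf '%s\n' '<a href="http://feeds.feedburner.com/EaterNational">RSS</a>' |
  awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }')

printf '%s\n' "$fields"
# 1: <a href=
# 2: http://feeds.feedburner.com/EaterNational
# 3: >RSS</a>
```

The URL lands in its own field and the text right after the closing quote starts the next one, so testing field `i+1` for `RSS` and printing field `i` picks out exactly the feed links.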

score 1 · Accepted Answer

I put your sample data in urls.dat.

cat urls.dat | awk '{n=split($0,a,"\""); for (i=1;i<=n;i++) if ( match( a[i], "^http" ) ) print a[i]; }'
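Note that this prints every quoted string starting with `http`, so the site links come out alongside the feed links. A possible refinement, sketched here under the assumption that only links followed by `>RSS` are wanted, also checks the next piece of the split and lets awk read the file directly instead of going through `cat`. The sample line and temp file are illustrative stand-ins for urls.dat.

```shell
# Illustrative stand-in for urls.dat.
sample=$(mktemp)
printf '%s\n' '<li><a href="http://www.innatthecrossroads.com" title="">Inn at the Crossroads</a> (<a href="http://innatthecrossroads.com/feed/">RSS</a>)</li>' > "$sample"

rss_urls=$(awk '{
  # Split the line on double quotes; a[] holds the pieces, n is their count.
  n = split($0, a, "\"")
  for (i = 1; i < n; i++)
    # Keep a piece only if it is a URL and the next piece begins with >RSS.
    if (a[i] ~ /^https?:/ && a[i + 1] ~ /^>RSS/)
      print a[i]
}' "$sample")

printf '%s\n' "$rss_urls"
rm -f "$sample"
```

Appending `> some_file.txt` to the awk command would write the list to a line-delimited file.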

score 1 · Accepted Answer

Here's a way that works with both GNU and BSD grep:

<infile grep -Eo 'https?://[^"]+">RSS' | grep -Eo '^[^"]+'

Output:

https://evilcakes.wordpress.com/rss
http://feeds.feedburner.com/EaterNational
http://feeds.feedburner.com/foodtechconnect
http://innatthecrossroads.com/feed/
http://feeds.seriouseats.com/seriouseatsfeaturesvideos
http://www.thatsnerdalicious.com/feed/
http://www.thedrunkenmoogle.com/rss
