
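I am trying to use sed to delete the beginning of multiple lines. The goal is to delete every character on each line up to the first word that contains two consecutive uppercase letters.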

The input will always look like this:

1 where did you get ACQUIRE, obtain, come by, receive, gain, earn, win, come into, take 
2 I got your letter: RECEIVE, be sent, be in receipt of, be given.
3 your tea is getting cold: BECOME, grow, turn, go.
4 get the children from school: FETCH, collect, go for, call for, pick up, bring, deliver, convey, ferry, transport.
5 the chairman gets £650,000 a year: EARN, be paid, take home, bring in, make, receive, collect, gross; informal pocket, bank, rake in, net, bag.
6 have the police got their man?: APPREHEND, catch.

I would like the output to be:

ACQUIRE, obtain, come by, receive, gain, earn, win, come into, take 
RECEIVE, be sent, be in receipt of, be given.
BECOME, grow, turn, go.
FETCH, collect, go for, call for, pick up, bring, deliver, convey, ferry, transport.
EARN, be paid, take home, bring in, make, receive, collect, gross; informal pocket, bank, rake in, net, bag.
APPREHEND, catch.

So far I have come up with this:

sed -n 's/^.*[A-Z]\{2\}//p'

But this expression also deletes the uppercase word. Any clues on how to do this?


2 Answers


This should work with awk, but it gives the wrong output on line 5:

awk '{print substr($0,match($0,/[[:upper:]][[:upper:]]/))}' file
ACQUIRE, obtain, come by, receive, gain, earn, win, come into, take
RECEIVE, be sent, be in receipt of, be given.
BECOME, grow, turn, go.
FETCH, collect, go for, call for, pick up, bring, deliver, convey, ferry, transport.
5 the chairman gets
APPREHEND, catch.

match finds the position of the first two consecutive uppercase letters, and substr then uses that position to print the rest of the line.
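
The cause of the odd result on line 5 is not confirmed here. If it comes from how the awk implementation handles the multibyte £ character (an assumption), one thing to try is forcing the C locale so that match and substr both count bytes. The sketch below does that and also skips lines where match returns 0, mirroring the `-n ... p` behaviour of the sed attempt in the question. The function name `strip_prefix_awk` is hypothetical, and `[A-Z]` is used instead of `[[:upper:]]` because in the C locale it covers exactly the ASCII uppercase letters.

```shell
# Hypothetical helper: reads the files given as arguments, or stdin.
strip_prefix_awk() {
  # LC_ALL=C is an assumption: it makes awk treat the input as bytes.
  LC_ALL=C awk '{
    pos = match($0, /[A-Z][A-Z]/)      # 0 when the line has no match
    if (pos > 0) print substr($0, pos) # print from the match to end of line
  }' "$@"
}
```

It can be called as `strip_prefix_awk file`, or at the end of a pipeline.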

answered 2013-10-08T09:28:47.393

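The problem with sed is that it has neither lookahead nor a non-greedy option. One way to work around this is to do two substitutions. The first captures the text you want as group 1 and puts it back preceded by a newline; the second deletes everything up to and including that newline. Note that `\n` in the replacement is understood as a newline by GNU sed; other sed implementations may treat it differently. Like this: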

sed 's/\([A-Z]\{2,\}.*\)/\n\1/; s/[^\n]*\n//' infile

It produces:

ACQUIRE, obtain, come by, receive, gain, earn, win, come into, take 
RECEIVE, be sent, be in receipt of, be given.
BECOME, grow, turn, go.
FETCH, collect, go for, call for, pick up, bring, deliver, convey, ferry, transport.
EARN, be paid, take home, bring in, make, receive, collect, gross; informal pocket, bank, rake in, net, bag.
APPREHEND, catch.
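
If the command has to run on a sed that does not expand `\n` in the replacement, the same two-step idea can be sketched with a literal newline (backslash followed by a line break) in the replacement and `&` for the matched text. This is a variant under that assumption, not a replacement for the command above; the function name `strip_prefix_sed` is hypothetical. Because the pattern space contains at most one newline after the first substitution, a plain `.*\n` is enough for the second one.

```shell
# Hypothetical helper: reads the files given as arguments, or stdin.
strip_prefix_sed() {
  # 1st expression: put a newline in front of the first run of 2+ capitals.
  # 2nd expression: delete everything up to and including that newline.
  sed -e 's/[A-Z]\{2,\}.*/\
&/' -e 's/.*\n//' "$@"
}
```

Called as `strip_prefix_sed infile`, it leaves lines without two consecutive capitals unchanged, as the original command does.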
answered 2013-10-08T09:07:00.237