
I have a text file in which every record starts with a 9-digit college code and ends with a 5-digit course code.

512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,
Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017

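As the three sample records above show, some entries contain a line break. I need to merge lines 3 and 4 into a single line, like lines 1 and 2, so that I can easily process the file with commands such as grep and awk.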

Update:

Kevin's answer doesn't seem to work.

cat todel.txt
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531

cat todel.txt | perl -ne 'chomp; if (/^\d{9}/) { print "\n$_" } else { print "$_\n" }' 
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531ege of,

8 Answers

1

Assuming your data is in "file.txt", here is a one-liner that scans the file and joins the split lines back together:

cat file.txt | perl -ne 'chomp; if (/^\d{9}/) { print "\n$_" } else { print "$_\n" }'

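This assumes that every valid record starts with a 9-digit number. The "chomp" first strips the newline, and the pattern then decides where newlines should appear in the output.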

answered 2012-06-25T03:12:20.660
1

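Regarding the split lines: this sed script assumes that there is at least one space after the leading number (on the first line of a split record) and a space before the trailing number (on the last line of a split record), and that each split record is broken in only one place.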

Modified to accept input with either Windows CRLF or *nix LF newlines. Note, however, that the output always uses *nix \n.

sed -nr 's/\r?$// # allow for \r\n newlines
         /^([0-9]{9}) .* ([0-9]{5})$/{p;b}
         /^([0-9]{9}) /{h;b}
         / ([0-9]{5})$/{x;G; s/\n//; p}' 

Or, shorter but perhaps less readable:

sed -nr 's/\r?$//; /^([0-9]{9}) /{/ ([0-9]{5})$/{p;b};h;b};/ ([0-9]{5})$/{x;G; s/\n//; p}' 

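I would expect the first one to be faster, because the most frequent case (a complete line) needs only one regex test there, whereas the second (shorter) script needs two regex tests for that same case.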

This is the output I get, using GNU sed 4.2.1:

512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
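
For readers less familiar with sed's hold space, the same three cases (complete record, first half, second half) can be sketched in awk. This is only an illustrative equivalent under the same assumptions as the script above (one break per record, codes separated from the rest by spaces); it is not part of the original answer, and the names join_split and is_num are hypothetical. It tests field lengths rather than using {9} interval expressions, which some older awk implementations do not support.

```shell
# join_split is a hypothetical helper name; it reads records from stdin.
join_split() {
  awk '
    # True when s consists of exactly n decimal digits.
    function is_num(s, n) { return s ~ /^[0-9]+$/ && length(s) == n }
    {
      sub(/\r$/, "")                 # accept CRLF as well as LF input
      starts = is_num($1, 9)         # line begins with the 9-digit code
      ends   = is_num($NF, 5)        # line ends with the 5-digit code
      if (starts && ends) print      # complete record: print unchanged
      else if (starts)    held = $0  # first half: hold it, like sed h
      else if (ends)      { print held $0; held = "" }  # second half: join
    }
  '
}
```

As with the sed script, the two halves are concatenated without a separator; printing held " " $0 instead would insert a space at the join.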
answered 2012-06-25T03:39:48.850
1

This might work for you:

sed ':a;$!N;/ [0-9]\{5\}\n[0-9]\{9\} /!s/\n//;ta;P;D' file

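Explanation:

  • Append the next line to the pattern space. Unless the pattern space then contains a line ending in a space and five digits, immediately followed by a line starting with nine digits and a space, delete the newline (joining the two lines) and repeat. Otherwise, print the first line and carry on with the remainder.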
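
The same script can be written one command per line with comments, which may make the loop easier to follow. This is a sketch assuming GNU sed and LF line endings (a trailing carriage return would stop the pattern from matching); the function name join_loop is hypothetical.

```shell
# join_loop is a hypothetical helper name; it reads the file on stdin.
join_loop() {
  sed '
    # top of the join loop
    :a
    # append the next input line, unless this is the last one
    $!N
    # no "5 digits, newline, 9 digits" record boundary: delete the newline
    / [0-9]\{5\}\n[0-9]\{9\} /!s/\n//
    # if a newline was deleted, go back and pull in another line
    ta
    # boundary found (or end of input): print up to the first newline
    P
    # drop what was printed and restart with the remainder
    D
  '
}
```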

Edit:

Test data:

cat <<\! >/tmp/codel.txt
> 112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,
> Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
> !
sed ':a;$!N;/\s[0-9]\{5\}\n[0-9]\{9\}\s/!s/\n//;ta;P;D' /tmp/codel.txt 
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
sed ':a;$!N;/\s[0-9]\{5\}\n[0-9]\{9\}\s/!s/\n//;ta;P;D' /tmp/{codel.txt,codel.txt,codel.txt} 
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
answered 2012-06-25T03:54:32.717
0

Maybe try removing every newline that directly follows a comma, like this:

perl -i -pe 's/,\n/,/g' file.txt

Or perhaps you want to allow for whitespace after the comma:

perl -i -pe 's/(,\s*)\n/$1/g' file.txt
answered 2012-06-25T03:01:49.780
0

Try this:

sed '/^[0-9]\{9\}/{h;};/^[0-9]\{9\}/!{x;G;s/\n//g;}' test | grep -E '[0-9]{5}$'
answered 2012-06-25T04:26:56.643
0
awk '! ($1 ~ /^[[:digit:]]/) {$0 = save " " $0} $1 ~ /^[[:digit:]]/ {save = $0} $NF ~ /[[:digit:]]$/ {print}' inputfile
answered 2012-06-25T04:55:22.693
0
cat todel.txt |awk 'BEGIN {i=0} {first[i]=$1; lines[i++] = $0;} END {for (x=0; x<i; x++) { if ( x==(i - 1) || (first[x + 1] ~ /^[0-9]+$/ && length(first[x + 1])==9) ) {printf("%s: %s\n", x, lines[x]);} else {printf("%s: %s%s\n", x, lines[x], lines[x + 1]); x++;} } }'
answered 2012-06-25T05:18:45.843
0

On the assumption that valid records end with a five-digit number, this works for the included data set:

use Modern::Perl;

my $data = do{local $/; <DATA>};
$data =~ s/([^\d]{5})\n/$1 /sg;
say $data;


__DATA__
512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering,
Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of,
Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531

Output:

512161000 EN5121 K. K. Jorge Institute of Engineering Education and Research, Nashik 61220 Mechanical Engineering [Second Shift] XOPENH 1 116 16978
517261123 EN5172 R. C. Rustom Institute of Technology, Shirpur 61220 Mechanical Engineering [Second Shift] YOPENH 1 100 29555
617561234 EN6175 abc xyz Education Trust, abc xyz College of Engineering, Pune 61220 Mechanical Engineering [Second Shift] ZOPENH 2 105 25017
112724510 EN1127 Jagadambha Bahuuddeshiya Gramin Vikas Sanstha's Jagdambha College of, Engineering and Technology, Yavatmal 24510 Computer Engineering LSCO 1 55 93531
answered 2012-06-25T05:37:47.887