6

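I want to replace the string TaskID_1 with a sequence value starting at 1001. TaskID_1 can appear on any number of lines in my input file. Similarly, I need every occurrence of TaskID_2 in the input file replaced with the next sequence value, 1002.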

Input file:

12345|45345|TaskID_1|dksj|kdjfdsjf|12
1245|425345|TaskID_1|dksj|kdjfdsjf|12
1234|25345|TaskID_2|dksj|kdjfdsjf|12
123425|65345|TaskID_2|dksj|kdjfdsjf|12
123425|15325|TaskID_1|dksj|kdjfdsjf|12
11345|55315|TaskID_2|dksj|kdjfdsjf|12
6345|15345|TaskID_3|dksj|kdjfdsjf|12
72345|25345|TaskID_4|dksj|kdjfdsjf|12
9345|411345|TaskID_3|dksj|kdjfdsjf|12

The output file should look like this:

12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
4

5 Answers

9

Here's one way using awk:

awk 'BEGIN { FS=OFS="|" } { $3=1000 + NR }1' file

Or, less verbosely:

awk -F '|' '{ $3=1000 + NR }1' OFS='|' file

Results:

12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1002|dksj|kdjfdsjf|12
1234|25345|1003|dksj|kdjfdsjf|12
123425|65345|1004|dksj|kdjfdsjf|12
123425|15325|1005|dksj|kdjfdsjf|12
11345|55315|1006|dksj|kdjfdsjf|12
6345|15345|1007|dksj|kdjfdsjf|12
72345|25345|1008|dksj|kdjfdsjf|12
9345|411345|1009|dksj|kdjfdsjf|12

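In the first example, the field separator (FS) and the output field separator (OFS) are both set to a single pipe character. This is done in the BEGIN block, so it runs only once rather than on every line of input. We then set the third column to 1000 plus an incrementing variable. We could use ++i for this variable, but we can use NR (short for number of records, i.e. the line number) instead, which avoids creating an extra variable. The trailing 1 is an always-true pattern that triggers the default action, which is to print the line. A more verbose solution would look like this: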

awk 'BEGIN { FS=OFS="|" } { $3=1000 + NR; print }' file
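The ++i variant mentioned above can be sketched as follows. This is only an illustration: the function name number_third_field and the two sample rows are made up for the example. Like the NR version, it numbers every line, so repeated IDs do not share a value.

```shell
#!/bin/sh
# Hypothetical helper: number_third_field is an illustrative name, not from the answer.
number_third_field() {
    # ++i is 1 on the first record, so it matches NR when every record is counted.
    awk 'BEGIN { FS = OFS = "|" } { $3 = 1000 + ++i } 1'
}

# Two made-up sample rows; prints a|b|1001|c and d|e|1002|f.
printf '%s\n' 'a|b|TaskID_1|c' 'd|e|TaskID_2|f' | number_third_field
```

An explicit counter is mainly useful when only some records should be numbered, since NR advances on every record regardless.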

EDIT:

With the updated data file, try:

awk 'BEGIN { FS=OFS="|" } { sub(/.*_/,"",$3); $3+=1000 }1' file

Results:

12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
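The command above derives the new value from the numeric suffix of the ID. If "next sequence value" is instead taken to mean the order in which each distinct ID first appears (an assumption, since the question does not say), a lookup table is one possible approach. A minimal sketch, with map_ids_by_first_seen and the sample rows being hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: map_ids_by_first_seen is an illustrative name.
map_ids_by_first_seen() {
    awk 'BEGIN { FS = OFS = "|" }
    {
        # Assign the next sequence value the first time an ID is seen.
        if (!($3 in seq)) seq[$3] = 1000 + ++n
        # Every later occurrence of the same ID reuses the stored value.
        $3 = seq[$3]
    } 1'
}

# Made-up rows: TaskID_7 appears first, so it becomes 1001 on both of its lines.
printf '%s\n' 'a|b|TaskID_7|c' 'd|e|TaskID_2|f' 'g|h|TaskID_7|i' | map_ids_by_first_seen
```

On the sample file from the question both approaches give the same output, because the IDs first appear in the order TaskID_1, TaskID_2, TaskID_3, TaskID_4.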
answered 2012-12-19T12:26:06.693
4

A Perl solution using steve's add-1000 logic:

perl -pne 's/TaskID_(\d+)/$1+1000/e;' file

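This replaces "TaskID_n" with 1000+n. The 'e' modifier makes Perl evaluate the replacement as an expression rather than treat it as a literal string.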

answered 2012-12-19T14:03:40.600
2

I can't think of a better solution than the awk one steve suggested.

So here is a worse solution, using only bash.

#!/bin/bash

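# Setting IFS only for read leaves it unchanged for the rest of the script;
# -r keeps any backslashes in the data literal.
while IFS='|' read -r f1 f2 f3 f4 f5 f6; do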
    printf '%s|%s|%d|%s|%s|%s\n' "$f1" "$f2" "$((${f3#*_}+1000))" "$f4" "$f5" "$f6"
done < input

It is "worse" only because it will be much slower than awk, which handles this kind of problem quickly and efficiently.

answered 2012-12-19T12:36:07.463
2

Replace TaskID_ with 100. For single-digit IDs this is very easy with sed:

$ sed 's/TaskID_/100/' file
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12

To store the change back to the file, use the -i option:

sed -i 's/TaskID_/100/' file

Note: this works for TaskID_[0-9]. If you want TaskID_23 to map to 1023, it won't; it will map TaskID_23 to 10023.
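Since sed only substitutes text and cannot do arithmetic, multi-digit IDs need a tool that can add. A minimal sketch using awk instead, assuming the ID is always the third pipe-separated field; the name add_1000_to_task_id and the sample rows are hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: add_1000_to_task_id is an illustrative name.
add_1000_to_task_id() {
    awk 'BEGIN { FS = OFS = "|" }
    # Only touch rows whose third field looks like TaskID_<digits>.
    $3 ~ /^TaskID_[0-9]+$/ { sub(/^TaskID_/, "", $3); $3 += 1000 }
    1'
}

# Made-up rows: TaskID_3 becomes 1003 and TaskID_23 becomes 1023.
printf '%s\n' 'a|b|TaskID_3|c' 'd|e|TaskID_23|f' | add_1000_to_task_id
```

Rows whose third field does not match the pattern are printed unchanged.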

answered 2012-12-19T14:07:39.300
0
perl -F"\|" -lane '$F[2]=~s/.*_/100/g;print join("|",@F)' your_file

Tested as follows:

> cat temp
12345|45345|TaskID_1|dksj|kdjfdsjf|12
1245|425345|TaskID_1|dksj|kdjfdsjf|12
1234|25345|TaskID_2|dksj|kdjfdsjf|12
123425|65345|TaskID_2|dksj|kdjfdsjf|12
123425|15325|TaskID_1|dksj|kdjfdsjf|12
11345|55315|TaskID_2|dksj|kdjfdsjf|12
6345|15345|TaskID_3|dksj|kdjfdsjf|12
72345|25345|TaskID_4|dksj|kdjfdsjf|12
9345|411345|TaskID_3|dksj|kdjfdsjf|12
> perl -F"\|" -lane '$F[2]=~s/.*_/100/g;print join("|",@F)' temp
12345|45345|1001|dksj|kdjfdsjf|12
1245|425345|1001|dksj|kdjfdsjf|12
1234|25345|1002|dksj|kdjfdsjf|12
123425|65345|1002|dksj|kdjfdsjf|12
123425|15325|1001|dksj|kdjfdsjf|12
11345|55315|1002|dksj|kdjfdsjf|12
6345|15345|1003|dksj|kdjfdsjf|12
72345|25345|1004|dksj|kdjfdsjf|12
9345|411345|1003|dksj|kdjfdsjf|12
> 
answered 2012-12-20T06:46:38.640