
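I'm new to awk and have been struggling to get this to work. I'm trying to take the list of files in "image.list" and create an "info" file from it. I need to pull the string that matches a regex (8-11 digits long) out of the middle of each file name, and then print that match at a specific position in my "info" file. That last part is what I can't get working. I'd appreciate some help figuring it out.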

Here is my test file list:

SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg

Here is my current code:

awk 'BEGIN {print "-----TEST TAG FILE\tENCOUNTERS-----";}
{print "FILE:  /tmp/imagetest/"$1,"\t","ENCOUNTER: ",($1~/^[0-9]{8,11}$/);}
END{print "END REPORT";
}' image.list > upload.tag

Here is my current output:

-----TEST TAG FILE      ENCOUNTERS-----
FILE:  /tmp/imagetest/SURGERY0001275678image1.jpg        ENCOUNTER:  0
FILE:  /tmp/imagetest/SURGERY11134900211image2.jpg       ENCOUNTER:  0
FILE:  /tmp/imagetest/SURGERY19257012image3.jpg          ENCOUNTER:  0
FILE:  /tmp/imagetest/SURGERY273142590image4.jpg         ENCOUNTER:  0
END REPORT

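What I need is for the 8-11 digit number from the middle of the file name to appear after "ENCOUNTER:". Everything I've tried so far prints either the whole file name or "0".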

I may be way off track, so I'd love some help from you experts!


7 Answers


Reusing your existing code:

$ awk '
BEGIN {
    print "-----TEST TAG FILE\tENCOUNTERS-----";
}
match($0,/[^0-9]+([0-9]+)[^0-9]+/,ary) {
    print "FILE:  /tmp/imagetest/"$1,"\t","ENCOUNTER:"ary[1]
}
END { 
    print "END REPORT";
}' testfile

Test:

$ cat testfile
SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg

$ awk '
> BEGIN {
>     print "-----TEST TAG FILE\tENCOUNTERS-----";
> }
> match($0,/[^0-9]+([0-9]+)[^0-9]+/,ary) {
>     print "FILE:  /tmp/imagetest/"$1,"\t","ENCOUNTER:"ary[1]
> }
> END { 
>     print "END REPORT";
> }' testfile
-----TEST TAG FILE      ENCOUNTERS-----
FILE:  /tmp/imagetest/SURGERY0001275678image1.jpg        ENCOUNTER:0001275678
FILE:  /tmp/imagetest/SURGERY11134900211image2.jpg       ENCOUNTER:11134900211
FILE:  /tmp/imagetest/SURGERY19257012image3.jpg          ENCOUNTER:19257012
FILE:  /tmp/imagetest/SURGERY273142590image4.jpg         ENCOUNTER:273142590
END REPORT

As Ed Morton pointed out in the comments, this solution is GNU awk only, because it passes an array as the third argument to match().
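If only a POSIX awk is available, a sketch of the same report can use the two-argument match(), which sets RSTART and RLENGTH instead of filling an array. It recreates the question's image.list for the demo and uses /[0-9]+/ (the first run of digits) in place of the capture group; for these file names that selects the same digits as the regex above.

```shell
# Recreate the question's sample input (same four names).
printf '%s\n' SURGERY0001275678image1.jpg SURGERY11134900211image2.jpg \
    SURGERY19257012image3.jpg SURGERY273142590image4.jpg > image.list

awk '
BEGIN { print "-----TEST TAG FILE\tENCOUNTERS-----" }
# Two-argument match() is POSIX: on success it sets RSTART and RLENGTH.
match($0, /[0-9]+/) {
    print "FILE:  /tmp/imagetest/"$1, "\t", "ENCOUNTER:" substr($0, RSTART, RLENGTH)
}
END { print "END REPORT" }
' image.list > upload.tag
```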

answered 2013-06-14T20:34:38.037

GNU sed

sed -r -e 's#(.*)#FILE:\t/tmp/imagetest/\1#;s/([0-9]*)(i[^i]*)$/\1\2\tENCOUNTER:\1/;1i -----TEST TAG FILE      ENCOUNTERS-----' -e '$aEND REPORT' file
-----TEST TAG FILE      ENCOUNTERS-----
FILE:   /tmp/imagetest/SURGERY0001275678image1.jpg      ENCOUNTER:0001275678
FILE:   /tmp/imagetest/SURGERY11134900211image2.jpg     ENCOUNTER:11134900211
FILE:   /tmp/imagetest/SURGERY19257012image3.jpg        ENCOUNTER:19257012
FILE:   /tmp/imagetest/SURGERY273142590image4.jpg       ENCOUNTER:273142590
END REPORT
answered 2013-06-14T21:16:57.433

Here is the commonly used awk function extract(), which extracts the string that matches an RE:

awk -v re='<whatever>' '
function extract(str,regexp)
{ RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
  return RSTART
}
extract($0,re) { print RMATCH }
'

Just set "re" to whatever you want to match, for example:

$ cat file
SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg

$ awk -v re='[[:digit:]]{8,11}' '
function extract(str,regexp)
{ RMATCH = (match(str,regexp) ? substr(str,RSTART,RLENGTH) : "")
  return RSTART
}
extract($0,re) { print RMATCH }
' file
0001275678
11134900211
19257012
273142590

Or, if you prefer a more specific solution using the same match()+substr() approach:

$ awk '
BEGIN{ print "-----TEST TAG FILE\tENCOUNTERS-----" }
{ printf "FILE:  %s\tENCOUNTER: %s\n", $0, (match($0,/[[:digit:]]{8,11}/) ? substr($0,RSTART,RLENGTH) : 0) }
END{ print "END REPORT" }
' file
-----TEST TAG FILE      ENCOUNTERS-----
FILE:  SURGERY0001275678image1.jpg      ENCOUNTER: 0001275678
FILE:  SURGERY11134900211image2.jpg     ENCOUNTER: 11134900211
FILE:  SURGERY19257012image3.jpg        ENCOUNTER: 19257012
FILE:  SURGERY273142590image4.jpg       ENCOUNTER: 273142590
END REPORT

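Note that if all your file names follow the same pattern and there are no other digits before the 8-11 digit run you care about, you could just use [[:digit:]]+ as the RE to match, instead of explicitly specifying the range with [[:digit:]]{8,11} as you wanted.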

answered 2013-06-15T11:27:59.883

Try this:

$ cat input
SURGERY0001275678image1.jpg
SURGERY11134900211image2.jpg
SURGERY19257012image3.jpg
SURGERY273142590image4.jpg

$ awk '{split($1,a,/[[:alpha:]]+/);print a[2]}' input
0001275678
11134900211
19257012
273142590
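To see why a[2] is the element to print, here is a small sketch that splits one sample name on runs of letters (written as [[:alpha:]]+ here) and prints every element. The leading "SURGERY" produces an empty a[1], and the trailing "jpg" produces an empty last element.

```shell
# Split one name on runs of letters and show every resulting piece.
echo 'SURGERY0001275678image1.jpg' | awk '{
    n = split($1, a, /[[:alpha:]]+/)
    for (i = 1; i <= n; i++) printf "a[%d] = \"%s\"\n", i, a[i]
}'
```

For this name it prints four elements: "", "0001275678", "1." and "".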
answered 2013-06-14T20:18:34.697
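awk '{encounter=$1    # sub() cannot use \1 backreferences, so trim around the digits instead
      sub(/^[^0-9]*/, "", encounter)    # drop the leading non-digits ("SURGERY")
      sub(/[^0-9].*$/, "", encounter)   # drop everything from the next non-digit on ("image1.jpg")
      print "FILE:  /tmp/imagetest/"$1,"\t","ENCOUNTER: ",encounter;}' image.list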
answered 2013-06-14T20:29:52.363

Try the following (note that gensub() is GNU awk only):

awk 'BEGIN {print "-----TEST TAG FILE\tENCOUNTERS-----";}
{print "FILE:  /tmp/imagetest/"$1,"\t","ENCOUNTER: ",gensub(/[^0-9]*([0-9]*).*/, "\\1", 1, $1);}
END{print "END REPORT";
}' image.list > upload.tag
answered 2013-06-14T20:27:26.250

This:

awk 'BEGIN {print "-----TEST TAG FILE\tENCOUNTERS-----";}
{printf "FILE:  /tmp/imagetest/%s\tENCOUNTER: ", $1; enc = "";
 if ($1 ~ /[0-9]{8,11}/) {enc = $1; sub(/[0-9]+\.jpg$/, "", enc); gsub(/[a-zA-Z]/, "", enc)}
 print enc}
END{print "END REPORT";
}' image.list

will print:

-----TEST TAG FILE      ENCOUNTERS-----
FILE:  /tmp/imagetest/SURGERY0001275678image1.jpg        ENCOUNTER: 0001275678
FILE:  /tmp/imagetest/SURGERY11134900211image2.jpg       ENCOUNTER: 11134900211
FILE:  /tmp/imagetest/SURGERY19257012image3.jpg          ENCOUNTER: 19257012
FILE:  /tmp/imagetest/SURGERY273142590image4.jpg         ENCOUNTER: 273142590
END REPORT
answered 2013-06-14T20:41:35.750