
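I've done my research but can't find a way to solve this. I'm trying to extract all valid words (words that start with a letter) from a string and join them with underscores ("_"). I'm looking for a solution using awk, sed, grep, or similar.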

Something like:

echo "The string under consideration" | (awk/grep/sed) (pattern match)

Example 1

Input:

1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11

Desired output:

L2_Traffic_house_seen_during_ABCD_from

Example 2

Input:

XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi

Desired output:

XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi

Example 3

Input:

ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE

Desired output:

ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE

4 Answers


This might work for you (GNU sed):

sed 's/[[:punct:]]/ /g;s/\<[[:alpha:]]/\n&/g;s/[^\n]*\n//;s/ [^\n]*//g;y/\n/_/' file
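To make the steps easier to follow, here is the same GNU sed program split into one command per -e and wrapped in a shell function. The function name words_sed is an illustrative assumption, not part of the answer; the five commands themselves are unchanged.

```shell
# words_sed is a hypothetical wrapper name; it needs GNU sed
# (\< and \n in the replacement and in y/// are GNU extensions).
#  1. s/[[:punct:]]/ /g      turn every punctuation char, "_" included, into a space
#  2. s/\<[[:alpha:]]/\n&/g  put a newline before each word that starts with a letter
#  3. s/[^\n]*\n//           drop everything before the first such word
#  4. s/ [^\n]*//g           in each chunk, delete from its first space up to the next newline
#  5. y/\n/_/                turn the remaining newlines into underscores
words_sed() {
  sed -e 's/[[:punct:]]/ /g' \
      -e 's/\<[[:alpha:]]/\n&/g' \
      -e 's/[^\n]*\n//' \
      -e 's/ [^\n]*//g' \
      -e 'y/\n/_/' "$@"
}

printf '%s\n' 'XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi' | words_sed
# prints XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
```

Because step 1 also turns "_" into a space, an identifier such as VRECYY_FAIL is split apart and then rejoined by step 5. The output looks the same for these examples, but something like foo_1x comes out as just foo here, whereas the awk and Perl answers keep foo_1x whole.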
answered 2012-07-19T10:13:46.030

One way using awk. Contents of script.awk:

BEGIN {
    FS="[^[:alnum:]_]"
}

{
    for (i=1; i<=NF; i++) {
        if ($i !~ /^[0-9]/ && $i != "") {
            if (i < NF) {
                printf "%s_", $i
            }
            else {
                print $i
            }
        }
    }
}

Run it like this:

awk -f script.awk file.txt

Or, as a one-liner:

awk -F "[^[:alnum:]_]" '{ for (i=1; i<=NF; i++) { if ($i !~ /^[0-9]/ && $i != "") { if (i < NF) printf "%s_", $i; else print $i; } } }' file.txt

Results:

L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
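One caveat with the script above: it only prints a newline when the last field is itself a word, so a line that ends in a number or punctuation comes out with a trailing underscore and runs into the next line. For example, the made-up line service stopped (PID 7582) would print service_stopped_PID_ with no newline. A variant that collects the words first and prints once per line avoids that. The name words_awk is hypothetical, and this sketch keeps only fields that start with a letter, as the question asks (the original also keeps fields starting with "_").

```shell
# words_awk is a hypothetical wrapper name; it should work in any awk that
# supports POSIX character classes.
words_awk() {
  awk -F '[^[:alnum:]_]' '{
    out = ""                               # words collected for this line
    for (i = 1; i <= NF; i++)
      if ($i ~ /^[[:alpha:]]/)             # keep fields that start with a letter
        out = (out == "" ? $i : out "_" $i)
    print out                              # one newline per input line, no trailing "_"
  }' "$@"
}

printf '%s\n' 'service stopped (PID 7582)' | words_awk
# prints service_stopped_PID
```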
answered 2012-07-19T11:22:13.927

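A Perl one-liner. It searches for an alphabetic character followed by any number of word characters, enclosed in word boundaries, and uses the /g flag to try multiple matches on each line.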

Contents of infile:

1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11
XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi
ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE

Perl command:

perl -ne 'printf qq|%s\n|, join qq|_|, (m/\b([[:alpha:]]\w*)\b/g)' infile

Output:

L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE
answered 2012-07-19T10:25:44.493

This solution needs some tweaking. I think it requires gawk, which accepts a regular expression as the record separator (RS): http://www.gnu.org/software/gawk/manual/html_node/Records.html#Records
gawk -v ORS='_' -v RS='[-: \"()]' '/^[a-zA-Z]/' file.dat

answered 2012-07-19T10:07:56.960