0

I have the data shown below, which I want to format so that I can insert it into a MySQL table.


  fdsfsfrerwrwrwr_Core             55000:10011608      ipv4        Vl1162
  xcvbdtykgjfhffghrfhfhffghCore    55000:10014478      ipv4        Vl1447
  675436346ore                     55000:1004868       ipv4
  545_Core                         55000:1004128       ipv4
  345235345                        55000:2728          ipv4
  534534e                          55000:1002108       ipv4        Vl2105
  C8567566756Core                  55000:10021038      ipv4        Vl2103
  C546346453554_Core               55000:2105898       ipv4        Vl664
                                                                   Vl896
  ewttt_WAN_Core                   55000:1007967       ipv4        Vl2552
  tetrewCore                       55000:1001708       ipv4        Vl905
  gsdgdsfgore                      55000:100106        ipv4
  65434533_Core                    55000:1009418       ipv4        Vl941
                                                                   Vl1028
  I2222222-11                      55000:10008         ipv4
  I666r555-12                      55000:20002         ipv4        Vl749
                                                                   Vl874
                                                                   Vl894
                                                                   Vl942
                                                                   Vl1172
                                                                   Vl1439
                                                                   Vl2553
  345345353Core                    55000:1004068       ipv4        Vl50
  5345345Core                      55000:1004498       ipv4        Vl617
  S534534                          55000:1002798       ipv4        Vl779
  534535335Core                    55000:1004278       ipv4
  test                             55000:500500        ipv4

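I want to use AWK or any other tool to produce a result like the following, so that I can insert this data into a MySQL database. I used the following AWK command:

cat sample.data | awk '{print $1","$2","$3","$4}'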


TEST1_Core,990010:10011608,ipv4,Vl1162
AA2_Autism_Core,990010:10014478,ipv4,Vl1447
6753312_Core,990010:1004868,ipv4,
542343423,990010:1004128,ipv4,
Bgdfdfgdf,990010:2728,ipv4,
gfgCore,990010:1002108,ipv4,Vl2105
fgdfgfgdfg_Core,990010:10021038,ipv4,Vl2103
42342342342342Core,990010:2105898,ipv4,Vl664

Note that the last column can span multiple lines that all belong to a single record.


  ghfghfhdfCore                    990010:1009418       ipv4        Vl941
                                                                   Vl1028
  hfghfghdf11                      990010:10008         ipv4
  yreyryer-12                      990010:20002         ipv4        Vl749
                                                                   Vl874
                                                                   Vl894
                                                                   Vl942
                                                                   Vl1172
                                                                   Vl1439
                                                                   Vl2553

How can I fold the multi-line last column into a single line so that I get the following result?


ghfghfhdfCore,990010:1009418,ipv4,Vl941 Vl1028
hfghfghdf11,990010:10008,ipv4,
yreyryer-12,990010:20002,ipv4,Vl749 Vl874 Vl894 Vl942 Vl1172 Vl1439 Vl2553

I am sure someone will be able to help me with this.

4 Answers

1

Don't print the end-of-line newline until you have seen how many fields the following line has:

awk -v OFS="," '{$1=$1;printf "%s%s",(NF>1?n:" "),$0;n=ORS}END{print ""}' file

For example:

$ cat file
  ghfghfhdfCore                    990010:1009418       ipv4        Vl941
                                                                   Vl1028
  hfghfghdf11                      990010:10008         ipv4
  yreyryer-12                      990010:20002         ipv4        Vl749
                                                                   Vl874
                                                                   Vl894
                                                                   Vl942
                                                                   Vl1172
                                                                   Vl1439
                                                                   Vl2553

$ awk -v OFS="," '{$1=$1;printf "%s%s",(NF>1?n:" "),$0;n=ORS}END{print ""}' file
ghfghfhdfCore,990010:1009418,ipv4,Vl941 Vl1028
hfghfghdf11,990010:10008,ipv4
yreyryer-12,990010:20002,ipv4,Vl749 Vl874 Vl894 Vl942 Vl1172 Vl1439 Vl2553

I just noticed that some lines can have only 3 fields, in which case a trailing comma needs to be added:

$ awk -v OFS="," 'NF==3{sub(/$/,OFS)} {$1=$1;printf "%s%s",(NF>1?n:" "),$0;n=ORS} END{print ""}' file
ghfghfhdfCore,990010:1009418,ipv4,Vl941 Vl1028
hfghfghdf11,990010:10008,ipv4,
yreyryer-12,990010:20002,ipv4,Vl749 Vl874 Vl894 Vl942 Vl1172 Vl1439 Vl2553
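
The one-liner is dense, so here is a commented, expanded sketch of the same idea. The wrapper name `fold_records`, the variable name `sep` (standing in for `n`), and the rule that skips blank lines are additions for illustration and are not part of the command above.

```shell
# fold_records: join continuation lines (a single field) onto the record above.
fold_records() {
    awk -v OFS=',' '
        NF == 0 { next }               # skip blank lines (an addition, not in the one-liner)
        NF == 3 { sub(/$/, OFS) }      # no 4th column: append a trailing comma
        {
            $1 = $1                    # rebuild $0 so the fields are joined by OFS
            # A full record starts a new output line (sep is empty before the first one);
            # a lone continuation value is appended after a space instead.
            printf "%s%s", (NF > 1 ? sep : " "), $0
            sep = ORS
        }
        END { print "" }               # terminate the last output line
    ' "$@"
}
```

Running `fold_records file` on the sample file should produce the same three lines as the output above.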

If your fields are actually tab-separated, or there is whitespace after the third field, there is an alternative solution.
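
That alternative is not shown in this answer. As a rough sketch of how tab-separated input could be handled, assuming a continuation line has empty leading fields and carries its value in the last field (the function name `fold_tsv` and the variable names `head` and `list` are hypothetical):

```shell
# fold_tsv: fold continuation lines in tab-separated input.
# Assumption: a record has a non-empty first field; a continuation line has an
# empty first field and its value in the last field.
fold_tsv() {
    awk -F'\t' -v OFS=',' '
        # Print the buffered record, if there is one.
        function emit() { if (head != "") print head, list }
        $1 != ""  { emit(); head = $1 OFS $2 OFS $3; list = $4; next }
        $NF != "" { list = (list == "" ? $NF : list " " $NF) }
        END       { emit() }
    ' "$@"
}
```

Because the record is buffered until the next record begins, a 3-field record followed by continuation lines also comes out without a stray space after the comma.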

answered 2013-02-25T13:51:50.493
0

Try doing this:

awk '
    # fmt is the INSERT template; a plain variable is used rather than the built-in OFMT
    BEGIN{fmt="INSERT INTO `table` \
    VALUES(\n\t`%s`,\n\t`%s`,\n\t`%s`,\n\t`%s`\n);\n"
    }
    NF==3{printf fmt, $1, $2, $3, $4v}
    NF==4{if (length(v)){printf fmt, $1, $2, $3, $4v;v=""}
        else{printf fmt, $1, $2, $3, $4}
    }
    NF==1{v=v","$1}
' file | mysql

Output:

    INSERT INTO `table`     VALUES(
        `TEST1_Core`,
        `990010:10011608`,
        `ipv4`,
        `Vl1162`
);
INSERT INTO `table`     VALUES(
        `AA2_Autism_Core`,
        `990010:10014478`,
        `ipv4`,
        `Vl1447`
);
INSERT INTO `table`     VALUES(
        `6753312_Core`,
        `990010:1004868`,
        `ipv4`,
        ``
);
INSERT INTO `table`     VALUES(
        `542343423`,
        `990010:1004128`,
        `ipv4`,
        ``
);
INSERT INTO `table`     VALUES(
        `Bgdfdfgdf`,
        `990010:2728`,
        `ipv4`,
        ``
);
INSERT INTO `table`     VALUES(
        `gfgCore`,
        `990010:1002108`,
        `ipv4`,
        `Vl2105`
);
INSERT INTO `table`     VALUES(
        `fgdfgfgdfg_Core`,
        `990010:10021038`,
        `ipv4`,
        `Vl2103`
);
INSERT INTO `table`     VALUES(
        `42342342342342Core`,
        `990010:2105898`,
        `ipv4`,
        `Vl664`
);
INSERT INTO `table`     VALUES(
        `24234234N_Core`,
        `990010:1007967`,
        `ipv4`,
        `Vl2552,Vl896`
);
INSERT INTO `table`     VALUES(
        `C86765Core`,
        `990010:1001708`,
        `ipv4`,
        `Vl905`
);
INSERT INTO `table`     VALUES(
        `Dhyhyh_Core`,
        `990010:100106`,
        `ipv4`,
        ``
);
INSERT INTO `table`     VALUES(
        `ghfghfhdfCore`,
        `990010:1009418`,
        `ipv4`,
        `Vl941`
);
INSERT INTO `table`     VALUES(
        `hfghfghdf11`,
        `990010:10008`,
        `ipv4`,
        `,Vl1028`
);
INSERT INTO `table`     VALUES(
        `yreyryer-12`,
        `990010:20002`,
        `ipv4`,
        `Vl749,Vl1028`
);
INSERT INTO `table`     VALUES(
        `42342342_Core`,
        `990010:1004068`,
        `ipv4`,
        `Vl50,Vl874,Vl894,Vl942,Vl1172,Vl1439,Vl2553`
);
INSERT INTO `table`     VALUES(
        `gdfgdg_Core`,
        `990010:1004498`,
        `ipv4`,
        `Vl617`
);
INSERT INTO `table`     VALUES(
        `Spgdfggdf`,
        `990010:1002798`,
        `ipv4`,
        `Vl779`
);
INSERT INTO `table`     VALUES(
        `gdgdgdgdgCore`,
        `990010:1004278`,
        `ipv4`,
        ``
);
INSERT INTO `table`     VALUES(
        `test`,
        `990010:500500`,
        `ipv4`,
        ``
);
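
In the output above, each continuation value lands on a later record (for example `,Vl1028` is attached to `hfghfghdf11` rather than `ghfghfhdfCore`), because a record is printed before its continuation lines have been read. Backticks also quote identifiers in MySQL, not string values. Below is a minimal sketch that buffers each record until the next one starts and quotes values with single quotes. The function name `make_inserts`, the table name `my_table`, and the variable names are placeholders.

```shell
# make_inserts: emit one INSERT statement per record, folding continuation
# values into the 4th column as a comma-separated list.
make_inserts() {
    awk -v q="'" '
        # Wrap a value in single quotes, doubling any embedded quote.
        function sq(s) { gsub(q, q q, s); return q s q }
        # Print the buffered record, if there is one.
        function emit() {
            if (c1 != "")
                printf "INSERT INTO my_table VALUES (%s, %s, %s, %s);\n",
                       sq(c1), sq(c2), sq(c3), sq(c4)
        }
        NF >= 3 { emit(); c1 = $1; c2 = $2; c3 = $3; c4 = $4; next }
        NF == 1 { c4 = (c4 == "" ? $1 : c4 "," $1) }
        END     { emit() }
    ' "$@"
}
```

It is probably worth inspecting the result of `make_inserts file` before piping it to `mysql` as the command above does.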
answered 2013-02-25T13:34:06.720
0

I suggest you follow the general rule "use programs that do only one task, but do it well". So let's start by making your data consistent:

sed -ne '1h;1!H;${;g;s/\n\s\+\(\S\+\)/,\1/g;p;}'

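This is the well-known sed one-liner for folding multiple lines into one.

Now only your imagination limits the ways in which you can manipulate the data.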
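
One caveat about the sed command above: its pattern joins every line that begins with whitespace, so it relies on record lines starting in the first column. If the records are indented as well, as they appear to be in the question's sample, a variant that only joins deeply indented lines may be needed. The sketch below assumes GNU sed and an arbitrary threshold of 10 whitespace characters; the function name `fold_deep_indent` is hypothetical.

```shell
# fold_deep_indent: collect the whole input in the hold space, then join every
# line indented by at least 10 whitespace characters onto the previous line.
# The threshold of 10 is an assumption about the layout, not taken from the data.
fold_deep_indent() {
    sed -n '
        1h
        1!H
        ${
            g
            s/\n[[:space:]]\{10,\}\([^[:space:]]\{1,\}\)/,\1/g
            p
        }
    ' "$@"
}
```

Record lines keep their original spacing, so a further step is still needed to turn the remaining columns into comma-separated fields.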

answered 2013-02-25T13:56:13.360
0

Another awk approach, in a more "awkish" style:

awk 'NF>1{if(line!="")print line;line=$1 "," $2 "," $3 "," $4} NF==1{line=line " " $1} END{if(line!="")print line}' file
answered 2013-02-26T12:38:34.783