
I have multiple lines such as:

"390";"902";"from 4670000 to 4679999, from 4680000 to 4689999, from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"

What I need is:

"390";"902";"from 4670000 to 4679999";"something1";"something2";"20.09.04"
"390";"902";"from 4680000 to 4689999";"something1";"something2";"20.09.04"
"390";"902";"from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999";"something3";"something4";"09.09.04"
"390";"903";"from 9170000 to 9179999";"something3";"something4";"09.09.04"

As you can see, I need the third field (variable3) split into its individual from/to ranges (note that the "," is sometimes followed by a space).

Ideally, I need the resulting output to be:

"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"

I have found that I can do the split with awk, but I am not sure how to replicate the rest of the line:

awk -F\, '{                       
  for (i = 0; ++i <= NF;)
    print i, $i
  }' <<<'from 4670000 to 4679999, from 4680000 to 4689999, from 9960000 to 9969999'
1 from 4670000 to 4679999
2  from 4680000 to 4689999
3  from 9960000 to 9969999

Sorry, this is my first question here, so feel free to point out how I should correct it to get a complete answer.

Thanks!

7 Answers

For the following input:

"390";"902";"from 4670000 to 4679999, from 4680000 to 4689999, from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"

this code

#!/usr/bin/awk -f

BEGIN {
    FS = ";"
}

{
    t = $3
    gsub(/"/, "", t)
    n = split(t, a, /, /)
    for (i = 1; i <= n; ++i) {
        print $1 ";" $2 ";\"" a[i] "\";" $4 ";" $5 ";" $6
    }
}

gives

"390";"902";"from 4670000 to 4679999";"something1";"something2";"20.09.04"
"390";"902";"from 4680000 to 4689999";"something1";"something2";"20.09.04"
"390";"902";"from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999";"something3";"something4";"09.09.04"
"390";"903";"from 9170000 to 9179999";"something3";"something4";"09.09.04"

Condensed form (I don't think it can really be called a true "one-liner"):

awk -F ";" -- '{ t = $3; gsub(/"/, "", t); n = split(t, a, /, /); for (i = 1; i <= n; ++i) print $1 ";" $2 ";\"" a[i] "\";" $4 ";" $5 ";" $6 }'

And this code

#!/usr/bin/awk -f

BEGIN {
    FS = ";"
}

{
    t = $3
    gsub(/"|from /, "", t)
    n = split(t, a, /, | to /)
    for (i = 1; i <= n; i += 2) {
        print $1 ";" $2 ";\"" a[i] "\";\"" a[i + 1] "\";"$4 ";" $5 ";" $6
    }
}

gives

"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"

Condensed form:

awk -F ";" -- '{ t = $3; gsub(/"|from /, "", t); n = split(t, a, /, | to /); for (i = 1; i <= n; i += 2) print $1 ";" $2 ";\"" a[i] "\";\"" a[i + 1] "\";"$4 ";" $5 ";" $6; }'

The scripts were tested with gawk, nawk, and mawk.
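
For comparison, the approach of the second script can be sketched in Python roughly as follows. This is only an illustration and not part of the tested awk code: the function name `expand_ranges` is made up here, and the split pattern uses `, ?` so that a comma without a following space is also accepted, which is slightly more lenient than the awk regex `/, | to /`.

```python
import re


def expand_ranges(line):
    """Yield one output line per 'from A to B' range found in the third field."""
    fields = line.rstrip("\n").split(";")
    # Same idea as gsub(/"|from /, "", t): drop the quotes and every "from ".
    ranges = re.sub(r'"|from ', "", fields[2])
    # Same idea as split(t, a, ...): the result alternates start, end, start, end.
    values = re.split(r", ?| to ", ranges)
    for start, end in zip(values[0::2], values[1::2]):
        yield ";".join(fields[:2] + [f'"{start}"', f'"{end}"'] + fields[3:])


sample = ('"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";'
          '"something3";"something4";"09.09.04"')
for out in expand_ranges(sample):
    print(out)
```

Pairing the values with `zip(values[0::2], values[1::2])` plays the same role as stepping the awk loop with `i += 2`.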

Answered 2013-08-25T12:34:43.083

awk one-liner (note that \s in the regex is a GNU awk extension):

awk -F'";"' -v OFS='";"' '{n=split($3,a,/,\s*/);for(i=1;i<=n;i++){$3=a[i];print}}' file

Output:

kent$  cat f
"390";"902";"from 4670000 to 4679999, from 4680000 to 4689999, from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";"something3";"something4";"09.09.04"

kent$  awk -F'";"' -v OFS='";"' '{n=split($3,a,/,\s*/);for(i=1;i<=n;i++){$3=a[i];print}}' f
"390";"902";"from 4670000 to 4679999";"something1";"something2";"20.09.04"
"390";"902";"from 4680000 to 4689999";"something1";"something2";"20.09.04"
"390";"902";"from 9960000 to 9969999";"something1";"something2";"20.09.04"
"390";"903";"from 0770000 to 0779999";"something3";"something4";"09.09.04"
"390";"903";"from 9170000 to 9179999";"something3";"something4";"09.09.04"

Edit

If you also want the from...to ranges parsed, it is still an awk one-liner:

awk -F'";"' -v OFS='";"' '{n=split($3,a,/,\s*/);for(i=1;i<=n;i++)
{$3=a[i];sub(/\s*to\s*/,"\";\"",$3);sub(/\s*from\s*/,"",$3);print}}' file

Tested with the same input file:

kent$  awk -F'";"' -v OFS='";"' '{n=split($3,a,/,\s*/);for(i=1;i<=n;i++){$3=a[i];sub(/\s*to\s*/,"\";\"",$3);sub(/\s*from\s*/,"",$3);print}}' f                              
"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"
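
The key to the commands above is that `";"` is used as both the input and the output separator, so the quotes between fields never have to be handled explicitly: only the first field keeps its leading quote and the last field its trailing one, and assigning to `$3` rebuilds the record with the same separator. A rough Python sketch of that idea, assuming well-formed input (the names `SEP` and `split_record` are hypothetical and exist only for this illustration):

```python
import re

SEP = '";"'  # plays the role of both FS and OFS in the awk command


def split_record(line):
    """Return one rebuilt record per range in the third field."""
    # After splitting on '";"', fields[0] still starts with a quote and
    # fields[-1] still ends with one, so joining with SEP restores the record.
    fields = line.rstrip("\n").split(SEP)
    records = []
    for part in re.split(r",\s*", fields[2]):
        # Mirrors sub(/\s*to\s*/, "\";\"", $3): the first " to " becomes '";"'.
        part = re.sub(r"\s*to\s*", SEP, part, count=1)
        # Mirrors sub(/\s*from\s*/, "", $3): drop the leading "from ".
        part = re.sub(r"\s*from\s*", "", part, count=1)
        records.append(SEP.join(fields[:2] + [part] + fields[3:]))
    return records


sample = ('"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";'
          '"something3";"something4";"09.09.04"')
print("\n".join(split_record(sample)))
```

Because the replacement text for " to " is the separator itself, one range turns into two properly quoted fields without any further quoting logic.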

Answered 2013-08-25T12:31:02.673

$ cat tst.awk
BEGIN{ FS=OFS="\";\"" }
{
    gsub(/from /,"",$3)
    split($3,a,/ *, */)
    for (i=1;i in a;i++) {
        $3 = a[i]
        sub(/ to /,OFS,$3)
        print
    }
}
$
$ awk -f tst.awk file
"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"

Answered 2013-08-25T14:35:54.863

This might work for you (GNU sed); it produces the first form of the output:

sed -r 's/, /","/g;s/^(([^;]*;){2})([^,]*),([^;]*)(.*)/\1\3\5\n\1\4\5/;P;D' file
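
The command works as a loop: the second substitution peels the first range off the third field and puts the remainder on a second line of the pattern space, `P` prints up to that newline, and `D` deletes what was printed and restarts the cycle on the remainder without reading new input. A minimal Python sketch of that loop, assuming (as the sed command does) that every comma is followed by a space; the names `PATTERN` and `peel` are made up for this illustration:

```python
import re

# Same shape as the sed pattern: two leading fields, the first range,
# the remaining ranges, and the rest of the line.
PATTERN = re.compile(r'^((?:[^;]*;){2})([^,]*),([^;]*)(.*)$')


def peel(line):
    """Return the lines the sed command would print for one input line."""
    line = line.rstrip("\n").replace(", ", '","')  # s/, /","/g
    printed = []
    while True:
        m = PATTERN.match(line)
        if m is None:
            # No comma left in the third field: print the line as it is.
            printed.append(line)
            return printed
        head, first, rest, tail = m.groups()
        printed.append(head + first + tail)  # the part that P prints
        line = head + rest + tail            # what D leaves for the next cycle


sample = ('"390";"903";"from 0770000 to 0779999, from 9170000 to 9179999";'
          '"something3";"something4";"09.09.04"')
print("\n".join(peel(sample)))
```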

Answered 2013-08-25T20:41:36.560

#!/bin/bash

filename='file.txt'
temp=$(mktemp)

sed 's/, */";"/g' "$filename" > "$temp" # turn each comma (and any spaces after it) into ";"

: > "$filename" # truncate the input file; the results are written back to it
while IFS= read -r line; do
    IFS=';' read -r -a fields <<< "$line" # split the line on ; into an array

    for ((i=2; i<${#fields[@]}-3; i++)); do
        from=$(echo "${fields[$i]}" | cut -d ' ' -f2)
        to=$(echo "${fields[$i]}" | cut -d ' ' -f4) # still carries the closing quote
        echo "${fields[0]};${fields[1]};\"$from\";\"$to;${fields[-3]};${fields[-2]};${fields[-1]}" >> "$filename"
    done
done < "$temp"

rm "$temp"

exit 0

It also handles optional spaces after the commas.

Answered 2013-08-25T11:37:15.300

Here is another way to do it in Bash:

#!/bin/bash

shopt -s extglob

IFS=';'

while read -a FIELDS; do
    TEMP=${FIELDS[2]//\"}
    read -a RANGES <<< "${TEMP//,?( )/;}"
    for A in "${RANGES[@]}"; do
        echo "${FIELDS[0]};${FIELDS[1]};\"$A\";${FIELDS[*]:3}"
    done
done

Run it with

bash script.sh < file

This gives the first expected output.

Or

#!/bin/bash

shopt -s extglob

IFS=';'

while read -a FIELDS; do
    TEMP=${FIELDS[2]//@(\"|from )}
    read -a RANGES <<< "${TEMP//@(,?( )| to )/;}"
    for (( I = 0; I < ${#RANGES[@]}; I += 2 )); do
        echo "${FIELDS[0]};${FIELDS[1]};\"${RANGES[I]}\";\"${RANGES[I + 1]}\";${FIELDS[*]:3}"
    done
done

This gives the second expected output.

Answered 2013-08-25T14:22:21.110

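Here is one way using Python. I know you didn't tag it, but I find it easier to process CSV files with a proper parser. It splits the third field (row[2]) on commas, then splits each resulting string on whitespace and keeps the odd-indexed words (v.split()[1::2]), which are the two numbers.

Contents of script.py: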
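
As a small illustration of the slicing step, assuming every range has the form "from A to B" (the helper name `bounds` is made up for this example):

```python
def bounds(chunk):
    """Return [start, end] from a string such as 'from 4670000 to 4679999'."""
    words = chunk.split()  # ['from', '4670000', 'to', '4679999']
    return words[1::2]     # indices 1 and 3: the two numbers


field = "from 4670000 to 4679999, from 4680000 to 4689999"
for chunk in field.split(", "):
    print(bounds(chunk))
```

Since `split()` without arguments ignores leading and trailing whitespace, a stray space at the start of a chunk does not shift the indices.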

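#!/usr/bin/env python3

import csv
import re
import sys

with open(sys.argv[1], newline='') as f:
    csvfile = csv.reader(f, delimiter=';')
    # lineterminator='\n' avoids the csv module's default '\r\n' line endings
    csvout = csv.writer(sys.stdout, delimiter=';', quoting=csv.QUOTE_ALL,
                        lineterminator='\n')
    for row in csvfile:
        # split the third field on commas, with or without a following space
        for v in re.split(r',\s*', row[2]):
            newrow = list(row)
            # 'from A to B'.split()[1::2] gives ['A', 'B']
            newrow[2:3] = v.split()[1::2]
            csvout.writerow(newrow)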

Run it like this:

python3 script.py infile

This produces:

"390";"902";"4670000";"4679999";"something1";"something2";"20.09.04"
"390";"902";"4680000";"4689999";"something1";"something2";"20.09.04"
"390";"902";"9960000";"9969999";"something1";"something2";"20.09.04"
"390";"903";"0770000";"0779999";"something3";"something4";"09.09.04"
"390";"903";"9170000";"9179999";"something3";"something4";"09.09.04"

Answered 2013-08-25T11:48:08.327