bash - Bash Parse CSV - Formatting datasets

Question

I have a csv in the following format:

Dataset1,

…

…

Dataset2,

..

..

Dataset3,

All datasets are separated by a blank line. I would like my bash script to change the format of the file to:

Dataset1           Dataset2           Dataset3

...                   …                     …

…                     …                     …

…                     …                     …

Here is my script:

#!/bin/bash
input="/path/to/csv/file/file.cvs"
while IFS=',' read -r f1 f2 f3; do
  if [ -z "$f1 $f2 $f3" ]; then
    awk 'BEGIN{getline to_add < "$f1 $f2 $f3"}{print $0,to_add}' f
  fi
  echo "$f1 $f2 $f3"
done < "$input"

score 0 · Accepted Answer

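Here is a compact and readable, if hacky, bash + awk + pr solution. Its runtime is poor, but it works for an arbitrary number of sets. It uses awk in paragraph mode to retrieve a specific dataset, and pr with process substitution to display the datasets side by side.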

$ cat csv
Dataset1,
ds1foo1

Dataset2,
ds2foo1,ds2bar1
ds2foo2,ds2bar2

Dataset3,
ds3foo1,ds3bar1,ds3quux1
ds3foo2,ds3bar2,ds3quux2
ds3foo3,ds3bar3,ds3quux3

Dataset4,
ds3foo1,ds3bar1,ds3quux1,ds3quuux1
ds3foo2,ds3bar2,ds3quux2,ds3quuux2
ds3foo3,ds3bar3,ds3quux3,ds3quuux3
ds3foo4,ds3bar4,ds3quux4,ds3quuux4

$ ./columnize_paragraphs.sh csv
Dataset1,                Dataset2,                Dataset3,                Dataset4,
ds1foo1                  ds2foo1,ds2bar1          ds3foo1,ds3bar1,ds3quux1 ds3foo1,ds3bar1,ds3quux1
                         ds2foo2,ds2bar2          ds3foo2,ds3bar2,ds3quux2 ds3foo2,ds3bar2,ds3quux2
                                                  ds3foo3,ds3bar3,ds3quux3 ds3foo3,ds3bar3,ds3quux3
                                                                           ds3foo4,ds3bar4,ds3quux4

And the code:

#!/bin/bash

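# Print the number of blank-line-separated paragraphs in file $1.
get_paragraph_count()
{
    awk 'BEGIN{RS=""}END{print NR}' "$1"
}

# Print paragraph number $2 of file $1 (RS="" enables awk's paragraph mode).
get_record()
{
    awk -v record="$2" 'BEGIN{RS=""}NR==record' "$1"
}

# Show all paragraphs of file $1 side by side.
columnize_paragraphs()
{
    local file="$1"
    local paragraphs="$(get_paragraph_count "${file}")"
    local args=

    # Build one process substitution per paragraph; eval is needed so that
    # the <(...) strings are expanded when pr runs.
    for i in $(seq 1 ${paragraphs}); do
        args="${args} <(get_record '${file}' '${i}')"
    done
    # -m merges the inputs side by side, -t omits headers, -w sets the page width.
    [ -n "${args}" ] && eval "pr -w100 -mt ${args}"
}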

for file; do
    [ -e "${file}" ] || continue
    columnize_paragraphs "${file}"
done

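Depending on what your file looks like, you will have to play with the -w argument to pr, or, if you do not care about line wraps, inject a fold while constructing args.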
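The fold idea could look roughly like the sketch below. The helper name get_record_folded and its third (width) parameter are hypothetical and not part of the answer's script; the awk call is the same paragraph-mode lookup used by get_record, with its output piped through fold.

```shell
#!/bin/bash

# Hypothetical helper (not in the original answer): print paragraph number $2
# of file $1, wrapping every line longer than $3 columns with fold.
get_record_folded()
{
    awk -v record="$2" 'BEGIN{RS=""}NR==record' "$1" | fold -w "$3"
}
```

Inside columnize_paragraphs, the loop would then build `<(get_record_folded '${file}' '${i}' 24)` instead of calling get_record. The width of 24 is only an assumption; it has to be tuned together with the -w value given to pr so that wrapped lines fit into one column.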

score 0 · Accepted Answer

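The following pure shell script (no embedded awk/perl) works, but it has a limitation: it only handles input where every set has the same number of records. To handle differing numbers, you would need to keep a count of the records in each set and embed empty records (,,,,,,,) as needed.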

set -u

ROW=0
SET=1
MAXROW=0
while read -r LINE
do
    if [ -z "$LINE" ]
    then
        # New data set
        if [ $ROW -gt $MAXROW ]
        then
            MAXROW=$ROW
        fi
        ROW=0
        SET=$(($SET+1))
    elif [ $SET -eq 1 ]
    then
        DATA[$ROW]="$LINE"
        ROW=$(($ROW+1))
    else
        DATA[$ROW]="${DATA[$ROW]},$LINE"
        ROW=$(($ROW+1))
    fi
done

if [ $ROW -gt $MAXROW ]
then
    MAXROW=$ROW
fi

ROW=0
while [ $ROW -lt $MAXROW ]
do
    echo "${DATA[$ROW]}"
    ROW=$(($ROW+1))
done
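
One way the padding described above might be done is sketched below. This is not the answer's code: the function name merge_sets and all variable names are made up for illustration, it needs bash 4 or later for associative arrays, and it assumes separator lines are completely empty and that fields never contain quoted commas. It remembers each set's row count and widest line, then pads short or missing rows with empty fields.

```shell
#!/bin/bash

# Hypothetical sketch: read blank-line-separated data sets on stdin and print
# them side by side as CSV, padding shorter sets with empty fields.
merge_sets()
{
    local -A cell=()            # cell[set,row] = original line
    local -a count=() width=()  # rows per set, max fields per set
    local ds=0 row=0 maxrow=0 line commas fields

    while IFS= read -r line || [ -n "$line" ]; do
        if [ -z "$line" ]; then
            # A blank line closes the current set, if it has any rows.
            if [ "$row" -gt 0 ]; then
                ds=$((ds + 1))
                row=0
            fi
            continue
        fi
        cell[$ds,$row]=$line
        commas=${line//[^,]/}          # keep only the commas
        fields=$(( ${#commas} + 1 ))   # field count = commas + 1
        [ "$fields" -gt "${width[ds]:-0}" ] && width[ds]=$fields
        row=$((row + 1))
        count[ds]=$row
        [ "$row" -gt "$maxrow" ] && maxrow=$row
    done

    local nsets=${#count[@]} r s out pad
    for ((r = 0; r < maxrow; r++)); do
        out=
        for ((s = 0; s < nsets; s++)); do
            line=${cell[$s,$r]-}       # empty if this set has no such row
            commas=${line//[^,]/}
            # Append the commas needed to reach this set's field count.
            printf -v pad '%*s' $(( width[s] - 1 - ${#commas} )) ''
            line+=${pad// /,}
            if [ "$s" -gt 0 ]; then out+=,; fi
            out+=$line
        done
        printf '%s\n' "$out"
    done
}

# Example:
#   printf 'A,\n1\n\nB,\n2,3\n4,5\n' | merge_sets
# prints:
#   A,,B,
#   1,,2,3
#   ,,4,5
```

Note that a header such as "A," counts as two fields because of its trailing comma, which is why the single-value rows of that set are padded with one extra empty field.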
