bash - Using IFS to split a string like "Hello","World","this","is, a fucking","line"

Question

I'm trying to parse a .csv file, but I'm running into problems with IFS. The file contains lines like this:

"Hello","World","this","is, a boring","line"

The columns are separated by commas, so I tried to split the line with the following code:

IFS=, read -r -a tempArr <<< "$line"

But I get this output:

"Hello"
"World"
"this"
"is
a boring"
"line"

I understand why, so I tried a few other IFS values, but none of them gave the expected output:

IFS=\",\"
IFS=\",
IFS=',\"'
IFS=,\"

Every time, the fourth element ("is, a boring") gets split into two parts. How can I use IFS to split the string into these 5 parts?

"Hello"
"World" 
"this" 
"is, a boring" 
"line"
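The reason none of the IFS values above help is that IFS is a set of single characters, not a multi-character separator: every value tried contains both , and " (IFS=',\"' also contains a literal backslash, because backslashes are not special inside single quotes). A minimal demonstration, using the question's own line:

```shell
#!/usr/bin/env bash
line='"Hello","World","this","is, a boring","line"'

# IFS='","' is the character set { " , } : split on a quote OR on a comma.
IFS='","' read -r -a tempArr <<< "$line"

# Print each element in brackets so the empty fields created by the quotes are visible.
printf '[%s]\n' "${tempArr[@]}"
```

The comma inside "is, a boring" is still treated as a separator, so "is" and " a boring" come out as separate elements, interleaved with empty fields produced by the quote characters.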

score 0 · Accepted Answer

Give this a try:

sed 's/","/"\n"/g' <<<"${line}"

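sed has a search-and-replace command, s, which uses a regular expression as the search pattern.

The expression replaces the comma in every "," with a newline character (GNU sed interprets \n in the replacement as a newline; other implementations, such as BSD/macOS sed, may not).

As a result, each element ends up on its own line.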
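To get the fields into a bash array rather than just printing them, the sed output can be read back line by line. A minimal sketch, assuming GNU sed (for \n in the replacement) and bash 4 or later (for mapfile):

```shell
#!/usr/bin/env bash
# Assumes GNU sed and bash >= 4 (mapfile).
line='"Hello","World","this","is, a boring","line"'

# Turn every "," boundary into "<newline>", then read one field per line into an array.
mapfile -t fields < <(sed 's/","/"\n"/g' <<<"$line")

printf '%s\n' "${fields[@]}"
```

Because the split happens on the three-character sequence "," rather than on a bare comma, a comma inside a quoted field survives; a field that itself contains "," would still be split, and the surrounding quotes are kept on each element.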

score 0 · Accepted Answer

bashlib provides a csvline function. Assuming you have installed it somewhere in your PATH:

line='"Hello","World","this","is, a boring","line"'

source bashlib
csvline <<<"$line"
printf '%s\n' "${CSVLINE[@]}"

...the output of the above is:

Hello
World
this
is, a boring
line

The referenced implementation (copyright lhunath; the text below is taken from this specific revision of the relevant git repo):

#  _______________________________________________________________________
# |__ csvline ____________________________________________________________|
#
#       csvline [-d delimiter] [-D line-delimiter]
#
# Parse a CSV record from standard input, storing the fields in the CSVLINE array.
#
# By default, a single line of input is read and parsed into comma-delimited fields.
# Fields can optionally contain double-quoted data, including field delimiters.
#
# A different field delimiter can be specified using -d.  You can use -D
# to change the definition of a "record" (eg. to support NULL-delimited records).
#
csvline() {
    CSVLINE=()
    local line field quoted=0 delimiter=, lineDelimiter=$'\n' c
    local OPTIND=1 arg
    while getopts :d: arg; do
        case $arg in
            d) delimiter=$OPTARG ;;
        esac
    done

    IFS= read -d "$lineDelimiter" -r line || return
    while IFS= read -rn1 c; do
        case $c in
            \")
                (( quoted = !quoted ))
                continue ;;
            $delimiter)
                if (( ! quoted )); then
                    CSVLINE+=( "$field" ) field=
                    continue
                fi ;;
        esac
        field+=$c
    done <<< "$line"
    [[ $field ]] && CSVLINE+=( "$field" ) ||:
} # _____________________________________________________________________

score 0 · Accepted Answer

You may want to use gawk's FPAT to define what a valid field looks like:

Input:

"Hello","World","this, is"

Script:

gawk -n 'BEGIN{FS=",";OFS="\n";FPAT="([^,]+)|(\"[^\"]+\")"}{$1=$1;print $0}' somefile.csv

Output:

"Hello"
"World"
"this, is"
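If gawk is not available, the idea behind FPAT (describe what a field looks like instead of what separates fields) can be approximated in pure bash with [[ =~ ]]. This is a minimal sketch, not part of the answer above: split_fields is a hypothetical helper, and it assumes quoted fields never contain escaped ("") quotes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from any library): split one CSV line into the
# global array "fields", keeping double quotes around quoted fields.
# Assumes quoted fields contain no escaped ("") quotes.
split_fields() {
  local rest=$1
  # A field is either "..." (commas allowed inside) or a run of
  # characters that are neither commas nor quotes (possibly empty).
  local re='^("[^"]*"|[^,"]*)'
  fields=()
  while [[ $rest =~ $re ]]; do
    fields+=( "${BASH_REMATCH[1]}" )
    rest=${rest:${#BASH_REMATCH[0]}}
    # Continue only if the field is followed by a separating comma.
    [[ $rest == ,* ]] || break
    rest=${rest#,}
  done
}

line='"Hello","World","this","is, a boring","line"'
split_fields "$line"
printf '%s\n' "${fields[@]}"
```

The two alternatives in the regex cannot both match at the same position (one must start with a quote, the other must not contain one), so the result does not depend on how the regex engine chooses between them. Empty fields, as in a,,b, are kept as empty elements.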
