If there were only a single input file, I would use a Perl one-liner. Unfortunately, it turns out to be rather complex:
perl -pe 'if(/"""/&&s/"""/"/.../"""/&&s/"""/"\n/){s/[\n\r]//;};if(/ = \[([^]]*)]/){$r=$1eq""?"\"\"":$1=~s/"\s*,\s*"/ /gr;s/ = \[([^]]*)]/ = $r/};s/"\s*#[^"\n]*$/"/' one.toml | perl -ne 'if(/^([^"]+) = "(.*)"/){push@k,$1;push@v,"\"$2\""}END{print((join",",@k),"\n",join",",@v)}'
If we need to operate on multiple (*.toml) files at once, it only gets worse:
perl -ne 'if(/"""/&&s/"""/"/.../"""/&&s/"""/"\n/){s/[\n\r]//;};if(/ = \[([^]]*)]/){$r=$1eq""?"\"\"":$1=~s/"\s*,\s*"/ /gr;s/ = \[([^]]*)]/ = $r/};s/"\s*#[^"\n]*$/"/;print;print"-\n"if eof' *.toml | perl -ne 'if(/^-$/){push@o,join",",@k if scalar@o==0;push@o,join",",@v;@k=@v=()};if(/^([^"]+) = "(.*)"/){push@k,$1;push@v,"\"$2\""}END{print join"\n",@o}'
Both cases really call for a structured script. Here it is in Perl, but the same could be done in Python or any language you prefer:
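#!/usr/bin/env perl
use strict;
use warnings;

my @output;
foreach my $filename (@ARGV) {
    my (@keys, @values);
    open my $fh, "<:encoding(utf8)", $filename
        or die "Could not open $filename: $!";
    # Slurp the whole file so multi-line strings can be handled in one pass.
    my $content = do { local $/; <$fh> } // '';
    close $fh;
    # Collapse """multi-line""" strings into single-line "strings".
    $content =~ s/"""([^"]*)"""/'"' . $1 =~ s#[\r\n]##rg . '"'/ge;
    foreach my $line (split /[\r\n]/, $content) {
        # Flatten arrays: [] becomes "", ["a", "b"] becomes "a b".
        if ($line =~ m/ = \[([^]]*)]/) {
            my $replace = $1 eq "" ? '""' : $1 =~ s/"\s*,\s*"/ /gr;
            $line =~ s/ = \[([^]]*)]/ = $replace/;
        }
        # Strip an inline comment that follows a string value.
        $line =~ s/"\s*#[^"]*$/"/;
        # Record only lines that really are key = "value" pairs; otherwise
        # $1 and $2 would still hold the captures of an earlier match.
        next unless $line =~ m/^([^"]+) = "(.*)"/;
        push @keys, $1;
        push @values, '"' . $2 . '"';
    }
    # Emit the header row once, taken from the first file.
    push @output, join ",", @keys if @output == 0;
    push @output, join ",", @values;
}
print join("\n", @output), "\n";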
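For anyone who would rather take the Python route mentioned above, here is a rough sketch of the same regex-based approach. It is a port under the same assumptions as the Perl script, not a real TOML parser; the function names `parse_flat_toml` and `toml_files_to_csv` are made up for this illustration.

```python
import re


def parse_flat_toml(content):
    """Return (keys, values) for one flat TOML text.

    Assumes string values only, single-line arrays of strings, and
    comments only inline after a string value.
    """
    # Collapse """multi-line""" strings into single-line "strings".
    content = re.sub(
        r'"""([^"]*)"""',
        lambda m: '"' + re.sub(r'[\r\n]', '', m.group(1)) + '"',
        content,
    )
    keys, values = [], []
    for line in content.splitlines():
        # Flatten arrays: [] becomes "", ["a", "b"] becomes "a b".
        m = re.search(r' = \[([^\]]*)\]', line)
        if m:
            inner = m.group(1)
            flat = '""' if inner == '' else re.sub(r'"\s*,\s*"', ' ', inner)
            line = line[:m.start()] + ' = ' + flat + line[m.end():]
        # Strip an inline comment that follows a string value.
        line = re.sub(r'"\s*#[^"]*$', '"', line)
        # Keep only lines that look like key = "value".
        m = re.match(r'([^"]+) = "(.*)"', line)
        if m:
            keys.append(m.group(1))
            values.append('"' + m.group(2) + '"')
    return keys, values


def toml_files_to_csv(contents):
    """Build CSV text from a list of TOML texts; header comes from the first."""
    rows = []
    for content in contents:
        keys, values = parse_flat_toml(content)
        if not rows:
            rows.append(",".join(keys))
        rows.append(",".join(values))
    return "\n".join(rows)
```

To use it, read each file into a string (for example with `open(name, encoding="utf-8").read()`), pass the list of strings to `toml_files_to_csv`, and print the result.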
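Notes:
Most of the complexity comes from having to deal with arrays (!), comments, and multi-line strings. Each of these needs some preprocessing, which accounts for most of the solution's length. On top of that, additional information is needed about possible corner cases and how to handle them (for example, how to fit an array of strings into a CSV). All of this underlines the importance of input data quality and consistency. The proposed solution is by no means complete or robust, since it makes several assumptions about the input data and the desired output format. Here is how I dealt with the issues above:
- Values are expected to be strings only, as they are in the posted sample files. The script does not handle numbers, dates, or booleans.
- Arrays can be either empty ([]) or arrays of strings (["my", "array"]). Absent a clear specification from the OP, they are converted to a single string made up of all element strings joined by single spaces. Line breaks are not allowed inside arrays, and arrays cannot contain other arrays.
- Comments are handled only when they follow a string value inline. Comment-only lines are assumed not to occur.
- Indentation, blank lines, and section headers are not handled.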
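How an array should fit into a single CSV cell is exactly the kind of decision that needs a specification. As a small sketch of the choice involved (in Python, using the standard `csv` module; the helper names `array_to_cell` and `csv_line` and the `sep` parameter are hypothetical, not part of the script above), the separator can be made explicit and quoting left to the CSV writer:

```python
import csv
import io
import re


def array_to_cell(raw, sep=" "):
    """Join the string elements of a flat TOML array literal into one value.

    Assumes a single-line array of plain double-quoted strings.
    """
    items = re.findall(r'"([^"]*)"', raw)
    return sep.join(items)


def csv_line(cells):
    """Format one CSV row with every field quoted."""
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(cells)
    return buf.getvalue().rstrip("\n")
```

With the default separator, `array_to_cell('["My", "array"]')` gives the same space-joined value as the script; passing `sep=";"` would keep the element boundaries recoverable. Letting `csv.writer` do the quoting also doubles any embedded double quotes, which the hand-rolled quoting in the script does not attempt.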
Test run:
$ perl toml-to-csv.pl *.toml
"someID1","someVersionNumber1","someTag1","someOtherTag1","","long text1","more text1","- text- more text- so much text"
"someID2","someVersionNumber2","someTag2","someOtherTag2","Array","long text2","more text2","- text- more text- so much text"
"someID3","someVersionNumber3","someTag3","someOtherTag3","My array","long text3","more text3","- text- more text- so much text"