perl - How can I isolate the words that correspond to a letter in a different column of a CSV file?

Question

I have a CSV file that looks like this:

ACDB,this is a sentence
BECD,this is another sentence
BCAB,this is yet another

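Each character in the first column corresponds to a word in the second column. For example, in the first row, A corresponds to "this", C to "is", D to "a", and B to "sentence".

Given a variable character, which can be set to any character that appears in the first column, I need to isolate the words corresponding to the selected letter. For example, if I set character="B", the output for the data above would be: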

sentence
this
this another

If I set character="C", the output for the data above would be:

is
another
is

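How can I output only the words whose positions correspond to the selected letter?

The file contains many UTF-8 characters.
For every character in column 1, there is always a matching word in column 2, so the two counts are equal.
Words in column 2 are separated by spaces.

Here is my code so far: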

while read line
do
    characters="$(echo $line | awk -F, '{print $1}')"
    words="$(echo $line | awk -F, '{print $2}')"
    character="B"
done < ./file.csv

score 1 · Accepted Answer

Here is a rough, mostly complete answer.

Since SO is not a "do my work for me" site, you will need to fill in a few trivial blanks.

sub get_index_of_char {
   my ($character, $charset) = @_;
   # Homework: read about index() function
   #http://perldoc.perl.org/functions/index.html
}

sub split_line {
    my ($line) = @_;
    # Separate the line into a charset (before the first comma)
    # and a whitespace-separated word list.
    # split is simpler and safer here than a single regex.
    my ($charset, $rest) = split /,/, $line, 2;
    my @words = split ' ', defined $rest ? $rest : '';
    return ($charset, \@words);
}

sub process_line {
    my ($line, $character) = @_;
    chomp($line);
    my ($charset, $words) = split_line($line);
    my $index = get_index_of_char($character, $charset);
    print $words->[$index] . "\n"; # Could contain an off-by-one bug
}

# Here be the main loop calling process_line() for every line from input
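
One way the blanks above might be filled in, offered as a sketch rather than as the answerer's intended solution. A letter can occur more than once in the first column (B in BCAB), so this variant collects every matching position instead of only the first one that index() returns. The helper name words_for_char is hypothetical, and the code assumes the key letters are single-byte or that the input has already been decoded to characters.

```perl
use strict;
use warnings;

# Hypothetical helper: return the words whose positions match every
# occurrence of $character in the first column of $line.
sub words_for_char {
    my ($line, $character) = @_;
    chomp $line;
    # Split into the key string (before the first comma) and the rest.
    my ($charset, $rest) = split /,/, $line, 2;
    return () unless defined $rest;
    my @words = split ' ', $rest;
    my @found;
    # index() returns -1 when there is no further occurrence.
    my $pos = index($charset, $character);
    while ($pos >= 0) {
        push @found, $words[$pos] if defined $words[$pos];
        $pos = index($charset, $character, $pos + 1);
    }
    return @found;
}

# Main loop over the sample rows from the question.
my @lines = (
    "ACDB,this is a sentence",
    "BECD,this is another sentence",
    "BCAB,this is yet another",
);
my $character = 'B';
for my $line (@lines) {
    my @found = words_for_char($line, $character);
    print "@found\n" if @found;
}
```

To read a real file instead, the loop over @lines could be replaced by a while loop over a filehandle opened on the CSV file.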

score 1 · Accepted Answer

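This seems to do the trick. It uses the DATA filehandle to read the data from the source file itself, whereas you will have to obtain it from your own source. You may also need to cater for the case where no word corresponds to a given letter (such as 'A' in the second data line here).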

use strict;
use warnings;

my @data;

while (<DATA>) {
  my ($keys, $words) = split /,/;
  my @keys = split //, $keys;
  my @words = split ' ', $words;
  my %index;
  push @{ $index{shift @keys} }, shift @words while @keys;
  push @data, \%index;
}

for my $character (qw/ B C /) {
  print "character = $character\n";
  print join(' ', @{$_->{$character}}), "\n" for @data;
  print "\n";
}

__DATA__
ACDB,this is a sentence
BECD,this is another sentence
BCAB,this is yet another

Output

character = B
sentence
this
this another

character = C
is
another
is
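
The caveat about letters with no corresponding word can be handled by falling back to an empty list before dereferencing. A minimal sketch, assuming the same per-line hash layout as the code above; build_index and words_for are hypothetical names that are not part of the answer, and the // operator needs Perl 5.10 or later.

```perl
use strict;
use warnings;

# Build a hash mapping each key letter to the list of words at its positions.
sub build_index {
    my ($line) = @_;
    chomp $line;
    my ($keys, $words) = split /,/, $line, 2;
    my @keys  = split //, $keys;
    my @words = split ' ', $words // '';
    my %index;
    for my $i (0 .. $#keys) {
        push @{ $index{ $keys[$i] } }, $words[$i] if defined $words[$i];
    }
    return \%index;
}

# Return the words for a letter, or an empty list if the letter is absent.
sub words_for {
    my ($index, $character) = @_;
    return @{ $index->{$character} // [] };
}

# The second data line has no 'A', so the first lookup finds nothing.
my $index = build_index("BECD,this is another sentence");
for my $character (qw/ A B /) {
    my @found = words_for($index, $character);
    print "$character: ", (@found ? "@found" : "(no match)"), "\n";
}
```

Without the fallback, dereferencing the missing hash entry under use strict would raise a runtime error instead of printing an empty line.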

score 1 · Accepted Answer

This might work for you:

x=B                                                      # set wanted key variable
sed '
:a;s/^\([^,]\)\(.*,\)\([^ \n]*\) *\(.*\)/\2\4\n\1 \3/;ta # pair keys with values
s/,//                                                    # delete ,
s/\n[^'$x'] [^\n]*//g                                    # delete unwanted keys/values
s/\n.//g                                                 # delete wanted keys
s/ //                                                    # delete first space
/^$/d                                                    # delete empty lines
' file
sentence
this
this another

Or in awk:

awk -F, -vx=B '{i=split($1,a,"");split($2,b," ");c=s="";for(n=1;n<=i;n++)if(a[n]==x){c=c s b[n];s=" "} if(length(c))print c}' file
sentence
this
this another
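
The question notes that the file contains many UTF-8 characters, and splitting the first column byte by byte would break any multibyte key letter. A hedged Perl sketch of one way to handle that by decoding each line before splitting; select_words_utf8 is a hypothetical helper and the sample line is made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: takes one raw UTF-8 line and an already-decoded
# character, and returns the matching words as character strings.
sub select_words_utf8 {
    my ($bytes, $character) = @_;
    my $line = decode('UTF-8', $bytes);    # bytes -> characters
    chomp $line;
    my ($keys, $rest) = split /,/, $line, 2;
    my @keys  = split //, $keys;           # one element per character, not per byte
    my @words = split ' ', $rest // '';
    return map  { $words[$_] }
           grep { $keys[$_] eq $character && defined $words[$_] }
           0 .. $#keys;
}

# Made-up sample line: keys U+00E9 and U+00F1, followed by two words.
my $bytes = encode('UTF-8', "\x{e9}\x{f1},caf\x{e9} ma\x{f1}ana\n");
my @found = select_words_utf8($bytes, "\x{f1}");
print encode('UTF-8', "@found"), "\n";
```

When reading from a file, opening it with an :encoding(UTF-8) layer would make the explicit decode call unnecessary.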
