-1

I have a CSV file that looks like this:

ACDB,this is a sentence
BECD,this is another sentence
BCAB,this is yet another

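Each character in the first column corresponds to a word in the second column. For example, in the first row, A corresponds to "this", C to "is", D to "a", and B to "sentence".

Given a variable character, which can be set to any character that appears in the first column, I need to isolate the word(s) corresponding to the selected letter. For example, if I set character="B", the output for the data above would be: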

sentence
this
this another

If I set character="C", the output for the data above would be:

is
another
is

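How can I output only the words that correspond to the position of the selected letter?

  • The file contains many UTF-8 characters.
  • Column 2 always contains exactly as many words as there are characters in column 1.
  • The words in column 2 are separated by spaces.

Here is the code I have so far: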

character="B"

while IFS= read -r line
do
    # Split each CSV line into the key string and the sentence
    characters="$(printf '%s\n' "$line" | awk -F, '{print $1}')"
    words="$(printf '%s\n' "$line" | awk -F, '{print $2}')"
done < ./file.csv

3 Answers

1

Here is a rough but mostly complete answer.

Since SO is not a "do my work for me" site, you will need to fill in a few trivial blanks.

sub get_index_of_char {
   my ($character, $charset) = @_;
   # Homework: read about index() function
   #http://perldoc.perl.org/functions/index.html
}

sub split_line {
    my ($line) = @_;
    # Separate the line into a charset (before the first comma)
    # and a whitespace-separated word list.
    # Two split() calls are simpler than a single regex here.
    my ($charset, $rest) = split /,/, $line, 2;
    my @words = split ' ', $rest;
    return ($charset, \@words);
}

sub process_line {
    my ($line, $character) = @_;
    chomp($line);
    my ($charset, $words) = split_line($line);
    my $index = get_index_of_char($character, $charset);
    print $words->[$index] . "\n"; # Could contain an off-by-one bug
}

# Here be the main loop calling process_line() for every line from input
answered 2012-04-20T02:41:00.513
1

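This seems to do the trick. It uses the DATA filehandle to read the data from the source file itself, whereas you will have to read it from your own source. You may also need to watch out for letters that have no corresponding word on a given line (like "A" in the second data line here).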

use strict;
use warnings;

my @data;

while (<DATA>) {
  my ($keys, $words) = split /,/;
  my @keys = split //, $keys;
  my @words = split ' ', $words;
  my %index;
  push @{ $index{shift @keys} }, shift @words while @keys;
  push @data, \%index;
}

for my $character (qw/ B C /) {
  print "character = $character\n";
  print join(' ', @{$_->{$character}}), "\n" for @data;
  print "\n";
}

__DATA__
ACDB,this is a sentence
BECD,this is another sentence
BCAB,this is yet another

Output

character = B
sentence
this
this another

character = C
is
another
is
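
Both caveats above (reading from your own file, and letters that have no matching word on a line) can also be handled in plain bash, the language of the question. The following is only a minimal sketch, not a translation of the Perl code: the function name select_words is hypothetical, and it assumes bash running in a UTF-8 locale, so that ${keys:i:1} yields a whole character rather than a single byte.

```shell
#!/usr/bin/env bash
# Hypothetical helper: for each CSV line on stdin, print the words whose
# positions match the occurrences of character $1 in the first column.
select_words() {
    local character=$1 keys text i
    local -a words out
    # IFS=, applies to this read only; text receives everything after the first comma
    while IFS=, read -r keys text || [[ -n $keys ]]; do
        read -r -a words <<< "$text"          # split column 2 on whitespace
        out=()
        for (( i = 0; i < ${#keys}; i++ )); do
            if [[ ${keys:i:1} == "$character" ]]; then
                out+=("${words[i]}")          # same 0-based position as the key
            fi
        done
        # Skip lines where the character does not occur at all
        if (( ${#out[@]} > 0 )); then
            printf '%s\n' "${out[*]}"
        fi
    done
}
```

It would be called as select_words B < ./file.csv. Lines without the requested character are skipped rather than printed as empty lines.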
answered 2012-04-20T03:52:37.077
1

This might work for you:

x=B                                                      # set wanted key variable
sed '
:a;s/^\([^,]\)\(.*,\)\([^ \n]*\) *\(.*\)/\2\4\n\1 \3/;ta # pair keys with values
s/,//                                                    # delete ,
s/\n[^'$x'] [^\n]*//g                                    # delete unwanted keys/values
s/\n.//g                                                 # delete wanted keys
s/ //                                                    # delete first space
/^$/d                                                    # delete empty lines
' file
sentence
this
this another

Or in awk:

awk -F, -vx=B '{i=split($1,a,"");split($2,b," ");c=s="";for(n=1;n<=i;n++)if(a[n]==x){c=c s b[n];s=" "} if(length(c))print c}' file
sentence
this
this another
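
The awk one-liner above is dense, so here is a spelled-out sketch of the same idea wrapped in a shell function. The name select_words_awk is hypothetical. It walks the key column with substr() instead of split($1, a, ""), because splitting on an empty separator is a common extension rather than something POSIX awk guarantees. Whether substr() and length() count characters or bytes depends on the awk implementation and the locale, so UTF-8 keys should be checked with the awk you actually have.

```shell
#!/bin/sh
# Hypothetical wrapper: $1 is the wanted character, the CSV is read from stdin.
select_words_awk() {
    awk -F, -v x="$1" '
    {
        split($2, words, " ")              # words[1..n] from column 2
        out = ""; sep = ""
        for (i = 1; i <= length($1); i++) {
            if (substr($1, i, 1) == x) {   # key at position i is the wanted one
                out = out sep words[i]
                sep = " "
            }
        }
        if (out != "") print out           # skip lines with no match
    }'
}
```

Usage would be select_words_awk B < file, which should print the same three lines as the one-liner for the sample data.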
answered 2012-04-20T10:52:04.040