
I have a text file in which the leading spaces at the start of each line act as the delimiter.

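Lines with no leading spaces should go in the first column of the CSV file; lines with two leading spaces should go in the second column; lines with four leading spaces should go in the third column.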

All of this works as required.

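For lines that start with two spaces, I want only the date to go in the second column, discarding the rest of the line. Everything else should stay as it is.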

For clarity, I have shown the leading spaces as # characters.

Text file:

Component1
##(111) Amar Sen <amar.sen@gmail.com> <No comment> 2013/04/01
####/Com/src/folder1/folder2/newfile.txt
##(1199) Prashant Singh <psinsgh@gmail.com> <No comment> 2013/04/24
####/Com/src/folder1/folder2/testfile24
####/Com/src/folder1/folder2/testfile25
####/Com/src/folder1/folder2/testfile26
##(1204) Anthony Li <anthon.li@gmail.com> <No comment> 2013/04/25
####/Com/src2
Component2(added)
Component3

Output format:

Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt
           2013/04/24,/Com/src/folder1/folder2/testfile24
                  /Com/src/folder1/folder2/testfile25
                      /Com/src/folder1/folder2/testfile26
           2013/04/25,/Com/src2
Component2(added)
Component3

Here is the code. It works fine apart from the change described above.

use strict;
use warnings;

my $previous_count            = "-1"; # -1 means no line has been processed yet
my $current_count             = "0";  # column index of the current line
my $maximum_count             = 3;
my $to_written                = "";
my $delimiter_between_columns = ",";
my $newline_separator         = ";";

my $file = 'C:\\textfile.txt';
open (my $fh, '<:encoding(UTF-8)', $file) or die "Could not open file '$file' $!";

while (my $row = <$fh>) {

  # ok, read.
  chomp($row);

  # print "row is : $row\n";
  if ($row =~ m/^(\s*)/) {

    #print length($1);
    $current_count = length($1) / 2;    #take number of spaces divided by 2
    $row =~ s/^\s+//;

    if ($previous_count >= $current_count || $previous_count == $maximum_count) {

      #output here
      print "$to_written" . $newline_separator . "\n";

      $previous_count = 0;
      $to_written     = "";
    }
    $previous_count = 0 if ($previous_count == -1);
    $to_written .= $delimiter_between_columns x ($current_count - $previous_count) . "$row";

    $previous_count = $current_count;

    #print"\n";
  }
}

print "$to_written" . $newline_separator . "\n";

1 Answer


You seem to have got yourself tied up in knots a little with your solution.

This program seems to do what you need. I have added some commas to your "output format" as your example has no placeholders for initial empty fields.

I have kept the hash characters for this purpose. Obviously it is trivial to change them for spaces, replacing s/^(#*)// with s/^(\s*)//.

use strict;
use warnings;

my @row;

while (<DATA>) {

  chomp;
  s/^(#*)//;
  my $i = length($1) / 2;

  if ($i == 1 and m<(\d{4}/\d{2}/\d{2})>) {
    $row[$i] = $1;
  }
  else {
    $row[$i] = $_;
  }

  if ($i == 2) {
    print join(',', @row), ";\n";
    @row = ('') x 3;
  }
}


__DATA__
Component1
##(111) Amar Sen <amar.sen@gmail.com> <No comment> 2013/04/01
####/Com/src/folder1/folder2/newfile.txt
##(1199) Prashant Singh <psinsgh@gmail.com> <No comment> 2013/04/24
####/Com/src/folder1/folder2/testfile24
####/Com/src/folder1/folder2/testfile25
####/Com/src/folder1/folder2/testfile26
##(1204) Anthony Li <anthon.li@gmail.com> <No comment> 2013/04/25
####/Com/src2

output

Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt;
,2013/04/24,/Com/src/folder1/folder2/testfile24;
,,/Com/src/folder1/folder2/testfile25;
,,/Com/src/folder1/folder2/testfile26;
,2013/04/25,/Com/src2;
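
The program above prints a row only when it reaches a third-column line, so a first-column line with nothing beneath it, such as Component2(added) in the question's input, would not be printed on its own. The following is a minimal sketch of one way to cover that case, not a definitive implementation. It assumes the real file is indented with exactly zero, two, or four spaces, and the sub name indented_to_csv is hypothetical. A row is flushed whenever the next line sits at the same or a shallower level, and once more at the end of the input.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): turns
# space-indented lines into CSV rows, returned without a terminator.
sub indented_to_csv {
    my @lines = @_;
    my (@out, @row);
    my $last = -1;    # column index of the previous line; -1 before the first

    for my $line (@lines) {
        next unless $line =~ /\S/;    # skip blank lines

        # Split the line into its leading spaces and the remaining text
        my ($indent, $text) = $line =~ /^( *)(.*)$/;
        my $col = int(length($indent) / 2);    # assumed to be 0, 1 or 2

        # In the second column keep only the date
        if ($col == 1 and $text =~ m{(\d{4}/\d{2}/\d{2})}) {
            $text = $1;
        }

        # A line at the same or a shallower level starts a new output row
        if ($last >= $col) {
            push @out, join(',', map { defined $_ ? $_ : '' } @row);
            @row = ();
        }

        $row[$col] = $text;
        $last = $col;
    }

    # Flush whatever is left once the input ends
    push @out, join(',', map { defined $_ ? $_ : '' } @row) if @row;
    return @out;
}

my @demo = (
    'Component1',
    '  (111) Amar Sen <amar.sen@gmail.com> <No comment> 2013/04/01',
    '    /Com/src/folder1/folder2/newfile.txt',
    'Component2(added)',
);
print "$_;\n" for indented_to_csv(@demo);
# Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt;
# Component2(added);
```

To run it against a file, read the lines from a filehandle into a list and pass that list to the sub; the regex ignores a trailing newline, so chomp is optional.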

Update

It makes more sense to cascade values from columns one and two into subsequent rows where they are not supplied. If you remove the line @row = ('') x 3 from my program it will do just that, with this output

Component1,2013/04/01,/Com/src/folder1/folder2/newfile.txt;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile24;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile25;
Component1,2013/04/24,/Com/src/folder1/folder2/testfile26;
Component1,2013/04/25,/Com/src2;
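
The cascading idea can also be sketched as a standalone sub for space-indented input. This is an illustration under stated assumptions rather than the program above: the name cascade_rows is hypothetical, indentation is assumed to be exactly zero, two, or four spaces, and one row is emitted per third-column line. Unlike simply dropping the reset, this version clears the deeper columns whenever a shallower one changes, so a date is never carried over under a new component.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original answer): returns one
# fully populated CSV row for every third-column (file path) line.
sub cascade_rows {
    my @lines = @_;
    my (@out, @row);

    for my $line (@lines) {
        next unless $line =~ /\S/;    # skip blank lines

        # Split the line into its leading spaces and the remaining text
        my ($indent, $text) = $line =~ /^( *)(.*)$/;
        my $col = int(length($indent) / 2);    # assumed to be 0, 1 or 2

        # In the second column keep only the date
        if ($col == 1 and $text =~ m{(\d{4}/\d{2}/\d{2})}) {
            $text = $1;
        }

        # Keep the shallower columns, drop this one and anything deeper
        $#row = $col - 1;
        $row[$col] = $text;

        # Emit a row only when a file path has been read
        push @out, join(',', map { defined $_ ? $_ : '' } @row) if $col == 2;
    }
    return @out;
}

my @sample = (
    'Component1',
    '  (1199) Prashant Singh <psinsgh@gmail.com> <No comment> 2013/04/24',
    '    /Com/src/folder1/folder2/testfile24',
    '    /Com/src/folder1/folder2/testfile25',
);
print "$_;\n" for cascade_rows(@sample);
# Component1,2013/04/24,/Com/src/folder1/folder2/testfile24;
# Component1,2013/04/24,/Com/src/folder1/folder2/testfile25;
```

Components that have no file paths beneath them produce no rows in this sketch, which matches the behaviour of printing only on third-column lines.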
Answered 2013-06-12T12:59:07.443