
First, for reference, see the Stack Overflow question "Using Spreadsheet::WriteExcel".

The data structure looks like this:

      col1      col2    col3   col4   col5
row1  School    1
row2  Dean      John
row3  No.stu.   55
row4  some irrelevant stuff
row5  School    2
row6  Dean      Tony
row7  No. stu.  60
row8  some irrelevant stuff
row9  School    3
row10 Dean      James
row11 No.stu.   56
row12 No. teacher 20
row13 School    4
row14 Dean      Tom
row15 No.stu.   79
row16 No. teacher 21
row17 course
row18           math    2
row19           eng     4
row20 teacher   name    age   gender   race
row21           Jane    20    female   white
row22 student   name    Lee
row23           SAT     1434
row24           gender  male

The output I want to achieve is:

     col1    col2   col3     col4         col5         col6        col7          col8         col9            col10         col11         col12        col13
row1 School  Dean   No.stu.  No. teacher  course_math  course_eng  teacher_name  teacher_age  teacher_gender  teacher_race  student_name  student_SAT  student_gender
row2 1       John   55
row3 2       Tony   60
row4 3       James  56       20
row5 4       Tom    79       21           2            4           Jane          20           female          white         Lee           1434         male

Thanks to gangabas, the code I have so far is:

use strict;
use warnings;

use Spreadsheet::ParseExcel;
use FindBin qw($Bin);

my ($infile) = @ARGV;

my $parser   = Spreadsheet::ParseExcel->new();
my $workbook = $parser->parse("$Bin/Test.xls");
die $parser->error unless defined $workbook;
my ($worksheet) = $workbook->worksheets();

my %data;
my $row    = 0;
my $school = "";
while (1) {
    my $cell = $worksheet->get_cell($row, 0);
    last unless defined($cell);

    my $key = $cell->value();
    my $value = $worksheet->get_cell($row++, 1)->value();

    if ($key eq "School") {

        $school = $value;
        next;
    }

    $data{$school}->{$key} = $value;
}
sleep 1;

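I started by parsing row17-row19. The first problem I ran into is the empty cell at (row17, col3): the code fails when it gets there. Knowing that Excel distinguishes between "empty" and "blank" cells, I can work around it by setting the cell format in the original XLS file to something other than "General". However, that is only a temporary fix. I would like to know whether there is any command I can use to fetch an empty cell. I have already tried adding unformatted():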

my $unformattedvalue = $worksheet->get_cell( $row++, 1 )->unformatted();

However, it does not work.

Then I tried to describe the data structure under a "course" state with the following code:

my %data;
my $row    = 0;
my $school = "";
my $course = ""; #Initial value for the state of course
while (1) {
    my $cell = $worksheet->get_cell($row, 0);
    last unless defined($cell);

    my $key = $cell->value();
    my $value  = $worksheet->get_cell( $row++, 1 )->value();
    my $value1 = $worksheet->get_cell( $row++, 2 )->value(); #Fetching the value in column 3

    if ($key eq "School") {

        $school = $value;
        next;
    }

    if ($key eq "course") { #Just mimicking the how we construct the structure of 'School'

         $course = $value1;
         next;
    }

    $data{$school}->{$key} = $value;  #Must be something wrong here, but can not figure out
}

The code does not run and fails with: Can't call method "value" on an undefined value at xxx line of 'my $value1 = $worksheet->get_cell( $row++, 2 )->value()';

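In short, my questions are:

  1. In general, how can I read empty cells in an XLS file without editing the file by hand first?
  2. How can I use Spreadsheet::ParseExcel to parse a structure like the following?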

    row17 course
    row18           math    2
    row19           eng     4
    

2 Answers


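You are doing a $row++ that you should not be doing: every get_cell( $row++, ... ) call advances the row, so a second lookup in the same iteration reads from the next row.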
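To see the effect in isolation, here is a minimal sketch in plain Perl (no spreadsheet involved; the variable names are for illustration only) of what happens when two lookups in one iteration each use $row++:

```perl
use strict;
use warnings;

my $row = 0;
my @rows_read;

# Each post-increment returns the current value and then advances $row,
# so two lookups in one iteration read from two different rows.
push @rows_read, $row++;    # stands in for get_cell( $row++, 1 )
push @rows_read, $row++;    # stands in for get_cell( $row++, 2 )

print "rows read: @rows_read, next iteration starts at row $row\n";
```

This prints "rows read: 0 1, next iteration starts at row 2": the second lookup never sees the row the first one read, and every other row is skipped as a key row.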

Try this code as a starting point:

my %data;
my $state = "";
my $school = "";
my $student = "";
my ( $row_min, $row_max ) = $worksheet->row_range();
my $row = $row_min;
while ($row <= $row_max) {
    my $cell0 = $worksheet->get_cell( $row, 0 );
    my $cell1 = $worksheet->get_cell( $row, 1 );

    if (defined($cell0)) {
        my $key = $cell0->value();
        if ($key eq 'School') {
            $state = 'school';
            $school = $cell1->value();
        } elsif ($key eq 'course') {
            $state = 'course';
        } elsif ($key eq 'teacher') {
            $state = 'teacher';
        } elsif ($key eq 'student') {
            $state = 'student';
            $student = $worksheet->get_cell( $row, 2 )->value();
        } else {
            # Skip rows (such as "some irrelevant stuff") that have no second column.
            $data{$school}{$key} = $cell1->value() if defined($cell1);
        }
    } elsif ($state eq 'course') {
        # process columns for course
    } elsif ($state eq 'teacher') {
        # process columns for teacher
    } elsif ($state eq 'student') {
        # process columns for student
    }
    $row++;
}
use Data::Dumper;
print Dumper(\%data);
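On the first question (empty cells): get_cell() returns undef when a cell holds no data, so calling ->value() on the result directly is what triggers the "undefined value" error. A minimal sketch of a guard follows. The helper name cell_value and the MockSheet/MockCell packages are hypothetical; the mocks only stand in for the real worksheet so the sketch runs without an XLS file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Spreadsheet::ParseExcel): return the
# cell's value, or a caller-supplied default when get_cell() returns undef.
sub cell_value {
    my ( $worksheet, $row, $col, $default ) = @_;
    my $cell = $worksheet->get_cell( $row, $col );
    return defined $cell ? $cell->value() : $default;
}

# Tiny stand-ins for the cell and worksheet objects; with the real module,
# pass the $worksheet obtained from the parsed workbook instead.
package MockCell {
    sub new   { my ( $class, $v ) = @_; return bless { v => $v }, $class }
    sub value { return $_[0]{v} }
}

package MockSheet {
    sub new { my ( $class, $grid ) = @_; return bless { grid => $grid }, $class }

    sub get_cell {
        my ( $self, $row, $col ) = @_;
        my $v = $self->{grid}[$row][$col];
        return defined $v ? MockCell->new($v) : undef;
    }
}

# Two rows shaped like row17 and row18 of the question.
my $sheet = MockSheet->new( [ ['course'], [ undef, 'math', 2 ] ] );

my $empty = cell_value( $sheet, 0, 1, '' );    # nothing next to "course"
my $math  = cell_value( $sheet, 1, 2, '' );    # the value next to "math"

print "empty='$empty' math='$math'\n";
```

With a helper along these lines the loop can read any column of any row without first checking whether the cell exists.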

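Update:

To handle the teacher rows, I would first declare a %teacher_columns hash outside the first while loop and obtain $col_min and $col_max from the col_range() method. Then, in the $key eq 'teacher' elsif clause, I would do something like this: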

%teacher_columns = (); # clear it out in case column names are different for this school
for my $col (2 .. $col_max) {
    my $cell = $worksheet->get_cell( $row, $col );
    $teacher_columns{$cell->value()} = $col if defined($cell) and $cell->value();
}

Then, in the $state eq 'teacher' elsif clause, I would do something like this:

foreach my $key (keys %teacher_columns) {
    my $cell = $worksheet->get_cell($row, $teacher_columns{$key});
    if (defined($cell)) {
      # store cell data into proper location of your data hash
    }
}
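The two fragments above can be put together as in the following sketch. It uses a plain array of rows in place of the worksheet so that it runs on its own, and it assumes the values should end up under $data{$school}{teacher}{...}; that layout, the fixed $school value, and the starting column of the header names are assumptions that need adjusting to the real sheet.

```perl
use strict;
use warnings;

# Rows as they might come from the sheet: column 0 holds the section key,
# the remaining columns hold the header names or the values.
my @rows = (
    [ 'teacher', 'name', 'age', 'gender', 'race' ],
    [ undef,     'Jane', 20,    'female', 'white' ],
);

my $school = 4;    # assumed to have been set by an earlier "School" row
my ( %data, %teacher_columns );
my $state = '';

for my $row (@rows) {
    my $key = $row->[0];
    if ( defined $key && $key eq 'teacher' ) {
        # Header row: remember which column each field name lives in.
        $state           = 'teacher';
        %teacher_columns = ();
        for my $col ( 1 .. $#$row ) {
            $teacher_columns{ $row->[$col] } = $col if defined $row->[$col];
        }
    }
    elsif ( !defined $key && $state eq 'teacher' ) {
        # Data row: look each field up through the remembered column.
        for my $field ( keys %teacher_columns ) {
            my $value = $row->[ $teacher_columns{$field} ];
            $data{$school}{teacher}{$field} = $value if defined $value;
        }
    }
}

my $teacher = $data{$school}{teacher};
print join( ', ', map { "$_=$teacher->{$_}" } sort keys %$teacher ), "\n";
```

Note that this keeps only one teacher per school; if a school can list several teachers, each data row would need its own sub-hash or array element.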
answered 2013-04-09T16:49:09.267

Here is what I did:

I only added the processing for the columns.

elsif ($state eq 'student') {
    my $key = $cell1->value();
    $data{$school}{$student}{$key} = $cell2->value();
} elsif ($state eq 'course') {
    my $key = $cell1->value();
    $data{$school}{$course}{$key} = $cell2->value();
}

where $cell2 is defined as:

my $cell2 = $worksheet->get_cell( $row, 2 );

It gives me:

$VAR1 = {
          '4' => {
                   'course' => {
                                 'math' => '2',
                                 'eng' => '4'
                               },
                   'No.Stu' => '79',
                   'No.Teacher' => '21',
                   'Lee' => {
                              'SAT' => '1434',
                              'gender' => 'male'
                            },
                   'Dean' => 'Tom'
                 },
          '1' => {
                   'No.Stu' => '55',
                   'Dean' => 'John'
                 },
          '3' => {
                   'No.Stu' => '56',
                   'No.Teacher' => '20',
                   'Dean' => 'James'
                 },
          '2' => {
                   'No.Stu' => '60',
                   'Dean' => 'Tony'
                 }
        };

That looks good.

But I don't know how to handle the teacher columns, because they have a somewhat different structure:

        col1      col2    col3   col4   col5
  row20 teacher   name    age   gender   race
  row21           Jane    20    female   white

I tried to loop over the columns to use the cell values as keys, but without success:

my ( $col_min, $col_max ) = $worksheet->col_range();
my $col = $col_min;
my $cell3 = $worksheet->get_cell(++$row, $col );

elsif ($state eq 'teacher') {
        while ($col <= $col_max) {
            my $key = $worksheet->get_cell($row, $col++ );
        }
           $data{$school}{$teacher}{$key} =$cell3->value();
    }

Any suggestions?

Update: I followed @imran's suggestion and it works well. I then used the following code to write what I parsed to a new XLS file:

use Spreadsheet::WriteExcel;

$workbook = Spreadsheet::WriteExcel->new('Result.xls');
$worksheet = $workbook->add_worksheet();

$col = 0;
$row = 0;

$worksheet->write( $row++, $col,
    ["School", "No.Stu", "No.Teacher", "Dean","Course_math", "Course_eng", "student_SAT", "student_name", "student_gender", "teacher_race", "teacher_name", "teacher_age", "teacher_gender"] );

foreach my $school( sort keys %data ) {

     $worksheet->write( $row++, $col,
        [ $school, @{$data{$school}}{ "No.Stu", "No.Teacher", "Dean"}, @{$data{$school}{course}}{ "math", "eng"}, @{$data{$school}{student}}{ "SAT", "name", "gender"}, @{$data{$school}{teacher}}{ "race", "name", "age", "gender"} ]) ;
}

$workbook->close();

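Everything is fine except for one small thing. The teacher_age cell is empty, even though the hash shows that the value has been parsed. I can't figure it out...

The teacher part of the hash is as follows: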

           'teacher' => {
                          'race' => 'white',
                          'name' => 'Jane',
                          'age ' => '25',
                          'gender' => ' female'
                        },

The code I used is exactly what you suggested, but the "teacher_age" cell in the rewritten XLS file is empty.
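One detail visible in the dump above is that the key is stored as 'age ' (with a trailing space), while the write step asks for "age". A hash slice returns undef for a key that does not match exactly, which would leave that cell empty. A minimal sketch, assuming stray whitespace in the source cells is the cause, that trims keys and values before they are used (the helper name trim is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: strip leading and trailing whitespace.
sub trim {
    my ($text) = @_;
    $text =~ s/^\s+|\s+$//g;
    return $text;
}

# The teacher hash exactly as it appears in the dump.
my %raw = (
    'race'   => 'white',
    'name'   => 'Jane',
    'age '   => '25',
    'gender' => ' female',
);

# Looking up "age" misses the 'age ' key and yields undef.
my @before = @raw{ 'name', 'age' };

# Rebuild the hash with trimmed keys and values.
my %teacher = map { ( trim($_) => trim( $raw{$_} ) ) } keys %raw;

printf "before trimming: age is %s\n",
    defined $before[1] ? "'$before[1]'" : 'undef';
printf "after trimming:  age is '%s', gender is '%s'\n",
    @teacher{ 'age', 'gender' };
```

In the parsing loop the same effect can be had by trimming each header name at the point where it is stored as a key.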

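Update 2: Coming back to the question about "state": does it always start parsing from the following row? What if I have the following data structure: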

 School    1
 course    math
           eng
 ...
 School     2
 course    phy
 ...
 School     3
 course    chem
           gym
           music

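If I use the old code and define the course state, I can only parse eng from school 1 and gym and music from school 3. I thought about setting the course state one row earlier, but had no luck. Any suggestions?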

Update:

Following @imran's suggestion, I changed the course code to:

} elsif ($key eq 'course') {
            $state = 'course';
            $course = $worksheet->get_cell( $row, 1 )->value();
}

and the corresponding block that fills the hash to:

} elsif ($state eq 'course') {
            my $key = $cell1->value();
            $data{$school}{$course}{$key} =$cell1->value();
}

However, it only parses:

 '1' => {
                   'math' => {
                               'eng' => 'eng'
                             },
       }
 '3' => {
   'chem' => {
                               'gym' => 'gym',
                               'music' => 'music'
                             },
      }

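and school 2 is not parsed.

The problem was solved by simply carrying on right after reaching the course key. That is, instead of having separate blocks for defining the course keys and for processing the course rows, I now have: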

  } elsif ($key eq 'course') {
            $state = 'course';
            $course = $state;
            my $key = $cell1->value();
            $data{$school}{$course}{$key} = $key;
}

That did the trick. But I am still curious about how it works.
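As for how it works: in this layout the first course sits on the same row as the course key, so it has to be stored in the branch that sees the key, while the rows underneath (which have an empty first column) are picked up by the state branch. A minimal sketch of both branches together, using a plain array of rows in place of the worksheet and storing the courses as a list (an assumption; the code above stores them as hash keys):

```perl
use strict;
use warnings;

# Rows shaped like the "Update 2" layout: key in column 0, value in column 1.
my @rows = (
    [ 'School', 1 ],
    [ 'course', 'math' ],
    [ undef,    'eng' ],
    [ 'School', 2 ],
    [ 'course', 'phy' ],
    [ 'School', 3 ],
    [ 'course', 'chem' ],
    [ undef,    'gym' ],
    [ undef,    'music' ],
);

my ( %data, $school );
my $state = '';

for my $row (@rows) {
    my ( $key, $value ) = @$row;
    if ( defined $key ) {
        if ( $key eq 'School' ) {
            $school = $value;
            $state  = 'school';
        }
        elsif ( $key eq 'course' ) {
            # The key row already carries the first course, so store it here.
            $state = 'course';
            push @{ $data{$school}{course} }, $value;
        }
    }
    elsif ( $state eq 'course' && defined $value ) {
        # Continuation rows: no key, so the remembered state decides.
        push @{ $data{$school}{course} }, $value;
    }
}

for my $s ( sort keys %data ) {
    print "School $s: @{ $data{$s}{course} }\n";
}
```

This prints "School 1: math eng", "School 2: phy" and "School 3: chem gym music". School 2 was missing earlier because its only course sits on the key row, which the state branch never sees.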

answered 2013-04-10T10:44:12.573