
My Perl code that uses HTML::TableExtract does not work.

Here is my code:

#!/usr/bin/perl
use strict;
use warnings;

use HTML::TableExtract;


## Extract table from html file
my $te = new HTML::TableExtract( attribs => { border => 0} );
$te->parse_file("file_path.html");
my $table = $te->tables;

for my $row ($table->rows) {
    print join(',', @$row), "\n";
}

I keep getting this error:

Can't call method "rows" without a package or object reference at ./parse_table.pl line 13.

Here is my HTML file, truncated to show only the table I am interested in: http://phucnvo.myvnc.com/sandbox/out.html

<div>
  <form name="listAssignmentsForm" action="https://t-square.gatech.edu/portal/tool/3a34f619-99d1-4548-be57-9ee977fd8127?panel=Main"
    method="post">
    <input type="hidden" name="source" value="0"/>
    <table class="listHier lines nolines" border="0" cellspacing="0"
      summary="List of assignments. Column headers are also links which can be used to sort the table by that column. Column 1: Indicates if the assignment has attachments. Column 2: assignment title and links to edit, duplicate or grade(if allowed). Column 3: status. Column 4: opening date. Column 5: due date. The rest of the columns may or may not be present. Column 6: may have the number submitted and graded. Column 7: may have checkboxes to select and remove the assignment.">
      <tr>
        <th id="attachments" class="attach"> &nbsp; </th>
        <th id="title">
          <a href="#" onclick="location='url'; return false;" title="Sort by title"> Assignment title </a>
        </th>
        <th id="For">
          <a href="#" onclick="location='url'; return false;" title="Sort by audience">For</a>
        </th>
        <th id="status">
          <a href="#"
            onclick="location='https://t-square.gatech.edu/portal/tool/3a34f619-99d1-4548-be57-9ee977fd8127?criteria=assignment_status&amp;panel=Main&amp;sakai_action=doSort'; return false;"
            title="Sort by status"> Status </a>
        </th>
        <th id="openDate">
          <a href="#"
            onclick="location='https://t-square.gatech.edu/portal/tool/3a34f619-99d1-4548-be57-9ee977fd8127?criteria=opendate&amp;panel=Main&amp;sakai_action=doSort'; return false;"
            title="Sort by section"> Open </a>
        </th>
        <th id="dueDate">
          <a href="#"
            onclick="location='https://t-square.gatech.edu/portal/tool/3a34f619-99d1-4548-be57-9ee977fd8127?criteria=duedate&amp;panel=Main&amp;sakai_action=doSort'; return false;"
            title="Sort by due date"> Due </a>
        </th>
      </tr>
      <tr>
        <td headers="attachments" class="attach">
          <img id="attachment1" src="/library/image/sakai/attachments.gif?panel=Main" alt="Attachments" width="13" height="11" border="0"/>
        </td>
        <td headers="title">
          <h4><a href="url">Project 7</a></h4>
        </td>
        <td style="padding-bottom:0"> site </td>
        <td headers="status"> Submitted Jul 24, 2013 12:24 am </td>
        <td headers="openDate"> Jul 19, 2013 12:00 pm </td>
        <td headers="dueDate"> Jul 26, 2013 11:55 pm </td>
      </tr>
      <tr>
        <td headers="attachments" class="attach">
          <img id="attachment2" src="/library/image/sakai/attachments.gif?panel=Main" alt="Attachments" width="13" height="11" border="0"/>
        </td>
        <td headers="title">
          <h4><a href="url">Project 6</a></h4>
        </td>
        <td style="padding-bottom:0"> site </td>
        <td headers="status"> Submitted Jul 19, 2013 4:33 am </td>
        <td headers="openDate"> Jul 11, 2013 12:00 pm </td>
        <td headers="dueDate"> Jul 18, 2013 11:55 pm </td>
      </tr>
      <tr>
        <td headers="attachments" class="attach">
          <img id="attachment3" src="/library/image/sakai/attachments.gif?panel=Main" alt="Attachments" width="13" height="11" border="0"/>
        </td>
        <td headers="title">
          <h4><a href="url">Project 5</a></h4>
        </td>
        <td style="padding-bottom:0"> site </td>
        <td headers="status"> Submitted Jul 10, 2013 11:37 pm </td>
        <td headers="openDate"> Jun 27, 2013 12:00 pm </td>
        <td headers="dueDate"> Jul 10, 2013 11:55 pm </td>
      </tr>
      <tr>
        <td headers="attachments" class="attach">
          <img id="attachment4" src="/library/image/sakai/attachments.gif?panel=Main" alt="Attachments" width="13" height="11" border="0"/>
        </td>
        <td headers="title">
          <h4><a href="url">Threads Practice </a></h4>
        </td>
        <td style="padding-bottom:0"> site </td>
        <td headers="status"> Not Started </td>
        <td headers="openDate"> Jun 27, 2013 12:00 pm </td>
        <td headers="dueDate"> Jun 27, 2013 12:05 pm </td>
      </tr>
      <tr>
        <td headers="attachments" class="attach">
          <img id="attachment5" src="/library/image/sakai/attachments.gif?panel=Main" alt="Attachments" width="13" height="11" border="0"/>
        </td>
        <td headers="title">
          <h4><a href="url">Project 4</a></h4>
        </td>
        <td style="padding-bottom:0"> site </td>
        <td headers="status"> Submitted Jun 27, 2013 4:58 am </td>
        <td headers="openDate"> Jun 20, 2013 1:00 am </td>
        <td headers="dueDate"> Jun 26, 2013 11:55 pm </td>
      </tr>
      <tr>
        <td headers="attachments" class="attach">
          <img id="attachment6" src="/library/image/sakai/attachments.gif?panel=Main" alt="Attachments" width="13" height="11" border="0"/>
        </td>
        <td headers="title">
          <h4><a href="url">Project 3</a></h4>
        </td>
        <td style="padding-bottom:0"> site </td>
        <td headers="status"> Submitted Jun 20, 2013 3:19 am </td>
        <td headers="openDate"> Jun 6, 2013 12:00 pm </td>
        <td headers="dueDate"> Jun 19, 2013 11:55 pm </td>
      </tr>
      <tr>
        <td headers="attachments" class="attach">
          <img id="attachment7" src="/library/image/sakai/attachments.gif?panel=Main" alt="Attachments" width="13" height="11" border="0"/>
        </td>
        <td headers="title">
          <h4><a href="url">Project 2</a></h4>
        </td>
        <td style="padding-bottom:0"> site </td>
        <td headers="status"> Submitted Jun 5, 2013 5:39 am </td>
        <td headers="openDate"> May 28, 2013 12:00 pm </td>
        <td headers="dueDate"> Jun 4, 2013 11:55 pm </td>
      </tr>
      <tr>
        <td headers="attachments" class="attach">
          <img id="attachment8" src="/library/image/sakai/attachments.gif?panel=Main" alt="Attachments" width="13" height="11" border="0"/>
        </td>
        <td headers="title">
          <h4><a href="url">Project 1: Processor Design</a></h4>
        </td>
        <td style="padding-bottom:0"> site </td>
        <td headers="status"> Submitted May 31, 2013 2:09 am </td>
        <td headers="openDate"> May 16, 2013 1:40 pm </td>
        <td headers="dueDate"> May 30, 2013 11:55 pm </td>
      </tr>
    </table>
  </form>
</div>

What I want to get out is the assignment title, status, open date, and due date.
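For the goal stated here, once a table object is in hand the four wanted fields could be picked out of each row by position. This is a minimal sketch under assumptions: it assumes each row is an array reference of cell text in the column order of the HTML above (attachments, title, For, status, open date, due date), that cells may carry surrounding whitespace or be undef, and that the header row is skipped separately. `pick_fields`, `@WANTED` and the sample row are hypothetical and not part of HTML::TableExtract. (The module also documents a `headers` option for selecting columns by header text, which may be a simpler route.)

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumed column positions in the assignments table shown above:
# 0 attachments, 1 title, 2 For, 3 status, 4 open date, 5 due date.
my @WANTED = (1, 3, 4, 5);

# Hypothetical helper: return the trimmed title, status, open and due cells.
sub pick_fields {
    my ($row) = @_;
    my @out;
    for my $i (@WANTED) {
        my $cell = defined $row->[$i] ? $row->[$i] : '';
        $cell =~ s/^\s+|\s+$//g;    # strip leading/trailing whitespace
        push @out, $cell;
    }
    return @out;
}

# Sample row shaped like the "Project 7" row of the HTML above.
my $row = [ '', "\n  Project 7\n", ' site ',
            ' Submitted Jul 24, 2013 12:24 am ',
            ' Jul 19, 2013 12:00 pm ', ' Jul 26, 2013 11:55 pm ' ];

print join(' | ', pick_fields($row)), "\n";
```

In a real loop you would drop the header row first (for example `my @rows = $table->rows; shift @rows;`) and call pick_fields on each remaining row.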


2 Answers


As ysth suggested, your problem is here:

my $table = $te->tables;

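tables is plural, which hints that it is meant to be called in list context, but you are calling it in scalar context. In Perl, many functions that return a list return the length of that list when called in scalar context. tables is one of them, so $table is set to 1 (the number of tables found). You can't call a method on a number (well, not without autobox).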
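To see the context rule in isolation, here is a minimal core-Perl sketch. `find_tables` is a hypothetical stand-in, not part of HTML::TableExtract; it assumes, as this answer describes for tables, that the method ends with something like `return @found`, so a scalar assignment receives the element count rather than an object.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for a list-returning method such as tables().
# Because it ends with "return @found", the caller's context decides
# what comes back.
sub find_tables {
    my @found = ('first table', 'second table');
    return @found;
}

my $count  = find_tables();    # scalar context: an array yields its length
my @tables = find_tables();    # list context: all elements

print "scalar context: $count\n";
print "list context:   ", join(' | ', @tables), "\n";
print "ref(\$count) is '", ref($count), "' (a plain number, not an object)\n";
```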

Try this:

my ($table) = $te->tables;

The parentheses around $table make this a list assignment: $table receives the first table found, and any others are discarded.
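A short sketch of what the parenthesized assignment does, using a hypothetical list-returning sub (`matching_tables` is invented for illustration and only stands in for $te->tables). The second line shows how to keep the remaining elements if you do need them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical list-returning sub standing in for $te->tables.
sub matching_tables { return ('table 1', 'table 2', 'table 3') }

my ($table)        = matching_tables();   # list assignment: first element only
my ($first, @rest) = matching_tables();   # first element plus the rest

print "table: $table\n";
print "rest:  @rest\n";
```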

answered 2013-10-02T06:52:44.930

The docs say:

tables()

Return table objects for all tables that matched. Returns an empty list if no tables matched.

It expects to be called like this:

my @tables = $te->tables();

Apparently it didn't find any matching table, so nothing was returned.
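Whichever explanation applies, a defensive version of the question's loop calls tables in list context and stops with a clear message when nothing matched. Because this sketch cannot assume HTML::TableExtract is installed, `StubExtract` and `StubTable` are hypothetical stand-ins that only mimic the tables and rows calls used in the question, and `first_table_csv` is an illustrative helper; with the real module you would build `$te` and call `parse_file` as in the question instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-ins mimicking only tables() and rows();
# they are NOT part of HTML::TableExtract.
package StubTable;
sub new  { my ($class, @rows) = @_; return bless { rows => \@rows }, $class }
sub rows { my $self = shift; return @{ $self->{rows} } }

package StubExtract;
sub new    { my ($class, @tables) = @_; return bless { tables => \@tables }, $class }
sub tables { my $self = shift; return @{ $self->{tables} } }

package main;

# Return each row of the first matched table as a CSV line,
# or die if no table matched.
sub first_table_csv {
    my ($te) = @_;
    my @tables = $te->tables;               # list context, as the docs expect
    die "no tables matched\n" unless @tables;
    my ($table) = @tables;                  # first table; ignore the rest
    my @lines;
    for my $row ($table->rows) {
        push @lines, join(',', map { defined $_ ? $_ : '' } @$row);
    }
    return @lines;
}

my $te = StubExtract->new(
    StubTable->new([ 'Project 7', 'Submitted' ], [ 'Project 6', undef ]),
);
print "$_\n" for first_table_csv($te);

my $ok = eval { first_table_csv(StubExtract->new()); 1 };
print "empty input: $@" unless $ok;
```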

Perhaps you could provide a stripped-down version of the HTML that still demonstrates the problem, and tell us what you expect to happen?

answered 2013-10-02T05:17:30.503