
I'm using Web::Scraper on some nasty nested tables that have no CSS styling. I had to learn XPath, and I keep getting tripped up.

Update: I fixed some XPath problems; the only remaining question is about attributes.

#!perl
use warnings;
use Web::Scraper;
use Data::Dumper;

my $html = do { local $/; <DATA> };
my $scraper = scraper {
    # Wrong! There is no 'tbody' element in this HTML.
    # process ".//[@id='cfg-surface-detail']/center/table/tbody/tr/td[2]/select",
    # I copied the XPath from Chrome, which inserts tbody elements when it renders bad HTML.
    # I also changed the start of the XPath from './/' to '//*',
    # which selects any element anywhere in the document (here, the one with the matching id).
    process "//*[@id='cfg-surface-detail']/center/table/tr/td[2]/select",
        'sensorType[]' => 'TEXT';
};

my $res = $scraper->scrape($html);
print Dumper($res);

__DATA__
<html><head><title>...</title></head>
<body>
    <form action="/foo" method=post id=cfg-surface-detail name=cfg-surface-detail>
        <center>
        <table bgcolor="#FFFFFF">
            <tr><td>Sensor Type</td><td>
            <select name="cfg-sensor-type"  >
                <option value="1 Fred's Sensor" selected>Fred's Sensor
                <option value="2 Other">Other Sensor
            </select>
            </td></tr>
        </table>
        </center>
    </form>
</body>
</html>

The output is now:

$VAR1 = {
      'sensorType' => [
                        'Fred\'s Sensor Other Sensor '
                      ]
    };

So I'm getting closer. Now, how do I specify the <option> that has the selected attribute?

Update: solved. The XPath is //*[@id="cfg-surface-detail"]/center/table/tr/td[2]/select/option[@selected]

This helped: http://www.w3schools.com/xpath/xpath_syntax.asp
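
For reference, here is a minimal sketch of the original script with the solved XPath dropped in. The markup is the same as above, inlined in a heredoc so the snippet stands alone. The whitespace trimming at the end and the `@selected` variable name are my own additions, not part of the original script.

```perl
#!perl
use strict;
use warnings;
use Web::Scraper;

# Same markup as in the question, inlined so the sketch is self-contained.
my $html = <<'HTML';
<html><head><title>...</title></head>
<body>
    <form action="/foo" method=post id=cfg-surface-detail name=cfg-surface-detail>
        <center>
        <table bgcolor="#FFFFFF">
            <tr><td>Sensor Type</td><td>
            <select name="cfg-sensor-type"  >
                <option value="1 Fred's Sensor" selected>Fred's Sensor
                <option value="2 Other">Other Sensor
            </select>
            </td></tr>
        </table>
        </center>
    </form>
</body>
</html>
HTML

my $scraper = scraper {
    # option[@selected] keeps only the <option> elements that carry a selected attribute
    process '//*[@id="cfg-surface-detail"]/center/table/tr/td[2]/select/option[@selected]',
        'sensorType[]' => 'TEXT';
};

my $res = $scraper->scrape($html);

# Assumption: trim the whitespace the parser leaves around the option text
my @selected = map { my $t = $_; $t =~ s/^\s+|\s+$//g; $t } @{ $res->{sensorType} || [] };
print "$_\n" for @selected;
```

Because the result key is `sensorType[]`, the value is still an array reference, so this should also cope with a multi-select that has more than one selected option.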


3 Answers

#!perl
use warnings;
use Web::Scraper;
use Data::Dumper;

my $html = do { local $/; <DATA> };
my $scraper = scraper {
      # CSS descendant selector; '#cfg-surface-detail//select' mixed CSS id syntax with XPath '//'
      process '#cfg-surface-detail select',
        'sensorType[]' => 'TEXT';
};

my $res = $scraper->scrape($html);
print Dumper($res);

__DATA__
<html><head><title>...</title></head>
<body>
    <form action="/foo" method=post id=cfg-surface-detail name=cfg-surface-detail>
        <center>
        <table bgcolor="#FFFFFF">
            <tr><td>Sensor Type</td><td>
            <select name="cfg-sensor-type"  >
                <option value="1 Fred's Sensor" selected>Fred's Sensor
                <option value="2 Other">Other Sensor
            </select>
            </td></tr>
        </table>
        </center>
    </form>
</body>
</html>
answered 2013-01-29T07:02:45.000

If it were me, I'd go with CSS. The CSS selector for the selected option is:

'select[name="cfg-sensor-type"] option[selected]'
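
As a rough sketch of how that selector might slot into a scraper: Web::Scraper treats a selector that does not start with `/` or `id(` as CSS, so it can be passed to `process` directly. The result keys `text` and `value` are names I picked for illustration, and pulling the `value` attribute via `'@value'` is an extra the question did not ask for.

```perl
#!perl
use strict;
use warnings;
use Web::Scraper;

# Markup from the question, inlined so the sketch is self-contained.
my $html = <<'HTML';
<html><head><title>...</title></head>
<body>
    <form action="/foo" method=post id=cfg-surface-detail name=cfg-surface-detail>
        <center>
        <table bgcolor="#FFFFFF">
            <tr><td>Sensor Type</td><td>
            <select name="cfg-sensor-type"  >
                <option value="1 Fred's Sensor" selected>Fred's Sensor
                <option value="2 Other">Other Sensor
            </select>
            </td></tr>
        </table>
        </center>
    </form>
</body>
</html>
HTML

my $scraper = scraper {
    # CSS selector: the selected <option> inside the named <select>
    process 'select[name="cfg-sensor-type"] option[selected]',
        'text'  => 'TEXT',      # visible label of the option
        'value' => '@value';    # contents of its value attribute
};

my $res = $scraper->scrape($html);

# The label may carry trailing whitespace from the markup, so trim it
(my $label = $res->{text}) =~ s/^\s+|\s+$//g;
print "label: $label\n";
print "value: $res->{value}\n";
```

Selecting by the `name` attribute avoids depending on the table layout, which is the main advantage over the long positional XPath.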
answered 2013-03-23T22:55:42.840

Here is an answer that differs a little from both of the previous ones:

$scraper = scraper {
    process '//select[@name="cfg-sensor-type"]/option[@selected]', 'SensorType' => 'TEXT';
};
answered 2015-01-05T18:27:16.370