
There is a table that needs to be scraped with Scrapy. The data is in the following format:

<table>

<tr class="colhead">
<td width="170">MON, NOV 11</td>
<td width="80">Item</td>
<td width="60" align="center"></td>
<td width="210">Item</td>
<td width="220">Item</td>
</tr>

<tr class="oddrow">
<td> Item </a></td>
<td> Item </td>
<td align="center"> Item </td>
<td></td>
<td> Item </td>
</tr>

<tr class="evenrow">
<td> Item </a></td>
<td> Item </td>
<td align="center"> Item </td>
<td></td>
<td> Item </td>
</tr>


</table>

I can get the full list of items with:

items = hxs.select('//table[@class="tablehd"]//td//text()').extract()

How would you split them into individual items (one per row) and then assign each item's cell data to td1 - td5?

1 Answer

Following the shell example in the tutorial, you should first get the <tr> elements and then get the <td> elements from each of them, like this:

rows = hxs.select('//tr')
for row in rows:
    print row.select('td/text()').extract()

rows will be a list of HtmlXPathSelector objects that you iterate over, extracting the <td> text from the current <tr> each time.

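row.select('td/text()').extract() will be a list containing the text of each non-empty cell in the given row (an empty <td> has no text node, so it contributes nothing and each list here has four entries rather than five):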

[u'MON, NOV 11', u'Item', u'Item', u'Item']
[u' Item ', u' Item ', u' Item ', u' Item ']
[u' Item ', u' Item ', u' Item ', u' Item ']
answered 2013-07-02T20:05:44.100