python - Table needs to be scraped with scrapy

Question

There is a table that needs to be scraped with scrapy. The data is in the following format:

<table>

<tr class="colhead">
<td width="170">MON, NOV 11</td>
<td width="80">Item</td>
<td width="60" align="center"></td>
<td width="210">Item</td>
<td width="220">Item</td>
</tr>

<tr class="oddrow">
<td> Item </a></td>
<td> Item </td>
<td align="center"> Item </td>
<td></td>
<td> Item </td>
</tr>

<tr class="evenrow">
<td> Item </a></td>
<td> Item </td>
<td align="center"> Item </td>
<td></td>
<td> Item </td>
</tr>


</table>

I can get the full list of items with:

items = hxs.select('//table[@class="tablehd"]//td//text()').extract()

How would I split them into one item per row and then assign the cell data to td1 - td5?

score 2 · Accepted Answer

Following the shell example in the tutorial, you should first get the <tr> elements and then get the <td> elements from each of them, like this:

rows = hxs.select('//tr')
for row in rows:
    print row.select('td/text()').extract()

rows will be a list of HtmlXPathSelector objects. You iterate over it and, from each object, extract the text of the <td> elements in the current <tr>.

row.select('td/text()').extract() will be a list containing the text of each cell in the given row:

[u'MON, NOV 11', u'Item', u'Item', u'Item']
[u' Item ', u' Item ', u' Item ', u' Item ']
[u' Item ', u' Item ', u' Item ', u' Item ']
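
Note that each list above has four entries although every row has five cells: td/text() returns nothing for an empty <td>, so the positions shift. To assign td1 - td5 reliably, one option is to select each <td> separately and join its text nodes. The following is a minimal sketch under that assumption; the helper names cells_to_item and parse_rows and the td1..td5 keys are illustrative and not part of Scrapy. It uses the same select() calls as above (newer Scrapy versions use xpath() instead).

```python
def cells_to_item(cell_texts, n_columns=5):
    """Map one row's cell texts onto the keys td1..tdN (illustrative names)."""
    # Pad short rows with empty strings so every item has the same keys.
    padded = list(cell_texts) + [''] * (n_columns - len(cell_texts))
    return dict(('td%d' % (i + 1), padded[i].strip()) for i in range(n_columns))


def parse_rows(hxs):
    """Build one dict per <tr>; hxs is assumed to be an HtmlXPathSelector."""
    items = []
    for row in hxs.select('//tr'):
        cell_texts = []
        for cell in row.select('td'):
            # Join all text nodes inside the cell; an empty <td> gives ''.
            cell_texts.append(''.join(cell.select('.//text()').extract()))
        items.append(cells_to_item(cell_texts))
    return items
```

Calling parse_rows(hxs) inside the spider's parse method would give one dict per row. For the oddrow above, td4 would be the empty string rather than being skipped.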
