csv - Problem using spamwriter.writerow to output data scraped with Beautiful Soup into a two-column csv

Question

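I am using Beautiful Soup to scrape two sets of data from a website, and I want them written side by side as two columns in a csv file. I am using spamwriter.writerow([x,y]) for this, but I think some error in my loop structure is giving me the wrong output in the csv file. Here is the code for reference: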

import csv
import urllib2
import sys  
from bs4 import BeautifulSoup
page = urllib2.urlopen('http://www.att.com/shop/wireless/devices/smartphones.html').read()
soup = BeautifulSoup(page)
soup.prettify()
with open('Smartphones_20decv2.0.csv', 'wb') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=',')        
    for anchor in soup.findAll('a', {"class": "clickStreamSingleItem"},text=True):
        if anchor.string:
            print unicode(anchor.string).encode('utf8').strip()         

    for anchor1 in soup.findAll('div', {"class": "listGrid-price"}):
        textcontent = u' '.join(anchor1.stripped_strings)
        if textcontent:
            print textcontent
            spamwriter.writerow([unicode(anchor.string).encode('utf8').strip(),textcontent])

The output I get in the csv is:

Samsung FocusÂ® 2 (Refurbished) $99.99
Samsung FocusÂ® 2 (Refurbished) $99.99 to $199.99 8 to 16 GB
Samsung FocusÂ® 2 (Refurbished) $0.99
Samsung FocusÂ® 2 (Refurbished) $0.99
Samsung FocusÂ® 2 (Refurbished) $149.99 to $349.99 16 to 64 GB

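The problem is that column 1 contains only one device name instead of all of them, while the prices for all devices do appear. Please excuse my ignorance, as I am new to programming.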

score 1 · Accepted Answer

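You are using anchor.string rather than anchor1. By the time the second loop runs, anchor is the last item from the previous loop, not the item of the current loop.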
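The effect can be reproduced without any scraping. The following is a minimal sketch in Python 3; the `names` and `prices` lists are made-up stand-ins, not data from the page:

```python
# Hypothetical stand-ins for the scraped values.
names = ["Phone A", "Phone B", "Phone C"]
prices = ["$99.99", "$0.99", "$149.99"]

for name in names:
    pass  # the first loop only iterates; nothing is stored

# After the loop ends, `name` still refers to the last element of `names`.
rows = []
for price in prices:
    rows.append([name, price])

print(rows)
```

Every row ends up with the same name, which matches the symptom in the question: one device name repeated next to every price.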

Perhaps clearer variable names would help avoid the confusion here; singleitem and gridprice, maybe?

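Perhaps I misunderstood, and you want to combine each anchor1 with its corresponding anchor. In that case you have to loop over them together, perhaps using zip():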

items = soup.findAll('a', {"class": "clickStreamSingleItem"},text=True)
prices = soup.findAll('div', {"class": "listGrid-price"})
for item, price in zip(items, prices):
    textcontent = u' '.join(price.stripped_strings)
    if textcontent:
        print textcontent
        spamwriter.writerow([unicode(item.string).encode('utf8').strip(),textcontent])
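For readers on Python 3, the same pairing might look like the sketch below. It assumes the names and prices have already been extracted as text (the two lists here are invented), and it writes to an in-memory buffer instead of a real file so that it runs on its own:

```python
import csv
import io

# Hypothetical stand-ins for the text extracted from the two findAll() results.
names = ["Phone A", "Phone B", "Phone C"]
prices = ["$99.99", "$0.99 to $199.99", "$149.99"]

# With a real file in Python 3, use open(path, "w", newline="") rather than "wb";
# an in-memory buffer keeps this sketch self-contained.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",")
for name, price in zip(names, prices):
    writer.writerow([name.strip(), price])

output = buffer.getvalue()
print(output)
```

Note that zip() stops at the shorter of its inputs, so if one list has fewer entries than the other, the extra items are dropped without any error.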

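Generally, it should be easier to loop over the parent table rows and then find the cells of each row inside the loop. But zip() should work too, as long as the clickStreamSingleItem cells line up with the listGrid-price matches.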
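A rough sketch of the parent-row approach, in Python 3 with bs4, is shown below. The markup is invented for illustration: the wrapping `div` with class `item` is an assumption, and the real page would need to be inspected to find the actual container element.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the "item" wrapper class is an assumption,
# not taken from the real page.
html = """
<div class="item">
  <a class="clickStreamSingleItem">Phone A</a>
  <div class="listGrid-price">$99.99</div>
</div>
<div class="item">
  <a class="clickStreamSingleItem">Phone B</a>
  <div class="listGrid-price">$0.99 <span>to</span> $199.99</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
rows = []
for item in soup.find_all("div", class_="item"):
    # Search inside the current container only, so name and price stay paired.
    name = item.find("a", class_="clickStreamSingleItem")
    price = item.find("div", class_="listGrid-price")
    if name is None or price is None:
        continue  # skip containers that lack either field
    rows.append([name.get_text(strip=True), " ".join(price.stripped_strings)])

print(rows)
```

Because each name and price is looked up within the same container, a device with a missing price is skipped rather than shifting every later row out of alignment, which is the risk with zip().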
