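It has been a while since I last wrote code, and I am getting back into it for a research project. Right now I am scraping a practice website to find out what I will need to do for the actual site.

Everything works the way I want, except that when I write the scraped data to a CSV file, each value lands on a new row instead of in the next column of the same row.

I have added links to the output below. Let me know what I need to change, because I can't figure it out.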

import csv, re
import requests 
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
page = requests.get(URL)
with open('testScraperEX.csv', 'w') as f:
    write = csv.writer(f)
    soup = BeautifulSoup(page.content, "html.parser")
    write.writerow(['Title', 'Company', 'Location'])
    results = soup.find(id="ResultsContainer")
    job_elements = results.find_all("div", class_="card-content")
    for job_element in job_elements:
        title_element = job_element.find("h2", class_="title")
        company_element = job_element.find("h3", class_="company")
        location_element = job_element.find("p", class_="location")
        Title = title_element.text.strip()
        Company = company_element.text.strip()
        Location = location_element.text.strip()
        write.writerows([[Title],[Company],[Location]])

This is the current output (link 1)

This is how I want the output to look (link 2)

Thanks :)

1 Answer

import csv
import requests 
from bs4 import BeautifulSoup

URL = "https://realpython.github.io/fake-jobs/"
page = requests.get(URL)
with open('testScraperEX.csv', 'w', newline='') as f:
    write = csv.writer(f)
    soup = BeautifulSoup(page.content, "html.parser")
    write.writerow(['Title', 'Company', 'Location'])
    results = soup.find(id="ResultsContainer")
    job_elements = results.find_all("div", class_="card-content")
    for job_element in job_elements:
        title_element = job_element.find("h2", class_="title")
        company_element = job_element.find("h3", class_="company")
        location_element = job_element.find("p", class_="location")
        Title = title_element.text.strip()
        Company = company_element.text.strip()
        Location = location_element.text.strip()
        write.writerow([Title, Company, Location])

Output in the CSV file:

Title,Company,Location
Senior Python Developer,"Payne, Roberts and Davis","Stewartbury, AA"
Energy engineer,Vasquez-Davidson,"Christopherville, AA"
Legal executive,"Jackson, Chambers and Levy","Port Ericaburgh, AA"
Fitness centre manager,Savage-Bradley,"East Seanview, AP"
Product manager,Ramirez Inc,"North Jamieview, AP"
... and many more lines ...
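
The code above also opens the file with newline=''. That follows the csv module's documented recommendation: the writer emits its own '\r\n' line endings, and without newline='' some platforms (notably Windows) translate them again, which shows up as a blank line between rows. A minimal sketch of the effect, assuming a throwaway temporary file (the path and file name here are hypothetical):

```python
import csv
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.csv")

# newline='' hands line-ending control to the csv writer.
with open(path, "w", newline="") as f:
    csv.writer(f).writerow(["Title", "Company", "Location"])

# Read the raw bytes back to see exactly what was written.
with open(path, "rb") as f:
    raw = f.read()

print(raw)
```

With newline='' the row ends in a single '\r\n' regardless of platform.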

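This line in your code

write.writerows([[Title],[Company],[Location]])

writes 3 rows, because each element is unnecessarily wrapped in its own list (i.e. you have a list of 3 single-element lists). This

write.writerow([Title, Company, Location])

writes one row: a list of 3 elements.

Note the two differences: write.writerows vs write.writerow, and [[Title],[Company],[Location]] vs [Title, Company, Location].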
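
To see the difference without scraping anything, here is a small sketch that writes the same three values both ways into an in-memory buffer. The helper name rows_written and the sample values (taken from the output above) are only for illustration:

```python
import csv
import io


def rows_written(use_writerows):
    """Write one job both ways and return the resulting CSV lines."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    title = "Energy engineer"
    company = "Vasquez-Davidson"
    location = "Christopherville, AA"
    if use_writerows:
        # A list of three single-element lists: three rows, one column each.
        writer.writerows([[title], [company], [location]])
    else:
        # One list of three elements: one row, three columns.
        writer.writerow([title, company, location])
    return buf.getvalue().splitlines()


print(rows_written(True))
print(rows_written(False))
```

The first call yields three lines with one value each, and the second yields a single line with all three values separated by commas.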

Answered 2022-02-13T16:42:28.747