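I'm new to Python and BeautifulSoup. Any help is much appreciated.
I know how to build the list of information for a single company, but only after clicking through to that company's link.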
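import requests
from bs4 import BeautifulSoup

url = "http://data-interview.enigmalabs.org/companies/"
r = requests.get(url)
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(r.content, "html.parser")

# Print every link on the listing page together with its text
links = soup.find_all("a")
for link in links:
    print(link.get("href"), link.text)

# The company table appears to sit inside this container
g_data = soup.find_all("div", {"class": "table-responsive"})

# Collect the links; list.append() returns None, so print the list afterwards
link_list = []
for link in links:
    link_list.append(link)
print(link_list)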
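Can anyone give me an idea of how to first scrape the links and then build a JSON of the listing data for all companies on the site?
I have also attached a sample image for better visualization.
How would I scrape the site and build JSON like the example below without clicking each individual link?
Sample expected output: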
all_listing = [
    {"Dickens-Tillman": {'Company Detail': {
        'Company Name': 'Dickens-Tillman',
        'Address Line 1 ': '7147 Guilford Turnpike Suit816',
        'Address Line 2 ': 'Suite 708',
        'City': 'Connfurt',
        'State': 'Iowa',
        'Zipcode ': '22598',
        'Phone': '00866539483',
        'Company Website ': 'lockman.com',
        'Company Description': 'enable robust paradigms'}}},
    {"Klein-Powlowski": {'Company Detail': {
        'Company Name': 'Klein-Powlowski',
        'Address Line 1 ': '32746 Gaylord Harbors',
        'Address Line 2 ': 'Suite 866',
        'City': 'Lake Mario',
        'State': 'Kentucky',
        'Zipcode ': '45517',
        'Phone': '1-299-479-5649',
        'Company Website ': 'marquardt.biz',
        'Company Description': 'monetize scalable paradigms'}}},
]
print(all_listing)
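
For reference, here is a rough sketch of the two-step approach I have in mind: collect the company links from the listing page, then parse each detail page into a dictionary. It rests on assumptions about the markup that I have not verified: that the company links sit inside the "table-responsive" div, and that each detail page lists its fields as two-cell table rows (label, value). The names company_links, company_detail, build_listing and the fetch parameter are made up for illustration, and pagination of the listing is not handled.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "http://data-interview.enigmalabs.org/companies/"


def company_links(listing_html, base_url=BASE_URL):
    """Return absolute URLs for the links found inside the listing table."""
    soup = BeautifulSoup(listing_html, "html.parser")
    urls = []
    # Assumption: company links live inside <div class="table-responsive">
    for div in soup.find_all("div", {"class": "table-responsive"}):
        for a in div.find_all("a", href=True):
            urls.append(urljoin(base_url, a["href"]))
    return urls


def company_detail(detail_html):
    """Turn one detail page into a {label: value} dictionary.

    Assumption: every field is a table row with exactly two cells.
    """
    soup = BeautifulSoup(detail_html, "html.parser")
    detail = {}
    for row in soup.find_all("tr"):
        cells = row.find_all("td")
        if len(cells) == 2:
            label = cells[0].get_text(strip=True)
            detail[label] = cells[1].get_text(strip=True)
    return detail


def build_listing(fetch, start_url=BASE_URL):
    """Fetch the listing page, follow each company link, collect the details.

    `fetch` is any callable that maps a URL to the page's HTML text.
    """
    all_listing = []
    for url in company_links(fetch(start_url), start_url):
        detail = company_detail(fetch(url))
        # Fall back to the URL if the page has no "Company Name" row
        name = detail.get("Company Name", url)
        all_listing.append({name: {"Company Detail": detail}})
    return all_listing
```

With requests, this could presumably be called as build_listing(lambda u: requests.get(u).text), and json.dumps(all_listing, indent=2) would then produce the JSON text. Passing the fetch function in keeps the parsing testable against saved HTML without hitting the site.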