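I am trying to scrape Twitter search results. The output I need is each tweet's URL, its date, the sender, and the tweet text itself. The code below runs without errors, but the result is empty, and I cannot find the problem. It would be great if you could help, since I plan to use this data in my thesis.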

from bs4 import BeautifulSoup
import urllib.request
import openpyxl
wb= openpyxl.load_workbook('dene1.xlsx')
sheet=wb.get_sheet_by_name('Sayfa1')
headers = {}
headers['User-Agent'] = "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"
url = 'https://twitter.com/search?q=TURKCELL%20lang%3Atr%20since%3A2012-01-01%20until%3A2012-01-09&src=typd&lang=tr'
req = urllib.request.Request(url, headers = headers)
resp = urllib.request.urlopen(req)
respData = resp.read()
soup = BeautifulSoup(respData , 'html.parser')
gdata = soup.find_all("div", {"class": "content"})
for item in gdata:
    try:
        items2 = item.find('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'})
        items21=items2.get('href')
        items22=items2.get('title')
    except:
        pass
    try:
        items1 = item.find('span', {'class': 'username js-action-profile-name'}).text
    except:
        pass
    try:
        items3 = item.find('p', {'class': 'TweetTextSize js-tweet-text tweet-text'}).text
        sheet1=sheet.append([items21, items22,items1,items3])
    except:
        pass
wb.save('dene1.xlsx')

Regards

1 Answer

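Every lookup inside your try blocks fails at least once, and because you catch everything with a bare except, you never see those errors. Check what each find call returns instead: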

import urllib.request
from bs4 import BeautifulSoup


headers = {
    'User-Agent': "Mozilla/5.0 (X11; Linux i686) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.27 Safari/537.17"}

url = 'https://twitter.com/search?q=TURKCELL%20lang%3Atr%20since%3A2012-01-01%20until%3A2012-01-09&src=typd&lang=tr'
req = urllib.request.Request(url, headers = headers)
resp = urllib.request.urlopen(req)
respData = resp.read()

soup = BeautifulSoup(respData, 'html.parser')
gdata = soup.find_all("div", {"class": "content"})
for item in gdata:
    # find() returns None when nothing matches, so test each result before using it
    items2 = item.find('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'}, href=True)
    if items2:
        items21 = items2.get('href')   # tweet URL
        items22 = items2.get('title')  # tweet date
        print(items21)
        print(items22)
    items1 = item.find('span', {'class': 'username js-action-profile-name'})
    if items1:
        print(items1.text)  # sender
    items3 = item.find('p', {'class': 'TweetTextSize js-tweet-text tweet-text'})
    if items3:
        print(items3.text)  # tweet text

Now you can see a lot of output.
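If you would rather keep a try block, a rough sketch of the alternative is to catch only the exception you expect and report it, so a failed lookup is visible instead of silently skipped. The helper name `text_or_none` and the logger name are made up for illustration, and the `SimpleNamespace` object only stands in for a tag returned by `item.find(...)`; this is not part of the code above.

```python
import logging
from types import SimpleNamespace

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("tweet_scraper")  # hypothetical logger name


def text_or_none(node, label):
    """Return node.text, or None (after logging a warning) if the lookup found nothing."""
    try:
        return node.text
    except AttributeError as exc:
        # find() returns None when no element matches, and None has no .text attribute
        log.warning("%s lookup failed: %s", label, exc)
        return None


# Stand-ins for the two things item.find(...) can hand back: a matching tag, or None.
found = SimpleNamespace(text="@example_user")
print(text_or_none(found, "username"))
print(text_or_none(None, "tweet text"))
```

Catching only `AttributeError` means any other problem, such as a typo in a variable name, still raises and shows a traceback, which is the information the bare except was hiding.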

answered 2016-10-24T10:03:29.047