
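In the first part of my code, I get a list of 20 users who have used the word "gym" in a tweet. This part works fine.

In the second part, I try to take the usernames obtained in the first part and fetch each user's 20 most recent tweets.

My current code runs without any errors, but it definitely does not return 20 tweets for each person I got in the first part. All it does is return the last row of the first part's results.

My code is below. As you can see, I tried to use the list created in the first part, tweets, as the id input in the second part, and I used [2] in an attempt to call only the third column of the list (where the usernames are).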

import tweepy
from tweepy import OAuthHandler
import pandas as pd

access_token = ''
access_token_secret = ''
consumer_key = ''
consumer_secret = ''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

tweets = []

count = 20

for tweet in tweepy.Cursor(api.search, q="gym"+'-filter:retweets', since='2020-02-08', tweet_mode='extended',
                           lang='en').items(count):

    try:
        data = [tweet.full_text, tweet.user.screen_name]
        data = tuple(data)
        tweets.append(data)

    except tweepy.TweepError as e:
        print(e.reason)
        continue

    except StopIteration:
        break

df = pd.DataFrame(tweets,
                  columns=['Tweet', '@ Name'])

print(df)

new_tweets = []

username = tweets[1]
count = 20

for user in tweepy.Cursor(api.user_timeline, id=username, tweet_mode='extended').items(count):

    try:
        data = [tweet.full_text, tweet.user.screen_name]
        data = tuple(data)
        new_tweets.append(data)

    except tweepy.TweepError as e:
        print(e.reason)
        continue

    except StopIteration:
        break

df2 = pd.DataFrame(new_tweets, columns=['Tweets', '@ Name'])

print(df2)

df2.to_csv('test3.csv')

Here is my output:

                                                Tweet          @ Name
0            Gym chronicles                             chocodilish
1   @neilmcrowther @SpotifyUK I have a Spotify pla...    carey_bamber
2   Food pick-up for virtual learners today 9:00-1...    allentrotter
3                         couldn’t sleep so gym it is   esmeraldahdz_
4   We need I.D. to buy beer, to buy ciggies, we n...       beryl1946
5   So I actually have to go to the gym to have a ...       ___tshego
6   Currently three Marcela Bielsa lookalikes in t...     sammyptweet
7   I’m dreading going to the gym and coming back ...   cinnamonKayyy
8   yes we were there... what the fuck is going on...        blubbsie
9   @IamEzeNwanyi @LilburnEnugu @mr_robmichael @He...        _lilivet
10                                   GYM WEEK 2  LEGO        Mondo_92
11  Webinars for this week are as follows,\nBrain ...    EdCentreMayo
12    I rather be wakin up for the gym than work tbh.    illmindofPAT
13  First day back in the gym doing BASKETBALL  ...  AUMWarhawksWBB
14  i don’t wanna go to school today since i know ...  CEOofTsuyuAsui
15  @sunikies GYM DHSHSKDSH (i miss it :( ), indiv...       shienIove
16  @PaulMumba_ Is that gym work I'm seeing on tha...       jaymaxgie
17  Body builders on Instagram don’t go to the gym...  OfficialShann_
18                      @DivinePooh gym and game room     FinesseDee2
19  I use to wake up to go to the gym at this hour...    missgenafire
20
                                              Tweets        @ Name
0  I use to wake up to go to the gym at this hour...  missgenafire

Process finished with exit code 0

Any help would be greatly appreciated, many thanks.


2 Answers


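A few things:

  1. You are not iterating over the 20 different usernames. You hard-coded it to use only one: username = tweets[1]. Even then, that is a tuple ('tweetmessage', 'username'), so the string you want is at index position 1 there, i.e. tweets[1][1].

  2. You iterate with user as the loop variable, but then refer to the tweet variable. At that point tweet still holds the last item from the first loop, which is why the output shows the last row of the first result: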

     for user in tweepy.Cursor(api.user_timeline, id=username, tweet_mode='extended').items(count):
    
         try:
             data = [user.full_text, user.user.screen_name]  #<-- correct
             #data = [tweet.full_text, tweet.user.screen_name]  #<-- incorrect
             ...
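
Both points can be reproduced without calling the Twitter API. The following is only a minimal sketch with made-up data: the tweets list and the timeline list are stand-ins (assumptions, not real API output) for what the search loop and a user timeline would return.

```python
# Stand-in for the (text, screen_name) tuples collected by the search loop.
tweets = [
    ("Gym chronicles", "chocodilish"),
    ("couldn't sleep so gym it is", "esmeraldahdz_"),
    ("I use to wake up to go to the gym", "missgenafire"),
]

# After a for loop ends, its loop variable still holds the last item.
for tweet in tweets:
    pass

# Hypothetical timeline items for one user (plain strings for illustration).
timeline = ["status A", "status B"]

# Buggy pattern: the loop variable is `status`, but the body reads `tweet`.
buggy = []
for status in timeline:
    buggy.append(tweet)  # stale variable left over from the earlier loop

# Fixed pattern: the body uses the variable of the loop it is in.
fixed = []
for status in timeline:
    fixed.append(status)

# tweets[1] is a whole tuple; tweets[1][1] is the screen name string.
second_row = tweets[1]
second_name = tweets[1][1]

print(buggy)
print(fixed)
print(second_row, second_name)
```

The buggy list contains the last search result once per timeline item, while the fixed list contains the timeline items themselves.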
    

Full code:

import tweepy
from tweepy import OAuthHandler
import pandas as pd

access_token = ''
access_token_secret = ''
consumer_key = ''
consumer_secret = ''

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

tweets = []

count = 20

for tweet in tweepy.Cursor(api.search, q="gym"+'-filter:retweets', since='2020-02-08', tweet_mode='extended',
                           lang='en').items(count):

    try:
        data = [tweet.full_text, tweet.user.screen_name]
        data = tuple(data)
        tweets.append(data)

    except tweepy.TweepError as e:
        print(e.reason)
        continue

    except StopIteration:
        break

df = pd.DataFrame(tweets,
                  columns=['Tweet', '@ Name'])

print(df)

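new_tweets = []
count = 20
# Loop over every (text, screen_name) tuple collected in the first part
for tweet, username in tweets:
    # Fetch up to `count` recent statuses from this user's timeline
    for user in tweepy.Cursor(api.user_timeline, id=username, tweet_mode='extended').items(count):

        try:
            # `user` holds one status from this timeline
            data = [user.full_text, user.user.screen_name]
            data = tuple(data)
            new_tweets.append(data)

        except tweepy.TweepError as e:
            print(e.reason)
            continue

        except StopIteration:
            break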

df2 = pd.DataFrame(new_tweets, columns=['Tweets', '@ Name'])

print(df2)

df2.to_csv('test3.csv')
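
The nested loop can also be factored into a small function, which makes it testable without network access. This is only a sketch under stated assumptions: collect_timelines and fake_fetch are hypothetical names that do not come from tweepy, and in real use the fetch function would presumably wrap the tweepy.Cursor(api.user_timeline, ...) call shown above. The sketch also skips repeated usernames, since the same user can appear more than once in the search results.

```python
def collect_timelines(usernames, fetch_timeline, per_user=20):
    """Return (text, screen_name) rows for each distinct username."""
    rows = []
    # dict.fromkeys drops duplicates while keeping first-seen order.
    for name in dict.fromkeys(usernames):
        for text in fetch_timeline(name, per_user):
            rows.append((text, name))
    return rows


def fake_fetch(name, limit):
    # Hypothetical stand-in for a real timeline request.
    return [f"{name} tweet {i}" for i in range(limit)]


# Stand-in for the tuples gathered by the search loop.
tweets = [
    ("Gym chronicles", "chocodilish"),
    ("GYM WEEK 2", "Mondo_92"),
    ("another gym tweet", "chocodilish"),
]

# Pull the screen name out of each (text, screen_name) tuple.
usernames = [name for _text, name in tweets]

rows = collect_timelines(usernames, fake_fetch, per_user=2)
print(rows)
```

The resulting rows have the same (text, screen_name) shape as new_tweets, so they could be passed to pd.DataFrame in the same way.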
answered 2020-09-25T08:24:32.287

In your second for loop, you are still using the tweet variable from the first for loop. You should use the user variable instead.

answered 2020-09-21T01:17:49.767