I want to query the Twitter API to retrieve the follower IDs of specific users so that I can map their connections.

When I run the code below, the follower IDs come back identical for every single user, which can't be right:

    try:
        import json
    except ImportError:
        import simplejson as json
        import urllib2
        import urllib
        import codecs
        import time
        import datetime
        import os
        import random
        import time
        import tweepy
    from tweepy.parsers import RawParser
        import sys

    fhLog = codecs.open("LOG.txt",'a','UTF-8')
    def logPrint(s):
        fhLog.write("%s\n"%s)
        print s

    #List of screennames of users whose followers we want to get
    users =["_AReichert",
    "_CindyWallace_",
    "_MahmoudAbdelal",
    "1939Ford9N",
    "1FAMILY2MAN",
    "8Amber8",
    "AboutTeaching",
    "AcamorAcademy",
    "acraftymom",
    "ActivNews",
    "ActuVideosPub",
    "ad_jonez",
    "adamsteaching",
    "ADHD_HELP",
    "AIHEHistory",
    "ajpodchaski",
    "ak2mn",
    "AkaMsCrowley",
    "AlanAwstyn",
    "albertateachers"]


    # == OAuth Authentication ==


    # The consumer keys can be found on your application's Details
    # page located at https://dev.twitter.com/apps (under "OAuth settings")
    consumer_key=""
    consumer_secret=""

    # After the step above, you will be redirected to your app's page.
    # Create an access token under the "Your access token" section
    access_token=""
    access_token_secret=""


    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)

    rawParser = RawParser()
    api = tweepy.API(auth_handler=auth, parser=rawParser)


    #Will store ids of followers for each user in the user_output directory
    os.system("mkdir -p user_output") #Create directory if it does not exist

    userCnt=0
    fhOverall=None
    for user in users:
        userCnt+=1
        print("Getting user %s of %s"%(userCnt,len(users)))
        count=1
        nCursor=-1#First page
        while count>0:
            id_str=user

            try:
               fh=open("user_output/"+str(id_str)+"_" + str(count) + ".json","r")
               result=fh.read()
               fh.close()
               wait=0
            except: 
               result=api.followers_ids(count=5000,user_id=id_str,cursor=nCursor)
               fh=open("user_output/"+str(id_str)+"_" + str(count) + ".json","w")
               fh.write(result)
               fh.close()
               wait=60


            result=json.loads(result)
            nCursor=result["next_cursor_str"]
            if nCursor=="0":
                count=-1
                nCursor=None
            else:
                count+=1
                print("Another page to get")

            time.sleep(wait)



    logPrint("\nDONE! Completed Successfully")
    fhLog.close()    

How can I fix this?

2 Answers

This probably won't answer your question, but there are indentation problems in your imports. Try this:

    try:
        import json
    except ImportError:
        import simplejson as json
    import urllib2
    import urllib
    import codecs
    import time
    import datetime
    import os
    import random
    import tweepy
    from tweepy.parsers import RawParser
    import sys

Also, you can create the directory with the os module directly instead of shelling out to mkdir. Try this:

    if not os.path.exists("./user_output"):
        os.makedirs("./user_output")

Finally, you call time.sleep(wait), but wait might not be set. Try this:

    if api.followers_ids(count=5000,user_id=id_str,cursor=nCursor):
        time.sleep(60)
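
One way to make sure `wait` is always defined is to pull the "read the cached page, otherwise fetch and save it" step into a small helper that returns the delay together with the data. This is only a sketch: `load_or_fetch`, `fetch`, and `wait_after_fetch` are hypothetical names that do not appear in the question, and `fetch` stands in for whatever call returns the raw JSON text.

```python
import os


def load_or_fetch(path, fetch, wait_after_fetch=60):
    """Return (text, wait) for one cached page.

    If `path` exists, its contents are returned with wait 0, since no
    request was made. Otherwise `fetch()` is called, the text is saved
    to `path`, and `wait_after_fetch` seconds is returned as the delay.
    """
    if os.path.exists(path):
        with open(path, "r") as fh:
            return fh.read(), 0
    text = fetch()
    with open(path, "w") as fh:
        fh.write(text)
    return text, wait_after_fetch
```

In the question's loop this would be used as `result, wait = load_or_fetch(filename, some_fetch_function)`, followed by `time.sleep(wait)`. Checking `os.path.exists` instead of using a bare `except:` also avoids treating unrelated errors as a cache miss.
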
answered 2013-08-06T15:49:20.787

The tweepy documentation indicates that the only parameters api.followers_ids accepts are id, user_id, or screen_name, not the three arguments you are passing.

http://pythonhosted.org/tweepy/html/api.html#api-reference

You also need to assign the returned value to the result variable. Get rid of the if statement and put this in its place:

    result=api.followers_ids(id_str)
    wait=60
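
A possible explanation for the identical results, offered as a guess rather than something verified against the API: the `users` list holds screen names, but the question passes them as `user_id`. The sketch below passes each name as `screen_name` and follows the cursor until it reaches `"0"`. It assumes an `api` object whose `followers_ids` accepts `screen_name` and `cursor` keyword arguments and returns raw JSON text (as with `RawParser` in the question); `fetch_follower_ids` and `pause_seconds` are hypothetical names introduced here.

```python
import json
import time


def fetch_follower_ids(api, screen_name, pause_seconds=60):
    """Collect all follower IDs for one screen name, page by page.

    Assumes api.followers_ids(screen_name=..., cursor=...) returns a JSON
    string containing "ids" and "next_cursor_str".
    """
    ids = []
    cursor = -1  # -1 asks for the first page
    while True:
        raw = api.followers_ids(screen_name=screen_name, cursor=cursor)
        page = json.loads(raw)
        ids.extend(page["ids"])
        cursor = page["next_cursor_str"]
        if cursor == "0":  # "0" means there are no further pages
            break
        time.sleep(pause_seconds)  # pause between pages for rate limiting
    return ids
```

It could then be called once per entry, for example `followers = dict((u, fetch_follower_ids(api, u)) for u in users)`. The 60-second default mirrors the `wait=60` used in the question.
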
answered 2013-08-07T10:44:00.510