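I have a problem to solve: I have a list of Instagram users, and I need to extract the "following" list from each account. To simplify the requests, I am using a Python module called igramscraper. The script is structured as follows: I wrote two functions (one extracts the following list, the other stores it in a database) and a for loop that iterates over the usernames and calls both functions for each user. Each iteration over the users includes a time.sleep. In the extraction code, I first check that the account still exists, then make a request to find out whether the account is private, and then, if account.is_private == False, I extract its following list.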
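As I said, the loop over users sleeps for about 2 minutes per user, there is another time.sleep between the account-privacy request and the following-list request, and finally, if I get a 429 Too Many Requests error, a try/except catches it and the script sleeps for about 2 hours.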
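A fixed 2-hour sleep is one option; a common alternative is exponential backoff with jitter, where each consecutive rate-limit error lengthens the wait up to a cap. The sketch below is only an illustration under assumptions: `backoff_delays`, `call_with_backoff`, and `is_rate_limited` are hypothetical names, the default delays are arbitrary, and how igramscraper reports a 429 (here detected by looking for "429" in the exception message) is a guess, not documented behavior.

```python
import random
import time


def backoff_delays(base=60.0, factor=2.0, max_delay=7200.0, jitter=0.1, rng=random.random):
    """Yield ever-longer waits: base, base*factor, ... capped at max_delay."""
    delay = base
    while True:
        # Spread each wait by +/- `jitter` so retries do not follow a fixed schedule.
        spread = delay * jitter
        yield delay - spread + 2 * spread * rng()
        delay = min(delay * factor, max_delay)


def call_with_backoff(func, is_rate_limited, max_retries=5, sleep=time.sleep, **backoff_kwargs):
    """Call func(); on a rate-limit error wait and retry, at most max_retries times."""
    delays = backoff_delays(**backoff_kwargs)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception as exc:
            # Re-raise anything that is not a rate limit, or when retries are used up.
            if not is_rate_limited(exc) or attempt == max_retries:
                raise
            sleep(next(delays))
```

In the script this might wrap the following-list request, for example `call_with_backoff(lambda: instagram.get_following(account.identifier, account.follows_count, 30, delayed=True), lambda e: "429" in str(e))`. Giving up after `max_retries` avoids retrying forever when the block does not lift.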
The problem is that even when I wait 2 hours, Instagram only lets me make the first 100 to 150 requests and then rejects every further request I attempt.
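Since the rejections start after roughly 100 to 150 requests, one thing to try is to cap the request rate on the client side so the script stays under that count within a time window. The sketch below is a rolling-window budget; `RequestBudget` is a hypothetical helper, and any limit passed to it (for example 100 requests per hour) is an assumption drawn from the behavior described above, not a documented Instagram limit.

```python
import time
from collections import deque


class RequestBudget:
    """Allow at most `max_requests` calls in any rolling `window` seconds."""

    def __init__(self, max_requests, window, clock=time.monotonic, sleep=time.sleep):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.stamps = deque()  # times of the requests still inside the window

    def wait_time(self):
        """Return the seconds to wait before the next request fits the budget."""
        now = self.clock()
        # Forget requests that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.window:
            self.stamps.popleft()
        if len(self.stamps) < self.max_requests:
            return 0.0
        return self.window - (now - self.stamps[0])

    def acquire(self):
        """Block until a request is allowed, then record it."""
        delay = self.wait_time()
        while delay > 0:
            self.sleep(delay)
            delay = self.wait_time()
        self.stamps.append(self.clock())
```

A possible use is to create `budget = RequestBudget(max_requests=100, window=3600)` once and call `budget.acquire()` immediately before every `instagram.get_account` and `instagram.get_following` call, so both request types count against the same budget.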
Is there any way, or any advice, to avoid this problem? The code is below:
import json
import random
import time

import pandas as pd
import requests
from igramscraper.instagram import Instagram
from pymongo import MongoClient

def getUserFollowing(username):
    # Results go into a module-level list that the main loop reads.
    global followings
    followings = []
    try:
        # First request: fetch the profile; an exception here is treated as a missing account.
        account = instagram.get_account(username)
    except Exception as e:
        print("account doesn't exist or some error occurred.. " + str(e))
        return followings
    if account.is_private:
        print("account is private. Skipping...")
        return followings
    while True:
        try:
            # Second request: fetch the following list in pages of 30.
            following = instagram.get_following(account.identifier, account.follows_count, 30, delayed=True)
            # Keep only verified accounts.
            for following_user in following['accounts']:
                if following_user.is_verified:
                    followings.append(following_user.username)
            print('following scraped successfully.')
            return followings
        except Exception as e:
            print(e)
            # Back off for about 2 hours (e.g. after a 429) before retrying.
            time.sleep(2 * 60 * 60)

def queryUserFollowing(username, followings):
    # Nothing to store when the account was private, missing, or had no matches.
    if not followings:
        print('not inserted due to private account')
        return
    try:
        userself = {
            "username": username,
            "following": followings,
        }
        user_data.insert_one(userself)
        print('data added.')
    except Exception as e:
        print('Something went wrong in querying: ' + str(e))

df = pd.read_csv('/home/rootanalytics/Scrivania/follower.csv')
instagram = Instagram()
client = MongoClient()
db = client['DataUsers']
user_data = db['user-following']

def loginOne():
    # Placeholder credentials.
    usm = 'username'
    password = 'pwd'
    instagram.with_credentials(usm, password)
    log = instagram.login()
    return log

loginOne()

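for _, row in df.iterrows():
    # The username is in the first column of the CSV.
    username = row.iloc[0]
    getUserFollowing(username)
    queryUserFollowing(username, followings)
    # Pause for about 2 minutes between users.
    time.sleep(120)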