
I'm pulling JSON data from the facebook-graph-api for:

  1. relationships between me and my friends
  2. relationships among my friends.

Right now my program looks like this (in Python pseudocode; note that some variables have been changed for privacy):

import json
import requests

# protected
_accessCode = "someAccessToken"
_accessStr = "?access_token=" + _accessCode
_myID = "myIDNumber"

r = requests.get("https://graph.facebook.com/" + _myID + "/friends/" + _accessStr)
raw = json.loads(r.text)

terminate = len(raw["data"])

# list used to store the friend/friend relationships
a = list()

for j in range(0, terminate):  # note: terminate + 1 here would index past the end of raw["data"]
    # calculate terminating displacement:
    term_displacement = terminate - (j + 1) 
    print("Currently processing: " + str(j) + " of " + str(terminate))
    for dj in range(1, term_displacement + 1):
        # construct urls based on the raw data:
        url = "https://graph.facebook.com/" + raw["data"][j]["id"] + "/friends/" + raw["data"][j + dj]["id"] + "/" + _accessStr
        # visit site *THIS IS THE BOTTLENECK*:
        reqTemp = requests.get(url)
        rawTemp = json.loads(reqTemp.text)
        if len(rawTemp["data"]) != 0:
            # data dumps to list which dumps to file
            a.append(str(raw["data"][j]["id"]) + "," + str(rawTemp["data"][0]["id"]))

outputFile = "C:/Users/franklin/Documents/gen/friendsRaw.csv"
output = open(outputFile, "w")

# write all me/friend relationship to file
for k in range(0, terminate):
    output.write(_myID + "," + raw["data"][k]["id"] + "\n")

# write all friend/friend relationships to file
for i in range(0, len(a)):
    output.write(a[i] + "\n")

output.close()

So what it does is: first it calls my page and grabs my friends list (this is allowed through the Facebook API using an access_token). Calling a friend's friends list is not allowed, but I can work around it by requesting the relationship between a friend on my list and another friend on my list. So in the second part (the double for loop) I make another request to check whether some friend a is also a friend of b (both on my list); if so, the response is a JSON object of length 1 containing friend a's name.

But with roughly 357 friends that means checking every pair, on the order of 357 · 356 / 2 ≈ 63,500 page requests. In other words, the program spends most of its time just waiting on JSON responses.

My question is: can this be rewritten to be more efficient? Calling a friend's friends-list attribute is currently disallowed for security reasons, and the API doesn't appear to offer a way around that. Are there any Python tricks that could make this run faster, perhaps by making the requests in parallel?
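For illustration, here is a minimal sketch of that parallel idea using a thread pool from the standard library's concurrent.futures. It reuses raw, terminate, and _accessStr from the code above; the check_pair helper and the worker count of 10 are assumptions for the sketch, not part of the original program:

import json
import requests
from concurrent.futures import ThreadPoolExecutor

def check_pair(pair):
    # return "idA,idB" if the two friends know each other, else None
    id_a, id_b = pair
    url = ("https://graph.facebook.com/" + id_a + "/friends/"
           + id_b + "/" + _accessStr)
    data = json.loads(requests.get(url).text)["data"]
    return id_a + "," + id_b if data else None

# every unordered pair of friends from the list fetched above
pairs = [(raw["data"][j]["id"], raw["data"][k]["id"])
         for j in range(terminate) for k in range(j + 1, terminate)]

# run up to 10 requests at a time and keep only the confirmed friendships
with ThreadPoolExecutor(max_workers=10) as pool:
    a = [result for result in pool.map(check_pair, pairs) if result]

Note that parallelism only hides the network latency; the total number of requests (and therefore the API quota consumed) stays the same.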

Update: the modified code is pasted in the answers section below.


2 Answers


Update: here is the solution I came up with. Thanks to @DMCS for the FQL suggestion, but I decided to just use what I had. I'll post an FQL solution when I get a chance to study the implementation. As you can see, this method just takes advantage of a leaner API call: the mutualfriends endpoint needs one request per friend instead of one per pair.

Incidentally, the API call limit is 600 calls per 600 seconds, per token and per IP, so for every unique IP address with a unique access token the budget works out to 1 call per second. I'm not sure what that implies for asynchronous calling, @Gerrat, but there it is. (A simple throttling sketch follows the code below.)

import json
import requests

# protected
_accessCode = "someaccesscode"
_accessStr = "?access_token=" + _accessCode
_myID = "someidnumber"

r = requests.get("https://graph.facebook.com/" 
    + _myID + "/friends/" + _accessStr)
raw = json.loads(r.text)

terminate = len(raw["data"])

a = list()
for k in range(0, terminate - 1):
    friendID = raw["data"][k]["id"]
    friendName = raw["data"][k]["name"]
    url = ("https://graph.facebook.com/me/mutualfriends/" 
        + friendID + _accessStr)
    req = requests.get(url)
    temp = json.loads(req.text)
    print("Processing: " + str(k + 1) + " of " + str(terminate))
    for j in range(0, len(temp["data"])):
        a.append(friendID + "," + temp["data"][j]["id"] + "," 
            + friendName + "," + temp["data"][j]["name"])

# dump contents to file:
outputFile = "C:/Users/franklin/Documents/gen/friendsRaw.csv"
output = open(outputFile, "w", encoding="utf-8")  # utf-8 handles non-ASCII names directly
print("Dumping to file...")
# write all me/friend relationships to file
for k in range(0, terminate):
    output.write(_myID + "," + raw["data"][k]["id"] 
        + ",me," + raw["data"][k]["name"] + "\n")

# write all friend/friend relationships to file
for i in range(0, len(a)):
    output.write(a[i] + "\n")

output.close()  
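To stay under the 1-call-per-second budget mentioned above, a simple sleep-based throttle could be placed in front of each request. This is only a sketch under that stated assumption; the Throttle class and its 1.0-second interval are illustrative, not part of the solution above:

import time

class Throttle:
    # block so that successive calls are at least `interval` seconds apart
    def __init__(self, interval=1.0):
        self.interval = interval
        self.last = 0.0

    def wait(self):
        delay = self.last + self.interval - time.time()
        if delay > 0:
            time.sleep(delay)
        self.last = time.time()

throttle = Throttle(interval=1.0)  # 600 calls / 600 s = 1 call per second
# call throttle.wait() immediately before each requests.get(...) above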
Answered 2013-01-03T21:31:38.370

This may not be optimal, but here is your code tweaked slightly to make the requests asynchronously (untested). It uses the grequests package, the successor to the old requests.async module, and it binds each friend's id into its response hook, since a bare loop variable would already have changed by the time the callbacks fire:

import json
import requests
import grequests  # successor to the old requests.async module

# protected
_accessCode = "someAccessToken"
_accessStr = "?access_token=" + _accessCode
_myID = "myIDNumber"

r = requests.get("https://graph.facebook.com/" + _myID + "/friends/" + _accessStr)
raw = json.loads(r.text)

terminate = len(raw["data"])

# list used to store the friend/friend relationships
a = list()

def make_hook(friendID):
    # bind friendID now; the callbacks run later, inside grequests.map()
    def add_to_list(response, *args, **kwargs):
        rawTemp = json.loads(response.text)
        if len(rawTemp["data"]) != 0:
            # data dumps to list which dumps to file
            a.append(friendID + "," + str(rawTemp["data"][0]["id"]))
    return add_to_list

async_list = []
for j in range(0, terminate):
    # calculate terminating displacement:
    term_displacement = terminate - (j + 1)
    print("Currently processing: " + str(j) + " of " + str(terminate))
    for dj in range(1, term_displacement + 1):
        # construct urls based on the raw data:
        url = "https://graph.facebook.com/" + raw["data"][j]["id"] + "/friends/" + raw["data"][j + dj]["id"] + "/" + _accessStr

        req = grequests.get(url, hooks={'response': make_hook(raw["data"][j]["id"])})
        async_list.append(req)

# fire the requests concurrently and wait for all responses
grequests.map(async_list)

outputFile = "C:/Users/franklin/Documents/gen/friendsRaw.csv"
output = open(outputFile, "w")
for i in range(0, len(a)):
    output.write(a[i] + "\n")
output.close()
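Two caveats: the hooks only fire while grequests.map() is running, which is why each friend's id has to be captured at request-creation time; and firing thousands of requests at once will run into the 600-calls-per-600-seconds limit mentioned above, so capping concurrency via grequests.map's size parameter may be necessary.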
Answered 2012-12-31T22:00:23.603