python - How to get all YouTube comments with Python's gdata module?

Question

I want to get all the comments for a given video, rather than going through them one page at a time.

from gdata import youtube as yt
from gdata.youtube import service as yts

client = yts.YouTubeService()
client.ClientLogin(username, pwd) #the pwd might need to be application specific fyi

comments = client.GetYouTubeVideoCommentFeed(video_id='the_id')
a_comment = comments.entry[0]

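The code above lets you grab a single comment, probably the most recent one, but I'm looking for a way to grab all of the comments at once. Is this possible with Python's gdata module?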

For reference: the YouTube API documentation for comments, the comment feed documentation, and the Python API documentation.

score 7 · Accepted Answer

The following does what you asked for using the Python YouTube API:

from gdata.youtube import service

USERNAME = 'username@gmail.com'
PASSWORD = 'a_very_long_password'
VIDEO_ID = 'wf_IIbT8HGk'

def comments_generator(client, video_id):
    # Fetch the first page of the comment feed, then follow its "next" links
    comment_feed = client.GetYouTubeVideoCommentFeed(video_id=video_id)
    while comment_feed is not None:
        for comment in comment_feed.entry:
            yield comment
        next_link = comment_feed.GetNextLink()
        if next_link is None:
            # No further pages: stop iterating
            comment_feed = None
        else:
            comment_feed = client.GetYouTubeVideoCommentFeed(next_link.href)

client = service.YouTubeService()
client.ClientLogin(USERNAME, PASSWORD)

for comment in comments_generator(client, VIDEO_ID):
    author_name = comment.author[0].name.text
    text = comment.content.text
    print("{}: {}".format(author_name, text))

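Unfortunately, the API limits the number of entries that can be retrieved to 1000. This is the error I got when I tried a tweaked version with a hand-crafted URL parameter for GetYouTubeVideoCommentFeed: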

gdata.service.RequestError: {'status': 400, 'body': 'You cannot request beyond item 1000.', 'reason': 'Bad Request'}

Note that the same principle should apply to retrieving entries from the API's other feeds.
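As a rough sketch of that idea, the paging loop from comments_generator can be written without reference to any particular feed. The name iter_feed_entries and its two parameters are hypothetical, not part of gdata; the only assumptions are the ones the code above already relies on, namely that a feed exposes an entry list and a GetNextLink() method, and that the fetch function accepts a URL as its first argument.

```python
def iter_feed_entries(fetch_feed, first_feed):
    """Yield every entry of a paged feed by following its "next" links.

    fetch_feed -- hypothetical callable taking a URL and returning a feed
                  (e.g. a bound method such as client.GetYouTubeVideoCommentFeed)
    first_feed -- the first page, already fetched by the caller
    """
    feed = first_feed
    while feed is not None:
        for entry in feed.entry:
            yield entry
        next_link = feed.GetNextLink()
        # Stop when the current page has no "next" link.
        feed = fetch_feed(next_link.href) if next_link is not None else None
```

With the client from the example above, this would presumably be called as iter_feed_entries(client.GetYouTubeVideoCommentFeed, client.GetYouTubeVideoCommentFeed(video_id=VIDEO_ID)). The 1000-entry limit still applies.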

If you want to hand-craft the URL parameter for GetYouTubeVideoCommentFeed, its format is:

'https://gdata.youtube.com/feeds/api/videos/{video_id}/comments?start-index={start_index}&max-results={max_results}'

The following restrictions apply: start-index <= 1000 and max-results <= 50.
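A minimal sketch of building such URLs while respecting those two limits might look like this. The helper names build_comment_feed_url and iter_comment_feed_urls are made up for illustration and are not part of gdata. The sketch also assumes that start-index is 1-based and that the last page should not reach past item 1000, which matches the error message above but is otherwise untested.

```python
FEED_URL_TEMPLATE = (
    "https://gdata.youtube.com/feeds/api/videos/{video_id}/comments"
    "?start-index={start_index}&max-results={max_results}"
)
MAX_START_INDEX = 1000      # limit quoted above
MAX_RESULTS_PER_PAGE = 50   # limit quoted above


def build_comment_feed_url(video_id, start_index=1,
                           max_results=MAX_RESULTS_PER_PAGE):
    """Return a comment feed URL, rejecting values outside the stated limits."""
    if not 1 <= start_index <= MAX_START_INDEX:
        raise ValueError("start-index must be between 1 and 1000")
    if not 1 <= max_results <= MAX_RESULTS_PER_PAGE:
        raise ValueError("max-results must be between 1 and 50")
    return FEED_URL_TEMPLATE.format(video_id=video_id,
                                    start_index=start_index,
                                    max_results=max_results)


def iter_comment_feed_urls(video_id, page_size=MAX_RESULTS_PER_PAGE):
    """Yield one URL per page, covering items 1..1000 (assumed 1-based)."""
    for start_index in range(1, MAX_START_INDEX + 1, page_size):
        # Shrink the last page so it never asks for items beyond 1000.
        remaining = MAX_START_INDEX - start_index + 1
        yield build_comment_feed_url(video_id, start_index,
                                     min(page_size, remaining))
```

Each URL could then be passed to client.GetYouTubeVideoCommentFeed in place of next_link.href, although following the feed's own "next" links as shown earlier is simpler.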

score 2 · Accepted Answer

This is the only solution I have for now, but it doesn't use the API and it gets slow when there are several thousand comments.

import bs4
import urllib2  # Python 2 only

video_id = 'XhFtHW4YB7M'  # example video ID
# Grab the page source for the video's comment page
data = urllib2.urlopen('http://www.youtube.com/all_comments?v=' + video_id)
# Pull out the comments
soup = bs4.BeautifulSoup(data)
cmnts = soup.findAll(attrs={'class': 'comment yt-tile-default'})
# Do something with them, e.g. count them
print(len(cmnts))

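Note that because 'class' is a reserved word in Python, it has to be passed in the attrs dict rather than as a regular keyword argument, and I couldn't get the usual 'startswith'-style searches via regex or lambdas to work that way. The approach is also fairly slow because of BeautifulSoup, but BeautifulSoup is needed because etree and minidom don't find the matching tags for some reason, even after prettifying the markup with bs4.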
