
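I am writing a Python script that scans a YouTube video for the usernames of the people who commented on it and writes those usernames to a file.

I am using the YouTube API, and when I print the entire response for a comment_entry, I can see the comment author.

Is there a way to isolate the username?

For example, entering 9bZkp7q19f0 (Gangnam Style) as the video_id yields the following (among the values for the first comment):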

<?xml version='1.0' encoding='UTF-8'?>
<ns0:entry xmlns:ns0="http://www.w3.org/2005/Atom" xmlns:ns1="http://gdata.youtube.com/schemas/2007"><ns0:category scheme="http://schemas.google.com/g/2005#kind" term="http://gdata.youtube.com/schemas/2007#comment" /><ns0:id>http://gdata.youtube.com/feeds/api/videos/9bZkp7q19f0/comments/LZQPQhLyRh9VQaVtT18UUKqLpyWBdytJ7B-JRTu0cf8</ns0:id><ns0:author><ns0:name>THAsweatyGamer</ns0:name><ns0:uri>https://gdata.youtube.com/feeds/api/users/THAsweatyGamer</ns0:uri></ns0:author><ns0:content type="text">sometimes but not always</ns0:content><ns0:updated>2013-05-17T12:30:27.000Z</ns0:updated><ns0:published>2013-05-17T12:30:27.000Z</ns0:published><ns0:title type="text">sometimes but not ...</ns0:title><ns0:link href="https://gdata.youtube.com/feeds/api/videos/9bZkp7q19f0?client=TJNP_YT_BOT" rel="related" type="application/atom+xml" /><ns0:link href="https://www.youtube.com/watch?v=9bZkp7q19f0" rel="alternate" type="text/html" /><ns0:link href="https://gdata.youtube.com/feeds/api/videos/9bZkp7q19f0/comments/LZQPQhLyRh-b_np-G6TRfbDU8xlXaRcR_qXeRfla_vo?client=TJNP_YT_BOT" rel="http://gdata.youtube.com/schemas/2007#in-reply-to" type="application/atom+xml" /><ns0:link href="https://gdata.youtube.com/feeds/api/videos/9bZkp7q19f0/comments/LZQPQhLyRh9VQaVtT18UUKqLpyWBdytJ7B-JRTu0cf8?client=TJNP_YT_BOT" rel="self" type="application/atom+xml" /><ns1:videoid>9bZkp7q19f0</ns1:videoid></ns0:entry>

I want to isolate <ns0:author><ns0:name>THAsweatyGamer</ns0:name><ns0:uri>https://gdata.youtube.com/feeds/api/users/THAsweatyGamer</ns0:uri></ns0:author> so that I can write the username to a file. Using comment_entry.author yields:

[<atom.Author object at 0x02CE5B50>]
[<atom.Author object at 0x02CE5EB0>]
[<atom.Author object at 0x02CED230>]
[<atom.Author object at 0x02CED5B0>]
[<atom.Author object at 0x02CED910>]
[<atom.Author object at 0x02CEDCD0>]
[<atom.Author object at 0x02CF6070>]
[<atom.Author object at 0x02CF63D0>]
[<atom.Author object at 0x02CF6750>]
[<atom.Author object at 0x02CF6B10>]
[<atom.Author object at 0x02CF6E90>]
[<atom.Author object at 0x03591210>]
[<atom.Author object at 0x03591590>]
[<atom.Author object at 0x03591950>]
[<atom.Author object at 0x03591CD0>]
[<atom.Author object at 0x0359B050>]
[<atom.Author object at 0x0359B3D0>]
[<atom.Author object at 0x0359B750>]
[<atom.Author object at 0x0359BAD0>]
[<atom.Author object at 0x0359BE50>]
[<atom.Author object at 0x035A31D0>]
[<atom.Author object at 0x035A3530>]
[<atom.Author object at 0x035A3890>]
[<atom.Author object at 0x035A3BF0>]

My script (so far) is:

import gdata.youtube
import gdata.youtube.service

yt_service = gdata.youtube.service.YouTubeService()
yt_service.ssl = True
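yt_service.developer_key = 'YOUR_DEVELOPER_KEY'  # placeholder: your developer key
yt_service.client_id = 'YOUR_CLIENT_ID'  # placeholder: your client ID
yt_service.source = 'YOUR_CLIENT_ID'  # placeholder: your client ID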

video_id = raw_input("Enter the video's ID")

comment_feed = yt_service.GetYouTubeVideoCommentFeed(video_id=video_id)
for comment_entry in comment_feed.entry:
  print comment_entry.author

1 Answer


You can use an XML parser to extract the data you are looking for. Here is a quick example using Python's ElementTree XML API:

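import xml.etree.ElementTree as ET

# Parse the saved comment entry; the root element is <ns0:entry>
tree = ET.parse('youtube.xml')
root = tree.getroot()

# root[2] is the <author> element (the third child of the entry)
print(root[2][0].text)  # <name>: the username
print(root[2][1].text)  # <uri>: the author's user feed URL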

This gives you the following output:

THAsweatyGamer
https://gdata.youtube.com/feeds/api/users/THAsweatyGamer

Note: youtube.xml in the sample code above is a file containing the YouTube XML output you included in your question.
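Indexing by position (root[2]) only works while the author element stays the third child of the entry. A slightly more defensive variant, offered as a minimal sketch, looks the elements up by their Atom namespace instead and parses the XML from a string rather than a file. The names ATOM_NS, SAMPLE_ENTRY, and extract_author are hypothetical helpers introduced here for illustration, and SAMPLE_ENTRY is a trimmed copy of the entry from the question.

```python
import xml.etree.ElementTree as ET

# Namespace map for the Atom elements; the URI is taken from the question's XML.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Trimmed copy of the comment entry from the question (illustrative only).
SAMPLE_ENTRY = (
    '<ns0:entry xmlns:ns0="http://www.w3.org/2005/Atom">'
    '<ns0:id>http://gdata.youtube.com/feeds/api/videos/9bZkp7q19f0/comments/'
    'LZQPQhLyRh9VQaVtT18UUKqLpyWBdytJ7B-JRTu0cf8</ns0:id>'
    '<ns0:author>'
    '<ns0:name>THAsweatyGamer</ns0:name>'
    '<ns0:uri>https://gdata.youtube.com/feeds/api/users/THAsweatyGamer</ns0:uri>'
    '</ns0:author>'
    '<ns0:content type="text">sometimes but not always</ns0:content>'
    '</ns0:entry>'
)


def extract_author(entry_xml):
    """Return (name, uri) for the first author of an Atom entry string.

    Returns (None, None) if the entry has no author element.
    """
    root = ET.fromstring(entry_xml)
    # Look up the author by qualified name instead of by child position.
    author = root.find("atom:author", ATOM_NS)
    if author is None:
        return None, None
    name = author.findtext("atom:name", default=None, namespaces=ATOM_NS)
    uri = author.findtext("atom:uri", default=None, namespaces=ATOM_NS)
    return name, uri


name, uri = extract_author(SAMPLE_ENTRY)
print(name)
print(uri)
```

To collect usernames in a file, the returned name could be appended line by line inside the loop over comment entries, for example with open('usernames.txt', 'a'); that file name is also just an assumption.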

Answered 2013-05-17T14:39:28.070