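I'm very new to Python and I'm having trouble with my code. I'm trying to parse the XML returned from Gracenote, but I keep running into problems. This is the code I'm using to try to extract the artist name.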

import urllib.request

from lxml import etree

queryXML=b'QUERIES><LANG>eng</LANG><AUTH>/
    +<CLIENT>a_client_id</CLIENT>/
    +<USER>a_user_id</USER>/
    +</AUTH><QUERY CMD="ALBUM_SEARCH"><TEXT TYPE="ARTIST">oasis</TEXT>/
    +<TEXT TYPE="ALBUM_TITLE"></TEXT>/
    +<TEXT TYPE="TRACK_TITLE">wonderwall</TEXT></QUERY></QUERIES>'

response = urllib.request.urlopen("https://c3172608.web.cddbp.net/webapi/xml/1.0/", queryXML)

root = etree.parse(response).getroot()


artist = item.find('ARTIST').text

print(artist)

The error I get is:

Traceback (most recent call last):
  File "C:\Users\Aidan Howie\Documents\University\First Year\EE106 Group Project\frankocean.py", line 8, in <module>
    root = etree.parse(response).getroot()
  File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src\lxml\lxml.etree.c:69955)
  File "parser.pxi", line 1769, in lxml.etree._parseDocument (src\lxml\lxml.etree.c:102257)
  File "parser.pxi", line 1789, in lxml.etree._parseFilelikeDocument (src\lxml\lxml.etree.c:102516)
  File "parser.pxi", line 1684, in lxml.etree._parseDocFromFilelike (src\lxml\lxml.etree.c:101442)
  File "parser.pxi", line 1134, in lxml.etree._BaseParser._parseDocFromFilelike (src\lxml\lxml.etree.c:97069)
  File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src\lxml\lxml.etree.c:91275)
  File "parser.pxi", line 683, in lxml.etree._handleParseResult (src\lxml\lxml.etree.c:92461)
  File "parser.pxi", line 622, in lxml.etree._raiseParseError (src\lxml\lxml.etree.c:91757)
  File "<string>", line None
lxml.etree.XMLSyntaxError: Document is empty, line 1, column 1

Can anyone help? I've been struggling with this for a while.

1 Answer

Below is a quick and dirty fix based on your code. It runs on Python 2.7. I hope it helps.

import urllib2
import StringIO
from lxml import etree

queryXML='''
<QUERIES>
    <LANG>eng</LANG>
    <AUTH>
        <CLIENT>a_client_id</CLIENT>
        <USER>a_user_id</USER>
    </AUTH>
    <QUERY CMD="ALBUM_SEARCH">
        <TEXT TYPE="ARTIST">oasis</TEXT>
        <TEXT TYPE="ALBUM_TITLE"></TEXT>
        <TEXT TYPE="TRACK_TITLE">wonderwall</TEXT>
    </QUERY>
</QUERIES>
'''.strip()

request = urllib2.Request("https://cxxxxxxx.web.cddbp.net/webapi/xml/1.0/", queryXML)
response = urllib2.urlopen(request)

response_page = response.read()
tree = etree.parse(StringIO.StringIO(response_page))
root = tree.getroot()

artist = root.find('.//ARTIST').text

print artist
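
Since the question uses Python 3 (`urllib.request`), here is a rough sketch of the same approach ported to Python 3. It is only an illustration under a few assumptions: the helper names `first_artist` and `fetch_first_artist` are made up for this example, the URL and credentials are the same placeholders as above, and the standard-library `xml.etree.ElementTree` stands in for lxml so the sketch runs without extra packages (lxml's `etree` offers the same `parse` and `find` calls).

```python
import io
import urllib.request
import xml.etree.ElementTree as ET

# Placeholder endpoint and credentials, as in the code above; substitute real values.
URL = "https://cxxxxxxx.web.cddbp.net/webapi/xml/1.0/"

QUERY_XML = """
<QUERIES>
    <LANG>eng</LANG>
    <AUTH>
        <CLIENT>a_client_id</CLIENT>
        <USER>a_user_id</USER>
    </AUTH>
    <QUERY CMD="ALBUM_SEARCH">
        <TEXT TYPE="ARTIST">oasis</TEXT>
        <TEXT TYPE="ALBUM_TITLE"></TEXT>
        <TEXT TYPE="TRACK_TITLE">wonderwall</TEXT>
    </QUERY>
</QUERIES>
""".strip().encode("utf-8")  # on Python 3, POST data must be bytes


def first_artist(response_bytes):
    """Parse a response body and return the first ARTIST text, or None."""
    tree = ET.parse(io.BytesIO(response_bytes))
    node = tree.getroot().find(".//ARTIST")
    return node.text if node is not None else None


def fetch_first_artist(url=URL, query=QUERY_XML):
    """POST the query and return the first artist name (hypothetical helper)."""
    request = urllib.request.Request(url, data=query)
    with urllib.request.urlopen(request) as response:
        return first_artist(response.read())
```

With a real client ID and user ID filled in, `print(fetch_first_artist())` would be the equivalent of the `print artist` line above.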

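Note that the response returns multiple results, and this code prints only the first one. Error handling is also missing here, so please use it only as a starting point.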
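
As a possible next step, the sketch below collects every artist instead of just the first and guards against an empty or malformed body, which is the situation behind the "Document is empty" error. It is written for Python 3 with the standard-library `xml.etree.ElementTree` so it runs on its own. The function name `extract_artists` and the sample payload are invented for illustration; the real Gracenote response layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample payload for illustration only; not a real Gracenote response.
SAMPLE_RESPONSE = b"""<RESPONSES>
  <RESPONSE STATUS="OK">
    <ALBUM ORD="1"><ARTIST>Oasis</ARTIST><TITLE>Album A</TITLE></ALBUM>
    <ALBUM ORD="2"><ARTIST>Oasis</ARTIST><TITLE>Album B</TITLE></ALBUM>
  </RESPONSE>
</RESPONSES>"""


def extract_artists(payload):
    """Return the text of every ARTIST element, or [] if the payload is unusable."""
    if not payload or not payload.strip():
        # An empty body is what makes the parser report an empty document.
        return []
    try:
        root = ET.fromstring(payload)
    except ET.ParseError:
        # Malformed XML: treat it as "no results" rather than crashing.
        return []
    # iter() walks the whole tree, so every match is collected, not just the first.
    return [el.text for el in root.iter("ARTIST") if el.text]


print(extract_artists(SAMPLE_RESPONSE))
```

Returning an empty list keeps the caller simple; raising a custom exception instead would be a reasonable choice if the caller needs to tell "no matches" apart from "bad response".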

Answered 2014-03-27T02:54:43.817