
I am retrieving data from a URL like this:

data = urllib2.urlopen(url).read()

However, I noticed that the result contains no tags: every "<" and ">" has been replaced by a ";" character, while everything else is intact. For example:

<foo>bar</foo> becomes ;foo;bar;/foo;

How can I fix this and why is it happening?

[EDIT]: I found out how to fix it. Apparently, '<' was being replaced with '&lt;' and '>' with '&gt;', which are the HTML character entities for those signs. I still don't know why this is happening; my guess is a bug in the web service/API.
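If the response really does contain escaped entities such as `&lt;` and `&gt;`, one way to turn them back into angle brackets is the standard library's `xml.sax.saxutils.unescape`. This is only a minimal sketch: the `escaped` string below is a made-up stand-in for the service's output, not the actual response.

```python
from xml.sax.saxutils import unescape

# Hypothetical stand-in for what urllib2.urlopen(url).read() returned.
escaped = "&lt;foo&gt;bar&lt;/foo&gt;"

# unescape() converts &lt;, &gt; and &amp; back to <, > and &.
restored = unescape(escaped)
print(restored)
```

`unescape` only handles those three entities by default; other named entities would need the optional `entities` dictionary argument or a fuller HTML parser.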


1 Answer

I just ran this:

    import urllib2

    url='http://www.google.com'
    data = urllib2.urlopen(url).read()
    print data

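I get plenty of < and > characters, including in the last line, </script></body></html>.

Could you post more details, such as the URL you are trying to access and the value of data?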
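One rough way to gather those details is to count literal angle brackets against escaped entities in the payload and to look at its `repr`. The helper name `describe_markup` and the `sample` string are hypothetical, and the sketch assumes `data` is a text string.

```python
def describe_markup(data):
    """Count literal angle brackets versus escaped entities in a text payload."""
    return {
        "literal_lt": data.count("<"),
        "literal_gt": data.count(">"),
        "escaped_lt": data.count("&lt;"),
        "escaped_gt": data.count("&gt;"),
    }

# Hypothetical sample; in practice pass the string returned by read().
sample = "&lt;foo&gt;bar&lt;/foo&gt;"
print(describe_markup(sample))

# repr() shows the exact characters received, without any rendering applied.
print(repr(sample[:200]))
```

If the escaped counts are non-zero and the literal counts are zero, the server is sending entity-encoded markup rather than raw tags.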

Answered 2012-12-13T17:00:08.160