Here is my sample script:
import re
import urllib2
response = urllib2.urlopen('http://domain.tld/file')
data = response.read() # Normally displays "the emoticon &lt;3 is blah blah"
pattern = re.search(r'(the emoticon )(.*)( is blah blah)', data)
result = pattern.group(2) # result should contain "&lt;3" now
print 'The result is ' + result # prints "&lt;3" because the entity is not decoded
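As you can see, I am fetching a page and trying to extract a string from it, but the string is not encoded correctly, and I am not sure what to add to the script to make the final result come out right. Can anyone point out what I am doing wrong?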
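For reference, here is a minimal sketch of one direction that might work, assuming the problem is that the page text contains HTML entities (for example &lt; standing in for <). The sample string below is an assumption that stands in for the downloaded page, so no network request is made. The try/except import is only there so that the same snippet can run on both Python 2 and Python 3.

```python
import re

try:
    from html import unescape  # Python 3.4+
except ImportError:
    # Python 2 fallback: HTMLParser instances provide an unescape() method
    from HTMLParser import HTMLParser
    unescape = HTMLParser().unescape

# Assumed sample text; stands in for the value returned by response.read()
data = 'the emoticon &lt;3 is blah blah'

match = re.search(r'(the emoticon )(.*)( is blah blah)', data)
# Convert HTML entities in the captured group back to plain characters
result = unescape(match.group(2))
print('The result is ' + result)  # prints "The result is <3"
```

If this assumption holds, the only change to the original script would be passing the captured group through an unescape step before printing it.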