I have an encoded URI component, "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2". In JavaScript I can convert it to "http://www.yelp.com/biz/carriage-house-café-houston-2" by applying decodeURIComponent recursively, as follows:

function recursiveDecodeURIComponent(uriComponent){
    try{
        var decodedURIComponent = decodeURIComponent(uriComponent);
        if(decodedURIComponent == uriComponent){
            return decodedURIComponent;
        }
        return recursiveDecodeURIComponent(decodedURIComponent);
    }catch(e){
        return uriComponent;
    }
}
console.log(recursiveDecodeURIComponent("http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"))

Output: "http://www.yelp.com/biz/carriage-house-café-houston-2".

I want to get the same result in Python. I tried the following:

print urllib2.unquote(urllib2.unquote(urllib2.unquote("http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2").decode("utf-8")))

But I got http://www.yelp.com/biz/carriage-house-cafÃ©-houston-2. Instead of the expected character é, I get 'Ã©', no matter how many times I call urllib2.unquote.

I am using Python 2.7.3. Can anyone help me?

1 Answer

I think a simple loop should do the trick:

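import urllib2  # Python 2 only

uri = "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"

# Keep unquoting the byte string until it stops changing.
while True:
    dec = urllib2.unquote(uri)
    if dec == uri:
        break
    uri = dec

# Decode the UTF-8 bytes to unicode only once, at the very end.
uri = uri.decode('utf8')
print '%r' % uri
# u'http://www.yelp.com/biz/carriage-house-caf\xe9-houston-2'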
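
For anyone on Python 3, where urllib2 no longer exists, the same loop can be sketched with urllib.parse.unquote, which decodes the escapes as UTF-8 by default. This is only a rough port under that assumption; the helper name fully_unquote and the max_rounds cap are mine, not part of the code above.

```python
from urllib.parse import unquote


def fully_unquote(uri, max_rounds=10):
    """Percent-decode `uri` repeatedly until it stops changing.

    `max_rounds` is a hypothetical safety cap added for this sketch.
    """
    for _ in range(max_rounds):
        decoded = unquote(uri)  # %XX escapes are decoded as UTF-8 by default
        if decoded == uri:
            break
        uri = decoded
    return uri


url = "http://www.yelp.com/biz/carriage-house-caf%25C3%25A9-houston-2"
print(fully_unquote(url))

# Treating each escaped byte as its own Latin-1 character instead
# reproduces the 'Ã©' seen in the question.
print(unquote("caf%C3%A9", encoding="latin-1"))
```

The second print hints at what probably went wrong in the question: calling .decode("utf-8") after the first unquote hands a unicode string to the later unquote calls, and as far as I can tell Python 2 then turns %C3 and %A9 into two separate characters rather than one UTF-8 sequence. Unquoting the byte string fully and decoding last avoids that.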
Answered 2013-02-05T08:16:24.540