Sorry for posting this again. I get UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 45: ordinal not in range(128) when I run strip_tags() from the following code:

from HTMLParser import HTMLParser
class MLStripper(HTMLParser):
    def __init__(self):
        self.reset()
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)

    def get_data(self):
        return ''.join(self.fed)


def strip_tags(html):
    s = MLStripper()
    s.feed(html)
    return s.get_data()
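
For comparison, here is a minimal sketch of the same stripper on Python 3, where `html.parser` works on `str` throughout, so byte strings and unicode are never mixed implicitly. It assumes Python 3 is an option and that the input has already been decoded to text; `strip_tags_py3` is a name made up for this illustration, not part of the code above.

```python
from html.parser import HTMLParser


class MLStripper(HTMLParser):
    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain
        # text and delivers them through handle_data.
        super().__init__(convert_charrefs=True)
        self.fed = []

    def handle_data(self, d):
        # Collect every text node; tags and attributes are ignored.
        self.fed.append(d)

    def get_data(self):
        return "".join(self.fed)


def strip_tags_py3(html):
    # Hypothetical Python 3 counterpart of strip_tags; expects str, not bytes.
    s = MLStripper()
    s.feed(html)
    s.close()  # flush any text still buffered by the parser
    return s.get_data()
```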

on this string:

"<p>We’re implementing the PayPal MECL library in a client’s app but we’re experiencing some poor user experience that we don’t seem to be able to change. \nWhen the PayPal experience is complete, PayPal show a “Please wait while we transfer you to the business site...” message. Obviously this is an iOS app not a “business site”...</p>\n\n<p>The flow functions by dismissing the web view on completion of the PayPal experience by listening for new URL requests within the UIWebViewDelegate method:</p>\n\n<pre><code>- (BOOL)webView:(UIWebView *)webView shouldStartLoadWithRequest:(NSURLRequest *)request navigationType:(UIWebViewNavigationType)navigationType\n</code></pre>\n\n<p>This issue seems to be that PayPal update their web view with the message via editing the DOM (JS or some such) which does not create a new web request and therefor no shouldStartLoadWithRequest fired. Note: A new request is made after a second or so when redirected but that’s too late, the inappropriate copy has been presented to the user.</p>\n\n<p>Has anyone working with MECL on iOS or Android managed to alter this copy/experience either via the <a href=\"https://cms.paypal.com/uk/cgi-bin/?cmd=_render-content&amp;content_ID=developer/e_howto_api_nvp_r_SetExpressCheckout\" rel=\"nofollow\">SetExpressCheckout</a> server call or configuration of the <a href=\"https://cms.paypal.com/uk/cgi-bin/?cmd=_render-content&amp;content_ID=developer/e_howto_api_WPECOnMobileDevices\" rel=\"nofollow\">MECL URL get params</a>?I ’ve been unable to find a resolution on this so far but will post a solution if we find one. Any help would be greatly appreciated as we don’t seem to be able to find a solution in PayPals documentation...</p>\n\n<p><strong>NOTE:</strong> Also we have a similar UX issue when pressing the cancel button on the PayPal web view that causes a redirect, but with a similar bad piece of copy presented before hand “Cancel this purchase and return to the seller’s website?”. 
This is worded as a confirmation dialogue but there are no buttons presented and it redirects anyway. Mad UX. Again if anyone knows a solution to either if these please post.</p>\n\n<p><img src=\"http://i.stack.imgur.com/gc4zq.png\" alt=\"&quot;Please wait while we transfer you to the business site...&quot; image\"></p>\n\n<p><img src=\"http://i.stack.imgur.com/cztum.png\" alt=\"&quot;Cancel this purchase and return to the seller’s website?&quot; image\"></p>\n"

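I am processing 6 million documents, and so far (10% done) this error message is what I have run into. If I call a.decode("utf-8") before calling the strip_tags function, I can fix the problem for the text above, but then my code stops working for the rest of my texts.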
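
A minimal sketch of the decode step mentioned above, written so that it only decodes byte strings and leaves text that is already unicode untouched. The post does not say why the other texts fail, so this rests on two assumptions: that some inputs are already unicode objects, and that some are not valid UTF-8. The helper name `to_text` and the `cp1252` fallback are choices made for this illustration.

```python
def to_text(raw, encodings=("utf-8", "cp1252")):
    """Return text, decoding only when given a byte string."""
    if not isinstance(raw, bytes):
        # Already text (unicode on Python 2, str on Python 3): leave it alone.
        return raw
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going and mark undecodable bytes with U+FFFD.
    return raw.decode(encodings[0], errors="replace")
```

The idea would be to call `strip_tags(to_text(a))` instead of `strip_tags(a.decode("utf-8"))`, so the parser always receives one consistent text type.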

Any ideas on what I can do? I am tempted to just use a regular expression to strip the HTML tags (I know that is wrong).

Thank you.
