
I have a small regex problem with text extracted by Goose.

I have used Goose to extract clean text from HTML pages. The output Goose gives is good, but there is one small problem: I get the string below.

    My name is Sam\'s, I like to play \'football\'

The actual text looks like this:

    My name is Sam's, I like to play 'football'

I am trying to get rid of the backslashes. When I run the code below on the text extracted by Goose, it somehow doesn't work; however, if I type the text in myself, the code works perfectly.

I tried each of the following:

    import re

    text = re.sub(r"\\", "", text)
    # or
    text = text.replace("\\", "")
    # or
    text = text.decode()
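For comparison, here is a minimal sketch (Python 3, unlike the Python 2 code in the question) of the case where these calls do work: the string really contains backslash characters. It also shows a detail that may explain why hand-typed text behaves differently: if the sentence is typed into Python source as an ordinary quoted literal, `\'` is an escape sequence for a plain quote, so the typed string contains no backslash at all. The variable names are illustrative.

```python
import re

# A raw string literal keeps backslashes, so each \' here is two
# characters: a backslash followed by a quote (like the Goose output).
extracted = r"My name is Sam\'s, I like to play \'football\'"

# In an ordinary literal, \' is just an escaped quote: no backslash is stored.
typed = 'My name is Sam\'s, I like to play \'football\''

# Strings are immutable: re.sub and str.replace return new strings,
# so the result has to be assigned back to a variable.
cleaned_re = re.sub(r"\\", "", extracted)
cleaned_replace = extracted.replace("\\", "")

print("\\" in extracted)  # True
print("\\" in typed)      # False
print(cleaned_re)         # My name is Sam's, I like to play 'football'
print(cleaned_re == cleaned_replace == typed)  # True
```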

Here is the full code:

    import re
    from goose import Goose

    url = 'http://economictimes.indiatimes.com/news/politics-and-nation/swach-bharat-drives-draws-inspiration-from-mahatma-gandhi/articleshow/49203355.cms'
    g = Goose()
    article = g.extract(url=url)
    text = article.cleaned_text

    print text
    # Output:
    # .....International School here on Friday, Gandhi\'s 146th birth anniversary.Gurjit Singh said that apart from Gandhi\'s birth anniversary,....

    text = re.sub(r"\\", "", text)
    print text
    # Output (the backslashes are still there):
    # .....International School here on Friday, Gandhi\'s 146th birth anniversary.Gurjit Singh said that apart from Gandhi\'s birth anniversary,....

How do I get rid of the backslashes?
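If the substitution still seems to do nothing on the extracted text, it can help to check which characters the string really holds, since `print` only shows how a character looks, not which one it is. A minimal Python 3 sketch; `code_points` is a hypothetical helper built on the standard `unicodedata` module, and `sample` is a hypothetical stand-in for `article.cleaned_text`:

```python
import unicodedata


def code_points(text):
    """Return (character, code point, Unicode name) for each character.

    Hypothetical debugging helper: it reveals which character is really
    in the string, which print() alone cannot tell you."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Hypothetical sample; in practice, slice the real article.cleaned_text.
sample = r"Gandhi\'s"

# Inspect the two characters right after "Gandhi".
for ch, cp, name in code_points(sample[6:8]):
    print(ch, cp, name)
```

The pattern `r"\\"` matches only U+005C REVERSE SOLIDUS, so if the extracted text shows a different code point where the backslash appears, that character needs its own pattern.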
