
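I'm using Beautiful Soup 4 to extract text from an HTML file, and with get_text() I can easily extract just the text. Now I'm trying to write that text to a plain-text file, but when I do, I get the message "416". This is the code I'm using: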

from bs4 import BeautifulSoup
markup = open("example1.html")
soup = BeautifulSoup(markup)
f = open("example.txt", "w")
f.write(soup.get_text())

The console prints 416, but nothing is written to the text file. Where am I going wrong?


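Try passing the text itself to the BeautifulSoup class with markup.read() (BeautifulSoup also accepts an open file object, so this part is optional). More importantly, close the output file: write() buffers the text, and it is only guaranteed to reach example.txt once the file is flushed or closed. The 416 you see is simply the return value of write(), i.e. the number of characters written.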

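from bs4 import BeautifulSoup

markup = open("example1.html")
# Read the whole file into a string and parse it with Python's built-in parser
soup = BeautifulSoup(markup.read(), "html.parser")
markup.close()

f = open("example.txt", "w", encoding="utf-8")
f.write(soup.get_text())
# Closing the file flushes the buffered text to disk
f.close()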
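To see where the 416 comes from, here is a minimal stdlib-only sketch. It assumes a hypothetical 416-character string standing in for soup.get_text() and a temporary file instead of example.txt, so it runs without Beautiful Soup installed.

```python
import os
import tempfile

# Hypothetical stand-in for soup.get_text(): any 416-character str will do.
text = "x" * 416

path = os.path.join(tempfile.mkdtemp(), "example.txt")
f = open(path, "w", encoding="utf-8")

# In Python 3, write() returns the number of characters written.
# The interactive console echoes that return value, hence "416".
count = f.write(text)
print(count)  # 416

# At this point the text may still be sitting in Python's write buffer.
# Closing (or flushing) the file pushes it to disk.
f.close()

with open(path, encoding="utf-8") as check:
    print(len(check.read()))  # 416
```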

Or, in a more Pythonic style, use with blocks, which close the files automatically:

from bs4 import BeautifulSoup

with open("example1.html") as markup:
    soup = BeautifulSoup(markup.read(), "html.parser")

# Leaving the with block closes (and therefore flushes) the file
with open("example.txt", "w", encoding="utf-8") as f:
    f.write(soup.get_text())

As @bernie suggested.
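A quick way to confirm that the with block really closes the file is the sketch below. It again uses a hypothetical plain string in place of soup.get_text() (including a non-ASCII character, since extracted page text often has them) and a temporary path; the explicit encoding="utf-8" is an assumption chosen so the result does not depend on the platform's default encoding.

```python
import os
import tempfile

# Hypothetical stand-in for soup.get_text(), with a non-ASCII character.
text = "Hello, café\n"

path = os.path.join(tempfile.mkdtemp(), "example.txt")

# encoding="utf-8" avoids relying on the platform default encoding.
with open(path, "w", encoding="utf-8") as f:
    written = f.write(text)

# After the with block exits, the file object is closed and flushed.
print(f.closed)  # True

with open(path, encoding="utf-8") as check:
    print(check.read() == text)  # True
```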

Answered 2013-04-26T16:52:12.350