python - BeautifulSoup: parsing data and writing it to a text file

Question

from bs4 import BeautifulSoup


soup = BeautifulSoup(open("youtube.htm"))

for link in soup.find_all('img'):
    print  link.get('src')



file = open("parseddata.txt", "wb")
file.write(link.get('src')+"\n")
file.flush()

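Hello, I wanted to try out BeautifulSoup and parsed a YouTube page. It yields about 25 lines of links. However, when I look at the file, only the last one has been written (a small part of the output). I tried different open modes and calling file.close(), but nothing helped. Does anyone have a clue?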

score 5 · Accepted Answer

In these lines you iterate over every img tag and print each one:

for link in soup.find_all('img'):
    print  link.get('src')

However, you are not writing to the file inside that loop. You write link.get('src')+'\n' only once, after the loop has finished.

That writes only whatever link currently refers to, which is the last img tag found by the loop above. This is why only one 'src' value ends up in the output file.
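A minimal sketch of that behaviour, using a hypothetical list of strings in place of the img tags (the name srcs and its values are made up for illustration, and the syntax is Python 3):

```python
# Hypothetical stand-in for the src values found by soup.find_all('img').
srcs = ["a.png", "b.png", "c.png"]

for src in srcs:
    pass  # the loop body runs once per item

# After the loop ends, the loop variable still refers to the last item only,
# so anything done with it here happens exactly once.
last_after_loop = src
print(last_after_loop)
```

Any statement placed after the loop, such as a single file.write call, sees only that final value.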

You need to write each line to the file inside the loop that iterates over the img tags you are interested in. That takes a little rearranging:

from bs4 import BeautifulSoup

soup = BeautifulSoup(open("youtube.htm"))


file = open("parseddata.txt", "wb")

for link in soup.find_all('img'):
    print  link.get('src')
    file.write(link.get('src')+"\n")

file.flush()
file.close()

You should also remember to close the file, which I have done in the last line of the snippet above.

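Edit: following Hooked's comment below, this is what the snippet looks like if you use the with keyword. Using with closes the file for you automatically once the indented block ends, so you do not even have to think about it: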

from bs4 import BeautifulSoup

soup = BeautifulSoup(open("youtube.htm"))


with open("parseddata.txt", "wb") as file:
    for link in soup.find_all('img'):
        print  link.get('src')
        file.write(link.get('src')+"\n")
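The snippets above use Python 2 syntax (the print statement, and writing a str to a file opened with "wb"). For readers on Python 3, here is a hedged sketch of the same idea. The function name write_image_sources, the sample markup, and the temporary output path are assumptions made for illustration and are not part of the original answer. It also skips img tags without a src attribute, since get('src') returns None for those and None + "\n" would raise a TypeError.

```python
import os
import tempfile

from bs4 import BeautifulSoup


def write_image_sources(html, out_path):
    """Write the src of every <img> in html to out_path, one per line."""
    # Naming the parser explicitly avoids bs4's "no parser specified" warning.
    soup = BeautifulSoup(html, "html.parser")
    written = []
    # Text mode ("w") instead of "wb": a Python 3 str cannot be written
    # to a file opened in binary mode.
    with open(out_path, "w", encoding="utf-8") as out_file:
        for img in soup.find_all("img"):
            src = img.get("src")  # None when the tag has no src attribute
            if src is None:
                continue
            out_file.write(src + "\n")  # the write happens inside the loop
            written.append(src)
    return written


# Hypothetical markup standing in for the contents of youtube.htm.
sample_html = '<img src="a.png"><img alt="no source"><img src="b.png">'
out_path = os.path.join(tempfile.mkdtemp(), "parseddata.txt")
sources = write_image_sources(sample_html, out_path)
print(sources)
```

To run it against the file from the question, one could pass the text read from youtube.htm as the html argument and "parseddata.txt" as out_path.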
