python - Converting a tuple to a string after parsing an HTML file

Question

I need to save the parsing results in a text file.

import urllib
from bs4 import BeautifulSoup
import urlparse

path = 'A html file saved on desktop'

f = open(path,"r")
if f.mode == 'r':       
    contents = f.read()

soup = BeautifulSoup(contents)
search = soup.findAll('div',attrs={'class':'mf_oH mf_nobr mf_pRel'})
searchtext = str(search)
soup1 = BeautifulSoup(searchtext)   

urls = []
for tag in soup1.findAll('a', href = True):
    raw_url = tag['href'][:-7]
    url = urlparse.urlparse(raw_url)
    urls.append(url)
    print url.path

with open("1.txt", "w+") as outfile:
    for item in urls:
        outfile.write(item + "\n")

However, I got this:

Traceback (most recent call last):
  File "c.py", line 26, in <module>
    outfile.write(item + "\n")
TypeError: can only concatenate tuple (not "str") to tuple

How do I convert the tuple to a string and save it in a text file? Thanks.

score 1 · Accepted Answer

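The problem is that each item in the list called urls is a tuple: urlparse.urlparse() returns a tuple-like result, and that result is what gets appended to urls. A tuple is a container for other items and is also immutable. When you write item + "\n", you are asking the interpreter to concatenate a tuple and a string, which is not possible.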
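
To see this concretely, here is a minimal sketch. It assumes Python 3, where the parser lives in urllib.parse (the question's Python 2 code imports it as urlparse), and the URL is made up for illustration.

```python
from urllib.parse import urlparse  # Python 2: `import urlparse` instead

# Hypothetical URL, used only for illustration.
parsed = urlparse("http://example.com/some/page?id=1")

# The parse result is a namedtuple subclass, so it behaves like a tuple.
is_tuple = isinstance(parsed, tuple)

# Adding a str to a tuple raises TypeError, as in the question's traceback.
try:
    parsed + "\n"
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(is_tuple)
print(error_message)
```

The same TypeError is what the loop in the question hits on its first item.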

What you want to do is look inside the tuple and pick one field from each item to write to the output file:

with open("1.txt", "w+") as outfile:
    for item in urls:
        outfile.write(str(item[1]) + "\n")

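Here the field at index 1 of each tuple is first converted to a string (in case it happens to be something else) and then concatenated with "\n". Note that index 1 is the second field of the tuple; for a urlparse result that is the network location (netloc), while the path that the question prints is at index 2 and is also available as item.path. If you want to write the tuple as-is, you can write: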

outfile.write(str(item) + "\n")
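
If the goal is to save the path of each URL (what the question prints) or the full URL text, selecting a named field is often easier to read than a numeric index. The following is a sketch under assumptions, not the asker's actual code: it uses Python 3's urllib.parse, the helper format_urls and the example URLs are hypothetical, and it writes to a temporary directory so it can be run safely.

```python
import os
import tempfile
from urllib.parse import urlparse  # Python 2: `import urlparse` instead


def format_urls(parsed_urls, field="path"):
    """Return one string per parsed URL (hypothetical helper).

    field="full" rebuilds the whole URL with geturl(); any other value
    is treated as a ParseResult attribute name such as "path" or "netloc".
    """
    lines = []
    for item in parsed_urls:
        if field == "full":
            lines.append(item.geturl())
        else:
            lines.append(getattr(item, field))
    return lines


# Hypothetical URLs standing in for the hrefs collected in the question.
urls = [urlparse(u) for u in ("http://example.com/a/b", "http://example.com/c?x=1")]

# Write one path per line; each line is already a str, so "+" works.
out_path = os.path.join(tempfile.mkdtemp(), "1.txt")
with open(out_path, "w") as outfile:
    for line in format_urls(urls, field="path"):
        outfile.write(line + "\n")
```

Passing field="full" instead writes the reassembled URL for each item, which is usually more useful than the str(item) representation of the whole tuple.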
