python - Python urlopen return value

Question

I am trying to pass existing URLs as arguments so that their HTML is loaded into a single txt file:

for line in open('C:\Users\me\Desktop\URLS-HERE.txt'):
 if line.startswith('http') and line.endswith('html\n') :
    fichier = open("C:\Users\me\Desktop\other.txt", "a")
    allhtml = urllib.urlopen(line)
    fichier.write(allhtml)
    fichier.close()

But I get the following error:

TypeError: expected a character buffer object

score 3 · Accepted Answer

The value returned by urllib.urlopen() is a file-like object. Once you have opened it, you should read it with the read() method, as in the following snippet:

import urllib

for line in open(r'C:\Users\me\Desktop\URLS-HERE.txt'):
    if line.startswith('http') and line.endswith('html\n'):
        fichier = open(r"C:\Users\me\Desktop\other.txt", "a")
        allhtml = urllib.urlopen(line)
        # write() needs a string, so pass the result of read(), not the object itself
        fichier.write(allhtml.read())
        fichier.close()
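
The difference between writing the response object and writing what read() returns can be sketched offline. This is a minimal illustration only, and it assumes Python 3, where the function lives at urllib.request.urlopen (the thread itself uses Python 2). The fetch_html helper and the data: URL are hypothetical stand-ins for a real page.

```python
import io
import urllib.parse
import urllib.request


def fetch_html(url):
    """Return the body of `url` as bytes by reading the file-like response."""
    with urllib.request.urlopen(url) as response:
        return response.read()


# Hypothetical demo URL: a data: URL keeps the example off the network.
DEMO_URL = "data:text/html," + urllib.parse.quote("<html>hello</html>")

buffer = io.BytesIO()
error_message = None

# Passing the response object itself reproduces the kind of TypeError in the question.
response = urllib.request.urlopen(DEMO_URL)
try:
    buffer.write(response)
except TypeError as exc:
    error_message = str(exc)
finally:
    response.close()

# Passing the bytes returned by read() works.
buffer.write(fetch_html(DEMO_URL))
```

The exact wording of the TypeError differs between Python versions, but the cause is the same: write() was given an object rather than its contents.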

Hope this helps!

score 1 · Accepted Answer

The problem here is that urlopen returns a reference to a file object, and you have to retrieve the HTML from that object.

import urllib2

for line in open(r"C:\Users\me\Desktop\URLS-HERE.txt"):
    if line.startswith('http') and line.endswith('html\n'):
        fichier = open(r"C:\Users\me\Desktop\other.txt", "a")
        allhtml = urllib2.urlopen(line)
        # urlopen() returns a file-like object; read() returns the HTML as a string
        fichier.write(allhtml.read())
        fichier.close()

Note that the urllib.urlopen function has been marked as deprecated since Python 2.6. Using urllib2.urlopen instead is recommended.
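
For readers on Python 3, where urllib2 no longer exists, the same loop could be sketched roughly as follows. This is an assumption-laden illustration rather than part of the original answer: append_pages and its opener parameter are hypothetical names, urllib.request.urlopen is the Python 3 counterpart of urllib2.urlopen, and the line is stripped before the check, so a final line without a trailing newline is also accepted.

```python
import urllib.request


def append_pages(url_list_path, output_path, opener=urllib.request.urlopen):
    """Append the body of every matching URL in url_list_path to output_path.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; it defaults to urllib.request.urlopen.
    Returns the number of pages written.
    """
    count = 0
    # "ab" appends raw bytes, which is what read() returns in Python 3.
    with open(url_list_path) as url_file, open(output_path, "ab") as out_file:
        for line in url_file:
            url = line.strip()  # drop the trailing newline before the request
            if url.startswith("http") and url.endswith("html"):
                response = opener(url)
                try:
                    out_file.write(response.read())
                finally:
                    response.close()
                count += 1
    return count
```

A call such as append_pages(r"C:\Users\me\Desktop\URLS-HERE.txt", r"C:\Users\me\Desktop\other.txt") would mirror the loop above. Opening both files once with a with statement also avoids reopening the output file for every URL.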

Also, you have to be careful with the paths in your code. You should either escape every \

"C:\\Users\\me\\Desktop\\other.txt"

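or use the r prefix before the string. When an 'r' or 'R' prefix is present, a character following a backslash is included in the string without change.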

r"C:\Users\me\Desktop\other.txt"
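
A quick way to convince yourself that the two spellings are equivalent, and to see what goes wrong without them, is the sketch below. The C:\temp\new.txt path is a hypothetical example chosen because \t and \n are real escape sequences.

```python
# Escaped backslashes and a raw string produce the same path.
escaped = "C:\\Users\\me\\Desktop\\other.txt"
raw = r"C:\Users\me\Desktop\other.txt"
same_path = escaped == raw

# Hypothetical pitfall: without escaping or the r prefix,
# \t becomes a tab and \n becomes a newline inside the path.
pitfall = "C:\temp\new.txt"
safe = r"C:\temp\new.txt"
has_tab = "\t" in pitfall
has_newline = "\n" in pitfall
```

Here same_path is True, while pitfall silently contains a tab and a newline instead of the backslashes that safe keeps.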

