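I am developing an application with the Django framework that runs on an Apache server. My current script works fine when I run it on my local desktop (without Django): it downloads all the images from a website into a folder on the desktop. When I run the script on the server, however, Django only creates a single file object. It clearly has something in it (it should be Google's logo), but I cannot open the file. I also create an HTML file with the image link locations updated, and that HTML file is created fine, perhaps because it is all text? I believe I may have to use a file wrapper somewhere, but I am not sure. Any help is appreciated. My code is below. Thanks!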
from django.http import HttpResponse
from bs4 import BeautifulSoup as bsoup
import urlparse
from urllib2 import urlopen
from urllib import urlretrieve
import os
import sys
import zipfile
from django.core.servers.basehttp import FileWrapper
def getdata(request):
    out = r'C:\Users\user\Desktop\images'
    if request.GET.get('q'):
        # url = str(request.GET['q'])
        url = "http://google.com"
        soup = bsoup(urlopen(url))
        parsedURL = list(urlparse.urlparse(url))

        for image in soup.findAll("img"):
            print "Old Image Path: %(src)s" % image
            # Get file name
            filename = image["src"].split("/")[-1]
            # Get full path name if url has to be parsed
            parsedURL[2] = image["src"]
            image["src"] = r'%s\%s' % (out, filename)
            print 'New Path: %s' % image["src"]
            # print image
            outpath = os.path.join(out, filename)

            # Retrieve images
            if image["src"].lower().startswith("http"):
                urlretrieve(image["src"], outpath)
            else:
                urlretrieve(urlparse.urlunparse(parsedURL), out)  # Constructs URL from tuple (parsedURL)

        # Create HTML file and write to it to check output (stored in same directory).
        html = soup.prettify("utf-8")
        with open("output.html", "wb") as file:
            file.write(html)
    else:
        url = 'You submitted nothing!'

    return HttpResponse(url)
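Reading the code above, two details may explain the single unopenable file. First, `image["src"]` is overwritten with the local path before the `startswith("http")` check, so the `else` branch always runs. Second, that branch passes the directory `out` to `urlretrieve` instead of `outpath`. If the `images` folder does not exist on the server, every download would then be written to one extension-less file named `images`. The following is a minimal sketch of how the URL and the target path could be computed from the original `src` instead. It assumes Python 3 (the question uses Python 2, where these functions live in `urlparse` and `urllib2`), and the helper names `plan_download` and `download_image` are hypothetical, not part of Django or of the original script.

```python
import os
import posixpath
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


def plan_download(page_url, src, out_dir):
    """Return (absolute_url, local_path) for one <img src> value."""
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # against the page URL, so no http/non-http branching is needed.
    absolute_url = urljoin(page_url, src)
    # Take the file name from the URL path only, ignoring any query string.
    filename = posixpath.basename(urlparse(absolute_url).path) or "image"
    return absolute_url, os.path.join(out_dir, filename)


def download_image(page_url, src, out_dir):
    """Fetch one image and write it in binary mode; return the local path."""
    absolute_url, local_path = plan_download(page_url, src, out_dir)
    os.makedirs(out_dir, exist_ok=True)  # make sure the target is a real folder
    with urlopen(absolute_url) as response, open(local_path, "wb") as handle:
        handle.write(response.read())
    return local_path
```

Inside the loop, `download_image(url, image["src"], out)` would be called before `image["src"]` is rewritten, and the returned path could then be assigned to `image["src"]`. Note also that under Apache a relative name such as `"output.html"` is resolved against the server process's working directory, so an absolute path is probably safer there.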