4

I'm trying to write a Python script that downloads an image and sets it as my wallpaper. Unfortunately, the mechanize documentation is fairly poor. My script follows the link correctly, but I'm having a hard time actually saving the image on my computer. From what I've researched, the .retrieve() method should do the job, but how do I specify the path the file should be downloaded to? Here is what I have...

def followLink(browser, fixedLink):
    browser.open(fixedLink)
    # follow the link for the largest resolution the page offers
    if browser.find_link(url_regex=r'1600x1200'):
        browser.follow_link(url_regex=r'1600x1200')
    elif browser.find_link(url_regex=r'1400x1050'):
        browser.follow_link(url_regex=r'1400x1050')
    elif browser.find_link(url_regex=r'1280x960'):
        browser.follow_link(url_regex=r'1280x960')
    return

4 Answers

9
import mechanize, os
from BeautifulSoup import BeautifulSoup

browser = mechanize.Browser()
html = browser.open(url)                 # url: the page you want to scrape (define it yourself)
soup = BeautifulSoup(html)
image_tags = soup.findAll('img')
for image in image_tags:
    # build a flat local filename from the image URL
    # (note: lstrip() removes leading characters, not the literal 'http://' prefix)
    filename = image['src'].lstrip('http://')
    filename = os.path.join(dir, filename.replace('/', '_'))   # dir: your download directory
    data = browser.open(image['src']).read()
    browser.back()
    save = open(filename, 'wb')
    save.write(data)
    save.close()

This should help you download all the images from the web page. As for parsing the HTML, your best bet is BeautifulSoup or lxml. Downloading is just reading the data and then writing it to a local file. You should assign your own value to dir; that is where your images will be saved.
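For completeness, here is a minimal sketch of the setup the snippet above assumes; the page URL and download directory are placeholder values you would replace with your own:

import os

# placeholder values -- substitute your own page URL and download directory
url = 'http://www.example.com/wallpapers'
dir = os.path.join(os.path.expanduser('~'), 'wallpapers')  # note: shadows the built-in dir(), as in the snippet above
if not os.path.exists(dir):
    os.makedirs(dir)   # create the download directory if it does not exist yet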

Answered on 2013-03-24T02:54:39.063
5

Not sure why this solution hasn't come up, but you can also use the mechanize.Browser.retrieve function. Perhaps it only exists in newer versions of mechanize and therefore hasn't been mentioned?
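In isolation the call looks like this (a minimal sketch; the image URL and target path are placeholder values). The second argument is the local path the file gets written to, which is exactly what the original question was asking about:

import mechanize

browser = mechanize.Browser()
# retrieve() is modelled on urllib.urlretrieve(): the first argument is the URL
# to download, the second the local filename to save it under
browser.retrieve('http://www.example.com/wallpaper_1600x1200.jpg',
                 '/home/me/wallpaper.jpg')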

Anyway, if you want to shorten zhangyangyu's answer, you could do this:

import mechanize, os
from BeautifulSoup import BeautifulSoup

browser = mechanize.Browser()
html = browser.open(url)
soup = BeautifulSoup(html)
image_tags = soup.findAll('img')
for image in image_tags:
    filename = image['src'].lstrip('http://')
    filename = os.path.join(dir, filename.replace('/', '_'))
    browser.retrieve(image['src'], filename)
    browser.back()

Also keep in mind that you'll likely want to put all of this inside a try/except block like this one:

import mechanize, os
from BeautifulSoup import BeautifulSoup

browser = mechanize.Browser()
html = browser.open(url)
soup = BeautifulSoup(html)
image_tags = soup.findAll('img')
for image in image_tags:
    filename = image['src'].lstrip('http://')
    filename = os.path.join(dir, filename.replace('/', '_'))
    try:
        browser.retrieve(image['src'], filename)
        browser.back()
    except (mechanize.HTTPError,mechanize.URLError) as e:
        pass
        # Use e.code and e.read() with HTTPError
        # Use e.reason.args with URLError

Of course, you'll need to adjust this to your own needs. Perhaps you want it to blow up if it runs into a problem. That depends entirely on what you are trying to achieve.
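If you would rather see why a download failed instead of skipping it silently, the except clauses can print the attributes mentioned in the comments above. A standalone sketch (the image URL and output path are placeholders):

import mechanize

browser = mechanize.Browser()
image_url = 'http://www.example.com/wallpaper_1600x1200.jpg'  # placeholder
try:
    browser.retrieve(image_url, '/tmp/wallpaper.jpg')
except mechanize.HTTPError as e:
    # the server answered, but with an error status
    print 'HTTP error %d while downloading %s' % (e.code, image_url)
except mechanize.URLError as e:
    # the request never got a proper response (DNS failure, refused connection, ...)
    print 'Could not reach %s: %s' % (image_url, e.reason)

HTTPError is caught before URLError because it is a subclass of it; the other way around, the URLError clause would swallow both.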

Answered on 2013-12-16T23:52:32.473
3

You can get/download the image by opening the URL from the img tag's src attribute.

image_response = browser.open_novisit(img['src'])

To save the file, just use the built-in open():

with open('image_out.png', 'wb') as f:
    f.write(image_response.read())
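Putting this together with the BeautifulSoup approach from the other answers, here is a sketch that also resolves relative src values against the page URL; page_url and the output file naming are assumptions, not part of the original answer:

import os
import urlparse

import mechanize
from BeautifulSoup import BeautifulSoup

page_url = 'http://www.example.com/wallpapers'   # placeholder
browser = mechanize.Browser()
soup = BeautifulSoup(browser.open(page_url).read())
for img in soup.findAll('img'):
    image_url = urlparse.urljoin(page_url, img['src'])   # handles relative src values
    image_response = browser.open_novisit(image_url)
    # name the local file after the last path component of the image URL
    out_name = os.path.basename(urlparse.urlparse(image_url).path) or 'image_out.png'
    with open(out_name, 'wb') as f:
        f.write(image_response.read())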
Answered on 2013-03-24T01:00:20.650
0

This is really ugly, but it "works" for me, building on 0xc0000022l's answer.

import mechanize, os
from BeautifulSoup import BeautifulSoup
import urllib2

def DownloadIMGs(url): # IMPORTANT URL WITH HTTP OR HTTPS
    print "From", url
    dir = 'F:\Downloadss' #Dir for Downloads
    basicImgFileTypes = ['png','bmp','cur','ico','gif','jpg','jpeg','psd','raw','tif']

    browser = mechanize.Browser()
    html = browser.open(url)
    soup = BeautifulSoup(html)
    image_tags = soup.findAll('img')
    print "N Images:", len(image_tags)
    print
    #---------SAVE PATH
    #check if available
    if not os.path.exists(dir):
        os.makedirs(dir)
    #---------SAVE PATH
    for image in image_tags:

        #---------SAVE PATH + FILENAME (Where It is downloading)
        filename = image['src']
        fileExt = filename.split('.')[-1]
        fileExt = fileExt[0:3]

        if (fileExt in basicImgFileTypes):
            print 'File Extension:', fileExt
            filename = filename.replace('?', '_')
            filename = os.path.join(dir, filename.split('/')[-1])
            num = filename.find(fileExt) + len(fileExt)
            filename = filename[:num]
        else:
            filename = filename.replace('?', '_')
            filename = os.path.join(dir, filename.split('/')[-1]) + '.' + basicImgFileTypes[0]
        print 'File Saving:', filename
        #---------SAVE PATH + FILENAME (Where It is downloading)

        #--------- FULL URL PATH OF THE IMG
        imageUrl = image['src']
        print 'IMAGE SRC:', imageUrl

        if (imageUrl.find('http://') > -1 or imageUrl.find('https://') > -1):
            pass
        else:
            if (url.find('http://') > -1):
                # keep only the domain from the page URL, then rebuild the absolute image URL
                domain = url[len('http://'):].split('/')[0]
                imageUrl = 'http://' + domain + image['src']
            elif(url.find('https://') > -1):
                domain = url[len('https://'):].split('/')[0]
                imageUrl = 'https://' + domain + image['src']
            else:
                imageUrl = image['src']

        print 'IMAGE URL:', imageUrl
        #--------- FULL URL PATH OF THE IMG

        #--------- TRY DOWNLOAD
        try:
            browser.retrieve(imageUrl, filename)
            print "Downloaded:", image['src'].split('/')[-1]
            print
        except (mechanize.HTTPError,mechanize.URLError) as e:
            print "Can't Download:", image['src'].split('/')[-1]
            print
            pass
        #--------- TRY DOWNLOAD
    browser.close()

DownloadIMGs('https://stackoverflow.com/questions/15593925/downloading-a-image-using-python-mechanize')
Answered on 2017-07-05T01:08:56.790