4

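My script works, but it saves the file as .part, although, thankfully, the file size matches when I check it against a manually downloaded copy. I don't understand why it is being saved as a partial file. My next idea is a bit inconvenient. Does anyone know why this happens? Here is my code, which works...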

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import time
import mechanize
import urllib
from urllib import urlretrieve

fp = webdriver.FirefoxProfile()

fp.set_preference("browser.download.folderList",1)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir",'Users/matthewyoung/Downloads')
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","Plain text")
fp.set_preference("browser.download.manager.scanWhenDone",False)
fp.set_preference("browser.download.manager.showAlertOnComplete",True)
fp.set_preference("browser.download.manager.useWindow",False)
fp.set_preference("browser.helperApps.alwaysAsk.force",False)

browser = webdriver.Firefox(firefox_profile=fp)



#browser = webdriver.Firefox() # Get local session of firefox
browser.get("http://vizier.u-strasbg.fr/vizier/surveys.htx") # Load page
assert "VizieR" in browser.title
#p = raw_input('Star name? ')
elem = browser.find_element_by_name('-c') # Find the query box
elem.send_keys('mwc 560' + Keys.RETURN)
time.sleep(0.2) # Let the page load, will be added to the API
elem=browser.find_element_by_name('-out.max')
elem.send_keys('unlimited'+Keys.TAB)
elem2=browser.find_element_by_name('-out.form')
time.sleep(0.5)
elem2.send_keys('; -Separated-Values')
time.sleep(0.5)
elem2.send_keys(Keys.TAB)
elem2.send_keys(Keys.TAB)
time.sleep(0.2)
browser.find_element_by_class_name('data').submit()
time.sleep(3.0)
#df=elem2.send_keys(Keys.SPACE)
#print df
browser.close()

3 Answers

3

The file is saved as .part because Firefox's "Save As" dialog pops up, and the Python script cannot interact with that dialog. In my experience, preferences set on a custom profile through webdriver do not always take effect (for instance, I was able to configure a profile in Selenium to download a CSV but not a PDF). I solved my PDF problem by creating a custom profile in Firefox itself.

I am not very experienced with TSV files, so I am not sure which setting applies there. If you create a new Firefox profile (following the instructions here: https://support.mozilla.org/en-US/kb/profile-manager-create-and-remove-firefox-profiles), you can try to set that profile to save TSV files by default. If you don't know the exact setting to change in "about:config", you can simply tick the checkbox in the dialog to always save files of that kind.

From there, point Selenium at the custom profile you created, like this:

    profile = webdriver.firefox.firefox_profile.FirefoxProfile("/Users/matthewyoung/Library/Application Support/Firefox/Profiles/YOUR PROFILE NAME")

Keep in mind that the actual directory name starts with a string of random characters before YOUR PROFILE NAME, so browse to that path to find the real name.
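Looking up the directory by hand can be avoided with a small helper. This is only a sketch: `find_profile_dir` and the profile name `selenium` are hypothetical names used for illustration, and it assumes the directory is named as a random prefix, a dot, and then the profile name.

```python
import os


def find_profile_dir(profiles_root, profile_name):
    """Return the first directory under profiles_root whose name ends with
    '.<profile_name>', or None if no such directory exists."""
    suffix = "." + profile_name
    for entry in sorted(os.listdir(profiles_root)):
        path = os.path.join(profiles_root, entry)
        if entry.endswith(suffix) and os.path.isdir(path):
            return path
    return None


# Hypothetical usage ("selenium" is an assumed profile name):
# root = os.path.expanduser("~/Library/Application Support/Firefox/Profiles")
# profile = webdriver.FirefoxProfile(find_profile_dir(root, "selenium"))
```

If the helper returns `None`, the profile was probably created under a different name or in a different Profiles directory.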

Answered 2013-10-17T15:28:54.613
0

The following value should be used for plain text:

fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/plain")
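The preference takes a comma-separated list of MIME types, so several types can be covered at once. Below is a minimal sketch of that idea; `download_prefs` is a hypothetical helper, and any type other than `text/plain` is an assumption, since the type the server actually sends has to be read from its `Content-Type` response header.

```python
def download_prefs(download_dir, mime_types):
    """Build Firefox preferences that save the given MIME types
    to download_dir without showing the save dialog."""
    return {
        # 2 = use the directory named in browser.download.dir
        "browser.download.folderList": 2,
        "browser.download.dir": download_dir,
        # Comma-separated list of MIME types to save silently
        "browser.helperApps.neverAsk.saveToDisk": ",".join(mime_types),
    }


# Hypothetical usage with the profile object from the question:
# prefs = download_prefs("/Users/matthewyoung/Downloads", ["text/plain", "text/csv"])
# for name, value in prefs.items():
#     fp.set_preference(name, value)
```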
Answered 2014-11-21T11:56:05.453
0

I think the only thing missing from your Firefox profile settings is the following:

# The value must be a MIME type, not a descriptive name
fp.set_preference("browser.helperApps.neverAsk.openFile",
                       'text/plain')

So the complete code should be:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
import time

fp = webdriver.FirefoxProfile()

# 2 = save to the directory given in browser.download.dir
fp.set_preference("browser.download.folderList",2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
# The download directory must be an absolute path
fp.set_preference("browser.download.dir",'/Users/matthewyoung/Downloads')

# Both neverAsk preferences expect MIME types, not descriptive names
fp.set_preference("browser.helperApps.neverAsk.openFile", 'text/plain')
fp.set_preference("browser.helperApps.neverAsk.saveToDisk","text/plain")
fp.set_preference("browser.download.manager.scanWhenDone",False)
fp.set_preference("browser.download.manager.showAlertOnComplete",True)
fp.set_preference("browser.download.manager.useWindow",False)
fp.set_preference("browser.helperApps.alwaysAsk.force",False)

browser = webdriver.Firefox(firefox_profile=fp)


browser.get("http://vizier.u-strasbg.fr/vizier/surveys.htx") # Load page
assert "VizieR" in browser.title

elem = browser.find_element_by_name('-c') # Find the query box
elem.send_keys('mwc 560' + Keys.RETURN)
time.sleep(0.2) # Let the page load, will be added to the API
elem=browser.find_element_by_name('-out.max')
elem.send_keys('unlimited'+Keys.TAB)
elem2=browser.find_element_by_name('-out.form')
time.sleep(0.5)
elem2.send_keys('; -Separated-Values')
time.sleep(0.5)
elem2.send_keys(Keys.TAB)
elem2.send_keys(Keys.TAB)
time.sleep(0.2)
browser.find_element_by_class_name('data').submit()
time.sleep(3.0)

browser.close()
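
A side note on the fixed `time.sleep(3.0)` before `browser.close()`: Firefox writes an in-progress download to a `.part` file and renames it once the download finishes, so the script can poll the download directory instead of sleeping for a fixed time. The sketch below is an assumption layered on top of the code above, not part of it; `wait_for_downloads` is a hypothetical helper. It returns immediately if the download has not started yet, so it only makes sense after the form has been submitted and the download has begun.

```python
import os
import time


def wait_for_downloads(download_dir, timeout=30.0, poll=0.5):
    """Return True once download_dir contains no '.part' files,
    or False if some are still present when the timeout expires."""
    deadline = time.time() + timeout
    while True:
        pending = [f for f in os.listdir(download_dir) if f.endswith(".part")]
        if not pending:
            return True
        if time.time() >= deadline:
            return False
        time.sleep(poll)


# Hypothetical usage in place of time.sleep(3.0):
# if not wait_for_downloads("/Users/matthewyoung/Downloads"):
#     print("download did not finish; the save dialog may still be open")
```

A `False` result is also a useful diagnostic: a `.part` file that never goes away suggests the save dialog is still waiting for input.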
Answered 2014-02-19T05:10:46.337