2

I have a problem and I'm not sure whether this is possible at all, so I'd appreciate it if someone could point me in the right direction.

I need to download a file from a web page, open it in Excel, and save it.

The problem is that the web page shows the file name as plain text (not an active link), followed by a "download" button that is not specific to the file I need. So instead of the button being labelled something like "file1todaysdate", it carries nothing I could rely on from day to day.

Is there a way to locate the file name, grab the file through the download icon next to it, and then save it in Excel? If not, sorry for wasting your time.

4

3 Answers

2

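Where does the file actually come from when you press download? Get that download link first. If it is hard to spot in the browser, use a tool such as Firebug to find it. Once you have the link, you can download the file in Python with urllib.urlretrieve (urllib.request.urlretrieve in Python 3):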

import urllib.request  # Python 3; in Python 2 the function is urllib.urlretrieve
filename, msg = urllib.request.urlretrieve('http://yourlinktodownload/file.xls')

filename will point to the downloaded file. If it is in xls format, it should open in Excel.
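If the file should land under a name of your own choosing rather than a temporary one, urlretrieve also accepts a destination path. The following is a minimal sketch for Python 3; the helper name download and the file names in the usage note are hypothetical, not part of the original answer.

```python
import urllib.request
from pathlib import Path


def download(url, dest):
    """Fetch url and write the response body to dest; return the saved path."""
    # urlretrieve returns (local_path, headers); only the path is needed here.
    saved_path, _headers = urllib.request.urlretrieve(url, dest)
    return Path(saved_path)
```

For example, download('http://yourlinktodownload/file.xls', 'file1.xls') would save the file as file1.xls in the current directory, assuming the URL really serves the file directly.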

answered 2012-04-05T06:21:33.010
2

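I think what you are asking is how to search a web page for some text that is not a link, request the link that goes with it, and save the file.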

BeautifulSoup is commonly used for this.

However, requests is another library you can use to fetch the page and then get its content for later parsing.
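The search step could look roughly like the sketch below. To keep it runnable without extra installs, it uses the standard library's html.parser instead of BeautifulSoup, so it is a stand-in for that approach rather than the same code. It assumes the download button is an ordinary a element with an href that appears after the file name in the page source; the class name LinkAfterText, the function find_download_url, and the example markup are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAfterText(HTMLParser):
    """Remember the first <a href=...> that appears after a given piece of text."""

    def __init__(self, needle):
        super().__init__()
        self.needle = needle
        self.seen = False   # becomes True once the file name text was found
        self.href = None    # href of the first link after that text

    def handle_data(self, data):
        if self.needle in data:
            self.seen = True

    def handle_starttag(self, tag, attrs):
        if self.seen and self.href is None and tag == "a":
            self.href = dict(attrs).get("href")


def find_download_url(html, file_name, base_url):
    """Return the absolute URL of the link following file_name, or None."""
    parser = LinkAfterText(file_name)
    parser.feed(html)
    parser.close()
    return urljoin(base_url, parser.href) if parser.href else None
```

The page text itself could come from requests or urllib. If the button turns out to be a form or is wired up by JavaScript, this sketch will not find it, and inspecting the actual request in the browser is the safer route.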

answered 2012-04-05T06:12:01.963
0

Examine the Content-Disposition header of the response to discover what the server wants you to call the file.
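As a rough illustration, the header value can be parsed with the standard library's email.message.Message, which understands the filename parameter. The function name filename_from_content_disposition and the sample header value are hypothetical, and the sketch assumes the server actually sends this header.

```python
from email.message import Message


def filename_from_content_disposition(header_value):
    """Return the filename parameter of a Content-Disposition value, or None."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() reads the 'filename' parameter and strips the quotes.
    return msg.get_filename()


name = filename_from_content_disposition(
    'attachment; filename="file1_2012-04-05.xls"'
)
```

With urllib.request in Python 3, the response headers object is itself a Message subclass, so response.headers.get_filename() should give the same result without building a Message by hand.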

answered 2012-04-05T06:07:32.797