“urlopen”的相关标签问题_Stack Overflow中文网

0 投票

11 回答

25062 浏览

python - 如何在 python 中加快使用 urllib2 获取页面的速度？

我有一个脚本可以获取多个网页并解析信息。

（可以在http://bluedevilbooks.com/search/?DEPT=MATH&CLASS=103&SEC=01看到一个例子）

我在上面运行了 cProfile，正如我所假设的，urlopen 占用了很多时间。有没有办法更快地获取页面？或者一次获取多个页面的方法？我会做任何最简单的事情，因为我是 python 和 web 开发的新手。

提前致谢！:)

更新：我有一个名为的函数fetchURLs()，我用它来制作我需要的 URL 数组，urls = fetchURLS()比如我的虚拟主机很慢？）

我需要做的是加载每个 URL，读取每个页面，并将该数据发送到脚本的另一部分，该部分将解析和显示数据。

请注意，在获取所有页面之前，我无法执行后一部分，这就是我的问题所在。

此外，我相信，我的主机一次限制我最多 25 个进程，所以服务器上最简单的东西都会很好:)

这是时间：

2010-08-16T02:03:08.140

0 投票

2 回答

3405 浏览

python - 在 Python 的 urllib2 urlopen 中检测超时错误

我对 Python 还是比较陌生，所以如果这是一个明显的问题，我深表歉意。

我的问题是关于 urllib2 库，它是 urlopen 函数。目前我正在使用它从另一台服务器加载大量页面（它们都在同一个远程主机上）但是脚本时不时地被超时错误杀死（我假设这是来自大请求）。

有没有办法让脚本在超时后继续运行？我希望能够获取所有页面，所以我想要一个脚本，它会一直尝试直到它得到一个页面，然后继续。

附带说明一下，保持与服务器的连接是否有帮助？

python urllib2 urlopen

2010-08-18T17:56:50.100

0 投票

1 回答

1155 浏览

python - Python auth_handler 不适合我

我一直在阅读有关 Python 的 urllib2 打开和读取受密码保护的目录的能力，但即使在查看了文档中的示例以及 StackOverflow 上的示例之后，我也无法让我的脚本工作。

当我打印内容时，它会打印登录屏幕的内容，我试图打开的网址会将您重定向到。有谁知道这是为什么？

python authentication urllib2 urlopen

2010-08-20T12:43:10.813

0 投票

5 回答

12172 浏览

python - urllib2.urlopen() 缓存东西吗？

他们在 python 文档中没有提到这一点。最近我正在测试一个网站，只是使用 urllib2.urlopen() 来提取某些内容来刷新网站，我注意到有时当我更新网站时 urllib2.urlopen() 似乎没有得到新添加的内容。所以我想知道它会在某处缓存东西，对吗？

python urllib2 urlopen

2010-08-27T16:34:27.863

0 投票

1 回答

327 浏览

php - How to By pass WP super cache using python?

I'm trying to collecting data from a frequently updating blog, so I simply use a while loop which includes urllib2.urlopen("http:\example.com") to refresh the page every 5 minutes to collect the data I wanted.

But I notice that I'm not getting the most recent content by doing this, it's different from what I see via browser such as Firefox, and after checking both the source code of Firefox and the same page I get from python, I found that it's WP Super Cache which is preventing me from getting the most recent result.

And I still get the same cache page even if I spoof the headers in my python code. So I wonder is there a way to by pass WP super cache? And why there's no such super cache in Firefox at all?

php python wordpress urllib2 urlopen

2010-09-07T14:13:09.260

0 投票

1 回答

1146 浏览

python - 为什么 urllib2.urlopen 不能打开像“http://localhost/new-post#comment-29”这样的页面？

我很好奇，为什么我在运行这一行时遇到 404 错误：

虽然在任何浏览器中浏览http://localhost/new-post#comment-29一切正常...

urlopen 方法不解析带有“#”的url？

有人知道吗？

python urllib2 fragment-identifier urlopen

2010-09-26T15:21:09.000

0 投票

1 回答

5463 浏览

python - python mechanize javascript提交按钮问题！

我用 mechanize.browser 模块制作了一些脚本。

问题之一是所有其他事情都可以，但是当提交（）表单时，它不起作用，

所以我发现了一些怀疑来源部分。

在 html 源代码中，我发现如下所示。

我在想，loginCheck(this) 在提交表单时出现问题。

但是如何使用 mechanize 模块处理这种 javascript 函数，所以我可以

成功提交表格并可以收到结果？

以下是与 loginCheck(this) javascript 函数相关的 websource 片段。

我知道 mechanize 不支持 javascript，所以我想以编程方式进行 loginCheck()

python 机械化代码的功能。

有人能帮我把这个javascript函数变成python mechanize吗

翻译代码？

可以正确登录网站吗？

如果这么感谢！

如果有人可以帮助我..非常感谢！

python mechanize urlopen

2010-09-26T15:50:09.230

0 投票

1 回答

1754 浏览

python - Urllib 在 Python 3 中引发无效参数 URLError，urllib.request.urlopen

Python新手，但我正在尝试...从站点检索数据：

这是我在 Python 3.1 文档中看到的相同代码。还有很多网站。

但是，我得到：

我不知道是什么原因造成的。有人知道吗？

python python-3.x urllib urlopen

2010-10-03T19:14:44.717

0 投票

1 回答

328 浏览

python - AppEngine 没有主机出现异常

我有一个 Python 应用程序，它使用urllib.urlopen. 它在上运行良好，但在我的 GAE 生产服务器上dev_appserver.py引发错误。[Errno http error] no host given代码完全相同，它连接到的 url 是硬编码的。我没有想法，可能有什么问题。

UPD：代码：

它获取由 quicklatex.com 网站返回的页面。第一行包含错误数量，第二行包含指向生成图像的链接，然后是空格和数字。我正在获取图片的网址。url变量本身包含一些 LaTeX 代码。

python google-app-engine urlopen

2010-10-04T20:31:12.043

0 投票

2 回答

1592 浏览

python - 无缓冲 urllib2.urlopen

我有用于长期运行进程的 Web 界面客户端。我希望该过程的输出在出现时显示出来。很好用urllib.urlopen()，但它没有timeout参数。另一方面，urllib2.urlopen()输出被缓冲。有没有一种简单的方法可以禁用该缓冲区？

python urllib2 urllib buffering urlopen

2010-10-08T08:20:39.817

问题标签 [urlopen]

Reference