r - How do I stop RCurl::getURL() if it is taking too long to execute?

Question

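Is there a way to tell R or the RCurl package to give up trying to download a web page once a specified period of time has elapsed, and move on to the next line of code? For example: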

> library(RCurl)
> u = "http://photos.prnewswire.com/prnh/20110713/NY34814-b"
> getURL(u, followLocation = TRUE)
> print("next line") # programme does not get this far

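This just hangs on my system and never reaches the final line.

EDIT: Based on @Richie Cotton's answer below, I can "sort of" achieve what I want, but I don't understand why it takes longer than expected. For example, if I do the following, the system hangs until I select/unselect the "Misc >> Buffered Output" option in RGUI: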

> system.time(getURL(u, followLocation = TRUE, .opts = list(timeout = 1)))
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : 
  Operation timed out after 1000 milliseconds with 0 out of 0 bytes received
Timing stopped at: 0.02 0.08 ***6.76***

SOLUTION: Based on @Duncan's post below, and after looking at the curl documentation, I found the solution by using the maxredirs option as follows:

> getURL(u, followLocation = TRUE, .opts = list(timeout = 1, maxredirs = 2, verbose = TRUE))

Many thanks,

Tony Breyal

O/S: Windows 7
R version 2.13.0 (2011-04-13) Platform: x86_64-pc-mingw32/x64 (64-bit)
attached base packages: 
[1] stats     graphics  grDevices utils     datasets  methods   base
other attached packages: 
[1] RCurl_1.6-4.1  bitops_1.0-4.1
loaded via a namespace (and not attached): 
[1] tools_2.13.0

score 5 · Accepted Answer

timeout and connecttimeout are curl options, so they need to be passed in a list to the .opts parameter of getURL. I'm not sure which of the two you need, but start with

getURL(u, followLocation = TRUE, .opts = list(timeout = 3))
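Since the goal in the question is to carry on to the next line when the download fails, the call can also be wrapped in tryCatch. The following is only a minimal sketch: fetch_or_na and its fetch argument are hypothetical names introduced here for illustration (they are not part of RCurl), and the 3-second values are arbitrary.

```r
# Hypothetical helper (not part of RCurl): returns NA instead of stopping
# the script when the download raises an error such as a timeout.
# `fetch` defaults to RCurl::getURL and can be swapped out for testing.
fetch_or_na <- function(url, ..., fetch = RCurl::getURL) {
  tryCatch(
    fetch(url, ...),
    error = function(e) {
      message("Giving up on ", url, ": ", conditionMessage(e))
      NA_character_
    }
  )
}

u <- "http://photos.prnewswire.com/prnh/20110713/NY34814-b"

# timeout caps the whole transfer, connecttimeout only the connection phase;
# both are passed through .opts as described above.
page <- fetch_or_na(u, followLocation = TRUE,
                    .opts = list(timeout = 3, connecttimeout = 3))
print("next line")
```

With this wrapper a timeout error is reported as a message and page is NA, so the code after the call still runs.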

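EDIT:

I can reproduce the hang; changing buffered output doesn't fix it for me (tested under R 2.13.0 and R 2.13.1), and it happens with or without the timeout argument. If you try getURL on the page that is the target of the redirect, it comes back blank.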

u2 <- "http://photos.prnewswire.com/medias/switch.do?prefix=/appnb&page=/getStoryRemapDetails.do&prnid=20110713%252fNY34814%252db&action=details"
getURL(u2)

If you remove the page argument, it redirects you to a login page; maybe PR Newswire is doing something funny with asking for credentials.

u3 <- "http://photos.prnewswire.com/medias/switch.do?prefix=/appnb&prnid=20110713%252fNY34814%252db&action=details"
getURL(u3)

score 5 · Accepted Answer

I believe the web server is getting itself into a confused state by telling us that the URL has temporarily moved and then pointing us to a new URL:

http://photos.prnewswire.com/medias/switch.do?prefix=/appnb&page=/getStoryRemapDetails.do&prnid=20110713%252fNY34814%252db&action=details

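When we follow that, it redirects us again to ... the same URL!

So the timeout is not the problem. The response comes back very quickly, so the timeout duration is never exceeded. It is the fact that we go round and round in circles that causes the apparent hang.

The way I found this was to add verbose = TRUE to the .opts list; we then see all of the communication between us and the server.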
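To illustrate why capping the number of redirects ends the apparent hang, here is a toy simulation in plain R. It is not how RCurl or libcurl is implemented; follow_redirects, the next_hop table and the paths "/a" and "/b" are all hypothetical, and no network access is involved.

```r
# Toy model of redirect following. `next_hop` is a hypothetical lookup
# table mapping a URL to the URL the server redirects it to.
follow_redirects <- function(start, next_hop, max_redirs = 2) {
  url <- start
  hops <- 0
  while (url %in% names(next_hop)) {
    if (hops >= max_redirs) {
      # Without this cap, a self-redirecting URL would loop forever.
      stop("maximum (", max_redirs, ") redirects followed")
    }
    url <- next_hop[[url]]
    hops <- hops + 1
  }
  url
}

# "/a" redirects to "/b", and "/b" redirects to itself, as described above.
looping <- c("/a" = "/b", "/b" = "/b")
result <- tryCatch(follow_redirects("/a", looping),
                   error = function(e) conditionMessage(e))
print(result)
```

Each individual hop is fast, so a time limit on a single response never triggers; only the limit on the number of hops ends the loop, which is the role maxredirs plays in the real call.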

D.
