
I'm trying to collect data from a frequently updated blog, so I simply use a while loop containing urllib2.urlopen("http://example.com") to refresh the page every 5 minutes and collect the data I want.
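Roughly what I'm doing (a minimal sketch; the real URL and the parsing code are omitted, and http://example.com is just a placeholder):

import time
import urllib2

while True:
    page = urllib2.urlopen("http://example.com").read()
    # ... parse `page` and collect the data I want here ...
    time.sleep(5 * 60)  # wait 5 minutes before refreshing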

But I've noticed that I'm not getting the most recent content this way; it differs from what I see in a browser such as Firefox. After comparing the page source in Firefox with the page I get from Python, I found that it's WP Super Cache that is preventing me from getting the most recent result.

And I still get the same cached page even if I spoof the headers in my Python code. So is there a way to bypass WP Super Cache? And why is there no such super cache problem in Firefox at all?
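For example, this is roughly how I spoofed the headers (a sketch; the header values are just my guess at what a browser would send), but the response is still the cached page:

import urllib2

req = urllib2.Request("http://example.com", headers={
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:40.0) Gecko/20100101 Firefox/40.0",
    "Cache-Control": "no-cache",
    "Pragma": "no-cache",
})
page = urllib2.urlopen(req).read()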


1 Answer


Have you tried changing the URL with some harmless extra data? Something like this:

import time
# appending the current timestamp makes each request URL unique
urllib2.urlopen("http://example.com?time=%s" % int(time.time()))

This will actually request http://example.com?time=1283872559. Most caching systems bypass the cache when there is a query string or when the request isn't for the content they expect.
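Combined with the 5-minute loop from the question, it might look like this (a sketch; http://example.com is a placeholder for the real blog URL):

import time
import urllib2

while True:
    # Append the current timestamp so every request has a unique query string,
    # which most caching layers treat as uncacheable.
    url = "http://example.com?time=%s" % int(time.time())
    page = urllib2.urlopen(url).read()
    # ... collect data from `page` here ...
    time.sleep(5 * 60)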

answered 2010-09-07T15:17:36.697