After updating dryscrape and its dependencies to their latest versions, it now works correctly.
The versions are: dryscrape-1.0, lxml-4.1.1, webkit-server-1.0, xvfbwrapper-0.2.9
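To compare your own environment against the versions listed above, something like the following may help. It is only a sketch: it assumes Python 3.8 or newer (for `importlib.metadata`) and assumes the packages are installed under the distribution names shown; `installed_versions` is a hypothetical helper name, not part of dryscrape.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names are assumed to match the ones listed in the answer.
    wanted = ["dryscrape", "lxml", "webkit-server", "xvfbwrapper"]
    for name, version in installed_versions(wanted).items():
        print(name, version or "not installed")
```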
Code:
import dryscrape

# Start a virtual X display so WebKit can run without a screen.
dryscrape.start_xvfb()
sess = dryscrape.Session()

url = 'http://192.168.1.5/jsSupport.html'
loop = 1
while loop < 100000:
    # Settings are re-applied on every iteration because sess.reset() is called below.
    sess.set_header('user-agent', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36')
    sess.set_attribute('auto_load_images', False)
    sess.set_timeout(30)
    sess.visit(url)
    response = sess.body()
    print(response)
    print('loop:', loop)
    sess.reset()
    loop = loop + 1
Output:
'loop:' 1
<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title>Javascript scraping test</title>
</head>
<body>
<p id="intro-text">Yay! Supports javascript</p>
<script>
document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
</script>
</body></html>
'loop:' 2
<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title>Javascript scraping test</title>
</head>
<body>
<p id="intro-text">Yay! Supports javascript</p>
<script>
document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
</script>
</body></html>
'loop:' 3
<!DOCTYPE html><html><head>
<meta charset="utf-8">
<title>Javascript scraping test</title>
</head>
<body>
<p id="intro-text">Yay! Supports javascript</p>
<script>
document.getElementById('intro-text').innerHTML = 'Yay! Supports javascript';
</script>
</body></html>
If you cannot update the modules, or would rather not, a quick fix is to visit another page at the end of each loop iteration.
import dryscrape

# Start a virtual X display so WebKit can run without a screen.
dryscrape.start_xvfb()
sess = dryscrape.Session()

url = 'http://192.168.1.5/jsSupport.html'
otherurl = "http://192.168.1.5/test"
loop = 1
while loop < 100000:
    # Settings are re-applied on every iteration because sess.reset() is called below.
    sess.set_header('user-agent', 'Mozilla/5.0 (Windows NT 6.4; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2225.0 Safari/537.36')
    sess.set_attribute('auto_load_images', False)
    sess.set_timeout(30)
    sess.visit(url)
    response = sess.body()
    print(response)
    print('loop:', loop)
    sess.reset()
    loop = loop + 1
    sess.visit(otherurl)  # Visits the other URL so that the next sess.visit(url) is forced to load the page again.
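
The same workaround can be wrapped in a small helper. This is a minimal sketch, not part of dryscrape: `fetch_fresh` is a hypothetical name, and it only assumes the session object offers the `visit`, `body`, and `reset` methods used above. The header, attribute, and timeout settings are left out for brevity; re-apply them inside the loop if you need them.

```python
def fetch_fresh(sess, url, other_url, count):
    """Fetch `url` `count` times, visiting `other_url` after each fetch.

    `sess` is any object with visit(), body(), and reset() methods,
    such as the dryscrape session used in the answer.
    """
    bodies = []
    for _ in range(count):
        sess.visit(url)
        bodies.append(sess.body())
        sess.reset()
        # Workaround from the answer: navigate away so that the next
        # visit(url) has to load the page again.
        sess.visit(other_url)
    return bodies
```

With dryscrape, this might be called as `fetch_fresh(sess, url, otherurl, 3)` after `dryscrape.start_xvfb()` and `sess = dryscrape.Session()`.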