I'm compiling a list of large websites that use my jQuery plugin ( http://loopj.com/jquery-tokeninput/ ).

Can anyone suggest the best/easiest way to crawl the web for sites that use a specific JS library?

1 Answer

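If you can find a decent list of sites, you could try this (it fetches each site's front page with a single attempt and a one-second timeout, then searches the downloaded files for the plugin name):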

$ curl http://namebench.googlecode.com/svn-history/r495/trunk/data/alexa-top-10000-global.txt >  sitelist.txt
$ cat sitelist.txt | xargs -P 32 -I {} wget -x -k -t1 -T1 {}
$ grep -r tokeninput .
./magicbricks.com/index.html:       <script src="http://www.magicbricks.com/scripts/mainJsGroup.js,Mjm.7iF81SLtpJ.js+dwrGroup.js,Mjm.sV7AgY13MG.js+jquery.js,Mjm.bKnjrjdqsX.js+jquery.tokeninput.js,Mjm.44Xrew-eFT.js.pagespeed.jc.XlwW_QiXCc.js"></script><script type='text/javascript'>eval(mod_pagespeed_vEIzknsBWt);</script>
./spoke.com/index.html:<script src="http://www.spoke.com/javascripts/3p/jquery.tokeninput.min.js?1327114061" type="text/javascript"></script>
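
To turn those hits into a plain list of sites, one option is to keep only the top-level directory that `wget -x` created for each host. This is a minimal sketch, assuming the downloads sit under a single root directory; the function name `list_matching_sites` is made up for illustration and is not part of any tool used above.

```shell
# Hypothetical helper: print every host directory under a root that
# contains at least one file mentioning the given pattern.
list_matching_sites() {
    pattern=$1
    root=${2:-.}   # default to the current directory
    # grep -r recurses, -l prints only the names of matching files.
    # Paths look like ./host/path/file, so field 2 is the host name.
    (cd "$root" && grep -rl -- "$pattern" . | cut -d/ -f2 | sort -u)
}
```

Calling `list_matching_sites tokeninput` from the download directory would then print one host per line.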

If you're feeling ambitious, you can follow up with:

$ find . -name '*.html' -exec cat {} + | grep -o "['\"]http[^'\"]*jquery[^'\"]*['\"]" | sed -e 's%/[^/]*$%/jquery.tokeninput.js%' -e "s/['\"]//g" | sort -u > potential.txt
$ cat potential.txt | xargs -P 32 -I {} wget -x -t 1 -T 5 {}
$ grep -r loopj .
./www.g4tv.com/assets/js/jquery/plugins/jquery.tokeninput.js: * Copyright (c) 2009 James Smith (http://loopj.com)
./static.networkedblogs.com/static/js/jquery.tokeninput.js: * Copyright (c) 2009 James Smith (http://loopj.com)
./www.podomatic.com/javascripts/jquery.tokeninput.js: * Copyright (c) 2009 James Smith (http://loopj.com)
./cdn4.f-cdn.com/js/jquery.tokeninput.js: * Copyright (c) 2009 James Smith (http://loopj.com)
./cdn1.static.youporn.phncdn.com/cb/youpornwebfront/js/jquery.tokeninput.js: * http://loopj.com/jquery-tokeninput/
./scd.hwstatic.com/static/js/1.17.15.1/jquery.tokeninput.js: * Copyright (c) 2009 James Smith (http://loopj.com)
./cdn6.f-cdn.com/js/jquery.tokeninput.js: * Copyright (c) 2009 James Smith (http://loopj.com)
./cdn5.f-cdn.com/js/jquery.tokeninput.js: * Copyright (c) 2009 James Smith (http://loopj.com)
./static.expressz.hu/_expressz_/js/jquery.tokeninput.js: * Copyright (c) 2009 James Smith (http://loopj.com)
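
The follow-up pass rests on a guess: a site that loads jQuery from some directory may keep the plugin next to it under its default file name. The rewrite step can be looked at in isolation, as sketched below. The function name `guess_plugin_url` and the example URL are illustrative assumptions, not taken from the crawl above.

```shell
# Hypothetical helper: given the URL of a jQuery script, guess where
# jquery.tokeninput.js would live if it sat in the same directory.
guess_plugin_url() {
    # Replace the last path component (file name plus any query string
    # without slashes) with the plugin's default file name.
    printf '%s\n' "$1" | sed -e 's%/[^/]*$%/jquery.tokeninput.js%'
}

guess_plugin_url "http://www.example.com/js/jquery.min.js"
# prints http://www.example.com/js/jquery.tokeninput.js
```

Most guessed URLs will not exist, which is why the final `grep -r loopj .` matters: it keeps only downloads that actually carry the plugin's header comment.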
Answered 2012-09-14T07:50:02.933