I'm trying to write a simple program in Python that parses a web page for all text beginning with "http://". This is what I have so far:
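from urllib.request import urlopen  # Python 3 home of urllib2.urlopen
from bs4 import BeautifulSoup        # BeautifulSoup 4 only; importing the old BS3 package as well is redundant

# urlopen needs the scheme; without "http://" it raises ValueError (unknown url type)
malurl = urlopen('http://minotauranalysis.com/malwarelist-urls.aspx')
soup = BeautifulSoup(malurl, 'html.parser')  # name the parser explicitly to avoid a warning
print(soup.get_text())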
When I run the program, this is the output I get:
5678875/18/2013 7:06:52 AMvxVaulthttp://46.254.16.170:80/9.exe
5678815/18/2013 1:05:37 AMMalc0dehttp://thewebblocker.com/TheWebBlocker.exe
5678555/17/2013 11:05:21 PMMalc0dehttp://zenerx.product-truths.com/ebook/freetrialzenerx.exe
5672545/17/2013 4:12:45 PMClean-MXhttp://www.suportersteaua.net/archive/index.php/t-2339.html%3Flogin%3D3d1
5666735/16/2013 3:13:23 PMMalc0dehttp://pad.okozo.com/installers/digital-alarm-clock.exe
5665865/16/2013 10:40:53 AMMalc0dehttp://educacionfinanciera.fovissste.gob.mx/elearning/materiales/install_flashplayer.exe
5662765/16/2013 3:49:35 AMClean-MXhttp://d6.05sun.com/books/ping3gb.exe
5661125/16/2013 3:05:36 AMClean-MXhttp://www.bigticketreview.com/software/bttw02/makemoneyonlinebigticketreviewDropDown.exe
5652715/15/2013 2:29:56 PMClean-MXhttp://www.kv-dinklage.de/oben.html
5649075/15/2013 10:29:32 AMClean-MXhttp://udegdogale.rbcmail.ru/28/zxc.js
5644295/14/2013 6:59:55 PMClean-MXhttp://inet-tr.org.tr/inetconf6/inet-forum/Oct/
5641295/14/2013 10:57:21 AMClean-MXhttp://thebabycub.com/wp-includes/js/jquery/jquery.jsver=1.7.2
5636525/14/2013 8:13:02 AMClean-MXhttp://apstudio-pc.it
5636305/14/2013 8:05:06 AMClean-MXhttp://rocketdock.com/addon/37092/download
5634765/14/2013 7:23:06 AMClean-MXhttp://gameguru.ru/files/5142/download.html
5627075/10/2013 10:25:33 AMvxVaulthttp://irojtumy.ru/shem001.exe
5630915/13/2013 12:37:56 AMClean-MXhttp://tamasu.org/css.js
5630705/13/2013 12:35:25 AMClean-MXhttp://www.goldenyachtcharters.com/templates/yacht-charters/js/css_browser_selector.js
5629335/13/2013 12:04:51 AMClean-MXhttp://rolmlecz.republika.pl/
5628935/12/2013 11:56:24 PMClean-MXhttp://www.buzz-fr.com/ppl/68a3b67202e532e0352556e48599c2e3/
5627845/12/2013 11:03:01 PMMalc0dehttp://www.wadel.com/clients/cite.exe
5627525/12/2013 3:04:32 PMMalc0dehttp://accf.fantasygames.to/LauncherCFFGnew.exe?random=3324
5627355/11/2013 9:04:39 AMMalc0dehttp://down.power-pc.co.kr/pwtray.exe
5627285/11/2013 1:04:30 AMMalc0dehttp://www.iefloating.com/update/ddn/instt2.exe
5622955/9/2013 7:50:23 PMClean-MXhttp://pad.okozo.com/installers/ml-silver.exe
5622825/9/2013 7:47:08 PMClean-MXhttp://testing.keywordclicker.com/downloads/setmeup.exe
5622735/9/2013 7:31:54 PMClean-MXhttp://utilityport.net/templates/atomic/images/prorate.exe
5621515/9/2013 5:04:07 AMvxVaulthttp://www.naturacoiffure.com/images/svchost.exe
5620205/8/2013 6:05:48 AMClean-MXhttp://bbs.jysq.net/forum.php?mod=attachment&aid=NDkwNDI1fGI2OTQ4NTJhfDEzNjc2MzYxNTh8MHwxMjY4OTU1
5620155/8/2013 6:02:48 AMClean-MXhttp://download.77169.com/soft/hacrktools/control/hyhd.zip
5619925/8/2013 5:49:03 AMClean-MXhttp://www.daohaowang.com/forum.php?mod=attachment&aid=NnxhNWEwYzA0OXwxMzY3NDg4MTg0fDB8MzA%3D
5619895/8/2013 5:47:06 AMClean-MXhttp://29.duote.org/dashidingshidshuen.zip
5619835/8/2013 5:43:26 AMClean-MXhttp://21.duote.org:8080/qqlsbldq.zip
5619705/8/2013 5:37:00 AMClean-MXhttp://down2.btfw.org/Temporary/%E6%9E%81%E8%99%8E%E7%97%85%E6%AF%92%E6%A0%B7%E6%9C%AC.rar
5619625/8/2013 5:34:20 AMClean-MXhttp://anhui.jz5u.com/dpin/0000000559244910218/1367634317/b095dfc04e5f1f130d0237d12733c698/soft-2013-02/qqckq.rar
5619565/8/2013 5:31:50 AMClean-MXhttp://c.uzzf.com/lyl/voteHelp_fr.zip
5619555/8/2013 5:31:31 AMClean-MXhttp://58.duote.org/ajzs.zip
5619525/8/2013 5:30:10 AMClean-MXhttp://down4.cr173.com/soft3/ddtsssmzqpj.zip
5618755/8/2013 5:12:23 AMClean-MXhttp://down.signkey.co.kr/olive/signkey.exe
5618445/8/2013 3:10:08 AMMalc0dehttp://176.74.218.47/winAPI.exe
5618275/8/2013 1:06:46 AMMalc0dehttp://cdn.bubbledock.es/cl/inst/bundles/Lollipop/2013_02_05/setup__1730.exe
5616385/6/2013 10:40:23 PMClean-MXhttp://www.ezlifeinc.org/easylifeshop/product_detail.aspx?CID=4&PID=28&rev=y
5612175/6/2013 7:08:41 AMMalc0dehttp://testing.keywordclicker.com/downloads/driverpackage.exe
5608975/5/2013 6:02:41 AMClean-MXhttp://kola.by/DISKaeznemezis.htm
5608165/5/2013 5:45:00 AMClean-MXhttp://rinylagarde.nl/
5607225/5/2013 5:21:59 AMClean-MXhttp://sky.ru/SkyInform/update.zip
5606675/5/2013 5:10:34 AMClean-MXhttp://cozyflames.com/wp-content/themes/MyProduct/epanel/shortcodes/js/et_shortcodes_frontend.js?ver=2.0
5606615/5/2013 5:08:57 AMClean-MXhttp://a.coughstuffs.com/ic/gplhblite63/45645/0/a13f3ada-8639-4a33-adbf-85a5e28bccc5/emulesetup.exe
5602965/4/2013 12:14:24 PMClean-MXhttp://www.mak.web.kg/
5602625/4/2013 12:03:17 PMClean-MXhttp://brunoraineri.it/links/siti1.htm
5601015/3/2013 4:57:55 PMClean-MXhttp://b20.uzzf.com/cl/iisserver_gr.zip
5600365/3/2013 4:13:24 PMClean-MXhttp://jses40813.ibbt.tw/bbs/forum.php?mod=attachment&aid=MTcyMXw5M2NmYTFiZXwxMzY1MzE0MjI4fDEwMDYyfDM5MjM=
5600065/3/2013 3:54:56 PMClean-MXhttp://w1.hackhome.com/h1/-dnfdjlfcxxz.zip
5600035/3/2013 3:53:49 PMClean-MXhttp://wt.wxdown.net:8083/soft/18X%20TV.rar?11102975449718x1365252622x11103022167864-52b5fa9d39168668d78cc7af5ba59563
5599985/3/2013 3:51:51 PMClean-MXhttp://wt7.52z.com/QQSSV.zip
5597275/3/2013 6:55:28 AMClean-MXhttp://c5.cr173.com/soft1/winbox3.30.zip
5597105/3/2013 6:51:48 AMClean-MXhttp://a.coughstuffs.com/ic/gplcplite69/44593/0/f57aaf9d-3aca-417d-b3a4-684d724359ab/xvidsetup.exe
5596705/3/2013 6:46:01 AMClean-MXhttp://a.coughstuffs.com/ic/gplcplite69/44593/0/c1a29cca-a36a-42f0-9d4f-52cc1d111dca/xvidsetup.exe
5596675/3/2013 6:45:39 AMClean-MXhttp://a.coughstuffs.com/ic/gplcplite69/44593/0/ba076324-2a70-4f98-9c0f-89c8ebd857fc/xvidsetup.exe
5596615/3/2013 6:44:48 AMClean-MXhttp://arnika.ua/index.php?Itemid=23
5596015/3/2013 6:21:07 AMClean-MXhttp://forum.szczytno.org/cron.php?amp;amp;
5594365/3/2013 5:24:54 AMClean-MXhttp://www.chdr.org/bamp/buyafricanmangoplus.exe
5587925/1/2013 9:09:41 AMvxVaulthttp://jahbob3.free.fr/server.exe
5587825/1/2013 9:03:21 AMMalc0dehttp://win.olisistemi.com/servizio/PRTbox.exe
5587284/30/2013 6:53:19 AMClean-MXhttp://sv80.de/Material/bilder/Schuetzenfest%202007/pages/DSC00250-1.html
5577634/29/2013 1:12:06 PMClean-MXhttp://js.users.51.la/5988624.js
5572624/29/2013 7:15:00 AMClean-MXhttp://picasa.com.awisshipping.com/bot.txt???
5571144/29/2013 6:27:25 AMClean-MXhttp://a.coughstuffs.com/ic/gplhblite63/45645/0/692b2591-65e9-44a6-b65e-81c29ff729af/applicationsetup.exe
5567264/28/2013 8:29:49 PMClean-MXhttp://www.trovaconcorsi.com/concorsi/2009/04/concorso-geometra-ato-di-im=peria/
5567254/28/2013 8:29:44 PMClean-MXhttp://www.trovaconcorsi.com/concorsi/2009/05/2-assegni-di-ricerca-univer=sita-politecnica-delle-marche-di-ancona/
5566564/28/2013 8:12:59 PMClean-MXhttp://www.exploringecuador.com/deutsch/index.htm
5558564/28/2013 10:03:53 AMClean-MXhttp://databanksurat.com/
5558124/28/2013 9:53:31 AMClean-MXhttp://files.download-at.de/cheats_einsendungen/1344264504-GTAIVsupertrainer.rar
5558114/28/2013 9:53:16 AMClean-MXhttp://files.download-at.de/dl/bws-0457.rar
5558064/28/2013 9:52:36 AMClean-MXhttp://files.download-at.de/dl/ru-w2p2t.rar
5557084/28/2013 9:34:07 AMClean-MXhttp://www.gratisgames24.de/fred/nfsmwXXLv1.rar
5557074/28/2013 9:33:53 AMClean-MXhttp://www.gratisgames24.de/fred/pdrtw15t.zip
5557064/28/2013 9:33:42 AMClean-MXhttp://www.gratisgames24.de/trainer/kq160403/fl.rar
5553954/27/2013 11:48:44 PMClean-MXhttp://www.farida.cn/company/flash/guitarsflash/d1612/d-16%2012n%281%29.html
5552794/27/2013 11:15:00 PMClean-MXhttp://mirror02.cheaters-heaven.com/FileDB/Thundagav1.02InclSourceCode.zip
5550444/27/2013 1:28:21 AMClean-MXhttp://www.yekalonhardware.com/webapp/product.asp?catalog_id=498
5547904/27/2013 12:16:34 AMClean-MXhttp://johnnyglazner.com/%3Ffeed%3Drss2%26
5547504/27/2013 12:10:30 AMClean-MXhttp://e-fashionpoint.pl/kategoria/1/strona/1
5545884/26/2013 11:25:01 PMClean-MXhttp://bowen-enterprises.com/index.html
5543514/26/2013 8:03:13 AMClean-MXhttp://www.ww.wilei.pl--www.ilei.pl/includes/jquery.js
5543474/13/2013 5:06:43 AMZeusTrackerhttp://casasxavier.com.br/site/public_html/frontend/~~filemanager~~/bot.exe
5543314/12/2013 9:00:39 PMClean-MXhttp://mega-boom.3dn.ru/news/1-0-7
5542934/12/2013 8:52:36 PMClean-MXhttp://maylamsuadaunanh.com/?ncl=dfdSPTf-Feax0MM
5542944/12/2013 8:52:55 PMClean-MXhttp://lemonstudio.ro/blog/?m=200905
5542954/12/2013 8:53:13 PMClean-MXhttp://www.volcanwines.gr/eshop/product_reviews.php?products_id=81&osCsid=85a32fd2814155ec8b1c21eaa47e17d0
5542974/12/2013 8:53:47 PMClean-MXhttp://mamaijawroclaw.home.pl/1/cerveza-preparada
5542984/12/2013 8:54:01 PMClean-MXhttp://lemonstudio.ro/blog/?m=200909
5542994/12/2013 8:54:20 PMClean-MXhttp://lemonstudio.ro/blog/?m=200910
5543004/12/2013 8:54:34 PMClean-MXhttp://maylamsuadaunanh.com/dau-nanh-gao-sua-maker-de-nghi.html
5543014/12/2013 8:54:42 PMClean-MXhttp://mamaijawroclaw.home.pl/1/clown-gang
5543024/12/2013 8:54:49 PMClean-MXhttp://malahidegroup.com/acne-marks-treatment
5543034/12/2013 8:54:59 PMClean-MXhttp://mamaijawroclaw.home.pl/1/chinese-thermos
5543044/12/2013 8:55:13 PMClean-MXhttp://lemonstudio.ro/blog/?m=200911
5543054/12/2013 8:55:32 PMClean-MXhttp://lemonstudio.ro/blog/?m=200912
5543064/12/2013 8:55:44 PMClean-MXhttp://mamaijawroclaw.home.pl/1/corak-binatang
5543074/12/2013 8:55:56 PMClean-MXhttp://mamaijawroclaw.home.pl/1/couture-outfits
5543084/12/2013 8:56:10 PMClean-MXhttp://lemonstudio.ro/blog/?m=201005
5543094/12/2013 8:56:30 PMClean-MXhttp://lemonstudio.ro/blog/?m=200901
5543104/12/2013 8:56:41 PMClean-MXhttp://malahidegroup.com/adizero-star
5543114/12/2013 8:56:57 PMClean-MXhttp://lemonstudio.ro/blog/?m=201007
5543124/12/2013 8:57:09 PMClean-MXhttp://mamaijawroclaw.home.pl/1/crazy-fun-photography
5543134/12/2013 8:57:23 PMClean-MXhttp://lemonstudio.ro/blog/?m=200902
5543144/12/2013 8:57:42 PMClean-MXhttp://lemonstudio.ro/blog/?m=200903
5543154/12/2013 8:57:53 PMClean-MXhttp://malahidegroup.com/aerox-parts
5543164/12/2013 8:58:08 PMClean-MXhttp://lemonstudio.ro/blog/?p=447
5543174/12/2013 8:58:20 PMClean-MXhttp://mamaijawroclaw.home.pl/1/croatia-photography
5543184/12/2013 8:58:34 PMClean-MXhttp://lemonstudio.ro/blog/?m=201008
5543214/12/2013 8:58:57 PMClean-MXhttp://mega-boom.3dn.ru/news/1-0-1
5543224/12/2013 8:59:07 PMClean-MXhttp://mamaijawroclaw.home.pl/1/creede-simpson
5543234/12/2013 8:59:15 PMClean-MXhttp://malahidegroup.com/amazona-loewe
5543254/12/2013 8:59:41 PMClean-MXhttp://mega-boom.3dn.ru/index/3
5543264/12/2013 8:59:48 PMClean-MXhttp://malahidegroup.com/apple-pie-liquor
5543274/12/2013 9:00:04 PMClean-MXhttp://lemonstudio.ro/blog/?p=457
5543294/12/2013 9:00:23 PMClean-MXhttp://lenworlbook.narod.ru/
5543324/12/2013 9:00:46 PMClean-MXhttp://malahidegroup.com/a-shetland-pony
5543334/12/2013 9:01:00 PMClean-MXhttp://malahidegroup.com/aviator-vector
5543364/12/2013 9:01:43 PMClean-MXhttp://malahidegroup.com/beer-with-straw
5543374/12/2013 9:01:53 PMClean-MXhttp://mcdz.euweb.cz/
5543384/12/2013 9:02:00 PMClean-MXhttp://malahidegroup.com/bare-pool-mirage
5543394/12/2013 9:02:08 PMClean-MXhttp://mega-boom.3dn.ru/news/1-0-11
5543404/12/2013 9:02:16 PMClean-MXhttp://mega-boom.3dn.ru/news/1-0-10
5543414/12/2013 9:02:23 PMClean-MXhttp://malahidegroup.com/bat-orchid
5541574/12/2013 8:25:09 PMClean-MXhttp://kumaran.narbhavee.org/2011/03/enterprise-architecture-books/
5541624/12/2013 8:25:58 PMClean-MXhttp://www.asuspoint-shop.com/index.php?cPath=29...
5541664/12/2013 8:26:40 PMClean-MXhttp://iznikteml.k12.tr/indian-turnip
5541674/12/2013 8:26:48 PMClean-MXhttp://iznikteml.k12.tr/ipsos-beach
5541684/12/2013 8:26:54 PMClean-MXhttp://iznikteml.k12.tr/lisa-hollenbeck
5541694/12/2013 8:27:01 PMClean-MXhttp://iznikteml.k12.tr/mask-of-japan
5541714/12/2013 8:27:14 PMClean-MXhttp://iznikteml.k12.tr/marlboro-lite
5541724/12/2013 8:27:20 PMClean-MXhttp://iznikteml.k12.tr/love-couple-gif
5541744/12/2013 8:27:39 PMClean-MXhttp://indianoyster.com/otherprofile.php?user=75
5541754/12/2013 8:27:54 PMClean-MXhttp://kumaran.narbhavee.org/2011/03/enterprise-architecture-web-links/
5541764/12/2013 8:28:02 PMClean-MXhttp://iznikteml.k12.tr/mudkip-face
5541784/12/2013 8:28:20 PMClean-MXhttp://www.asuspoint-shop.com/password_forgotten.php?ac...
5541804/12/2013 8:28:55 PMClean-MXhttp://www.asuspoint-shop.com/reviews.php?osCsid=4eeb9ba56cdbb6f19df45cb18c8784ab
5541814/12/2013 8:29:08 PMClean-MXhttp://iznikteml.k12.tr/melanie-adler
5541834/12/2013 8:29:24 PMClean-MXhttp://kumaran.narbhavee.org/2011/03/zachman-enterprise-architecture-framework-fundamentals/
5541844/12/2013 8:29:31 PMClean-MXhttp://iznikteml.k12.tr/michele-allen
5541854/12/2013 8:29:45 PMClean-MXhttp://indianoysters.com/otherprofile.php?user=73
5541864/12/2013 8:29:57 PMClean-MXhttp://imalles.narod.ru/
5541874/12/2013 8:30:12 PMClean-MXhttp://iznikteml.k12.tr/old-butterfly
5541884/12/2013 8:30:21 PMClean-MXhttp://iamacustodian.com/
5541894/12/2013 8:30:33 PMClean-MXhttp://iamacustodian.com/?page_id=10
5541904/12/2013 8:30:45 PMClean-MXhttp://iznikteml.k12.tr/okami-moegami
5541914/12/2013 8:30:55 PMClean-MXhttp://kumaran.narbhavee.org/2011/03/a-comparison-of-the-top-four-enterprise-architecture-methodologies/
5541924/12/2013 8:31:10 PMClean-MXhttp://indianoysters.com/otherprofile.php?user=18
5541944/12/2013 8:31:29 PMClean-MXhttp://iznikteml.k12.tr/paul-zimmermann
5541954/12/2013 8:31:47 PMClean-MXhttp://iamacustodian.com/?page_id=2
5541964/12/2013 8:32:02 PMClean-MXhttp://iamacustodian.com/?page_id=39
5541984/12/2013 8:32:37 PMClean-MXhttp://kumaran.narbhavee.org/2011/03/enterprise-architecture-roadmap-for-modernization/
5541994/12/2013 8:32:44 PMClean-MXhttp://iznikteml.k12.tr/pacman-new-game
5542004/12/2013 8:32:56 PMClean-MXhttp://indianoysters.com/otherprofile.php?user=37
5542024/12/2013 8:33:17 PMClean-MXhttp://iznikteml.k12.tr/satanic-signs
5542044/12/2013 8:33:34 PMClean-MXhttp://iznikteml.k12.tr/shamsul-safie
5542064/12/2013 8:33:58 PMClean-MXhttp://in-china.cn/_notes/abote.asp?/list_(b_11.htm
5542074/12/2013 8:34:21 PMClean-MXhttp://indianoysters.com/otherprofile.php?user=40
5542084/12/2013 8:34:39 PMClean-MXhttp://insacci.narod.ru/
5542094/12/2013 8:35:23 PMClean-MXhttp://yourcareerplan.org/BbAEKD.exe
5542124/12/2013 8:35:48 PMClean-MXhttp://www.5th3rd.com/rTxD0.exe
5542194/12/2013 8:37:42 PMClean-MXhttp://hiltonheadrealestatemlstoday.com/aS8.exe
5542214/12/2013 8:38:12 PMClean-MXhttp://brennstoffegehling.de/mv3EYAbG.exe
5542254/12/2013 8:39:22 PMClean-MXhttp://www.tjgjhb.com/product.asp?bigclassname=gjqf%D0%A0%D1%94%D0%A0%C2%AD%D0%A0%C2%98%D0%A0%D1%92
5542294/12/2013 8:40:24 PMClean-MXhttp://www.in-china.cn/_notes/abote.asp?/list_%D0%A7%C2%B0%D0%95%C2%B7_1.htm
5542304/12/2013 8:40:38 PMClean-MXhttp://www.in-china.cn/_notes/abote.asp?/list_%D0%91%D1%87%D0%A0%D0%8E_1.htm
5542324/12/2013 8:40:58 PMClean-MXhttp://mochelek.republika.pl/
5542364/12/2013 8:42:04 PMClean-MXhttp://autopflege-chemnitz.de/
5542394/12/2013 8:42:31 PMClean-MXhttp://550788.com/
5542414/12/2013 8:42:44 PMClean-MXhttp://iznikteml.k12.tr/second-fiddle
5542424/12/2013 8:42:50 PMClean-MXhttp://iznikteml.k12.tr/sks-bullpup
5542434/12/2013 8:43:03 PMClean-MXhttp://kk-cocoon.eu/de/component/content/article/1-design-effect-wrapping-films-for-individualists
5542464/12/2013 8:43:37 PMClean-MXhttp://janessafari.com/index.php/category/11-in-11-by-111111/page/3/
5542474/12/2013 8:43:50 PMClean-MXhttp://indianoysters.com/otherprofile.php?user=42
5542484/12/2013 8:44:06 PMClean-MXhttp://in-china.cn/_notes/abote.asp?/list_(b_13.htm
5542504/12/2013 8:44:20 PMClean-MXhttp://kumaran.narbhavee.org/2011/05/startup-resources-updated-for-2011/
5542514/12/2013 8:44:30 PMClean-MXhttp://in-china.cn/_notes/abote.asp?/list_(b_18.htm
5542524/12/2013 8:44:41 PMClean-MXhttp://jeffemanuel.net/weblog/?p=93
5542534/12/2013 8:44:54 PMClean-MXhttp://kk-cocoon.eu/en/component/content/article/1-design-effect-wrapping-films-for-individualists
5542544/12/2013 8:45:08 PMClean-MXhttp://indianoysters.com/otherprofile.php?user=41
5542554/12/2013 8:45:25 PMClean-MXhttp://in-china.cn/_notes/abote.asp?/list_(b_19.htm
5542564/12/2013 8:45:31 PMClean-MXhttp://malahidegroup.com/brisca-f2-photos
5542574/12/2013 8:45:45 PMClean-MXhttp://indianoysters.com/otherprofile.php?user=43
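The columns in the output above (ID, date, time, source, URL) run together because get_text() joins every text node with an empty separator by default. A minimal sketch, assuming the list is an HTML table with one value per cell; the one-row markup below is hypothetical, modeled on the first output line rather than copied from the real page:

```python
from bs4 import BeautifulSoup

# Hypothetical one-row table modeled on the first output line;
# the real page's markup is an assumption.
html = ('<table><tr>'
        '<td>567887</td><td>5/18/2013 7:06:52 AM</td><td>vxVault</td>'
        '<td>http://46.254.16.170:80/9.exe</td>'
        '</tr></table>')
soup = BeautifulSoup(html, 'html.parser')

# Default separator is '', so adjacent cells are glued together.
fused = soup.get_text()

# An explicit separator plus strip=True keeps the cells apart.
separated = soup.get_text(' | ', strip=True)

# stripped_strings yields each non-blank text node separately.
cells = list(soup.stripped_strings)

print(fused)
print(separated)
print(cells)
```

Seeing each cell as its own string also shows that the URL is a separate text node, which matters for the search below.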
As you can see, there are definitely strings containing http://...., but when I search with
print(soup.find_all(text='http://'))
nothing is found (it just prints an empty list, []). Why can't it find the strings containing http://?
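A likely explanation: when text= (called string= since Beautiful Soup 4.4.0) is given a plain string, it only matches text nodes that are exactly equal to that string, and no node on the page is just "http://". Passing a compiled regular expression instead makes Beautiful Soup call its search() method on each text node. A minimal sketch, assuming each URL sits in its own text node (for example its own table cell); the two-row markup is hypothetical, modeled on the output above:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical two-row sample modeled on the output; not the real page.
html = ('<table>'
        '<tr><td>567887</td><td>vxVault</td>'
        '<td>http://46.254.16.170:80/9.exe</td></tr>'
        '<tr><td>567881</td><td>Malc0de</td>'
        '<td>http://thewebblocker.com/TheWebBlocker.exe</td></tr>'
        '</table>')
soup = BeautifulSoup(html, 'html.parser')

# A plain string must equal a whole text node, so this finds nothing.
exact = soup.find_all(string='http://')

# A compiled regex is tested with search(); '^' anchors it to the start of the node.
urls = [str(s) for s in soup.find_all(string=re.compile(r'^http://'))]

print(len(exact))
print(urls)
```

If the URLs turn out to be embedded in the middle of longer text nodes, dropping the ^ anchor matches "http://" anywhere in the node.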