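I am trying to run a simple script on my AWS instance. The same script works fine on Windows 7 and Ubuntu (Python 2.7). But when I run it on my server, the website redirects me to an error page that says "You must enable JavaScript in your browser".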

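So far I have tried a lot of things (user agent, redirect handlers, the mechanize extension). I only get these redirects with the domain below; all other sites that use JavaScript work fine.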

Any ideas?

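import urllib2

# Python 2.7: plain GET request for the listing page
req = urllib2.Request("http://www.sahibinden.com/ilan/emlak-konut-satilik-karatepe-emlak-tan-zumrutevler-de-2-plus1-ara-kat-luks-daire-186413632/detay")
response = urllib2.urlopen(req)  # the default opener follows redirects automatically
the_page = response.read()       # raw HTML; no JavaScript is executed
print the_page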
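The script above is Python 2 (`urllib2`, the `print` statement). For reference, here is a minimal Python 3 sketch of the same fetch with `urllib.request`, sending a browser-like `User-Agent` as the question mentions trying. The header value and the helper names `build_request`/`fetch` are hypothetical placeholders, and the question reports that changing the User-Agent alone did not avoid the error page.

```python
import urllib.request

# Hypothetical browser-like User-Agent value; substitute any realistic string
USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64)"


def build_request(url, user_agent=USER_AGENT):
    """Build a GET request that sends a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=30):
    """Return the decoded page text; urlopen follows redirects by default."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling `page = fetch(url)` with the listing URL returns the decoded HTML, much like `the_page` in the original script.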

Edit: It turned out the website was blocking my server's IP. Thanks for the help.

1 Answer

There is nothing wrong with your code.

You need a JavaScript interpreter.

urllib2 only fetches the raw data; it does not interpret the JavaScript code in the page.
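To illustrate that point: whatever `urllib2` (or Python 3's `urllib.request`) downloads, the `<script>` blocks are just strings in the response body, and nothing executes them. A minimal Python 3 sketch using the standard-library `html.parser`; the names `ScriptCollector` and `extract_scripts` are hypothetical, and the sample input is a snippet of the listing page's HTML. Actually running that code would still need a JavaScript engine.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the raw text inside <script> elements without running it."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies arrive here as plain text; they are never executed
        if self._in_script:
            self.scripts[-1] += data


def extract_scripts(html):
    """Return the stripped source of every <script> block in html."""
    parser = ScriptCollector()
    parser.feed(html)
    parser.close()
    return [s.strip() for s in parser.scripts]


snippet = '<script type="text/javascript">\n    var bannerZoneId = "101";\n</script>'
print(extract_scripts(snippet))  # ['var bannerZoneId = "101";']
```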

You can check this: How to interpret JavaScript with Python


Also, it works with the following code:

import requests
# A Session persists cookies and connection settings across requests
session = requests.Session()
url = 'http://www.sahibinden.com/ilan/emlak-konut-satilik-karatepe-emlak-tan-zumrutevler-de-2-plus1-ara-kat-luks-daire-186413632/detay'
html = session.get(url).content.decode('utf8')
print(html)

It returns a large amount of HTML, like this:

<li class="">\n                            Çamaşır Makinesi</li>\n                    <li class="">\n                            Çamaşır Odası</li>\n                    <li class="selected">\n                            Çelik Kapı</li>\n                    <li class="">\n                            Şofben</li>\n                    <li class="">\n                            Şömine</li>\n                    </ul>\n            <h3>Dış Özellikler</h3>\n                <ul>\n                    <li class="">\n                            Asansör</li>\n                    <li class="">\n                            Engelliye Uygun</li>\n                    <li class="">\n                            Güvenlik</li>\n                    <li class="selected">\n                            Hidrofor</li>\n                    <li class="selected">\n                            Isı Yalıtım</li>\n                    <li class="">\n                            Jeneratör</li>\n                    <li class="selected">\n                            Kablo TV - Uydu</li>\n                    <li class="">\n                            Kapalı Garaj</li>\n                    <li class="">\n                            Kapıcı</li>\n                    <li class="">\n                            Kreş</li>\n                    <li class="">\n                            Otopark</li>\n                    <li class="">\n                            Oyun Parkı</li>\n                    <li class="selected">\n                            Ses Yalıtımı</li>\n                    <li class="">\n                            Siding</li>\n                    <li class="">\n                            Spor Alanı</li>\n                    <li class="selected">\n                            Su Deposu</li>\n                    <li class="">\n                            Tenis Kortu</li>\n                    <li class="">\n                            Yangın Merdiveni</li>\n                    <li class="">\n              
              Yüzme Havuzu (Açık)</li>\n                    <li class="">\n                            Yüzme Havuzu (Kapalı)</li>\n                    </ul>\n            <h3>Muhit</h3>\n                <ul>\n                    <li class="selected">\n                            Alışveriş Merkezi</li>\n                    <li class="">\n                            Belediye</li>\n                    <li class="selected">\n                            Cami</li>\n                    <li class="">\n                            Cemevi</li>\n                    <li class="">\n                            Denize Sıfır</li>\n                    <li class="selected">\n                            Eczane</li>\n                    <li class="">\n                            Eğlence Merkezi</li>\n                    <li class="">\n                            Fuar</li>\n                    <li class="selected">\n                            Hastane</li>\n                    <li class="">\n                            Havra</li>\n                    <li class="">\n                            Kilise</li>\n                    <li class="">\n                            Lise</li>\n                    <li class="selected">\n                            Market</li>\n                    <li class="selected">\n                            Park</li>\n                    <li class="">\n                            Polis Merkezi</li>\n                    <li class="selected">\n                            Sağlık Ocağı</li>\n                    <li class="selected">\n                            Semt Pazarı</li>\n                    <li class="">\n                            Spor Salonu</li>\n                    <li class="">\n                            Üniversite</li>\n                    <li class="selected">\n                            İlköğretim</li>\n                    <li class="">\n                            İtfaiye</li>\n                    <li class="">\n                            Şehir 
Merkezi</li>\n                    </ul>\n            <h3>Ulaşım</h3>\n                <ul>\n                    <li class="">\n                            Anayol</li>\n                    <li class="">\n                            Boğaz Köprüleri</li>\n                    <li class="selected">\n                            Cadde</li>\n                    <li class="">\n                            Deniz Otobüsü</li>\n                    <li class="">\n                            Dolmuş</li>\n                    <li class="selected">\n                            E-5</li>\n                    <li class="">\n                            Havaalanı</li>\n                    <li class="">\n                            Marmaray</li>\n                    <li class="selected">\n                            Metro</li>\n                    <li class="">\n                            Metrobüs</li>\n                    <li class="selected">\n                            Minibüs</li>\n                    <li class="">\n                            Otobüs Durağı</li>\n                    <li class="">\n                            Sahil</li>\n                    <li class="">\n                            TEM</li>\n                    <li class="">\n                            Tramvay</li>\n                    <li class="">\n                            Tren İstasyonu</li>\n                    <li class="">\n                            İskele</li>\n                    </ul>\n            <h3>Manzara</h3>\n                <ul>\n                    <li class="">\n                            Boğaz</li>\n                    <li class="">\n                            Deniz</li>\n                    <li class="">\n                            Doğa</li>\n                    <li class="">\n                            Göl</li>\n                    <li class="selected">\n                            Şehir</li>\n                    </ul>\n            <h3>Konut Tipi</h3>\n                <ul>\n            
        <li class="">\n                            Ara Kat Dubleks</li>\n                    <li class="">\n                            Bahçe Dubleksi</li>\n                    <li class="">\n                            Bahçe Katı</li>\n                    <li class="">\n                            Bahçeli</li>\n                    <li class="">\n                            Müstakil Girişli</li>\n                    <li class="">\n                            Tripleks</li>\n                    <li class="">\n                            Çatı Dubleksi</li>\n                    </ul>\n            </div>\n    </div>\n<script type="text/javascript">\n    var bannerZoneId = "101";\n</script>\n\n<div class="uiBox">\n        <div class="uiBoxTitle">\n            <h3>Hadi Taşının!</h3>\n        </div>\n        <div class="uiBoxContainer" id="adHelperBoxMov">\n            <div class="helper">\n                <ul>\n                    <script type="text/javascript">\n                        var classifiedFooterZone9 = "&amp;PAGE_NAME=ilan_detay_zone_9&amp;CATEGORY_ID=16633&amp;PARENT_ID=16623&amp;CATEGORY_LEVEL_0=3518&amp;CATEGORY_LEVEL_1=3613&amp;CATEGORY_LEVEL_2=16623&amp;CATEGORY_LEVEL_3=16633&amp;CATEGORY_LEVEL_4=0&amp;CATEGORY_LEVEL_5=0&amp;CATEGORY_LEVEL_6=0&amp;LANGUAGE=tr&amp;CITY_ID=34&amp;DISTRICT_ID=2177&amp;TOWN_ID=446&amp;QUARTER_ID=23171" + cAttributes;\n                        var classifiedFooterZone10 = "&amp;PAGE_NAME=ilan_detay_zone_10&amp;CATEGORY_ID=16633&amp;PARENT_ID=16623&amp;CATEGORY_LEVEL_0=3518&amp;CATEGORY_LEVEL_1=3613&amp;CATEGORY_LEVEL_2=16623&amp;CATEGORY_LEVEL_3=16633&amp;CATEGORY_LEVEL_4=0&amp;CATEGORY_LEVEL_5=0&amp;CATEGORY_LEVEL_6=0&amp;LANGUAGE=tr&amp;CITY_ID=34&amp;DISTRICT_ID=2177&amp;TOWN_ID=446&amp;QUARTER_ID=23171" + cAttributes;\n                        var classifiedFooterZone11 = 
"&amp;PAGE_NAME=ilan_detay_zone_11&amp;CATEGORY_ID=16633&amp;PARENT_ID=16623&amp;CATEGORY_LEVEL_0=3518&amp;CATEGORY_LEVEL_1=3613&amp;CATEGORY_LEVEL_2=16623&amp;CATEGORY_LEVEL_3=16633&amp;CATEGORY_LEVEL_4=0&amp;CATEGORY_LEVEL_5=0&amp;CATEGORY_LEVEL_6=0&amp;LANGUAGE=tr&amp;CITY_ID=34&amp;DISTRICT_ID=2177&amp;TOWN_ID=446&amp;QUARTER_ID=23171" + cAttributes;\n                        var classifiedFooterZone12 = "&amp;PAGE_NAME=ilan_detay_zone_12&amp;CATEGORY_ID=16633&amp;PARENT_ID=16623&amp;CATEGORY_LEVEL_0=3518&amp;CATEGORY_LEVEL_1=3613&amp;CATEGORY_LEVEL_2=16623&amp;CATEGORY_LEVEL_3=16633&amp;CATEGORY_LEVEL_4=0&amp;CATEGORY_LEVEL_5=0&amp;CATEGORY_LEVEL_6=0&amp;LANGUAGE=tr&amp;CITY_ID=34&amp;DISTRICT_ID=2177&amp;TOWN_ID=446&amp;QUARTER_ID=23171" + cAttributes;\n\n                        getBanner(bannerZoneId, classifiedFooterZone9);\n                        getBanner(bannerZoneId, classifiedFooterZone10);\n                        getBanner(bannerZoneId, classifiedFooterZone11);\n                        getBanner(bannerZoneId, classifiedFooterZone12);\n                    </script>\n                </ul>\n            </div>\n       
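The excerpt shows that the listing details are already in the static HTML, with the features that apply marked `class="selected"`, so no JavaScript has to run to read them. A minimal Python 3 sketch with the standard-library `html.parser` that pulls those items out of the page text; the names `SelectedItemParser` and `selected_features` are hypothetical, and it assumes the `<li class="selected">` markup seen in the excerpt.

```python
from html.parser import HTMLParser


class SelectedItemParser(HTMLParser):
    """Collect the text of <li> elements whose class list contains "selected"."""

    def __init__(self):
        super().__init__()
        self._in_selected = False
        self._buf = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            # attrs is a list of (name, value); value may be None or ""
            classes = (dict(attrs).get("class") or "").split()
            self._in_selected = "selected" in classes
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "li" and self._in_selected:
            self.items.append("".join(self._buf).strip())
            self._in_selected = False

    def handle_data(self, data):
        if self._in_selected:
            self._buf.append(data)


def selected_features(html):
    """Return the stripped text of every selected <li> in html, in order."""
    parser = SelectedItemParser()
    parser.feed(html)
    parser.close()
    return parser.items
```

On the page quoted above it would pick out items such as `Çelik Kapı` and `Hidrofor`, while entries with `class=""` are skipped.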

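You can use the geturl() method to find out whether your URL was redirected (the site may really be generating the message you received based on your server's IP or similar). If it was redirected, you can prevent that or handle it some other way. See: How to prevent Python's urllib(2) from following a redirect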
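A minimal Python 3 sketch of both ideas with `urllib.request` (the Python 3 successor of urllib2). `was_redirected` compares the requested URL with `geturl()`, and `redirect_target` installs an `HTTPRedirectHandler` subclass whose `redirect_request` returns `None`, so the 3xx response is raised as `urllib.error.HTTPError` instead of being followed. The function and class names are hypothetical; neither helper gets around an IP block, they only tell you whether a redirect happened and where it points.

```python
import urllib.error
import urllib.request


class NoRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow 3xx responses."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None means "don't follow"; urllib then raises the
        # 3xx response as urllib.error.HTTPError.
        return None


def was_redirected(url, timeout=30):
    """Open url normally; return (redirected_flag, final_url) using geturl()."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        final_url = response.geturl()
    # Plain string comparison: any change, even an added slash, counts
    return final_url != url, final_url


def redirect_target(url, timeout=30):
    """Open url without following redirects; return the Location header or None."""
    opener = urllib.request.build_opener(NoRedirectHandler)
    try:
        with opener.open(url, timeout=timeout):
            return None  # a 2xx answer, so no redirect happened
    except urllib.error.HTTPError as err:
        if 300 <= err.code < 400:
            return err.headers.get("Location")
        raise  # genuine errors (403, 404, ...) propagate
```

Because `build_opener` skips a default handler when a subclass of it is passed, the custom handler fully replaces the redirect-following behaviour for that opener only; plain `urlopen` calls elsewhere are unaffected.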

answered 2015-02-16T18:56:57.973