I am trying to fetch some data from a site using urllib2, and I get a different HTML page from what I see when I use View Source in the browser: some elements are moved into different divs, and some elements inside divs are missing entirely.

Example: try this Python script:

import urllib2

markup = urllib2.urlopen("http://www.ebay.com/sch/i.html?_trksid=p5197.m570.l1313&_nkw=harry+potter&_sacat=0&_from=R40").read()
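One possible cause is that the server varies its response by request headers such as User-Agent or Accept-Language. urllib2 sends its own default User-Agent, not a browser's. This is an assumption; the question does not establish it. A minimal sketch to test the idea uses `urllib.request`, Python 3's counterpart of urllib2, to send browser-like headers. The header values and the names `BROWSER_UA`, `build_request`, and `fetch` are illustrative, not anything eBay documents.

```python
import urllib.request

# Hypothetical browser-like User-Agent; the exact string is an assumption.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
              "AppleWebKit/537.36 (KHTML, like Gecko) "
              "Chrome/120.0 Safari/537.36")

URL = ("http://www.ebay.com/sch/i.html?_trksid=p5197.m570.l1313"
       "&_nkw=harry+potter&_sacat=0&_from=R40")


def build_request(url, user_agent=BROWSER_UA, language="en-US,en;q=0.9"):
    """Build a GET request carrying browser-like headers (values are assumptions)."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": user_agent, "Accept-Language": language},
    )


def fetch(url):
    """Download the page and decode it, falling back to UTF-8 if no charset is sent."""
    with urllib.request.urlopen(build_request(url), timeout=30) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")
```

Calling `fetch(URL)` returns the decoded HTML, which you can compare with what View Source shows. If the output still differs, cookies the browser has stored, such as a session or location preference, are another plausible factor. That is also an assumption.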

Here are some sample tags from the output above. This is wrong: it is not how the data is shown on the page, which I also checked with Firebug.

<div class="catsgroup">
    <div class="cat-t"><a href="http://www.ebay.com/sch/Books-/267/i.html?_from=R40&amp;_nkw=harry+potter">Books</a><span class="cnt">&nbsp;(7,777)</span></div>
    <div class="cat-c">
        <div class="default">
            <div class="cat-link"><a href="http://www.ebay.com/sch/Children-Young-Adults-/279/i.html?_from=R40&amp;_nkw=harry+potter">Children &amp; Young Adults</a><span class="cnt">&nbsp;(1,999)</span></div> 
            <div class="cat-link"><a href="http://www.ebay.com/sch/Nonfiction-/378/i.html?_from=R40&amp;_nkw=harry+potter">Nonfiction</a><span class="cnt">&nbsp;(2,414)</span></div> 
            <div class="cat-link"><a href="http://www.ebay.com/sch/Fiction-Literature-/377/i.html?_from=R40&amp;_nkw=harry+potter">Fiction &amp; Literature</a><span class="cnt">&nbsp;(1,461)</span></div> 
            **<div class="cat-link"><a href="http://www.ebay.com/sch/Antiquarian-Collectible-/29223/i.html?_from=R40&amp;_nkw=harry+potter">Antiquarian &amp; Collectible</a><span class="cnt">&nbsp;(508)</span></div>**
        </div>
    </div>
</div>

The last line, marked with **, does not belong to that tag in View Source, but it does in the output of curl, wget, and urllib2.

Here is the same snippet from View Source (this is how the data actually appears on the page):

<div class="catsgroup">
    <div class="cat-t"><a href="http://www.ebay.com/sch/Books-/267/i.html?_from=R40&amp;_nkw=harry+potter">Books</a><span class="cnt">&nbsp;(4,358)</span></div>
    <div class="cat-c">
        <div class="default">
            <div class="cat-link"><a href="http://www.ebay.com/sch/Children-Young-Adults-/279/i.html?_from=R40&amp;_nkw=harry+potter">Children &amp; Young Adults</a><span class="cnt">&nbsp;(1,334)</span></div> 
            <div class="cat-link"><a href="http://www.ebay.com/sch/Nonfiction-/378/i.html?_from=R40&amp;_nkw=harry+potter">Nonfiction</a><span class="cnt">&nbsp;(1,298)</span></div> 
            <div class="cat-link"><a href="http://www.ebay.com/sch/Fiction-Literature-/377/i.html?_from=R40&amp;_nkw=harry+potter">Fiction &amp; Literature</a><span class="cnt">&nbsp;(710)</span></div> 
        </div>
    </div>
</div>
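The two snippets differ in the extra category and also in the counts. For example, Books is (7,777) in the script output and (4,358) in View Source. This suggests the server may be returning different results, not merely different markup. To compare two captures without eyeballing them, a minimal sketch using the standard-library `html.parser` can list each `cat-link` entry's name and count. The class and function names (`CatLinkParser`, `cat_links`, `compare`) are hypothetical helpers, and the parser assumes the `cat-link`/`cnt` structure shown above.

```python
from html.parser import HTMLParser


class CatLinkParser(HTMLParser):
    """Collect (category name, count) pairs from <div class="cat-link"> blocks."""

    def __init__(self):
        # convert_charrefs defaults to True, so &amp; and &nbsp; arrive decoded.
        super().__init__()
        self.links = []
        self._in_cat_link = False
        self._in_anchor = False
        self._in_count = False
        self._name = ""
        self._count = ""

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "cat-link" in classes:
            self._in_cat_link = True
            self._name = self._count = ""
        elif self._in_cat_link and tag == "a":
            self._in_anchor = True
        elif self._in_cat_link and tag == "span" and "cnt" in classes:
            self._in_count = True

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_anchor = False
        elif tag == "span":
            self._in_count = False
        elif tag == "div" and self._in_cat_link:
            # Strip the non-breaking space and parentheses around the count.
            self.links.append((self._name.strip(), self._count.strip(" \xa0()")))
            self._in_cat_link = False

    def handle_data(self, data):
        if self._in_anchor:
            self._name += data
        elif self._in_count:
            self._count += data


def cat_links(markup):
    """Return the list of (name, count) pairs found in the markup."""
    parser = CatLinkParser()
    parser.feed(markup)
    parser.close()
    return parser.links


def compare(script_html, browser_html):
    """Report categories only the script saw, and counts that differ."""
    script = dict(cat_links(script_html))
    browser = dict(cat_links(browser_html))
    only_in_script = sorted(set(script) - set(browser))
    changed = {name: (script[name], browser[name])
               for name in script.keys() & browser.keys()
               if script[name] != browser[name]}
    return only_in_script, changed
```

Feeding the urllib2 output and a saved copy of View Source into `compare` would list the extra "Antiquarian & Collectible" entry and every count that changed.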

Any help with what's going wrong here, and how to get the correct HTML as shown in View Source, would be appreciated.

Thanks in advance.
