I am trying to get all the review text from this page (http://www.amazon.com/Learning-Java-Patrick-Niemeyer/dp/1449319246%3FSubscriptionId%3DAKIAIZJQKUHUCXRLH6MQ%26tag%3Dyuplayit-20%26linkCode%3Dxm2%26camp%3D2025%26creative%3D165953%26creativeASIN%3D1449319246), that is, the text inside the <div class="drkgry">....</div> tags,
but the call always returns [].
I don't know what is going on.
Python:
from bs4 import BeautifulSoup
data = open("example_1.html").read()
soup = BeautifulSoup(data)
# class is a reserved word in Python, so bs4 takes the keyword class_
soup.find_all("div", class_="drkgry")
I also tried soup.findAll("div", class_="drkgry") and soup.find_all('div', attrs={'class': 'drkgry'}),
but neither of them worked.
The source I am trying to scrape:
</div> <div class="txtsmall mt4 fvavp"><span class="inlineblock formatVariation"><span class="gr3 gry formatKey">Format:</span><span class="formatValue">Paperback</span></span></div> <div class="mt9 reviewText">
<div class="drkgry">
Learning Java (Fourth Edition) is book for Java practitioner as reference book. This covers lot of topics.<br><br>This is an excellent book for someone who knows basics of programming. This book is not beginners. This book lacks examples and exercises which may disappoint few people.<br><br>Book has 24 chapters covering almost all of basic Java. The chapter one talks about historical aspects. Second chapter is brief introduction of java but it assumes that reader is aware of programming, OOP, threading etc which is difficult for any beginner.
</div>
</div> <div class="clearboth txtsmall gt9 vtStripe"> <div class="fl cmt">
Can anyone help me solve this problem?
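
For reference, here is a minimal sketch of the lookup run against a shortened copy of the snippet above instead of the saved file. It assumes the bs4 package is installed; `sample_html` is a hypothetical stand-in for example_1.html, and the parser is named explicitly ("html.parser", the one bundled with Python) so the result does not depend on which parser libraries are installed. It is only a way to check the call in isolation, not a confirmed fix.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened stand-in for the saved page (example_1.html).
sample_html = """
<div class="mt9 reviewText">
<div class="drkgry">
Learning Java (Fourth Edition) is book for Java practitioner as reference book.<br><br>This is an excellent book for someone who knows basics of programming.
</div>
</div>
"""

# Name the parser explicitly instead of letting bs4 pick one.
soup = BeautifulSoup(sample_html, "html.parser")

# class is a reserved word in Python, so bs4 takes the keyword class_.
matches = soup.find_all("div", class_="drkgry")

# get_text() joins the text nodes; <br> tags contribute no text of their own.
reviews = [div.get_text(" ", strip=True) for div in matches]

print(len(reviews))
for review in reviews:
    print(review)
```

If this finds the div in the inline snippet but the same call on the saved file still returns [], the difference probably lies in the file contents or in how the chosen parser handles the full page, rather than in the find_all call itself.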