
The HTML code looks like this:

    <div id="wrap">
      <div id="content">
        <h1>head</h1>
        <ul class="jobpara">
          <li class="floatl"><span>time:</span>2013-08-13</li>
          <li class="floatl"><span>place:</span>new york</li>
          <li class="floatl"><span>source </span>www.goole.com</li>
        </ul>
      </div>
    </div>

How can I get the content between the <div> tags using Python's sgmllib or another parser?


1 Answer

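    from urllib.request import urlopen

    from bs4 import BeautifulSoup

    url = "http://some-website.com/"
    page = urlopen(url)
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(page.read(), "html.parser")

    # Maps each named form field to its current value.
    build_form = {}

    for input_field in soup.find_all('input'):
        # .get() avoids a KeyError on <input> tags that have no type attribute.
        if input_field.get('type') in ('hidden', 'text', 'password', 'submit', 'image'):
            if input_field.has_attr('name'):
                # Fields without a value attribute default to an empty string.
                build_form[input_field['name']] = input_field.get('value', '')

    print(build_form)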

This is an example of how to use BeautifulSoup to get the "data within" a single object or all the objects of a certain kind. Here it collects the name and value of every <input> field on the page.
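
Applied to the HTML in the question, the same idea might look like the sketch below. It assumes the markup is already available as a string and uses Python's built-in html.parser backend. The variable names html, content, inner_html, texts, and fields are illustrative and do not come from the question.

```python
from bs4 import BeautifulSoup

# The markup from the question, inlined so the sketch runs without a network call.
html = """
<div id="wrap">
  <div id="content">
    <h1>head</h1>
    <ul class="jobpara">
      <li class="floatl"><span>time:</span>2013-08-13</li>
      <li class="floatl"><span>place:</span>new york</li>
      <li class="floatl"><span>source </span>www.goole.com</li>
    </ul>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Look the div up by its id attribute.
content = soup.find("div", id="content")

# Everything between <div id="content"> and its closing tag, as markup.
inner_html = content.decode_contents()

# The text nodes only, each with surrounding whitespace stripped.
texts = list(content.stripped_strings)

# Pair each <span> label with the text that follows it inside the same <li>.
fields = {}
for li in content.find_all("li", class_="floatl"):
    label = li.span.get_text(strip=True).rstrip(":")
    value = li.span.next_sibling.strip()
    fields[label] = value

print(texts)
print(fields)
```

Use inner_html when the nested tags matter, texts when only the visible text is needed, and fields when the label/value pairs should be looked up by name.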

answered 2013-08-13T12:11:37.580