python - Parsing HTML form input tags with Beautiful Soup

Question

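OK, I need to parse the input tags of an HTML form: I need to extract the inputs whose type is "text" as well as any inputs that are not text.

I have this code: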

from BeautifulSoup import BeautifulSoup as beatsop

html_data = open("forms.html")

def html_parser(html_data):
    html_proc = beatsop(html_data)
    # Extract the text inputs.
    txtinput = html_proc.findAll('input', {'type':'text'})
    # Extract every kind of input that is not text.
    listform = ["radio", "checkbox", "password", "file", "image", "hidden"]
    otrimput = html_proc.findAll('input', {'type':listform})

html_parser(html_data)

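I'm using it with a local document, but you could use urllib to request any web page that has a form. Now, the problem is that I need to extract the "value" attribute of the non-text inputs and the "name" attribute of the text inputs. Does anyone know how I can do this?

Thanks!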

score 3 · Accepted Answer

To access an element's attribute, use element['attribute'].

# Beautiful Soup 3 import; with Beautiful Soup 4 this would be "from bs4 import BeautifulSoup".
from BeautifulSoup import BeautifulSoup as beatsop


def html_parser(html_data):
    html_proc = beatsop(html_data)
    #We extract the text inputs.
    txtinput = html_proc.findAll('input', {'type':'text'})
    listform = ["radio", "checkbox", "password", "file", "image", "hidden"]
    otrimput = html_proc.findAll('input', {'type': listform})

    print('Text input names:')
    for elem in txtinput:
        print(elem['name'])

    print('Non-text input values:')
    for elem in otrimput:
        value = elem.get('value')
        if value:
            print(value)
        else:
            print('{} has no value'.format(elem))

with open("forms.html") as html_data:
    html_parser(html_data)
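
If Beautiful Soup isn't installed, the same extraction can be approximated with the standard library. The following is only a rough sketch, not the approach used in the answer above: it swaps in Python 3's html.parser, and the class name InputCollector and the sample markup are made up for illustration. It collects the "name" of each text input and the "value" of each input whose type is in the same list as above, storing None when the attribute is missing.

```python
from html.parser import HTMLParser

# Same non-text types as in the answer above.
NON_TEXT_TYPES = {"radio", "checkbox", "password", "file", "image", "hidden"}


class InputCollector(HTMLParser):
    """Hypothetical helper: gathers attributes from <input> tags."""

    def __init__(self):
        super().__init__()
        self.text_names = []    # "name" of every type="text" input
        self.other_values = []  # "value" of every non-text input

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)  # attrs arrives as (name, value) pairs
        input_type = (attributes.get("type") or "").lower()
        if input_type == "text":
            # dict.get returns None instead of raising when "name" is absent
            self.text_names.append(attributes.get("name"))
        elif input_type in NON_TEXT_TYPES:
            self.other_values.append(attributes.get("value"))


# Made-up sample form, standing in for forms.html.
sample = """
<form>
  <input type="text" name="username">
  <input type="password" name="pw" value="secret">
  <input type="checkbox" name="agree">
  <input type="hidden" name="token" value="abc123"/>
</form>
"""

collector = InputCollector()
collector.feed(sample)
print("Text input names:", collector.text_names)
print("Non-text input values:", collector.other_values)
```

As in the answer, a lookup that tolerates a missing attribute (get) is used for "value", because inputs such as checkboxes often have none.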
