2

How can I read the value from

<th class="class_name"> Sample Text </th>

Can anyone help me get the string "Sample Text" from the above HTML using Python?

Thank you.

4

4 Answers

5

You can use BeautifulSoup, which is my favorite library for parsing HTML.

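# BeautifulSoup 4 is imported from the bs4 package (the old BeautifulSoup 3 module is Python 2 only).
from bs4 import BeautifulSoup
html = '<th class="class_name"> Sample Text </th>'
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(html, 'html.parser')
# .text returns the element's text, including the surrounding spaces.
print(soup.th.text)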
Answered 2013-03-01T07:29:25.877
0

Regex solution:

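import re

# Sample input; in practice input_string holds the HTML you want to search.
input_string = '<th class="class_name"> Sample Text </th>'

# The non-greedy group (.*?) captures only up to the first closing </th>.
th_regex = re.compile(r'<th\s+class="class_name">(.*?)</th>')
search_result = th_regex.search(input_string)

# Use a conditional expression so an empty match is not reported as 'not found'.
print(search_result.group(1) if search_result else 'not found')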

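Note: you need the ? after .* to make the match non-greedy, so that it stops consuming characters at the first </th>. Otherwise the match would run on to the last </th> in input_string.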
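
To make the difference concrete, here is a minimal sketch that runs both patterns on the same string. The two-cell input is a made-up example, not taken from the question, and the variable names greedy and non_greedy are just for illustration.

```python
import re

# Hypothetical input: two <th> cells on one line, so greedy matching has room to overshoot.
input_string = '<th class="class_name"> First </th><th class="class_name"> Second </th>'

# Greedy: .* runs as far as possible and backtracks to the LAST </th>.
greedy = re.search(r'<th\s+class="class_name">(.*)</th>', input_string)

# Non-greedy: .*? stops at the FIRST </th> it can match.
non_greedy = re.search(r'<th\s+class="class_name">(.*?)</th>', input_string)

print(repr(greedy.group(1)))      # spans both cells, markup included
print(repr(non_greedy.group(1)))  # ' First '
```

The captured text keeps its surrounding spaces; call .strip() on it if you want just the words.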

Answered 2013-03-01T07:19:44.527
0

You can use minidom to parse it. I'm not sure what your exact needs are, though.

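from xml.dom import minidom

# Sample input; minidom is an XML parser, so this only works on well-formed markup.
html = '<th class="class_name"> Sample Text </th>'
dom = minidom.parseString(html)
for elem in dom.getElementsByTagName('th'):
    if elem.getAttribute('class') == 'class_name':
        # firstChild is the text node inside the <th> element.
        print(elem.firstChild.nodeValue)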
Answered 2013-03-01T07:20:11.323
0

Regex solution:

import re

s = '<th class="class_name"> Sample Text </th>'
# findall returns a list with the captured text of every match.
data = re.findall(r'<th class="class_name">(.*?)</th>', s)
print(data)
Answered 2013-03-01T07:33:26.497