How can I read the value from this HTML element:
<th class="class_name"> Sample Text </th>
Can anyone help me get the string "Sample Text" out of the HTML above using Python?
Thank you.
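You can use BeautifulSoup, which is my favorite library for parsing HTML.
from bs4 import BeautifulSoup  # pip install beautifulsoup4

html = '<th class="class_name"> Sample Text </th>'
soup = BeautifulSoup(html, 'html.parser')
print(soup.th.text.strip())  # strip() removes the surrounding spaces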
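If BeautifulSoup is not installed, the standard library can do a similar job. The following is a minimal sketch using `html.parser.HTMLParser`; the class `ThTextParser` and its attributes are hypothetical names chosen for illustration, and it assumes you want all text inside each `<th>` whose `class` attribute contains `class_name`.

```python
from html.parser import HTMLParser


class ThTextParser(HTMLParser):
    """Hypothetical helper: collect the text of every <th> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.inside = False   # True while we are inside a matching <th>
        self.buffer = []      # text fragments of the current matching <th>
        self.results = []     # finished, stripped cell texts

    def handle_starttag(self, tag, attrs):
        if tag == "th":
            # attrs is a list of (name, value) pairs; value may be None.
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.inside = True
                self.buffer = []

    def handle_endtag(self, tag):
        if tag == "th" and self.inside:
            self.results.append("".join(self.buffer).strip())
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.buffer.append(data)


parser = ThTextParser("class_name")
parser.feed('<th class="class_name"> Sample Text </th>')
parser.close()
print(parser.results)  # ['Sample Text']
```

Unlike BeautifulSoup, this parser does not build a tree; it only reacts to tags as they stream past, which keeps it dependency-free but means you write the matching logic yourself.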
A regular expression solution:
import re

input_string = '<th class="class_name"> Sample Text </th>'
th_regex = re.compile(r'<th\s+class="class_name">(.*?)</th>')
search_result = th_regex.search(input_string)
print(search_result.group(1).strip() if search_result else 'not found')
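Note: you need the `?` after `.*` to make the search non-greedy, so that it stops consuming characters at the first `</th>`. Otherwise you will get everything up to the last `</th>` in input_string.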
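To see the difference the `?` makes, here is a small sketch that puts two header cells on one line (the second cell, " Other ", is made-up sample data) and runs the greedy and non-greedy patterns side by side.

```python
import re

# Two header cells on one line, so greedy and non-greedy matching differ.
input_string = '<th class="class_name"> Sample Text </th><th class="class_name"> Other </th>'

greedy = re.search(r'<th\s+class="class_name">(.*)</th>', input_string)
lazy = re.search(r'<th\s+class="class_name">(.*?)</th>', input_string)

# Greedy .* runs to the LAST </th>, swallowing the second cell.
print(repr(greedy.group(1)))  # ' Sample Text </th><th class="class_name"> Other '
# Non-greedy .*? stops at the FIRST </th>.
print(repr(lazy.group(1)))    # ' Sample Text '
```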
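You can use minidom to parse it. However, I'm not sure what your exact requirements are.
from xml.dom import minidom

html = '<th class="class_name"> Sample Text </th>'
dom = minidom.parseString(html)  # requires well-formed XML (XHTML), not arbitrary HTML
for elem in dom.getElementsByTagName('th'):
    if elem.getAttribute('class') == 'class_name':
        print(elem.firstChild.nodeValue.strip())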
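Because minidom is an XML parser, it rejects HTML that is not well-formed XML, such as an unclosed `<br>`. The sketch below wraps the same lookup in a function (the name `th_texts` is hypothetical) that returns `None` when expat cannot parse the input, and skips cells whose first child is not a text node.

```python
from xml.dom import Node, minidom
from xml.parsers.expat import ExpatError


def th_texts(markup, class_name):
    """Return stripped text of matching <th> cells, or None if markup is not well-formed XML."""
    try:
        dom = minidom.parseString(markup)
    except ExpatError:
        return None
    results = []
    for elem in dom.getElementsByTagName("th"):
        first = elem.firstChild
        # Only read nodeValue from text nodes; element nodes have nodeValue None.
        if elem.getAttribute("class") == class_name and first is not None and first.nodeType == Node.TEXT_NODE:
            results.append(first.nodeValue.strip())
    return results


print(th_texts('<th class="class_name"> Sample Text </th>', "class_name"))  # ['Sample Text']
# A void <br> tag is legal HTML but not well-formed XML, so expat rejects it.
print(th_texts('<th class="class_name">Sample<br>Text</th>', "class_name"))  # None
```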
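Another regular expression solution:
import re

s = '<th class="class_name"> Sample Text </th>'
data = re.findall(r'<th class="class_name">(.*?)</th>', s)
print(data)  # findall returns a list of every captured group, here [' Sample Text ']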
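One caveat with both regex answers: by default `.` does not match a newline, so a cell whose text spans several lines is silently skipped. A minimal sketch, using made-up sample markup, shows how the `re.DOTALL` flag changes that.

```python
import re

s = '''<th class="class_name"> Sample Text </th>
<th class="class_name">Multi
line</th>'''

pattern = r'<th class="class_name">(.*?)</th>'

# Without re.DOTALL, "." does not match "\n", so the multi-line cell is not found.
plain = re.findall(pattern, s)
print(plain)   # [' Sample Text ']

# With re.DOTALL, "." also matches newlines; strip() trims the padding spaces.
dotall = [t.strip() for t in re.findall(pattern, s, re.DOTALL)]
print(dotall)  # ['Sample Text', 'Multi\nline']
```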