python-2.7 - How can I use read_html to read a table whose rows have 2 different class attributes?

Question

I am trying to scrape the fund prices from this page:

http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=JAS_U

However, the rows of the table alternate between two class attributes, "fundPriceCell1" and "fundPriceCell2":

<tr>
<td align="center" class="fundPriceCell1">08/11/2013</td><td align="center" class="fundPriceCell1">118.2500</td><td align="center" class="fundPriceCell1">118.2500</td>
</tr>
<tr>
<td align="center" class="fundPriceCell2">07/11/2013</td><td align="center" class="fundPriceCell2">118.9800</td><td align="center" class="fundPriceCell2">118.9800</td>
</tr>

How can I scrape the table? The code below is wrong because it only names one of the two classes. How can I fix it?

import pandas as pd
import requests
url = 'http://www.prudential.com.hk/PruServlet?module=fund&purpose=searchHistFund&fundCd=JAS_U'
tables = pd.read_html(requests.get(url).text, attrs={"class":"fundPriceCell1"})

score 1 · Accepted Answer

I think you can pass a compiled regular expression; this pattern will match both class attributes:

import re
# Raw string so that \d reaches the regex engine unchanged
tables = pd.read_html(requests.get(url).text, attrs={"class":re.compile(r"fundPriceCell\d+")})
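If the regular expression does not help in your setup, one fallback is to pick out the cells yourself. The pandas documentation describes `attrs` as identifying the `<table>` element, whereas here the classes sit on the `<td>` cells, so the match may not apply. Below is a minimal sketch using only the standard library, assuming Python 3 (in Python 2.7 the module is called `HTMLParser`). The names `PriceCellParser` and `parse_price_rows` are made up for illustration, and the sample HTML is just the two rows from the question wrapped in a `<table>` tag.

```python
import re
from html.parser import HTMLParser

# Same pattern as in the answer, written as a raw string.
CELL_CLASS = re.compile(r"fundPriceCell\d+")

# The two rows quoted in the question, wrapped in a <table> for the demo.
SAMPLE_HTML = """
<table>
<tr>
<td align="center" class="fundPriceCell1">08/11/2013</td><td align="center" class="fundPriceCell1">118.2500</td><td align="center" class="fundPriceCell1">118.2500</td>
</tr>
<tr>
<td align="center" class="fundPriceCell2">07/11/2013</td><td align="center" class="fundPriceCell2">118.9800</td><td align="center" class="fundPriceCell2">118.9800</td>
</tr>
</table>
"""


class PriceCellParser(HTMLParser):
    """Collect text of <td> cells whose class matches CELL_CLASS, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of cell strings
        self._row = None        # cells of the <tr> currently being read
        self._in_cell = False   # True while inside a matching <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            css_class = dict(attrs).get("class") or ""
            self._in_cell = bool(CELL_CLASS.fullmatch(css_class))
            if self._in_cell:
                self._row.append("")

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # skip rows without any matching cell
                self.rows.append(self._row)
            self._row = None


def parse_price_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = PriceCellParser()
    parser.feed(html)
    parser.close()
    return parser.rows


rows = parse_price_rows(SAMPLE_HTML)
```

The resulting list of rows can be handed to `pd.DataFrame(rows)` if you want a DataFrame. For the live page you would pass `requests.get(url).text` instead of `SAMPLE_HTML`; whether the real page has the same structure as the quoted snippet is an assumption.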
