I have a huge HTML table (about 500,000 rows) that I need to convert to a JSON file. The table looks like this:
<table>
<tr>
<th>Id</th>
<th>Timestamp</th>
<th>Artist_Name</th>
<th>Tweet_Id</th>
<th>Created_at</th>
<th>Tweet</th>
<th>User_name</th>
<th>User_Id</th>
<th>Followers</th>
</tr>
<tr>
<td>1</td>
<td>2013-06-07 16:00:17</td>
<td>Kelly Rowland</td>
<td>343034567793442816</td>
<td>Fri Jun 07 15:59:48 +0000 2013</td>
<td>So has @MissJia already discussed this Kelly Rowland Dirty Laundry song? I ain't trying to go all through her timelime...</td>
<td>Nicole Barrett</td>
<td>33831594</td>
<td>62</td>
</tr>
<tr>
<td>2</td>
<td>2013-06-07 16:00:17</td>
<td>Kelly Rowland</td>
<td>343034476395368448</td>
<td>Fri Jun 07 15:59:27 +0000 2013</td>
<td>RT @UrbanBelleMag: While everyone waits for Kelly Rowland to name her abusive ex, don't hold your breath. But she does say he's changed: ht…</td>
<td>A.J.</td>
<td>24193447</td>
<td>340</td>
</tr>
I want to create a JSON file that looks like this:
{"data": [
{
"text": "So has @MissJia already discussed this Kelly Rowland Dirty Laundry song? I ain't trying to go all through her timelime...",
"id": 1,
"tweet_id": 343034567793442816
},
{
"text": "RT @UrbanBelleMag: While everyone waits for Kelly Rowland to name her abusive ex, don't hold your breath. But she does say he's changed: ht…",
"id": 2,
"tweet_id": 343034476395368448
}
]}
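It might include a few more fields, but this should be self-explanatory.
I have looked into several options, but most of them cannot cope with an HTML table this large. I see many people recommending jQuery. Given the size of my table, does that make sense for me? If there is a suitable Python option, I would strongly prefer it, since most of my code so far is written in Python.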
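To make the Python route concrete, here is a minimal sketch of the kind of approach I have in mind, using only the standard library (`html.parser` and `json`). It assumes the file is UTF-8, that the first row holds the `<th>` headers exactly as in the sample above, and that `Id` and `Tweet_Id` are always integers. The names `TableRowParser`, `records_from_chunks`, and `html_table_to_json` are made up for illustration, and I have not measured how this performs on the full 500,000 rows.

```python
import json
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect each data row of a table as a dict keyed by the <th> header cells."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.header = None       # column names taken from the <th> row
        self.rows = []           # parsed data rows not yet consumed
        self._cells = None       # cells of the current row, None outside <tr>
        self._text = None        # text pieces of the current cell, None outside a cell
        self._is_header = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells, self._is_header = [], False
        elif tag in ("td", "th") and self._cells is not None:
            self._text = []
            self._is_header = self._is_header or tag == "th"

    def handle_data(self, data):
        # Cell text can arrive in several pieces when input is fed in chunks.
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._text is not None:
            self._cells.append("".join(self._text).strip())
            self._text = None
        elif tag == "tr" and self._cells is not None:
            if self._is_header:
                self.header = self._cells
            elif self.header is not None:
                self.rows.append(dict(zip(self.header, self._cells)))
            self._cells, self._text = None, None


def records_from_chunks(chunks):
    """Yield {'text', 'id', 'tweet_id'} dicts from an iterable of HTML text chunks."""
    parser = TableRowParser()

    def drain():
        # Hand over finished rows and reset the list so only kept fields accumulate.
        rows, parser.rows = parser.rows, []
        for row in rows:
            yield {
                "text": row["Tweet"],
                "id": int(row["Id"]),
                "tweet_id": int(row["Tweet_Id"]),
            }

    for chunk in chunks:
        parser.feed(chunk)
        yield from drain()
    parser.close()
    yield from drain()


def html_table_to_json(in_path, out_path, chunk_size=1 << 20):
    """Read the HTML file chunk by chunk and write {"data": [...]} to out_path."""
    with open(in_path, encoding="utf-8") as src:
        chunks = iter(lambda: src.read(chunk_size), "")
        data = list(records_from_chunks(chunks))
    with open(out_path, "w", encoding="utf-8") as dst:
        json.dump({"data": data}, dst, ensure_ascii=False, indent=1)
```

Usage would be something like `html_table_to_json("tweets.html", "tweets.json")`, where both file names are placeholders. The file is fed to the parser in chunks, so the raw HTML is never held in memory all at once. The output list is still built in memory before `json.dump` writes it.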