I have been trying out jSoup with the code below. The idea is to extract a movie schedule from this page:
http://www.blitzmegaplex.com/en/schedule_movie.php?id=MOV1970
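So far I have only been able to extract the cinema names, because those cells are marked with a specific class name ("separator2"), while the remaining cells all use the class "separator".
I am trying to implement the following steps with a for loop. For each row in the TABLE: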
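- Get the cinema name (the <td> with the class "separator2").
- Skip the one row beneath it (the row directly below the one from step #1).
- Get the second <td> with the class "separator" (the date).
- Get the second <td> from every <tr> beneath it (below the row from step #3), until reaching the next row that contains the class "separator2".
- Repeat the process until all rows have been processed.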
Can anyone suggest how I should proceed? Or is there perhaps a better approach?
Thanks.
My code so far:
import java.io.IOException;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public void getMovieSchedule(String movieUrl) throws IOException
{
    //URL url = new URL(movieUrl);
    //Document doc = Jsoup.parse(url, 3000);
    //Element table = doc.select("table[div=scheduletbl]").first();
    //Iterator<Element> ite = table.select("tr").iterator();
    //ite.next(); // Skip the first row.
    // Actual content
    //print(ite.next().text());
    // *** CODE ABOVE DOES NOT WORK ***
    //final String urlSchedule = "http://www.blitzmegaplex.com/en/schedule_movie.php?id=MOV1970";
    Document doc = Jsoup.connect(movieUrl).get();
    Elements div = doc.select("div.panelbox");
    for (Element child : div)
    {
        Elements table = child.select("table");
        Elements row = table.select("tr"); // The actual content.
        for (Element a : row)
        {
            // Only the cinema-name rows contain a td.separator2 cell.
            Elements cinemaName = a.select("td.separator2");
            if (!cinemaName.isEmpty())
            {
                System.out.println(cinemaName.text());
            }
        }
    }
}
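One likely reason the commented-out attempt fails is its selector: "table[div=scheduletbl]" matches a <table> that has an attribute named div, whereas the markup below identifies the table with id="scheduletbl". A minimal sketch of the lookup by id follows; the class name SelectorSketch and the method name findScheduleTable are made up for illustration, and the 3000 ms timeout in the original attempt may be a separate issue that this does not address.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class SelectorSketch {

    // Returns the schedule table, or null when the document has no
    // <table id="scheduletbl"> (hypothetical helper, not part of jsoup).
    static Element findScheduleTable(Document doc) {
        // "table#scheduletbl" matches on the id attribute. The earlier
        // "table[div=scheduletbl]" asks for an attribute called "div",
        // so select(...) comes back empty and first() returns null.
        return doc.select("table#scheduletbl").first();
    }
}
```

Because first() returns null on an empty result, calling table.select("tr") on it would throw a NullPointerException, so a null check before iterating is advisable.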
The HTML to extract from (some code omitted):
<table width="95%" border="0" cellpadding="2" cellspacing="0" id="scheduletbl">
<tbody>
<tr>
<td colspan="3" class="separator2"><strong>BLITZMEGAPLEX - PARIS VAN JAVA, BANDUNG</strong></td>
</tr>
<tr>
<td colspan="3"><img src="../img/ico_rss_schedule_white.gif" width="16" height="16" hspace="5" align="left"><strong><a href="../rss/schedule.php" class="navlink">RSS- Paris van Java</a></strong></td>
</tr>
<tr>
<td class="separator"> </td>
<td colspan="2" class="separator">TUESDAY, 05 NOVEMBER 2013</td>
</tr>
<tr>
<td class="separator"> </td>
<td width="20%" class="separator" rel="2D">
10:30
</td>
<td width="30%" class="separator">
<a href="https://www.blitzmegaplex.com/olb/seats.php?showdate=2013-11-05&cinema=0100&movie=MOV1970&showtime=10:30&suite=N&movieformat=2D" class="navlink" target="_blank">Buy Tickets</a></td>
</tr>
<tr>
<td class="separator"> </td>
<td width="20%" class="separator" rel="2D">
13:15
</td>
<td width="30%" class="separator">
<a href="https://www.blitzmegaplex.com/olb/seats.php?showdate=2013-11-05&cinema=0100&movie=MOV1970&showtime=13:15&suite=N&movieformat=2D" class="navlink" target="_blank">Buy Tickets</a></td>
</tr>
<tr>
<td class="separator"> </td>
<td width="20%" class="separator" rel="2D">
16:00
</td>
<td width="30%" class="separator">
<a href="https://www.blitzmegaplex.com/olb/seats.php?showdate=2013-11-05&cinema=0100&movie=MOV1970&showtime=16:00&suite=N&movieformat=2D" class="navlink" target="_blank">Buy Tickets</a></td>
</tr>
<tr>
<td class="separator"> </td>
<td width="20%" class="separator" rel="2D">
18:45
</td>
<td width="30%" class="separator">
<a href="https://www.blitzmegaplex.com/olb/seats.php?showdate=2013-11-05&cinema=0100&movie=MOV1970&showtime=18:45&suite=N&movieformat=2D" class="navlink" target="_blank">Buy Tickets</a></td>
</tr>
<tr>
<td class="separator"> </td>
<td width="20%" class="separator" rel="2D">
21:30
</td>
<td width="30%" class="separator">
<a href="https://www.blitzmegaplex.com/olb/seats.php?showdate=2013-11-05&cinema=0100&movie=MOV1970&showtime=21:30&suite=N&movieformat=2D" class="navlink" target="_blank">Buy Tickets</a></td>
</tr>
<tr>
<td colspan="3" class="separator2"><strong>BLITZMEGAPLEX - GRAND INDONESIA, JAKARTA</strong></td>
</tr>
<tr>
<td colspan="3"><img src="../img/ico_rss_schedule_white.gif" width="16" height="16" hspace="5" align="left"><strong><a href="../rss/schedule.php" class="navlink">RSS- Grand Indonesia</a></strong></td>
</tr>
<tr>
<td class="separator"> </td>
<td colspan="2" class="separator">TUESDAY, 05 NOVEMBER 2013</td>
</tr>
<tr>
<td class="separator"> </td>
<td width="20%" class="separator" rel="2D">
10:45
</td>
<td width="30%" class="separator">
<a href="https://www.blitzmegaplex.com/olb/seats.php?showdate=2013-11-05&cinema=0200&movie=MOV1970&showtime=10:45&suite=N&movieformat=2D" class="navlink" target="_blank">Buy Tickets</a></td>
</tr>
<tr>
<td class="separator"> </td>
<td width="20%" class="separator" rel="2D">
13:30
</td>
<td width="30%" class="separator">
<a href="https://www.blitzmegaplex.com/olb/seats.php?showdate=2013-11-05&cinema=0200&movie=MOV1970&showtime=13:30&suite=N&movieformat=2D" class="navlink" target="_blank">Buy Tickets</a></td>
</tr>
<tr>
<td class="separator"> </td>
<td width="20%" class="separator" rel="2D">
16:15
</td>
<td width="30%" class="separator">
<a href="https://www.blitzmegaplex.com/olb/seats.php?showdate=2013-11-05&cinema=0200&movie=MOV1970&showtime=16:15&suite=N&movieformat=2D" class="navlink" target="_blank">Buy Tickets</a></td>
</tr>
<tr>
<td class="separator"> </td>
<td width="20%" class="separator" rel="2D">
19:00
</td>
<td width="30%" class="separator">
<a href="https://www.blitzmegaplex.com/olb/seats.php?showdate=2013-11-05&cinema=0200&movie=MOV1970&showtime=19:00&suite=N&movieformat=2D" class="navlink" target="_blank">Buy Tickets</a></td>
</tr>
<tr>
<td class="separator"> </td>
<td width="20%" class="separator" rel="2D">
21:45
</td>
<td width="30%" class="separator">
<a href="https://www.blitzmegaplex.com/olb/seats.php?showdate=2013-11-05&cinema=0200&movie=MOV1970&showtime=21:45&suite=N&movieformat=2D" class="navlink" target="_blank">Buy Tickets</a></td>
</tr>
... MORE <tr> here ...
</tbody></table>
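The steps listed in the question can be sketched as a single pass over the rows that remembers the current cinema and the current date. This is only a rough sketch based on the fragment shown above, not a tested solution for the live page. It assumes that a date row is the one whose second td.separator cell carries a colspan attribute, and that every other row with at least two td.separator cells is a showtime row. The class name ScheduleSketch and the method name parseSchedule are hypothetical.

```java
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ScheduleSketch {

    // Returns cinema name -> (date -> showtimes), all in document order.
    static Map<String, Map<String, List<String>>> parseSchedule(Document doc) {
        Map<String, Map<String, List<String>>> schedule = new LinkedHashMap<>();
        Map<String, List<String>> currentCinema = null;
        List<String> currentTimes = null;

        for (Element row : doc.select("table#scheduletbl tr")) {
            // Step 1: a td.separator2 cell starts a new cinema.
            Element cinemaCell = row.select("td.separator2").first();
            if (cinemaCell != null) {
                currentCinema = new LinkedHashMap<>();
                schedule.put(cinemaCell.text(), currentCinema);
                currentTimes = null;
                continue;
            }

            // Step 2: rows without two td.separator cells (such as the
            // RSS link row) are skipped implicitly.
            Elements cells = row.select("td.separator");
            if (currentCinema == null || cells.size() < 2) {
                continue;
            }

            Element second = cells.get(1);
            if (second.hasAttr("colspan")) {
                // Step 3: assumed date row (second cell spans two columns).
                currentTimes = new ArrayList<>();
                currentCinema.put(second.text(), currentTimes);
            } else if (currentTimes != null) {
                // Step 4: showtime row; text() trims the surrounding whitespace.
                currentTimes.add(second.text());
            }
        }
        return schedule;
    }
}
```

It could be called as ScheduleSketch.parseSchedule(Jsoup.connect(movieUrl).get()), after which the nested map can be iterated to print each cinema, date, and showtime. Keeping state across rows avoids having to skip a fixed number of rows by hand, which would break if a cinema lists more than one date.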