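I am using JSoup to parse data from this website:
http://www.skore.com/en/soccer/england/premier-league/results/all/
I get the team names and the results, but I also need the goalscorers' names (they appear under the result).
I have been trying, but I'm stuck because they are not in the HTML.
Is this possible? If so, how?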
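The scorer information is fetched by an AJAX request, which is triggered when you click the score link. You have to make that request yourself and parse the result.
For example, take the first game (Manchester United 1 - 2 Manchester City). Its tag is: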
<a data-y="r1-1229442" data-v="england-premierleague-manchesterunited-manchestercity-13april2013" style="cursor: pointer;">1 - 2</a>
Take the data-y value, remove the leading r, and issue a GET request to:
http://www.skore.com/en/scores/soccer/id/<DATA-Y_HERE>?fmt=html
For example: http://www.skore.com/en/scores/soccer/id/1-1229442?fmt=html
Then parse the result.
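The URL-building step on its own can be sketched as below. This is only an illustration: the class name `ScoreUrl` and the method `fromDataY` are hypothetical names I made up, and the URL pattern is simply the one quoted above. It assumes the data-y value always starts with a single `r`.

```java
// Hypothetical helper: the class and method names are illustrative only,
// they are not part of the site or of JSoup.
class ScoreUrl {
    // URL pattern as quoted in the answer above.
    static final String BASE = "http://www.skore.com/en/scores/soccer/id/";

    /** Builds the detail URL from a data-y value such as "r1-1229442". */
    static String fromDataY(String dataY) {
        // Assumption: a valid data-y value is an 'r' followed by the match id.
        if (dataY == null || dataY.length() < 2 || !dataY.startsWith("r")) {
            throw new IllegalArgumentException("unexpected data-y value: " + dataY);
        }
        String matchId = dataY.substring(1); // drop the leading 'r'
        return BASE + matchId + "?fmt=html";
    }

    public static void main(String[] args) {
        // prints http://www.skore.com/en/scores/soccer/id/1-1229442?fmt=html
        System.out.println(fromDataY("r1-1229442"));
    }
}
```

With JSoup you would read the attribute from the score link with `attr("data-y")` and pass it to a helper like this before calling `Jsoup.connect(...)`.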
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class ParseScore {

    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("http://www.skore.com/en/soccer/england/premier-league/results/all/").get();
        System.out.println("title: " + doc.title());

        Elements dls = doc.select("dl");
        for (Element dl : dls) {
            String id = dl.attr("id");
            /* only <dl> elements whose id starts with "rid" are games */
            if (id.length() > 3 && id.startsWith("rid")) {
                System.out.println("Game: " + dl.text());
                /* strip the "rid" prefix, e.g. "rid1-1229442" -> "1-1229442" */
                String idNoRID = id.substring(3);
                String scoreURL = "http://www.skore.com/en/scores/soccer/id/" + idNoRID + "?fmt=html";
                Document docScore = Jsoup.connect(scoreURL).get();

                /* each <tr> of the response is one match event */
                for (Element tr : docScore.select("tr")) {
                    printGoal(tr, "span.goal", "");
                    printGoal(tr, "span.goalpenalty", " (penalty)");
                    printGoal(tr, "span.goalown", " (own goal)");
                }
            }
        }
    }

    /* prints the score and the player name if the row holds a goal of the given kind */
    private static void printGoal(Element tr, String selector, String suffix) {
        Elements spans = tr.select(selector);
        if (spans.size() > 0) {
            Elements score = tr.select("td.score");
            String playerName = spans.get(0).text();
            String currentScore = score.get(0).text();
            System.out.println("\t\tGOAL: " + currentScore + ": " + playerName + suffix);
        }
    }
}
Output:
title: Skore : Premier League, England - Soccer Results (All)
Game: F T Arsenal 3 - 1 Norwich
        GOAL: 0 - 1: Michael Turner
        GOAL: 1 - 1: Mikel Arteta (penalty)
        GOAL: 2 - 1: Sébastien Bassong (own goal)
        GOAL: 3 - 1: Lukas Podolski
Game: F T Aston Villa 1 - 1 Fulham
        GOAL: 1 - 0: Charles N´Zogbia
        GOAL: 1 - 1: Fabian Delph (own goal)
Game: F T Everton 2 - 0 Queens Park Rangers
        GOAL: 1 - 0: Darron Gibson
        GOAL: 2 - 0: Victor Anichebe
Game: F T Reading 0 - 0 Liverpool
Game: F T Southampton 1 - 1 West Ham
        GOAL: 1 - 0: Gaston Ramirez
        GOAL: 1 - 1: Andrew Carroll
Game: F T Manchester United 1 - 2 Manchester City
        GOAL: 0 - 1: James Milner
...
This was written with JSoup 1.7.1. If you use Maven, add this to your pom.xml:
<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.7.1</version>
</dependency>