This is what I'm interested in.
The data is organized hierarchically as follows:
<div class="clr dayItem">
<div class="clr genreHeader">Alternative Rock</div>
<div class="clr genreEvents">
<div class="clr dayEvent">
<a href="/concert/muse/houston_1339329.php" title="7:00 PM Muse - Toyota Center - TX">Muse - Toyota Center - TX - 7:00 PM
</a>
</div>
<div class="clr dayEvent">
<a href="/concert/matchbox_20/pooler_1347335.php" title="7:30 PM Matchbox 20 - Johnny Mercer Theatre">Matchbox 20 - Johnny Mercer Theatre - 7:30 PM
</a>
</div>
etc...
</div>
</div>
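So basically the page is split into two columns, and each column contains dayItems. Each dayItem holds a genre header and a set of dayEvents that contain the hrefs.
I've been trying to extract the data, but I'm completely new to XPath; until today I had been using Regex.
The regular expressions became tedious and overly complicated, so I switched to XPath.
To get the dayItems I use: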
var cl = document.DocumentNode.SelectNodes("//*[contains(concat(' ', normalize-space(@class), ' '), ' dayItem ')]");
foreach (var item in cl.Where(x=> x.Attributes.Any(p=>p.Value == "clr dayItem" && p.OriginalName=="class")))
{
/// THIS LINE FAILS
var genre = item.SelectSingleNode("//.[contains(concat(' ', normalize-space(@class), ' '), ' genre ')]");
Console.WriteLine(item.Name);
foreach (var attr in item.Attributes.Select(x => x.OriginalName + ".." + x.Value))
{
Console.WriteLine(attr);
}
}
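A sketch of a likely fix, under stated assumptions. The failing expression has three problems. First, a path that starts with `//` is evaluated from the document root, so it ignores the `item` node it is called on. Second, `//.[...]` puts a predicate on the abbreviated step `.`, which XPath 1.0 (the version implemented by .NET and used by HtmlAgilityPack) does not allow, so the expression is rejected. Third, the header's class token is `genreHeader`, not `genre`, so even a valid expression would not match it. A relative path such as `div[...]` (or `.//div[...]`) stays inside the current dayItem. To keep the example self-contained it uses `System.Xml.XmlDocument`, which works only because the snippet above happens to be well-formed XML. On the live page the assumption is that the same XPath strings can be passed to HtmlAgilityPack's `SelectNodes` / `SelectSingleNode`. The names `ConcertScraper`, `HasClass`, `Extract` and `SampleHtml` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ConcertScraper
{
    // Hypothetical sample: the snippet from the question, closed so it is well-formed XML.
    public const string SampleHtml = @"<div class=""clr dayItem"">
  <div class=""clr genreHeader"">Alternative Rock</div>
  <div class=""clr genreEvents"">
    <div class=""clr dayEvent"">
      <a href=""/concert/muse/houston_1339329.php"" title=""7:00 PM Muse - Toyota Center - TX"">Muse - Toyota Center - TX - 7:00 PM</a>
    </div>
    <div class=""clr dayEvent"">
      <a href=""/concert/matchbox_20/pooler_1347335.php"" title=""7:30 PM Matchbox 20 - Johnny Mercer Theatre"">Matchbox 20 - Johnny Mercer Theatre - 7:30 PM</a>
    </div>
  </div>
</div>";

    // XPath 1.0 test that is true when @class contains `token` as a whole word.
    static string HasClass(string token) =>
        $"contains(concat(' ', normalize-space(@class), ' '), ' {token} ')";

    // Returns (genre, title, href) for every dayEvent link under every dayItem.
    public static List<(string Genre, string Title, string Href)> Extract(XmlNode root)
    {
        var results = new List<(string Genre, string Title, string Href)>();
        string dayItem = HasClass("dayItem");
        string genreHeader = HasClass("genreHeader");
        string genreEvents = HasClass("genreEvents");
        string dayEvent = HasClass("dayEvent");

        // Absolute search (leading //) is intended here: every dayItem in the document.
        foreach (XmlNode item in root.SelectNodes($"//div[{dayItem}]"))
        {
            // Relative paths (no leading //) are evaluated against `item` only.
            XmlNode header = item.SelectSingleNode($"div[{genreHeader}]");
            string genre = header?.InnerText.Trim() ?? "";

            foreach (XmlNode a in item.SelectNodes($"div[{genreEvents}]/div[{dayEvent}]/a"))
            {
                string title = a.Attributes["title"]?.Value ?? "";
                string href = a.Attributes["href"]?.Value ?? "";
                results.Add((genre, title, href));
            }
        }
        return results;
    }

    public static void Main()
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleHtml);
        foreach (var (genre, title, href) in Extract(doc))
            Console.WriteLine($"{genre} | {title} | {href}");
    }
}
```

With this approach the attribute-based `Where` filter in the original loop is largely redundant, because the XPath predicate already limits the results to `dayItem` elements. One thing to keep in mind when switching back to HtmlAgilityPack: its `SelectNodes` returns `null` rather than an empty collection when nothing matches, so a null check before iterating is worthwhile.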