I'm new to CsQuery and I'm having trouble scraping HTML that looks like this:
<li id="Ingredient">
<span id="Amount" class="ingredient-amount">1 pound</span>
<span id="Name" class="ingredient-name">sweet Italian Sausage</span>
</li>
<li id="Ingredient">
<span id="Amount" class="ingredient-amount">3/4 pound</span>
<span id="Name" class="ingredient-name">lean ground beef</span>
</li>
I want to extract the text inside the span tags and format it as follows:
1 pound sweet Italian sausage
3/4 pound lean ground beef
Here is my code:
for (int i = 0; i < dom.Select("#Ingredient").Length; ++i) {
if (dom.Select("#Ingredient span#Amount")[i] != null)
Console.WriteLine(dom.Select("#Ingredient span#Amount")[i].InnerHTML + " ");
if (dom.Select("#Ingredient span#Name")[i] != null)
Console.WriteLine(dom.Select("#Ingredient span#Name")[i].InnerHTML);
Console.WriteLine(Environment.NewLine);
}
It works for the HTML above, but it breaks when one of the spans is missing. For example, if <span id="lblIngName" class="ingredient-name">sweet Italian sausage</span> is missing from the HTML, my code returns:
1 pound lean ground beef
3/4 pound
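As you can see, lean ground beef has moved up to the first line. I want it to stay with 3/4 pound no matter what, and 1 pound can be left on its own. How can I achieve this? I have tried many approaches, but none of them worked. So I want to do something like this: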
for each "#Ingredient" write the "#Amount" if it exists or "#Name" if it exists. Do not bother with things on another Ingredient
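
The per-item idea above could be sketched roughly as follows. This is only a sketch under some assumptions: that each ingredient is an li element, that the spans can be matched by their classes (ingredient-amount, ingredient-name) rather than by the repeated ids, and that CsQuery's Cq(), Find() and Text() behave like their jQuery counterparts. The key point is that each span is looked up inside its own li instead of being indexed across the whole document.

```csharp
using System;
using System.Linq;
using CsQuery;

class Program
{
    static void Main()
    {
        // Sample input: the first ingredient has no name span.
        string html = @"
            <ul>
              <li id=""Ingredient"">
                <span id=""Amount"" class=""ingredient-amount"">1 pound</span>
              </li>
              <li id=""Ingredient"">
                <span id=""Amount"" class=""ingredient-amount"">3/4 pound</span>
                <span id=""Name"" class=""ingredient-name"">lean ground beef</span>
              </li>
            </ul>";

        CQ dom = CQ.Create(html);

        // Attribute selector is used because the id is repeated in the markup.
        foreach (IDomObject li in dom["li[id='Ingredient']"])
        {
            // Wrap the element so the searches below are scoped to this li only.
            CQ item = li.Cq();

            // Assumption: Text() yields an empty (or null) string when nothing matches.
            string amount = (item.Find("span.ingredient-amount").Text() ?? "").Trim();
            string name = (item.Find("span.ingredient-name").Text() ?? "").Trim();

            // Join only the parts that are actually present.
            string line = string.Join(" ",
                new[] { amount, name }.Where(s => s.Length > 0));

            Console.WriteLine(line);
        }
    }
}
```

Because both lookups start from the current li, a missing span in one ingredient cannot shift the values of another.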