
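I'm trying to parse the comment threads on news.ycombinator.com. After looking at the HTML, though, the markup doesn't seem to contain any hierarchy for nested comments, which would make parsing very difficult. For example, here is a parent comment and its child comment: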

<!-- This part below draws the upvote/downvote images -->
<table border=0><tr><td><table border=0><tr><td><img src="http://ycombinator.com/images/s.gif" height=1 width=0></td><td valign=top><center><a id=up_4241971 href="vote?for=4241971&dir=up&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34"><img src="http://ycombinator.com/images/grayarrow.gif" border=0 vspace=3 hspace=2></a><span id=down_4241971></span></center></td><td class="default"><div style="margin-top:2px; margin-bottom:-10px; ">


<!-- This part below is user/time and permalink info for a parent comment -->
<span class="comhead"><a href="user?id=JshWright">JshWright</a> 7 hours ago  | <a href="item?id=4241971">link</a></span></div><br>


<!-- This part below is actual Comment -->
<span class="comment"><font color=#000000>I just got my Verizon Galaxy S3, and ordered the 20-pack of NFC tags offered by <a href="http://tagsfordroid.com" rel="nofollow">http://tagsfordroid.com</a><p>I think I know what my Dad felt like when he got his first label printer... Within days it seemed like every object in his office was labeled...<p>I've got a tag in my car to automatically send my wife a "Headed home" SMS, a tag on my night stand to toggle between 'night' (silent) and 'day' (loud) volume settings, a tag by my back door to launch CardioTrainer when I go out for a run (this one may have crossed the "I've run out of ideas" line...). I'm using the keychain tag to dial a response number for the fire department I'm a member of.</font></span><p><font size=1><u><a href="reply?id=4241971&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34">reply</a></u></font></td></tr></table></td></tr>


<!-- This part below is upvote/downvote arrow for child of parent -->
<tr><td><table border=0><tr><td><img src="http://ycombinator.com/images/s.gif" height=1 width=40></td><td valign=top><center><a id=up_4242025 href="vote?for=4242025&dir=up&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34"><img src="http://ycombinator.com/images/grayarrow.gif" border=0 vspace=3 hspace=2></a><span id=down_4242025></span></center></td><td class="default"><div style="margin-top:2px; margin-bottom:-10px; ">

<!-- This part has user/time/permalink for child comment -->
<span class="comhead"><a href="user?id=msbmsb">msbmsb</a> 7 hours ago  | <a href="item?id=4242025">link</a></span></div><br>

<!-- This part is the content of the  child comment -->
<span class="comment"><font color=#000000>I did the same thing. Tag next to the entry-way light switch for changing to an "at-home" profile, tag next to the bed for switching between night mode and morning mode, tag at work, keychain tag for switching between car mode and quiet mode.<p>And profile switching is just the basics. You can have a tag that connects guests' NFC-enabled phones to your wifi without having to hand out the password, for instance.<p>NFC task launcher + tasker is an amazing combination that opens up all kinds of possibilities.</font></span><p><font size=1><u><a href="reply?id=4242025&whence=%69%74%65%6d%3f%69%64%3d%34%32%34%31%37%38%34">reply</a></u></font></td></tr></table></td></tr><tr><td>

So how does Hacker News store the comment hierarchy, and how can I reproduce it when scraping their data?


1 Answer


In the tables, the indentation is done with an image tag whose width attribute varies:

...<td><img src="http://ycombinator.com/images/s.gif" height=1 width=0></td>...
...<td><img src="http://ycombinator.com/images/s.gif" height=1 width=40></td>...

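Presumably you are already reading and parsing these. The actual thread structure they represent can be reconstructed by keeping an internal stack of width values: a comment with a larger width than the one before it is that comment's child, and when the width drops back, you pop the stack until the top entry is narrower than the current comment. That entry is the parent.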
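As a rough illustration of the stack idea, here is a minimal Python sketch. It assumes every comment row contains an `s.gif` spacer followed by an `id=up_<comment id>` vote link, as in the excerpt in the question; rows without a vote link would be mis-paired by this regex. The names `extract_rows` and `build_tree` are made up for this example, and a real scraper would probably use a proper HTML parser instead of a regex.

```python
import re

# Assumption: each comment row has an s.gif spacer followed by an "up_<id>" vote link,
# as in the HTML quoted in the question.
ROW_RE = re.compile(r's\.gif"\s+height=1\s+width=(\d+).*?id=up_(\d+)', re.S)

def extract_rows(html):
    """Return (width, comment_id) pairs in document order."""
    return [(int(width), cid) for width, cid in ROW_RE.findall(html)]

def build_tree(rows):
    """Rebuild the nesting from (width, comment_id) pairs with a stack of open ancestors."""
    roots = []
    stack = []  # (width, node) entries: the ancestor chain of the current comment
    for width, cid in rows:
        node = {"id": cid, "children": []}
        # Pop until the top of the stack is strictly less indented: that is the parent.
        while stack and stack[-1][0] >= width:
            stack.pop()
        if stack:
            stack[-1][1]["children"].append(node)
        else:
            roots.append(node)  # nothing narrower is open, so this is a top-level comment
        stack.append((width, node))
    return roots
```

Calling `build_tree(extract_rows(page_html))` on the fetched page source (`page_html` is a placeholder for whatever string you downloaded) gives a list of top-level comments, each with a `children` list. Only the relative order of the widths matters, so the sketch does not depend on the indent step being exactly 40.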

answered 2012-07-14T05:12:14.923