
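I have extracted a set of URLs on the same topic. I want to find the links between them so that I can build a graph in Python. The URLs (websites) would be the nodes, and the links between them the edges. Please help.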


1 Answer


You can follow this simple approach:

Parse each web page with BeautifulSoup[1] and store the href attribute of every anchor tag in a dictionary of lists (call it lst), keyed by the page it came from. So, if a web page (say web1) links to 3 other web pages (say with links href1, href2, href3), then:

lst = {}
lst['web1'] = ['href1', 'href2', 'href3']
# lst['web1'][0] is 'href1', lst['web1'][1] is 'href2', lst['web1'][2] is 'href3'

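Parse the other web pages the same way and create a list for each of them. web1 can itself show up as an href (say hrefx) in the list of another page webx, and every such entry is an edge from webx to web1. Hope that gives you the idea.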
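The idea above can be sketched roughly as follows. To keep the sketch free of third-party dependencies it uses the standard library's html.parser in place of BeautifulSoup, and it works on HTML that has already been downloaded. The names LinkCollector, extract_links, build_edges and the example.com pages are made up for illustration and are not part of the original answer.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(page_url, html):
    """Return the absolute, fragment-free link targets found in html."""
    collector = LinkCollector()
    collector.feed(html)
    links = []
    for href in collector.hrefs:
        # Resolve relative hrefs against the page URL and drop "#fragment".
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute not in links:
            links.append(absolute)
    return links


def build_edges(pages):
    """pages maps URL -> HTML text; returns (adjacency dict, edge list)."""
    adjacency = {url: extract_links(url, html) for url, html in pages.items()}
    # Keep only edges whose target is one of the known pages (no self-loops).
    edges = [
        (source, target)
        for source, targets in adjacency.items()
        for target in targets
        if target in pages and target != source
    ]
    return adjacency, edges


# Hypothetical pages standing in for the HTML you downloaded.
pages = {
    "http://example.com/web1": '<a href="/web2">two</a> <a href="http://example.com/web3#top">three</a>',
    "http://example.com/web2": '<a href="web1">one</a> <a href="http://other.org/">out</a>',
    "http://example.com/web3": "<p>no links here</p>",
}
adjacency, edges = build_edges(pages)
```

Here adjacency plays the role of lst, and edges is a list of (source, target) pairs between the URLs in your set. If you use a graph library such as networkx, a list like this can be passed to DiGraph.add_edges_from to get the graph itself.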

[1] http://www.crummy.com/software/BeautifulSoup/

Answered 2012-12-31T10:26:57.657