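I have extracted a set of URLs that are all about the same topic. I want to find the links between them so that I can build a graph in Python. The URLs (websites) would be the nodes, and the links between them would be the edges. Please help.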
1 Answer
You can follow this simple approach: parse each web page with BeautifulSoup and store the href attribute of every anchor tag in a dictionary of lists (call it lst), keyed by the page it came from. So, if a web page (call it web1) links to 3 other web pages (with links href1, href2, href3), then:
lst['web1'][0] = 'href1'
lst['web1'][1] = 'href2'
lst['web1'][2] = 'href3'
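Parse the other web pages the same way and build a list for each of them. Any page can also show up as a link target: web1 may be one of the hrefs stored for some other page webx, and each such pair (webx, web1) is an edge of your graph. Hope you get the idea.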
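The approach above can be sketched roughly as follows. This is only a minimal illustration under some assumptions: it uses the standard library's html.parser in place of BeautifulSoup so that it runs without extra installs (with BeautifulSoup, soup.find_all('a', href=True) would do the same collecting job), and the names LinkCollector, extract_links, build_edges and the sample pages are made up for the example. It also assumes you have already downloaded the HTML of each URL.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag seen while parsing."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return the absolute URLs that the page at base_url links to."""
    parser = LinkCollector()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop "#fragment" parts.
        absolute, _fragment = urldefrag(urljoin(base_url, href))
        if absolute not in links:
            links.append(absolute)
    return links


def build_edges(pages):
    """pages maps URL -> HTML text.

    Returns (lst, edges): lst maps each URL to its outgoing links, and
    edges keeps only the links that point at another URL in the set.
    """
    lst = {url: extract_links(url, html) for url, html in pages.items()}
    edges = [(src, dst)
             for src, targets in lst.items()
             for dst in targets
             if dst in pages and dst != src]
    return lst, edges


# Hypothetical pages standing in for HTML you downloaded yourself.
pages = {
    "http://example.com/a": '<a href="/b">B</a> <a href="http://other.org/">X</a>',
    "http://example.com/b": '<a href="a#top">A</a>',
}
lst, edges = build_edges(pages)
print(edges)
```

If you use networkx for the graph, the resulting edge list can be handed to DiGraph.add_edges_from(edges); links to pages outside your set (like other.org here) are dropped on purpose so that only your own URLs become nodes.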
Answered 2012-12-31T10:26:57.657