python - Is there a way to extract links from text in Python using readability (a text-extraction algorithm) and a custom algorithm?

Question

Is there a way to extract links from text in Python using readability (a text-extraction algorithm) and a custom algorithm?

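1.) I want to find a way to extract the links that appear in the body of the text.

2.) I want to somehow compare the extracted text with the original HTML so that I only pick up links that are in the actual body of the article.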

score 2 · Accepted Answer

Well, it looks like it returns a BeautifulSoup tree, so you should be able to do the following:

article = page.summary()        # Extract the article body using readability
links = article.findAll("a")    # List of all <a> tags inside the article
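If your version of readability hands back the article as an HTML string rather than a parsed tree (I believe readability-lxml's `Document(...).summary()` does this, but check your install), you can still pull the links out of that string. Here is a minimal sketch using only the standard library; `LinkCollector`, `extract_links`, and the sample HTML and URLs are made up for illustration and are not part of readability.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag fed to the parser."""

    def __init__(self, base_url=""):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:  # skip anchors without an href, e.g. <a name="...">
            # Resolve relative links against the page URL, if one was given
            self.links.append(urljoin(self.base_url, href))


def extract_links(article_html, base_url=""):
    """Return the link targets found in an article HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(article_html)
    collector.close()
    return collector.links


# Assumption: in real use, article_html would be the string produced by
# readability for the page; a hand-written sample stands in for it here.
sample = (
    '<div><p>See <a href="/about">about</a> and '
    '<a href="http://other.org/">other</a>.</p>'
    '<a name="top">no href</a></div>'
)
print(extract_links(sample, "http://example.com/post/1"))
```

Because readability has already reduced the page to the article body, any link found this way is by construction a link from the body text, which covers point 2.) without a separate comparison against the original HTML.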
