python-3.x - Extract only the first post's content from a URL containing multiple tumblr posts using Python

Translated from: https://stackoverflow.com/questions/52002802 2018-08-24T10:45:41.020

Viewed 50 times

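I am trying to use the newspaper package in Python 3 to extract only the actual content/text from a given input URL. This works in general, but one of my URLs contains multiple tumblr posts on the same page.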

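From the URL below, I only want the content of the first post, i.e. the paragraph that begins with "Karnataka Assembly election 2018 results are close to being called as counting of votes is underway on Tuesday":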

https://poonamparekh.tumblr.com/post/173920050130/karnataka-election-results-modi-rallies-set-to

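When I extract content from the URL above, I get the content of the 6th post as my output instead of the first post. That is not what I need; I need the first post as my output. Can anyone help me achieve this?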

Here is my code:

from newspaper import Article

url = "https://poonamparekh.tumblr.com/post/173920050130/karnataka-election-results-modi-rallies-set-to"
print(url)
article = Article(url, language='en')
article.download()
print('article_state : ', article.download_state)

# download_state 2 means the download succeeded
if article.download_state == 2:
  try:
    article.parse()
    # article.text is a string; indexing it with [0] would keep only its first character
    result = article.text
    print(result[:150])
    if result == '':
      print('----MESSAGE : No description written for this post')
  except Exception as e:
    print(e)
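
One direction that might help is to skip the automatic text extraction and pick the first post out of the raw HTML yourself. The sketch below uses only the standard library and assumes, purely for illustration, that the page wraps each post in an <article> element; the actual Tumblr theme may use a different tag or class, so inspect the page source first. The names FirstPostText and first_post_text are hypothetical helpers, not part of newspaper.

```python
from html.parser import HTMLParser


class FirstPostText(HTMLParser):
    """Collect the text inside the first <tag> element (assumed post wrapper)."""

    def __init__(self, tag="article"):
        super().__init__()
        self.tag = tag      # assumption: each post sits inside this element
        self.depth = 0      # nesting depth of `tag` while inside the first post
        self.done = False   # becomes True once the first post has been closed
        self.parts = []     # text fragments collected from the first post

    def handle_starttag(self, tag, attrs):
        # Only enter (or nest deeper into) the first post, never later ones.
        if tag == self.tag and not self.done:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        # Keep non-empty text only while we are inside the first post.
        if self.depth > 0 and data.strip():
            self.parts.append(data.strip())


def first_post_text(html, tag="article"):
    """Return the text of the first `tag` element in `html`, or '' if none."""
    parser = FirstPostText(tag)
    parser.feed(html)
    return " ".join(parser.parts)


# Made-up sample markup, standing in for a page with several posts.
sample = """
<html><body>
<article><h2>First</h2><p>Karnataka results are close.</p></article>
<article><p>Second post text.</p></article>
</body></html>
"""
print(first_post_text(sample))
```

With newspaper, the downloaded markup is available as article.html after article.download(), so first_post_text(article.html) could be tried in place of article.text, provided the wrapper tag matches the page's real structure.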

