python - Finding links in a file keeps repeating the same link

Question

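I'm fairly new to Python, but I've taken a high-school-level Java course. I'm trying to write a Python script that takes all the torrent links from my Humble Bundle download page and writes them to a .txt file. Right now I'm just trying to get it to read them all and print them, but I can't seem to get past the first one. I've tried a few different loops; some print the link once, and others keep printing the same one over and over. Here is my code.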

f = open("Humble Bundle.htm").read()

pos = f.find('torrents.humblebundle.com') #just to initialize it for the loop
end = f.find('.torrent') #same here

pos1 = f.find('torrents.humblebundle.com') #first time it appears
end1 = f.rfind('.torrent') #last time it appears
while pos >= pos1 and end <= end1:
    pos = f.find('torrents.humblebundle.com')
    end = f.find('.torrent')
    link = f[pos:end+8]#the link in String form
    print(link)

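I'd appreciate help with my current problem and with how to continue toward the final script. This is my first post here, but I researched what I could before giving up and asking for help. Thank you for your time.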

score 0 · Accepted Answer

You could try a regular expression here:

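import re

f = open('Humble Bundle.htm').read()
# Non-greedy .*? so each match stops at the nearest '.torrent'
# instead of running on to the last one on the same line.
pattern = re.compile(r'torrents\.humblebundle\.com.*?\.torrent')
print(pattern.findall(f))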
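
To get from a list of matches to the .txt file the question asks for, the links can be written out one per line. This is only a rough sketch: the helper names `extract_links` and `write_links` and the output name `links.txt` are made up for illustration, and the pattern assumes a link never contains whitespace, double quotes, or angle brackets, which may not hold for the real page.

```python
import re

# Assumed shape of a link: the host name, then any run of characters that
# cannot appear inside a quoted href, up to the nearest '.torrent'.
LINK_PATTERN = re.compile(r'torrents\.humblebundle\.com[^\s"<>]*?\.torrent')


def extract_links(html):
    """Return every torrent link found in the given HTML text, in order."""
    return LINK_PATTERN.findall(html)


def write_links(links, out_path):
    """Write the links to out_path, one per line."""
    with open(out_path, 'w') as out:
        for link in links:
            out.write(link + '\n')


# Small in-memory demo, so no real download page is needed.
sample = ('<a href="http://torrents.humblebundle.com/a.torrent">A</a> '
          '<a href="http://torrents.humblebundle.com/b.torrent?key=1">B</a>')
print(extract_links(sample))
```

With the real page it would be used roughly as `write_links(extract_links(open('Humble Bundle.htm').read()), 'links.txt')`.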

score 0 · Accepted Answer

You can find more information about the find method at http://docs.python.org/2/library/string.html#string.find

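The problem is that every time you execute these two lines, they return the same values for pos and end, because the function always receives the same arguments.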

pos = f.find('torrents.humblebundle.com')
end = f.find('.torrent')

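The find method has another optional parameter called start, which tells the function where to begin searching for the given string. So if you change your code to: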

pos = f.find('torrents.humblebundle.com', pos+1)
end = f.find('.torrent', end+1)

it should work.
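
Putting the start parameter to work in a full loop might look something like the sketch below. The function name `find_links` is hypothetical, and it assumes every link starts with the host name and ends at the next '.torrent' after it. It stops as soon as find returns -1, which is how find signals that nothing more was found.

```python
def find_links(text, prefix='torrents.humblebundle.com', suffix='.torrent'):
    """Collect every substring that runs from prefix to the next suffix."""
    links = []
    pos = text.find(prefix)
    while pos != -1:
        # Look for the suffix only after the prefix we just found.
        end = text.find(suffix, pos + len(prefix))
        if end == -1:
            break  # a prefix with no closing '.torrent'; nothing more to collect
        end += len(suffix)
        links.append(text[pos:end])
        # Resume the search after the link we just collected.
        pos = text.find(prefix, end)
    return links


sample = ('<a href="http://torrents.humblebundle.com/a.torrent">A</a>'
          '<a href="http://torrents.humblebundle.com/b.torrent">B</a>')
print(find_links(sample))
```

Tying the search for the suffix to the position of the prefix keeps the two indices in step, so each slice covers exactly one link.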
