python - How do I write text to a txt file on a new line with Python?

Question

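I'm trying to check whether a URL exists, and if it does, I want to write the URL to a file using Python. I also want each URL on its own line in the file. Here is the code I already have: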

import urllib2

# Create a blank TXT file on the desktop

urlhere = "http://www.google.com"   
print "for url: " + urlhere + ":"  

try: 
    fileHandle = urllib2.urlopen(urlhere)
    data = fileHandle.read()
    fileHandle.close()
    print "It exists"

    # Then, if the URL does exist, write the URL on a new line in the text file

except urllib2.URLError, e:
    print 'PAGE 404: It Doesnt Exist', e

    # If the URL does not exist, don't write anything to the file.

score 0 · Accepted Answer

How about something like this:

import urllib2

url  = 'http://www.google.com'
data = ''

try:
    data = urllib2.urlopen(url).read()
except urllib2.URLError, e:
    data = 'PAGE 404: It Doesnt Exist ' + str(e)  # str() needed: a str and an exception cannot be concatenated

with open('outfile.txt', 'w') as out_file:
   out_file.write(data)
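
Note that mode 'w' truncates the file every time it is opened, and that the snippet above writes the page body rather than the URL itself. To collect one URL per line across several runs, append mode ('a') is probably closer to what the question asks for. A minimal sketch, assuming Python 3; `append_line` is a hypothetical helper name, not part of any library:

```python
import os
import tempfile


def append_line(path, text):
    """Append text to the file at path as its own line (hypothetical helper)."""
    # 'a' creates the file if it is missing and always writes at the end,
    # whereas 'w' would discard whatever the file already contains.
    with open(path, "a") as out_file:
        out_file.write(text + "\n")


if __name__ == "__main__":
    # Write to a temporary directory so the demo does not touch real files
    demo_path = os.path.join(tempfile.mkdtemp(), "outfile.txt")
    append_line(demo_path, "http://www.google.com")
    append_line(demo_path, "http://example.com")
    with open(demo_path) as in_file:
        print(in_file.read())
```

Calling the helper twice leaves two lines in the file, because the second call adds to the end instead of overwriting.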

score 0 · Accepted Answer

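The way you phrased your question is a bit confusing, but if I understand correctly, all you're trying to do is test whether a URL is valid using urllib2 and, if it is, write the URL to a file? If that's right, the following should work.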

import urllib2
f = open("url_file.txt","a+")
urlhere = "http://www.google.com"   
print "for url: " + urlhere + ":"  

try: 
    fileHandle = urllib2.urlopen(urlhere)
    data = fileHandle.read()
    fileHandle.close()
    f.write(urlhere + "\n")
    print "It exists"

except urllib2.URLError, e:
    print 'PAGE 404: It Doesnt Exist', e

# Close the file whether or not the URL was reachable
f.close()
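
urllib2 exists only in Python 2. On Python 3 the same idea might look roughly like the sketch below, which assumes `urllib.request` and `urllib.error` as the replacements. The function name `record_if_reachable` and its `opener` parameter are hypothetical, introduced here only so the check can be exercised without network access.

```python
import urllib.error
import urllib.request


def record_if_reachable(url, path="url_file.txt", opener=urllib.request.urlopen):
    """Append url to path on its own line if it can be opened.

    Returns True when the URL was written, False otherwise.
    """
    try:
        # urlopen raises URLError on failure (HTTPError is a subclass)
        # and ValueError for strings that are not valid URLs at all.
        with opener(url) as response:
            response.read()
    except (urllib.error.URLError, ValueError) as e:
        print("PAGE 404: It Doesnt Exist", e)
        return False
    # Only reached when the request succeeded
    with open(path, "a") as f:
        f.write(url + "\n")
    print("It exists")
    return True
```

Opening the output file only after the request succeeds means nothing is written, and no file handle is left open, when the URL does not exist.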

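If you want to test multiple URLs without editing the Python script, you can use the following script by typing python python_script.py "http://url_here.com". This works through the sys module, where sys.argv[1] is the first argument passed to python_script.py. In this example it is the URL ('http://url_here.com').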

import urllib2,sys
f = open("url_file.txt","a+")
urlhere = sys.argv[1]   
print "for url: " + urlhere + ":"  

try: 
    fileHandle = urllib2.urlopen(urlhere)
    data = fileHandle.read()
    fileHandle.close()
    f.write(urlhere + "\n")
    print "It exists"

except urllib2.URLError, e:
    print 'PAGE 404: It Doesnt Exist', e

# Close the file whether or not the URL was reachable
f.close()
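
One thing to watch for: sys.argv[0] is the script name, so sys.argv[1] raises an IndexError when the script is run without an argument. A small guard can turn that into a usage message. This is a sketch assuming Python 3; `first_argument` is a hypothetical helper name.

```python
import sys


def first_argument(argv):
    """Return the first command-line argument, or None if none was given."""
    # argv[0] is the script name; user-supplied arguments start at argv[1]
    if len(argv) < 2:
        return None
    return argv[1]


if __name__ == "__main__":
    url = first_argument(sys.argv)
    if url is None:
        print("usage: python python_script.py URL")
    else:
        print("for url: " + url + ":")
```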

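Or, if you really want to make your life easy, you can use the following script by typing python python_script.py http://url1.com,http://url2.com on the command line, with all the URLs you want to test separated by commas and no spaces.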

import urllib2,sys
f = open("url_file.txt","a+")
urlhere_list = sys.argv[1].split(",")   

for urls in urlhere_list:
    print "for url: " + urls + ":" 
    try: 
        fileHandle = urllib2.urlopen(urls)
        data = fileHandle.read()
        fileHandle.close()
        f.write(urls+ "\n")

        print "It exists"

    except urllib2.URLError, e:
        print 'PAGE 404: It Doesnt Exist', e
    except:
        print "invalid url"
f.close()

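You can also replace sys.argv[1].split(",") with a Python list in the script if you don't want to use the command-line feature. Hope this helps, and good luck with your program.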
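
Both ways of building the list can be sketched as follows, assuming Python 3. `parse_url_list` is a hypothetical helper name. It also strips stray whitespace and drops empty entries, which the script above does not do, so a quoted argument such as "http://url1.com, http://url2.com" would still work.

```python
def parse_url_list(arg):
    """Split a comma-separated string into a list of non-empty, stripped URLs."""
    return [part.strip() for part in arg.split(",") if part.strip()]


# From the command line this would be parse_url_list(sys.argv[1]).
from_argument = parse_url_list("http://url1.com,http://url2.com")

# Alternative without the command line: a list written directly in the script.
urlhere_list = ["http://url1.com", "http://url2.com"]

# Both approaches yield the same list for the same URLs
print(from_argument == urlhere_list)
```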

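Note: the scripts that take command-line input were tested on Ubuntu Linux, so if you are on Windows or another operating system I can't guarantee they will work exactly as described, but they should.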

score 0 · Accepted Answer

Using requests:

import requests

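def url_checker(urls):
    # Append each URL that answers with HTTP 200 to the file, one per line
    with open('somefile.txt', 'a') as f:
        for url in urls:
            try:
                r = requests.get(url)
            except requests.RequestException:
                continue  # unreachable or malformed URL: write nothing
            if r.status_code == 200:
                f.write('{0}\n'.format(url))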

url_checker(['http://www.google.com','http://example.com'])
