I am trying to create a script that makes requests to random URLs read from a txt file:

import urllib2

with open('urls.txt') as urls:
    for url in urls:
        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError as e:
            r = e
        if r.code in (200, 401):
            print '[{}]: '.format(url), "Up!"
        elif r.code == 404:
            print '[{}]: '.format(url), "Not Found!" 

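But I want a URL to be removed from the file when it comes back as 404 Not Found. There is one URL per line, so basically I want to erase every line whose URL returns 404. How can I do that?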

2 Answers

You can write the URLs you want to keep to a second file:

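import urllib2

with open('urls.txt', 'r') as urls, open('urls2.txt', 'w') as urls2:
    for line in urls:
        url = line.strip()  # drop the trailing newline read from the file
        if not url:
            continue  # skip blank lines
        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError as e:
            r = e

        # A plain URLError (e.g. a DNS failure) has no .code attribute
        code = getattr(r, 'code', None)
        if code in (200, 401):
            print '[{}]: '.format(url), "Up!"
            urls2.write(url + '\n')  # keep only the URLs that responded
        elif code == 404:
            print '[{}]: '.format(url), "Not Found!"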
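
The same second-file idea can be sketched in Python 3 with the status check split out from the filtering, which makes the filtering easy to try without touching the network. This is only a sketch under assumptions: `status_of` and `filter_urls` are hypothetical names, not part of the original script, and the choice to treat 200 and 401 as "keep" is copied from the code above.

```python
import urllib.error
import urllib.request


def status_of(url, opener=urllib.request.urlopen):
    """Return the HTTP status for url, or None if no HTTP response arrived.

    `opener` is a hypothetical hook so the function can be exercised
    without real network access.
    """
    try:
        with opener(url) as response:
            return response.status
    except urllib.error.HTTPError as e:
        # HTTPError is a subclass of URLError, so it must be caught first.
        return e.code
    except urllib.error.URLError:
        # DNS failure, refused connection, etc.: there is no status code.
        return None


def filter_urls(src_path, dst_path, get_status, keep=(200, 401)):
    """Copy every URL whose status is in `keep` from src_path to dst_path.

    Returns the number of URLs written.
    """
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            url = line.strip()  # remove the trailing newline
            if not url:
                continue  # ignore blank lines
            if get_status(url) in keep:
                dst.write(url + "\n")
                kept += 1
    return kept
```

Calling `filter_urls('urls.txt', 'urls2.txt', status_of)` would then leave the surviving URLs in `urls2.txt`. Note that, like the code above, this drops every URL that does not return 200 or 401, not only the 404s.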
answered 2013-01-24T03:53:50.353

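To delete lines from a file, you have to rewrite the entire contents of the file. The safest way to do that is to write out a new file in the same directory and then rename it over the old file. I would modify your code like this: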

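import os
import sys
import tempfile
import urllib2

# A set plus sorted() below means the rewritten file is deduplicated
# and in alphabetical order rather than in its original order.
good_urls = set()

with open('urls.txt') as urls:
    for line in urls:
        url = line.strip()  # drop the trailing newline read from the file
        if not url:
            continue  # skip blank lines
        try:
            r = urllib2.urlopen(url)
        except urllib2.URLError as e:
            r = e
        # A plain URLError (e.g. a DNS failure) has no .code attribute
        code = getattr(r, 'code', None)
        if code in (200, 401):
            sys.stdout.write('[{}]: Up!\n'.format(url))
            good_urls.add(url)
        elif code == 404:
            sys.stdout.write('[{}]: Not found!\n'.format(url))
        else:
            sys.stdout.write('[{}]: Unexpected response code {}\n'.format(url, code))

# Write the surviving URLs to a temporary file in the same directory,
# then rename it over urls.txt so the old file is replaced in one step.
tmp = None
try:
    tmp = tempfile.NamedTemporaryFile(mode='w', suffix='.txt', dir='.', delete=False)
    for url in sorted(good_urls):
        tmp.write(url + "\n")
    tmp.close()
    os.rename(tmp.name, 'urls.txt')  # on Windows this fails if urls.txt exists
    tmp = None  # renamed successfully; nothing left to clean up
finally:
    if tmp is not None:
        os.unlink(tmp.name)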

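You may want to add a good_urls.add(url) to the else clause in the first loop, so that URLs returning some other response are kept in the file rather than dropped. If anyone knows a tidier way to do what I did with try-finally there, I'd like to hear it.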
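
One possibly tidier shape for the try-finally above, sketched in Python 3 rather than the Python 2 used in this answer: clean up only in an `except` branch, so no `tmp = None` sentinel is needed. The helper name `rewrite_lines` is hypothetical. The sketch relies on `os.replace`, which is available from Python 3.3 and overwrites an existing destination on Windows as well as on POSIX systems.

```python
import os
import tempfile


def rewrite_lines(path, lines):
    """Replace the contents of `path` with `lines`, one per line.

    The new contents go to a temporary file in the same directory,
    which is then renamed over the original.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_name = tempfile.mkstemp(suffix=".tmp", dir=directory)
    try:
        with os.fdopen(fd, "w") as tmp:
            for line in lines:
                tmp.write(line + "\n")
        os.replace(tmp_name, path)
    except BaseException:
        # Something failed before the rename finished: remove the
        # leftover temporary file and let the error propagate.
        os.unlink(tmp_name)
        raise
```

With the loop above, this would be called as `rewrite_lines('urls.txt', sorted(good_urls))`. One caveat: `mkstemp` creates the file readable and writable only by the current user, so the rewritten `urls.txt` may end up with different permissions from the original.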

answered 2013-01-24T04:01:08.280