python - Read a file, extract the URL and rewrite it - Python

Question

I am reading a text file (a.txt) with the following format:

http://www.example.com/forum/showthread.php?t=779689/images/webcard.jpg 121.10.208.31

Then I only need to take the www.example.com part plus /images/webcard.jpg 121.10.208.31, and write that to the same file or to a separate file. In this case, I am writing it to b.txt.
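For reference, the "www.example.com part" is what `urlparse` calls `netloc`. A minimal sketch of what it returns for the sample URL exactly as printed above, assuming Python 3 (where `urlparse` lives in `urllib.parse`; in Python 2 it is the `urlparse` module used below):

```python
from urllib.parse import urlparse  # Python 3 location of urlparse

# The first field of the sample line, taken verbatim from the question
url = "http://www.example.com/forum/showthread.php?t=779689/images/webcard.jpg"
parts = urlparse(url)

print(parts.netloc)  # host only: www.example.com
print(parts.path)    # /forum/showthread.php
print(parts.query)   # everything after '?': t=779689/images/webcard.jpg
```

Note that `netloc` stops at the first `/` after the host, so the rest of the URL never ends up in it.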

from urlparse import urlparse 
f = open('a.txt','r')
fo = open('b.txt','w')


for line in f:
    fo.write(urlparse(line).netloc+ ' ' + line.split(' ')[1] + ' ' + line.split(' ')[2] + '\n')

The code above gives the following error. How can I achieve this?

Traceback (most recent call last):
  File "prittyprint.py", line 17, in <module>
    fo.write(urlparse(line).netloc+ ' ' + line.split(' ')[1] + ' ' + line.split(' ')[2] + '\n')
IndexError: list index out of range
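`IndexError` here means that `line.split(' ')` returned fewer than three items for some line. A quick way to check, taking the sample line exactly as printed above and a blank line as example inputs (whether the real a.txt contains such lines is an assumption):

```python
# Count the fields the question's code sees after line.split(' ')
sample = "http://www.example.com/forum/showthread.php?t=779689/images/webcard.jpg 121.10.208.31\n"
blank = "\n"

print(len(sample.split(' ')))       # 2: index [2] is out of range
print(len(blank.split(' ')))        # 1: even index [1] is out of range
print(repr(sample.split(' ')[-1]))  # '121.10.208.31\n': the newline stays attached
```

The last line also shows that the final field keeps its newline, so adding another `'\n'` when writing would produce blank lines in the output.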

score 3 · Accepted Answer

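Your file a.txt probably contains some irregular lines that do not have this format (a blank line, for example, splits into a single item). You could try this: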

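try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

# 'with' closes both files automatically
with open('a.txt') as f, open('b.txt', 'w') as fo:
    for line in f:
        # split() with no argument also drops the trailing newline and repeated spaces
        split_line = line.split()
        if len(split_line) >= 3:
            # keep only the host of the URL, then the 2nd and 3rd fields
            fo.write(urlparse(split_line[0]).netloc + ' ' + split_line[1] + ' ' + split_line[2] + '\n')
        else:
            print("ERROR: some other line: %r" % line)  # continue on with next line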
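The same fix can be wrapped in a function that takes file objects, which makes it easy to try on in-memory data first. A minimal Python 3 sketch, assuming each valid line has three whitespace-separated fields (URL, path, IP), as the indexing in the question implies; the name `rewrite_lines` and the sample input are hypothetical:

```python
import io
from urllib.parse import urlparse


def rewrite_lines(src, dst, log=print):
    """Copy 'URL PATH IP' lines from src to dst as 'HOST PATH IP'.

    src and dst are text file objects. Lines with fewer than three
    whitespace-separated fields are reported through log and skipped.
    Returns the number of lines written.
    """
    written = 0
    for lineno, line in enumerate(src, 1):
        fields = line.split()  # also removes the trailing newline
        if len(fields) < 3:
            log("skipping line %d: %r" % (lineno, line))
            continue
        host = urlparse(fields[0]).netloc
        dst.write("%s %s %s\n" % (host, fields[1], fields[2]))
        written += 1
    return written


# In-memory example: one well-formed line followed by a blank line
src = io.StringIO(
    "http://www.example.com/forum/showthread.php?t=779689 /images/webcard.jpg 121.10.208.31\n"
    "\n"
)
dst = io.StringIO()
count = rewrite_lines(src, dst)
print(dst.getvalue(), end="")
```

With real files, the call would be `rewrite_lines(open('a.txt'), open('b.txt', 'w'))`, ideally inside a `with` statement so both files get closed.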
