
I have some text files containing proxy IPs.

They look like this:

    130.14.29.111:80                          
    130.14.29.120:80                          
    130.159.235.31:80                         
    14.198.198.220:8909                       
    141.105.26.183:8000                       
    160.79.35.27:80                           
    164.77.196.75:80                          
    164.77.196.78:45430                       
    164.77.196.78:80                          
    173.10.134.173:8081                       
    174.132.145.80:80                         
    174.137.152.60:8080                       
    174.137.184.37:8080                       
    174.142.125.161:80     

After checking these proxies, I want to mark each one as follows:

     total number of '0' = 8
     total number of 'x' = 6
     percentage = alive 60% , dead 40%


     x  130.14.29.111:80              
     0  130.14.29.120:80              
     0  130.159.235.31:80             
     0  14.198.198.220:8909           
     0  141.105.26.183:8000           
     0  160.79.35.27:80               
     x  164.77.196.75:80              
     x  164.77.196.78:45430           
     x  164.77.196.78:80              
     0  173.10.134.173:8081           
     0  174.132.145.80:80             
     0  174.137.152.60:8080           
     x  174.137.184.37:8080           
     x  174.142.125.161:80           

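How can I do this in Python? Even some samples would do; if anyone can help or point me in the right direction, I would really appreciate it!

Edit:

Here is the source of the script I have.

When the check finishes, the list of working proxies is saved to 'proxy_alive.txt'.

In that file, I want to mark whether each proxy is alive or not.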

    import socket
    import urllib2
    import threading
    import sys
    import Queue
    import socks # third-party SocksiPy module; only the s4/s5 branch uses it

    socket.setdefaulttimeout(7)

    print "Bobng's proxy checker. Using %s second timeout"%(socket.getdefaulttimeout())

    #input_file = sys.argv[1]
    #proxy_type = sys.argv[2] #options: http,s4,s5
    #output_file = sys.argv[3]
    input_file = 'proxylist.txt'
    proxy_type = 'http'
    output_file = 'proxy_alive.txt'

    url = "www.seemyip.com" # Don't put http:// in here, or any /'s

    check_queue = Queue.Queue()
    output_queue = Queue.Queue()
    threads = 20

    def writer(f,rq):
        while True:
            line = rq.get()
            f.write(line+'\n')

    def checker(q,oq):
        while True:
            proxy_info = q.get() #ip:port
            if proxy_info is None:
                print "Finished"
                return
            #print "Checking %s"%proxy_info
            if proxy_type == 'http':
                try:
                    # Route this one request through the proxy under test
                    proxy_handler = urllib2.ProxyHandler({'http':proxy_info})
                    opener = urllib2.build_opener(proxy_handler)
                    opener.addheaders = [('User-agent','Mozilla/5.0')]
                    req = urllib2.Request("http://www.google.com")
                    # opener.open() instead of install_opener(), which is global
                    # and would be shared by all checker threads
                    sock = opener.open(req, timeout=7)
                    rs = sock.read(1000)
                    if '<title>Google</title>' in rs:
                        # The writer thread saves everything put on oq to output_file
                        oq.put(proxy_info)
                        print '[+] alive proxy' , proxy_info
                except urllib2.HTTPError,e:
                    print 'url open error? slow?'
                except Exception,detail:
                    print '[-] bad proxy' ,proxy_info
            else:
                # gotta be socks
                try:
                    s = socks.socksocket()
                    if proxy_type == "s4":
                        t = socks.PROXY_TYPE_SOCKS4
                    else:
                        t = socks.PROXY_TYPE_SOCKS5
                    ip,port = proxy_info.split(':')
                    s.setproxy(t,ip,int(port))
                    s.connect((url,80))
                    oq.put(proxy_info)
                    print proxy_info
                except Exception,error:
                    print proxy_info

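    # Unbuffered (0) output so each result reaches disk even though writer never returns
    writer_thread = threading.Thread(target=writer,args=(open(output_file,"w",0),output_queue))
    writer_thread.daemon = True # do not keep the process alive after sys.exit()
    writer_thread.start()
    for i in xrange(threads):
        threading.Thread(target=checker,args=(check_queue,output_queue)).start()
    for line in open(input_file).readlines():
        line = line.strip() # the list has trailing spaces, not just newlines
        if line:
            check_queue.put(line)
    print "File reading done"
    for i in xrange(threads):
        check_queue.put(None)
    raw_input("PRESS ENTER TO QUIT")
    sys.exit(0)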

1 Answer

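You can use a queue to hold the full list of addresses together with their meta information. Once the check on an IP address is done, record its result, and when all of them are finished you can write the list back to the same file using 'w' mode.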

[ ( ip-address-1,'x' ), ( ip-address-2, '0'), ...... ]
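
A minimal sketch of that idea, written for Python 3 (the script in the question is Python 2). The names `check_all`, `format_report`, `write_marked` and the `is_alive` callable are made up for this illustration; `is_alive` stands in for whatever check you already have (for example the `urllib2` request in the question) and is assumed to return a true value for a working proxy.

```python
import queue
import threading


def check_all(proxies, is_alive, workers=4):
    """Mark each proxy '0' (alive) or 'x' (dead), keeping the input order."""
    tasks = queue.Queue()
    for index, proxy in enumerate(proxies):
        tasks.put((index, proxy))
    marks = [None] * len(proxies)

    def worker():
        while True:
            try:
                index, proxy = tasks.get_nowait()
            except queue.Empty:
                return  # nothing left to check
            try:
                alive = is_alive(proxy)
            except Exception:
                alive = False  # any failure counts as a dead proxy
            marks[index] = '0' if alive else 'x'

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return list(zip(proxies, marks))


def format_report(results):
    """Build the summary header and the marked list as one string."""
    alive = sum(1 for _, mark in results if mark == '0')
    dead = len(results) - alive
    total = len(results) or 1  # avoid dividing by zero on an empty list
    lines = [
        "total number of '0' = %d" % alive,
        "total number of 'x' = %d" % dead,
        "percentage = alive %d%% , dead %d%%" % (
            round(100.0 * alive / total), round(100.0 * dead / total)),
        "",
    ]
    lines.extend("%s  %s" % (mark, proxy) for proxy, mark in results)
    return "\n".join(lines) + "\n"


def write_marked(path, is_alive):
    """Read ip:port lines from path, check them, and overwrite path with the report."""
    with open(path) as f:
        proxies = [line.strip() for line in f if line.strip()]
    with open(path, 'w') as f:
        f.write(format_report(check_all(proxies, is_alive)))
```

Each worker stores its mark at the proxy's original index, so the output keeps the order of the input file even though the checks finish in a different order. Note that `write_marked` expects a plain `ip:port` list; running it again on an already marked file would need the header and marks stripped first.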
answered 2012-06-01T05:29:06.957