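When I run my IP address extraction code on its own, it works fine. But inside the complete web crawler program it breaks and gives inconsistent results.
Here is my IP address code snippet: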
#!/usr/bin/python3
import os
import re
def get_ip_address(url):
    command = "host " + url
    process = os.popen(command)
    results = str(process.read())
    marker = results.find("has address") + 12
    n = (results[marker:].splitlines()[0])
    m = re.search('\w+ \w+: \d\([A-Z]+\)', n)
    if m is not None:
        url_new = url[8:]
        command = "host " + url_new
        process = os.popen(command)
        results = str(process.read())
        marker = results.find("has address") + 12
    return results[marker:].splitlines()[0]
print(get_ip_address("https://www.yahoo.com"))
The complete web crawling program is shown below:
#!/usr/bin/python3
from general import *
from domain_name import *
from ip_address import *
from nmap import *
from robots_txt import *
from whois import *
ROOT_DIR = "companies"
create_dir(ROOT_DIR)
def gather_info(name, url):
    domain_name = get_domain_name(url)
    ip_address = get_ip_address(url)
    nmap = get_nmap('-F', ip_address)
    robots_txt = get_robots_txt(url)
    whois = get_whois(domain_name)
    create_report(name, url, domain_name, nmap, robots_txt, whois, ip_address)
def create_report(name, full_url, domain_name, nmap, robots_txt, whois, ip_address):
    project_dir = ROOT_DIR + '/' + name
    create_dir(project_dir)
    write_file(project_dir + '/full_url.txt', full_url)
    write_file(project_dir + '/domain_name.txt', domain_name)
    write_file(project_dir + '/nmap.txt', nmap)
    write_file(project_dir + '/robots_txt.txt', robots_txt)
    write_file(project_dir + '/whois.txt', whois)
    write_file(project_dir + '/ip_address.txt', ip_address)
x = input("Enter the Company Name: ")
y = input("Enter the complete url of the company: ")
gather_info( x , y )
The input I entered is shown below:
root@nitin-Lenovo-G580:~/Desktop/web_scanning# python3 main.py
106.10.138.240
Enter the Company Name: Yahoo
Enter the complete url of the company: https://www.yahoo.com/
/bin/sh: 1: Syntax error: "(" unexpected
The output in ip_address.txt is:
hoo.com/ not found: 3(NXDOMAIN)
As shown, the program prints the IP as 106.10.138.240 when it runs, yet something different is saved in ip_address.txt. I also could not figure out where the /bin/sh syntax error comes from. Please help.
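
The text saved in ip_address.txt can be reproduced without touching the network. The sketch below is a minimal offline reproduction of the slicing in get_ip_address. It assumes that `host` prints a line of the form `Host <name> not found: 3(NXDOMAIN)` for a name it cannot resolve (a format inferred from the saved file, not captured from a real run). The helper names `slice_like_get_ip_address` and `simulated_host_failure` are hypothetical and exist only for this illustration.

```python
# Offline reproduction of the slicing in get_ip_address (no `host` call is made).
# ASSUMPTION: for an unresolvable name, `host` prints
# "Host <name> not found: 3(NXDOMAIN)"; this is inferred from ip_address.txt.

def slice_like_get_ip_address(results):
    # Same arithmetic as the snippet: find() returns -1 when "has address"
    # is absent, so marker becomes 11 and the slice starts mid-line.
    marker = results.find("has address") + 12
    return results[marker:].splitlines()[0]

def simulated_host_failure(name):
    # Hypothetical stand-in for the failure output of `host <name>`.
    return "Host " + name + " not found: 3(NXDOMAIN)\n"

url = "https://www.yahoo.com/"   # the URL typed at the prompt, with a trailing slash
url_new = url[8:]                # "www.yahoo.com/" (the trailing slash is kept)
saved = slice_like_get_ip_address(simulated_host_failure(url_new))
print(saved)                     # prints: hoo.com/ not found: 3(NXDOMAIN)
```

Under that assumption, the retry with `url[8:]` still carries the trailing slash, the lookup fails again, and the error line itself is returned as the "IP address". The string contains an unquoted `(`, so if get_nmap builds a shell command from it in the same way (its source is not shown here, so this is a guess), that could be where the /bin/sh syntax error comes from. The 106.10.138.240 printed before the prompts may simply be the module-level `print(get_ip_address("https://www.yahoo.com"))` running on import, where the URL has no trailing slash.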