I don't know what to do. I have a 39-line Python script, and it gives me an error on line 40! :( The error:
Traceback (most recent call last):
  File "C:\Mass Storage\pythonscripts\Internet\execute.py", line 2, in <module>
    execfile("firstrunSoup.py")
  File "firstrunSoup.py", line 40
    ^
SyntaxError: invalid syntax
C:\Mass Storage\pythonscripts\Internet>
Here is my Python code:
###firstrunSoup.py###
FILE = open("startURL","r") #Grab from
stURL = FILE.read() #Read first line
FILE.close() #Close
file2save = "index.txt" #File to save URLs to
jscriptV = "not"
try:
    #Returns true/false for absolute
    def is_absolute(url):
        return bool(urlparse.urlparse(url).scheme)
    #Imports
    import urllib2,sys,time,re,urlparse
    from bs4 import BeautifulSoup
    cpURL = urllib2.urlopen(stURL) #Human-readable to computer-usable
    soup = BeautifulSoup(cpURL) #Defines soup
    FILE = open(file2save,"a")
    for link in soup.find_all('a'): #Find all anchor tags
        outPut = ""
        checkVar = link.get('href') #Puts href into string
        if (checkVar is not None) and (checkVar != ""): #Checks if defined
            if len(checkVar) > 11: #Check if longer than 11 characters
                if checkVar[:11] != "javascript:": #Check if first 11 are "javascript:"
                    if checkVar[:7] != "mailto:": #Check if first 7 are "mailto:"
                        jscriptV = "not"
                    else: jscriptV = ""
                else: jscriptV = ""
            if checkVar != "#" and checkVar != "/":
                if jscriptV == "not":
                    if checkVar is not None: #Checks if defined
                        if is_absolute(checkVar): outPut = checkVar.split("#")[0]
                        else: outPut = urlparse.urljoin(stURL,checkVar).split("#")[0]
        if outPut != "":
            print outPut
            FILE.write(outPut + "\r\n")
    FILE.close()
    execfile("nextrunsSoup.py")
If you can help me, please do. I've spent a lot of time on this so far, and just when it was finally ready, I got this error. Thanks in advance!
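
One possible explanation, offered as an assumption rather than a confirmed diagnosis: the script opens a `try:` block but never shows a matching `except` or `finally` clause. If that is what the file really looks like, the parser would reach the end of the file still expecting one, which could explain an error reported one line past the last line. The sketch below reproduces that situation in isolation. `BROKEN_SOURCE`, `FIXED_SOURCE`, and `compiles` are hypothetical names made up for this illustration and are not part of the script.

```python
# Hypothetical two-line source that opens a try block and never closes it
# with an except or finally clause.
BROKEN_SOURCE = "try:\n    value = 1\n"

# The same source with an except clause appended.
FIXED_SOURCE = BROKEN_SOURCE + "except Exception:\n    value = None\n"


def compiles(source):
    """Return True if the source parses, False if it raises SyntaxError."""
    try:
        compile(source, "<demo>", "exec")
        return True
    except SyntaxError:
        return False


print(compiles(BROKEN_SOURCE))  # False
print(compiles(FIXED_SOURCE))   # True
```

If this is indeed the cause, adding an `except` clause (or a `finally`) at the end of the `try:` block, or removing the `try:` and un-indenting its body, should let the file parse.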