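I am currently running the script below, which checks a long list of URLs for errors. The code first collects the unique URLs in df['Final_URL'], tests each one individually, and returns the status of that link. When I run the code I get the output shown below in my notebook, which is fine. Now I want to push the status code (e.g. 200, 404, BAD) into a new column called "Status" in my df, for every row whose URL equals one of the unique URLs collected at the start of the code.
What is the best way to create the new column df['Status']? Since I want to export it to Google Sheets, do you also know whether the text colour is preserved when updating cells with pygsheets?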
Input code:
# Get unique URLs and check them for errors
import re
import time

import requests

URLS = []
for unique_link in df['Final_URL'].unique():
    URLS.append(unique_link)

try:
    # ANSI escape codes for coloured terminal output
    GREEN = '\033[92m'
    YELLOW = '\033[93m'
    RED = '\033[91m'
    ENDC = '\033[0m'

    def main():
        while True:
            print("\nTesting URLs.", time.ctime())
            checkUrls()
            time.sleep(10)  # Sleep 10 seconds
            break

    def checkUrls():
        for url in URLS:
            status = "N/A"
            try:
                # Only request URLs that contain bet3.com; mark the rest BAD
                if re.search(r"bet3\.com", url):
                    status = checkUrl(url)
                else:
                    status = "BAD"
            except requests.exceptions.ConnectionError:
                status = "DOWN"
            printStatus(url, status)
            # My attempt at filling the new column (does not work):
            # for x in df['Final_URL']:
            #     if x == url:
            #         df['Status'] = printStatus(status)

    def checkUrl(url):
        r = requests.get(url, timeout=5)
        # print(r.status_code)
        return str(r.status_code)

    def printStatus(url, status):
        # Green for 200, red for everything else
        color = GREEN
        if status != "200":
            color = RED
        print(color + status + ENDC + ' ' + url)

    #
    # Main app
    #
    if __name__ == '__main__':
        main()
except Exception:
    print('Something went wrong!')
Current output:
200 https://www.bet3.com/dl/~offer
404 http://extra.bet3.com/promotions/en/soccer/soccer-accumulator-bonus
BAD https://extra.betting3.com/features/en/bet-builder
200 https://www.bet3.com/dl/6
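
For illustration, one possible way to get the "Status" column described above is to check each unique URL once, keep the results in a dict, and map that dict back onto df['Final_URL'] with pandas' Series.map. This is only a sketch under assumptions: classify, add_status_column and fake_check are hypothetical names, and fake_check is a stand-in for checkUrl() so that the example runs without network access. The sample DataFrame reuses the URLs from the output above.

```python
import re

import pandas as pd


def classify(url, check_url):
    """Return a status string for one URL, following the rule in checkUrls()."""
    # URLs that do not contain bet3.com are never requested
    if not re.search(r"bet3\.com", url):
        return "BAD"
    return check_url(url)


def add_status_column(df, check_url):
    """Check every unique URL once, then map the result onto all rows."""
    status_by_url = {
        url: classify(url, check_url) for url in df["Final_URL"].unique()
    }
    out = df.copy()  # leave the caller's DataFrame untouched
    out["Status"] = out["Final_URL"].map(status_by_url)
    return out


def fake_check(url):
    """Hypothetical stand-in for checkUrl(); returns canned status strings."""
    return "404" if "promotions" in url else "200"


df = pd.DataFrame({"Final_URL": [
    "https://www.bet3.com/dl/6",
    "https://extra.betting3.com/features/en/bet-builder",
    "https://www.bet3.com/dl/6",  # duplicate row, checked only once
    "http://extra.bet3.com/promotions/en/soccer/soccer-accumulator-bonus",
]})

result = add_status_column(df, fake_check)
print(result)
```

In the real script, a function that wraps requests.get and returns "DOWN" on requests.exceptions.ConnectionError could be passed in place of fake_check. On the colour question: the ANSI escape codes used in printStatus() are only interpreted by a terminal, so the "Status" column here holds plain strings such as "200" or "BAD". Any colouring in Google Sheets would presumably have to be applied as cell formatting on the sheet side rather than carried over from the printed output.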