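I have a looping script that returns different filtered results, and I can get this data returned as an array for each filter class. However, I am not sure of the best way to join all of these arrays together.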
import mechanize
import urllib
import json
import re
import random
import datetime
from sched import scheduler
from time import time, sleep
##### The built-in set type is used below, so the deprecated sets module is not imported
##### Code to loop the script and set up scheduling time
s = scheduler(time, sleep)
random.seed()
##### Code to stop duplicates part 1
userset = set()
def run_periodically(start, end, interval, func):
    event_time = start
    while event_time < end:
        s.enterabs(event_time, 0, func, ())
        event_time += interval + random.randrange(-5, 10)
    s.run()
##### Code to get the data required from the URL desired
def getData():
    post_url = "URL OF INTEREST"
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    browser.addheaders = [('User-agent', 'Firefox')]
    ##### These are the parameters you've got from checking with the aforementioned tools
    parameters = {'page' : '1',
                  'rp' : '250',
                  'sortname' : 'race_time',
                  'sortorder' : 'asc'
                 }
    ##### Encode the parameters
    data = urllib.urlencode(parameters)
    trans_array = browser.open(post_url, data).read().decode('UTF-8')
    xmlload1 = json.loads(trans_array)
    pattern2 = re.compile('/control/profile/view/(.*)\' title=')
    pattern4 = re.compile('title=\'posted: (.*) strikes:')
    pattern5 = re.compile('strikes: (.*)\'><img src=')
    for row in xmlload1['rows']:
        cell = row["cell"]
        ##### defining the Keys (key is the area from which data is pulled in the XML) for use in the pattern finding/regex
        user_delimiter = cell['username']
        selection_delimiter = cell['race_horse']
        user_numberofselections = float(re.findall(pattern4, user_delimiter)[0])
        user_numberofstrikes = float(re.findall(pattern5, user_delimiter)[0])
        strikeratecalc1 = user_numberofstrikes/user_numberofselections
        strikeratecalc2 = strikeratecalc1*100
        userid_delimiter_results = (re.findall(pattern2, user_delimiter)[0])
        ##### Code to stop duplicates throughout the day part 2 (skips if the id is already in the userset)
        if userid_delimiter_results in userset:
            continue
        userset.add(userid_delimiter_results)
        ##### Both variables are reset on every pass of the loop
        arraym = ""
        arrayna = ""
        if strikeratecalc2 > 50 and strikeratecalc2 < 100:
            arraym0 = "System M"
            arraym1 = "user id = ",userid_delimiter_results
            arraym2 = "percentage = ",strikeratecalc2,"%"
            arraym3 = ""
            arraym = [arraym0, arraym1, arraym2, arraym3]
        if strikeratecalc2 > 0 and strikeratecalc2 < 50:
            arrayna0 = "System NA"
            arrayna1 = "user id = ",userid_delimiter_results
            arrayna2 = "percentage = ",strikeratecalc2,"%"
            arrayna3 = ""
            arrayna = [arrayna0, arrayna1, arrayna2, arrayna3]
getData()
run_periodically(time()+5, time()+1000000, 10, getData)
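What I want to do is return 'arraym' and 'arrayna' as one final array. However, because the script loops, the old 'arraym'/'arrayna' is overwritten on each iteration, so my attempts to produce one array containing all the data have so far yielded only the last user id for "System M" and the last user id for "System NA". This is obviously because each pass of the loop overwrites the old 'arraym' and 'arrayna', but I don't know of any way to resolve this so that all my data accumulates in one array. Please note that I have only been coding for a cumulative two weeks now, so there may well be some simple function that solves this.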
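To make the desired behaviour concrete, here is a minimal sketch of the kind of accumulation described above. It is not the real script: `sample_rows`, `classify` and `results` are hypothetical names, and the hard-coded tuples stand in for the user id and strike rate that the real loop extracts from each row. It assumes the usual pattern of creating one list before the loop and appending to it, which may or may not be the best fit here.

```python
# Hypothetical sample data standing in for the (user id, strike rate)
# pairs that the real loop pulls out of each row.
sample_rows = [("101", 75.0), ("102", 20.0), ("103", 60.0), ("104", 35.0)]

def classify(rows):
    """Return one entry per row instead of overwriting a single variable."""
    results = []  # created once, before the loop, so it survives every pass
    for user_id, strike_rate in rows:
        if 50 < strike_rate < 100:
            system = "System M"
        elif 0 < strike_rate < 50:
            system = "System NA"
        else:
            continue  # outside both filters (for example exactly 0, 50 or 100)
        # append adds to the end of the list rather than replacing it
        results.append([system,
                        "user id = " + user_id,
                        "percentage = %s%%" % strike_rate])
    return results

all_results = classify(sample_rows)
for entry in all_results:
    print(entry)
```

With this sample data, `all_results` would hold four entries, two for each system, in the order the rows were processed.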
Kind regards, AEA