python - Using a SET to ignore previously logged users in a looping script

Question

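I'm trying to use a set to stop users from being printed again in the code below. I've managed to get Python to accept the code without producing any errors, but if I let the code run on a 10-second loop, it keeps printing users that should already have been logged. This is my first attempt at using sets, and I'm a complete Python novice (everything so far has been built from examples I've seen and reverse-engineered).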

Below is a sample of the code I'm using:

import mechanize
import urllib
import json
import re
import random
import datetime
from sched import scheduler
from time import time, sleep

######Code to loop the script and set up scheduling time

s = scheduler(time, sleep)
random.seed()

def run_periodically(start, end, interval, func):
    event_time = start
    while event_time < end:
        s.enterabs(event_time, 0, func, ())
        event_time += interval + random.randrange(-5, 45)
    s.run()

###### Code to get the data required from the URL desired
def getData():  
    post_url = "URL OF INTEREST"
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    browser.addheaders = [('User-agent', 'Firefox')]

######These are the parameters you've got from checking with the aforementioned tools
    parameters = {'page' : '1',
              'rp' : '250',
              'sortname' : 'roi',
              'sortorder' : 'desc'
             }
#####Encode the parameters
    data = urllib.urlencode(parameters)
    trans_array = browser.open(post_url,data).read().decode('UTF-8')

    xmlload1 = json.loads(trans_array)
    pattern1 = re.compile('>&nbsp;&nbsp;(.*)<')
    pattern2 = re.compile('/control/profile/view/(.*)\' title=')
    pattern3 = re.compile('<span style=\'font-size:12px;\'>(.*)<\/span>')



##### Making the code identify each row, removing the need to numerically quantify the number of rows in the xmlfile,
##### thus making the number of rows dynamic (changes as the list grows, required for the looping function to work uninterrupted)

    for row in xmlload1['rows']:
        cell = row["cell"]

##### defining the Keys (key is the area from which data is pulled in the XML) for use in the pattern finding/regex

        user_delimiter = cell['username']
        selection_delimiter = cell['race_horse']


##### NOTE: strikeratecalc2 is not defined anywhere in this excerpt
        if strikeratecalc2 < 12 : continue;

##### REMAINDER OF THE REGEX DELIMITATIONS
        username_delimiter_results = re.findall(pattern1, user_delimiter)[0]
        userid_delimiter_results = (re.findall(pattern2, user_delimiter)[0])
        user_selection = re.findall(pattern3, selection_delimiter)[0]

##### Code to stop duplicate posts of each user throughout the day

        userset = set ([])
        if userid_delimiter_results in userset: continue;

##### Printing the results of the code at hand

        print "user id = ",userid_delimiter_results
        print "username = ",username_delimiter_results
        print "user selection = ",user_selection
        print ""

##### Code to stop duplicate posts of each user throughout the day part 2 (updating the set to add users already printed to the ignore list)

        userset.update(userid_delimiter_results)

getData()

run_periodically(time()+5, time()+1000000, 300, getData)

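Any comments would be greatly appreciated. This probably seems like common sense to you experienced coders, but I've really only just got past "Hello world".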

Kind regards, AEA

score 2 · Accepted Answer

This:

userset.update(userid_delimiter_results)

should be:

userset.add(userid_delimiter_results)
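The difference is easy to check in isolation. In the question, userid_delimiter_results is a single string (the first match from re.findall), and set.update() treats any iterable argument, including a string, as a collection of elements to add. The user ID "12345" below is made up purely for illustration.

```python
# Illustrative user ID (hypothetical); in the question this value comes from
# re.findall(pattern2, user_delimiter)[0], which returns a string.
user_id = "12345"

# set.update() iterates over its argument, so a string is split
# into its individual characters.
wrong = set()
wrong.update(user_id)
print(sorted(wrong))     # ['1', '2', '3', '4', '5']
print(user_id in wrong)  # False: only single characters were stored

# set.add() stores the whole string as one element.
right = set()
right.add(user_id)
print(sorted(right))     # ['12345']
print(user_id in right)  # True
```

With update(), the membership test `if userid_delimiter_results in userset` can never match a multi-character ID, so every user is printed again.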

To prove this, try printing the contents of userset after each call.
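Printing the set after each call also exposes a second thing to watch for in the question's code: `userset = set ([])` runs inside getData(), so every scheduled call starts with an empty set and cannot remember users from earlier runs. The sketch below is a minimal illustration, assuming the set is created once at module level; fetch_user_ids() and the IDs it returns are hypothetical stand-ins for the scraping and regex work done in getData().

```python
# Created once at module level so it survives between scheduled runs.
seen_user_ids = set()


def fetch_user_ids():
    """Hypothetical stand-in for the scraping in getData(); returns
    the user IDs found on the page during this run."""
    return ["101", "202", "101"]


def report_new_users():
    """Print each user ID only the first time it is seen."""
    new_ids = []
    for user_id in fetch_user_ids():
        if user_id in seen_user_ids:
            continue  # already printed in an earlier run (or earlier in this one)
        print("user id = " + user_id)
        seen_user_ids.add(user_id)  # add(), not update(): store the whole ID
        new_ids.append(user_id)
    # Show the set's contents after each call, as suggested above.
    print("userset now: " + str(sorted(seen_user_ids)))
    return new_ids


first = report_new_users()   # prints 101 and 202 once each
second = report_new_users()  # prints no user IDs; both were already seen
```

Because the function only calls `.add()` on the set and never reassigns the name, no `global` statement is needed for it to modify the module-level set.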
