python - Dynamically named set, or alternative suggested method?

Question

First of all, thank you for taking the time to look at my problem. Rather than simply describing the solution I have in mind, I thought it best to outline the problem as well, so that alternative solutions can be suggested. It is more than likely that there is a better way to achieve this.

The problem I have:

I generate lists of names with associated scores, ranks and other values. These lists are generated daily but have to change as the day progresses, because some names need to be removed. Currently the lists are produced in Excel sheets, which contain the following data in the following format:

(Unique List Title)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value),(non unique filter)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value),(non unique filter)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value),(non unique filter)

(Unique List Title)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value),(non unique filter)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value),(non unique filter)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value),(non unique filter)
(Unique Name in list),(Rank),(Score),(Calculated Numeric Value),(non unique filter)

For example:

Mrs Dodgsons class
Rosie,1,123.8,5,Lincoln University
James,2,122.6,7,Lincoln University
Chris,3,120.4,12,Lincoln University
Douglas,4,120.2,18,Lincoln University

Dr Clements class
Hannah,1,126.9,2.56,Durham University
Gill,2,124.54,6.89,Durham University
Jack,3,122.04,15.62,Durham University
Jamie,4,121.09,20.91,Durham University
Douglas,4,120.2,18,Durham University

I also have a separate list of users, each with their associated "non unique filter", who need removing from the Excel-generated lists above (don't worry, the final product of this little project is not to re-save a modified Excel document). This list is generated by a web scraper and is updated every two minutes. The approach I currently see as potentially viable is to save each list in the CSV as a set (if this is possible); then, whenever a Unique Name/non unique filter combination is found on the removal list, it would be deleted from the set or sets in which it occurs.

For instance, if "Douglas,Durham University" appeared on the removal list, then Douglas would be removed from the second of the two lists shown above. Where a unique user name appears in two of the sets, one of the two entries will always appear on the removal list along with its associated university (so we can identify which set to remove the user from). However, please note that users to be removed do not always appear in two sets at once; for instance, "Rosie,Lincoln University" could just as easily appear on the removal list.

I previously posted a very similar problem on the Python forum; however, I made a few mistakes in how I asked the question and in describing what I wanted to achieve. Rather than confuse the issue on the old thread, I have started a new one here. Some general questions about the problem were asked on the old thread, and I answer them here to provide some clarification.

Q1 So the first list is only generated once a day. What happens to it after that day: is it thrown away, stored, replaced, etc.?

A1 My gut feeling is that it should be saved to a folder as a simple .txt, .csv or similar, if only as a debugging log.

Q2 Every two minutes the first list needs altering. What happens to the altered list, who needs to know about it, and is it stored or just changed in some memory state?

A2 The ultimate aim of this code is to produce an RSS feed with user statistics; some of these stats include the (Rank) and the (Calculated Numeric Value). It is self-explanatory how the Rank could change as a result of a user being removed. The (Calculated Numeric Value), however, is derived from an equation which uses the sum of the (Score) values for each list as well as the number of users in that list. So, in answer to the original question, the list will need to be stored in some way.

Q3 Are names unique per class or unique throughout the whole data?

A3 Names are unique throughout the entire data, i.e. the username Douglas will always refer to Douglas. If a user appears in more than one class, then that user will always appear on the list of users to be removed.

Q4 If names are unique, what happens when two people in the same class have the same name, which sounds quite possible?

A4 In this example it seems possible for more than one user in the same class to have the same name; in reality, however, it cannot happen.

My questions for Stack Overflow are:

Is the proposed methodology viable for producing multiple uniquely named sets (up to 60 per day)?
Is there a better method of achieving the same result?

Any help or comments would be greatly appreciated.

Best regards AEA

score 1 · Accepted Answer

No, I don't think you could convert the data in each CSV file to a set without losing data. You could avoid that by converting each list into a dictionary keyed by a tuple of (user, non-unique filter), with each value being a list of all the other quantities in the corresponding row of the CSV.
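
As a rough sketch of that idea, the code below reads text in the layout shown in the question (a title line, then data rows, with a blank line between lists) into one outer dictionary keyed by list title, so no dynamically named variables are needed. The function name `parse_lists`, the `SAMPLE` text and the exact export layout are assumptions made for illustration, not something taken from your actual files.

```python
import csv
import io

# Assumed export layout: a title line, then data rows, with a blank line
# between lists. The rows are taken from the example in the question.
SAMPLE = """\
Mrs Dodgsons class
Rosie,1,123.8,5,Lincoln University
James,2,122.6,7,Lincoln University

Dr Clements class
Hannah,1,126.9,2.56,Durham University
Douglas,4,120.2,18,Durham University
"""


def parse_lists(text):
    """Return {list_title: {(name, filter): [rank, score, value]}}."""
    lists = {}
    current = None  # dictionary for the list currently being read
    for row in csv.reader(io.StringIO(text)):
        if not any(cell.strip() for cell in row):
            # A blank line ends the current list.
            current = None
            continue
        if current is None:
            # The first non-blank line after a gap is taken to be the title.
            title = ",".join(row).strip()
            current = lists.setdefault(title, {})
            continue
        name, rank, score, value, filt = (cell.strip() for cell in row)
        current[(name, filt)] = [int(rank), float(score), float(value)]
    return lists


lists = parse_lists(SAMPLE)
print(lists["Dr Clements class"][("Douglas", "Durham University")])
```

Keeping every list inside one outer dictionary means the (up to 60) list titles are just keys, and each inner dictionary can be looked up or modified by its title.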

To update these dictionaries, you could simply delete any entries whose keys match an entry on your separate list of users who need removing.
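
A minimal sketch of the removal step, assuming the lists are already held as dictionaries keyed by (name, filter) tuples and the scraper's output has been turned into a set of the same kind of tuples. The names `remove_users` and `rerank` are hypothetical, and re-ranking by descending score is an assumption based on the sample data; the equation for the calculated numeric value is not given in the question, so it is left untouched here.

```python
# Each list is a dictionary: (name, filter) -> [rank, score, calculated value].
# The rows are taken from the example in the question.
lists = {
    "Mrs Dodgsons class": {
        ("Rosie", "Lincoln University"): [1, 123.8, 5.0],
        ("James", "Lincoln University"): [2, 122.6, 7.0],
        ("Douglas", "Lincoln University"): [4, 120.2, 18.0],
    },
    "Dr Clements class": {
        ("Hannah", "Durham University"): [1, 126.9, 2.56],
        ("Gill", "Durham University"): [2, 124.54, 6.89],
        ("Douglas", "Durham University"): [4, 120.2, 18.0],
    },
}


def remove_users(lists, to_remove):
    """Delete every (name, filter) key in to_remove; return how many were deleted."""
    removed = 0
    for entries in lists.values():
        for key in to_remove:
            # pop() with a default does nothing if the key is absent.
            if entries.pop(key, None) is not None:
                removed += 1
    return removed


def rerank(entries):
    """Renumber ranks 1..n by descending score (assumed ranking rule)."""
    ordered = sorted(entries.items(), key=lambda item: item[1][1], reverse=True)
    for new_rank, (_key, values) in enumerate(ordered, start=1):
        values[0] = new_rank


# Stand-in for the scraper output: (name, filter) pairs to remove.
to_remove = {("Douglas", "Durham University"), ("Rosie", "Lincoln University")}

removed = remove_users(lists, to_remove)
for entries in lists.values():
    rerank(entries)
```

Because the key includes the filter, removing ("Douglas", "Durham University") leaves the Douglas entry in the Lincoln University list in place.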

If you are unsure of how to do either of these things, feel free to ask another question.
