I am working with a pandas DataFrame, for example:
<class 'pandas.core.frame.DataFrame'>
Index: 685 entries, 7789285 to 8009947
Data columns (total 18 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 sourcedId 685 non-null string
1 status 685 non-null string
2 dateLastModified 685 non-null datetime64[ns, UTC]
3 username 685 non-null string
4 userIds 685 non-null object
5 enabledUser 685 non-null string
6 givenName 685 non-null string
7 familyName 685 non-null string
8 middleName 685 non-null string
9 role 685 non-null string
10 identifier 685 non-null string
11 email 685 non-null string
12 sms 685 non-null string
13 phone 685 non-null string
14 agents 685 non-null object
15 orgs 685 non-null object
16 grades 685 non-null object
17 password 685 non-null string
dtypes: datetime64[ns, UTC](1), object(4), string(13)
memory usage: 101.7+ KB
df.head()
The 'grades' column contains lists of integers stored as strings, e.g. ['9','10']. I can filter on a single value with:
mask = df.grades.apply(lambda x: '10' in x)
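In my test data set, which is built from a list of lists that I populated by hand, I used integer values, so the following works fine (?) (for the sake of argument, assume the data are lists of integers rather than lists of integer strings):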
gradeList = [9,10]
mask = df.grades.apply(lambda x: any(map(lambda x, y: x == y, x, gradeList)))
df[mask].head()
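For reference, here is a minimal sketch of what that mask expression evaluates for a single row. It assumes integer grades; the names `row`, `grade_list`, `positional` and `membership` are made up for this illustration. When `map()` is given two iterables, it passes elements to the function pairwise by position, the same pairing `zip()` produces, and stops at the shorter one.

```python
# Hypothetical single row and query, chosen so the two lists hold the
# same values in a different order.
row = [9, 10]
grade_list = [10, 9]

# map(f, a, b) calls f(a[0], b[0]), f(a[1], b[1]), ... just like zip(a, b).
pairs = list(zip(row, grade_list))

# The expression from the mask above: compares elements position by position.
positional = any(map(lambda x, y: x == y, row, grade_list))

# A membership test, by contrast, ignores position entirely.
membership = any(g in row for g in grade_list)

print(pairs)       # [(9, 10), (10, 9)]
print(positional)  # False
print(membership)  # True
```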
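I am relatively new to Python (over the last five years I have accumulated what I figure is about 6 to 8 months of Python experience, if that) and completely new to pandas. I have only a rudimentary understanding of list comprehensions and the map function.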
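My intent is to be able to retrieve any record where a subset of the list of grades exists in the grades column. For a single integer grade, this is accomplished by: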
mask = df.grades.apply(lambda x: grade in x)
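Instead of achieving my goal with the nested lambda and map above, I have created something where the order of the terms in the query parameter (gradeList) matters. Below is the output of my test script, which operates on the test data included in the output. I am trying not to assume any ordering in either list...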
--------------------------------------------------------------------------------
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 id 10 non-null object
1 email 10 non-null object
2 fullName 10 non-null object
3 jobTitles 10 non-null object
4 grades 10 non-null object
dtypes: object(5)
memory usage: 528.0+ bytes
--------------------------------------------------------------------------------
id email fullName jobTitles grades
0 smithsm smithsm@aplace.com Stu Smith [developer, licensed pretend nurse, worthless ... [9, 10, 11, 12]
1 mullenjb mullenjb@aplace.com Jason Mullen [printer guy, supervisor, senior it] [11, 12]
2 swainrl swainrl@aplace.com Ryan Swain [nap taker, goof-off, goober] [9, 10]
3 rankinsns rankinsns@aplace.com Nicholas Rankins [manual tesla autopilot] [9, 10]
4 carlsonrm carlsonrm@aplace.com Ryan Carlson [technician, snarky so-and-so] [10, 11]
5 ragomv ragomv@aplace.com Mike Rago [nice guy, swole] [10]
6 smithdl smithdl@aplace.com David Smith [old hand] [9]
7 kappleraj kappleraj@aplace.com Allison Kappler [girl coder, definitely not prettier than me] [11]
8 iresonss iresonss@aplace.com Sandy Ireson [hard worker] [12]
9 conklincc conklincc@aplace.com Caleb Conklin [millenial magnum pi] [12, 9]
--------------------------------------------------------------------------------
query for 'developer'
id email fullName jobTitles grades
0 smithsm smithsm@aplace.com Stu Smith [developer, licensed pretend nurse, worthless ... [9, 10, 11, 12]
--------------------------------------------------------------------------------
query for 11
id email fullName jobTitles grades
0 smithsm smithsm@aplace.com Stu Smith [developer, licensed pretend nurse, worthless ... [9, 10, 11, 12]
1 mullenjb mullenjb@aplace.com Jason Mullen [printer guy, supervisor, senior it] [11, 12]
4 carlsonrm carlsonrm@aplace.com Ryan Carlson [technician, snarky so-and-so] [10, 11]
7 kappleraj kappleraj@aplace.com Allison Kappler [girl coder, definitely not prettier than me] [11]
--------------------------------------------------------------------------------
query for 10
id email fullName jobTitles grades
0 smithsm smithsm@aplace.com Stu Smith [developer, licensed pretend nurse, worthless ... [9, 10, 11, 12]
2 swainrl swainrl@aplace.com Ryan Swain [nap taker, goof-off, goober] [9, 10]
3 rankinsns rankinsns@aplace.com Nicholas Rankins [technician] [9, 10]
4 carlsonrm carlsonrm@aplace.com Ryan Carlson [technician, snarky so-and-so] [10, 11]
5 ragomv ragomv@aplace.com Mike Rago [nice guy, swole] [10]
--------------------------------------------------------------------------------
query for 11,12
id email fullName jobTitles grades
1 mullenjb mullenjb@aplace.com Jason Mullen [printer guy, supervisor, senior it] [11, 12]
7 kappleraj kappleraj@aplace.com Allison Kappler [girl coder, definitely not prettier than me] [11]
--------------------------------------------------------------------------------
query for 10,11
id email fullName jobTitles grades
4 carlsonrm carlsonrm@aplace.com Ryan Carlson [technician, snarky so-and-so] [10, 11]
5 ragomv ragomv@aplace.com Mike Rago [nice guy, swole] [10]
--------------------------------------------------------------------------------
query for 9,10
id email fullName jobTitles grades
0 smithsm smithsm@aplace.com Stu Smith [developer, licensed pretend nurse, worthless ... [9, 10, 11, 12]
2 swainrl swainrl@aplace.com Ryan Swain [nap taker, goof-off, goober] [9, 10]
3 rankinsns rankinsns@aplace.com Nicholas Rankins [technician] [9, 10]
6 smithdl smithdl@aplace.com David Smith [old hand] [9]
--------------------------------------------------------------------------------
query for 10,9
id email fullName jobTitles grades
4 carlsonrm carlsonrm@aplace.com Ryan Carlson [technician, snarky so-and-so] [10, 11]
5 ragomv ragomv@aplace.com Mike Rago [nice guy, swole] [10]
9 conklincc conklincc@aplace.com Caleb Conklin [millenial magnum pi] [12, 9]
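Is anyone able to identify what is going on (hopefully a core concept I am missing), or point me to documentation that would help me untangle it?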
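One order-independent way to express the intent is with set operations. The sketch below is an illustration only, and the post does not confirm it as the intended solution. It assumes integer grades and a cut-down, hypothetical copy of the test data, and the helper names `any_grade_mask` and `all_grades_mask` are invented here. Since "a subset of the grade list exists" could mean either "at least one queried grade is present" or "every queried grade is present", both readings are shown.

```python
import pandas as pd

# Hypothetical miniature of the test data shown above (integer grades assumed).
df = pd.DataFrame({
    "id": ["smithsm", "mullenjb", "swainrl", "ragomv", "conklincc"],
    "grades": [[9, 10, 11, 12], [11, 12], [9, 10], [10], [12, 9]],
})

def any_grade_mask(frame, wanted):
    """True for rows whose grades share at least one value with `wanted`."""
    wanted = set(wanted)
    return frame["grades"].apply(lambda row: not wanted.isdisjoint(row))

def all_grades_mask(frame, wanted):
    """True for rows whose grades contain every value in `wanted`."""
    wanted = set(wanted)
    return frame["grades"].apply(lambda row: wanted.issubset(row))

# The order of the query terms no longer affects the result.
any_ids = df[any_grade_mask(df, [10, 9])]["id"].tolist()
any_ids_reversed = df[any_grade_mask(df, [9, 10])]["id"].tolist()
all_ids = df[all_grades_mask(df, [10, 9])]["id"].tolist()

print(any_ids)  # ['smithsm', 'swainrl', 'ragomv', 'conklincc']
print(all_ids)  # ['smithsm', 'swainrl']
```

If the real column holds strings such as ['9','10'], the query values would need to be strings as well (or the column converted to integers first) for the comparisons to match.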