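I'm trying to run a block of code only when the string I'm searching contains a comma.
Here is a sample of the rows I need to parse (name is the column header of this tab-delimited file, and the column, annoyingly, contains the name, degree, and area of practice):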
name
Sam da Man J.D.,CEP
Green Eggs Jr. Ed.M.,CEP
Argle Bargle Sr. MA
Cersei Lannister M.A. Ph.D.
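My problem is that some rows contain a comma followed by an acronym denoting the professional's "area of practice," and some don't.
My code relies on the assumption that every row contains a comma, and I now have to modify it to account for rows without one.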
    import re

    def parse_ieca_gc(s):
        ########################## HANDLE NAME ELEMENT ###############################
        degrees = ['M.A.T.','Ph.D.','MA','J.D.','Ed.M.', 'M.A.', 'M.B.A.', 'Ed.S.', 'M.Div.', 'M.Ed.', 'RN', 'B.S.Ed.', 'M.D.']
        degrees_list = []
        # separate area of practice from name and degree and bind this to var 'area'
        split_area_nmdeg = s['name'].split(',')
        area = split_area_nmdeg.pop() # when there is no area of practice and hence no comma, this pops out the name + deg and leaves an empty list, that's why 'print split_area_nmdeg' returns nothing and 'area' returns the name and deg when there's no comma
        print 'split area nmdeg'
        print area
        print split_area_nmdeg
        # Split the name and deg by spaces. If there's a deg, it will match one of the elements and be stored in the degrees list. The deg is removed from the name_deg list and all that's left is the name.
        split_name_deg = re.split(r'\s',split_area_nmdeg[0])
        for word in split_name_deg:
            for deg in degrees:
                if deg == word:
                    degrees_list.append(split_name_deg.pop())
        name = ' '.join(split_name_deg)
        # area of practice
        category = area
Neither re.search() nor re.match() seems to work, because they return instances rather than booleans, so what should I use to tell whether there is a comma?
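For reference, here is a minimal sketch of two ways the comma check could be done, written for Python 3 (the code above uses Python 2 print statements). The helper name split_area is hypothetical and not part of the code above. The sketch relies on the in operator, which yields a plain boolean, and on str.partition, which always returns three parts, so a row without a comma does not leave an empty list behind. It also shows that re.search() returns None when nothing matches, so comparing its result against None gives a boolean as well.

```python
import re

def split_area(raw):
    """Hypothetical helper: split 'Name Degree,AREA' into (name_and_degrees, area).

    area is None when the row has no comma.
    """
    # partition always returns a 3-tuple; with no comma, sep and tail are ''.
    head, sep, tail = raw.partition(',')
    area = tail.strip() if sep else None
    return head.strip(), area

for line in ['Sam da Man J.D.,CEP', 'Argle Bargle Sr. MA']:
    has_comma = ',' in line                          # plain boolean test
    also_has_comma = re.search(',', line) is not None  # regex equivalent
    print(line, has_comma, also_has_comma, split_area(line))
```

With something like this, the name and degree handling could run on the first element of the returned tuple for every row, and the area of practice would only be set when a comma was actually present.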