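I'm trying to write a function to cut down on a lot of repetitive code that assigns values to variables.
Currently, if I do it this way, it works: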
from pyquery import PyQuery as pq
import pandas as pd
d = pq(filename='20160319RHIL0_edit.xml')
# from nominations
res = d('nomination')
nomID = [res.eq(i).attr('id') for i in range(len(res))]
horseName = [res.eq(i).attr('horse') for i in range(len(res))]
zipped = list(zip(nomID, horseName))
frames = pd.DataFrame(zipped)
print(frames)
This produces the following output:
         0                  1
0   171115            Vergara
1   187674      Heavens Above
2   184732         Sweet Fire
3   181928            Alegria
4   158914            Piamimi
5   171408          Blendwell
6   166836     Adorabeel (NZ)
7   172933           Mary Lou
8   182533      Skyline Blush
9   171801         All Cerise
10  181079  Gust of Wind (NZ)
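However, to keep extending this I need to create more variables like the one below. The only parts that change are the variable name and the attribute passed to attr(), here 'horse':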
horseName = [res.eq(i).attr('horse') for i in range(len(res))]
So it seems logical to apply DRY and create a function that accepts a list of attributes as its argument:
from pyquery import PyQuery as pq
import pandas as pd
d = pq(filename='20160319RHIL0_edit.xml')
# from nominations
res = d('nomination')
aList = []
def inputs(args):
    '''function to get elements matching criteria'''
    optsList = ['id', 'horse']
    for item in res:
        for attrs in optsList:
            if res.attr(attrs) in item:
                aList.append([res.eq(i).attr(attrs) for i in range(len(res))])

zipped = list(zip(aList))
frames = pd.DataFrame(zipped)
print(frames)
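For reference, here is a minimal sketch of the kind of helper I am after. It is not tested against the real file: `collect_attrs` is a hypothetical name, the inline XML is an assumed layout that reuses the first three rows of the output above, and it uses the standard library's ElementTree instead of pyquery so that it runs on its own.

```python
import xml.etree.ElementTree as ET

# Assumed layout: one <nomination> element per horse, with the values stored
# as attributes. The <meeting> wrapper is made up for this sample.
SAMPLE_XML = """
<meeting>
  <nomination id="171115" horse="Vergara"/>
  <nomination id="187674" horse="Heavens Above"/>
  <nomination id="184732" horse="Sweet Fire"/>
</meeting>
"""


def collect_attrs(elements, attr_names):
    """Return {attribute name: [that attribute's value for each element]}."""
    elements = list(elements)  # allow generators to be traversed once per name
    return {name: [el.get(name) for el in elements] for name in attr_names}


root = ET.fromstring(SAMPLE_XML.strip())
columns = collect_attrs(root.iter('nomination'), ['id', 'horse'])

# zip(*...) unpacks the column lists so values are paired row by row;
# zip(some_list) without the * only wraps each inner list in a 1-tuple.
rows = list(zip(*columns.values()))

print(columns)
print(rows)
```

With pyquery, iterating over `res` yields lxml elements, which also provide `.get()`, so `collect_attrs(res, ['id', 'horse'])` should behave the same way. Passing the resulting dict to `pd.DataFrame(columns)` would then give a frame whose columns are named after the attributes.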