I have a dataframe like the following:

BankNum   | FirstName | LastName | ID  |
00987772  | Michael   | Brown    | 123 |
00987772  | Bob       | Brown    | 123 |
00987772  | Michael   | Mooney   | 123 |
00987772  | Raven     | Mallik   | 245 |
00982122  | Karim     | Hareche  | 564 |

I am doing the following to get two dictionaries:

import pandas as pd

cols = [
    {'col': 'BankNum', 'func': lambda x: x.value_counts().to_dict()},
    {'col': 'FirstName', 'func': pd.Series.nunique},
    {'col': 'LastName', 'func': pd.Series.nunique}]

# Group by ID (the keys of d below) and apply each function to its column.
d = df.groupby('ID').apply(lambda x: tuple(c['func'](x[c['col']]) for c in cols)).to_dict()

# For each BankNum, count the distinct IDs that share it.
cols1 = ['ID']
df2 = df.groupby('BankNum').apply(lambda x: tuple(x[c].nunique() for c in cols1))
d1 = df2.to_dict()

where

d = {123: ({'00987772': 3}, 2, 2), 245: ({'00987772': 1}, 1, 1), 564: ({'00982122': 1}, 1, 1)}

d1 = {'00987772': (2,), '00982122': (1,)}
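
For anyone who wants to reproduce the two dictionaries, here is a minimal self-contained sketch. It assumes that BankNum is stored as a string (so the leading zeros survive) and that the grouping column is ID; it iterates over the groups directly instead of using apply, which is a swapped-in but equivalent way to build the same tuples.

```python
import pandas as pd

# Sample data from the table above. BankNum is kept as a string (an
# assumption) so that the leading zeros are preserved.
df = pd.DataFrame({
    "BankNum": ["00987772", "00987772", "00987772", "00987772", "00982122"],
    "FirstName": ["Michael", "Bob", "Michael", "Raven", "Karim"],
    "LastName": ["Brown", "Brown", "Mooney", "Mallik", "Hareche"],
    "ID": [123, 123, 123, 245, 564],
})

# Per-ID summary: BankNum counts, distinct first names, distinct last names.
d = {
    id_: (
        group["BankNum"].value_counts().to_dict(),
        group["FirstName"].nunique(),
        group["LastName"].nunique(),
    )
    for id_, group in df.groupby("ID")
}

# Per-BankNum summary: how many distinct IDs share that bank number.
d1 = {
    bank: (int(n),)
    for bank, n in df.groupby("BankNum")["ID"].nunique().items()
}
```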

Next, I do the following (this is only the relevant code; I have removed the other things I am doing from it):

# NOTE: k and v are defined in code that is omitted from this excerpt.
same_banknum={}
l=[]
w=[]
m = v[2].values()
h2 = sum(i > 6 for i in m)
mod2 = sum(i in [5,6] for i in m)
l2 = sum(i in [3,4] for i in m)
if h2 != 0:
    for k2, v2 in v[2].items():
        if v2 > 6:
            l.append(k2)
            w.append(v2)

    new_l=[]
    for i in l:
        v3 = d1.get(i)
        new_l.append(v3[0])

    h3 = sum(i > 8 for i in new_l)
    m3 = sum(i in [5,6,7,8] for i in new_l)
    l3 = sum(i in [3,4] for i in new_l)
    c=[]
    if h3 != 0:
        for g in new_l:
            if g > 8:
                c.append(g)
        wt = sum(c)
        same_banknum[k]= ("High", wt)
    elif m3 != 0:
        for g in new_l:
            if g in [5,6,7,8]:
                c.append(g)
        wt = sum(c)
        same_banknum[k]= ("Moderate", wt)
    elif l3 != 0:
        for g in new_l:
            if g in [3,4]:
                c.append(g)
        wt = sum(c)
        same_banknum[k]= ("Low", wt)
    else:
        same_banknum[k]= ("Low", 0.0)

elif mod2 != 0:
    for k2, v2 in v[2].items():
        if v2 in [5,6]:
            l.append(k2)
            w.append(v2)

    new_l=[]
    for i in l:
        v3 = d1.get(i) 
        new_l.append(v3[0])

    h3 = sum(i > 8 for i in new_l)
    m3 = sum(i in [5,6,7,8] for i in new_l)
    l3 = sum(i in [3,4] for i in new_l)
    c=[]
    if h3 != 0:
        for g in new_l:
            if g > 8:
                c.append(g)
        wt = sum(c)
        same_banknum[k]= ("High", wt)
    elif m3 != 0:
        for g in new_l:
            if g in [5,6,7,8]:
                c.append(g)
        wt = sum(c)
        same_banknum[k]= ("Moderate", wt)
    elif l3 != 0:
        for g in new_l:
            if g in [3,4]:
                c.append(g)
        wt = sum(c)
        same_banknum[k]= ("Low", wt)
    else:
        same_banknum[k]= ("Low", 0.0)
elif l2 != 0:
    for k2, v2 in v[2].items():
        if v2 in [3,4]:
            l.append(k2)
            w.append(v2)

    new_l=[]
    for i in l:
        v3 = d1.get(i) 
        new_l.append(v3[0])

    h3 = sum(i > 8 for i in new_l)
    m3 = sum(i in [5,6,7,8] for i in new_l)
    l3 = sum(i in [3,4] for i in new_l)
    c=[]
    if h3 != 0:
        for g in new_l:
            if g > 8:
                c.append(g)
        wt = sum(c)
        same_banknum[k]= ("High", wt)
    elif m3 != 0:
        for g in new_l:
            if g in [5,6,7,8]:
                c.append(g)
        wt = sum(c)
        same_banknum[k]= ("Moderate", wt)
    elif l3 != 0:
        for g in new_l:
            if g in [3,4]:
                c.append(g)
        wt = sum(c)
        same_banknum[k]= ("Low", wt)
    else:
        same_banknum[k]= ("Low", 0.0)
else:
    for k2, v2 in v[2].items():
        if v2 in [1,2]:
            l.append(k2)

    new_l=[]
    for i in l:
        v3 = d1.get(i) 
        new_l.append(v3[0])

    h3 = sum(i > 8 for i in new_l)
    m3 = sum(i in [5,6,7,8] for i in new_l)
    l3 = sum(i in [3,4] for i in new_l)
    c=[]
    if h3 != 0:
        for g in new_l:
            if g > 8:
                c.append(g)
        wt = sum(c)
        same_banknum[k]= ("High", wt)
    elif m3 != 0:
        for g in new_l:
            if g in [5,6,7,8]:
                c.append(g)
        wt = sum(c)
        same_banknum[k]= ("Moderate", wt)
    elif l3 != 0:
        for g in new_l:
            if g in [3,4]:
                c.append(g)
        wt = sum(c)
        same_banknum[k]= ("Low", wt)
    else:
        same_banknum[k]= ("Low", 0.0)
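
The four branches above repeat the same second-stage logic, so the calculation can be condensed. The following is only a sketch of the branching shown here, not of the steps that were removed from the excerpt (so it will not reproduce the fractional weights shown below). The names `count_band`, `shared_band`, and `classify` are hypothetical helpers, and the sketch assumes the per-ID counts are positive integers.

```python
def count_band(n):
    """Rank a per-ID BankNum count using the thresholds from the code above."""
    if n > 6:
        return 3
    if n in (5, 6):
        return 2
    if n in (3, 4):
        return 1
    return 0  # counts of 1 or 2 (assumes positive integer counts)


def shared_band(n):
    """Label a 'distinct IDs per BankNum' value, or None if it is below 3."""
    if n > 8:
        return "High"
    if n in (5, 6, 7, 8):
        return "Moderate"
    if n in (3, 4):
        return "Low"
    return None


def classify(bank_counts, ids_per_bank):
    """Return (label, weight) for one ID.

    bank_counts: dict mapping BankNum -> count for this ID.
    ids_per_bank: dict mapping BankNum -> (distinct ID count,), like d1.
    """
    if not bank_counts:
        return ("Low", 0.0)
    # Stage 1: keep only the bank numbers in the highest band present.
    top = max(count_band(n) for n in bank_counts.values())
    selected = [b for b, n in bank_counts.items() if count_band(n) == top]
    # Stage 2: look up how many IDs share each selected bank number.
    shared = [ids_per_bank[b][0] for b in selected]
    # The highest label present wins; the weight is the sum of its values.
    for label in ("High", "Moderate", "Low"):
        hits = [n for n in shared if shared_band(n) == label]
        if hits:
            return (label, sum(hits))
    return ("Low", 0.0)
```

With this in place, the loop body reduces to something like `same_banknum[k] = classify(v[2], d1)`. Unlike `d1.get(i)`, the lookup `ids_per_bank[b]` raises a KeyError when a bank number is missing from the second dictionary.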

to get a dictionary like this:

same_banknum = {123: ('Low', 0.6), 245: ('Low', 0.6), 564: ('Low', 0.0)}

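The same_banknum dictionary is the result of the calculation above: it checks whether the same BankNum is shared by multiple IDs, then assigns each ID a High, Moderate, or Low value together with a weight.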

I can convert it to a dataframe as follows:

df1 = pd.DataFrame.from_dict(same_banknum, orient='index').reset_index()
df1.columns = ['ID','SameBankNum_Val','SameBankNum_Wt']

This gives:

ID   | SameBankNum_Val  | SameBankNum_Wt
123  |  Low             | 0.6
245  | Low              | 0.6
564  | Low              | 0.0

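Instead of performing this calculation again and again for every new dataset that comes in, I want to use machine learning to build a predictive model that predicts SameBankNum_Val and SameBankNum_Wt for new IDs (test data).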

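I can add the SameBankNum_Val and SameBankNum_Wt columns to the training dataframe above. What I want to know is: how do I use multiple columns (BankNum, FirstName, LastName, ID) from Dataframe 1 above as training data, and multiple columns (SameBankNum_Val, SameBankNum_Wt) from Dataframe 2 above as training labels, in a machine learning model?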
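
One possible way to set this up, sketched with scikit-learn under several assumptions: the labels are merged onto the rows by ID, the three text columns are one-hot encoded, and the two targets get separate models (a classifier for SameBankNum_Val, a regressor for SameBankNum_Wt) because one is categorical and the other numeric. The choice of random forests, the exclusion of ID from the features, and the `new_rows` frame are all illustrative, not a recommendation.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Dataframe 1 (features) and Dataframe 2 (labels) from the question.
df = pd.DataFrame({
    "BankNum": ["00987772", "00987772", "00987772", "00987772", "00982122"],
    "FirstName": ["Michael", "Bob", "Michael", "Raven", "Karim"],
    "LastName": ["Brown", "Brown", "Mooney", "Mallik", "Hareche"],
    "ID": [123, 123, 123, 245, 564],
})
labels = pd.DataFrame({
    "ID": [123, 245, 564],
    "SameBankNum_Val": ["Low", "Low", "Low"],
    "SameBankNum_Wt": [0.6, 0.6, 0.0],
})

# Attach the per-ID labels to every row of the training data.
train = df.merge(labels, on="ID", how="left")

# ID is only an identifier here, so it is left out of the features (assumption).
feature_cols = ["BankNum", "FirstName", "LastName"]
X = train[feature_cols]


def make_encoder():
    # One-hot encode the text columns; unseen categories become all zeros.
    return ColumnTransformer(
        [("cat", OneHotEncoder(handle_unknown="ignore"), feature_cols)]
    )


# One model per target: classification for the label, regression for the weight.
clf = make_pipeline(make_encoder(),
                    RandomForestClassifier(n_estimators=50, random_state=0))
clf.fit(X, train["SameBankNum_Val"])

reg = make_pipeline(make_encoder(),
                    RandomForestRegressor(n_estimators=50, random_state=0))
reg.fit(X, train["SameBankNum_Wt"])

# Hypothetical unseen row, used only to show the prediction calls.
new_rows = pd.DataFrame({
    "BankNum": ["00987772"], "FirstName": ["Alice"], "LastName": ["Brown"],
})
pred_val = clf.predict(new_rows)
pred_wt = reg.predict(new_rows)
```

Note that the labels are computed per ID from group-level counts, while these features are per row, so a model trained this way may not be able to recover the rule unless aggregated features (such as the counts in d and d1) are supplied as inputs.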

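Also, would a machine learning model be accurate enough at deciding when to assign High, Low, or Moderate, and with what weight, without my having to run that long calculation again and again? For this question, I guess I will just have to test several models first.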

Please help! Thanks!
