I have reduced a more complex problem to a simpler one. The real problem has a longer list and more columns.

Starting from this df:

i | COL1      | COL2        | COL3      | COL4      | Revenue    | QTY | Products
0 | Coin      | Gold Krug   | Gold Coin | Coins     | 2333677473 | 21  | 12
1 | Gold Coin | Coins       | Gold Coin | Coins     | 2564774784 | 28  | 14
2 | Gold Coin | Coins       | Gold Krug | Coins     | 3256666647 | 35  | 16
3 | Gold Coin | Coins       | Coins     | Gold Krug | 3456788    | 42  | 18
4 | Gold Krug | Gold Coin   | Coins     | Coins     | 4588960    | 49  | 20
5 | Gold Coin | Coins       | Gold Krug | Coins     | 346869909  | 56  | 22
6 | Gold Coin | Coins       | Gold Coin | Coins     | 3777989    | 63  | 24
7 | Gold Coin | Silver Krug | Gold Coin | Coins     | 37687589   | 70  | 26
8 | Gold Coin | Coins       | Gold Coin | Coins     | 45789889   | 77  | 28
9 | Gold Coin | Gold Krug   | Gold Coin | Coins     | 468        | 84  | 30

I want the output to be a DataFrame with a new column, like this:

i | Category    | Revenue    | QTY | Products
0 | Gold Krug   | 2333677473 | 21  | 12
2 | Gold Krug   | 3256666647 | 35  | 16
3 | Gold Krug   | 3456788    | 42  | 18
4 | Gold Krug   | 4588960    | 49  | 20
5 | Gold Krug   | 346869909  | 56  | 22
7 | Silver Krug | 37687589   | 70  | 26
9 | Gold Krug   | 468        | 84  | 30

I used the code below to filter the rows, but I don't understand how to create the new column from the list values that matched:

KRUG = ['Gold Krug', 'Silver Krug', 'Gold Maple','Gold Eagle']

df = df[df[['COL1', 'COL2', 'COL3', 'COL4']].isin(KRUG).any(axis=1)]

print(df)

Output:

i | COL1      | COL2        | COL3      | COL4      | Revenue    | QTY | Products
0 | Coin      | Gold Krug   | Gold Coin | Coins     | 2333677473 | 21  | 12
2 | Gold Coin | Coins       | Gold Krug | Coins     | 3256666647 | 35  | 16
3 | Gold Coin | Coins       | Coins     | Gold Krug | 3456788    | 42  | 18
4 | Gold Krug | Gold Coin   | Coins     | Coins     | 4588960    | 49  | 20
5 | Gold Coin | Coins       | Gold Krug | Coins     | 346869909  | 56  | 22
7 | Gold Coin | Silver Krug | Gold Coin | Coins     | 37687589   | 70  | 26
9 | Gold Coin | Gold Krug   | Gold Coin | Coins     | 468        | 84  | 30

2 Answers

Split the work into two parts, the matched category and the remaining columns, then concatenate them:

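import pandas as pd

# Join the COL* values of each row into one string, then extract the first
# KRUG entry that appears in it. Rows without a match give NaN and are dropped.
category = (df.filter(like='COL')
              .agg(','.join, axis=1)
              .str.extract(fr"({'|'.join(KRUG)})")
              .dropna()
              .set_axis(['category'], axis='columns')
            )

# Keep the remaining columns for the rows that contain a KRUG value.
others = df.loc[df.filter(like='COL').isin(KRUG).any(axis=1),
                ['Revenue', 'QTY', 'Products']]

pd.concat([category, others], axis='columns')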

      category     Revenue  QTY  Products
0    Gold Krug  2333677473   21        12
2    Gold Krug  3256666647   35        16
3    Gold Krug     3456788   42        18
4    Gold Krug     4588960   49        20
5    Gold Krug   346869909   56        22
7  Silver Krug    37687589   70        26
9    Gold Krug         468   84        30
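
The regex above matches substrings of the joined row, so a cell such as a hypothetical 'Gold Krug Proof' would also count as 'Gold Krug'. If exact cell matches are wanted, one possible variant (a minimal sketch, not the only way) masks the non-matching cells and takes the first surviving value per row. The small df below is an illustrative subset of the question's rows, and the names cols and result are introduced here only for the example.

```python
import pandas as pd

KRUG = ['Gold Krug', 'Silver Krug', 'Gold Maple', 'Gold Eagle']

# Illustrative subset of the question's rows (original index labels kept).
df = pd.DataFrame(
    {
        'COL1': ['Coin', 'Gold Coin', 'Gold Krug', 'Gold Coin'],
        'COL2': ['Gold Krug', 'Coins', 'Gold Coin', 'Silver Krug'],
        'COL3': ['Gold Coin', 'Gold Coin', 'Coins', 'Gold Coin'],
        'COL4': ['Coins', 'Coins', 'Coins', 'Coins'],
        'Revenue': [2333677473, 2564774784, 4588960, 37687589],
        'QTY': [21, 28, 49, 70],
        'Products': [12, 14, 20, 26],
    },
    index=[0, 1, 4, 7],
)

cols = df.filter(like='COL')

# Blank out every cell that is not exactly a KRUG value, reshape to long
# format, drop the blanks, and keep the first remaining value per row.
category = (cols.where(cols.isin(KRUG))
                .stack()
                .dropna()
                .groupby(level=0)
                .first()
                .rename('category'))

# The inner join drops rows that had no match (index 1 in this subset).
result = pd.concat([category, df[['Revenue', 'QTY', 'Products']]],
                   axis='columns', join='inner')
print(result)
```

If a row contains several KRUG values, this sketch keeps the one in the leftmost column.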

Answered 2021-11-14T05:25:48.650

Here is an approach using apply(), although there is probably a simpler way using .str. If the data set is not too large, this should be fine.

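import numpy as np

def get_coin(x):
    # Return the first KRUG entry present in the row, or NaN if there is none.
    row = x.tolist()
    for k in KRUG:
        if k in row:
            return k
    return np.nan

df['category'] = df[['COL1', 'COL2', 'COL3', 'COL4']].apply(get_coin, axis=1)
df.drop(['COL1', 'COL2', 'COL3', 'COL4'], axis=1, inplace=True)
# Drop only the rows where no category was found.
df.dropna(subset=['category'], inplace=True)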

   i     Revenue  QTY  Products     category
0  0  2333677473   21        12    Gold Krug
2  2  3256666647   35        16    Gold Krug
3  3     3456788   42        18    Gold Krug
4  4     4588960   49        20    Gold Krug
5  5   346869909   56        22    Gold Krug
7  7    37687589   70        26  Silver Krug
9  9         468   84        30    Gold Krug
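
For larger frames, where a Python function call per row may become slow, the same idea can be sketched with NumPy boolean masks. This is only an illustration under assumptions: the small df is a subset of the question's rows, and COLS, mask, hit, first and out are names made up for the example. Unlike get_coin, which returns the first KRUG entry found, this picks the value in the leftmost matching column; the two agree when each row has at most one match. It also leaves df unmodified.

```python
import numpy as np
import pandas as pd

KRUG = ['Gold Krug', 'Silver Krug', 'Gold Maple', 'Gold Eagle']
COLS = ['COL1', 'COL2', 'COL3', 'COL4']

# Illustrative subset of the question's rows (original index labels kept).
df = pd.DataFrame(
    {
        'COL1': ['Gold Coin', 'Gold Coin', 'Gold Coin', 'Gold Coin'],
        'COL2': ['Coins', 'Coins', 'Coins', 'Gold Krug'],
        'COL3': ['Gold Krug', 'Coins', 'Gold Coin', 'Gold Coin'],
        'COL4': ['Coins', 'Gold Krug', 'Coins', 'Coins'],
        'Revenue': [3256666647, 3456788, 3777989, 468],
        'QTY': [35, 42, 63, 84],
        'Products': [16, 18, 24, 30],
    },
    index=[2, 3, 6, 9],
)

values = df[COLS].to_numpy()
mask = df[COLS].isin(KRUG).to_numpy()   # True where a cell is a KRUG value

hit = mask.any(axis=1)                  # rows with at least one match
first = mask.argmax(axis=1)             # position of the first True per row
picked = values[np.arange(len(df)), first]

# Keep only matching rows and attach the matched value as a new column.
out = (df.loc[hit, ['Revenue', 'QTY', 'Products']]
         .assign(category=picked[hit]))
print(out)
```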
Answered 2021-11-14T03:30:07.230