python-3.x - Element-wise division by rows between a DataFrame and a Series

Question

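I started using pandas only a few weeks ago. I'm trying to divide rows element-wise, but I can't figure out the right way to do it. Here is my case and my data: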

          date  type    id     ...            1096        1097        1098
0   2014-06-13   cal     1     ...       17.949524   16.247619   15.465079
1   2014-06-13   cow    32     ...        0.523429   -0.854286   -1.520952
2   2014-06-13   cow    47     ...        7.676000    6.521714    5.892381
3   2014-06-13   cow   107     ...        4.161714    3.048571    2.419048
4   2014-06-13   cow   137     ...        3.781143    2.557143    1.931429
5   2014-06-13   cow   255     ...        3.847273    2.509091    1.804329
6   2014-06-13   cow   609     ...        6.097714    4.837714    4.249524
7   2014-06-13   cow   721     ...        3.653143    2.358286    1.633333
8   2014-06-13   cow   817     ...        6.044571    4.934286    4.373333
9   2014-06-13   cow   837     ...        9.649714    8.511429    7.884762
10  2014-06-13   cow   980     ...        1.817143    0.536571   -0.102857
11  2014-06-13   cow  1730     ...        8.512571    7.114286    6.319048
12  2014-06-13  dark     1     ...      168.725714  167.885715  167.600001

my_data.columns
Index(['date', 'type', 'id', '188', '189', '190', '191', '192', '193', '194',
       ...
       '1089', '1090', '1091', '1092', '1093', '1094', '1095', '1096', '1097',
       '1098'],
      dtype='object', length=914)

My goal is to divide every row by the row where "type" == "cal", but only over the columns '188' through '1098' (911 columns).
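The column range in the goal can also be selected by label instead of by position. This is a minimal sketch that assumes a toy frame in which three numeric columns stand in for '188' through '1098' (the toy values and the reduced column set are hypothetical). Label slicing with `.loc` includes both endpoints:

```python
import pandas as pd

# Hypothetical toy frame: columns '188'..'190' stand in for '188'..'1098'
df = pd.DataFrame({
    "date": ["2014-06-13"] * 3,
    "type": ["cal", "cow", "dark"],
    "id": [1, 32, 1],
    "188": [2.0, 1.0, 4.0],
    "189": [4.0, 2.0, 8.0],
    "190": [5.0, 10.0, 0.5],
})

# Label-based column slice: both '188' and '190' are included
numeric = df.loc[:, "188":"190"]
print(list(numeric.columns))  # ['188', '189', '190']
```

On the real data the equivalent would be `my_data.loc[:, '188':'1098']`, which does not depend on the metadata columns staying in positions 0 to 2.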

These are the approaches I tried:

Extracting the row of interest and using it with apply(), divide() and the '/' operator:

>>> cal_r = my_data[my_data["type"]=="cal"].iloc[:,3:]
my_data.apply(lambda x: x.iloc[3:]/cal_r, axis=1)
0       188 189 190 191 192 193 194 195 ...  1091 10...
1          188      189      190    ...           10...
2           188      189      190    ...         109...
3           188      189      190   ...         1096...
4          188      189   190      191   ...        ...
5            188      189      190    ...         10...
6           188      189      190    ...         109...
7          188      189      190    ...         1096...
8          188      189      190    ...         1096...
9          188      189  190    ...         1096    ...
10          188      189      190     ...          1...
11          188      189      190    ...         109...
12         188      189      190      191   ...     ...
dtype: object
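The `dtype: object` result above likely comes from what each call of the lambda returns. Inside `apply(..., axis=1)`, `x.iloc[3:]` is a Series indexed by column labels, while `cal_r` is still a one-row DataFrame. Dividing a Series by a DataFrame aligns the Series index with the DataFrame's columns and yields a one-row DataFrame, so `apply` ends up collecting one DataFrame per row. A minimal sketch on hypothetical toy columns `a` and `b`:

```python
import pandas as pd

# Hypothetical toy data: 'a' and 'b' stand in for the numeric columns
df = pd.DataFrame({"type": ["cal", "cow"], "a": [2.0, 1.0], "b": [4.0, 2.0]})
cal_r = df[df["type"] == "cal"].iloc[:, 1:]  # one-row DataFrame, not a Series

row = df[["a", "b"]].iloc[1]  # a Series indexed by 'a', 'b' (like x.iloc[3:])

# Series / DataFrame: the Series index is matched against the DataFrame columns,
# and the result keeps cal_r's row label 0
out = row / cal_r
print(type(out).__name__)  # DataFrame
print(out)
```

This is also why `my_data.iloc[5,3:]/cal_r` further down prints `[1 rows x 911 columns]`.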

>>> my_data.apply(lambda x: x.iloc[3:].divide(cal_r,axis=1), axis=1)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/frame.py", line 6014, in apply
    return op.get_result()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 142, in get_result
    return self.apply_standard()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 248, in apply_standard
    self.apply_series_generator()
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/apply.py", line 277, in apply_series_generator
    results[i] = self.f(v)
  File "<input>", line 1, in <lambda>
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/ops.py", line 1375, in flex_wrapper
    self._get_axis_number(axis)
  File "/usr/local/lib/python3.5/dist-packages/pandas/core/generic.py", line 375, in _get_axis_number
    .format(axis, type(self)))
ValueError: ("No axis named 1 for object type <class 'pandas.core.series.Series'>", 'occurred at index 0')
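The error message points at the cause: inside `apply(..., axis=1)` each `x` is a Series, and a Series has only one axis (0, also called 'index'). `Series.divide` validates the `axis` argument before doing any arithmetic, so `axis=1` is rejected regardless of what the divisor is. A minimal sketch with hypothetical toy Series:

```python
import pandas as pd

s = pd.Series([1.0, 2.0], index=["a", "b"])
other = pd.Series([2.0, 4.0], index=["a", "b"])

# A Series has no axis 1, so this raises the same ValueError as above
raised = False
try:
    s.divide(other, axis=1)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Without the axis argument the division simply aligns on the index labels
result = s.divide(other)
print(result)
```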

Without apply:

>>> my_data.iloc[:,3:].divide(cal_r)
    188  189  190  191  192  193  ...   1093  1094  1095  1096  1097  1098
0   1.0  1.0  1.0  1.0  1.0  1.0  ...    1.0   1.0   1.0   1.0   1.0   1.0
1   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
2   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
3   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
4   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
5   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
6   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
7   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
8   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
9   NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
10  NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
11  NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN
12  NaN  NaN  NaN  NaN  NaN  NaN  ...    NaN   NaN   NaN   NaN   NaN   NaN

The commands my_data.iloc[:,3:].divide(cal_r, axis=1) and my_data.iloc[:,3:]/cal_r give the same result: only the first row is divided.
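The NaN rows are consistent with pandas index alignment: when both operands are DataFrames, pandas pairs cells by row label and column label. `cal_r` keeps its original row label 0, so only row 0 of the data finds a partner and every other row becomes NaN; the `axis` argument does not change this when the divisor is a DataFrame. A minimal sketch on a hypothetical toy frame:

```python
import pandas as pd

# Hypothetical toy frame standing in for my_data.iloc[:, 3:]
data = pd.DataFrame({"a": [2.0, 1.0, 4.0], "b": [4.0, 2.0, 8.0]})
cal_r = data.iloc[[0]]  # one-row DataFrame that keeps row label 0

# DataFrame / DataFrame aligns on row labels AND column labels;
# rows 1 and 2 have no counterpart in cal_r, so they become NaN
aligned = data / cal_r
print(aligned)
```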

If I select just one row, it works fine:

my_data.iloc[5,3:]/cal_r
       188      189      190    ...         1096      1097      1098
0  48.8182  48.8274  22.4476    ...     0.214338  0.154428  0.116671

[1 rows x 911 columns]

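Am I missing something basic? I suspect I will need to replicate cal_r so that it has the same number of rows as the whole data.

Any hints or guidance are greatly appreciated.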
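The replication idea from the question does work, as long as the copies carry the same row labels as the data so that alignment pairs every row with the reference row. This is a minimal sketch on a hypothetical toy frame using `numpy.repeat`; broadcasting a bare NumPy array achieves the same result without materialising the copy:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for my_data.iloc[:, 3:]
data = pd.DataFrame({"a": [2.0, 1.0, 4.0], "b": [4.0, 2.0, 8.0]})
cal_r = data.iloc[[0]]  # the reference row as a one-row DataFrame

# Repeat the reference row once per data row and reuse data's labels,
# so label alignment now pairs every row with a copy of cal_r
replicated = pd.DataFrame(
    np.repeat(cal_r.to_numpy(), len(data), axis=0),
    index=data.index,
    columns=data.columns,
)
ratio = data / replicated
print(ratio)
```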

Related: Divide the elements of a pandas DataFrame by its row maximum

score 1 · Accepted Answer

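I believe you need to convert the one-row DataFrame cal_r to a NumPy array, so that the division broadcasts that single row over all rows instead of aligning on the index: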

cal_r = my_data.iloc[(my_data["type"]=="cal").values, 3:]
print (cal_r)
        1096       1097       1098
0  17.949524  16.247619  15.465079

my_data.iloc[:, 3:] /= cal_r.values
print (my_data)
          date  type    id      1096       1097       1098
0   2014-06-13   cal     1  1.000000   1.000000   1.000000
1   2014-06-13   cow    32  0.029161  -0.052579  -0.098348
2   2014-06-13   cow    47  0.427644   0.401395   0.381012
3   2014-06-13   cow   107  0.231857   0.187632   0.156420
4   2014-06-13   cow   137  0.210654   0.157386   0.124890
5   2014-06-13   cow   255  0.214338   0.154428   0.116671
6   2014-06-13   cow   609  0.339715   0.297749   0.274782
7   2014-06-13   cow   721  0.203523   0.145147   0.105614
8   2014-06-14   cow   817  0.336754   0.303693   0.282788
9   2014-06-14   cow   837  0.537603   0.523857   0.509843
10  2014-06-14   cow   980  0.101236   0.033025  -0.006651
11  2014-06-14   cow  1730  0.474251   0.437866   0.408601
12  2014-06-14  dark     1  9.400010  10.332943  10.837319
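The `.values` step matters because a NumPy array carries no labels: pandas then skips index alignment and broadcasts the `(1, n)` array across all rows of the `(m, n)` block. The mask also gets `.values`, apparently because `iloc` does not accept a boolean Series as a mask. A minimal sketch on a hypothetical toy frame, where the numeric block starts at position 2 instead of 3:

```python
import pandas as pd

# Hypothetical toy frame: two metadata columns plus two numeric columns
df = pd.DataFrame({
    "type": ["cal", "cow", "dark"],
    "id": [1, 32, 1],
    "a": [2.0, 1.0, 4.0],
    "b": [4.0, 2.0, 8.0],
})

# Boolean mask converted to a NumPy array so that iloc accepts it
cal_r = df.iloc[(df["type"] == "cal").values, 2:]
print(cal_r.values.shape)  # (1, 2)

# Dividing by an unlabeled array: the single row is broadcast to every row
df.iloc[:, 2:] = df.iloc[:, 2:] / cal_r.values
print(df)
```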

Or convert the one-row DataFrame to a Series with DataFrame.squeeze, or select its first row by position as a Series:

my_data.iloc[:, 3:] = my_data.iloc[:, 3:].div(cal_r.squeeze())
#alternative
#my_data.iloc[:, 3:] = my_data.iloc[:, 3:].div(cal_r.iloc[0])
print (my_data)
          date  type    id      1096       1097       1098
0   2014-06-13   cal     1  1.000000   1.000000   1.000000
1   2014-06-13   cow    32  0.029161  -0.052579  -0.098348
2   2014-06-13   cow    47  0.427644   0.401395   0.381012
3   2014-06-13   cow   107  0.231857   0.187632   0.156420
4   2014-06-13   cow   137  0.210654   0.157386   0.124890
5   2014-06-13   cow   255  0.214338   0.154428   0.116671
6   2014-06-13   cow   609  0.339715   0.297749   0.274782
7   2014-06-13   cow   721  0.203523   0.145147   0.105614
8   2014-06-14   cow   817  0.336754   0.303693   0.282788
9   2014-06-14   cow   837  0.537603   0.523857   0.509843
10  2014-06-14   cow   980  0.101236   0.033025  -0.006651
11  2014-06-14   cow  1730  0.474251   0.437866   0.408601
12  2014-06-14  dark     1  9.400010  10.332943  10.837319
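Dividing by a Series avoids the NaN problem because `DataFrame.div` matches the Series index against the columns (`axis='columns'` is the default), so every row is divided by the same reference values. A minimal sketch on a hypothetical toy frame; note that `squeeze()` returns a Series only when exactly one "cal" row matches, whereas `iloc[0]` always takes the first match:

```python
import pandas as pd

# Hypothetical toy frame with one metadata column and two numeric columns
df = pd.DataFrame({
    "type": ["cal", "cow", "dark"],
    "a": [2.0, 1.0, 4.0],
    "b": [4.0, 2.0, 8.0],
})

cal_r = df.loc[df["type"] == "cal", ["a", "b"]]  # one-row DataFrame
ref = cal_r.squeeze()  # Series indexed by 'a', 'b'; same as cal_r.iloc[0] here

# The Series index is aligned with the columns, so every row is divided by ref
ratio = df[["a", "b"]].div(ref, axis="columns")
print(ratio)
```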
