python - How to compute the correlation of independent variables (x) with multiple dependent variables (y1 and y2)?

Question

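Suppose we have 10 independent variables x1, x2, x3, ..., xn, all categorical with the same levels 0, 1, 2 (for example, 0 = no color, 1 = red, 2 = green), and two dependent (response) variables (for example, y1 = pant length in m, y2 = waist size in m). How can we determine which independent variables (x1, x2, x3, ..., xn) drive the dependent variables (y1 and y2)?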

A sample of the data looks like this:

| x1 | x2 | x3 | x4 | x5 | x6 | x7  | x8 | x9 | x10 | size(y1) | length(y2) |
|----|----|----|----|----|----|-----|----|----|-----|----------|------------|
|  0 |  1 |  2 |  1 |  0 |  0 |   2 |  1 |  0 |   2 |     0.36 |       0.84 |
|  0 |  1 |  1 |  0 |  2 |  1 |   0 |  2 |  0 |   1 |     0.84 |       1.23 |
|  1 |  2 |  0 |  1 |  0 |  1 |   0 |  1 |  0 |   2 |     1.92 |       3.86 |

I tried PLS regression in Python; here is my code:

import numpy as np
import pandas as pd
from sklearn.cross_decomposition import PLSRegression

df = pd.read_csv('data.csv', header=0)

# Predictors: every column except the two response columns
X = df[[c for c in df.columns if c not in ['waist_size', 'pant_length']]].to_numpy()
# Responses: the two continuous targets
Y = df[['waist_size', 'pant_length']].to_numpy()

pls = PLSRegression(n_components=8)
pls.fit(X, Y)
coef = pls.coef_
# argsort works along the last axis by default
sorted_index = np.argsort(np.abs(pls.coef_))

The actual result of this approach is as follows: I get a numpy array with one entry for every row in the dataset, like this:

[1, 0],
[1, 0],
[1, 0],
[1, 0],
[1, 0],
[0, 1],
[1, 0]
.....

How should I interpret this?
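One possible reading, offered as a minimal sketch rather than a confirmed diagnosis: `np.argsort` sorts along the last axis by default, so if `pls.coef_` has one row per feature and one column per target (the layout depends on the scikit-learn version, which is an assumption here), every output row is just a permutation of [0, 1] telling which of the two targets has the smaller absolute coefficient for that feature. The small matrix below is made up for illustration and is not taken from the data above.

```python
import numpy as np

# Hypothetical coefficient matrix: 3 features (rows) x 2 targets (columns).
coef = np.array([
    [0.10, -0.50],
    [-0.20, 0.90],
    [0.70, 0.30],
])

# Default axis=-1: each row is sorted on its own, so every row is a
# permutation of [0, 1] (which target has the smaller |coefficient|).
per_row = np.argsort(np.abs(coef))

# axis=0 ranks the features within each target column instead.
# [::-1] reverses the row order so the largest |coefficient| comes first.
per_target = np.argsort(np.abs(coef), axis=0)[::-1]

print(per_row)
print(per_target)
```

Under that assumption, ranking along `axis=0` is closer to a per-target feature ranking; column j of `per_target` lists feature indices for target j from largest to smallest absolute coefficient.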

Also, is there a way to compute direct correlation and feature importance for this kind of problem?
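As one hedged illustration of what a "direct" association measure could look like here: for a categorical predictor and a continuous response, the correlation ratio (eta squared) gives the share of the response variance explained by the category means. The sketch below is not part of the original attempt; the helper `correlation_ratio`, the synthetic frame `demo`, and its column names are all hypothetical stand-ins for the real data, and the measure looks at one predictor at a time, so it ignores interactions between predictors.

```python
import numpy as np
import pandas as pd

def correlation_ratio(categories: pd.Series, values: pd.Series) -> float:
    """Eta squared: share of the variance of `values` explained by group means."""
    grand_mean = values.mean()
    groups = values.groupby(categories)
    # Between-group sum of squares, weighted by group size
    between = (groups.size() * (groups.mean() - grand_mean) ** 2).sum()
    total = ((values - grand_mean) ** 2).sum()
    return float(between / total) if total > 0 else 0.0

# Synthetic stand-in data (not the real file): x1 drives y1, x2 is noise.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "x1": rng.integers(0, 3, size=300),
    "x2": rng.integers(0, 3, size=300),
})
demo["y1"] = 0.5 * demo["x1"] + rng.normal(0.0, 0.1, size=300)
demo["y2"] = rng.normal(1.0, 0.1, size=300)

features = ["x1", "x2"]
targets = ["y1", "y2"]
# Rows are features, columns are targets
eta2 = pd.DataFrame(
    {t: {f: correlation_ratio(demo[f], demo[t]) for f in features} for t in targets}
)
print(eta2)
```

Values near 1 suggest the predictor's levels separate the response well, and values near 0 suggest little association. Treating the codes 0, 1, 2 as categories rather than as numbers avoids assuming that "green" is twice "red".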

