
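I am getting familiar with Python's FactorAnalyzer for factor analysis.

I need help printing factor labels together with the factor coefficients and significance parameters.

I use loadings_, but its output is very messy.

Data source (I downloaded it to my laptop): https://vincentarelbundock.github.io/Rdatasets/datasets.html (the BFI dataset, based on a personality assessment).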

import pandas as pd
from sklearn.datasets import load_iris
from factor_analyzer import FactorAnalyzer
import matplotlib.pyplot as plt


df = pd.read_csv("../Downloads/bfi.csv")
df.head()

# Dropping unnecessary columns
df.drop(['gender', 'education', 'age'],axis=1,inplace=True)

# Dropping missing values rows
df.dropna(inplace=True)

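Based on the eigenvalues, only 6 factors are significant (eigenvalue > 1.0). Because I can't get readable output, I can't tell what these factors are.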
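For reference, the eigenvalue > 1.0 rule can be sketched with plain NumPy on the correlation matrix. This is only a minimal illustration: `demo_df` is synthetic stand-in data (two hypothetical latent traits, six made-up columns), not the BFI file. As far as I know, `fa.get_eigenvalues()` on a fitted FactorAnalyzer reports comparable eigenvalues.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the cleaned DataFrame (hypothetical data, not BFI):
# two independent latent traits, each driving three noisy observed columns.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
noise = 0.3 * rng.normal(size=(500, 6))
data = np.repeat(latent, 3, axis=1) + noise
demo_df = pd.DataFrame(data, columns=["A1", "A2", "A3", "C1", "C2", "C3"])

# Eigenvalues of the correlation matrix, sorted from largest to smallest.
eigenvalues = np.linalg.eigvalsh(demo_df.corr().to_numpy())[::-1]

# Count the factors whose eigenvalue exceeds 1.0.
n_factors = int((eigenvalues > 1.0).sum())

print(eigenvalues.round(3))
print("factors with eigenvalue > 1:", n_factors)
```

The eigenvalues of a correlation matrix sum to the number of variables, so a value above 1.0 means that factor accounts for more variance than a single observed variable does.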

fa = FactorAnalyzer( method='minres', n_factors=6, rotation="varimax")

fa.fit(df)

print(fa.loadings_)

Output:

array([[-0.02290301, -0.03247244,  0.03316871, -0.03809335,  0.00379506,  0.10374847],
       [ 0.09939617,  0.06047379,  0.02669442, -0.53078469, -0.12030937,  0.16363839],
       [ 0.03176731,  0.259875  ,  0.1402256 ,  0.64656946,  0.05577021, -0.09704963],
       [-0.00525556,  0.40884857,  0.10953353,  0.5870038 ,  0.01618433,  0.03914857],
       [-0.07926603,  0.25534237,  0.22930809,  0.39176034, -0.13629257,  0.03340065],
       [-0.14364476,  0.4910488 ,  0.0856494 ,  0.45108989,  0.00911123,  0.10588827],
       [ 0.00562295,  0.12364715,  0.54015018,  0.00422137,  0.18345833,  0.13879815],
       [ 0.08435816,  0.10650466,  0.65249593,  0.05653766,  0.0792028 ,  0.20858043],
       [-0.03394649,  0.0497959 ,  0.54587749,  0.10028627, -0.0123717 ,  0.05447959],
       [ 0.23161662,  0.0089893 , -0.67278538, -0.08998026, -0.15345088,  0.226977  ],
       [ 0.29340234, -0.1436436 , -0.55970426, -0.04706994,  0.0256143 ,  0.09577898],
       [ 0.05310218, -0.52147723,  0.02649196, -0.09054497, -0.05928098,  0.33201867],
       [ 0.26318891, -0.62292324, -0.11075758, -0.07455019, -0.03044005,  0.29120361],
       [ 0.00119   ,  0.63056485,  0.07741736,  0.15388275,  0.21421252,  0.09215221],
       [-0.14723885,  0.68281775,  0.10390412,  0.2065131 , -0.13327166, -0.03773659],
       [ 0.02197833,  0.50438366,  0.31238313,  0.04844782,  0.18521834, -0.11350852],
       [ 0.79096653,  0.033469  , -0.04001445, -0.19151604, -0.07737848, -0.16815916],
       [ 0.77708495, -0.01765921, -0.02173671, -0.15558624,  0.00764293, -0.19939099],
       [ 0.72818732, -0.03614561, -0.0674602 , -0.02313414, -0.01532483,  0.02192578],
       [ 0.59778566, -0.2770728 , -0.1837043 ,  0.01861508,  0.06451108,  0.18288879],
       [ 0.53479082, -0.11293748, -0.04097176,  0.09644977, -0.1645811 ,  0.11185692],
       [-0.00891931,  0.3023172 ,  0.3023176 ,  300134206 ,  0.46434464,  0.16741622],
       [ 0.16146455,  0.02029611, -0.10051682,  0.04691938, -0.50064301,  0.08416413],
       [ 0.0196248 ,  0.40211954,  0.07042896,  0.06363394,  0.54784203,  0.12081641],
       [ 0.22872114, -0.0926477 , -0.03000306,  0.14801512,  0.34628284,  0.20228616],
       [ 0.06801995,  0.00091956, -0.06223948, -0.05313796, -0.57993276,  0.10662123]])

Here is my problem. I can't figure out a way to make the output usable, like this:

        Factor 1   Factor 2   Factor 3   Factor 4   Factor 5   Factor 6

A1      value 1    value 2    value 3    value 4    value 5    value 6

A1 is the first variable. There are 26+ variables in total (before dropping columns), and 2700 records.
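One way to get a layout like that might be to wrap the loadings array in a pandas DataFrame. The sketch below is not run against the real model: `loadings` is a small stand-in for `fa.loadings_` (three rows taken from the output above and rounded), and the row labels in `variables` are hypothetical. In practice `df.columns` would supply the row labels, since the rows of `loadings_` follow the column order of the DataFrame passed to `fit`. The factor names in `factor_labels` can be any labels you choose.

```python
import numpy as np
import pandas as pd

# Stand-in for fa.loadings_ (three rows from the output above, rounded).
loadings = np.array([
    [-0.0229, -0.0325, 0.0332, -0.0381, 0.0038, 0.1037],
    [0.0994, 0.0605, 0.0267, -0.5308, -0.1203, 0.1636],
    [0.0318, 0.2599, 0.1402, 0.6466, 0.0558, -0.0970],
])

# Hypothetical row labels; with the real data this would be list(df.columns).
variables = ["row_id", "A1", "A2"]

# Column labels of your own choosing, one per factor.
factor_labels = [f"Factor {i}" for i in range(1, loadings.shape[1] + 1)]

loadings_df = pd.DataFrame(loadings, index=variables, columns=factor_labels)
print(loadings_df.round(2).to_string())

# Optional: blank out weak loadings so the dominant factor per variable stands out.
# The 0.3 cutoff is an arbitrary illustration, not a rule from the library.
strong = loadings_df.where(loadings_df.abs() >= 0.3)
print(strong.round(2).fillna("").to_string())
```

With the real model, the same construction would read `pd.DataFrame(fa.loadings_, index=df.columns, columns=factor_labels)`.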

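  1. How can I print the output of loadings_ in a usable way?

  2. How can I print the factors (6 in my case) using only labels that I choose?

  3. I have the same output problem with fa.get_factor_variance():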

    fa.get_factor_variance()

(array([2.76721162, 2.72814014, 2.07554605, 1.6108362 , 1.46335442, 0.62155903]),
 array([0.10643122, 0.10492847, 0.07982869, 0.06195524, 0.05628286, 0.02390612]),
 array([0.10643122, 0.21135968, 0.29118838, 0.35314362, 0.40942648, 0.43333259]))
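The same DataFrame idea might work for this tuple too. As I understand it, `get_factor_variance()` returns three arrays in the order variance, proportional variance, cumulative variance. The sketch below hard-codes the values printed above instead of calling the fitted model, and the row names are my own labels, not something the library provides.

```python
import pandas as pd

# Values copied from the get_factor_variance() output above;
# with a fitted model this would be: fa.get_factor_variance()
variance, proportional, cumulative = (
    [2.76721162, 2.72814014, 2.07554605, 1.6108362, 1.46335442, 0.62155903],
    [0.10643122, 0.10492847, 0.07982869, 0.06195524, 0.05628286, 0.02390612],
    [0.10643122, 0.21135968, 0.29118838, 0.35314362, 0.40942648, 0.43333259],
)

factor_labels = [f"Factor {i}" for i in range(1, len(variance) + 1)]

# One row per statistic, one column per factor (row names are illustrative).
variance_df = pd.DataFrame(
    [variance, proportional, cumulative],
    index=["Variance", "Proportion of variance", "Cumulative proportion"],
    columns=factor_labels,
)
print(variance_df.round(3).to_string())
```

Read this way, the last cumulative value says the six factors together account for roughly 43% of the total variance.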

Thanks for your help!
