I am new to Python and found this excellent PCA biplot suggestion (Plot PCA loadings and loading in biplot in sklearn (like R's autoplot)). I tried to add a legend to the plot for the different targets, but calling plt.legend() doesn't work.

Is there an easy way to do it? As an example, here is the iris data with the biplot code from the link above.

import numpy as np
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.decomposition import PCA
import pandas as pd
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data
y = iris.target
# In general, it is a good idea to scale the data
scaler = StandardScaler()
scaler.fit(X)
X = scaler.transform(X)

pca = PCA()
x_new = pca.fit_transform(X)

def myplot(score,coeff,labels=None):
    xs = score[:,0]
    ys = score[:,1]
    n = coeff.shape[0]
    scalex = 1.0/(xs.max() - xs.min())
    scaley = 1.0/(ys.max() - ys.min())
    plt.scatter(xs * scalex,ys * scaley, c = y)
    for i in range(n):
        plt.arrow(0, 0, coeff[i,0], coeff[i,1],color = 'r',alpha = 0.5)
        if labels is None:
            plt.text(coeff[i,0]* 1.15, coeff[i,1] * 1.15, "Var"+str(i+1), color = 'g', ha = 'center', va = 'center')
        else:
            plt.text(coeff[i,0]* 1.15, coeff[i,1] * 1.15, labels[i], color = 'g', ha = 'center', va = 'center')
    plt.xlim(-1,1)
    plt.ylim(-1,1)
    plt.xlabel("PC{}".format(1))
    plt.ylabel("PC{}".format(2))
    plt.grid()

# Call the function, using only the first 2 PCs.
myplot(x_new[:,0:2],np.transpose(pca.components_[0:2, :]))
plt.show()

Any suggestions for PCA biplots are welcome, including different code if adding the legend is easier another way!

2 Answers

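I recently proposed an easy way to add a legend to a scatter plot; see the GitHub PR. It is still under discussion.

In the meantime, you need to build the legend manually from the unique labels in y. For each of them, create a Line2D object with the same marker as used in the scatter plot, and pass them as arguments to plt.legend.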

scatter = plt.scatter(xs * scalex,ys * scaley, c = y)
labels = np.unique(y)
handles = [plt.Line2D([],[],marker="o", ls="", 
                      color=scatter.cmap(scatter.norm(yi))) for yi in labels]
plt.legend(handles, labels)
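
As a possible alternative to building the handles by hand, here is a minimal sketch that assumes matplotlib 3.1 or newer, where the object returned by scatter has a legend_elements() method. The random scores, the y array and the class names are stand-ins for the PCA output in the question, not real results, and the "Agg" backend is only selected so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption); remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

# Stand-in data: 150 random 2-D "scores" and three integer classes,
# mimicking the shape of the first two iris PCs (not real PCA output).
rng = np.random.default_rng(0)
scores = rng.normal(size=(150, 2))
y = np.repeat([0, 1, 2], 50)
names = ["setosa", "versicolor", "virginica"]  # labels matching iris targets 0, 1, 2

fig, ax = plt.subplots()
scatter = ax.scatter(scores[:, 0], scores[:, 1], c=y)

# legend_elements() returns one handle per unique value in c (sorted),
# plus auto-generated text labels that are replaced by names here.
handles, _ = scatter.legend_elements()
legend = ax.legend(handles, names, title="Target")
```

Inside myplot this would amount to keeping the return value of plt.scatter and calling plt.legend with these handles after the arrows are drawn.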

answered 2018-06-02T10:19:04.027

Try the "pca" library. It plots the explained variance and creates a biplot.

pip install pca

from pca import pca

# Initialize to reduce the data to the number of components that explain 95% of the variance.
model = pca(n_components=0.95)

# Or reduce the data to 2 PCs
model = pca(n_components=2)

# Load example dataset
import pandas as pd
from sklearn.datasets import load_iris
iris = load_iris()
X = pd.DataFrame(data=iris.data, columns=iris.feature_names, index=iris.target)

# Fit transform
results = model.fit_transform(X)

# Plot explained variance
fig, ax = model.plot()

[Image: explained variance per PC]

# Scatter first 2 PCs
fig, ax = model.scatter()

# Make biplot with the number of features
fig, ax = model.biplot(n_feat=4)

[Image: PCA biplot]

The result is a dictionary containing many statistics for the PCs, loadings, etc.

answered 2020-06-19T19:25:10.487