I am trying to follow this example to perform linear discriminant analysis and principal component analysis with scikit-learn, using my own data. My data looks like:
id,mois,prot,fat,ash,sodium,carb,cal,brand
14069,27.82,21.43,44.87,5.11,1.77,0.77,4.93,a
14053,28.49,21.26,43.89,5.34,1.79,1.02,4.84,a
14025,28.35,19.99,45.78,5.08,1.63,0.8,4.95,a
14016,30.55,20.15,43.13,4.79,1.61,1.38,4.74,a
14005,30.49,21.28,41.65,4.82,1.64,1.76,4.67,a
14075,31.14,20.23,42.31,4.92,1.65,1.4,4.67,a
14082,31.21,20.97,41.34,4.71,1.58,1.77,4.63,a
14097,28.76,21.41,41.6,5.28,1.75,2.95,4.72,a
14117,28.22,20.48,45.1,5.02,1.71,1.18,4.93,a
14133,27.72,21.19,45.29,5.16,1.66,0.64,4.95,a
...
`brand` is the target variable.
Following the example linked above, I started with the following code:
# Import libraries
%pylab inline
import pylab as pl
from sklearn import datasets
from sklearn.decomposition import PCA
# sklearn.lda was removed in newer scikit-learn versions;
# LDA now lives in sklearn.discriminant_analysis
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
import pandas as pd

# Set up the data for the example
# raw string so "\m" and "\p" are not treated as escape sequences
pizza_raw = pd.read_csv(r"C:\mypath\pizza.csv")
pizza_target = pizza_raw["brand"]
# select the numeric feature columns: everything between id and brand
# (.ix is deprecated; .iloc is the positional indexer, and the id column
# should not be fed into PCA/LDA as a feature)
pizza_data = pizza_raw.iloc[:, 1:-1]
pizza_names = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"]

# Principal Components
pca = PCA(n_components=2)
X_r = pca.fit(pizza_data).transform(pizza_data)

# Linear Discriminant Analysis
lda = LDA(n_components=2)
X_r2 = lda.fit(pizza_data, pizza_target).transform(pizza_data)

# Percentage of variance explained for each component
print('PCA explained variance ratio (first two components): %s'
      % str(pca.explained_variance_ratio_))
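As a quick sanity check (not part of the tutorial, and using random stand-in data rather than my pizza.csv), I convinced myself that the `fit(...).transform(...)` pattern from the example gives the same result as `fit_transform(...)`:

```python
import numpy as np
from sklearn.decomposition import PCA

# synthetic stand-in data: 20 samples, 5 features
rng = np.random.RandomState(0)
X = rng.rand(20, 5)

pca = PCA(n_components=2)
a = pca.fit(X).transform(X)          # the tutorial's two-step pattern
b = PCA(n_components=2).fit_transform(X)  # the one-step equivalent

# both routes produce the same 2-D projection
assert np.allclose(a, b)
```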
All of the above works as expected (I think). The next step in the example is to plot the data. (The example uses the IRIS dataset...) The example code looks like this:
pl.figure()
for c, i, target_name in zip("rgb", [0, 1, 2], target_names):
    pl.scatter(X_r[y == i, 0], X_r[y == i, 1], c=c, label=target_name)
pl.legend()
pl.title('PCA of IRIS dataset')

pl.figure()
for c, i, target_name in zip("rgb", [0, 1, 2], target_names):
    pl.scatter(X_r2[y == i, 0], X_r2[y == i, 1], c=c, label=target_name)
pl.legend()
pl.title('LDA of IRIS dataset')

pl.show()
So, two questions:
- Is the way I have fitted my data to the tutorial's approach correct so far?
- How do I adjust the example plotting code to produce the same PCA and LDA plots for my data?
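For the second question, here is my attempt at adapting the loop. The example's `y` and `target_names` are integer labels and iris class names, while my `brand` column holds strings, so I compare `y` against the label values themselves. (The DataFrame below is a small synthetic stand-in for my pizza.csv, since I can't share the file; the column names are just illustrative.)

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# synthetic stand-in for pizza.csv: 30 rows, 3 numeric features, 3 brands
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.rand(30, 3), columns=["mois", "prot", "fat"])
df["brand"] = ["a", "b", "c"] * 10

X = df.iloc[:, :-1]
y = df["brand"].values        # string labels instead of the iris 0/1/2
target_names = np.unique(y)   # the distinct brand values

X_r = PCA(n_components=2).fit_transform(X)

plt.figure()
# zip the colors with the label values themselves, and mask rows by
# comparing y against the string label rather than an integer index
for c, target_name in zip("rgb", target_names):
    plt.scatter(X_r[y == target_name, 0], X_r[y == target_name, 1],
                c=c, label=target_name)
plt.legend()
plt.title("PCA of pizza dataset")
```

Is this the right way to translate the example's integer-indexed loop to string labels, or is there a more idiomatic approach?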