
I have tried feature selection using principal component analysis (PCA), which gave me the 4 best features out of a set of nine (Mean of Green, Variance of Green, Std. dev. of Green, Mean of Red, Variance of Red, Std. dev. of Red, Mean of Hue, Variance of Hue, Std. dev. of Hue, i.e. [MGcorr, VarGcorr, stdGcorr, MRcorr, VarRcorr, stdRcorr, MHcorr, VarHcorr, stdHcorr]) for classifying the data into two clusters. From the literature, PCA does not seem to be a very good approach; applying kernel PCA (KPCA) for feature selection is said to work better. I want to apply KPCA for feature selection, and I have tried the following:

d = 4;                                    % number of features to select (reduced dimension)
[Y2, eigVector, para] = kPCA(feature, d); % feature is a 300x9 matrix: 300 observations,
                                          % 9 features
                                          % Y2: dimensionality-reduced data

The kPCA.m function above can be downloaded from: http://www.mathworks.com/matlabcentral/fileexchange/39715-kernel-pca-and-pre-image-reconstruction/content/kPCA_v1.0/code/kPCA.m

In the implementation above, I want to know how to find which 4 of the 9 features to select (i.e., which original features are the best) for clustering.

Alternatively, I also tried the following function for a KPCA implementation:

options.KernelType = 'Gaussian';
options.t = 1;
options.ReducedDim = 4;
[eigvector, eigvalue] = KPCA(feature', options);

With this implementation I face the same problem of identifying the top/best 4 features out of the set of 9.

The KPCA.m function above can be downloaded from: http://www.cad.zju.edu.cn/home/dengcai/Data/code/KPCA.m

It would be great if someone could help me implement kernel PCA for my problem.

Thanks


1 Answer


PCA doesn't provide optimal features per se. What it provides is a new set of features that are uncorrelated. When you select the "best" 4 features, you are picking the ones that have the greatest variance (largest eigenvalues). So for "normal" PCA, you simply select the 4 eigenvectors corresponding to the 4 largest eigenvalues, then you project the original 9 features onto those eigenvectors via matrix multiplication.
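
As a minimal sketch of that linear-PCA route (assuming feature is the 300x9 matrix from the question; the variable names here are illustrative):

Xc = bsxfun(@minus, feature, mean(feature, 1)); % center each column
C = cov(Xc);                                    % 9x9 covariance matrix
[V, D] = eig(C);                                % eigenvectors and eigenvalues
[~, idx] = sort(diag(D), 'descend');            % order by variance explained
W = V(:, idx(1:4));                             % top 4 eigenvectors (9x4)
Y = Xc * W;                                     % projected data (300x4)

Note that each column of Y mixes all 9 original features; PCA transforms the feature space rather than selecting a subset of the original features.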

From the link you provided for the kernel PCA function, the return value Y2 appears to be the original data transformed to the top d features of the kernel-PCA space, so the transformation is already done for you.
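
In other words (a hypothetical follow-up, assuming the FileExchange signature shown in the question), you could pass Y2 straight to a clustering routine such as kmeans:

[Y2, eigVector, para] = kPCA(feature, 4); % Y2 is 300x4, already in kernel-PCA space
labels = kmeans(Y2, 2);                   % two clusters, as in the question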

answered 2013-01-25T17:57:08.210