
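I am trying to draw decision boundary lines with LDA in sklearn for the Iris dataset in Python, following this documentation.

For 2-D data, we can easily draw the lines using LDA.coef_ and LDA.intercept_.

But for multi-dimensional data that has been reduced to two components, LDA.coef_ and LDA.intercept_ have many dimensions, and I don't know how to use them to draw the boundary lines in the 2-D plot of the reduced data.

I tried plotting with only the first two elements of LDA.coef_ and LDA.intercept_, but it did not work.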

import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()

X = iris.data
y = iris.target 
target_names = iris.target_names  

lda = LinearDiscriminantAnalysis(n_components=2)
X_r2 = lda.fit(X, y).transform(X)

x = np.array([-10,10])
y_hyperplane = -1*(lda.intercept_[0]+x*lda.coef_[0][0])/lda.coef_[0][1]

plt.figure()
colors = ['navy', 'turquoise', 'darkorange']
lw = 2

plt.plot(x,y_hyperplane,'k')

for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(X_r2[y == i, 0], X_r2[y == i, 1], alpha=.8, color=color,
                lw=lw, label=target_name)
plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.title('LDA of IRIS dataset')

plt.show()

The boundary line generated from lda.coef_[0] and lda.intercept_[0] is a line that is unlikely to separate the two classes.


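I also tried using np.meshgrid to draw the class regions, but I get this error:

ValueError: X has 2 features per sample; expecting 4

It expects the 4 dimensions of the original data, not the 2-D points from the meshgrid.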


1 Answer


Linear discriminant analysis (LDA) can be used as a classifier or for dimensionality reduction.

LDA for dimensionality reduction

Dimensionality reduction techniques reduce the number of features. The Iris dataset has 4 features, so let's use LDA to reduce it to 2 features so that we can visualise it.

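import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler

iris = datasets.load_iris()
X = iris.data
y = iris.target

# Standardise the 4 original features
sc = StandardScaler()
X = sc.fit_transform(X)

# Project the data onto the 2 discriminant components
lda = LinearDiscriminantAnalysis(n_components=2)
lda_object = lda.fit(X, y)
X = lda_object.transform(X)

# Scatter plot of the reduced data, one colour and marker per class
for l, c, m in zip(np.unique(y), ['r', 'g', 'b'], ['s', 'x', 'o']):
    plt.scatter(X[y == l, 0],
                X[y == l, 1],
                c=c, marker=m, label=l, edgecolors='black')
plt.show()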


LDA for multi-class classification

LDA handles multi-class classification in a one-vs-rest style: with 3 classes you get 3 hyperplanes (decision boundaries), one per class. With n features, each hyperplane is described by n weights (coefficients) and 1 intercept. In general:

coef_ : shape of (n_classes, n_features)
intercept_ :  shape of (n_classes,)
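
To make these shapes concrete, here is a minimal sketch on the unreduced Iris data. It assumes the default solver and that the class scores are the plain linear form X @ coef_.T + intercept_, which the sketch checks against decision_function rather than taking on faith. The variable names (scores, same_scores, same_labels) are only for illustration.

```python
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = datasets.load_iris(return_X_y=True)
lda = LinearDiscriminantAnalysis().fit(X, y)

print(lda.coef_.shape)       # (3, 4): one weight vector per class
print(lda.intercept_.shape)  # (3,): one intercept per class

# One linear score per sample and per class
scores = X @ lda.coef_.T + lda.intercept_

# Compare the hand-computed scores with scikit-learn's own
same_scores = np.allclose(scores, lda.decision_function(X))

# The predicted class is the one with the highest score
same_labels = np.array_equal(lda.classes_[scores.argmax(axis=1)], lda.predict(X))
print(same_scores, same_labels)
```

This is also why the first two entries of a 4-feature coef_ row cannot be drawn in the reduced 2-D plot: those weights belong to the original feature axes, not to the discriminant components.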

Sample, documented inline

import matplotlib.pyplot as plt
import numpy as np
np.random.seed(13)

# Generate 3 linearly separable clusters with 2 features each
X = [[0,0]]*25+[[0,10]]*25+[[10,10]]*25
X = np.array(list(map(lambda x: list(map(lambda y: np.random.randn()+y, x)), X)))
y = np.array([0]*25+[1]*25+[2]*25)

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
lda = LinearDiscriminantAnalysis()
lda_object = lda.fit(X, y)

# Plot the data points, one colour and marker per class
for l,c,m in zip(np.unique(y),['r','g','b'],['s','x','o']):
    plt.scatter(X[y==l,0],
                X[y==l,1],
                c=c, marker=m, label=l,edgecolors='black')

x1 = np.array([np.min(X[:,0], axis=0), np.max(X[:,0], axis=0)])

# Plot the hyperplanes: each line solves b + w1*x + w2*y = 0 for y
for i, c in enumerate(['r','g','b']):
    b, w1, w2 = lda.intercept_[i], lda.coef_[i][0], lda.coef_[i][1]
    y1 = -(b+x1*w1)/w2
    plt.plot(x1,y1,c=c)


As you can see, each decision boundary separates one class from the rest (follow the color of the decision boundary).
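
As a complementary view, and not part of the original sample: since prediction picks the class with the highest score, you can also compute the line on which two classes score equally. The sketch below assumes similar synthetic clusters (generated with a different random generator, so the numbers will not match the plot above). pairwise_boundary is a hypothetical helper, not a scikit-learn function.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Three well-separated 2-D clusters (illustrative data, not the exact sample above)
rng = np.random.default_rng(13)
centers = np.array([[0, 0], [0, 10], [10, 10]])
X = np.vstack([c + rng.standard_normal((25, 2)) for c in centers])
y = np.repeat([0, 1, 2], 25)

lda = LinearDiscriminantAnalysis().fit(X, y)


def pairwise_boundary(lda, i, j):
    """Hypothetical helper: return (w, b) with w @ x + b == 0 where classes i and j tie."""
    w = lda.coef_[i] - lda.coef_[j]
    b = lda.intercept_[i] - lda.intercept_[j]
    return w, b


w, b = pairwise_boundary(lda, 0, 1)

# Pick a point on that line and check that both classes get the same score there
x1 = 3.0
x2 = -(b + w[0] * x1) / w[1]
point_scores = lda.decision_function([[x1, x2]])[0]
print(point_scores[0], point_scores[1])
```

Drawing -(b + w[0] * x) / w[1] over a range of x values gives the line that separates class 0 from class 1 directly.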

Your case

Your dataset has 4 features, so you cannot visualise the data together with the decision boundaries (human visualisation is limited to 3D). One approach is to use LDA to reduce the data to 2D and then fit LDA again to classify these 2D features.
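
A minimal sketch of that two-step approach on Iris might look like the following. The names reducer, clf, pad and plot_regions are my own, and the 300-point grid resolution is an arbitrary choice. Because the second model is trained on the 2-D features, it accepts the 2-D points from np.meshgrid, which avoids the "X has 2 features per sample; expecting 4" error.

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

iris = datasets.load_iris()
X, y = iris.data, iris.target

# Step 1: reduce the 4 features to 2 discriminant components
reducer = LinearDiscriminantAnalysis(n_components=2)
X_2d = reducer.fit_transform(X, y)

# Step 2: fit a second LDA on the 2-D features; its coef_ has shape (3, 2)
clf = LinearDiscriminantAnalysis().fit(X_2d, y)

# Build a grid in the reduced space and classify every grid point
pad = 1.0
xx, yy = np.meshgrid(
    np.linspace(X_2d[:, 0].min() - pad, X_2d[:, 0].max() + pad, 300),
    np.linspace(X_2d[:, 1].min() - pad, X_2d[:, 1].max() + pad, 300),
)
grid = np.c_[xx.ravel(), yy.ravel()]
Z = clf.predict(grid).reshape(xx.shape)


def plot_regions():
    """Draw the predicted class regions with the reduced data on top."""
    colors = ['navy', 'turquoise', 'darkorange']
    plt.figure()
    plt.contourf(xx, yy, Z, levels=[-0.5, 0.5, 1.5, 2.5], colors=colors, alpha=0.2)
    for color, i, name in zip(colors, [0, 1, 2], iris.target_names):
        plt.scatter(X_2d[y == i, 0], X_2d[y == i, 1], alpha=.8, color=color, label=name)
    plt.legend(loc='best', shadow=False, scatterpoints=1)
    plt.title('LDA of IRIS dataset with decision regions')
    plt.show()
```

Calling plot_regions() draws the figure. The edges between the shaded regions are the decision boundaries of the second LDA in the reduced space.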

Answered 2019-09-11T07:28:50.380