
I want to recover my original data after running K-means clustering on a dataset that was scaled with MinMaxScaler. Here is a sample of my code:

copy_df=scaled_df.copy()
kmeans = KMeans(n_clusters=3, random_state=42)
kmeans.fit(features)
copy_df['Cluster'] = kmeans.predict(features)

The scaler is saved; I tried something like: x = scaler.inverse_transform(x)

My copy_df should have one more column (the cluster number) than my scaled_df.

I think that is why I get:

ValueError: operands could not be broadcast together with shapes (3,5) (4,) (3,5) 

How can I recover my data?

I need either the real (unscaled) data for each cluster or the per-cluster mean of each feature.


1 Answer


The shape that MinMaxScaler() expects (based on the data it was fitted on) does not match what you passed in after clustering, which has one extra column: the cluster membership. You can either assign the cluster labels directly to the original data or, if you really need the inverse, apply inverse_transform to the scaled feature columns only and then add the cluster labels to the result. Both approaches produce the same dataframe.

# Import the packages
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans

# Load the data
data = pd.DataFrame(load_iris()['data'])

# Initialize a scaler
scaler = MinMaxScaler()

# Perform scaling
data_scaled = pd.DataFrame(scaler.fit_transform(data))

# Initialize KMeans clustering
kmeans = KMeans(n_clusters=3, random_state=42)

# Obtain the clusters
clusters = kmeans.fit_predict(data_scaled)

# Add the cluster labels to the original data
data['clusters'] = clusters

OR

# Inverse the scaling and add the cluster labels as a new column
data_invscaled = pd.DataFrame(scaler.inverse_transform(data_scaled.iloc[:, 0:4]))
data_invscaled['clusters'] = clusters

# Check whether the two dfs are equal --> None means that the two dfs are equal
print(pd.testing.assert_frame_equal(data, data_invscaled, check_dtype=False))
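
Since the question also asks for the mean of each feature per cluster, here is a minimal sketch of one way to get it in the original units. It assumes the same iris setup as above; the names `feature_cols`, `restored` and `cluster_means`, the use of the iris feature names as column labels, and the explicit `n_init=10` are my own choices for illustration, not part of the question's code.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler

# Same idea as above, but with named feature columns (an illustrative choice).
iris = load_iris()
data = pd.DataFrame(iris["data"], columns=iris["feature_names"])

scaler = MinMaxScaler()
data_scaled = pd.DataFrame(scaler.fit_transform(data), columns=data.columns)

# n_init is set explicitly only to avoid version-dependent defaults.
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
data_scaled["Cluster"] = kmeans.fit_predict(data_scaled)

# The scaler was fitted on 4 columns, so drop the label column before inverting.
feature_cols = data_scaled.columns.drop("Cluster")
restored = pd.DataFrame(
    scaler.inverse_transform(data_scaled[feature_cols]),
    columns=feature_cols,
    index=data_scaled.index,
)
restored["Cluster"] = data_scaled["Cluster"]

# Per-cluster mean of each feature, in the original (unscaled) units.
cluster_means = restored.groupby("Cluster").mean()
print(cluster_means)
```

If you only need the centroids, `scaler.inverse_transform(kmeans.cluster_centers_)` should give values close to these means, because min-max scaling is an affine transform and the centroids have the same number of columns the scaler was fitted on.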
 
Answered 2022-01-08T10:11:15.837