I am working with this dataset (I have already cleaned it, and it has no missing values).
Area No. of Bedrooms Resale latitude longitude price Alaknanda Badarpur Bharat Vihar Bindapur Burari Chattarpur Chittaranjan Park Delhi Delhi Meerut Expressway Dwarka Mor Dwarka More Govindpuri Greater Kailash Hari Nagar Jamia Nagar Jasola Kalkaji Kamla Nagar Mahavir Enclave Mansa Ram Park Mayur Vihar Mayur Vihar II Model Town Mundka Munirka New Ashok Nagar Noida Road Okhla Om Nagar Om Vihar Palam Paschim Vihar Pitampura Preet Vihar Punjabi Bagh Rohini Sector 9 Rohini sector 24 Roop Nagar Sainik Farms Saket Sarita Vihar Sector 10 Dwarka Sector 11 Dwarka Sector 12 Dwarka Sector 13 Dwarka Sector 13 Rohini Sector 17 Dwarka Sector 18A Dwarka Sector 19 Dwarka Sector 2 Dwarka Sector 22 Dwarka Sector 22 Rohini Sector 23 Dwarka Sector 23 Rohini Sector 24 Rohini Sector 3 Dwarka Sector 4 Dwarka Sector 5 Dwarka Sector 6 Dwarka Sector 7 Dwarka Sector 9 Dwarka Sector-18 Dwarka Shahdara Shanti Park Dwarka Shastri Nagar Uttam Nagar Vasant Kunj Vikas Puri West End West Punjabi Bagh nawada
0 1200 2 1 28.584311 77.057693 105.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1000 3 0 28.619074 77.056686 60.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
2 1350 2 1 28.528574 77.288331 150.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 435 2 0 28.619074 77.056686 25.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
4 900 3 0 28.619310 77.033279 58.0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4993 540 2 1 28.603176 77.063060 25.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4994 540 2 1 28.603176 77.063060 30.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4995 415 1 1 28.544790 77.051083 26.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4996 415 1 1 28.544790 77.051083 55.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4997 900 3 1 28.619074 77.056686 42.0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
4157 rows × 77 columns
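A random forest regressor gave poor results on this data, so I decided to scale the features (Area, No. of Bedrooms, Resale, latitude, longitude) and the target variable (price).
But after performing the scaling: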
import pandas as pd
from sklearn.preprocessing import StandardScaler

def scaleColumns(df, cols_to_scale):
    for col in cols_to_scale:
        scaler = StandardScaler()
        df[col] = pd.DataFrame(scaler.fit_transform(df[col].values.reshape((-1,1))))
    return df

scaled_df = scaleColumns(df,['Area', 'No. of Bedrooms', 'latitude', 'longitude', 'price'])
scaled_df
I get:
Area No. of Bedrooms Resale latitude longitude price Alaknanda Badarpur Bharat Vihar Bindapur Burari Chattarpur Chittaranjan Park Delhi Delhi Meerut Expressway Dwarka Mor Dwarka More Govindpuri Greater Kailash Hari Nagar Jamia Nagar Jasola Kalkaji Kamla Nagar Mahavir Enclave Mansa Ram Park Mayur Vihar Mayur Vihar II Model Town Mundka Munirka New Ashok Nagar Noida Road Okhla Om Nagar Om Vihar Palam Paschim Vihar Pitampura Preet Vihar Punjabi Bagh Rohini Sector 9 Rohini sector 24 Roop Nagar Sainik Farms Saket Sarita Vihar Sector 10 Dwarka Sector 11 Dwarka Sector 12 Dwarka Sector 13 Dwarka Sector 13 Rohini Sector 17 Dwarka Sector 18A Dwarka Sector 19 Dwarka Sector 2 Dwarka Sector 22 Dwarka Sector 22 Rohini Sector 23 Dwarka Sector 23 Rohini Sector 24 Rohini Sector 3 Dwarka Sector 4 Dwarka Sector 5 Dwarka Sector 6 Dwarka Sector 7 Dwarka Sector 9 Dwarka Sector-18 Dwarka Shahdara Shanti Park Dwarka Shastri Nagar Uttam Nagar Vasant Kunj Vikas Puri West End West Punjabi Bagh nawada
0 -0.156044 -0.846368 1 0.146719 0.197107 -0.154917 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 -0.361197 0.327590 0 0.154070 0.197058 -0.245661 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
2 -0.002180 -0.846368 1 0.134931 0.208280 -0.064172 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
3 -0.940754 -0.846368 0 0.154070 0.197058 -0.316239 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
4 -0.463774 0.327590 0 0.154120 0.195924 -0.249694 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4993 NaN NaN 1 NaN NaN NaN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4994 NaN NaN 1 NaN NaN NaN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4995 NaN NaN 1 NaN NaN NaN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4996 NaN NaN 1 NaN NaN NaN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
4997 NaN NaN 1 NaN NaN NaN 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
4157 rows × 77 columns
Many of the values have now become NaN. How can I fix this?
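
A possible explanation, offered as an assumption rather than a confirmed diagnosis: the output shows row labels running up to 4997 for only 4157 rows, which suggests rows were dropped during cleaning without resetting the index. Wrapping the scaled array in pd.DataFrame(...) gives it a fresh 0..n-1 index, and pandas aligns on index labels when a DataFrame or Series is assigned to a column, so any label missing from the new frame would come out as NaN. The sketch below reproduces that behaviour on a small made-up frame and shows one way around it, which is to assign the NumPy array returned by the scaler directly. The toy data and the helper name scale_columns are hypothetical and not part of the original code.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy frame with a gapped index (label 7 instead of 3),
# mimicking rows that were dropped during cleaning.
df = pd.DataFrame(
    {"Area": [1200.0, 1000.0, 1350.0, 435.0],
     "price": [105.0, 60.0, 150.0, 25.0]},
    index=[0, 1, 2, 7],
)

# Reproduce the problem: the wrapped result is indexed 0..3, so the row
# labelled 7 finds no matching label and becomes NaN.
broken = df.copy()
broken["Area"] = pd.DataFrame(StandardScaler().fit_transform(df[["Area"]]))

def scale_columns(df, cols_to_scale):
    """Return a copy of df with the listed columns standardized."""
    out = df.copy()
    # fit_transform returns a plain NumPy array; assigning it is positional,
    # so no index alignment takes place and no NaN is introduced.
    out[cols_to_scale] = StandardScaler().fit_transform(out[cols_to_scale])
    return out

scaled_df = scale_columns(df, ["Area", "price"])
print(broken["Area"].isna().sum())   # rows lost to index misalignment
print(scaled_df.isna().sum().sum())  # NaN count after the positional fix
```

If the gapped index is indeed the cause, calling df.reset_index(drop=True) before scaling should also avoid the NaN values, at the cost of discarding the original row labels.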