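OK, so I'm trying to clean data for a machine learning project. I'm using the Z-score for outlier detection. The dataset contains different types of glass (numbered 1 to 7), and I want to loop over each glass type, find the outliers in the sodium content (the "Na" column), and replace them with the mean sodium content for that glass type. Strangely, the algorithm works for glass types 1 and 2, but when it gets to type 3 it raises a ValueError. Any idea what the problem might be?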
z = stats.zscore(DataFrame.Na)
threshold = 1.99
for t in DataFrame.Type.unique():
    z = stats.zscore(DataFrame.Na[DataFrame.Type==t])
    print([DataFrame.Na[DataFrame.Type==t][(np.abs(z) > threshold)]])
    DataFrame.Na[DataFrame.Type==t] = DataFrame.Na[DataFrame.Type==t].replace([DataFrame.Na[DataFrame.Type==t][(np.abs(z) > threshold)]],np.mean(DataFrame.Na[DataFrame.Type==t]))
The output is:
[17 14.36
21 14.77
Name: Na, dtype: float64]
[70 14.86
105 11.45
106 10.73
108 14.43
110 11.23
111 11.02
Name: Na, dtype: float64]
[149 12.16
Name: Na, dtype: float64]
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
if __name__ == '__main__':
/opt/conda/lib/python3.7/site-packages/ipykernel_launcher.py:9: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
if __name__ == '__main__':
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/opt/conda/lib/python3.7/site-packages/pandas/core/indexes/base.py in get_loc(self, key, method, tolerance)
2897 try:
-> 2898 return self._engine.get_loc(casted_key)
2899 except KeyError as err:
KeyError: 0
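Does anyone know what might be wrong here? I can provide any other information you need. I've been thinking about this for about 2 hours and I have no idea...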
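
For reference, here is a minimal sketch of the per-type replacement described above, assuming pandas, NumPy and SciPy are installed. The function name `replace_outliers_with_group_mean` and its parameters are hypothetical and not part of the original code. It works on plain NumPy arrays and writes the column back in one step, so it uses neither chained assignment (the source of the SettingWithCopyWarning) nor `Series.replace`. That may sidestep the error, but it has not been run against the actual glass dataset.

```python
import numpy as np
import pandas as pd
from scipy import stats


def replace_outliers_with_group_mean(df, value_col="Na", group_col="Type", threshold=1.99):
    """Return a copy of df in which, within each group, values whose
    absolute z-score exceeds `threshold` are replaced by that group's mean.

    Hypothetical helper: the name and signature are illustrative only.
    """
    out = df.copy()
    # Work on plain arrays so the DataFrame's index labels never matter.
    col = out[value_col].to_numpy(dtype=float).copy()
    groups = out[group_col].to_numpy()

    for t in pd.unique(groups):
        in_group = groups == t
        values = col[in_group]           # boolean indexing returns a copy
        z = stats.zscore(values)         # population std (ddof=0), SciPy's default
        is_outlier = np.abs(z) > threshold
        # The mean is taken over the whole group, outliers included,
        # which mirrors np.mean(...) in the original loop.
        values[is_outlier] = values.mean()
        col[in_group] = values

    out[value_col] = col                 # single, unambiguous assignment
    return out


# Usage, assuming the data lives in a DataFrame named `DataFrame` as in the question:
# DataFrame = replace_outliers_with_group_mean(DataFrame)
```

Note that a group whose values are all identical (or that has a single row) has zero standard deviation, so its z-scores are NaN and nothing in it is flagged as an outlier.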