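You can simply call the estimator's .fit() again on the new data. With the default settings this refits the model from scratch, so nothing learned from the earlier fit is carried over.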
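This is the preferred approach, especially for time series, because the signal changes over time and you do not want older, non-representative data to be treated as potentially normal (or anomalous).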
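If the old data is still important, you can concatenate the old training data with the new input signal data and call .fit() again on the combined set.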
Also note that, according to the sklearn documentation, joblib is a better choice than pickle for saving the model.
MRE, with links to resources in the comments:
# Model
from sklearn.ensemble import IsolationForest
# Saving file
import joblib
# Data
import numpy as np
# Create a new model
model = IsolationForest()
# Generate some old data
df1 = np.random.randint(1,100,(100,10))
# Train the model
model.fit(df1)
# Save it off
joblib.dump(model, 'isf_model.joblib')
# Load the model
model = joblib.load('isf_model.joblib')
# Generate new data
df2 = np.random.randint(1,500,(1000,10))
# If the original data is now not important, I can just call .fit() again.
# If you are using time-series based data, this is preferred, as older data may not be representative of the current state
model.fit(df2)
# If the original data is important, I can simply join the old data to new data. There are multiple options for this:
# Pandas: https://pandas.pydata.org/pandas-docs/stable/user_guide/merging.html
# Numpy: https://numpy.org/doc/stable/reference/generated/numpy.concatenate.html
combined_data = np.concatenate((df1, df2))
model.fit(combined_data)
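
To see how the two options differ in practice, here is a minimal sketch that fits one model on the new data only and another on the combined data, then checks how many of the old rows each one flags. The `refit` helper, the variable names, and the fixed seeds are my own additions for illustration, and the data is random, so the counts only show the mechanics, not a meaningful result.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-ins for the old and new data (same shapes as in the MRE)
rng = np.random.default_rng(0)
old_data = rng.integers(1, 100, size=(100, 10))
new_data = rng.integers(1, 500, size=(1000, 10))


def refit(model, new, old=None):
    """Hypothetical helper: fit on `new`, or on old + new stacked row-wise."""
    train = new if old is None else np.concatenate((old, new))
    return model.fit(train)  # fit() returns the fitted estimator itself


# Option 1: old data no longer matters
model_new_only = refit(IsolationForest(random_state=0), new_data)
# Option 2: old data is still relevant
model_combined = refit(IsolationForest(random_state=0), new_data, old=old_data)

# predict() returns 1 for inliers and -1 for outliers
pred_new_only = model_new_only.predict(old_data)
pred_combined = model_combined.predict(old_data)

print("old rows flagged (new-only fit):", int((pred_new_only == -1).sum()))
print("old rows flagged (combined fit):", int((pred_combined == -1).sum()))
```

Comparing the two counts on your own data is a quick way to judge whether dropping the old data changes what the model considers normal.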