python - How to split a matrix into train and test data while ensuring at least one value in every row and column of the training matrix?

Question

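I want to randomly split a sparse matrix into training and test data of the same dimensions, while making sure the training set has no column or row that is entirely zero.

For my algorithm to work, I need at least one value in every row and every column of the training set.

I tried using this library function: from sklearn.model_selection import train_test_split

For example, given the matrix: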

[[0, 1, 3, 1],
[0, 0, 0, 1],
[8, 0, 0, 1]]

the split can produce this training matrix:

[[0, 1, 0, 1],
[0, 0, 0, 0],
[0, 0, 0, 8]]

where the second row contains only zeros. How can I avoid this?

score 0 · Accepted Answer

from sklearn.model_selection import KFold
import numpy as np

# Create some dummy data
X = np.array([[1, 2], [3, 4], [1, 2], [3, 4], [0, 0]])

# Remove rows whose columns are all equal to 0
X = X[~np.all(X == 0, axis=1)]

# Assuming 2-fold cross-validation
kf = KFold(n_splits=2)
kf.get_n_splits(X)  # returns 2, the number of folds

Now kf yields two train/test folds:

for training, testing in kf.split(X):
    X_train, X_test = X[training], X[testing]

    # Do whatever you want with your model ...

    print("Training:", training, "Testing:", testing)

Output (Python 3):

Training: [2 3] Testing: [0 1]
Training: [0 1] Testing: [2 3]
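
Note that KFold assigns whole rows of X to a fold, whereas the question asks for an entry-wise split into two matrices of the same shape. A minimal sketch of one way to do that is shown here; it is not the method from the answer above. The helper name split_entries and its parameters test_fraction and seed are hypothetical, and the sketch assumes a dense NumPy array in which 0 means "no value" (a scipy.sparse matrix would need to be adapted). It first reserves, in random order, any nonzero entry whose row or column is not yet covered, and only the remaining entries are eligible for the test set.

```python
import numpy as np

def split_entries(M, test_fraction=0.2, seed=0):
    """Hypothetical helper: entry-wise train/test split that keeps every
    non-empty row and column of M represented in the training matrix."""
    rng = np.random.default_rng(seed)
    M = np.asarray(M)
    rows, cols = np.nonzero(M)        # coordinates of the nonzero entries
    n = rows.size
    order = rng.permutation(n)        # random visiting order

    # Reserve an entry for training if its row or column is not covered yet
    keep = np.zeros(n, dtype=bool)
    row_seen = np.zeros(M.shape[0], dtype=bool)
    col_seen = np.zeros(M.shape[1], dtype=bool)
    for k in order:
        r, c = rows[k], cols[k]
        if not row_seen[r] or not col_seen[c]:
            keep[k] = True
            row_seen[r] = True
            col_seen[c] = True

    # Only non-reserved entries may move to the test set
    free = order[~keep[order]]
    n_test = min(free.size, int(round(test_fraction * n)))
    test_idx = free[:n_test]

    train = M.copy()
    test = np.zeros_like(M)
    test[rows[test_idx], cols[test_idx]] = M[rows[test_idx], cols[test_idx]]
    train[rows[test_idx], cols[test_idx]] = 0
    return train, test

# The matrix from the question
M = np.array([[0, 1, 3, 1],
              [0, 0, 0, 1],
              [8, 0, 0, 1]])
train, test = split_entries(M, test_fraction=0.5, seed=0)
print(train)
print(test)
```

Because the reserved entries never leave the training matrix, the test set can end up smaller than test_fraction requests; in the matrix from the question, four of the six values are the only one in their row or column. A row or column that is already all zeros in M cannot be fixed by any split and would have to be dropped beforehand.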
