python - IndexError: too many indices

Question

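I am trying to use an algorithm in scikit-learn to predict an output from several inputs. I keep getting the error "too many indices", but I can't figure out why.

Training CSV file: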

 1.1    0.2 0.1 0   0.12    0.1
 1.4    0.2 0.1 0.1 0.14    0.1
 0.1    0.1 0.1 0   0.26    0.1
 24.5   0.1 0   0.1 0.14    0.1
 0.1    0.1 0.1 0   0.25    0.1

Code:

    fileCSVTraining = genfromtxt('TrainingData.csv', delimiter=',', dtype=None)

    #Define first 6 rows of data as the features
    t = fileCSVTraining[:, 6:]

    #Define which column to put prediction in
    r = fileCSVTraining[:, 0-6:]    
    #Create and train classifier 
    x, y = r, t
    clf = LinearSVC()
    clf = clf.fit(x, y)     
    #New data to predict
    X_new = [1.0, 2.1, 3.0, 2.4, 2.1]
    b = clf.predict(X_new)

Error:

 t = fileCSVTraining[:, 6:]
 IndexError: too many indices

score 4 · Accepted Answer

Based on the comments, I think you want:

fileCSVTraining = genfromtxt('TrainingData.csv')

Then, to get the "first 6 rows", you would use

t = fileCSVTraining[:6, :]

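(I'm assuming your actual data file is longer than what you showed. Your example has only 5 rows.)

I suspect the array indexing you use to get r is also incorrect.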
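
To make the difference concrete, here is a minimal sketch using the sample rows from the question. It assumes the file is whitespace-separated, as the sample appears to be, and reads from an in-memory string (the name SAMPLE is only for illustration) instead of TrainingData.csv:

```python
from io import StringIO

import numpy as np

# Sample rows from the question, assumed to be whitespace-separated.
SAMPLE = """1.1 0.2 0.1 0 0.12 0.1
1.4 0.2 0.1 0.1 0.14 0.1
0.1 0.1 0.1 0 0.26 0.1
24.5 0.1 0 0.1 0.14 0.1
0.1 0.1 0.1 0 0.25 0.1
"""

# With the default delimiter, any run of whitespace separates fields,
# so the result is a regular 2-D float array.
data = np.genfromtxt(StringIO(SAMPLE))
print(data.shape)

# With delimiter=',' there are no commas to split on, so each line is
# read as a single field and the result has only one dimension.
wrong = np.genfromtxt(StringIO(SAMPLE), delimiter=",", dtype=None, encoding="utf-8")
print(wrong.ndim)

# Rows come first, columns second.
first_rows = data[:3, :]   # first 3 rows, all columns
first_cols = data[:, :3]   # all rows, first 3 columns
print(first_rows.shape, first_cols.shape)
```

Once the array has only one dimension, any two-index expression such as wrong[:, 6:] raises the IndexError from the question.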

score 2 · Accepted Answer

Please print your x and y variables; you will probably see why the data is invalid and can fix it accordingly.

The last lines:

X_new = [1.0, 2.1, 3.0, 2.4, 2.1]
b = clf.predict(X_new)

should be:

X_new = [[1.0, 2.1, 3.0, 2.4, 2.1]]
b = clf.predict(X_new)

because predict expects a collection of samples (a 2D array of shape (n_new_samples, n_features)), not a single sample.
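
If the sample already exists as a flat list or 1D array, one way to get the expected shape is to reshape it. This is a sketch with NumPy only; the helper name as_sample_batch is made up for illustration and is not part of scikit-learn:

```python
import numpy as np


def as_sample_batch(sample):
    """Hypothetical helper: turn one sample into a (1, n_features) array."""
    arr = np.asarray(sample, dtype=float)
    if arr.ndim == 1:
        # -1 lets NumPy infer the number of features.
        arr = arr.reshape(1, -1)
    return arr


single = [1.0, 2.1, 3.0, 2.4, 2.1]
X_new = as_sample_batch(single)

print(np.asarray(single).shape)  # one dimension: 5 values
print(X_new.shape)               # two dimensions: 1 sample, 5 features
```

The number of features in X_new still has to match the number of feature columns the classifier was trained on.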

score 0 · Accepted Answer

The array indexing used to get r and t was incorrect. Using:

  t = fileCSVTraining[:, 1-0:]

gave me the training data I needed, leaving out the prediction column.
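
Note that slice bounds are ordinary expressions, so 1-0: is just 1: and the 0-6: from the question is -6:. A small sketch of the split, assuming (as this answer implies) that the first column is the value to predict and the remaining columns are the features:

```python
import numpy as np

# Three of the sample rows from the question, typed in as a 2-D array.
data = np.array([
    [1.1, 0.2, 0.1, 0.0, 0.12, 0.1],
    [1.4, 0.2, 0.1, 0.1, 0.14, 0.1],
    [0.1, 0.1, 0.1, 0.0, 0.26, 0.1],
])

y = data[:, 0]    # assumed prediction column: first column, all rows
X = data[:, 1:]   # features: every column after the first

# The arithmetic in the slice is evaluated before slicing.
same_as_one = np.array_equal(data[:, 1-0:], data[:, 1:])
same_as_minus_six = np.array_equal(data[:, 0-6:], data[:, -6:])

# On a 6-column array, slicing from column 6 onward selects nothing.
empty = data[:, 6:]

print(X.shape, y.shape, empty.shape)
```

So even on a correctly parsed 2D array, [:, 6:] would have produced an empty feature set and [:, 0-6:] would have returned every column.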

score 0 · Accepted Answer

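Specifying dtype=float is also important. With dtype=None, genfromtxt infers a type for each column separately, so if any column contains only integers while others contain floats, the result is a 1D structured array instead of a 2D array. As shown, two-index slicing does not work on a 1D array.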
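
A short sketch of that effect, using a made-up two-row, comma-separated input (MIXED is illustrative data, not the asker's file) in which the middle column holds only integers:

```python
from io import StringIO

import numpy as np

# Illustrative data: the second column contains only integers.
MIXED = "1.1,2,0.1\n1.4,3,0.2\n"

# dtype=None infers a type per column; mixed column types give a
# 1-D structured array with one record per row.
structured = np.genfromtxt(StringIO(MIXED), delimiter=",", dtype=None)
print(structured.shape)
print(structured.dtype.names)

# Two-index slicing fails on the 1-D structured array.
try:
    structured[:, 1:]
    slicing_failed = False
except IndexError as exc:
    slicing_failed = True
    print("IndexError:", exc)

# dtype=float converts every field to float and keeps the array 2-D.
plain = np.genfromtxt(StringIO(MIXED), delimiter=",", dtype=float)
print(plain.shape)
```

Checking the .shape and .dtype of the loaded array before slicing is a quick way to tell which of the two cases you are in.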
