python - numpy: slicing out 2 columns

Question

Consider the following data:

61  1  1 15.04 14.96 13.17  9.29 13.96  9.87 13.67 10.25 10.83 12.58 18.50 15.04
61  1  2 14.71 16.88 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83
61  1  3 18.50 16.88 12.33 10.13 11.17  6.17 11.25  8.04  8.50  7.67 12.75 12.71

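The first three columns are the year, month, and day.
The remaining 12 columns are the average wind speeds (in knots) at 12 locations in a country on that day.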

What I want to do is drop the 2nd and 3rd columns (indices 1 and 2) so that I get the following data:

61  15.04 14.96 13.17  9.29 13.96  9.87 13.67 10.25 10.83 12.58 18.50 15.04
61  14.71 16.88 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83
61  18.50 16.88 12.33 10.13 11.17  6.17 11.25  8.04  8.50  7.67 12.75 12.71

The following works, but I don't like it because it doesn't scale if the data has many columns (i.e. many locations).

import numpy as np
data = np.loadtxt('wind.data')
data_nomonth_noday = data[:,[0,3,4,5,6,7,8,9,10,11,12,13,14]]

Is it possible to do this without enumerating the column numbers? Can I do it with slicing?

score 2 · Accepted Answer

You can easily generate the index array with np.r_.

In [165]: np.r_[0,3:15]                                                                  
Out[165]: array([ 0,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])

Under the hood it is just doing

In [166]: np.concatenate([[0],np.arange(3,15)])                                          
Out[166]: array([ 0,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14])
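To avoid hard-coding the upper bound 15, the stop of the slice can be taken from the array itself. The following is a minimal sketch that uses the three sample rows from the question in place of the wind.data file; the variable name cols is only illustrative.

```python
import numpy as np

# Three sample rows from the question: year, month, day, then 12 wind speeds.
data = np.array([
    [61, 1, 1, 15.04, 14.96, 13.17, 9.29, 13.96, 9.87, 13.67, 10.25, 10.83, 12.58, 18.50, 15.04],
    [61, 1, 2, 14.71, 16.88, 10.83, 6.50, 12.62, 7.67, 11.50, 10.04, 9.79, 9.67, 17.54, 13.83],
    [61, 1, 3, 18.50, 16.88, 12.33, 10.13, 11.17, 6.17, 11.25, 8.04, 8.50, 7.67, 12.75, 12.71],
])

# Column 0, then every column from index 3 up to the array's own width.
cols = np.r_[0, 3:data.shape[1]]
data_nomonth_noday = data[:, cols]

print(cols)
print(data_nomonth_noday.shape)  # (3, 13)
```

Because the stop comes from data.shape[1], the same line keeps working if more location columns are added.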

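np.delete, while convenient, ends up doing a similar amount of work. Depending on the indices to delete, it either concatenates slices or builds a selection mask.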

Whichever method you use, the result is a new array containing a copy of the desired data (not a view).
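The copy-versus-view distinction can be checked with np.shares_memory. This is a small sketch on a made-up 3x4 array (not the wind data), comparing index-array selection and np.delete against a plain contiguous slice.

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)

fancy = a[:, np.r_[0, 2:4]]         # integer-array indexing: a copy
deleted = np.delete(a, 1, axis=1)   # np.delete: a new array
basic = a[:, 2:]                    # plain slice: a view onto a

print(np.shares_memory(a, fancy))    # False
print(np.shares_memory(a, deleted))  # False
print(np.shares_memory(a, basic))    # True

# Writing to the copy leaves the original untouched.
fancy[0, 0] = -1.0
print(a[0, 0])  # 0.0
```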

loadtxt also accepts a usecols parameter, which takes a similar array of column indices.
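With usecols, the unwanted columns are never loaded in the first place. The sketch below reads the question's three sample rows from an in-memory string instead of wind.data, and assumes the total column count (n_cols, a name introduced here) is known in advance. The index array is converted to a plain list as a conservative choice.

```python
import io
import numpy as np

# Stand-in for the wind.data file, using the sample rows from the question.
text = """\
61  1  1 15.04 14.96 13.17  9.29 13.96  9.87 13.67 10.25 10.83 12.58 18.50 15.04
61  1  2 14.71 16.88 10.83  6.50 12.62  7.67 11.50 10.04  9.79  9.67 17.54 13.83
61  1  3 18.50 16.88 12.33 10.13 11.17  6.17 11.25  8.04  8.50  7.67 12.75 12.71
"""

n_cols = 15  # assumption: the number of columns in the file is known
keep = np.r_[0, 3:n_cols].tolist()

# Only the year column and the 12 wind-speed columns are parsed.
data = np.loadtxt(io.StringIO(text), usecols=keep)
print(data.shape)  # (3, 13)
```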

score 1 · Accepted Answer

You can use np.delete [numpy-doc] for this, passing a slice object as the argument that specifies what to delete:

>>> np.delete(data, slice(1, 3), 1)
array([[61.  , 15.04, 14.96, 13.17,  9.29, 13.96,  9.87, 13.67, 10.25,
        10.83, 12.58, 18.5 , 15.04],
       [61.  , 14.71, 16.88, 10.83,  6.5 , 12.62,  7.67, 11.5 , 10.04,
         9.79,  9.67, 17.54, 13.83],
       [61.  , 18.5 , 16.88, 12.33, 10.13, 11.17,  6.17, 11.25,  8.04,
         8.5 ,  7.67, 12.75, 12.71]])

When you use slice notation, you are essentially passing a slice object. Indeed, a[1:3] is equivalent to a[slice(1,3)].

The 1 here specifies the axis along which we delete. Since we want to remove data along the second dimension, we pass 1 as the third argument.
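As a small illustration on a made-up 3x5 array, the slice object, np.s_, and an explicit index list all select the same columns, and passing axis by keyword makes the intent clearer. Leaving axis out is worth avoiding here, since np.delete then works on the flattened array.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)

# Three equivalent ways to name columns 1 and 2.
by_slice = np.delete(a, slice(1, 3), axis=1)
by_s = np.delete(a, np.s_[1:3], axis=1)
by_list = np.delete(a, [1, 2], axis=1)

print(by_slice)
# [[ 0  3  4]
#  [ 5  8  9]
#  [10 13 14]]

# Without axis, the array is flattened first and elements 1 and 2 are removed.
flat = np.delete(a, slice(1, 3))
print(flat.shape)  # (13,)
```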

score 1 · Accepted Answer

This should work:

import numpy as np
data = np.loadtxt('wind.data')
data_nomonth_noday = np.zeros((data.shape[0],data.shape[1]-2))
data_nomonth_noday[:,0] = data[:,0]
data_nomonth_noday[:,1:] = data[:,3:]

In my opinion this is more readable, flexible, and intuitive than some of the other possible approaches.

score 0 · Accepted Answer

If a is your numpy array and you want to drop columns 1 and 2, you can use the following one-liner.

import numpy as np

delete_cols = [1,2] # list of column numbers to delete
# sorted() keeps the remaining columns in their original order (set order is not guaranteed)
a[:,sorted(set(range(a.shape[-1])) - set(delete_cols))]

Some explanation

What you need here is to index the array a correctly.

# list_of_column_numbers = [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
a[:, list_of_column_numbers]

You can build list_of_column_numbers in one of the following ways:

# Method-1: Direct Declaration
list_of_column_numbers = [0, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

# Method-2A: Using Set and Dropping Columns not Needed
# a.shape[-1] = 15
delete_cols = [1,2] # list of column numbers to delete
list_of_column_numbers = sorted(set(range(a.shape[-1])) - set(delete_cols))  # sorted: set order is not guaranteed

# Method-2B: Make list of column numbers
# a.shape[-1] = 15
list_of_column_numbers = [0] + np.arange(3,a.shape[-1]).tolist()
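
A boolean mask is another way to express "every column except these" and it preserves column order. This is a sketch on a made-up 3x5 array; the names keep and result are illustrative only.

```python
import numpy as np

a = np.arange(15).reshape(3, 5)
delete_cols = [1, 2]  # list of column numbers to delete

# Start with every column selected, then switch off the ones to drop.
keep = np.ones(a.shape[-1], dtype=bool)
keep[delete_cols] = False

result = a[:, keep]
print(result)
# [[ 0  3  4]
#  [ 5  8  9]
#  [10 13 14]]
```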
