python - How to select all columns except one (or two) from a Python datatable Frame

Question

In R data.table, I can exclude columns like this:

library(data.table)

foo <- data.table(x = c(1,2,3), y = c(4, 5, 6), z = c(7, 8, 9))
print(foo)
   x y z
1: 1 4 7
2: 2 5 8
3: 3 6 9

# exclude one column
foo[, !"x"]
   y z
1: 4 7
2: 5 8
3: 6 9

# exclude two columns
foo[, !c("x", "y")]
   z
1: 7
2: 8
3: 9

How can I do the same thing in Python datatable?

import datatable as dt # v 1.0.0

foo = dt.Frame({'x': [1,2,3], 'y': [4,5,6], 'z': [7,8,9]})
print(foo)
   |     x      y      z
   | int32  int32  int32
-- + -----  -----  -----
 0 |     1      4      7
 1 |     2      5      8
 2 |     3      6      9
[3 rows x 3 columns]

# exclude one column
foo[:, !"x"]    # SyntaxError: Python has no ! operator
foo[:, !["x"]]  # SyntaxError
foo[:, !f.x]    # SyntaxError

# exclude two columns

Edit

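Sorry for not making this clear, but I know the obvious solution: an inclusion statement, where I build a list of all the columns I want to include and use that, instead of an exclusion statement, where I use a list of the columns to exclude. However, I find the inclusion technique clunky and tedious, and it doesn't read or write very naturally. So I'm specifically looking for an exclusion solution, like the one in data.table.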

score 4 · Accepted Answer

These approaches are listed in the documentation on how to deselect rows/columns.

List-building methods

A list comprehension that filters the excluded values out of foo.names:

cols = ["x"]
foo_filtered = foo[:, [name for name in foo.names if name not in cols]]

Or the filter equivalent:

cols = ["x"]
foo_filtered = foo[:, list(filter(lambda n: n not in cols, foo.names))]
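Both versions silently ignore a misspelled name in cols, so a typo simply leaves the column in place. A minimal sketch of a hypothetical helper, names_except (not part of datatable), that hides the comprehension and raises on unknown names. It only needs the tuple returned by foo.names, so it is shown with a plain tuple to run without datatable:

```python
def names_except(names, exclude):
    """Return the names in `names` that are not in `exclude`, in order.

    Hypothetical helper, not a datatable API. `names` is any sequence of
    column names (e.g. the tuple from Frame.names); `exclude` is a single
    name or an iterable of names. Unknown names raise KeyError instead of
    being silently ignored.
    """
    if isinstance(exclude, str):
        exclude = [exclude]
    drop = set(exclude)
    missing = drop.difference(names)
    if missing:
        raise KeyError(f"columns not found: {sorted(missing)}")
    # Keep the original column order.
    return [n for n in names if n not in drop]


names = ("x", "y", "z")  # stands in for foo.names
print(names_except(names, "x"))         # ['y', 'z']
print(names_except(names, ["x", "y"]))  # ['z']
```

With a real Frame this would be used as foo[:, names_except(foo.names, ["x", "y"])].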

With a list of booleans:

cols = ["x"]
foo_filtered = foo[:, [n not in cols for n in foo.names]]

Or the map equivalent:

cols = ["x"]
foo_filtered = foo[:, list(map(lambda n: n not in cols, foo.names))]

foo_filtered:

   |     y      z
   | int32  int32
-- + -----  -----
 0 |     4      7
 1 |     5      8
 2 |     6      9
[3 rows x 2 columns]
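In the boolean variants, the list has one entry per column, in the same order as foo.names, and True marks a column to keep. A plain-Python check of what the comprehension produces, using a tuple in place of foo.names (an assumption so the snippet runs without datatable):

```python
names = ("x", "y", "z")  # stands in for foo.names
cols = ["x"]

# One boolean per column, aligned with `names`; True means "keep".
mask = [n not in cols for n in names]
print(mask)  # [False, True, True]

# Pairing the mask back with the names shows which columns survive.
kept = [n for n, keep in zip(names, mask) if keep]
print(kept)  # ['y', 'z']
```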

Exclusion method

remove can also be used; however, it is limited to column selection:

from datatable import f


foo[:, f[:].remove(f['x', 'y'])]

   |     z
   | int32
-- + -----
 0 |     7
 1 |     8
 2 |     9
[3 rows x 1 column]

This method is not from the documentation.

Subclass method

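If hiding the list comprehension is the goal, we can write a wrapper around tuple that can be used to filter names, i.e., make xor (or any other operator you prefer) act as an "exclude operator".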

(I chose xor because pandas used to allow this operator for excluding columns.)

This custom tuple can then be used with a subclass of dt.Frame that wraps the names property:

from __future__ import annotations

import datatable as dt  # v 1.0.0


class TableColumns(tuple):
    def __xor__(self, other) -> TableColumns:
        if isinstance(other, str):
            output = [n != other for n in self]
        elif isinstance(other, (tuple, list, set)):
            output = [n not in other for n in self]
        else:
            raise TypeError(
                f'Unsupported type {type(other)} used to filter names'
            )
        return TableColumns(output)


class MyFrame(dt.Frame):
    @property
    def names(self) -> TableColumns:
        return TableColumns(super().names)

Filtering can then be done naturally:

Frame constructor (using the subclass):

foo = MyFrame({'x': [1, 2, 3], 'y': [4, 5, 6], 'z': [7, 8, 9]})

Filtering with a list:

foo_filtered = foo[:, foo.names ^ ['x', 'z']]

With a tuple:

foo_filtered = foo[:, foo.names ^ ('x', 'z')]

With a set:

foo_filtered = foo[:, foo.names ^ {'x', 'z'}]

foo_filtered:

   |     y
   | int32
-- + -----
 0 |     4
 1 |     5
 2 |     6
[3 rows x 1 column]

Excluding a single column with a str:

foo_filtered = foo[:, foo.names ^ 'x']

   |     y      z
   | int32  int32
-- + -----  -----
 0 |     4      7
 1 |     5      8
 2 |     6      9
[3 rows x 2 columns]
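Note that TableColumns.__xor__ returns a tuple of booleans (a column mask), not a tuple of names, which is why its result can go straight into the column position. The class itself does not depend on datatable, so its behavior can be checked on a plain tuple. This sketch repeats the class so it is self-contained and wraps a tuple directly instead of MyFrame.names:

```python
class TableColumns(tuple):
    def __xor__(self, other):
        # A single str excludes one column; a tuple/list/set excludes several.
        if isinstance(other, str):
            output = [n != other for n in self]
        elif isinstance(other, (tuple, list, set)):
            output = [n not in other for n in self]
        else:
            raise TypeError(f"Unsupported type {type(other)} used to filter names")
        # The result is a boolean mask aligned with the column order.
        return TableColumns(output)


names = TableColumns(("x", "y", "z"))  # stands in for MyFrame.names
print(names ^ "x")         # (False, True, True)
print(names ^ ["x", "z"])  # (False, True, False)
print(names ^ {"x", "z"})  # (False, True, False)
```

Only names ^ cols works: because no __rxor__ is defined, reversing the operands (["x"] ^ names) raises TypeError.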

score 1 · Accepted Answer

A possible solution using a list comprehension:

>>> cols_to_exclude = ['x']
>>> print(foo[:, [col for col in foo.names if col not in cols_to_exclude]])
   |     y      z
   | int32  int32
-- + -----  -----
 0 |     4      7
 1 |     5      8
 2 |     6      9
[3 rows x 2 columns]

score 1 · Accepted Answer

You can select data with booleans in datatable, so a list comprehension can generate the booleans needed for the column selection.

In this case, that would be foo[:, ["x" not in name for name in foo.names]]

>>> import datatable as dt
>>> foo = dt.Frame({'x': [1,2,3], 'y': [4,5,6], 'z': [7,8,9]})
>>> foo[:, ["x" not in name for name in foo.names]]
   |     y      z
   | int32  int32
-- + -----  -----
 0 |     4      7
 1 |     5      8
 2 |     6      9
[3 rows x 2 columns]
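Because name is a string, "x" not in name is a substring test rather than an equality test: it also drops any column whose name merely contains the letter x. A plain-Python comparison using hypothetical column names that include "max":

```python
names = ("x", "max", "y")  # hypothetical column names

# Substring test: drops every name that contains "x".
substring_mask = ["x" not in name for name in names]
# Exact comparison: drops only the column named "x".
exact_mask = [name != "x" for name in names]

print(substring_mask)  # [False, False, True]
print(exact_mask)      # [False, True, True]
```

For exact exclusion with a real Frame, foo[:, [name != "x" for name in foo.names]] avoids the pitfall.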
