python - Switching column positions in a Python datatable

Question

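What is the most efficient way to switch the positions of two columns in a Python datatable? I wrote the function below, which does what I want, but it may not be the best approach, especially if my actual table is large. Is it possible to do this in place? Am I missing something obvious?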

from datatable import Frame
dat = Frame(a=[1,2,3],b=[4,5,6],c=[7,8,9])

def switch_cols(data,col1,col2):
    data_n = list(data.names)
    data_n[data.colindex(col1)], data_n[data.colindex(col2)] =  data_n[data.colindex(col2)], data_n[data.colindex(col1)]
    return data[:, data_n]

dat = switch_cols(dat, "c","a")

   |     c      b      a
   | int32  int32  int32
-- + -----  -----  -----
 0 |     7      4      1
 1 |     8      5      2
 2 |     9      6      3
[3 rows x 3 columns]
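
The name-swapping step in the function above can be isolated from the frame itself. The following is a minimal sketch in plain Python; `swapped_names` is a hypothetical helper name (not part of datatable) that works on any sequence of column names and looks up each position only once.

```python
def swapped_names(names, col1, col2):
    """Return a list of the given names with col1 and col2 exchanged.

    Hypothetical helper for illustration; it only manipulates names.
    """
    order = list(names)
    # Look up each position once; list.index raises ValueError for an unknown name.
    i, j = order.index(col1), order.index(col2)
    order[i], order[j] = order[j], order[i]
    return order


print(swapped_names(("a", "b", "c"), "c", "a"))  # ['c', 'b', 'a']
```

With the frame from the question, the result could then be used in the same kind of selection, e.g. `dat[:, swapped_names(dat.names, "c", "a")]`.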

For comparison, in R we can do this:

dat = data.table(a=c(1,2,3), b=c(4,5,6), c=c(7,8,9))
switch_cols <- function(data,col1,col2) {
  indexes = which(names(data) %in% c(col1,col2))
  datn = names(data)
  datn[indexes] <- datn[c(indexes[2], indexes[1])]
  return(datn)
}

Then we can change the order of the two columns in place like this:

setcolorder(dat, switch_cols(dat,"a","c"))

Note that assigning values to each column is not what I want. Consider this example in R. I construct a large data.table as follows:

dat = data.table(
  x = rnorm(10000000),
  y = sample(letters, 10000000, replace = T)
)

I make two copies of this data.table, d and e:

e = copy(dat)
d = copy(dat)

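Then I compare these two in-place operations:

setcolorder (simply re-indexes the positions of the two columns in the data.table)
:= (reassigns the two columns)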

microbenchmark::microbenchmark(
  list=alist("setcolorder" =  setcolorder(d, c("y", "x")),
             "`:=`" = e[,`:=`(x=y, y=x)]),
  times=1)

Unit: microseconds
        expr     min      lq    mean  median      uq     max neval
 setcolorder    81.5    81.5    81.5    81.5    81.5    81.5     1
        `:=` 53691.1 53691.1 53691.1 53691.1 53691.1 53691.1     1

As expected, setcolorder is the right way to switch column positions in an R data.table. I am looking for a similar approach in Python.
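
The distinction drawn above, reordering versus reassigning, can be illustrated with a rough analogy in plain Python. This sketch uses an ordinary dict of lists as a stand-in for a table; that stand-in is an assumption made purely for illustration and says nothing about how datatable or data.table store columns internally.

```python
# A plain dict of lists stands in for a two-column table (illustrative assumption).
columns = {"x": [0.1, 0.2, 0.3], "y": ["a", "b", "c"]}

# Reordering: build a new mapping in a different key order.
# The column data is reused, not copied.
reordered = {name: columns[name] for name in ("y", "x")}

# Reassigning: each column's values are copied into a fresh list,
# so the work grows with the number of rows.
reassigned = {"x": list(columns["y"]), "y": list(columns["x"])}

print(list(reordered))                   # ['y', 'x']
print(reordered["x"] is columns["x"])    # True: same list object
print(reassigned["x"] is columns["y"])   # False: a new copy
```

In this toy model the reorder touches only the names, while the reassignment touches every value, which matches the direction of the timing difference in the benchmark.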

score 1 · Accepted Answer

After checking the documentation, I found a way:

from datatable import Frame,f,update
dat = Frame(a=[1,2,3],b=[4,5,6],c=[7,8,9])

dat[:,update(a = f.c, c = f.a)]

In R, you can do this similarly:

dat[,`:=`(a = c, c = a)]

score 1 · Accepted Answer

After some thought and some timing, I found that the best approach is:

from datatable import Frame
dat = Frame(a=[1,2,3],b=[4,5,6],c=[7,8,9])

   |     a      b      c
   | int32  int32  int32
-- + -----  -----  -----
 0 |     1      4      7
 1 |     2      5      8
 2 |     3      6      9
[3 rows x 3 columns]



def switch_cols(data,col1,col2):
    return data[:, [col1 if c==col2 else col2 if c==col1 else c for c in data.names]]

switch_cols(dat, "a","c")

   |     c      b      a
   | int32  int32  int32
-- + -----  -----  -----
 0 |     7      4      1
 1 |     8      5      2
 2 |     9      6      3
[3 rows x 3 columns]
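
The same select-by-name-list idea extends beyond swapping two columns. As a sketch only, the hypothetical helper `move_to_front` below builds a name order in which the listed columns come first and the rest keep their original order, loosely mirroring how R's setcolorder treats a partial column list. It is plain Python and does not call datatable.

```python
def move_to_front(names, front):
    """Return names reordered so that `front` comes first.

    Hypothetical helper for illustration; the remaining names keep
    their original relative order.
    """
    front = list(front)
    missing = [c for c in front if c not in names]
    if missing:
        raise KeyError(f"unknown column(s): {missing}")
    chosen = set(front)
    return front + [c for c in names if c not in chosen]


print(move_to_front(("a", "b", "c"), ["c"]))       # ['c', 'a', 'b']
print(move_to_front(("a", "b", "c"), ["c", "a"]))  # ['c', 'a', 'b']
```

The resulting list could be passed to a selection such as `dat[:, move_to_front(dat.names, ["c"])]`, in the same way as the list built inside `switch_cols`.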
