
Using python-datatable syntax, I want to assign values to a new column based on a regex match in another column.

DT[select rows by regex, assign value to new column, ]

import pandas as pd
import datatable as dt
from datatable import f, Frame
import re as re

DT = dt.Frame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})
DT['new_col']=DT[:,f.b]
DT['new_col'] = Frame([re.sub('f.*','words starting with f', s) for s in DT[:, "new_col"].to_list()[0]])
DT.head()
DT['new_col'] = Frame([re.sub('c.*','words starting with c', s) for s in DT[:, "new_col"].to_list()[0]])
DT.head()

Is there another solution within the datatable package that avoids conversions such as "to_list()" (that is, without a loop)?

The regex result in this question does not allow operating on a whole column: Python data.table row filter by regex. This one is for pandas, not datatable: How to filter rows in pandas by regex


1 Answer


I think for now you can stick with your solution. As datatable grows, the tools needed for this will be looked at and added to it.

Import the libraries

import pandas as pd
import datatable as dt
from datatable import f,by
import re as re

Create a DT

DT_X = dt.Frame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})

And perform the required operation

DT_X[:,f[:].extend({'new_col':dt.Frame([re.sub('f.*','words starting with f', s) for s in DT_X[:, f.b].to_list()[0]])})]

Output:

  |  a  b    new_col              
-- + --  ---  ---------------------
 0 |  1  hi   hi                   
 1 |  2  foo  words starting with f
 2 |  3  fat  words starting with f
 3 |  4  cat  cat
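
Since the question applies two patterns in two separate passes, it may help to fold them into a single pass over the column. The following is only a sketch in plain Python: the helper name relabel and the rules list are hypothetical, not part of datatable. It also assumes that "words starting with f" means the pattern must match at the start of the string, so it uses re.match and replaces the whole value, which differs from the unanchored re.sub calls above for strings such as 'wolf'.

```python
import re

def relabel(values, rules):
    """Replace each string by the label of the first rule whose pattern
    matches at the start of the string; leave other values unchanged.

    `relabel` and `rules` are hypothetical names used for illustration.
    """
    compiled = [(re.compile(pattern), label) for pattern, label in rules]
    result = []
    for value in values:
        if value is None:
            # Missing values come back from to_list() as None; keep them.
            result.append(None)
            continue
        for pattern, label in compiled:
            if pattern.match(value):  # anchored at the start of the string
                result.append(label)
                break
        else:
            # No rule matched: keep the original string.
            result.append(value)
    return result

rules = [
    (r'f.*', 'words starting with f'),
    (r'c.*', 'words starting with c'),
]
new_col = relabel(['hi', 'foo', 'fat', 'cat'], rules)
print(new_col)
```

With a Frame, the resulting list can be assigned the same way as in the question, for example DT_X['new_col'] = dt.Frame(relabel(DT_X[:, f.b].to_list()[0], rules)). This still converts through to_list(), so it only tidies the loop rather than removing it.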
Answered 2020-06-19T09:22:18.497