python - Python Datatable/Pydatatable: How to filter rows of a Frame by regex and assign values to a new column based on the filter

Question

I want to assign values to a new column based on a regex match in another column, using python-datatable syntax.

DT[select rows by regex, assign value to new column, ]

import pandas as pd
import datatable as dt
from datatable import f, Frame
import re as re

DT = dt.Frame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})
DT['new_col']=DT[:,f.b]
DT['new_col'] = Frame([re.sub('f.*','words starting with f', s) for s in DT[:, "new_col"].to_list()[0]])
DT.head()
DT['new_col'] = Frame([re.sub('c.*','words starting with c', s) for s in DT[:, "new_col"].to_list()[0]])
DT.head()

Is there another solution within the datatable package that avoids conversions such as "to_list()" (that is, without a loop)?

The regex approach in this related question does not allow operating on a whole column: "Python data.table row filter by regex". This one is for pandas, not datatable: "How to filter rows in pandas by regex".

score 2 · Accepted Answer

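I think you can use the solution below for now. As datatable matures, the tools needed for this will likely be reviewed and added to the library.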

Import the libraries

import pandas as pd
import datatable as dt
from datatable import f,by
import re as re

Create a DT

DT_X = dt.Frame({'a' : [1,2,3,4], 'b' : ['hi', 'foo', 'fat', 'cat']})

and perform the required operation

DT_X[:,f[:].extend({'new_col':dt.Frame([re.sub('f.*','words starting with f', s) for s in DT_X[:, f.b].to_list()[0]])})]

Output:

  |  a  b    new_col              
-- + --  ---  ---------------------
 0 |  1  hi   hi                   
 1 |  2  foo  words starting with f
 2 |  3  fat  words starting with f
 3 |  4  cat  cat
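
The answer above handles only the 'f' pattern, while the question also relabels words starting with 'c'. A minimal sketch of one way to apply both rules in a single pass over the column values is shown below. The names `RULES` and `label_values` are hypothetical helpers introduced here for illustration, not part of datatable. Note that it uses `fullmatch`, so a value is relabelled only when the whole string matches, which differs from `re.sub`, which would also rewrite an 'f' found in the middle of a word.

```python
import re

# Hypothetical rule table: (compiled pattern, replacement label), checked in order.
RULES = [
    (re.compile(r"f.*"), "words starting with f"),
    (re.compile(r"c.*"), "words starting with c"),
]

def label_values(values, rules=RULES):
    """Return a new list in which every value that fully matches a rule's
    pattern is replaced by that rule's label; other values pass through."""
    labelled = []
    for value in values:
        for pattern, label in rules:
            if pattern.fullmatch(value):
                labelled.append(label)
                break
        else:
            # No rule matched: keep the original value.
            labelled.append(value)
    return labelled

# Same sample values as column 'b' in the frames above.
new_col = label_values(['hi', 'foo', 'fat', 'cat'])
print(new_col)
```

The resulting list can then be wrapped and assigned the same way the question does it, for example `DT_X['new_col'] = dt.Frame(label_values(DT_X[:, f.b].to_list()[0]))`. This still converts through `to_list()`, so it only tidies the loop rather than removing it.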
