python - How to remove duplicates in the Python datatable (h2oai) package

Question

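The Python datatable package (https://github.com/h2oai/datatable/) can count the number of unique values in a column. Is there a way to remove duplicate values with this package, or do I have to fall back on the slower pandas package?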

score 5 · Accepted Answer

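If you want to find the unique values in a single column, you can use the function dt.unique(), which takes a column and returns a new column containing all the unique values from the original column: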

>>> import datatable as dt
>>> DT = dt.Frame(A=[1, 3, 2, 1, 4, 2, 1], B=list("ABCDEFG"))
>>> dt.unique(DT["A"])
   |  A
-- + --
 0 |  1
 1 |  2
 2 |  3
 3 |  4

[4 rows x 1 column]

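On the other hand, if you have a multi-column frame and want to keep only the rows with unique values in one of its columns, that is equivalent to grouping by that column and taking the first row of each group, which can be done like this: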

>>> from datatable import f, by, first
>>> DT[:, first(f[1:]), by(f[0])]
   |  A  B 
-- + --  --
 0 |  1  A 
 1 |  2  C 
 2 |  3  B 
 3 |  4  E 

[4 rows x 2 columns]
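To make it clearer which rows survive, here is a minimal plain-Python sketch of the same "first row per group" idea. It does not use datatable at all, and the helper name first_row_per_key is hypothetical, chosen only for illustration. It assumes the result should be ordered by key, as in the output shown above.

```python
def first_row_per_key(rows, key_index=0):
    """Keep the first row seen for each key, then order the result by key.

    Hypothetical helper for illustration only; not part of datatable.
    """
    seen = {}
    for row in rows:
        key = row[key_index]
        # setdefault stores the row only if this key has not been seen yet
        seen.setdefault(key, row)
    # Sort by key to mirror the ordering of the grouped output shown above
    return [seen[key] for key in sorted(seen)]


# Same data as the DT frame in the answer: column A paired with column B
rows = list(zip([1, 3, 2, 1, 4, 2, 1], "ABCDEFG"))
deduped = first_row_per_key(rows)
print(deduped)  # [(1, 'A'), (2, 'C'), (3, 'B'), (4, 'E')]
```

The later rows with A equal to 1 or 2 (B values D, F and G) are dropped because an earlier row with the same key was already kept.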
