python-polars - Cluster rows with the same value without sorting

Question

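Sorting by particular columns brings together all rows that share the same tuple of values in those columns. I want to cluster all rows with the same value in this way, but keep the groups in the order in which their first members appear.

Something like this: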

import polars as pl

df = pl.DataFrame(dict(x=[1,0,1,0], y=[3,1,2,4]))

df.cluster('x')  # hypothetical method, shown only to illustrate the desired result
# shape: (4, 2)
# ┌─────┬─────┐
# │ x   ┆ y   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1   ┆ 3   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 1   ┆ 2   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 0   ┆ 1   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 0   ┆ 4   │
# └─────┴─────┘

score 0 · Accepted Answer

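This can be done by:

1. Temporarily storing the row index in a column
2. Replacing each row's index with the minimum index over a window on the column(s) of interest
3. Sorting by that minimum index
4. Dropping the temporary row index column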
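To see why these steps work, the same logic can be sketched in plain Python, independent of Polars. This is only an illustration: the helper name `cluster_by_first_appearance` and the tuple-based rows are assumptions made for the example, not part of any library.

```python
def cluster_by_first_appearance(rows, key):
    """Group rows with equal keys together, keeping groups in first-seen order."""
    # Record the index at which each key value first appears
    # (the "minimum row index" of its group).
    first_index = {}
    for i, row in enumerate(rows):
        first_index.setdefault(key(row), i)
    # sorted() is stable, so rows within a group keep their original order.
    return sorted(rows, key=lambda row: first_index[key(row)])


rows = [(1, 3), (0, 1), (1, 2), (0, 4)]  # (x, y) pairs from the question
clustered = cluster_by_first_appearance(rows, key=lambda row: row[0])
print(clustered)  # [(1, 3), (1, 2), (0, 1), (0, 4)]
```

The stability of the sort matters here: every row in a group carries the same sort key, so an unstable sort could shuffle rows within a group.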

import polars as pl

df = pl.DataFrame(dict(x=[1,0,1,0], y=[3,1,2,4]))

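(
df
  # 1. store the row index in a temporary column
  .with_column(pl.arange(0, pl.count()).alias('_index'))
  # 2. replace it with the smallest index within each group of 'x'
  .with_column(pl.min('_index').over('x'))
  # 3. sort by that per-group minimum index
  .sort('_index')
  # 4. drop the temporary column
  .drop('_index')
)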
# shape: (4, 2)
# ┌─────┬─────┐
# │ x   ┆ y   │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1   ┆ 3   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 1   ┆ 2   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 0   ┆ 1   │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 0   ┆ 4   │
# └─────┴─────┘
