How do I oversample a DataFrame in PySpark?
df.sample(fractions, seed)
only samples a fraction of df; it cannot oversample.
You can oversample with the sample method, like this:
df.sample(withReplacement=True, fraction=total_percent_of_upsample, seed=seed)
The signature is sample(withReplacement, fraction, seed=None).
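withReplacement=True means you sample with replacement, so the same row can be drawn more than once and fraction can be greater than 1 (for example, 2.0 for roughly twice as many rows).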
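To make the call above concrete, here is a minimal sketch of a small wrapper. The helper name oversample and its argument check are my own additions for illustration, not part of PySpark; df is assumed to be a pyspark.sql.DataFrame (anything with a compatible sample method will do).

```python
def oversample(df, fraction, seed=None):
    """Return df sampled with replacement.

    Hypothetical helper, not a PySpark API. `df` is assumed to be a
    pyspark.sql.DataFrame. With replacement, `fraction` is the expected
    number of times each row is drawn, so values above 1.0 oversample.
    """
    if fraction < 0:
        raise ValueError("fraction must be non-negative")
    # Keyword arguments avoid mixing up the positional order of
    # withReplacement, fraction and seed.
    return df.sample(withReplacement=True, fraction=fraction, seed=seed)
```

With a real DataFrame, oversample(df, 3.0, seed=42).count() should come out near three times df.count(), but not exactly, because the sampling is random.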