Questions tagged [pyspark-pandas]


0 votes
1 answer
52 views

pyspark - 'DataFrame' object has no attribute 'to_delta'

My code used to work. Why doesn't it work anymore? I upgraded to the newer Databricks Runtime 10.2, so I had to change some earlier code to use the pandas API on PySpark.

The error I get is: 'DataFrame' object has no attribute 'to_delta'
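One likely cause (an assumption, since the question does not show the code) is calling `to_delta` on a plain Spark SQL DataFrame. `to_delta` is a method of pandas-on-Spark DataFrames (`pyspark.pandas.DataFrame`); a `pyspark.sql.DataFrame` has no such attribute and writes Delta through `df.write.format("delta")` instead. The helper below, `save_as_delta` (a hypothetical name), is a minimal sketch that picks the right call by duck typing, so it works whichever kind of DataFrame the updated code ends up holding.

```python
def save_as_delta(df, path, mode="overwrite"):
    """Write `df` to a Delta table at `path` (hypothetical helper).

    Assumption: pandas-on-Spark DataFrames expose `to_delta`, while plain
    Spark SQL DataFrames do not and must go through the DataFrameWriter.
    """
    if hasattr(df, "to_delta"):
        # pyspark.pandas.DataFrame: to_delta accepts Spark save modes such as "overwrite"
        df.to_delta(path, mode=mode)
    else:
        # pyspark.sql.DataFrame: use the DataFrameWriter API with the Delta format
        df.write.format("delta").mode(mode).save(path)
```

To stay in the pandas API instead, a Spark DataFrame can be converted first: `sdf.to_pandas_on_spark()` on Spark 3.2 (the Spark line behind Databricks Runtime 10.x), or `sdf.pandas_api()` on Spark 3.3 and later. The converted object then has `to_delta`.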

0 votes
0 answers
11 views

pyspark - Split multiple array columns into rows using PySpark

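I have a DataFrame with one row and several columns. Some columns hold single values, while others hold lists, and all the list columns have the same length. I want to split each list column into separate rows while keeping every non-list column.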

0 votes
0 answers
12 views

apache-spark - Pandas UDF for pyspark - Package not found error

I am using the pandas UDF approach to scale my models. However, I get an error saying the pmdarima package is not found. The code works fine when I run it in my notebook on the pandas DataFrame itself, so the package is available in the notebook. Based on a few answers online, the problem seems to be that the package is not available on the worker nodes across which the code is parallelized. Can someone help me resolve this? If that is the case, how can I also install the package on my worker nodes?

FYI - I am working on Azure Databricks.