Questions tagged [pyspark-pandas]
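pyspark - 'DataFrame' object has no attribute 'to_delta'
My code used to work. Why does it no longer work? I upgraded to the newer Databricks Runtime 10.2, so I had to change some of my earlier code to use pandas on PySpark.
The error I get is: 'DataFrame' object has no attribute 'to_delta'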
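A possible cause, offered as a guess since the question does not show the code: `to_delta` is a method of the pandas-on-Spark DataFrame (`pyspark.pandas.DataFrame`), not of a Spark SQL DataFrame or a plain pandas DataFrame. The sketch below is a hypothetical helper (`write_delta` is not part of any library) that picks the write call from the methods the object actually has. It assumes a Delta-enabled cluster and that `path` is a writable location.

```python
def write_delta(df, path, mode="overwrite"):
    """Write `df` as a Delta table, whichever DataFrame flavour it is.

    Hypothetical helper for illustration; returns the branch it took.
    """
    if hasattr(df, "to_delta"):
        # pandas-on-Spark DataFrame (pyspark.pandas): has to_delta directly.
        df.to_delta(path, mode=mode)
        return "pandas-on-spark"
    if hasattr(df, "write"):
        # Spark SQL DataFrame: go through the DataFrameWriter instead.
        df.write.format("delta").mode(mode).save(path)
        return "spark-sql"
    # Anything else (for example a plain pandas DataFrame) must be
    # converted first, e.g. with pyspark.pandas.from_pandas(pdf).
    raise TypeError(
        f"{type(df).__name__} cannot be written to Delta directly; "
        "convert it to a pandas-on-Spark or Spark DataFrame first"
    )
```

On Spark 3.2, which Databricks Runtime 10.2 is based on, a Spark SQL DataFrame can also be converted with `df.to_pandas_on_spark()` before calling `to_delta`. Checking `type(df)` right before the failing line usually shows which kind of DataFrame is involved.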
apache-spark - Pandas UDF for pyspark - Package not found error
I am using the pandas UDF approach to scale my models. However, I am getting an error saying that the pmdarima package cannot be found. The code works fine when I run it in my notebook directly on the pandas DataFrame, so the package is available in the notebook. From a few answers online, the error seems to be that the package is not available on the worker nodes, where the code is being parallelized. Can someone help me resolve this? If that is the case, how can I install the package on my worker nodes as well?
FYI - I am working on Azure Databricks.
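One way to confirm the diagnosis is to test whether the package can be imported on the executors rather than on the driver. This is a minimal sketch: `package_available`, `probe` and `probe_workers` are hypothetical helper names, and `probe_workers` assumes an active `SparkSession` called `spark` with access to `sparkContext`.

```python
import importlib.util
import socket


def package_available(name):
    """Return True if `name` can be found by the current Python process."""
    return importlib.util.find_spec(name) is not None


def probe(name):
    """Return (hostname, available) for the process this runs in."""
    return socket.gethostname(), package_available(name)


def probe_workers(spark, name, n_tasks=32):
    """Run `probe` in n_tasks Spark tasks and collect the distinct results.

    Requires a live SparkSession; not called in this sketch.
    """
    rdd = spark.sparkContext.parallelize(range(n_tasks), n_tasks)
    return rdd.map(lambda _: probe(name)).distinct().collect()
```

In a notebook this would be called as `probe_workers(spark, "pmdarima")`. Any host reporting `False` is missing the package. On Databricks, installing it as a cluster library, or with `%pip install pmdarima` in the notebook, is the usual way to make it available on the workers as well as the driver.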