I am new to Spark. I need to execute a function myfunc()
in parallel and then append all of the generated dataframes.
Currently I am using a for loop, which I believe runs sequentially. How can I improve it?
import databricks.koalas as ks

appended_data = []
for path in paths_list:
    data = myfunc(path)
    appended_data.append(data)
appended_data = ks.concat(appended_data)
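One driver-side approach I have seen (just a sketch, not tested against my real workload) is to submit the myfunc() calls concurrently with Python's ThreadPoolExecutor, so multiple Spark jobs can be in flight at once, and then concatenate the results as before. The myfunc and paths_list below are placeholders standing in for my actual function and paths:

```python
from concurrent.futures import ThreadPoolExecutor

# Placeholder for the real myfunc(path); assume it returns a dataframe.
def myfunc(path):
    return [path]

paths_list = ["a", "b", "c"]

# Submit all calls concurrently; pool.map preserves input order,
# so the results line up with paths_list.
with ThreadPoolExecutor(max_workers=4) as pool:
    appended_data = list(pool.map(myfunc, paths_list))

# With koalas this would then be: appended_data = ks.concat(appended_data)
```

Is something like this the right direction, or is there a more idiomatic Spark way?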