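I'm using datashader and dask, but I run into a problem when I try to plot while a cluster is running. To make this more concrete, I have the following example (embedded in a bokeh plot):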

import holoviews as hv
import pandas as pd
import dask.dataframe as dd
import numpy as np
from holoviews.operation.datashader import datashade
import datashader.transfer_functions as tf
from dask.distributed import Client, LocalCluster

#initialize the client/cluster
cluster = LocalCluster(n_workers=4, threads_per_worker=1)
dask_client = Client(cluster)


def datashade_plot():
    hv.extension('bokeh')
    #create some random data (in the actual code this is a parquet file with millions of rows, this is just an example)
    delta = 1/1000
    x = np.arange(0, 1, delta)
    y = np.cumsum(np.sqrt(delta)*np.random.normal(size=len(x)))
    df = pd.DataFrame({'X':x, 'Y':y})

    #create dask dataframe
    points_dd = dd.from_pandas(df, npartitions=3)

    #create plot
    points = hv.Curve(points_dd)
    return datashade(points)

dask_client.submit(datashade_plot,).result()

This raises:

TypeError: can't pickle weakref objects

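My theory is that this happens because you can't distribute the datashade operation across the cluster. Sorry if this is a newbie question; I'd really appreciate any advice you can give me.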

1 Answer

I think you want to go the other way around. That is, pass datashader a dask dataframe instead of a pandas dataframe, and let dask distribute the work:

>>> import multiprocessing as mp
>>> import datashader
>>> from dask import dataframe as dd
>>> dask_df = dd.from_pandas(df, npartitions=mp.cpu_count())
>>> dask_df = dask_df.persist()  # persist() returns a new collection, so keep the result
...
>>> cvs = datashader.Canvas(...)
>>> agg = cvs.points(dask_df, ...)

External reference: https://datashader.org/user_guide/Performance.html
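
As for the error itself: a result returned through `Client.submit(...).result()` has to be serialized to travel from the worker back to the client, and a weak reference cannot be pickled. My guess, which I have not verified, is that the object returned by `datashade` holds a weak reference somewhere inside it. The sketch below reproduces the same class of error with the standard library only; `Owner` and `is_picklable` are hypothetical names used just for this illustration.

```python
import pickle
import weakref


class Owner:
    """Hypothetical placeholder; plain object() instances cannot be weakly referenced."""


def is_picklable(obj):
    """Return True if pickle can serialize obj, False if it refuses."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


owner = Owner()
ref = weakref.ref(owner)

print(is_picklable({"a": 1}))      # plain data
print(is_picklable(ref))           # a bare weak reference
print(is_picklable({"ref": ref}))  # a container that merely holds one
```

Any object that holds a weak reference anywhere in its state fails the same way, which would explain why returning the plot object from a worker breaks while the plain dataframe path works.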

Answered 2020-06-19T18:36:22.073