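After looking for the best solution, I decided to rely on a workaround: encoding the images as data/bytes. This sidesteps the problem of reading URLs from Google Drive, for which I couldn't find a solution. As I suspected, encoding the images as data/bytes makes the HTML larger; surprisingly (to me), though, loading is not slow at all. I guess this is the best I can do.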
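In the example below, the get_data function obtains the data/bytes of an image. I put the result into a column of the dataframe that is passed to Altair as input.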
def plot_(images_from):
    import os
    import altair as alt
    import pandas as pd
    import numpy as np

    np.random.seed(0)
    n_objects = 20
    n_times = 50

    # Create one (x, y) pair of metadata per object
    locations = pd.DataFrame({
        'id': range(n_objects),
        'x': np.random.randn(n_objects),
        'y': np.random.randn(n_objects)
    })

    def get_data(p):
        # Read the file at path p and return it as a base64 data URI.
        # The sample images are PNGs, so the MIME type is image/png.
        import base64
        with open(p, "rb") as f:
            return "data:image/png;base64," + base64.b64encode(f.read()).decode()

    import urllib.request
    names = ['ffox', '7zip', 'gimp']
    if images_from == 'url':
        # Reference the images by URL
        l1 = [f"https://vega.github.io/vega-datasets/data/{k}.png" for k in names]
    elif images_from == 'data':
        # Download each image to /tmp, then embed its bytes as a data URI
        l1 = [get_data(urllib.request.urlretrieve(
                  f"https://vega.github.io/vega-datasets/data/{k}.png",
                  f'/tmp/{k}.png')[0])
              for k in names]
    else:
        raise ValueError("images_from must be 'url' or 'data'")
    np.random.seed(0)
    locations['img'] = np.random.choice(l1, size=len(locations))

    # Create a 50-element time-series for each object
    timeseries = pd.DataFrame(np.random.randn(n_times, n_objects).cumsum(0),
                              columns=locations['id'],
                              index=pd.RangeIndex(0, n_times, name='time'))

    # Melt the wide-form timeseries into a long-form view
    timeseries = timeseries.reset_index().melt('time')

    # Merge the (x, y) metadata into the long-form view
    timeseries['id'] = timeseries['id'].astype(int)  # make merge not complain
    data = pd.merge(timeseries, locations, on='id')

    # Data is prepared, now make a chart
    # (Altair 4 API; in Altair 5 these are alt.selection_point and add_params)
    selector = alt.selection_single(empty='none', fields=['id'])

    base = alt.Chart(data).properties(
        width=250,
        height=250
    ).add_selection(selector)

    points = base.mark_point(filled=True, size=200).encode(
        x='mean(x)',
        y='mean(y)',
        color=alt.condition(selector, 'id:O', alt.value('lightgray'), legend=None),
    )

    timeseries = base.mark_line().encode(
        x='time',
        y=alt.Y('value', scale=alt.Scale(domain=(-15, 15))),
        color=alt.Color('id:O', legend=None)
    ).transform_filter(
        selector
    )

    # Show the image of the selected object; the 'img' column holds a URL or a data URI
    images = base.mark_image(filled=True, size=200).encode(
        x='x',
        y='y',
        url='img',
    ).transform_filter(
        selector
    )

    chart = points | timeseries | images
    os.makedirs('test', exist_ok=True)  # make sure the output directory exists
    chart.save(f'test/chart_images_{images_from}.html')

# generate htmls
plot_(images_from='url')   # generate the HTML using URLs
plot_(images_from='data')  # generate the HTML using data/bytes
The HTML generated with the data/bytes is about 78 times larger than the one generated with URLs (~12 MB vs ~0.16 MB), but it is not noticeably slower.
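To get a feel for where the extra size comes from, here is a rough sketch using only the standard library. It assumes the chart data is serialized row by row (as with the long-form dataframe after the merge), so every row carries its own copy of the image string. The payload size, the row count, and the example.com URL are made-up illustrative values, not measurements from the charts above, and to_data_uri is a hypothetical helper, not part of Altair.

```python
import base64
import json
import math

PREFIX = "data:image/png;base64,"

def to_data_uri(raw: bytes) -> str:
    """Encode raw image bytes as a base64 PNG data URI."""
    return PREFIX + base64.b64encode(raw).decode("ascii")

# Hypothetical 3000-byte payload standing in for a real PNG file.
raw = b"\x00" * 3000
uri = to_data_uri(raw)

# Base64 maps every 3 input bytes to 4 output characters (with padding).
b64_len = len(uri) - len(PREFIX)
expected_b64_len = 4 * math.ceil(len(raw) / 3)

# Assumed layout: 1000 rows, each repeating the image string.
n_rows = 1000
rows_data = [{"id": i, "img": uri} for i in range(n_rows)]
rows_url = [{"id": i, "img": "https://example.com/a.png"} for i in range(n_rows)]

size_data = len(json.dumps(rows_data))
size_url = len(json.dumps(rows_url))
ratio = size_data / size_url

print(f"base64 length: {b64_len} chars for {len(raw)} bytes")
print(f"serialized size with data URIs: {size_data} chars")
print(f"serialized size with URLs:      {size_url} chars")
print(f"ratio: {ratio:.1f}x")
```

If this assumption holds, most of the growth comes from repeating each encoded image once per row rather than from the base64 overhead itself, which is only about a third.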
Update: I later found out that Google Sites does not allow embedding HTML files larger than 1 MB. So in the end, encoding the images did not really help.
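Given that limit, it may be worth checking the size of a generated file before trying to embed it. The following is a minimal sketch; fits_size_limit and write_temp_html are hypothetical helpers, the exact byte threshold is an assumption (only "1 MB" is known), and the two file sizes are placeholders rather than the real chart files.

```python
import os
import tempfile

# Assumed threshold: only "1 MB" is known; whether that means 1,000,000 or
# 1,048,576 bytes is not, so the stricter value is used here.
SIZE_LIMIT_BYTES = 1_000_000

def fits_size_limit(path, limit_bytes=SIZE_LIMIT_BYTES):
    """Return True if the file at `path` is no larger than `limit_bytes`."""
    return os.path.getsize(path) <= limit_bytes

def write_temp_html(n_bytes):
    """Write a throwaway file of n_bytes bytes and return its path (demo helper)."""
    with tempfile.NamedTemporaryFile("wb", suffix=".html", delete=False) as f:
        f.write(b"x" * n_bytes)
        return f.name

# Placeholder sizes: one file under the limit, one over it.
small = write_temp_html(160_000)
large = write_temp_html(2_000_000)

small_ok = fits_size_limit(small)
large_ok = fits_size_limit(large)

# Clean up the demo files.
os.remove(small)
os.remove(large)

print("small file fits:", small_ok)
print("large file fits:", large_ok)
```

In practice one would call fits_size_limit on the files written by chart.save, for example 'test/chart_images_data.html'.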