python - Data storage to ease data interpolation in Python

Question

I have 20+ tables similar to Table 1, where all letters represent actual values.

Table 1:
$ / cars |<1 | 2 | 3 | 4+
<10,000  | a | b | c | d
20,000   | e | f | g | h
30,000   | i | j | k | l
40,000+  | m | n | o | p

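A user input could be, for example, (2.4, 24594), which falls between the values f, g, j, and k. My Python function definition and pseudocode for computing this bilinear interpolation are as follows.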

def bilinear_interpolation( x_in, y_in, x_high, x_low, y_low, y_high ):
   # interpolate with respect to x
   # interpolate with respect to y
   # return result

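How should I store the data from Table 1 (a file, a dict, a tuple of tuples, or a dict of lists) so that I can perform the bilinear interpolation most efficiently and correctly?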

score 7 · Accepted Answer

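If you want the most computationally efficient solution I can think of and are not restricted to the standard library, then I would recommend scipy/numpy. First, store the a..p array as a 2D numpy array, and then store both the $10k-40k and 1-4 arrays as 1D numpy arrays. Use scipy's interpolate.interp1d if both 1D arrays are monotonically increasing, or interpolate.bisplrep (bivariate spline representation) if they are not and your real arrays are as small as your example. Or simply write your own and don't bother with scipy. Here are some examples: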

# this follows your pseudocode most closely, but it is *not*
# the most efficient since it creates the interpolation
# functions on each call to bilinterp
from scipy import interpolate
import numpy
data = numpy.arange(0., 16.).reshape((4, 4))  # 2D array: rows are prices, columns are cars
prices = numpy.arange(10000., 50000., 10000.)
cars = numpy.arange(1., 5.)
def bilinterp(price, car):
    # interpolate along the price axis (rows) first, then along the car axis
    by_price = interpolate.interp1d(prices, data, axis=0)(price)
    return interpolate.interp1d(cars, by_price)(car)
print(bilinterp(22000, 2))

(When I last checked, with a 2007 version of scipy, interp1d only worked for monotonically increasing arrays of x and y.)

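For small arrays like this 4x4 array, I think you want to use this: http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.bisplrep.html#scipy.interpolate.bisplrep which will handle more interestingly shaped surfaces, and the function only needs to be created once. For larger arrays, I think you want this (not sure whether it has the same restrictions as interp1d): http://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.interp2d.html#scipy.interpolate.interp2d but they both require a different and more verbose data structure than the three arrays in the example above.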
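To illustrate the "create the function once" idea with the same three arrays, here is a minimal sketch using scipy.interpolate.RegularGridInterpolator. Note that this is a different SciPy class from the two linked above, swapped in because it accepts the grid axes and the 2D array directly. The placeholder data, the name bilinterp_once, and the clamping of out-of-range inputs to the table edges are assumptions, not something stated in the question.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

prices = np.arange(10000., 50000., 10000.)  # row breakpoints: 10000 .. 40000
cars = np.arange(1., 5.)                    # column breakpoints: 1 .. 4
data = np.arange(0., 16.).reshape((4, 4))   # placeholder numbers standing in for a..p

# Built once; linear interpolation on a 2D regular grid is bilinear interpolation.
_interp = RegularGridInterpolator((prices, cars), data, method="linear")

def bilinterp_once(price, car):
    # Assumption: inputs outside the table are clamped to its edges,
    # mirroring the "<10,000" and "4+" style headers in Table 1.
    price = np.clip(price, prices[0], prices[-1])
    car = np.clip(car, cars[0], cars[-1])
    return float(_interp([[price, car]])[0])

print(bilinterp_once(24594, 2.4))
```

Each call now only evaluates the prebuilt interpolator instead of constructing two interp1d objects.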

score 3 · Accepted Answer

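I'd keep a sorted list of the first column and use the bisect module from the standard library to look up values; it's the best way to get the immediately-lower and immediately-higher indices. Every other column can be kept as another list parallel to that one.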
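A rough sketch of that layout, using only the standard library. The numbers are placeholders for a..p, the names bracket and bilinear are made up for illustration, and clamping out-of-range inputs to the table edges is an assumption about how the "<10,000" and "4+" headers should behave.

```python
from bisect import bisect_right

PRICES = [10000, 20000, 30000, 40000]  # sorted first column
CAR_COUNTS = [1, 2, 3, 4]              # sorted header row
# One list per car count, each parallel to PRICES (placeholder values).
COLUMNS = [
    [0.0, 4.0, 8.0, 12.0],   # 1 car:  a, e, i, m
    [1.0, 5.0, 9.0, 13.0],   # 2 cars: b, f, j, n
    [2.0, 6.0, 10.0, 14.0],  # 3 cars: c, g, k, o
    [3.0, 7.0, 11.0, 15.0],  # 4 cars: d, h, l, p
]

def bracket(breakpoints, value):
    """Return (lower index, fraction) locating value between two breakpoints."""
    # Assumption: values outside the table are clamped to its edges.
    value = min(max(value, breakpoints[0]), breakpoints[-1])
    # bisect_right returns the insertion point; step back one for the lower
    # index and cap it so that low + 1 is always a valid index.
    low = min(bisect_right(breakpoints, value) - 1, len(breakpoints) - 2)
    span = breakpoints[low + 1] - breakpoints[low]
    return low, (value - breakpoints[low]) / span

def bilinear(cars, price):
    col, tx = bracket(CAR_COUNTS, cars)
    row, ty = bracket(PRICES, price)
    lo_col, hi_col = COLUMNS[col], COLUMNS[col + 1]
    # Interpolate along price within each bracketing column, then across columns.
    left = lo_col[row] + (lo_col[row + 1] - lo_col[row]) * ty
    right = hi_col[row] + (hi_col[row + 1] - hi_col[row]) * ty
    return left + (right - left) * tx

print(bilinear(2.4, 24594))
```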

score 0 · Accepted Answer

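There's nothing special about bilinear interpolation that makes your use case particularly odd; you just have to do two lookups (if the storage unit is a full row or column) or four lookups (for array-type storage). The most efficient method depends on your access patterns and the structure of the data.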

If your example is truly representative, with 16 total entries, you can store it however you like and it will be fast enough for any kind of sane load.
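For what the "two lookups" case might look like, here is a small sketch with row-oriented storage: a dict mapping each price breakpoint to a full row. The values are placeholders for a..p, and ROWS, lerp, and interpolate_between_rows are hypothetical names; the caller is assumed to already know the bracketing breakpoints.

```python
CARS = (1, 2, 3, 4)
# Hypothetical row-oriented storage: price breakpoint -> one value per car count.
ROWS = {
    10000: (0.0, 1.0, 2.0, 3.0),
    20000: (4.0, 5.0, 6.0, 7.0),
    30000: (8.0, 9.0, 10.0, 11.0),
    40000: (12.0, 13.0, 14.0, 15.0),
}

def lerp(lo, hi, t):
    # Linear interpolation between lo and hi for t in [0, 1].
    return lo + (hi - lo) * t

def interpolate_between_rows(price, cars, price_lo, price_hi, col):
    # Two storage lookups fetch both bracketing rows; col is the index of
    # the lower car breakpoint within CARS.
    row_lo = ROWS[price_lo]
    row_hi = ROWS[price_hi]
    tx = (cars - CARS[col]) / (CARS[col + 1] - CARS[col])
    ty = (price - price_lo) / (price_hi - price_lo)
    return lerp(lerp(row_lo[col], row_lo[col + 1], tx),
                lerp(row_hi[col], row_hi[col + 1], tx),
                ty)

print(interpolate_between_rows(24594, 2.4, 20000, 30000, 1))
```

With a flat mapping keyed by (price, cars) instead, the same calculation would need four separate lookups, one per corner.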
