python - Getting data from a .csv file

Question

I'm working on a Python project that has a .csv file like this:

freq,ae,cl,ota
825,1,2,3
835,4,5,6
850,10,11,12
880,22,23,24
910,46,47,48
960,94,95,96
1575,190,191,192
1710,382,383,384
1750,766,767,768

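I need to fetch data from the file quickly at run time.
For example:

I'm sampling at a frequency of 880 MHz, and I want to do some calculations on the samples using the data in the 880 row of the .csv file.

I do this by using the freq column as the index and then looking up the data with the sampled frequency. The tricky part is that if I sample at 900 MHz, I get an error. I would like it to take the nearest data below and above, in this case 880 and 910, and from these two rows I would make a linear estimate of the data at 900 MHz.

My main problem is how to search the data quickly and, if there is no perfect match, how to get the two nearest rows?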

score 3 · Accepted Answer

Take the row/Series before and the row after:

In [11]: before, after = df1.loc[:900].iloc[-1], df1.loc[900:].iloc[0]

In [12]: before
Out[12]:
ae     22
cl     23
ota    24
Name: 880, dtype: int64

In [13]: after
Out[13]:
ae     46
cl     47
ota    48
Name: 910, dtype: int64

Put an empty row in the middle and interpolate (edit: the default interpolation just takes the average of the two, so we need to set method='values'):

In [14]: sandwich = pd.DataFrame([before, pd.Series(name=900), after])

In [15]: sandwich
Out[15]:
     ae  cl  ota
880  22  23   24
900 NaN NaN  NaN
910  46  47   48

In [16]: sandwich.apply(lambda col: col.interpolate(method='values'))
Out[16]:
     ae  cl  ota
880  22  23   24
900  38  39   40
910  46  47   48

In [17]: sandwich.apply(lambda col: col.interpolate(method='values')).loc[900]
Out[17]:
ae     38
cl     39
ota    40
Name: 900, dtype: float64
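
As a sanity check on the values above, here is a minimal sketch of the same linear estimate done by hand from the 880 and 910 rows. The variable names (`x0`, `x1`, `before`, `after`, `estimate`) are only illustrative and not part of the answer's code.

```python
# Hand check of the 900 MHz row using the two neighbouring rows from the CSV.
x0, x1, x = 880, 910, 900
before = {"ae": 22, "cl": 23, "ota": 24}   # row at 880
after = {"ae": 46, "cl": 47, "ota": 48}    # row at 910

# Standard linear interpolation: y0 + (y1 - y0) * (x - x0) / (x1 - x0)
estimate = {
    k: before[k] + (after[k] - before[k]) * (x - x0) / (x1 - x0)
    for k in before
}
print(estimate)  # ae, cl and ota come out at 38, 39 and 40 (as floats)
```

Since 900 sits two thirds of the way from 880 to 910, each value moves two thirds of the way from the lower row to the upper row.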

Note:

df1 = pd.read_csv(csv_location).set_index('freq')

You could wrap this in a function of some kind:

def interpolate_for_me(df, n):
    if n in df.index:
        return df.loc[n]
    before, after = df.loc[:n].iloc[-1], df.loc[n:].iloc[0]
    sandwich = pd.DataFrame([before, pd.Series(name=n), after])
    return sandwich.apply(lambda col: col.interpolate(method='values')).loc[n]
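
If building the intermediate frame feels heavy, one possible alternative (not what this answer uses) is `numpy.interp`, which does piecewise linear interpolation over a sorted x axis. The sketch below is self-contained and inlines the question's data; `CSV_TEXT` and `interpolate_row` are hypothetical names chosen for illustration. Note the assumption that the index is sorted in increasing order, and that `np.interp` clamps to the first or last row outside the tabulated range instead of raising an error.

```python
import io

import numpy as np
import pandas as pd

# Sample data copied from the question, inlined so the sketch runs on its own.
CSV_TEXT = """freq,ae,cl,ota
825,1,2,3
835,4,5,6
850,10,11,12
880,22,23,24
910,46,47,48
960,94,95,96
1575,190,191,192
1710,382,383,384
1750,766,767,768
"""

df = pd.read_csv(io.StringIO(CSV_TEXT)).set_index("freq")


def interpolate_row(df, n):
    """Hypothetical helper: linearly interpolate every column at index value n."""
    x = df.index.to_numpy()  # must be increasing for np.interp
    return pd.Series(
        {col: np.interp(n, x, df[col].to_numpy()) for col in df.columns},
        name=n,
    )


row = interpolate_row(df, 900)
print(row)  # ae, cl, ota should be 38, 39, 40
```

An exact match such as 880 simply returns that row's values as floats.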

score 0 · Accepted Answer


The bisect module will perform bisection within a sorted sequence.

answered 2013-05-17T17:26:04.253
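
A minimal sketch of how that could look for the frequencies in the question, assuming the list is already sorted. The helper name `neighbours` is made up for illustration; only `bisect.bisect_left` comes from the standard library.

```python
import bisect

# Frequencies from the question's CSV; bisect requires them to be sorted.
freqs = [825, 835, 850, 880, 910, 960, 1575, 1710, 1750]


def neighbours(freqs, value):
    """Hypothetical helper: return the (lower, upper) frequencies bracketing value."""
    if not freqs[0] <= value <= freqs[-1]:
        raise ValueError("value is outside the tabulated range")
    i = bisect.bisect_left(freqs, value)  # first index with freqs[i] >= value
    if freqs[i] == value:
        return freqs[i], freqs[i]         # exact match
    return freqs[i - 1], freqs[i]


print(neighbours(freqs, 900))  # (880, 910)
```

The lookup is a binary search, so it takes O(log n) comparisons rather than scanning every row.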

score 0 · Accepted Answer

import csv
import bisect

def interpolate_data(data, value):
    # Only interpolate inside the range covered by the data;
    # outside it the function falls through and returns None.
    if data[0][0] <= value <= data[-1][0]:
        # bisect_left gives the index of the first row with freq >= value.
        pos = bisect.bisect_left([x[0] for x in data], value)
        if data[pos][0] == value:
            return data[pos]  # exact match: return the whole row
        else:
            prev = data[pos-1]
            curr = data[pos]
            # fraction of the way from prev to curr
            factor = (value-prev[0])/(curr[0]-prev[0])
            return [value]+[p+(c-p)*factor for p, c in zip(prev[1:], curr[1:])]

with open("data.csv", newline="") as csvfile:
    f = csv.reader(csvfile)
    next(f) # skip the header
    data = [[float(x) for x in row] for row in f] # convert all to float

# test value 1200:
interpolate_data(data, 1200)
# ≈ [1200, 131.46, 132.46, 133.46] (linear between the 960 and 1575 rows)

Works for me and is pretty easy to understand.
