
I'm looking for the equivalent of Excel's VLOOKUP function. I have a script that reads in a csv file, and I would like to be able to look up the associated value from another column in the .csv. Script so far:

import matplotlib
import matplotlib.mlab as mlab
import glob

for files in glob.glob("*.csv"):
    print(files)

    r = mlab.csv2rec(files)
    r.cols = r.dtype.names

    depVar = r[r.cols[0]]
    indVar = r[r.cols[1]]
    print(indVar)

This will read in from .csv files in the same folder the script is in. In the above example depVar is the first column in the .csv, and indVar is the second column. In my case, I know a value for indVar, and I want to return the associated value for depVar. I'd like to add a command like:

depVar = r[r.cols[0]]
indVar = r[r.cols[1]]
print(indVar)
depVarAt5 = lookup value in depVar where indVar = 5 (I could sub in things for the 5 later)

In my case, all values in all fields are numbers and all of the values of indVar are unique. I want to be able to define a new variable (depVarAt5 in last example) equal to the associated value.

Here are example .csv contents; name the file anything and place it in the same folder as the script. In this example, depVarAt5 should be set equal to 16.1309.

Temp,Depth
16.1309,5
16.1476,94.4007
16.2488,100.552
16.4232,106.573
16.4637,112.796
16.478,118.696
16.4961,124.925
16.5105,131.101
16.5462,137.325
16.7016,143.186
16.8575,149.101
16.9369,155.148
17.0462,161.187

3 Answers


I think this solves your problem directly:

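import numpy
import glob

for f in glob.glob("*.csv"):
    print(f)

    # recfromcsv lowercases the header names, hence r.depth and r.temp
    r = numpy.recfromcsv(f)
    # linear interpolation of temp at depth 5; depth must be increasing
    print(numpy.interp(5, r.depth, r.temp))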

I'm pretty sure numpy is a prerequisite for matplotlib.

answered 2013-06-01T06:55:42.057

Not sure what that r object is, but since it has a member called cols, I'm going to assume it also has a member called rows which contains the row data.

>>> r.rows
[[16.1309, 5], [16.1476, 94.4007], ...]

In that case, your pseudocode very nearly contains a valid generator expression/list comprehension.

depVarAt5 = lookup value in depVar where indVar = 5 (I could sub in things for the 5 later)

becomes

depVarAt5 = [row[0] for row in r.rows if row[1] == 5]

Or, more generally

depVarValue = [row[depVarColIndex] for row in r.rows if row[indVarColIndex] == searchValue]

so

def vlookup(rows, searchColumn, dataColumn, searchValue):
    return [row[dataColumn] for row in rows if row[searchColumn] == searchValue]

Throw a [0] on the end of that if you can guarantee there will be exactly one output per input.
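
A minimal sketch of the single-result case, assuming rows is a plain list of [depVar, indVar] pairs as guessed above. The name vlookup_one and its default parameter are hypothetical, not part of any library; next() on a generator is swapped in for the [0] suffix so that a missing value returns a default instead of raising IndexError.

```python
def vlookup_one(rows, search_column, data_column, search_value, default=None):
    """Return the first matching value, or `default` if nothing matches."""
    matches = (row[data_column] for row in rows
               if row[search_column] == search_value)
    return next(matches, default)


# Sample rows copied from the question's CSV: [Temp, Depth]
rows = [[16.1309, 5], [16.1476, 94.4007], [16.2488, 100.552]]

dep_var_at_5 = vlookup_one(rows, 1, 0, 5)
missing = vlookup_one(rows, 1, 0, 42)
print(dep_var_at_5)
print(missing)
```

Whether a silent default or an exception is preferable depends on how the caller wants to treat a value that is not in the file.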

There's also a csv module in the Python standard library which you might prefer to work with. =)
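
As a rough sketch of the csv-module route, assuming the file has a header row and purely numeric fields as in the question. The helper load_rows is a hypothetical name, and an in-memory io.StringIO stands in for an open file so the example is self-contained.

```python
import csv
import io

# First rows of the question's sample file.
SAMPLE = """Temp,Depth
16.1309,5
16.1476,94.4007
16.2488,100.552
"""


def load_rows(handle):
    """Read a CSV with a header row and return (header, rows of floats)."""
    reader = csv.reader(handle)
    header = next(reader)  # first line holds the column names
    rows = [[float(cell) for cell in line] for line in reader if line]
    return header, rows


header, rows = load_rows(io.StringIO(SAMPLE))

# Build a Depth -> Temp mapping; valid because the Depth values are unique.
temp_by_depth = {row[1]: row[0] for row in rows}
print(temp_by_depth[5.0])
```

For a real file, passing open(path, newline="") instead of the StringIO object should work the same way. A dict makes each later lookup a single key access, which suits the case where several values are queried from one file.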

answered 2013-06-01T02:54:24.480

For arbitrary ordering and exact matches, you can call indVar.index() and use the returned index to index into depVar.
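
A small illustration of the exact-match approach, assuming the columns are plain Python lists. A NumPy array has no .index() method, so columns read through matplotlib or NumPy would first need converting with .tolist(). The values below are copied from the question's sample data, and the snake_case names are illustrative.

```python
# Columns from the question's sample CSV, as plain lists.
dep_var = [16.1309, 16.1476, 16.2488]   # Temp
ind_var = [5.0, 94.4007, 100.552]       # Depth

# list.index() returns the position of the first exact match
# and raises ValueError if the value is absent.
position = ind_var.index(5)
dep_var_at_5 = dep_var[position]
print(dep_var_at_5)
```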

If indVar is sorted and (well, "or", sort of) you need the closest match, then you should look at using bisect on indVar.
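
One way to sketch the closest-match lookup with bisect, assuming ind_var is a non-empty list sorted in ascending order. The helper closest_lookup is a hypothetical name, and sending ties to the lower neighbour is an arbitrary choice made for this example.

```python
import bisect


def closest_lookup(ind_var, dep_var, target):
    """Return the dep_var entry whose ind_var value is nearest to target.

    Assumes ind_var is sorted ascending and both lists have equal length.
    """
    pos = bisect.bisect_left(ind_var, target)
    if pos == 0:
        return dep_var[0]
    if pos == len(ind_var):
        return dep_var[-1]
    before, after = ind_var[pos - 1], ind_var[pos]
    # Pick whichever neighbour is nearer; ties go to the lower one.
    if target - before <= after - target:
        return dep_var[pos - 1]
    return dep_var[pos]


ind_var = [5.0, 94.4007, 100.552, 106.573]      # Depth, sorted
dep_var = [16.1309, 16.1476, 16.2488, 16.4232]  # Temp

print(closest_lookup(ind_var, dep_var, 5))
print(closest_lookup(ind_var, dep_var, 99.0))
```

The binary search keeps each lookup at O(log n), which only pays off over list.index() when the column is long or queried many times.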

answered 2013-06-01T07:27:24.383