
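I'm writing Python code for a project that has to do a few things: 1) read data from an xls file column by column, 2) average the columns of each row in groups of three, 3) then average the resulting columns.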

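I've got 1 and 2 working but can't seem to get 3. I think a lot of my trouble comes from using floats, but I need to keep the numbers to 6 decimal places. Any help and patience is appreciated; I'm very new to Python.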

v = open("Pt_2_Test_Data.xls", 'wb') #created file to write output to
w = open("test2.xls")

count = 0

for row in w: #read in file
    for line in w:
        columns = line.split("\t") #split up into columns
        date = columns[0]
        time = columns[1]
        a = columns[2]
        b = columns[3]
        c = columns[4]
        d = columns[5]
        e = columns[6]
        f = columns[7]
        g = columns[8]
        h = columns[9]
        i = columns[10]
        j = columns[11]
        k = columns[12]
        l = columns[13]
        m = columns[14]
        n = columns[15]
        o = columns[16]
        p = columns[17]
        q = columns[18]
        r = columns[19]
        s = columns[20]
        t = columns[21]
        u = columns[22]
        LZA = columns[23]
        SZA = columns[24]
        LAM = columns[25]

        count += 1

        A = 0
        if count != 0:  # gets rid of column titles
            filter1 = ((float(a) + float(b) + float(c))/3)
            filter1 = ("%.6f" %A)
            filter2 =  (float(d) + float(e) + float(f))/3
            filter2 = ("%.6f" %filter2)
            filter3 =  (float(g) + float(h) + float(i))/3
            filter3 = ("%.6f" %filter3)
            filter4 =  (float(j) + float(k) + float(l))/3
            filter4 = ("%.6f" %filter4)
            filter5 =  (float(m) + float(n) + float(o))/3
            filter5 = ("%.6f" %filter5)
            filter6 =  (float(p) + float(q) + float(r))/3
            filter6 = ("%.6f" %filter6)
            filter7 =  (float(s) + float(t) + float(u))/3
            filter7 = ("%.6f" %filter7)
            A = [filter1, filter2, filter3, filter4, filter5, filter6, filter7]
            A = ",".join(str(x) for x in A).join('[]')

            print A
            avg = [float(sum(col))/float(len(col)) for col in zip(*A)]
            print avg

I also tried formatting the data like this:

            A = ('{0}    {1}    {2}     {3}    {4}    {5}    {6}    {7}    {8}'.format(date, time, float(filter1), float(filter2), float(filter3), float(filter4), float(filter5), float(filter6), float(filter7))+'\n') # average of triplets
            print A

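I thought I could access each column's values and do the necessary math on them by indexing them the way you would with a dictionary, but that didn't work: the data seems to be treated either as a single row (so trying to access any column by [0] is out of range) or as individual characters, rather than as a list of numbers. Is this related to using the float function?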

3 Answers

You can use the decimal module to display exact numbers.

from decimal import *
getcontext().prec = 6  # sets the precision to 6 significant digits

Note that the precision counts significant digits, not digits after the decimal point, which means:

print(Decimal(1) / Decimal(7))    # 0.142857
print(Decimal(100) / Decimal(7))  # results in 14.2857

This means you may need to set the precision to a higher value to get 6 decimal places, for example:

from decimal import *
getcontext().prec = 28
print("{0:.6f}".format(Decimal(100) / Decimal(7))) # 14.285714
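
Another way to pin a result to six decimal places is Decimal.quantize, which rounds to a fixed exponent instead of relying on the context precision. The following is only a sketch: the helper name mean_to_six_places and the sample inputs are made up for illustration, and it assumes the default context precision (28), because quantize raises InvalidOperation when the rounded result needs more digits than the context allows.

```python
from decimal import Decimal, ROUND_HALF_UP

# Exponent template: six digits after the decimal point.
SIX_PLACES = Decimal("0.000001")


def mean_to_six_places(values):
    """Average numbers given as strings and round to 6 decimal places.

    Hypothetical helper, not part of the decimal module.
    """
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(SIX_PLACES, rounding=ROUND_HALF_UP)


# Hypothetical inputs: 100 / 7 rounded to six places.
print(mean_to_six_places(["100", "0", "0", "0", "0", "0", "0"]))  # 14.285714
```

Building the Decimal values straight from the strings read out of the file avoids going through float first.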

To answer your question fully, could you explain which average you are looking for? The average of all (21) columns? Could you post some of test_data.xls?

answered 2013-07-05T14:49:31.100

I'm not sure I understand which columns you want to average in 3), but maybe this does what you want:

with open("test2.xls") as w:
    next(w)  # skip over header row (works in Python 2.6+ and 3)
    for row in w:
        (date, time, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t,
         u, LZA, SZA, LAM) = row.split("\t")  # split columns into fields

        A = [(float(a) + float(b) + float(c))/3,
             (float(d) + float(e) + float(f))/3,
             (float(g) + float(h) + float(i))/3,
             (float(j) + float(k) + float(l))/3,
             (float(m) + float(n) + float(o))/3,
             (float(p) + float(q) + float(r))/3,
             (float(s) + float(t) + float(u))/3]
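        # the outer parentheses make this work with both the Python 2 and 3 print
        print(('['+ ', '.join(['{:.6f}']*len(A)) + ']').format(*A))
        avg = sum(A)/len(A)  # average of the seven triplet averages
        print(avg)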

You can do the same thing more concisely with code like this:

avg = lambda nums: sum(nums)/float(len(nums))

with open("test2.xls") as w:
    next(w)  # skip over header row (works in Python 2.6+ and 3)
    for row in w:
        cols = row.split("\t")  # split into columns
        # then split that into fields
        # a list comprehension keeps values sliceable in Python 3 as well
        date, time, values, LZA, SZA, LAM = (cols[0], cols[1],
                                             [float(x) for x in cols[2:23]],
                                             cols[23], cols[24], cols[25])
        A = [avg(values[i:i+3]) for i in range(0, 21, 3)]
        print(('['+ ', '.join(['{:.6f}']*len(A)) + ']').format(*A))
        print(avg(A))
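
If 3) instead means averaging each of the seven result columns over all rows, one possible approach is to keep the per-row averages as numbers and transpose them once the file has been read. In the question's code A had already been joined into a string, so zip(*A) iterated over single characters instead of numbers. The sketch below is written for Python 3; the helper names (triplet_means, column_means, make_row) and the two sample rows are made up for illustration and only assume the same tab-separated layout as test2.xls.

```python
def triplet_means(values):
    """Average consecutive groups of three numbers."""
    return [sum(values[i:i + 3]) / 3.0 for i in range(0, len(values), 3)]


def column_means(lines):
    """Return the per-row triplet means and their column-wise means."""
    rows = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        values = [float(x) for x in cols[2:23]]  # the 21 data columns
        rows.append(triplet_means(values))       # keep floats, not strings
    # zip(*rows) transposes the list of rows into a sequence of columns
    means = [sum(col) / len(col) for col in zip(*rows)]
    return rows, means


def make_row(date, time, start):
    """Build one hypothetical data row holding 21 consecutive integers."""
    nums = "\t".join(str(n) for n in range(start, start + 21))
    return "{}\t{}\t{}\tLZA\tSZA\tLAM\n".format(date, time, nums)


# Made-up sample standing in for the file contents (header row already skipped).
sample = [make_row("d1", "t1", 1), make_row("d2", "t2", 3)]
rows, means = column_means(sample)
print(", ".join("{:.6f}".format(m) for m in means))
```

Formatting with "{:.6f}" only at print time keeps the stored values as floats, so they can still be summed.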

answered 2013-07-05T15:05:28.543

I would consider using numpy. I'm not sure how to read xls files, but there seem to be packages that provide this. I would do something like this:

import numpy as np

with open("test2.txt") as f:
    for row in f:
        # row is a string, split on tabs, but ignore the values that
        # don't go into the average.  If you need to keep those you 
        # might want to look into genfromtxt and defining special datatypes
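        # use the builtin float: the np.float alias was removed in NumPy 1.24
        data = (np.array(row.split('\t')[2:23])).astype(float)
        # split the data array into 7 separate arrays (3 columns each);
        # np.mean over all of them returns a single overall mean for the row
        avg = np.mean(np.array_split(data,7))
        print(avg)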

I'm not sure the avg above is exactly what you want. You may need to keep the smaller arrays ( smallArrays = np.array_split(data,7)) and then iterate over them, computing the mean of each.
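
As a rough sketch of that per-group calculation, reshaping the 21 values into a 7 x 3 array gives the seven triplet means in one call, and stacking the rows then gives a mean per result column. The function name triplet_means and the sample rows are invented for illustration; only standard numpy calls (reshape, mean, allclose) are used.

```python
import numpy as np


def triplet_means(row):
    """Return the 7 means of consecutive 3-column groups for one row."""
    fields = row.rstrip("\n").split("\t")
    data = np.array(fields[2:23]).astype(float)  # the 21 data columns
    return data.reshape(7, 3).mean(axis=1)       # one mean per group of three


# Made-up tab-separated rows (no header) with the 26-column layout.
sample_rows = [
    "d1\tt1\t" + "\t".join(str(n) for n in range(1, 22)) + "\tLZA\tSZA\tLAM\n",
    "d2\tt2\t" + "\t".join(str(n) for n in range(3, 24)) + "\tLZA\tSZA\tLAM\n",
]

per_row = np.array([triplet_means(r) for r in sample_rows])  # shape (2, 7)
column_means = per_row.mean(axis=0)  # mean of each result column over all rows
print(np.round(column_means, 6))
```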

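Even if this isn't what you want, I'd recommend taking a look at numpy. I find it very easy to use, and very useful when you're trying to do calculations.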

answered 2013-07-05T19:05:57.423