python - Manipulating csv files in python

Question

I have been trying to do the following in a long csv file with three columns:

For every row, I want the max and min of the entries in the previous 250 rows. The data looks like this: column 1 is an index (1-5300), column 2 holds the data, and column 3 is another column that is not used here. This is the code I have so far. Note that 'i' is the row index, which is compared against column 1. Column 2 is where the data is stored (i.e. the data whose max and min I want).

The problem I have is that csv.reader always seems to end up at the end of the file, which throws the whole algorithm off. I don't know what I am doing wrong. Please help.

max1 = 0
min1 = 1000000    

i = 3476
f1=  open('PUT/PUT_SELLING.csv')
file_reader = csv.reader(f1)
for col in file_reader:
    serial          = int(col[0])
    if serial <i-250:
        spyy = float(col[1])
        print spyy

    for j in range(0,250):
        spyy = float(col[1])          
        max1 = max(max1,spyy)
        min1 = min(min1,spyy)
        file_reader.next()
        #print spyy

f1.close()

print 'max =' +str(max1) + 'min = ' + str(min1)

score 1 · Accepted Answer

In your code, this line

for col in file_reader:

is actually iterating through the lines (rows) of the file, not the columns,

and for each col, you later advance the reader 250 lines in this code:

for j in range(0,250):
    spyy = float(col[1]) # here you're grabbing the same second item 250 times
    max1 = max(max1,spyy) # setting the new max to the same value 250 times
    min1 = min(min1,spyy) # setting the new min to the same value 250 times
    file_reader.next() # now you advance, but col is the same so ...
    # it's like you're skipping 250 lines

This means that each row stored in col is actually 251 lines after the previous row stored in col: the inner loop consumes 250 lines, and the outer for loop then takes one more. It's like you're skipping through the file in steps of roughly 250.
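To see the skipping in isolation, here is a minimal sketch. It is written for Python 3 (so `next(reader)` replaces `reader.next()`), it uses made-up in-memory rows instead of the real file, and an inner step of 2 stands in for the 250 from the question.

```python
import csv
import io

# Hypothetical sample standing in for PUT_SELLING.csv: index, value, unused column
sample = "\n".join("%d,%d,x" % (n, n * 10) for n in range(1, 11))
reader = csv.reader(io.StringIO(sample))

seen = []
for row in reader:
    seen.append(int(row[0]))   # the row the for loop handed us
    for _ in range(2):         # 2 stands in for the 250 in the question
        next(reader, None)     # each call silently consumes one more row

print(seen)  # [1, 4, 7, 10]
```

Only every third row reaches the loop body: two rows are consumed by the inner calls and one by the for loop itself. The `None` default is there only so the sketch does not raise StopIteration when the data runs out.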

I rewrote it, based on what you said you wanted to do. See if this makes more sense:

import csv

f1 = open('PUT/PUT_SELLING.csv')
file_reader = csv.reader(f1)

spyy_values = []
mins = []
maxes = []

# just saying 'for x in file_reader' is all you need to iterate through the rows
# you don't need to use file_reader.next()
# here I'm also using the enumerate() function
# which automatically returns an index for each row
for row_index, row in enumerate(file_reader):
    # get the value
    spyy_values.append( float(row[1]) )

    if row_index >= 249:
        # get the min of the last 250 values,
        # including this line
        this_min = min(spyy_values[-250:])
        mins.append(this_min)
        # get the max of the last 250 values,
        # including this line
        this_max = max(spyy_values[-250:])
        maxes.append(this_max)

f1.close()

print "total max:", max(maxes)
print "total min:", min(mins)
print "you have %s max values" % len(maxes)
print "you have %s min values" % len(mins)
print "here are the maxes", maxes
print "here are the mins", mins

Keep in mind that csv.reader is an iterator, so the for loop will automatically advance through each line. Check out the example in the documentation.
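If the window should cover only the previous 250 rows and exclude the current one, as the question words it, one possible variation is to keep the window in a `collections.deque`. This is a sketch in Python 3 under that assumption, not the code from this answer; the function name `rolling_min_max` and the sample rows are made up for illustration.

```python
import csv
import io
from collections import deque

def rolling_min_max(rows, window=250):
    """Yield (index, min, max) over the `window` rows before each row."""
    previous = deque(maxlen=window)   # drops the oldest value automatically
    for row in rows:
        index, value = int(row[0]), float(row[1])
        if len(previous) == window:   # only report once a full window exists
            yield index, min(previous), max(previous)
        previous.append(value)        # the current row joins the window afterwards

# Hypothetical data: index, value, unused third column
sample = "1,5.0,a\n2,3.0,b\n3,8.0,c\n4,1.0,d\n5,9.0,e\n"
results = list(rolling_min_max(csv.reader(io.StringIO(sample)), window=3))
print(results)  # [(4, 3.0, 8.0), (5, 1.0, 8.0)]
```

For the real file, the same function could be fed `csv.reader(f)` inside `with open('PUT/PUT_SELLING.csv', newline='') as f:`, leaving `window` at its default of 250.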

score 0 · Accepted Answer

It seems file_reader.next() is being called in the wrong place. In the code you posted it runs inside the inner for loop, so the reader is advanced 250 extra rows for every row the outer loop handles; this may be why it ends up at the end of the file after processing only a few rows.

The correct code would be:

import csv

max1 = 0
min1 = 1000000

i = 3476
f1 = open('PUT/PUT_SELLING.csv')
file_reader = csv.reader(f1)
for col in file_reader:
    serial = int(col[0])
    # only the 250 rows before row i contribute to the max and min
    if i - 250 <= serial < i:
        spyy = float(col[1])
        max1 = max(max1, spyy)
        min1 = min(min1, spyy)
    # no file_reader.next() here: the for loop itself moves to the
    # next row after processing the current row

f1.close()

print 'max = ' + str(max1) + ' min = ' + str(min1)

Let me know if this works

score 0 · Accepted Answer

Since your first two columns are numbers, this may help: you can call readlines() and split each line on "," yourself. (Just a workaround.)

Use

file_reader=  open('PUT/PUT_SELLING.csv').readlines()
for line in file_reader:
    col = line.split(",")
    serial          = int(col[0])

in place of

f1=  open('PUT/PUT_SELLING.csv')
file_reader = csv.reader(f1)
for col in file_reader:
   serial          = int(col[0])
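One caveat with the manual split: it works for plain numeric columns, but it does not understand quoting, and lines returned by readlines() keep their trailing newline on the last field. A small sketch in Python 3, using a made-up row, shows the difference from csv.reader:

```python
import csv
import io

# Hypothetical row whose second field contains a comma inside quotes
line = '7,"1,234.5",note\n'

by_split = line.rstrip("\n").split(",")        # naive split on every comma
by_csv = next(csv.reader(io.StringIO(line)))   # csv-aware parsing

print(by_split)  # ['7', '"1', '234.5"', 'note']
print(by_csv)    # ['7', '1,234.5', 'note']
```

If the file is known to hold only unquoted numbers in the first two columns, as in the question, both approaches give the same values for those columns.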

score 0 · Accepted Answer

import csv

f1 = open('PUT/PUT_SELLING.csv')
file_reader = csv.reader(f1)
which_str = raw_input('Comma separated list of indices to show: ')
which_to_show = [int(i) for i in which_str.split(',')]
vals = []
for cols in file_reader:  # This will iterate over the rows
    vals.append(float(cols[1]))  # Accumulate the values
    index = int(cols[0])
    if index > 249:      # enough rows to show min, max
        mini = min(vals)  # min of the last 250 values
        maxi = max(vals)  # max of the last 250 values
        del vals[0]  # remove the oldest entry
        if index in which_to_show:
            print 'index %d min=%f max=%f' % (index, mini, maxi)  # Format vals

f1.close()
