I'm using the following loop to run an iterative ODBC query on chunks of 50,000 unique numbers taken in succession from a larger list of close to one million records. (Background: I need data from an ODBC source, but the source is too big to pull down, and I don't have write access to the DB. I know this is a hacky workaround, but I haven't found a way around it - bear with me.) `key` below is the field I would join on to pull from the ODBC.

    for (i in 0:n) 
    {
    batch <- data.frame(key[(50000*i)+1:50000*(i+1),])
    *(other stuff)*
    }

I expected this to iterate on i to give me dynamic record ranges. I.e. for i=0, 1:50000; for i=1, 50001:100000. This works fine for the first iteration - where i=0 - but at higher values of i I noticed that the script is actually skipping individual rows, where the number of rows skipped is equal to i. So, where i=10, it'll start at row 500,000 of the base data set, but the second and third records will be rows 500,010 and 500,020 from the base set.

I'm sure this means that R is misreading some piece of my script, but I can't find the error (/I'm not experienced enough for it to jump out.)

Any thoughts? Alternatively, if there are other ways to go about this I'd love to hear them...

Thanks for reading.

1 Answer

Actually, I think you misread R. ;) Try this:

    ((i * 50000) + 1):((i+1) * 50000)

for i ranging from 0 to n.
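
To see why the parentheses matter: in R the `:` operator binds more tightly than `*` and `+`, so the index in the question is evaluated as `(50000*i) + ((1:50000) * (i+1))`, which steps through the rows in strides of `i+1`. Below is a minimal sketch of the difference and of a corrected loop. The chunk size of 5, the 12-row `key` data frame with its `id` column, and the names `chunk`, `n_chunks`, `rows`, and `sizes` are stand-ins made up for illustration, not taken from the question.

```r
# Small stand-ins for the real values (assumptions for illustration only).
chunk <- 5   # the question uses 50000
i <- 2

# `:` has higher precedence than `*` and `+`, so this is read as
# (chunk * i) + ((1:chunk) * (i + 1)): a strided sequence, not a range.
original <- (chunk*i)+1:chunk*(i+1)
print(original)   # 13 16 19 22 25

# Parenthesising both endpoints gives the intended contiguous range.
fixed <- ((i * chunk) + 1):((i + 1) * chunk)
print(fixed)      # 11 12 13 14 15

# The same indexing inside a loop over a hypothetical one-column data frame.
key <- data.frame(id = 1:12)
n_chunks <- ceiling(nrow(key) / chunk)
sizes <- integer(0)
for (i in 0:(n_chunks - 1)) {
  # min() keeps the last chunk from running past the end of the data.
  rows <- ((i * chunk) + 1):min((i + 1) * chunk, nrow(key))
  batch <- key[rows, , drop = FALSE]   # drop = FALSE keeps a data frame
  sizes <- c(sizes, nrow(batch))
  # ... run the query for this batch here ...
}
print(sizes)      # 5 5 2
```

With `i = 0` the original expression happens to produce `1:chunk`, which would explain why the first iteration looked correct.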

answered 2013-06-06T22:37:11.307