I'm using the following loop to run an iterative ODBC query on chunks of 50,000 unique numbers, taken in succession from a larger list of close to one million records. (Background: I need data from an ODBC source, but the source is too big to pull down in one query, and I don't have write access to the DB. I know this is a hacky workaround, but I haven't found a better approach - bear with me.) "key" below is the field I would join on to pull from the ODBC source.
for (i in 0:n)
{
  batch <- data.frame(key[(50000*i)+1:50000*(i+1),])
  *(other stuff)*
}
I expected this to iterate on i and give me dynamic record ranges - i.e. rows 1:50000 for i=0, rows 50001:100000 for i=1, and so on. This works fine for the first iteration, where i=0, but at higher values of i I noticed that the script is actually skipping rows, and the number of rows skipped equals i. So, where i=10, it starts at row 500,000 of the base data set, but the second and third records it pulls are rows 500,010 and 500,020 of the base set.
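To make the behaviour easier to see, here's a minimal reproduction with a toy integer vector standing in for my key data (chunk size 5 instead of 50,000 - the chunk size and vector are made up for illustration, but the subscript expression is the same as in my loop):

```r
# Toy stand-in for the key column: just the integers 1..100,
# so the printed values are the row positions that were selected.
key <- 1:100
chunk <- 5

for (i in 0:3) {
  # Same subscript expression as in my real loop, with 50000 -> chunk.
  batch <- key[(chunk*i) + 1:chunk*(i+1)]
  print(batch)
}
# i=0 prints 1 2 3 4 5 as expected,
# but i=1 prints 7 9 11 13 15 instead of 6 7 8 9 10 -
# every second row is skipped, matching what I see at full scale.
```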
I'm sure this means that R is parsing some piece of my script differently than I intend, but I can't find the error (or I'm not experienced enough for it to jump out at me).
Any thoughts? Alternatively, if there's a better way to go about this, I'd love to hear it.
Thanks for reading.