I'm looking into chunking my data source for optimal data import into Solr, and I was wondering whether it is possible to use a master URL that lists the data in chunks, each served from its own URL.
For example, File 1 might contain:
<chunks>
  <chunk url="http://localhost/chunker?start=0&amp;stop=100" />
  <chunk url="http://localhost/chunker?start=100&amp;stop=200" />
  <chunk url="http://localhost/chunker?start=200&amp;stop=300" />
  <chunk url="http://localhost/chunker?start=300&amp;stop=400" />
  <chunk url="http://localhost/chunker?start=400&amp;stop=500" />
  <chunk url="http://localhost/chunker?start=500&amp;stop=600" />
</chunks>
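A minimal sketch of how the master list could be generated, assuming a hypothetical `chunker` endpoint that takes half-open `start`/`stop` ranges as in the example above. `build_chunk_list`, `total` and `chunk_size` are illustrative names, not part of any Solr API. Building the XML with `xml.etree.ElementTree` also takes care of escaping the `&` inside each URL attribute.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_chunk_list(base_url: str, total: int, chunk_size: int) -> str:
    """Return a <chunks> document with one <chunk url="..."/> per range.

    Ranges are half-open [start, stop), matching start=0&stop=100,
    start=100&stop=200, ... in the example. base_url is hypothetical.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    root = ET.Element("chunks")
    for start in range(0, total, chunk_size):
        # The last chunk may be shorter than chunk_size.
        stop = min(start + chunk_size, total)
        query = urlencode({"start": start, "stop": stop})
        ET.SubElement(root, "chunk", url=f"{base_url}?{query}")
    # ElementTree escapes "&" as "&amp;" when serialising attribute values.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_chunk_list("http://localhost/chunker", 600, 100))
```

The chunk size is the main tuning knob: it decides both how many requests the import makes and how large each response is.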
with each chunk URL returning something like:
<items>
  <item data1="info1" />
  <item data1="info2" />
  <item data1="info3" />
  <item data1="info4" />
</items>
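On the consuming side, each chunk can be parsed incrementally so that only one `<item>` is held in memory at a time. The sketch below is a plain-Python illustration of that idea (not how Solr's DataImportHandler is implemented internally), using `xml.etree.ElementTree.iterparse` on any file-like object; `iter_items` is a hypothetical helper name.

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator


def iter_items(stream: BinaryIO) -> Iterator[Dict[str, str]]:
    """Yield the attributes of each <item> in an <items> document, one at a time."""
    context = ET.iterparse(stream, events=("start", "end"))
    _, root = next(context)  # first "start" event is the <items> root element
    for event, elem in context:
        if event == "end" and elem.tag == "item":
            yield dict(elem.attrib)
            # Drop already-processed children so memory stays flat on large chunks.
            root.clear()


if __name__ == "__main__":
    sample = io.BytesIO(
        b'<items><item data1="info1" /><item data1="info2" /></items>'
    )
    for record in iter_items(sample):
        print(record["data1"])
```

In a real import the `stream` argument would be the HTTP response body for one chunk URL rather than an in-memory sample.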
I'm working with 500+ million records, so I think the data will need to be chunked to avoid memory issues (I ran into those when using the SqlEntityProcessor). I would also like to avoid making 500+ million web requests, one per record, as that could get expensive.
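The trade-off between chunk size and request count is simple arithmetic: with N records and chunks of C records, the import makes one request for the master list plus ceil(N / C) chunk requests. The chunk sizes below are illustrative, not recommendations. At the 100-record chunks used in the example, 500 million records would still mean 5,000,001 requests, whereas 100,000-record chunks bring that down to 5,001.

```python
def requests_needed(total_records: int, chunk_size: int) -> int:
    """One request for the master list plus one per chunk (ceiling division)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = -(-total_records // chunk_size)  # integer ceil without floating point
    return 1 + chunks


if __name__ == "__main__":
    for size in (100, 10_000, 100_000, 1_000_000):
        print(f"chunk_size={size:>9,}: {requests_needed(500_000_000, size):>11,} requests")
```

Larger chunks cut per-request overhead but make each response bigger, so at this scale the chunk size has to balance request count against how much of a single chunk the importer holds in memory at once.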