We're working on a feature to allow our users to import their own customer/marketing database into our system from a CSV file they upload to our servers. We're using PHP on Ubuntu 10.04 on Amazon EC2 backed by MySQL on Amazon RDS.
What we've currently got is a script that uses LOAD DATA LOCAL INFILE, but it's somewhat slow, and it will be very slow once real users start uploading CSV files with 100,000+ rows.
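For context, one alternative we've looked at is reading the CSV ourselves and doing batched multi-row inserts inside a single transaction, so we can at least track progress per batch. Here's a rough sketch of that pattern — in Python with SQLite standing in for MySQL on RDS purely for illustration (our real code is PHP, and the `customers` table and `import_csv` helper are made up for this example):

```python
# Sketch of the batched-insert alternative to LOAD DATA LOCAL INFILE.
# Python + SQLite are stand-ins here; our actual stack is PHP + MySQL/RDS.
import csv
import io
import sqlite3

def import_csv(conn, csv_text, batch_size=1000):
    """Insert CSV rows in batches inside one transaction; return row count."""
    conn.execute("CREATE TABLE IF NOT EXISTS customers (name TEXT, email TEXT)")
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    batch, total = [], 0
    with conn:  # one transaction for the whole import
        for row in reader:
            batch.append(row)
            if len(batch) >= batch_size:
                conn.executemany("INSERT INTO customers VALUES (?, ?)", batch)
                total += len(batch)
                batch = []
        if batch:  # flush the final partial batch
            conn.executemany("INSERT INTO customers VALUES (?, ?)", batch)
            total += len(batch)
    return total

conn = sqlite3.connect(":memory:")
data = "name,email\nAda,ada@example.com\nBob,bob@example.com\n"
print(import_csv(conn, data))  # 2
```

Even with batching, though, doing this synchronously in the upload request doesn't feel right for 100,000+ rows, which is what leads to the question below.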
We do have an automation server that runs several tasks in the background to support our application, so maybe this is something that's handed over to that server (or group of servers)? A user would upload a CSV file, we'd stick it in an S3 bucket and either drop a row in a database table linking that file to the user, or use SQS or something similar to let the automation server know to import it. Then we'd just tell the user their records are importing and will show up gradually over the next few minutes/hours?
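To make that flow concrete, here's roughly the shape I have in mind, sketched in Python with an in-memory queue standing in for SQS and a plain dict standing in for S3 (the names `handle_upload` and `worker_tick` are invented for this illustration, not real APIs):

```python
# Sketch of the async-import flow: the web tier enqueues a job and returns
# immediately; the automation server drains the queue and does the import.
# queue.Queue stands in for SQS, a dict stands in for S3 - illustrative only.
import json
import queue

s3_bucket = {}        # fake S3: key -> file contents
jobs = queue.Queue()  # fake SQS
imported = []         # records the worker has processed

def handle_upload(user_id, filename, contents):
    """Web tier: store the file, enqueue a job, respond to the user at once."""
    key = f"imports/{user_id}/{filename}"
    s3_bucket[key] = contents
    jobs.put(json.dumps({"user_id": user_id, "s3_key": key}))
    return "Your records are importing and will appear over the next few minutes."

def worker_tick():
    """Automation server: pull one job off the queue and import its rows."""
    job = json.loads(jobs.get())
    for line in s3_bucket[job["s3_key"]].splitlines():
        imported.append((job["user_id"], line))

handle_upload(42, "customers.csv", "Ada,ada@example.com\nBob,bob@example.com")
worker_tick()
print(len(imported))  # 2
```

The appeal is that the upload request stays fast regardless of file size, and the queue gives us retries and lets us add more automation servers if import volume grows.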
Has anybody else had any experience with this? Is my logic right, or should we be looking in an entirely different direction?
Thanks in advance.