
We're working on a feature to allow our users to import their own customer/marketing database into our system from a CSV file they upload to our servers. We're using PHP on Ubuntu 10.04 on Amazon EC2 backed by MySQL on Amazon RDS.

What we've currently got is a script that uses LOAD DATA LOCAL INFILE, but it's somewhat slow, and it will be very slow once real users start uploading CSV files with 100,000+ rows.
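For context, here is a minimal sketch of the kind of script we're running now. It assumes a PDO connection to RDS, a staging table called `customer_import`, and a particular column layout; all of those names are placeholders, not our actual schema.

```php
<?php
// Sketch of the current synchronous import. LOAD DATA LOCAL INFILE streams
// the file from the web server to MySQL, which is far faster than row-by-row
// INSERTs, but it still blocks the PHP request for the whole load.
$pdo = new PDO(
    'mysql:host=your-rds-endpoint;dbname=app',
    'user',
    'password',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true] // needed for LOAD DATA LOCAL INFILE
);

$csvPath = '/tmp/upload_12345.csv'; // the file the user just uploaded

$sql = "LOAD DATA LOCAL INFILE " . $pdo->quote($csvPath) . "
        INTO TABLE customer_import
        FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
        LINES TERMINATED BY '\n'
        IGNORE 1 LINES
        (email, first_name, last_name)";
$pdo->exec($sql);
```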

We do have an automation server that runs several tasks in the background to support our application, so maybe this is something that's handed over to that server (or group of servers)? So a user would upload a CSV file, we'd stick it in an S3 bucket and either drop a line in a database somewhere linking that file to a given user, or use SQS or something to let the automation server know to import it. Then we just tell the user their records are importing and will show up gradually over the next few minutes/hours.
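The simplest hand-off I can picture is a "pending jobs" table rather than SQS. Here's a rough sketch of what the upload handler might do; the `import_jobs` table, its columns, and the S3 key are all made-up names for illustration, not an existing part of our app.

```php
<?php
// After pushing the uploaded CSV to S3, record a pending import job that
// the automation server can poll for. The web request returns immediately
// and the user is told the import is queued.
$pdo = new PDO('mysql:host=your-rds-endpoint;dbname=app', 'user', 'password');

$userId = 42;                       // the uploading user (example value)
$s3Key  = 'imports/upload_12345.csv'; // where the upload was stored in S3

$stmt = $pdo->prepare(
    "INSERT INTO import_jobs (user_id, s3_key, status, created_at)
     VALUES (:user_id, :s3_key, 'pending', NOW())"
);
$stmt->execute([
    ':user_id' => $userId,
    ':s3_key'  => $s3Key,
]);
```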

Has anybody else had any experience with this? Is my logic right, or should we be looking in an entirely different direction?

Thanks in advance.


1 Answer


My company does exactly that, via cron.

We allow the user to upload a CSV, which is then saved to a directory to wait. A cron job running every 5 minutes checks the database for entries made at upload time, which record the user, file, date/time, etc. If it finds a file that has not yet been parsed, it locates the file by its filename, checks that the data is valid, runs USPS address verification, and finally loads it into the main user database.
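A stripped-down version of that cron worker might look like the following. The `import_jobs` and `customers` table names, the upload directory, and the column list are assumptions for the sketch, and the validation/USPS step is reduced to a placeholder comment.

```php
<?php
// Cron worker: pick up pending imports recorded at upload time and load them.
$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'password',
    [PDO::MYSQL_ATTR_LOCAL_INFILE => true]);

// Find jobs that have not been parsed yet.
$jobs = $pdo->query(
    "SELECT id, user_id, filename FROM import_jobs WHERE status = 'pending'"
)->fetchAll(PDO::FETCH_ASSOC);

foreach ($jobs as $job) {
    $path = '/var/uploads/' . basename($job['filename']);
    if (!is_readable($path)) {
        continue; // file not in place yet; try again on the next run
    }

    // ...validate the rows and run address verification here...

    // Bulk-load the file, tagging every row with the owning user.
    $sql = "LOAD DATA LOCAL INFILE " . $pdo->quote($path) . "
            INTO TABLE customers
            FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'
            IGNORE 1 LINES
            (email, first_name, last_name)
            SET user_id = " . (int)$job['user_id'];
    $pdo->exec($sql);

    // Mark the job done so it is skipped on the next cron run.
    $done = $pdo->prepare("UPDATE import_jobs SET status = 'done' WHERE id = ?");
    $done->execute([$job['id']]);
}
```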

We have similar functions set up to send large batches of emails, build model abstractions of user cross-sections, etc. All in all, it works quite well: three servers can handle millions of records, with tens of thousands being loaded per import.

answered 2012-06-01 at 20:38