ETL seems to be a pretty common task. I have been reading about mistakes that ETL designers make with very large data at http://it.toolbox.com/blogs/infosphere/17-mistakes-that-etl-designers-make-with-very-large-data-19264
I need some practical insights on the following points:
a) Incorporating inserts, updates, and deletes into the same data flow / same process. How is that a problem?
b) Sourcing multiple systems at the same time, depending on heterogeneous systems of data.
c) Not creating the correct indexes on the sources/lookups that need to be accessed.
d) Believing that ‘I need to process all the data in one pass because it’s the fastest way to do it’.
Any help?