I'm working on a project related to text detection in natural images. I have to train a classifier, and I'm using PyTables to store the information. I have:
62 classes (a-z, A-Z, 0-9)
Each class has between 100 and 600 tables
Each table has a single column storing a 32-bit float
Each table has between 2^2 and 2^8 rows (depending on parameters)
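Roughly, the file is laid out like this (a simplified sketch; the real file, group, and table names differ, and only a few classes/rows are shown):

    import numpy as np
    import tables as tb

    # One 32-bit float column per table
    class Feature(tb.IsDescription):
        value = tb.Float32Col()

    with tb.open_file("features.h5", mode="w") as h5:
        for cls in ["a", "A", "0"]:          # in reality all 62 classes (a-z, A-Z, 0-9)
            group = h5.create_group("/", "class_" + cls)
            for i in range(450):             # 100-600 tables per class
                table = h5.create_table(group, "t%d" % i, Feature)
                rows = np.empty(4, dtype=[("value", np.float32)])  # 2^2 to 2^8 rows
                rows["value"] = np.random.rand(4)
                table.append(rows)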
My problem is that after I train the classifier, it takes a lot of time to read the information back during testing. For example: one database has 27,900 tables (62 classes * 450 tables per class) with 4 rows per table, and it took approximately 4 hours to read and retrieve all the information I need. The test program reads each table 390 times for classes A-Z and a-z, and 150 times for classes 0-9. Is that normal?

I tried to use the index option on the single column, but I don't see any performance improvement. I work in a virtual machine with 2 GB of RAM, on an HP Pavilion dv6 (4 GB DDR3 RAM, Core 2 Duo).
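For reference, the read pattern in my test program looks roughly like this (a simplified sketch matching the layout above; the names are placeholders). The indexing attempt was just calling `table.cols.value.create_index()` once per table, with the file opened in append mode:

    import tables as tb

    with tb.open_file("features.h5", mode="r") as h5:
        for group in h5.root:                 # 62 class groups
            for table in group:               # 100-600 tables per class
                # 390 reads per table for letters, 150 for digits
                for _ in range(390):
                    values = table.col("value")   # loads the entire column each time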