I'm working on a project related to text detection in natural images. I have to train a classifier, and I'm using PyTables to store the information. I have:
62 classes (a-z, A-Z, 0-9)
Each class has between 100 and 600 tables
Each table has a single column storing a 32-bit float
Each table has between 2^2 and 2^8 rows (depending on parameters)
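Roughly, the file is laid out like this (a simplified sketch; the real file, group, and table names differ, and only a few classes/rows are shown):

    import numpy as np
    import tables as tb

    # One 32-bit float column per table
    class Feature(tb.IsDescription):
        value = tb.Float32Col()

    with tb.open_file("features.h5", mode="w") as h5:
        for cls in ["a", "A", "0"]:          # in reality all 62 classes (a-z, A-Z, 0-9)
            group = h5.create_group("/", "class_" + cls)
            for i in range(450):             # 100-600 tables per class
                table = h5.create_table(group, "t%d" % i, Feature)
                rows = np.empty(4, dtype=[("value", np.float32)])  # 2^2 to 2^8 rows
                rows["value"] = np.random.rand(4)
                table.append(rows)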
My problem is that after I train the classifier, it takes a lot of time to read the information back during testing. For example: one database has 27,900 tables (62 classes * 450 tables per class) with 4 rows per table, and it took approximately 4 hours to read and retrieve all the information I need. The test program reads each table 390 times for classes A-Z and a-z, and 150 times for classes 0-9. Is that normal?

I tried to use the index option on the single column, but I don't see any performance improvement. I work in a virtual machine with 2 GB of RAM, on an HP Pavilion dv6 (4 GB DDR3 RAM, Core 2 Duo).
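For reference, the read pattern in my test program looks roughly like this (a simplified sketch matching the layout above; the names are placeholders). The indexing attempt was just calling `table.cols.value.create_index()` once per table, with the file opened in append mode:

    import tables as tb

    with tb.open_file("features.h5", mode="r") as h5:
        for group in h5.root:                 # 62 class groups
            for table in group:               # 100-600 tables per class
                # 390 reads per table for letters, 150 for digits
                for _ in range(390):
                    values = table.col("value")   # loads the entire column each time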