
I'm working on a project related to text detection in natural images. I have to train a classifier, and for that I'm using PyTables to store information. I have:

  • 62 classes (a-z,A-Z,0-9)

  • Each class has between 100 and 600 tables

  • Each table has 1 single column to store a 32bit Float

  • Each table has between 2^2 and 2^8 rows (depending on parameters)

My problem is that after I train the classifier, it takes a long time to read the information back during testing. For example: one database has 27,900 tables (62 classes * 450 tables per class) with 4 rows per table, and it took approximately 4 hours to read and retrieve all the information I need. The test program reads each table 390 times for classes A-Z and a-z, and 150 times for classes 0-9, to get all the info I need. Is that normal? I tried to use the index option on the single column, but I don't see any performance improvement. I work on a virtual machine with 2 GB RAM on an HP Pavilion dv6 (4 GB DDR3 RAM, Core 2 Duo).
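For scale, the read counts described above add up to roughly 9.8 million individual table reads, which by itself goes a long way toward explaining the 4-hour runtime. A quick tally (using the 450-tables-per-class example database from the question):

```python
# Rough tally of the table reads implied by the setup above
# (450 tables per class, as in the example database).
letter_reads = 52 * 450 * 390   # classes A-Z and a-z, each table read 390 times
digit_reads  = 10 * 450 * 150   # classes 0-9, each table read 150 times
total_reads = letter_reads + digit_reads
print(total_reads)  # → 9801000
```

At ~9.8 million reads over ~4 hours, that is only about 680 reads per second, so the fixed per-read overhead (node lookup, metadata, I/O) dominates over the tiny 4-row payload.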


1 Answer


This is likely because column lookups on tables are one of the slower operations you can perform, and that is where all of your information lives. You have two basic options for improving performance with many tables of few rows each:

  1. Pivot this structure so that you have a few tables with many rows.

  2. Move to a more efficient data structure, such as a CArray or EArray per row/column.

Additionally, you can try using compression to speed things up. This is generic advice, since you didn't include any code.
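A minimal sketch of option 1 combined with compression, assuming the 450-rows-per-class and 4-values-per-row figures from the question; the file path, node names, and zlib compressor are illustrative choices, not the asker's actual setup:

```python
import os
import tempfile

import numpy as np
import tables  # PyTables

# Hypothetical pivoted layout: one compressed, extendable array per
# class instead of ~450 one-column tables per class.
path = os.path.join(tempfile.mkdtemp(), 'features.h5')
filters = tables.Filters(complevel=5, complib='zlib')  # enable compression

with tables.open_file(path, mode='w') as h5:
    for cls in ('a', 'A', '0'):  # a few sample classes out of the 62
        # One EArray per class: one row per former table, growing along axis 0.
        arr = h5.create_earray(h5.root, 'class_' + cls,
                               atom=tables.Float32Atom(),
                               shape=(0, 4),
                               filters=filters)
        arr.append(np.random.rand(450, 4).astype(np.float32))

# A single read now returns every row for a class at once,
# instead of opening ~450 separate tables.
with tables.open_file(path, mode='r') as h5:
    data = h5.root.class_a.read()
print(data.shape)  # → (450, 4)
```

With this layout, the per-read fixed cost is paid once per class rather than once per table, and chunked, compressed storage lets HDF5 stream the whole array efficiently.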

answered 2013-07-21T23:18:53.397