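I created a dataset with 1 generated training image per letter and roughly 10 real test images per letter.
All of them are 10x14 px, black and white (cleanly binarized in a preprocessing step).
The resulting model classifies every symbol as "1" (even the actual images from the test set), so basically it does not work at all.
Can anyone point me in the right direction?
Here is the CreateML output: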
Extracting image features from full data set.
Analyzing and extracting image features.
+------------------+--------------+------------------+
| Images Processed | Elapsed Time | Percent Complete |
+------------------+--------------+------------------+
| 1 | 1.74s | 2.5% |
| 2 | 1.96s | 5.25% |
| 3 | 2.17s | 8% |
| 4 | 2.39s | 10.75% |
| 5 | 2.60s | 13.5% |
| 10 | 3.68s | 27% |
| 25 | 6.90s | 67.5% |
| 37 | 9.48s | 100% |
| 36 | 9.26s | 97.25% |
+------------------+--------------+------------------+
Skipping automatic creation of validation set; training set has fewer than 50 points.
Beginning model training on processed features.
Calibrating solver; this may take some time.
+-----------+--------------+-------------------+
| Iteration | Elapsed Time | Training Accuracy |
+-----------+--------------+-------------------+
| 0 | 0.038845 | 0.027027 |
| 1 | 0.139269 | 0.837838 |
| 2 | 0.268821 | 0.945946 |
| 3 | 0.317312 | 0.945946 |
| 4 | 0.367944 | 0.972973 |
| 5 | 0.422657 | 0.972973 |
| 10 | 0.713325 | 1.000000 |
| 24 | 1.495230 | 1.000000 |
+-----------+--------------+-------------------+
SUCCESS: Optimal solution found.
Extracting image features from evaluation data.
Analyzing and extracting image features.
+------------------+--------------+------------------+
| Images Processed | Elapsed Time | Percent Complete |
+------------------+--------------+------------------+
| 1 | 211.661ms | 0.25% |
| 2 | 425.538ms | 0.75% |
| 3 | 641.33ms | 1.25% |
| 4 | 861.215ms | 1.75% |
| 5 | 1.07s | 2.25% |
| 10 | 2.16s | 4.75% |
| 25 | 5.39s | 12% |
| 50 | 10.75s | 24% |
| 75 | 16.12s | 36% |
| 100 | 21.51s | 48% |
| 125 | 26.88s | 60% |
| 150 | 32.24s | 72% |
| 175 | 37.61s | 84% |
| 200 | 42.97s | 96% |
| 208 | 44.69s | 100% |
| 207 | 44.47s | 99.5% |
+------------------+--------------+------------------+
Trained model successfully saved at /mypath/ocr.mlmodel.
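
As a sanity check on the log above, the training accuracies can be converted back into counts of correctly classified images. This is only a minimal sketch: it assumes the 37 images reported in the first feature-extraction table are the whole training set (one image per class), and the helper name `correct_count` is made up for illustration.

```python
# Training accuracies copied from the Create ML log above.
logged = [0.027027, 0.837838, 0.945946, 0.972973, 1.000000]

# Assumption: 37 training images, as reported in the first
# "Images Processed" table of the log.
n_train = 37


def correct_count(accuracy, total):
    """Hypothetical helper: number of images an accuracy value corresponds to."""
    return round(accuracy * total)


counts = [correct_count(a, n_train) for a in logged]
for a, c in zip(logged, counts):
    print(f"{a:.6f} ~ {c}/{n_train}")
```

Under that assumption, the first logged value (0.027027) is exactly 1 image out of 37, and the final 1.000000 only means that all 37 training images are reproduced; with the automatic validation split skipped, the log itself contains no accuracy figure for the 208 evaluation images.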