
Please read the entire question before marking it as a copy or a duplicate.

Here is what I can do so far:

  1. Get the image and crop out the parts needed for OCR.
  2. Process the image using tesseract and leptonica.
  3. When the supplied document is cropped into chunks, i.e. 1 character per image, it gives 96% accuracy.
  4. If I don't do that, and the document background is white with black text, it gives almost the same accuracy.

For example, if the input is this photo:

[image: the sample number plate photo]

what I want is to get the same accuracy for this photo without generating blocks.

The code I use to initialize tesseract and extract text from the image is below.

For the initialization of tesseract:

In the .h file:

tesseract::TessBaseAPI *tesseract;
uint32_t *pixels;

In the .m file:

tesseract = new tesseract::TessBaseAPI();
tesseract->Init([dataPath cStringUsingEncoding:NSUTF8StringEncoding], "eng");
tesseract->SetPageSegMode(tesseract::PSM_SINGLE_LINE);
tesseract->SetVariable("tessedit_char_whitelist", "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ");
tesseract->SetVariable("language_model_penalty_non_freq_dict_word", "1");
tesseract->SetVariable("language_model_penalty_non_dict_word", "1"); // note: no trailing space in the variable name
tesseract->SetVariable("tessedit_flip_0O", "1");
tesseract->SetVariable("tessedit_single_match", "0");
tesseract->SetVariable("textord_noise_normratio", "5");
tesseract->SetVariable("matcher_avg_noise_size", "22");
tesseract->SetVariable("image_default_resolution", "450");
tesseract->SetVariable("editor_image_text_color", "40");
tesseract->SetVariable("textord_projection_scale", "0.25");
tesseract->SetVariable("tessedit_minimal_rejection", "1");
tesseract->SetVariable("tessedit_zero_kelvin_rejection", "1");

Getting the text from the image:

- (void)processOcrAt:(UIImage *)image
{
    [self setTesseractImage:image];

    tesseract->Recognize(NULL);
    char* utf8Text = tesseract->GetUTF8Text();
    int conf = tesseract->MeanTextConf();

    NSArray *arr = [[NSArray alloc]initWithObjects:[NSString stringWithUTF8String:utf8Text],[NSString stringWithFormat:@"%d%@",conf,@"%"], nil];

    [self performSelectorOnMainThread:@selector(ocrProcessingFinished:)
                           withObject:arr
                        waitUntilDone:YES];
    delete [] utf8Text; // GetUTF8Text() allocates with new[], so use delete[] rather than free()
}

- (void)ocrProcessingFinished:(NSArray *)result // must match the selector used in performSelectorOnMainThread:
{
    UIAlertView *alt = [[UIAlertView alloc]initWithTitle:@"Data" message:[result objectAtIndex:0] delegate:self cancelButtonTitle:nil otherButtonTitles:@"OK", nil];
   [alt show];
}

But I am not getting the correct output for the number plate image; either it is empty, or it gives some garbage data for the image.

If I use the first kind of image, i.e. white background with black text, then the output is 89% to 95% accurate.

Please help me out.

Any suggestion will be appreciated.

Update

Thanks to @jcesar for the link, and to @konstantin pribluda for the valuable information and guidance.

I am able to convert the images into a proper black-and-white form (almost), so recognition is better for all the images :)

Need help with proper binarization of the image. Any ideas will be appreciated.


3 Answers


Hi all, thanks for your replies. From all of those replies I am able to draw the following conclusion:

  1. I need to get a single cropped image block containing only the number plate.
  2. From that plate, find the number portion using the data obtained with the method provided here.
  3. Then convert the image data to almost black and white using the RGB data found through the above method.
  4. Then convert the data back to an image using the method provided here.

The above 4 steps are combined into one method, like this:

-(void)getRGBAsFromImage:(UIImage*)image
{
    NSInteger count = (image.size.width * image.size.height);
    // First get the image into your data buffer
    CGImageRef imageRef = [image CGImage];
    NSUInteger width = CGImageGetWidth(imageRef);
    NSUInteger height = CGImageGetHeight(imageRef);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    unsigned char *rawData = (unsigned char*) calloc(height * width * 4, sizeof(unsigned char));
    NSUInteger bytesPerPixel = 4;
    NSUInteger bytesPerRow = bytesPerPixel * width;
    NSUInteger bitsPerComponent = 8;
    CGContextRef context = CGBitmapContextCreate(rawData, width, height,
                                                 bitsPerComponent, bytesPerRow, colorSpace,
                                                 kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);

    CGContextDrawImage(context, CGRectMake(0, 0, width, height), imageRef);
    CGContextRelease(context);

    // Now your rawData contains the image data in the RGBA8888 pixel format.
    int byteIndex = 0;
    for (int ii = 0 ; ii < count ; ++ii)
    {
        CGFloat red   = (rawData[byteIndex]     * 1.0) ;
        CGFloat green = (rawData[byteIndex + 1] * 1.0) ;
        CGFloat blue  = (rawData[byteIndex + 2] * 1.0) ;
        CGFloat alpha = (rawData[byteIndex + 3] * 1.0) ;

        // Logging every pixel is extremely slow; enable only when debugging.
        //NSLog(@"red %f \t green %f \t blue %f \t alpha %f rawData[%d] %d", red, green, blue, alpha, byteIndex, rawData[byteIndex]);
        if(red > Required_Value_of_red || green > Required_Value_of_green || blue > Required_Value_of_blue)//all values are between 0 to 255
        {
            // all values set to 255 to get a white background.
            red = 255.0;
            green = 255.0;
            blue = 255.0;
            alpha = 255.0;
        }
        else
        {
            // everything below the threshold becomes black text.
            red = 0.0;
            green = 0.0;
            blue = 0.0;
            alpha = 255.0;
        }
        rawData[byteIndex] = red;
        rawData[byteIndex + 1] = green;
        rawData[byteIndex + 2] = blue;
        rawData[byteIndex + 3] = alpha;

        byteIndex += 4;
    }

    colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef bitmapContext = CGBitmapContextCreate(
                                                       rawData,
                                                       width,
                                                       height,
                                                       8, // bitsPerComponent
                                                       4*width, // bytesPerRow
                                                       colorSpace,
                                                       kCGImageAlphaNoneSkipLast);

    CFRelease(colorSpace);

    CGImageRef cgImage = CGBitmapContextCreateImage(bitmapContext);

    UIImage *img = [UIImage imageWithCGImage:cgImage];

    //use the img for further use of ocr

    CGImageRelease(cgImage);         // UIImage retains its own reference
    CGContextRelease(bitmapContext); // avoid leaking the context on every call
    free(rawData);
}

Note:

The only drawbacks of this method are the time it consumes and having to hand-pick the RGB values that decide which pixels turn white and which turn black.
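One way to avoid hand-picking `Required_Value_of_red` and friends is to compute the cut-off from the image itself. This is not from the original post, just a minimal C++ sketch of Otsu's global threshold over an 8-bit intensity buffer:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Otsu's method: pick the threshold that maximizes the between-class
// variance of the intensity histogram, so the cut-off adapts to each
// plate photo instead of being a hand-tuned constant.
static int otsuThreshold(const uint8_t *gray, size_t count) {
    double hist[256] = {0};
    for (size_t i = 0; i < count; ++i) hist[gray[i]] += 1.0;

    double total = static_cast<double>(count);
    double sumAll = 0.0;
    for (int t = 0; t < 256; ++t) sumAll += t * hist[t];

    double sumB = 0.0, wB = 0.0, bestVar = -1.0;
    int bestT = 127;
    for (int t = 0; t < 256; ++t) {
        wB += hist[t];                  // weight of the "background" class
        if (wB == 0) continue;
        double wF = total - wB;         // weight of the "foreground" class
        if (wF == 0) break;
        sumB += t * hist[t];
        double mB = sumB / wB;
        double mF = (sumAll - sumB) / wF;
        double between = wB * wF * (mB - mF) * (mB - mF);
        if (between > bestVar) { bestVar = between; bestT = t; }
    }
    return bestT;
}
```

Pixels brighter than the returned value would go white and the rest black, replacing the fixed per-channel comparison above.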

UPDATE :

    CGImageRef imageRef = [plate CGImage];
    CIContext *context = [CIContext contextWithOptions:nil]; // 1
    CIImage *ciImage = [CIImage imageWithCGImage:imageRef]; // 2
    CIFilter *filter = [CIFilter filterWithName:@"CIColorMonochrome" keysAndValues:@"inputImage", ciImage, @"inputColor", [CIColor colorWithRed:1.f green:1.f blue:1.f alpha:1.0f], @"inputIntensity", [NSNumber numberWithFloat:1.f], nil]; // 3
    CIImage *ciResult = [filter valueForKey:kCIOutputImageKey]; // 4
    CGImageRef cgImage = [context createCGImage:ciResult fromRect:[ciResult extent]];
    UIImage *img = [UIImage imageWithCGImage:cgImage];
    CGImageRelease(cgImage); // createCGImage follows the Create rule, so release it here

Just replace the code of the above method (getRGBAsFromImage:) with this one; the result is the same, but it takes only 0.1 to 0.3 seconds.

answered 2012-11-07T09:10:29.330

I was able to get near-instantaneous results using the demo photo provided, as well as generating the correct letters.

I pre-processed the image using GPUImage:

// Pre-processing for OCR
GPUImageLuminanceThresholdFilter * adaptiveThreshold = [[GPUImageLuminanceThresholdFilter alloc] init];
[adaptiveThreshold setThreshold:0.3f];
[self setProcessedImage:[adaptiveThreshold imageByFilteringImage:_image]];

Then sent the processed image to Tesseract:

- (NSArray *)processOcrAt:(UIImage *)image {
    [self setTesseractImage:image];

    _tesseract->Recognize(NULL);
    char* utf8Text = _tesseract->GetUTF8Text();
    NSString *text = [NSString stringWithUTF8String:utf8Text];
    delete [] utf8Text; // GetUTF8Text() allocates with new[]; free it before returning

    return [self ocrProcessingFinished:text];
}

- (NSArray *)ocrProcessingFinished:(NSString *)result {
    // Strip extra characters, whitespace/newlines
    NSString * results_noNewLine = [result stringByReplacingOccurrencesOfString:@"\n" withString:@""];
    NSArray * results_noWhitespace = [results_noNewLine componentsSeparatedByCharactersInSet:[NSCharacterSet whitespaceCharacterSet]];
    NSString * results_final = [results_noWhitespace componentsJoinedByString:@""];
    results_final = [results_final lowercaseString];

    // Separate out individual letters
    NSMutableArray * letters = [[NSMutableArray alloc] initWithCapacity:results_final.length];
    for (int i = 0; i < [results_final length]; i++) {
        NSString * newTile = [results_final substringWithRange:NSMakeRange(i, 1)];
        [letters addObject:newTile];
    }

    return [NSArray arrayWithArray:letters];
}

- (void)setTesseractImage:(UIImage *)image {
    free(_pixels);

    CGSize size = [image size];
    int width = size.width;
    int height = size.height;

    if (width <= 0 || height <= 0)
        return;

    // the pixels will be painted to this array
    _pixels = (uint32_t *) malloc(width * height * sizeof(uint32_t));
    // clear the pixels so any transparency is preserved
    memset(_pixels, 0, width * height * sizeof(uint32_t));

    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();

    // create a context with RGBA pixels
    CGContextRef context = CGBitmapContextCreate(_pixels, width, height, 8, width * sizeof(uint32_t), colorSpace,
                                                 kCGBitmapByteOrder32Little | kCGImageAlphaPremultipliedLast);

    // paint the bitmap to our context which will fill in the pixels array
    CGContextDrawImage(context, CGRectMake(0, 0, width, height), [image CGImage]);
    CGContextRelease(context);          // the pixel buffer outlives the context
    CGColorSpaceRelease(colorSpace);    // avoid leaking on every call

    _tesseract->SetImage((const unsigned char *) _pixels, width, height, sizeof(uint32_t), width * sizeof(uint32_t));
}

This leaves ' marks in place of -, but those are also easy to remove. Depending on the image set you have, you may have to fine-tune this a bit, but it should get you headed in the right direction.
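Not part of the original answer, but a whitelist pass like the question's `tessedit_char_whitelist` can strip those stray marks after recognition; a minimal C++ sketch, assuming the plate alphabet is digits plus letters folded to upper case:

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Keep only characters in the assumed plate alphabet (digits and letters,
// folded to upper case); everything else -- quotes, dashes, noise -- is dropped.
static std::string keepPlateChars(const std::string &raw) {
    std::string out;
    for (unsigned char c : raw) {
        if (std::isalnum(c))
            out.push_back(static_cast<char>(std::toupper(c)));
    }
    return out;
}
```

For example, `keepPlateChars("ab-12'cd")` yields `"AB12CD"`.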

Let me know if you have problems using it; it comes from a project I'm working on, and I didn't want to have to strip everything out or create a project from scratch for it.

answered 2012-11-29T19:15:30.530

I dare say tesseract is overkill for your purpose. You don't need dictionary matching to improve recognition quality (you don't have that dictionary, but perhaps you have a means to compute a checksum of the license number), and you have a font optimized for OCR. Best of all, you have markers (the orange and blue areas nearby are good) to locate the region in the image.

In my OCR application I use human-assisted region-of-interest retrieval (just an aiming-help overlay on top of the camera preview). Usually something like a haar cascade is used to locate interesting features such as faces. You could also compute the centroid of the orange area, or simply the bounding box of the orange pixels, by traversing the whole image and storing the leftmost / rightmost / topmost / bottommost pixels of a suitable color.
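The traversal described above can be sketched like this; the `isOrange` tolerance values are made-up illustration numbers, not calibrated ones:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

struct Box { int minX, minY, maxX, maxY; bool found; };

// Placeholder color predicate -- the tolerances are illustrative only and
// would need tuning against real plate photos.
static bool isOrange(uint8_t r, uint8_t g, uint8_t b) {
    return r > 200 && g > 80 && g < 180 && b < 80;
}

// Single pass over an RGBA8888 buffer, recording the extreme coordinates
// of every pixel the predicate accepts.
static Box markerBoundingBox(const uint8_t *rgba, int width, int height) {
    Box box = {width, height, -1, -1, false};
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const uint8_t *p = rgba + 4 * (y * width + x);
            if (isOrange(p[0], p[1], p[2])) {
                box.minX = std::min(box.minX, x);
                box.minY = std::min(box.minY, y);
                box.maxX = std::max(box.maxX, x);
                box.maxY = std::max(box.maxY, y);
                box.found = true;
            }
        }
    }
    return box;
}
```

The resulting box (or the midpoint between the orange and blue markers) gives the crop rectangle to hand to the recognizer.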

As for the recognition itself, I would recommend using invariant moments (not sure whether they are implemented in tesseract, but you can easily port them from this java project: http://sourceforge.net/projects/javaocr/ ).

I tried my demo application on the monitor image, and it recognized the digits (it is not trained for characters).

As for binarization (separating black from white), I would recommend the sauvola method, as it gives the best tolerance to luminance changes (it is also implemented in our OCR project).
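For reference, a minimal C++ sketch of Sauvola's local threshold T = m * (1 + k * (s / R - 1)); k = 0.5 and R = 128 are the commonly cited defaults. This is a naive per-window version for clarity, not the answer's implementation -- a real one would use integral images for speed:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Sauvola binarization: for each pixel, compute the mean m and standard
// deviation s over a (2r+1)x(2r+1) window, then binarize against
// T = m * (1 + k * (s / R - 1)). The local statistics make it tolerant
// of uneven lighting, unlike a single global threshold.
static std::vector<uint8_t> sauvola(const std::vector<uint8_t> &gray,
                                    int width, int height,
                                    int r = 7, double k = 0.5, double R = 128.0) {
    std::vector<uint8_t> out(gray.size());
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            double sum = 0.0, sumSq = 0.0;
            int n = 0;
            for (int wy = std::max(0, y - r); wy <= std::min(height - 1, y + r); ++wy)
                for (int wx = std::max(0, x - r); wx <= std::min(width - 1, x + r); ++wx) {
                    double v = gray[wy * width + wx];
                    sum += v; sumSq += v * v; ++n;
                }
            double m = sum / n;
            double s = std::sqrt(std::max(0.0, sumSq / n - m * m));
            double T = m * (1.0 + k * (s / R - 1.0));
            out[y * width + x] = gray[y * width + x] > T ? 255 : 0;
        }
    }
    return out;
}
```

Bright plate background comes out white and the darker glyph strokes come out black, even when the overall brightness drifts across the plate.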

answered 2012-11-06T11:18:06.410