I'm looking through Apple's Vision API documentation and I see a couple of classes that relate to text detection in UIImages:

1) class VNDetectTextRectanglesRequest

It looks like they can detect characters, but I don't see a means to do anything with the characters. Once you've got characters detected, how would you go about turning them into something that can be interpreted by NSLinguisticTagger?

Here's a post that is a brief overview of Vision.

Thank you for reading.
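(For the second half of the question: once an OCR step has produced a plain String, handing it to NSLinguisticTagger is straightforward. A minimal sketch; the string below is just a stand-in for whatever the recognizer returns.)

```swift
import Foundation

// Stand-in for a string produced by an OCR step
let recognized = "The quick brown fox jumps over the lazy dog"

let tagger = NSLinguisticTagger(tagSchemes: [.lexicalClass], options: 0)
tagger.string = recognized

let range = NSRange(location: 0, length: recognized.utf16.count)
let options: NSLinguisticTagger.Options = [.omitWhitespace, .omitPunctuation]

// Tag each word with its part of speech (noun, verb, ...)
tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange, _ in
    if let tag = tag {
        let word = (recognized as NSString).substring(with: tokenRange)
        print("\(word): \(tag.rawValue)")
    }
}
```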
This is how to do it...
//
//  ViewController.swift
//
import UIKit
import Vision
import CoreML

class ViewController: UIViewController {

    //HOLDS OUR INPUT
    var inputImage: CIImage?

    //RESULT FROM OVERALL RECOGNITION
    var recognizedWords: [String] = [String]()

    //RESULT FROM RECOGNITION
    var recognizedRegion: String = String()

    //OCR-REQUEST
    lazy var ocrRequest: VNCoreMLRequest = {
        do {
            //THIS MODEL IS TRAINED BY ME FOR FONT "Inconsolata" (Numbers 0...9 and UpperCase Characters A..Z)
            let model = try VNCoreMLModel(for: OCR().model)
            return VNCoreMLRequest(model: model, completionHandler: self.handleClassification)
        } catch {
            fatalError("cannot load model")
        }
    }()

    //OCR-HANDLER
    func handleClassification(request: VNRequest, error: Error?) {
        guard let observations = request.results as? [VNClassificationObservation]
            else { fatalError("unexpected result") }
        guard let best = observations.first
            else { fatalError("cant get best result") }

        self.recognizedRegion = self.recognizedRegion.appending(best.identifier)
    }

    //TEXT-DETECTION-REQUEST
    lazy var textDetectionRequest: VNDetectTextRectanglesRequest = {
        return VNDetectTextRectanglesRequest(completionHandler: self.handleDetection)
    }()

    //TEXT-DETECTION-HANDLER
    func handleDetection(request: VNRequest, error: Error?) {
        guard let observations = request.results as? [VNTextObservation]
            else { fatalError("unexpected result") }

        // EMPTY THE RESULTS
        self.recognizedWords = [String]()

        //NEEDED BECAUSE OF DIFFERENT SCALES
        let transform = CGAffineTransform.identity.scaledBy(x: (self.inputImage?.extent.size.width)!, y: (self.inputImage?.extent.size.height)!)

        //A REGION IS LIKE A "WORD"
        for region: VNTextObservation in observations {
            guard let boxesIn = region.characterBoxes else {
                continue
            }

            //EMPTY THE RESULT FOR REGION
            self.recognizedRegion = ""

            //A "BOX" IS THE POSITION IN THE ORIGINAL IMAGE (SCALED FROM 0...1.0)
            for box in boxesIn {
                //SCALE THE BOUNDING BOX TO PIXELS
                let realBoundingBox = box.boundingBox.applying(transform)

                //TO BE SURE
                guard (inputImage?.extent.contains(realBoundingBox))!
                    else { print("invalid detected rectangle"); return }

                //SCALE THE POINTS TO PIXELS
                let topleft = box.topLeft.applying(transform)
                let topright = box.topRight.applying(transform)
                let bottomleft = box.bottomLeft.applying(transform)
                let bottomright = box.bottomRight.applying(transform)

                //LET'S CROP AND RECTIFY
                let charImage = inputImage?
                    .cropped(to: realBoundingBox)
                    .applyingFilter("CIPerspectiveCorrection", parameters: [
                        "inputTopLeft": CIVector(cgPoint: topleft),
                        "inputTopRight": CIVector(cgPoint: topright),
                        "inputBottomLeft": CIVector(cgPoint: bottomleft),
                        "inputBottomRight": CIVector(cgPoint: bottomright)
                    ])

                //PREPARE THE HANDLER
                let handler = VNImageRequestHandler(ciImage: charImage!, options: [:])

                //SOME OPTIONS (TO PLAY WITH..)
                self.ocrRequest.imageCropAndScaleOption = VNImageCropAndScaleOption.scaleFill

                //FEED THE CHAR-IMAGE TO OUR OCR-REQUEST - NO NEED TO SCALE IT - VISION WILL DO IT FOR US!
                do {
                    try handler.perform([self.ocrRequest])
                } catch {
                    print("Error")
                }
            }

            //APPEND RECOGNIZED CHARS FOR THAT REGION
            self.recognizedWords.append(recognizedRegion)
        }

        //THAT'S WHAT WE WANT - PRINT WORDS TO CONSOLE
        DispatchQueue.main.async {
            self.printWords(words: self.recognizedWords)
        }
    }

    func printWords(words: [String]) {
        // VOILA'
        print(words)
    }

    func doOCR(ciImage: CIImage) {
        //PREPARE THE HANDLER
        let handler = VNImageRequestHandler(ciImage: ciImage, options: [:])

        //WE NEED A BOX FOR EACH DETECTED CHARACTER
        self.textDetectionRequest.reportCharacterBoxes = true
        self.textDetectionRequest.preferBackgroundProcessing = false

        //FEED IT TO THE QUEUE FOR TEXT-DETECTION
        DispatchQueue.global(qos: .userInteractive).async {
            do {
                try handler.perform([self.textDetectionRequest])
            } catch {
                print("Error")
            }
        }
    }

    override func viewDidLoad() {
        super.viewDidLoad()

        //LET'S LOAD AN IMAGE FROM RESOURCE
        let loadedImage: UIImage = UIImage(named: "Sample1.png")! //TRY Sample2, Sample3 too

        //WE NEED A CIIMAGE - NOT NEEDED TO SCALE
        inputImage = CIImage(image: loadedImage)!

        //LET'S DO IT
        self.doOCR(ciImage: inputImage!)
    }

    override func didReceiveMemoryWarning() {
        super.didReceiveMemoryWarning()
        // Dispose of any resources that can be recreated.
    }
}
You will find the complete project here, the trained model is included!
SwiftOCR

I just got SwiftOCR to work with small sets of text.

https://github.com/garnele007/SwiftOCR

It uses

https://github.com/Swift-AI/Swift-AI

which uses a NeuralNet-MNIST model for text recognition.

TODO : VNTextObservation > SwiftOCR

Will post an example of it using VNTextObservation once I have the one connected to the other.
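In the meantime, a minimal sketch of driving SwiftOCR on its own (assuming the pod is installed; `croppedImage` is a placeholder for a text region cut out of the original image, e.g. using a VNTextObservation's bounding box):

```swift
import SwiftOCR

let swiftOCR = SwiftOCR()

// croppedImage: a UIImage containing a single line of text (placeholder name)
swiftOCR.recognize(croppedImage) { recognizedString in
    // Called asynchronously once recognition finishes
    print(recognizedString)
}
```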
OpenCV + Tesseract OCR

I tried using OpenCV + Tesseract but got compile errors, then found SwiftOCR.

SEE ALSO : Google Vision iOS

Note: Google Vision Text Recognition - the Android SDK has text detection, but there is also an iOS cocoapod. So keep an eye on it, as text recognition should eventually be added to iOS.

https://developers.google.com/vision/text-overview

//Correction: just tried it, but only the Android version of the SDK supports text detection.

If you subscribe to releases: https://libraries.io/cocoapods/GoogleMobileVision

Click SUBSCRIBE TO RELEASES to see when TextDetection has been added to the iOS part of the Cocoapod.
Apple finally updated Vision to do OCR. Open a playground and dump a couple of test images in the Resources folder. In my case, I called them "demoDocument.jpg" and "demoLicensePlate.jpg".

The new class is called VNRecognizeTextRequest. Dump this in a playground and give it a whirl:
import Vision

enum DemoImage: String {
    case document = "demoDocument"
    case licensePlate = "demoLicensePlate"
}

class OCRReader {
    func performOCR(on url: URL?, recognitionLevel: VNRequestTextRecognitionLevel) {
        guard let url = url else { return }
        let requestHandler = VNImageRequestHandler(url: url, options: [:])

        let request = VNRecognizeTextRequest { (request, error) in
            if let error = error {
                print(error)
                return
            }
            guard let observations = request.results as? [VNRecognizedTextObservation] else { return }

            for currentObservation in observations {
                let topCandidate = currentObservation.topCandidates(1)
                if let recognizedText = topCandidate.first {
                    print(recognizedText.string)
                }
            }
        }
        request.recognitionLevel = recognitionLevel

        try? requestHandler.perform([request])
    }
}

func url(for image: DemoImage) -> URL? {
    return Bundle.main.url(forResource: image.rawValue, withExtension: "jpg")
}

let ocrReader = OCRReader()
ocrReader.performOCR(on: url(for: .document), recognitionLevel: .fast)
This was discussed in-depth at WWDC19.
Adding my own progress on this, if anyone has a better solution:

I have successfully drawn the region box and character boxes on screen. The Vision API from Apple is actually very performant. You have to transform each frame of the video to an image and feed it to the recognizer. It is much more accurate than feeding the pixel buffer directly from the camera.
if #available(iOS 11.0, *) {
    guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }

    var requestOptions: [VNImageOption: Any] = [:]

    if let camData = CMGetAttachment(sampleBuffer, kCMSampleBufferAttachmentKey_CameraIntrinsicMatrix, nil) {
        requestOptions = [.cameraIntrinsics: camData]
    }

    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer,
                                                    orientation: .right, // EXIF orientation 6
                                                    options: requestOptions)

    let request = VNDetectTextRectanglesRequest(completionHandler: { (request, _) in
        guard let observations = request.results else { print("no result"); return }
        let result = observations.map({ $0 as? VNTextObservation })
        DispatchQueue.main.async {
            self.previewLayer.sublayers?.removeSubrange(1...)
            for region in result {
                guard let rg = region else { continue }
                self.drawRegionBox(box: rg)
                if let boxes = region?.characterBoxes {
                    for characterBox in boxes {
                        self.drawTextBox(box: characterBox)
                    }
                }
            }
        }
    })
    request.reportCharacterBoxes = true
    try? imageRequestHandler.perform([request])
}
Now I am trying to actually recognize the text. Apple does not provide any built-in OCR model. I want to use CoreML to do that, so I am trying to convert a Tesseract trained data model to CoreML.

You can find Tesseract models here: https://github.com/tesseract-ocr/tessdata and I think the next step is to write a coremltools converter that supports those types of input and outputs a .coreML file.

Or, you can link to TesseractiOS directly and try to feed it with the region boxes and character boxes you get from the Vision API.
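A minimal sketch of that second route, assuming the TesseractOCRiOS pod is installed and `regionImage` is a placeholder for a UIImage cropped to one of the detected region boxes:

```swift
import TesseractOCR

// regionImage: a UIImage cropped to a VNTextObservation's bounding box (placeholder name)
if let tesseract = G8Tesseract(language: "eng") {
    tesseract.engineMode = .tesseractOnly
    // We hand Tesseract one detected text region at a time,
    // so tell it to expect a single line rather than a full page
    tesseract.pageSegmentationMode = .singleLine
    tesseract.image = regionImage
    tesseract.recognize()
    print(tesseract.recognizedText ?? "")
}
```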
Thanks to a GitHub user, you can test an example: https://gist.github.com/Koze/e59fa3098388265e578dee6b3ce89dd8
- (void)detectWithImageURL:(NSURL *)URL
{
    VNImageRequestHandler *handler = [[VNImageRequestHandler alloc] initWithURL:URL options:@{}];
    VNDetectTextRectanglesRequest *request = [[VNDetectTextRectanglesRequest alloc] initWithCompletionHandler:^(VNRequest * _Nonnull request, NSError * _Nullable error) {
        if (error) {
            NSLog(@"%@", error);
        }
        else {
            for (VNTextObservation *textObservation in request.results) {
//                NSLog(@"%@", textObservation);
//                NSLog(@"%@", textObservation.characterBoxes);
                NSLog(@"%@", NSStringFromCGRect(textObservation.boundingBox));
                for (VNRectangleObservation *rectangleObservation in textObservation.characterBoxes) {
                    NSLog(@" |-%@", NSStringFromCGRect(rectangleObservation.boundingBox));
                }
            }
        }
    }];
    request.reportCharacterBoxes = YES;
    NSError *error;
    [handler performRequests:@[request] error:&error];
    if (error) {
        NSLog(@"%@", error);
    }
}
The thing is, the result is an array of bounding boxes for each detected character. From what I gathered from Vision's session, I think you are supposed to use CoreML to detect the actual characters.

Recommended WWDC 2017 talk: Vision Framework: Building on Core ML (haven't finished watching it either); have a look at 25:50 for a similar example called MNISTVision.

Here's another nifty app demonstrating the use of Keras (Tensorflow) to train an MNIST model for handwriting recognition with CoreML: Github
For those still looking for a solution, I wrote a quick library to do this. It uses both the Vision API and Tesseract, and can be used to achieve the task the question describes with one single method:

func sliceaAndOCR(image: UIImage, charWhitelist: String, charBlackList: String = "", completion: @escaping ((_: String, _: UIImage) -> Void))

This method will look for text in your image, return the string found, and a slice of the original image showing where the text was found.

I am using Google's Tesseract OCR engine to transform the images into actual strings. You'll have to add it to your Xcode project using cocoapods. Although Tesseract will perform OCR even if you just feed it an image containing text, the way to make it perform better/faster is to use the detected text rectangles to feed pieces of the image that actually contain text, which is where Apple's Vision Framework comes in handy. Here's a link to the engine: Tesseract OCR. And here's a link to the current stage of my project, which has text detection + OCR already implemented: Out Loud - Camera to Speech. Hope these can be of some use. Good luck!
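Going by the signature above, a call would presumably look like this. The answer doesn't name the object the method lives on, so `ocrInstance` (and `inputImage`) are hypothetical stand-ins; only the method signature itself comes from the answer:

```swift
// ocrInstance: hypothetical name for the library's recognizer object
// inputImage: the UIImage you want scanned
ocrInstance.sliceaAndOCR(image: inputImage,
                         charWhitelist: "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ") { recognizedString, slice in
    print(recognizedString) // the text that was found
    // slice: the piece of the original image where the text was located
}
```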
Firebase ML Kit does it for iOS (and Android) with their on-device Vision API and it outperforms Tesseract and SwiftOCR.
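A minimal sketch of ML Kit's on-device text recognizer, assuming the Firebase/MLVision and Firebase/MLVisionTextModel pods are set up and `someUIImage` is a placeholder for your input image:

```swift
import FirebaseMLVision

let vision = Vision.vision()
let textRecognizer = vision.onDeviceTextRecognizer()

// someUIImage: the UIImage to scan (placeholder name)
let visionImage = VisionImage(image: someUIImage)

textRecognizer.process(visionImage) { result, error in
    guard error == nil, let result = result else { return }
    // The full recognized text, plus per-block text and positions
    print(result.text)
    for block in result.blocks {
        print(block.text, block.frame)
    }
}
```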