1

I don't know whether the problem is with me or with the Tesseract library, but the recognition results are very poor.

Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];

// Restrict recognition to these characters.
[tesseract setVariableValue:@"0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZéèô" forKey:@"tessedit_char_whitelist"];
// Image to run OCR on.
[tesseract setImage:[UIImage imageNamed:@"sampledoc.jpg"]];
[tesseract recognize];

NSLog(@"%@", [tesseract recognizedText]);

[tesseract clear];

This is the sample image I want to extract text from:

[image]

This is what I get after running it:

THE SILVER CHAIR
by r 5 Lawn
CHAPTER ow
BEHIND THE cm

lr W1C a dull aulumn day and llll Pole vmscrylng ulmo mo gym
She ms clymg because Illey had been bullymg her Hus Is not gmng In baa school oolyl se I
shall say 15 lane is poslble Ibvlll lllrs schwll which lsnol 1 plusinl subjzrl II was Tcr
eduummlr o sdsooV rm bolh boysuld glrlsl Mm used no he cnllcd o wmxodl schonll some
said on wax ml nculy so mixed as an mlndsohhe people whn an n These penple had um mu
m boyund glrlsshauld loeullma mdn who my mo And unlonunalcb mm ml or
mom aflhc hlggzsl bays mo girls liked best was bullying Ihe mm All suns orlllmgsl hound
mmgso went on Much u an nvdmlry saloon wnuld mm bum flwnd om ma snowed m lulfn
R1my hm al Ilus school xhcy vlucrfl Or mu Iflhcy mo mo people who am am wxc not
expellad m pomsloa The mm no they Mile lntntesilng psycholoycnl msxs mdsaul for
them and mm mlhem for hnun Mo Ifyml knew lhe nghl sorlofdnngxmsay In mo um
mo maul result wos um vou became mlhev 1 fmounlelhan olllnrwlsc
no mswmy ml Pole W crymg on ml dull autumn my on me dlmp Vmlc pith Much runs
bellman um um arm gym ma Ihe lhvubbezy mm ole mam nearly nmulea her ay whan
boy came round Ihz oomuonhogym Mxmlmg mm ms lnmlds m ms pocktu I12 mm In
lmo nu
 CuIV yuu look when yolfre gomw ma JIH Fob
Mu nglur sud me km won mam man a and am he mom hen rm ll WV Polef he
not was upv
ml only mndc lung the am you mm mo yodic llymg oo my somclhmg um um Ihn lfyou
spnk you1l smrl ctymg owl
 lfs mum I suww l as mualr sand me hwy Mlmlbx ouggmg ms hlnds nmm mm ms vovkals
ml waded Them wlsw moo forhurm sly llH1hVlIgoCVOllWiIE ooolo have Said u They both
knew
wow laok has said the beyl Wherek no gond us all r
He mezm WEIL am he am mlk mum mo mlnmne begmnmg n lecmne ml suddenly liew mm a
lmxpcr hvmdl Isqnllc Illkcly llllng Io hlppen Ifyou law been mmrupled in n cryl

I

What should I do?

4

2 Answers

1

He is referring to the pixel density (PPI), not the image dimensions.

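I rescaled the image from 96 DPI to 300 DPI, and almost all of the text was recognized correctly. The image definitely needs to be preprocessed before the OCR step.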
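To make the rescaling step concrete, here is a minimal sketch of the arithmetic, written in plain C so it can be dropped into an Objective-C file unchanged. The names `PixelSize`, `dpi_scale_factor`, and `scaled_pixel_size` are hypothetical helpers for illustration and are not part of Tesseract or UIKit. The 800 x 600 figure in the usage note is also only an example, since the original image's dimensions were not given.

```c
#include <stddef.h>

/* Pixel dimensions of an image (hypothetical helper type). */
typedef struct {
    size_t width;
    size_t height;
} PixelSize;

/* Factor by which the pixel dimensions must grow so that an image scanned
 * at source_dpi has target_dpi at the same physical size.
 * Returns 1.0 (no scaling) when either value is not positive. */
static double dpi_scale_factor(double source_dpi, double target_dpi) {
    if (source_dpi <= 0.0 || target_dpi <= 0.0) {
        return 1.0;
    }
    return target_dpi / source_dpi;
}

/* Pixel size to resample to, rounded to the nearest whole pixel. */
static PixelSize scaled_pixel_size(PixelSize original,
                                   double source_dpi,
                                   double target_dpi) {
    double factor = dpi_scale_factor(source_dpi, target_dpi);
    PixelSize result;
    result.width = (size_t)((double)original.width * factor + 0.5);
    result.height = (size_t)((double)original.height * factor + 0.5);
    return result;
}
```

For example, going from 96 DPI to 300 DPI gives a factor of 300 / 96 = 3.125, so an 800 x 600 image would be resampled to 2500 x 1875 pixels. The image is then redrawn at that size with whatever resampling routine you prefer before it is passed to `setImage:`.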

answered 2013-10-21T00:00:07.120
0
Tesseract *tesseract = [[Tesseract alloc] initWithDataPath:@"tessdata" language:@"eng"];
[tesseract setImage:chosenImage];
[tesseract recognize];

NSLog(@"%@",[tesseract recognizedText]);
answered 2013-11-28T06:26:01.323