python-3.x - OCR from TIFF images: getting output only from the first page

Asked 2022-02-17T17:49:37.190

Viewed 27 times

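I am a beginner and need help with 3 questions about the code below.

1. I only get output from the first page of a multi-page TIFF.
2. Where do I put --oem 3 and --psm 6 to get the correct output?
3. What additional code do I need to get hOCR output?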

'''

from PIL import Image
import pytesseract as pt
import os

pt.pytesseract.tesseract_cmd = r'C:\Users\admin\AppData\Local\Programs\Tesseract-OCR\tesseract.exe'


def main():
    # path for the folder for getting the raw images
    path = "D:\\folder1\\tiff"

    # path for the folder for getting the output
    tempPath = "D:\\folder\\txt"

    # iterating the images inside the folder
    for imageName in os.listdir(path):
        inputPath = os.path.join(path, imageName)
        img = Image.open(inputPath)

        # applying ocr using pytesseract for python
        text = pt.image_to_string(img, lang="eng")

        fullTempPath = os.path.join(tempPath, 'time_' + imageName + ".txt")
        print(text)

        # saving the text for every image in a separate .txt file
        with open(fullTempPath, "w") as file1:
            file1.write(text)


if __name__ == '__main__':
    main()

'''

0 answers
