I'm using tesseract to extract text from an image. However, there are some problems I'm running into with certain images:
The text is extracted perfectly fine from the image below:
However, the text is not extracted from the image below. Note that the square around the text is smaller now:
Questions
What are some things I can do to the original image to better extract the text from the second image? I am already converting the image to black and white using ImageMagick's -monochrome option.
In these images I do not care about anything but the text. Is there a technique I can use to crop the image and produce a new image containing nothing but the white background and the text? I won't always know the coordinates of the square or circle, so I would need a cropping function that automatically detects the coordinates of the white background.
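To make the second question concrete, here is a rough sketch of the kind of function I have in mind. It is only an illustration under some assumptions: the image is already loaded as rows of grayscale values (0 to 255), everything outside the white area is darker than the threshold, and the names `white_bounding_box`, `crop_to_box` and the default `threshold=200` are made up for this example rather than taken from any library.

```python
def white_bounding_box(pixels, threshold=200):
    """Return (left, top, right, bottom) enclosing every pixel that is
    at least `threshold` bright, or None if no such pixel exists.

    `pixels` is a list of rows, each a list of grayscale values 0..255.
    `right` and `bottom` are exclusive, like Python slice bounds.
    """
    if not pixels or not pixels[0]:
        return None
    width = len(pixels[0])
    # Rows and columns that contain at least one "white" pixel.
    rows = [y for y, row in enumerate(pixels)
            if any(v >= threshold for v in row)]
    cols = [x for x in range(width)
            if any(row[x] >= threshold for row in pixels)]
    if not rows or not cols:
        return None
    return (min(cols), min(rows), max(cols) + 1, max(rows) + 1)


def crop_to_box(pixels, box):
    """Return a copy of `pixels` restricted to `box`."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]


# Tiny synthetic example: dark 8x6 image with a white 4x3 box
# and one dark "text" pixel inside the box.
image = [[30] * 8 for _ in range(6)]
for y in range(2, 5):
    for x in range(3, 7):
        image[y][x] = 255
image[3][4] = 0

box = white_bounding_box(image)
cropped = crop_to_box(image, box)
```

This simple version would break as soon as there are stray bright pixels outside the square, so I assume a real solution needs something more robust (for example, ignoring rows and columns with only a few bright pixels).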