I was interested in coding a long time. So this year I got a job where I was supposed to scrape text from old event programs. The pictures where terrible quality and results with normal OCR where horrible. I checked the google vision api and tested it manually and the results where great, so I used this opportunity to learn about coding.( I did some python before but the lack of usefulness always drove me away).
I wrote this program, I know its walking on crouches but it worked and did exactly what I wanted for 3 months now. I dont use it regularly but today when I wanted to use it again, it didnt work any more it just jumps to the end of the program, and finishes without making the api requests, at least so it seems to me.
I am really mostly finished with my work and this request doesnt make sense in terms of efficiency but I am just curious why the program I created suddenly stopped working.
If somebody can hint me in the right direction I would highly appreciate it, also if somebody wants to use the program, if it works for them sure let it work instead of yourself :D
I am not sure but I am using linux mint and maybe it stopped working because of some updates of python or vision-api or something.
# coding=utf-8
from google.cloud import vision
import os
import io
import sys
reload(sys)
sys.setdefaultencoding('utf8')
directory = "/home/weareone/Documents/programming/test/here"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"]="/home/weareone/Documents/programming/test/key.json"
def workinghard(page):
client = vision.ImageAnnotatorClient()
#file_name = os.path.join( os.path.dirname(__file__), page) # Loads the image into memory
#page_er = os.path.abspath(os.path.join(os.path.dirname(page))) <--- my improvisation
with io.open( page, 'rb') as image_file: # after io.open.(file_name <---exchanged with "page")
content = image_file.read()
request = {
"image": {
"content": content
},
"features": [
{
"type": "DOCUMENT_TEXT_DETECTION"
}
]
}
response = client.annotate_image(request)
storage = response.full_text_annotation.text
return storage
def listdirs(folder):
return [
d for d in (os.path.join(folder, d1) for d1 in os.listdir(folder))
if os.path.isdir(d)
]
directories = listdirs(directory)
for year in directories:
logtxt = open(year + ".txt", "w+" )
for root, dirs, files in os.walk(year):
files.sort()
for file23 in files:
if file23.endswith('.jpg'):
pathparent = os.path.join(year,file23)
logtxt.write(workinghard(pathparent))
logtxt.write("-------------------------------------------------------------------------")
print(pathparent)
logtxt.close()
print("DONE")
Thank you very much dear internet
EDIT: I solved it by changing this line the statement apaprently equaled to FALSE.
if file23.endswith('.JPG'):