I'm a beginner at programming and Python, and I'm writing a script to process .srt subtitle files. My problem is that I don't know how to read through a file and analyze it block by block: first the text between the beginning of the file and the first empty line, then the text between that empty line and the next one, and so on until the end of the file. By "analyze" I mean, for example, calculating the length of one part of a block and converting another part to numbers.
You can read about the .srt format specification and see an example here (type: Plain); there's an empty line at the end of the file. I want to compare the display time/duration of each subtitle against the number of characters in it. Starting from the beginning of the file, each subtitle (with its number, duration info and text) is separated from the next one by an empty line (a "\n"; I can find them with something like if "\n" in line and len(line) == 2:). The time codes always contain a "-->" and always end in three digits, so if I have that in a string, I can figure out where it is. The problem is, I need to somehow do the following:
Read the subtitle text, which can be 1-3 lines with line breaks, calculate its character length.
Read the duration, convert to duration in seconds.
Read the line number (to be able to output it somewhere with my results, e.g. "duration of line 44 is 4.54 s").
I can do the second easily, but I'm not sure how to go over the whole file and tell Python: find the end of each subtitle's text, calculate the number of characters in each line, add those up, read the duration, divide one by the other, output the result with the line number, and do the same with the next subtitle until you reach the end of the file. If it were only one subtitle, I could do it easily, but I'm not sure how to run that check on a single subtitle and then move on to the next one. I've been searching for 2 hours and can't find anything like that.
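To make the loop described above concrete, here is a minimal sketch of one possible approach, not a definitive implementation. It assumes each block has the subtitle number on its first line, the time codes on its second line, and the text on the remaining lines. The names TIMING, to_seconds, iter_blocks and analyze are made up for this illustration, and the sample text is invented test data.

```python
import re

# Matches a timing line such as "00:00:01,000 --> 00:00:04,500".
# Assumption: hours:minutes:seconds, then "," (or ".") and three digits.
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*"
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(hours, minutes, seconds, millis):
    """Convert the four captured time fields to seconds as a float."""
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def iter_blocks(lines):
    """Yield one list of non-empty lines per subtitle block."""
    block = []
    for line in lines:
        line = line.strip()  # drops "\n" and "\r\n" alike
        if line:
            block.append(line)
        elif block:  # an empty line closes the current block
            yield block
            block = []
    if block:  # last block when the file lacks a trailing empty line
        yield block


def analyze(lines):
    """Return a list of (subtitle number, duration in seconds, character count)."""
    results = []
    for block in iter_blocks(lines):
        if len(block) < 2:
            continue  # not a complete subtitle, skip it
        match = TIMING.search(block[1])
        if match is None:
            continue  # second line is not a timing line, skip it
        fields = match.groups()
        duration = to_seconds(*fields[4:]) - to_seconds(*fields[:4])
        chars = sum(len(text) for text in block[2:])
        results.append((int(block[0]), duration, chars))
    return results


# Invented sample data in the layout described above.
sample = """1
00:00:01,000 --> 00:00:04,500
Hello there,
how are you?

2
00:00:05,000 --> 00:00:06,250
Fine.

"""

for number, duration, chars in analyze(sample.splitlines()):
    print(f"subtitle {number}: {chars} characters in {duration:.2f} s")
```

Because analyze accepts any iterable of lines, an open file object can be passed to it directly, for example with open("movie.srt", encoding="utf-8-sig") as f: results = analyze(f), where movie.srt is a placeholder file name. The utf-8-sig encoding is suggested here only because some .srt files start with a byte order mark, which would otherwise make int() fail on the first subtitle number.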