
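I have a multi-line text with timestamps in MM:SS format, containing subtitle lines from a video. I want to convert the MM:SS format to the ass format, i.e. 00:MM:SS,000, and output the fields separated by tabs. I wrote this code: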

text = """02:42\t02:47\tAnd so that Wayne Gretzky method for sort of going into the future and
02:47\t02:51\timagining what that future might look like, again, is a good idea for research."""
for line in text.splitlines():
    words_in_line = line.split('\t')
    for word in words_in_line:
        if ":" in word:
            ass = "00:" + word + ",000"
            final_line = line.replace(word, ass)
            print(final_line)

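It converts the format, but it converts only one timestamp per printed line and then converts the other one on a separate line, giving output like this: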

00:02:42,000    02:47   And so that Wayne Gretzky method for sort of going into the future and
02:42   00:02:47,000    And so that Wayne Gretzky method for sort of going into the future and
00:02:47,000    02:51   imagining what that future might look like, again, is a good idea for research.
02:47   00:02:51,000    imagining what that future might look like, again, is a good idea for research.

How can I change the code to get output like this?

00:02:42,000    00:02:47,000    And so that Wayne Gretzky method for sort of going into the future and
00:02:47,000    00:02:51,000    imagining what that future might look like, again, is a good idea for research.

2 Answers


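Use regex sub for search and replace; \1 (written \\1 in a non-raw string) refers to the part matched inside the parentheses.

import re
text = """02:42\t02:47\tAnd so that Wayne Gretzky method for sort of going into the future and
02:47\t02:51\timagining what that future might look like, again, is a good idea for research."""
# raw strings keep \d and \1 from being treated as string escapes
print(re.sub(r'(\d\d:\d\d)', r'00:\1,000', text))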

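You can make the regular expression more specific, for example

print(re.sub(r'^(\d\d:\d\d)\t(\d\d:\d\d)', r'00:\1,000\t00:\2,000', text, flags=re.MULTILINE))

to avoid unintended replacements. The re.MULTILINE flag is needed so that ^ matches at the start of every line rather than only at the start of the string. Check regex101.com to find a pattern that matches your data.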
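To illustrate what an unintended replacement can look like, here is a small sketch comparing the two patterns. The sample text is made up for this example (the second caption deliberately mentions a clock time), and the variable names `sample`, `loose`, and `strict` are only illustrative.

```python
import re

# Hypothetical sample: tab-separated fields, and the second caption
# itself contains something that looks like MM:SS.
sample = (
    "02:42\t02:47\tAnd so that Wayne Gretzky method\n"
    "02:47\t02:51\tmeet me at 10:30 tomorrow"
)

# Loose pattern: rewrites every MM:SS-looking token, including the caption text.
loose = re.sub(r'(\d\d:\d\d)', r'00:\1,000', sample)

# Anchored pattern: only the two tab-separated timestamps at the start of a line.
# re.MULTILINE lets ^ match at the start of each line.
strict = re.sub(
    r'^(\d\d:\d\d)\t(\d\d:\d\d)',
    r'00:\1,000\t00:\2,000',
    sample,
    flags=re.MULTILINE,
)

print(loose)
print(strict)
```

With the loose pattern the caption's 10:30 is rewritten as well, while the anchored pattern leaves it alone.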

Answered 2022-01-12T08:36:18.020

Something like this seems to do the trick:

text = """
02:42 02:47   And so that Wayne Gretzky method for sort of going into the future and
02:47   02:51   imagining what that future might look like, again, is a good idea for research.
"""


def convert_time(t):
    return f"00:{t},000"


for line in text.splitlines():
    try:
        # split on whitespace at most twice: start time, end time, rest of the caption
        start, end, caption = line.split(None, 2)
    except ValueError:  # if the line is out of spec, just print it
        print(line)
        continue
    start = convert_time(start)
    end = convert_time(end)
    print(start, end, caption, sep="\t")

The output is

00:02:42,000    00:02:47,000    And so that Wayne Gretzky method for sort of going into the future and
00:02:47,000    00:02:51,000    imagining what that future might look like, again, is a good idea for research.
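For comparison, the loop from the question can also be kept mostly as it is. It prints twice per line because each `line.replace` call starts again from the unmodified line and `print` sits inside the inner loop. A minimal sketch of one possible fix, assuming the fields are tab-separated as in the question (the function name `to_ass_lines` is made up for this example):

```python
def to_ass_lines(text):
    """Return the lines of text with the first two tab-separated fields
    rewritten from MM:SS to 00:MM:SS,000."""
    out = []
    for line in text.splitlines():
        fields = line.split('\t')
        if len(fields) < 3:
            # Not a "start<TAB>end<TAB>caption" line: keep it unchanged.
            out.append(line)
            continue
        # Convert only the start and end times; the caption is left untouched.
        fields[:2] = ["00:" + f + ",000" for f in fields[:2]]
        out.append('\t'.join(fields))
    return out


text = "02:42\t02:47\tAnd so that Wayne Gretzky method\n02:47\t02:51\timagining what that future might look like"
for converted in to_ass_lines(text):
    print(converted)
```

Because each line is rebuilt once and printed once, both timestamps end up on the same output line, and a time-like token inside the caption is not modified.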
Answered 2022-01-12T09:09:20.687