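I need to convert .ass subtitle files to .xml files. So far I have done this by hand, but I have more and more of them to do.

This is what the process looks like:

Input .ass file: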

Dialogue: 0,0:00:08.03,0:00:10.57,Default,,0000,0000,0000,,Actor says something
Dialogue: 0,0:00:11.28,0:00:21.05,Default,,0000,0000,0000,,Actor says something
etc.

Output .xml file:

<p begin="00:00:08.03" end="00:00:10.57">Actor says something</p>
<p begin="00:00:11.28" end="00:00:21.05">Actor says something</p>
etc.

I don't know how to approach this task.

3 Answers

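First, extract the relevant information from the source file. Since the fields are separated by commas, you can use Python's csv module or a simple split(',').

Here is an example of what that could look like: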

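def extract(source):
    for line in source:
        # Skip headers, styles and blank lines; only Dialogue lines carry subtitles.
        if not line.startswith('Dialogue:'):
            continue
        # Split at most 9 times so commas inside the text stay intact.
        _, start, end, _, _, _, _, _, _, text = line.strip().split(',', 9)
        yield start, end, text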
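
The csv module mentioned as an alternative can do the same splitting. The following is only a sketch, assuming every Dialogue line has nine comma-separated fields before the text, as in the sample from the question. The name extract_csv is made up for illustration. Quoting is switched off so that quote characters in the subtitle text are treated as ordinary characters, and any extra fields caused by commas in the text are joined back together.

```python
import csv

def extract_csv(source):
    """Yield (start, end, text) from an iterable of .ass lines using csv.reader."""
    # Keep only Dialogue lines; headers and style definitions are ignored.
    dialogue = (line for line in source if line.startswith('Dialogue:'))
    # QUOTE_NONE: treat quote characters in the subtitle text as ordinary text.
    for row in csv.reader(dialogue, quoting=csv.QUOTE_NONE):
        # Commas inside the text produce extra fields, so rejoin everything
        # from the tenth field onward.
        yield row[1], row[2], ','.join(row[9:])

sample = [
    '[Events]\n',
    'Dialogue: 0,0:00:08.03,0:00:10.57,Default,,0000,0000,0000,,Actor says "hi", then leaves\n',
]
print(list(extract_csv(sample)))
```

For this format the plain split(',', 9) is arguably simpler; the csv variant mainly pays off if the rows need further csv-style processing.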

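The next step is to convert the extracted data into the desired XML format. A function that works well with the data from the first function could look like this (using simple string formatting):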

xml = '<p begin="{start}" end="{end}">{text}</p>'
def to_xml(start, end, text):
    return xml.format(start=start, end=end, text=text)
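
Two details are not covered by the plain template. The sample output in the question writes the hour with two digits (00:00:08.03, while the .ass file has 0:00:08.03), and subtitle text may contain characters such as & or < that cannot appear verbatim in XML. A possible sketch that handles both, assuming timestamps always have the form H:MM:SS.cc; pad_hours and to_xml_escaped are hypothetical helper names:

```python
from xml.sax.saxutils import escape

TEMPLATE = '<p begin="{start}" end="{end}">{text}</p>'

def pad_hours(timestamp):
    """Turn '0:00:08.03' into '00:00:08.03' (assumes H:MM:SS.cc input)."""
    hours, rest = timestamp.split(':', 1)
    return hours.zfill(2) + ':' + rest

def to_xml_escaped(start, end, text):
    # escape() replaces &, < and > with XML entities.
    return TEMPLATE.format(start=pad_hours(start), end=pad_hours(end),
                           text=escape(text))

print(to_xml_escaped('0:00:08.03', '0:00:10.57', 'Tom & Jerry'))
# -> <p begin="00:00:08.03" end="00:00:10.57">Tom &amp; Jerry</p>
```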

Finally, open the files and use the functions above to write the output:

with open('input.ass') as infile, open('output.xml', 'w') as outfile:
    for start, end, text in extract(infile):
        outfile.write(to_xml(start, end, text) + '\n')

You could certainly make this shorter (fewer lines of code), but IMHO this is a readable approach.

answered 2012-08-17T11:52:00.250

Quick and dirty:

>>> subs = """Dialogue: 0,0:00:08.03,0:00:10.57,Default,,0000,0000,0000,,Actor says something, then some more
... Dialogue: 0,0:00:11.28,0:00:21.05,Default,,0000,0000,0000,,Actor says something"""
>>> for line in subs.split("\n"):
...     print('<p begin="{0[1]}" end="{0[2]}">{0[9]}</p>'.format(
...            line.split(",", 9))) # Split no more than 9 times
...
<p begin="0:00:08.03" end="0:00:10.57">Actor says something, then some more</p>
<p begin="0:00:11.28" end="0:00:21.05">Actor says something</p>
answered 2012-08-17T11:48:35.790
src = [
'Dialogue: 0,0:00:08.03,0:00:10.57,Default,,0000,0000,0000,,Actor says something',
'Dialogue: 0,0:00:11.28,0:00:21.05,Default,,0000,0000,0000,,Actor says something',
]
tpl = '<p begin="0%s" end="0%s">%s</p>'
for i in src:
    # Split at most 9 times so commas inside the dialogue text are kept.
    fields = i.split(',', 9)
    start, end, txt = fields[1], fields[2], fields[9]
    print(tpl % (start, end, txt))
answered 2012-08-17T11:51:05.437