python - Python program to extract text from a text file?

Question

I have a text file obtained by converting an .srt file. Its contents look like this:

1
0:0:1,65 --> 0:0:7,85
Hello, my name is Gareth, and in this
video, I'm going to talk about list comprehensions


2
0:0:7,85 --> 0:0:9,749
in Python.

I want only the words from the text file, so that the output is a new text file, op.txt, that looks like this:

Hello
my
name
is
Gareth
and

and so on.

This is the program I am working on:

import os, re
f= open("D:\captionsfile.txt",'r')
k=f.read()
g=str(k)
f.close()
w=re.search('[a-z][A-Z]\s',g)
fil=open('D:\op.txt','w+')
fil.append(w)
fil.close()

But the output I get from this program is:

None
None
None

score 3 · Accepted Answer

If we assume that m counts as a word (short for am) and that in.txt is your text file, you can use

import re

with open('in.txt') as intxt:
    data = intxt.read()

x = re.findall(r'[A-Za-z]+', data)  # letters only; the range A-z would also match [ \ ] ^ _ `
print(x)

This produces

['Hello', 'my', 'name', 'is', 'Gareth', 'and', 'in', 'this', 'video', 'I', 'm', 'going', 'to', 'talk', 'about', 'list', 'comprehensions', 'in', 'Python']

You can now write x to a new file with:

with open('out.txt', 'w') as outtxt:
    outtxt.write('\n'.join(x))

To get

I'm

instead of

I
m

you can use re.findall(r"[A-Za-z']+", data)
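
As a minimal sketch of the difference, here are the two patterns side by side on the caption text from the question, held in an inline string rather than read from a file. The names `sample`, `words_split`, and `words_joined` are illustrative and not part of the original answer. The character class is written as `[A-Za-z]`, since a range such as `A-z` also covers the punctuation characters that sit between `Z` and `a` in ASCII.

```python
import re

# Caption text from the question, inlined so the sketch is self-contained
sample = "Hello, my name is Gareth, and in this\nvideo, I'm going to talk about list comprehensions"

# Letters only: the apostrophe splits "I'm" into two matches
words_split = re.findall(r"[A-Za-z]+", sample)

# Letters plus apostrophe: "I'm" stays in one piece
words_joined = re.findall(r"[A-Za-z']+", sample)

print(words_split[9:11])  # ['I', 'm']
print(words_joined[9])    # I'm
```

Note that the second pattern would also keep a stray leading or trailing apostrophe (for example from quoted speech), so it may need tightening for other input.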

score 1 · Accepted Answer

import re

# Append to out.txt; lines starting with a digit (cue numbers, timestamps) are skipped
with open("out.txt", "a") as f1:
    with open("b.txt") as f:
        for line in f:
            if not line[0].isdigit():
                for word in line.split():
                    f1.write(re.sub(r'[,.!]', "", word))  # replace any punctuation you don't want
                    f1.write("\n")
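
The same line-by-line idea can be sketched as a small function that works on a string, which makes it easy to try without any files on disk. `extract_words` is a hypothetical helper name, and the sample text is the caption data from the question. It assumes that every line starting with a digit is a cue number or a timestamp, so a caption line that itself begins with a digit would be dropped too.

```python
import re

def extract_words(caption_text):
    """Return the words from caption text, skipping cue numbers and timestamps."""
    words = []
    for line in caption_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines that start with a digit
        # (assumed to be cue numbers or "start --> end" timestamps)
        if not stripped or stripped[0].isdigit():
            continue
        # Keep runs of letters and apostrophes; other punctuation is dropped
        words.extend(re.findall(r"[A-Za-z']+", stripped))
    return words

sample = """1
0:0:1,65 --> 0:0:7,85
Hello, my name is Gareth, and in this
video, I'm going to talk about list comprehensions


2
0:0:7,85 --> 0:0:9,749
in Python.
"""

# One word per line, as requested in the question
print("\n".join(extract_words(sample)))
```

Writing the result to op.txt is then a matter of passing the joined string to `write()` on a file opened in `'w'` mode.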
