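I am parsing txt files (each more than 100 pages long) and want to extract the sentence in which the string "public offering price" first appears. I would also like to strip the " " characters from that sentence.
I run the following code over a list of files (file_list):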
test1 = []  # create a new list to store my desired output
for eachfile in file_list:
    with open(eachfile, 'r') as f:
        for line in f:
            if "public offering price" in line:
                test1.append(line.replace(' ','').split('.')[0])
print(test1)
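With the code above I successfully remove the " " characters and split each element at the "." when one is present (both of which help get me toward the output I want), but I get the following output: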
['public offering price will be between $and $per share', 'toadditional shares of our common stock at the initial public offering price', '(2)an initial public offering price of $per share']
The output above gives me every sentence that contains the string I am looking for, but I only want to keep the first occurrence:
['public offering price will be between $and $per share']
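Any idea how to get this output? Given the code I am already running it must be easy to do, but I cannot figure out how.
Many thanks in advance,
EDIT: The output obtained without the replace or the split('.')[0] is the following: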
['public offering price will be between $ and $ per share. We intend to apply to list the common stock on\n', 'to additional shares of our common stock at the initial public offering price.</FONT>\n', '(2) an initial public offering price of $ per share, the midpoint of the initial public offering range indicated on the cover of this prospectus. </FONT> <FONT SIZE=2>\n']
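One possible way to keep only the first occurrence is to stop reading a file as soon as a matching line is found. The sketch below is only an illustration under a few assumptions: the helper names first_matching_sentence and first_matches are hypothetical, the unwanted character is assumed to be either the HTML entity "&nbsp;" or a literal non-breaking space, and "first occurrence" is taken to mean the first matching line in each file.

```python
def first_matching_sentence(path, phrase="public offering price"):
    """Return the text before the first '.' on the first line that
    contains `phrase`, or None if no line in the file matches."""
    with open(path, "r") as f:
        for line in f:
            if phrase in line:
                # Assumption: the character to strip is "&nbsp;" or U+00A0.
                cleaned = line.replace("&nbsp;", "").replace("\xa0", "")
                # Returning here ends the scan, so later matches are ignored.
                return cleaned.split(".")[0]
    return None


def first_matches(file_list, phrase="public offering price"):
    """Collect the first matching sentence of each file, skipping files
    that contain no match."""
    results = []
    for path in file_list:
        sentence = first_matching_sentence(path, phrase)
        if sentence is not None:
            results.append(sentence)
    return results
```

Calling print(first_matches(file_list)) would then print one entry per file that contains the phrase. The same effect can be had in the original loop by adding a break right after the append, which leaves the inner for loop after the first hit. Note that split('.')[0] keeps only the text before the first period on the line, so if the phrase appears after an earlier period on the same line it would be cut off.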