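I am parsing txt files (100+ pages long) and want to extract the sentence in which the string "public offering price" first appears. I also want to strip the "&nbsp;" characters from that sentence.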

I run the following code over a series of files (file_list):

test1 = []  # create a new list to store my desired output
for eachfile in file_list:
    with open(eachfile, 'r') as f:
        for line in f:
            if "public offering price" in line:
                test1.append(line.replace('&nbsp;', '').split('.')[0])
print(test1)

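With the code above I successfully strip the "&nbsp;" characters and split each matching line at the first "." (both of which help toward my desired output), but I get the following output: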

['public offering price will be between $and $per share', 'toadditional shares of our common stock at the initial public offering price', '(2)an initial public offering price of $per share']

The output above gives me every sentence that contains my string, but I only want to keep the first occurrence:

['public offering price will be between $and $per share']

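Any idea how to get this output? Given the code I already have, it should be easy to implement, but I can't figure out how...

Thank you very much in advance,

Edit: the output obtained without the replace or split('.')[0] is as follows: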

['public offering price will be between $&nbsp;&nbsp;&nbsp;and $&nbsp;&nbsp;&nbsp;&nbsp;per share. We intend to apply to list the common stock on\n', 'to&nbsp;&nbsp;&nbsp;&nbsp;additional shares of our common stock at the initial public offering price.</FONT>\n', '(2)&nbsp;an initial public offering price of $&nbsp;&nbsp;&nbsp;&nbsp;per share, the midpoint of the initial public offering range indicated on the cover of this prospectus. </FONT> <FONT SIZE=2>\n']

3 Answers

Take the first element of the list:

first_elem = test1[0]
print(first_elem)

Edit: to get the first matching string from each file:


test2 = []  # create a list to store all the per-file lists
for eachfile in file_list:
    test1 = []  # create a new list to store the matches for this file
    with open(eachfile, 'r') as f:
        for line in f:
            if "public offering price" in line:
                test1.append(line.replace('&nbsp;', '').split('.')[0])
    test2.append(test1)

for test1 in test2:
    if test1:  # skip files with no match to avoid an IndexError
        print(test1[0])  # print the first element of each nested list
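
The approach above still reads each file to the end and collects every match before keeping the first. As a minimal sketch of an alternative, a generator expression combined with next() stops at the first matching line. The function name first_match and its phrase parameter are hypothetical names introduced only for this illustration:

```python
def first_match(path, phrase="public offering price"):
    """Return the cleaned first fragment containing `phrase`, or None if absent."""
    with open(path, "r") as f:
        # Lazily yield cleaned fragments; nothing is read until next() asks for one.
        matches = (
            line.replace("&nbsp;", "").split(".")[0]
            for line in f
            if phrase in line
        )
        # next() consumes only the first match; the default avoids StopIteration
        # when the file contains no matching line.
        return next(matches, None)
```

It could then be applied to every file with something like [first_match(p) for p in file_list], which yields None for files that never mention the phrase.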

answered 2019-09-11T18:26:39.457

You can use break to exit the loop:

test1 = []  # create a new list to store my desired output

for eachfile in file_list:
    with open(eachfile, 'r') as f:
        for line in f:
            if "public offering price" in line:
                test1.append(line.replace('&nbsp;', '').split('.')[0])
                break  # stop reading this file after the first match

print(test1)
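
One thing the flat list hides is which file each result came from, and whether a file had no match at all. A possible sketch, assuming a dictionary keyed by file path is acceptable, uses the same break together with a for/else clause. The function name first_sentences is hypothetical and not part of the original code:

```python
def first_sentences(file_list, phrase="public offering price"):
    """Map each path to its first cleaned matching fragment (None if no match)."""
    results = {}
    for path in file_list:
        with open(path, "r") as f:
            for line in f:
                if phrase in line:
                    results[path] = line.replace("&nbsp;", "").split(".")[0]
                    break  # first occurrence found, move on to the next file
            else:
                # The else branch runs only when the loop ended without break,
                # i.e. the phrase never appeared in this file.
                results[path] = None
    return results
```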
answered 2019-09-11T18:39:23.580

Try using a break inside the for loop to skip to the next file.

test1 = []  # create a new list to store my desired output
for eachfile in file_list:
    with open(eachfile, 'r') as f:
        for line in f:
            if "public offering price" in line:
                test1.append(line.replace('&nbsp;', '').split('.')[0])
                break  # first match found, skip to the next file
print(test1)
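
The unprocessed output in the question also contains leftover tags such as </FONT>. As a rough sketch only, assuming the files are HTML saved as text, the standard-library html.unescape and a simple regular expression can clean a line before splitting it. The helper name clean_line is hypothetical, and note that, unlike the replace('&nbsp;', '') call above, it turns each run of "&nbsp;" into a single space rather than deleting it:

```python
import html
import re

# Naive tag pattern; adequate for simple remnants like </FONT>, not a full HTML parser.
TAG_RE = re.compile(r"<[^>]+>")

def clean_line(line):
    """Strip tag remnants and normalise HTML entities and whitespace in one line."""
    text = TAG_RE.sub("", line)        # drop tags such as <FONT SIZE=2>
    text = html.unescape(text)         # "&nbsp;" becomes the character "\xa0"
    text = text.replace("\xa0", " ")   # treat non-breaking spaces as plain spaces
    return re.sub(r"\s+", " ", text).strip()  # collapse runs of whitespace
```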
answered 2019-09-11T18:36:24.633