python - Extract the first element from a list and clean the string

Question

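I am parsing txt files (100+ pages long) and want to extract the sentence in which the string "public offering price" first appears. I also want to remove the "&nbsp;" characters from that sentence.

I run the following code over a list of files (file_list):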

test1 = []  # create a new list to store my desired output
for eachfile in file_list:
    with open(eachfile, 'r') as f:
        for line in f:
            if "public offering price" in line:
                # remove the &nbsp; entities and keep the text before the first period
                test1.append(line.replace('&nbsp;','').split('.')[0])
print(test1)

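With the code above I successfully remove the "&nbsp;" characters and split each element at the first "." (both of which help toward the output I want), but I get the following output: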

['public offering price will be between $and $per share', 'toadditional shares of our common stock at the initial public offering price', '(2)an initial public offering price of $per share']

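The output above gives me every sentence that contains the string, but I only want to keep the first occurrence:

['public offering price will be between $and $per share']

Any idea how to get this output? Given the code I am already running it should be easy to do, but I cannot figure out how.

Many thanks in advance,

Edit: the output obtained without replace or split('.')[0] is as follows: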

['public offering price will be between $&nbsp;&nbsp;&nbsp;and $&nbsp;&nbsp;&nbsp;&nbsp;per share. We intend to apply to list the common stock on\n', 'to&nbsp;&nbsp;&nbsp;&nbsp;additional shares of our common stock at the initial public offering price.</FONT>\n', '(2)&nbsp;an initial public offering price of $&nbsp;&nbsp;&nbsp;&nbsp;per share, the midpoint of the initial public offering range indicated on the cover of this prospectus. </FONT> <FONT SIZE=2>\n']

score 0 · Accepted Answer

Take the first element of the list:

first_elem = test1[0]
print(first_elem)
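
Note that test1[0] raises an IndexError when no line matched and the list is empty. A minimal sketch of a guarded lookup follows; the helper name first_or_default is hypothetical and not part of the original answer, and the sample list is taken from the question's output.

```python
def first_or_default(items, default=None):
    """Return the first element of items, or default if items is empty."""
    return items[0] if items else default


# Sample data copied from the question's output.
test1 = [
    'public offering price will be between $and $per share',
    'toadditional shares of our common stock at the initial public offering price',
]

print(first_or_default(test1))  # first match
print(first_or_default([]))     # no match: prints None instead of raising
```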

Edit: to get the first desired string from each file:


test2 = []  # create a list to store one list of matches per file
for eachfile in file_list:
    test1 = []  # matches found in the current file
    with open(eachfile, 'r') as f:
        for line in f:
            if "public offering price" in line:
                test1.append(line.replace('&nbsp;','').split('.')[0])
    test2.append(test1)

for test1 in test2:
    if test1:  # skip files with no match to avoid an IndexError
        print(test1[0])  # print first element of each nested list

score 0 · Accepted Answer

You can use break to exit the loop:

test1 = [] #create a new list to store my desired output

for eachfile in file_list:
    line2 = ""  # Create var with lines
    with open(eachfile, 'r') as f:
        for line in f:
            line2 = line2 + line
            if "public offering price" in line:
                test1.append(line.replace('&nbsp;','').split('.')[0])
                break

print(test1)
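
The same early-exit idea can be wrapped in a function that returns as soon as the first match is found. This is only a sketch: the name first_matching_sentence and its needle parameter are assumptions for illustration, and the sample lines are adapted from the question's edit.

```python
def first_matching_sentence(lines, needle="public offering price"):
    """Return the cleaned first sentence of the first line containing needle, or None."""
    for line in lines:
        if needle in line:
            # Strip the &nbsp; entities, then keep the text before the first period.
            return line.replace("&nbsp;", "").split(".")[0]
    return None  # no line contained the needle


# Sample lines adapted from the question; a real call would pass an open file.
sample = [
    "Some unrelated line.\n",
    "public offering price will be between $&nbsp;&nbsp;&nbsp;and $&nbsp;&nbsp;&nbsp;&nbsp;per share. We intend to apply\n",
    "to&nbsp;&nbsp;&nbsp;&nbsp;additional shares of our common stock at the initial public offering price.</FONT>\n",
]

print(first_matching_sentence(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed directly inside the with block, and reading stops at the first match.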

score 0 · Accepted Answer

Try using a break inside the for loop to skip to the next file.

test1 = []  # create a new list to store my desired output
for eachfile in file_list:
    line2 = ""  # accumulates the lines read so far
    with open(eachfile, 'r') as f:
        for line in f:
            line2 = line2 + line
            if "public offering price" in line:
                test1.append(line.replace('&nbsp;','').split('.')[0])
                break  # stop reading this file after the first match
print(test1)
