I want to extract the text that comes after a specific sentence in a file.

1 Answer

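Do you specifically need BeautifulSoup? If not, you can use the following:

To split the text immediately after a specific sentence, try this. I'm not sure exactly what you want to extract after the sentence, so I'm assuming you mean everything that follows it.

For example, say I have a file, file.txt: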

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Vivamus congue mattis risus, sit amet elementum lorem gravida eu. Cras vitae ante vel erat feugiat scelerisque. Etiam nec urna sed enim blandit blandit non nec odio. Quisque lacinia tempus rhoncus. Mauris euismod leo ut velit lobortis feugiat. Phasellus ultrices nunc sit amet tortor pretium eu mollis neque condimentum. Fusce placerat bibendum diam eget euismod. Phasellus ultricies erat nibh, sed volutpat quam. Nunc quis mauris sed purus aliquet aliquam. Integer viverra rutrum arcu ac tempor.

and my sentence is: Mauris euismod leo ut velit lobortis feugiat.

You can do this:

with open("file.txt") as file:  # open the file; it is closed automatically when the block exits
    separator = "Mauris euismod leo ut velit lobortis feugiat."  # the sentence to split on
    for line in file:
        if separator in line:  # skip lines that do not contain the sentence
            text = line.split(separator, 1)[1]  # keep everything after the first occurrence
            print(text)

>>> Phasellus ultrices nunc sit amet tortor pretium eu mollis neque condimentum. Fusce placerat bibendum diam eget euismod. Phasellus ultricies erat nibh, sed volutpat quam. Nunc quis mauris sed purus aliquet aliquam. Integer viverra rutrum arcu ac tempor.
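
If the text has already been read into a single string, the same idea can be sketched with `str.partition`, which also makes the "sentence not found" case explicit. The helper name `text_after` and the choice to return `None` when there is no match are assumptions made for illustration, not part of the original answer.

```python
def text_after(text, sentence):
    """Return everything in `text` after the first occurrence of `sentence`.

    Returns None when `sentence` does not occur (an illustrative convention).
    """
    # partition() splits at the first match into (before, match, after);
    # when there is no match, the middle element is an empty string.
    _, found, after = text.partition(sentence)
    if not found:
        return None
    return after.strip()


sample = (
    "Quisque lacinia tempus rhoncus. "
    "Mauris euismod leo ut velit lobortis feugiat. "
    "Phasellus ultrices nunc sit amet tortor pretium eu mollis neque condimentum."
)
result = text_after(sample, "Mauris euismod leo ut velit lobortis feugiat.")
print(result)
```

Reading the whole file with `file.read()` and passing it to this helper would also cover the case where the text after the sentence spans several lines.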

With BeautifulSoup you can extract all the text from the file. Let me know if you need something more specific.

from bs4 import BeautifulSoup

html = """<html><body><div style="DISPLAY: block; TEXT-INDENT: 0pt"><br/></div> <div align="justify" style="DISPLAY: block; MARGIN-LEFT: 0pt; TEXT-INDENT: 0pt; MARGIN-RIGHT: 0pt"><font style="DISPLAY: inline; FONT-WEIGHT: bold; FONT-SIZE: 10pt; FONT-FAMILY: Arial">Our Earnings are Significantly Affected by General Business and Economic Conditions</font></div></body></html>"""

soup = BeautifulSoup(html, "html.parser")  # parse the markup; get_text() is a method of the parsed object, not of a str

print(soup.get_text())

Output:

 Our Earnings are Significantly Affected by General Business and Economic Conditions
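
To combine the two steps (get the text out of the HTML, then keep what follows a given sentence) without installing BeautifulSoup, a rough sketch with the standard library's `html.parser` might look like the following. This swaps in a different parser from the one used above; `TextCollector`, `html_text_after`, and the second `<div>` in the sample markup are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-empty text nodes of an HTML document (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        if data.strip():
            self.chunks.append(data.strip())


def html_text_after(html, sentence):
    """Return the text that follows `sentence` in the document, or None if absent."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    text = " ".join(collector.chunks)
    _, found, after = text.partition(sentence)
    return after.strip() if found else None


# Sample markup: the heading from the answer plus a made-up following paragraph.
html = (
    "<html><body>"
    "<div><font>Our Earnings are Significantly Affected by General "
    "Business and Economic Conditions</font></div>"
    "<div>Hypothetical paragraph that follows the heading.</div>"
    "</body></html>"
)
heading = "Our Earnings are Significantly Affected by General Business and Economic Conditions"
following = html_text_after(html, heading)
print(following)
```

Note that this simple collector also picks up the contents of `<script>` and `<style>` elements, so real documents may need extra filtering.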
answered 2012-12-03T23:46:57.453