I have a string:
data = 'very <strong class="keyword">Awesome</strong> <strong class="keyword">Book</strong> discount'
I want to get the output as a list:
ans = ['very','<strong class="keyword">Awesome</strong>','<strong class="keyword">Book</strong>','discount']
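That way I can tell the position of each word as well as which words appear inside the tags. I used BeautifulSoup to extract the words that are inside <strong> and the words that are not inside <strong>, but I still need to find their positions. Here is the code I tried: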
import re
from bs4 import BeautifulSoup as BS

data = 'very <strong class="keyword">Awesome</strong> <strong class="keyword">Book</strong>'
# Name the parser explicitly so no <html>/<body> wrapper is added
soup = BS(data, 'html.parser')
to_extract = soup.findAll('strong')
# Detach the <strong> tags from the tree; they remain available in to_extract
for tag in to_extract:
    tag.extract()
soup = str(soup)

# Words that were inside <strong> tags
InStrongWords = []
for t in to_extract:
    InStrongWords.append(t.string)

# Words left over outside the tags
soup = re.sub(r"[^A-Za-z0-9\-.()\\/&': ]+", ' ', soup)
notInStrongWords = re.findall(r'[(][^)]*[)]|\S+', soup)
Thanks in advance.
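
For illustration, here is a minimal sketch of one way the desired list could be produced. It swaps BeautifulSoup for a plain regular expression, and it assumes the <strong> tags are never nested and that words outside the tags are separated by whitespace. The names TOKEN and positions are made up for this example.

```python
import re

data = 'very <strong class="keyword">Awesome</strong> <strong class="keyword">Book</strong> discount'

# Match either a whole <strong>...</strong> element or a run of non-space characters.
# Assumption: <strong> tags are not nested.
TOKEN = re.compile(r'<strong\b[^>]*>.*?</strong>|\S+')

# Tokens come back in document order, so the list index is the word position.
ans = TOKEN.findall(data)

# (position, token, is_inside_strong) for every token
positions = [(i, tok, tok.startswith('<strong')) for i, tok in enumerate(ans)]

print(ans)
print(positions)
```

Because the tagged and untagged words are collected in a single pass, their relative order is preserved, which the extract-then-split approach loses. For arbitrary HTML a real parser would be more robust than a regex.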