The site I'm scraping: link
The tags I want to parse between: START - <p id="p-1">, FINISH - </p>
My code:
from urllib import urlopen
from bs4 import BeautifulSoup
import re
html = urlopen('http://mansci.journal.informs.org/gca?gca=mansci%3B6%2F2%2F141&gca=mansci%3B6%2F2%2F149&gca=mansci%3B6%2F2%2F165&gca=mansci%3B6%2F2%2F172&gca=mansci%3B6%2F2%2F187&gca=mansci%3B6%2F2%2F191&gca=mansci%3B6%2F2%2F197&gca=mansci%3B6%2F2%2F205&gca=mansci%3B6%2F2%2F215&submit=Get+All+Checked+Abstracts').read()
a = re.compile('<p id="p-1">(.*)</p>')
b = re.findall(a,html)
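The problem I'm running into is that my code looks at the page line by line, and I don't know how to parse a whole paragraph at once.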
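Here is a minimal sketch that reproduces the behavior on a small scale. It uses a made-up `sample_html` string instead of the real page, and the names `line_bound`, `pattern`, and `paragraphs` are only for illustration. It assumes the cause is that `.` in a regular expression does not match newline characters unless the `re.DOTALL` flag is set, so a paragraph that spans several lines never matches.

```python
import re

# Hypothetical sample standing in for the downloaded page (not the real site content)
sample_html = '''<div>
<p id="p-1">First line of the abstract,
continued on a second line.</p>
<p id="p-2">Another paragraph.</p>
</div>'''

# Same pattern as in the question: "." stops at newlines,
# so a paragraph that spans several lines is not matched
line_bound = re.findall(r'<p id="p-1">(.*)</p>', sample_html)

# re.DOTALL lets "." match newlines as well;
# ".*?" is non-greedy, so the match ends at the first closing </p>
pattern = re.compile(r'<p id="p-1">(.*?)</p>', re.DOTALL)
paragraphs = pattern.findall(sample_html)

print(line_bound)
print(paragraphs)
```

If this assumption holds for the real page, the same `pattern` could be applied to the `html` string from the code above. The non-greedy `.*?` matters because with `re.DOTALL` a plain `.*` would run on to the last `</p>` in the document.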