python - Python scraping</h1> <div id="body"><p>I want to fetch the title of a web page that I open with urllib2. What is the best way to parse the HTML and find what I need (for now only the <code>&lt;title&gt;</code> tag, but I might need more in the future)?</p></a></h1> </div> <div class="d-flex fw-wrap pb8 mb16 bb bc-black-075"> <div class="flex--item ws-nowrap mr16 mb8" title="2022-04-17 15:46:40Z"> <span class="fc-light mr2">Asked</span> <time itemprop="dateCreated" datetime="2009-11-02T09:48:59.420">2009-11-02T09:48:59.420</time> </div> <div class="flex--item ws-nowrap mb8" title="Viewed 6 times"> <span class="fc-light mr2">Viewed</span> 5738 times </div> </div> <div id="mainbar" role="main" aria-label="question and answers"> <div class="question" data-questionid="4" data-position-on-page="0" data-score="763" id="question"> <div class="post-layout"> <div class="votecell post-layout--left"> <div class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200" data-post-id="4"> <button class="js-vote-up-btn flex--item s-btn s-btnunset c-pointer " data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Up vote" data-selected-classes="fc-theme-primary" data-unselected-classes="" aria-describedby="--stacks-s-tooltip-peeufs8c"> <svg aria-hidden="true" class="svg-icon iconArrowUpLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 25h32L18 9 2 25Z"></path></svg> </button> <div id="--stacks-s-tooltip-peeufs8c" class="s-popover s-popovertooltip pe-none" aria-hidden="true" role="tooltip">This question shows research effort; it is useful and clear<div class="s-popover--arrow"></div></div> <div class="js-vote-count flex--item d-flex fd-column ai-center 
fc-black-500 fs-title" itemprop="upvoteCount" data-value=""> 5 </div> <button class="js-vote-down-btn flex--item s-btn s-btnunset c-pointer " data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Down vote" data-selected-classes="fc-theme-primary" data-unselected-classes="" aria-describedby="--stacks-s-tooltip-04106eqn"> <svg aria-hidden="true" class="svg-icon iconArrowDownLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 11h32L18 27 2 11Z"></path></svg> </button><div id="--stacks-s-tooltip-04106eqn" class="s-popover s-popovertooltip pe-none" aria-hidden="true" role="tooltip">This question does not show any research effort; it is unclear or not useful<div class="s-popover--arrow"></div></div> </div> </div> <div class="postcell post-layout--right"> <div class="s-prose js-post-body" itemprop="text"> </div> <div class="mt24 mb12"> <div class="post-taglist d-flex gs4 gsy fd-column"> <div class="d-flex ps-relative fw-wrap"> <a href="/tags/python" class="post-tag js-gps-track" title="show questions tagged 'python'" rel="tag">python</a><a href="/tags/urllib2" class="post-tag 
js-gps-track" title="show questions tagged 'urllib2'" rel="tag">urllib2</a> </div> </div> </div> </div> <span class="d-none" itemprop="commentCount">4</span> </div> </div> <div id="answers"> <a name="tab-top"></a> <div id="answers-header"> <div class="answers-subheader d-flex ai-center mb8"> <div class="flex--item fl1"> <h2 class="mb0" data-answercount=""> 4 Answers <span style="display:none;" itemprop="answerCount">4</span> </h2> </div> </div> </div> <a name="7"></a> <div id="answer-7" class="answer js-answer accepted-answer" data-answerid="7" data-parentid="4" data-score="506" data-position-on-page="1" data-highest-scored="1" data-question-has-accepted-highest-score="1" itemprop="suggestedAnswer" itemscope="" itemtype="https://schema.org/Answer"> <div class="post-layout"> <div class="votecell post-layout--left"> <div class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200" data-post-id="7"> <button class="js-vote-up-btn flex--item s-btn s-btnunset c-pointer " data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Up vote" data-selected-classes="fc-theme-primary" data-unselected-classes="" aria-describedby="--stacks-s-tooltip-dgvag2l3"> <svg aria-hidden="true" class="svg-icon iconArrowUpLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 25h32L18 9 2 25Z"></path></svg> </button><div id="--stacks-s-tooltip-dgvag2l3" class="s-popover s-popovertooltip pe-none" aria-hidden="true" role="tooltip">This answer is useful<div class="s-popover--arrow"></div></div> <div class="js-vote-count flex--item d-flex fd-column ai-center fc-black-500 fs-title" itemprop="upvoteCount" data-value="9"> 9 </div> <button 
class="js-vote-down-btn flex--item s-btn s-btnunset c-pointer " data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Down vote" data-selected-classes="fc-theme-primary" data-unselected-classes="" aria-describedby="--stacks-s-tooltip-gn8ppsfv"> <svg aria-hidden="true" class="svg-icon iconArrowDownLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 11h32L18 27 2 11Z"></path></svg> </button> </div> </div> <div class="answercell post-layout--right"> <div class="s-prose js-post-body" itemprop="text"> <p>Yes, I would recommend <a href="http://www.crummy.com/software/BeautifulSoup/" rel="noreferrer">BeautifulSoup</a>.</p> <p>Getting the title is as simple as:</p> <pre><code>from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3; with bs4: from bs4 import BeautifulSoup
soup = BeautifulSoup(html)      # html holds the page source, e.g. from urllib2.urlopen(url).read()
myTitle = soup.html.head.title  # the &lt;title&gt; Tag object
</code></pre> <p>or</p> <pre><code>myTitle = soup('title')  # same as soup.findAll('title'): a list of every &lt;title&gt; tag
</code></pre> <p>Taken from the <a href="http://www.crummy.com/software/BeautifulSoup/documentation.html" rel="noreferrer">documentation</a>.</p> <p>It is very robust and will parse the HTML no matter how messy it is.</p> </div> <div class="mt24"> <div class="user-action-time" style="color:#999;text-align:right;">answered 2009-11-02T09:55:11.267</div> </div> </div> </div> </div><a name="7"></a> <div id="answer-7" class="answer js-answer accepted-answer" data-answerid="7" data-parentid="4" data-score="506" data-position-on-page="1" data-highest-scored="1" data-question-has-accepted-highest-score="1" itemprop="suggestedAnswer" itemscope="" itemtype="https://schema.org/Answer"> <div class="post-layout"> <div class="votecell post-layout--left"> <div class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200" data-post-id="7"> <button class="js-vote-up-btn flex--item s-btn s-btnunset c-pointer " data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Up vote" data-selected-classes="fc-theme-primary" data-unselected-classes="" aria-describedby="--stacks-s-tooltip-dgvag2l3"> <svg aria-hidden="true" class="svg-icon iconArrowUpLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 25h32L18 9 2 
25Z"></path></svg> </button><div id="--stacks-s-tooltip-dgvag2l3" class="s-popover s-popovertooltip pe-none" aria-hidden="true" role="tooltip">This answer is useful<div class="s-popover--arrow"></div></div> <div class="js-vote-count flex--item d-flex fd-column ai-center fc-black-500 fs-title" itemprop="upvoteCount" data-value="5"> 5 </div> <button class="js-vote-down-btn flex--item s-btn s-btnunset c-pointer " data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Down vote" data-selected-classes="fc-theme-primary" data-unselected-classes="" aria-describedby="--stacks-s-tooltip-gn8ppsfv"> <svg aria-hidden="true" class="svg-icon iconArrowDownLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 11h32L18 27 2 11Z"></path></svg> </button> </div> </div> <div class="answercell post-layout--right"> <div class="s-prose js-post-body" itemprop="text"> <p>Try <a href="http://www.crummy.com/software/BeautifulSoup/" rel="noreferrer">Beautiful Soup</a>:</p> <pre><code>import urllib2
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 on Python 2

url = 'http://www.example.com'
response = urllib2.urlopen(url)
html = response.read()
soup = BeautifulSoup(html)
title = soup.html.head.title
print title.contents  # list of the tag's children; title.string returns just the text
</code></pre> </div> <div class="mt24"> <div class="user-action-time" style="color:#999;text-align:right;">answered 2009-11-02T09:55:06.360</div> </div> </div> </div> </div><a name="7"></a> <div id="answer-7" class="answer js-answer accepted-answer" data-answerid="7" data-parentid="4" data-score="506" data-position-on-page="1" data-highest-scored="1" data-question-has-accepted-highest-score="1" itemprop="suggestedAnswer" itemscope="" itemtype="https://schema.org/Answer"> <div class="post-layout"> <div class="votecell post-layout--left"> <div class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 fc-black-200" data-post-id="7"> <button class="js-vote-up-btn flex--item s-btn s-btnunset c-pointer " data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Up vote" 
data-selected-classes="fc-theme-primary" data-unselected-classes="" aria-describedby="--stacks-s-tooltip-dgvag2l3"> <svg aria-hidden="true" class="svg-icon iconArrowUpLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 25h32L18 9 2 25Z"></path></svg> </button><div id="--stacks-s-tooltip-dgvag2l3" class="s-popover s-popovertooltip pe-none" aria-hidden="true" role="tooltip">This answer is useful<div class="s-popover--arrow"></div></div> <div class="js-vote-count flex--item d-flex fd-column ai-center fc-black-500 fs-title" itemprop="upvoteCount" data-value="0"> 0 </div> <button class="js-vote-down-btn flex--item s-btn s-btnunset c-pointer " data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Down vote" data-selected-classes="fc-theme-primary" data-unselected-classes="" aria-describedby="--stacks-s-tooltip-gn8ppsfv"> <svg aria-hidden="true" class="svg-icon iconArrowDownLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 11h32L18 27 2 11Z"></path></svg> </button> </div> </div> <div class="answercell post-layout--right"> <div class="s-prose js-post-body" itemprop="text"> <p>Use <a href="http://www.crummy.com/software/BeautifulSoup/" rel="nofollow noreferrer">Beautiful Soup</a>.</p> <pre><code>import urllib2
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3 on Python 2

html = urllib2.urlopen("...").read()
soup = BeautifulSoup(html)
print soup.title.string  # text of the first &lt;title&gt; tag
</code></pre> </div> <div class="mt24"> <div class="user-action-time" style="color:#999;text-align:right;">answered 2009-11-02T09:54:09.690</div> </div> </div> </div> </div><a name="7"></a> <div id="answer-7" class="answer js-answer accepted-answer" data-answerid="7" data-parentid="4" data-score="506" data-position-on-page="1" data-highest-scored="1" data-question-has-accepted-highest-score="1" itemprop="suggestedAnswer" itemscope="" itemtype="https://schema.org/Answer"> <div class="post-layout"> <div class="votecell post-layout--left"> <div class="js-voting-container d-flex jc-center fd-column ai-stretch gs4 
fc-black-200" data-post-id="7"> <button class="js-vote-up-btn flex--item s-btn s-btnunset c-pointer " data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Up vote" data-selected-classes="fc-theme-primary" data-unselected-classes="" aria-describedby="--stacks-s-tooltip-dgvag2l3"> <svg aria-hidden="true" class="svg-icon iconArrowUpLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 25h32L18 9 2 25Z"></path></svg> </button><div id="--stacks-s-tooltip-dgvag2l3" class="s-popover s-popovertooltip pe-none" aria-hidden="true" role="tooltip">This answer is useful<div class="s-popover--arrow"></div></div> <div class="js-vote-count flex--item d-flex fd-column ai-center fc-black-500 fs-title" itemprop="upvoteCount" data-value="0"> 0 </div> <button class="js-vote-down-btn flex--item s-btn s-btn__unset c-pointer " data-controller="s-tooltip" data-s-tooltip-placement="right" aria-pressed="false" aria-label="Down vote" data-selected-classes="fc-theme-primary" data-unselected-classes="" aria-describedby="--stacks-s-tooltip-gn8ppsfv"> <svg aria-hidden="true" class="svg-icon iconArrowDownLg" width="36" height="36" viewBox="0 0 36 36"><path d="M2 11h32L18 27 2 11Z"></path></svg> </button> </div> </div> <div class="answercell post-layout--right"> <div class="s-prose js-post-body" itemprop="text"> <p>Why import a whole extra library for a single task? Why not regular expressions, or the standard library's urllib instead of third-party packages such as bs4 or mechanize? Stick to the standard library: match the string in the HTML, then split on <code>'&gt;'</code> and <code>'&lt;'</code> with <code>re</code> or whatever you prefer.</p> <pre><code># Standard library only: find the line that holds the tag, then split around it.
# Assumes &lt;title&gt; and &lt;/title&gt; appear on the same line, in lowercase.
title = None
for line in html.splitlines():
    if '&lt;title&gt;' in line:
        title = line.split('&lt;title&gt;', 1)[1].split('&lt;/title&gt;', 1)[0]
        break
</code></pre> <p>This works in both Python 2 and Python 3 as long as <code>html</code> is a text string; call <code>.strip()</code> on the result to remove surrounding whitespace.</p> </div> <div class="mt24"> <div class="user-action-time" style="color:#999;text-align:right;">answered 2014-12-01T13:58:17.213</div> </div> </div> </div> </div></div> </div> </div> </div> </div> </div> <footer id="footer" class="site-footer js-footer" role="contentinfo"> <div class="site-footer--container"> <div class="site-footer--logo"> <a href="https://stackoverflow.com"><svg aria-hidden="true" class="native svg-icon iconLogoGlyphMd" width="32" height="37" viewBox="0 0 32 37"><path d="M26 33v-9h4v13H0V24h4v9h22Z" fill="#BCBBBB"/><path d="m21.5 0-2.7 2 9.9 13.3 2.7-2L21.5 0ZM26 18.4 13.3 7.8l2.1-2.5 12.7 10.6-2.1 2.5ZM9.1 15.2l15 7 1.4-3-15-7-1.4 3Zm14 10.79.68-2.95-16.1-3.35L7 23l16.1 2.99ZM23 30H7v-3h16v3Z" fill="#F48024"/></svg></a> </div> <nav class="site-footer--nav"> <div class="site-footer--col"> <h5 class="-title"><a href="https://stackoverflow.org.cn" class="js-gps-track" data-gps-track="footer.click({ location: 3, link: 15})">Stack Overflow Chinese site</a></h5> <p>Licensed under the Creative Commons CC BY-SA license.</p> </div> </nav> </div> </footer> </body> </html>