python - How to get the current URL in Python, or attach BeautifulSoup to the current URL

Question

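I am new to Python and I am trying to use BeautifulSoup to parse an HTML page and extract some content. The problem is that the URL I need to parse is dynamic, so I cannot hardcode it into urllib2.urlopen the way all the BeautifulSoup examples do.

I tried to use SELF to pull the current URL from the browser, but I cannot get it to work. Can anyone post an example of how to use SELF to pull the current URL from the browser, or how to attach BeautifulSoup to the current URL?

Any help would be greatly appreciated.

Here is my code so far: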

import os
import time

import win32api
import win32com.client
import win32con

from pywinauto import application

class A(object):
  def __init__(self):
    self.x = self.request.url

  def method_a(self):
    print self.x

#start IE with a start URL of what was passed in
app = application.Application()
app.Start(r"c:\program files\internet explorer\iexplore.exe %s"% "http://www.cyclestreets.net/journey")
time.sleep(3)
#ie = app.window_(title_re = "CycleStreets Cycle journey planner")
ie = app.window_(title_re = ".*CycleStreets.*")

a = A()
a.method_a()

When I run it, I get a message saying AttributeError: 'A' object has no attribute 'request'

score 1 · Accepted Answer

You can use urllib to get the current URL of a request. See the example below:

from urllib import request

url = "http://www.example.com"
# Build a Request object with a browser-like User-Agent header (nothing is sent yet)
req = request.Request(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'})
# get_full_url() returns the URL this Request was constructed with
print(req.get_full_url())

This may help you!
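For the dynamic-URL part of the question, here is a minimal sketch, assuming the URL is handed to your script at runtime (for example as a function argument) rather than read out of a running browser. It uses only the Python 3 standard library: `html.parser` stands in for BeautifulSoup, and the names `fetch_html`, `TitleParser`, and `extract_title` are hypothetical helpers, not part of any library.

```python
from html.parser import HTMLParser
from urllib import request


def fetch_html(url, timeout=10):
    """Download the page at `url` (supplied at runtime) and return its text."""
    req = request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with request.urlopen(req, timeout=timeout) as resp:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html):
    """Return the stripped <title> text of an HTML string ('' if none)."""
    parser = TitleParser()
    parser.feed(html)
    return parser.title.strip()
```

Calling `extract_title(fetch_html(some_url))` would then work for whatever `some_url` holds at that moment. If BeautifulSoup is installed, the string returned by `fetch_html` can be passed to it in place of `extract_title`.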

score 0 · Accepted Answer

I think you have gotten a bit confused. In your class "A", you have this:

class A(object):
  def __init__(self):
    self.x = self.request.url

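There you set the value of x to self.request.url inside the __init__ function. That is what the error is complaining about, because at that point self.request does not exist on your object.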
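One way to avoid the AttributeError, sketched under the assumption that your script already knows the URL it launched the browser with, is to pass that URL into the class instead of reading a `request` attribute that was never set. The `url` parameter and the `start_url` variable are illustrative names, and this does not track where the browser navigates afterwards.

```python
class A(object):
    def __init__(self, url):
        # Store the URL handed in by the caller; a plain object has no
        # `request` attribute unless you assign one yourself.
        self.x = url

    def method_a(self):
        # Return the stored URL so the caller can print or parse it.
        return self.x


# Assumption: the same start URL that was passed to the browser.
start_url = "http://www.cyclestreets.net/journey"
a = A(start_url)
print(a.method_a())
```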
