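I have a Python script whose goal is to open a web page based on user input and then scrape specific information from that page. The script starts with the following imports: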

import socks
import socket
from urllib.request import urlopen
from time import sleep
from bs4 import BeautifulSoup

socks.set_default_proxy(socks.SOCKS5, "127.0.0.1", 9050)
socket.socket = socks.socksocket

The part where the error occurs is the code that handles the URL of the requested web page.

url_name = "http://<website name>"
print("url name is : " + url_name)
print("About to open the web page")
sleep(5)
webpage = urlopen(url_name)  # <-- the error is raised on this line
print("Web page opened successfully")
sleep(5)
html = webpage.read().decode("utf-8")
soup = BeautifulSoup(html, "html.parser")
print("HTML extracted")
sleep(5)
print("Printing soup object text")
sleep(5)
print(soup.get_text())

When the script reaches the marked statement (the call to urlopen), I get the following error message:

1599147846 WARNING torsocks[20820]: [connect] Connection to a local address are denied since it might be a TCP DNS query to a local DNS server. Rejecting it for safety reasons. (in tsocks_connect() at connect.c:193)
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/socks.py", line 832, in connect
    super(socksocket, self).connect(proxy_addr)
PermissionError: [Errno 1] Operation not permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.8/urllib/request.py", line 1326, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.8/http/client.py", line 1240, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1286, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1235, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1006, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 946, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 917, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.8/socket.py", line 808, in create_connection
    raise err
  File "/usr/lib/python3.8/socket.py", line 796, in create_connection
    sock.connect(sa)
  File "/usr/lib/python3/dist-packages/socks.py", line 100, in wrapper
    return function(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/socks.py", line 844, in connect
    raise ProxyConnectionError(msg, error)
socks.ProxyConnectionError: Error connecting to SOCKS5 proxy 127.0.0.1:9050: [Errno 1] Operation not permitted

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dark_web_scrape_main.py", line 68, in <module>
    webpage = urlopen(url_name)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 1355, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.8/urllib/request.py", line 1329, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error Error connecting to SOCKS5 proxy 127.0.0.1:9050: [Errno 1] Operation not permitted>


Also, torsocks is running in the same VM as this script, which is Ubuntu 20.04.
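One way to rule out the simplest cause (nothing listening on the SOCKS port) is a small standalone check like the sketch below. It assumes Tor's SOCKS listener is at 127.0.0.1:9050, as in the script above; the function name `tor_socks_port_open` is made up for illustration. It should be run on its own, without torsocks and before `socket.socket` is replaced, since either of those changes how the connection is made.

```python
import socket


def tor_socks_port_open(host="127.0.0.1", port=9050, timeout=3.0):
    """Return True if a plain TCP connection to host:port succeeds.

    This only shows that something is listening on the port. It does not
    prove that the listener is Tor or that SOCKS5 negotiation would work.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and permission errors.
        return False


if __name__ == "__main__":
    print("SOCKS port reachable:", tor_socks_port_open())
```

If this prints False when run as a normal user without torsocks, the Tor service itself is probably not running or is listening on a different port.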

Someone suggested running the script with "sudo". But when I did that, this happened:

$ sudo python3 dark_web_scrape_main.py 
Traceback (most recent call last):
  File "dark_web_scrape_main.py", line 5, in <module>
    from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'

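So, when I first ran the script with "sudo", I could not even get to the data input prompt. Yet when I run the script as a regular user, it can find the socks module, which lets me get further.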
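A likely explanation, though it is an assumption about this setup, is that the packages were installed into the regular user's per-user site directory (the pip output further down mentions `./.local/lib/python3.8/site-packages`), which the root user's interpreter does not search. The sketch below prints where the current interpreter looks for per-user packages; the helper name `describe_python_env` is made up for illustration. Running it once as `python3` and once as `sudo python3` should show whether the two environments differ.

```python
import os
import site
import sys


def describe_python_env():
    """Collect the settings that decide which per-user packages are visible."""
    return {
        "executable": sys.executable,
        "home": os.path.expanduser("~"),
        # Directory that "pip install --user" installs into for this user.
        "user_site": site.getusersitepackages(),
        # True when the per-user directory is added to sys.path.
        "user_site_enabled": site.ENABLE_USER_SITE,
    }


if __name__ == "__main__":
    for key, value in describe_python_env().items():
        print(f"{key}: {value}")
```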

Before running the script, I made sure that I had installed socks, socket and beautifulsoup4. I even tried installing bs4 (short for 'beautifulsoup4'). This is what was displayed:

$ pip3 install bs4
Collecting bs4
  Downloading bs4-0.0.1.tar.gz (1.1 kB)
Requirement already satisfied: beautifulsoup4 in ./.local/lib/python3.8/site-packages (from bs4) (4.9.1)
Requirement already satisfied: soupsieve>1.2 in ./.local/lib/python3.8/site-packages (from beautifulsoup4->bs4) (2.0.1)
Building wheels for collected packages: bs4
  Building wheel for bs4 (setup.py) ... done
  Created wheel for bs4: filename=bs4-0.0.1-py3-none-any.whl size=1273 sha256=912f922932a07d98aa26eca2ba3dde8e761813eea766dfe42617135f038943e4
  Stored in directory: /home/jbottiger/.cache/pip/wheels/75/78/21/68b124549c9bdc94f822c02fb9aa3578a669843f9767776bca
Successfully built bs4
Installing collected packages: bs4
Successfully installed bs4-0.0.1

I reran the script with "sudo" but got the same error message:

$ sudo python3 dark_web_scrape_main.py 
Traceback (most recent call last):
  File "dark_web_scrape_main.py", line 5, in <module>
    from bs4 import BeautifulSoup
ModuleNotFoundError: No module named 'bs4'

I realized that I had not installed the bs4 module correctly, so I made sure it was installed properly:

sudo apt-get install python3-bs4

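After rerunning "sudo python3 dark_web_scrape_main.py", I finally got past the input prompt, but this time the following error message appeared when the script tried to call urlopen: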

About to open the web page
Traceback (most recent call last):
  File "/usr/lib/python3.8/urllib/request.py", line 1326, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib/python3.8/http/client.py", line 1240, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1286, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1235, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1006, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 946, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 917, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.8/socket.py", line 787, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
  File "/usr/lib/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "dark_web_scrape_main.py", line 68, in <module>
    webpage = urlopen(url_name)
  File "/usr/lib/python3.8/urllib/request.py", line 222, in urlopen
    return opener.open(url, data, timeout)
  File "/usr/lib/python3.8/urllib/request.py", line 525, in open
    response = self._open(req, data)
  File "/usr/lib/python3.8/urllib/request.py", line 542, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "/usr/lib/python3.8/urllib/request.py", line 502, in _call_chain
    result = func(*args)
  File "/usr/lib/python3.8/urllib/request.py", line 1355, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "/usr/lib/python3.8/urllib/request.py", line 1329, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>
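The traceback above shows `socket.create_connection` calling `getaddrinfo` before any socket is connected, so the .onion name is looked up through local DNS, which cannot resolve it. A workaround that is often suggested for PySocks is to replace `socket.getaddrinfo` with a pass-through, so the host name reaches the SOCKS5 proxy unresolved and Tor resolves it. The sketch below is untested against this setup; the function name `passthrough_getaddrinfo` is made up for illustration, and it assumes the proxy was configured with remote DNS, which is the PySocks default.

```python
import socket


def passthrough_getaddrinfo(host, port, family=0, type=0, proto=0, flags=0):
    """Return the host name unresolved, in the shape getaddrinfo() returns.

    The single result claims an IPv4 TCP endpoint whose "address" is still
    the original host name, so the SOCKS socket can pass it to the proxy.
    """
    return [(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP, "", (host, port))]


# To apply it, assign it after the PySocks setup and before calling urlopen:
# socket.getaddrinfo = passthrough_getaddrinfo
```

Because the assignment affects every lookup in the process, it only makes sense in a script where all traffic is meant to go through the proxy.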

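I suspected that I could not open onion sites in the Firefox browser on my Ubuntu 20.04 VM either. So, for grins and giggles, I opened Firefox and typed "http://xmh57jrzrnw6insl.onion" into the browser window. It returned "We can't connect to the server at 'http://xmh57jrzrnw6insl.onion'".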

I researched this particular problem at https://protonmail.com/support/knowledge-base/firefox-onion-sites/ and followed these steps:

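  1. In Firefox, type "about:config" into the browser's URL field (also known as the search bar).
  2. Select the "Accept the Risk and Continue" button.
  3. Type "network.dns.blockDotOnion" into the search bar.
  4. The current setting of this property is "true"; toggle it to "false".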

I retried the onion site. It still did not work.

I even updated the /etc/tor/torrc file by uncommenting the following statements:

ControlPort 9051
CookieAuthentication 1

I also changed the "CookieAuthentication" value to "0". I still could not reach the onion site.

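Finally, in Firefox's "about:preferences" I realized that when I set up the manual proxy configuration with localhost:9050, I had forgotten to deselect "Enable DNS over HTTPS" and to select "Proxy DNS when using SOCKS v5". Now I can reach onion sites in my Firefox browser. However, I still get the error when urlopen is called in my script. Please advise.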

My professor suggested prefixing the "python3 <script_name>.py" invocation with "torsocks". However, I cannot seem to use "sudo" and "torsocks" together as a prefix.
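The torsocks warning in the first traceback says that connections to a local address are denied, which suggests the script was started under torsocks while it also tried to reach the proxy at 127.0.0.1:9050 itself. The two approaches overlap: under torsocks the in-script proxy setup is not needed, and without torsocks the in-script setup is the only thing routing traffic. The sketch below guesses whether the process was started under torsocks by looking at `LD_PRELOAD`, which the torsocks wrapper sets on Linux; the helper name `running_under_torsocks` is made up for illustration.

```python
import os


def running_under_torsocks(environ=None):
    """Guess whether this process was launched through the torsocks wrapper.

    On Linux, torsocks works by preloading libtorsocks.so, so its path
    appears in the LD_PRELOAD environment variable.
    """
    env = os.environ if environ is None else environ
    return "torsocks" in env.get("LD_PRELOAD", "")


if __name__ == "__main__":
    if running_under_torsocks():
        print("torsocks is active: skip the in-script SOCKS proxy setup")
    else:
        print("torsocks is not active: the in-script SOCKS proxy setup is needed")
```

In the original script, the `socks.set_default_proxy(...)` and `socket.socket = socks.socksocket` lines could be wrapped in `if not running_under_torsocks():` so that only one of the two mechanisms is in effect at a time.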
