I am trying to find a way to get email addresses or phone numbers from Craigslist using Python.

I have used python-craigslist to fetch the posts, but I can't find anything about email or other contact information.

I tried this:

import requests

url = "https://chandigarh.craigslist.org/reply/ixc/hum/7220389776/mailto"

head = {
    "Accept": "*/*",
    "Accept-Encoding": "gzip, deflate, br",
    "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
    "Connection": "keep-alive",
    # Content-Length is not set by hand: requests computes it from the body.
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "DNT": "1",
    "Host": "chandigarh.craigslist.org",
    "Origin": "https://chandigarh.craigslist.org",
    "Referer": "https://chandigarh.craigslist.org/hum/d/hr-outsourcing-company-in-mohali/7220389776.html",
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Sec-Fetch-Site": "same-origin",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
    "X-Requested-With": "XMLHttpRequest"
}
post_data = {"n": "U2FsdGVkX184MDg3MTgwOF5nEvY336v771unnxU7b9fc52-DzxhmmxcCYwQ6uylAsvUK2atZ1Ot3zWsSF4ukqvM9BMFMnNA_L00i0jQ5DhiZkfobQq1avkovyPJ3IcQbWM4327VdEQUipMzU6XfOXn5xsLqQ9Tt-L1qJdM55e2Ac11nzeaFCRV7HgpYmmIdrjpESKZpp0dhTh2p5d826f9CSBa4ldNRg0pLswm5P3JXaYGTe4Z7Fe5NB1Jfs3-CBWFdy2ZzqIA345q_YfXUatIMoq1TwN3lc_ee8rKnLKJmQwPHPpLoQHRP9aioeMOBv17okylBLm8uhduZ6HawCRg"}

resp = requests.post(url, headers=head, data=post_data)

print(resp.text)

But I get no response.

2 Answers

Here is the code with which I successfully got the email.

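  1. Fetch the email iframe to get this link: [image]

  2. Solve the captcha using 2Captcha.

  3. Send the solved captcha response to get the next token.

  4. Then use that token to send a POST request to mailto to get the email.

  5. The site also uses different headers for each request, so I hard-coded different headers for each request.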

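    # Imports this fragment relies on (not shown in the original answer):
    #   import requests
    #   from bs4 import BeautifulSoup
    #   from twocaptcha import TwoCaptcha
    # `r` is presumably a requests.Session() or the requests module itself.
    resp = r.post(url, headers=head, data=post_data, allow_redirects=True)
    asd = resp.json()
    capin = str(asd["nonce"])
    print("Captcha In Request Successful...")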
    
    solver = TwoCaptcha("API_KEY")
    
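    try:
        print("Solving Captcha Using 2Captcha")
        result = solver.hcaptcha(
            sitekey='0c3a1de8-e8df-4e01-91b6-6995c4ade451',
            url=ifram_url
        )
    except Exception as e:
        # `captcha` would be undefined below if solving failed,
        # so report the error and re-raise instead of continuing.
        print(e)
        raise
    else:
        captcha = result["code"]
        print("Captcha Solved Successfully")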
    
    post_data1 = {"h-captcha-response": str(captcha),
                  "n": capin
                  }
    url1 = capt_link
    # Content-Length is not set by hand: requests computes it from the encoded body.
    head1 = {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
        "Connection": "keep-alive",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Cookie": cookie,
        "DNT": "1",
        "Host": host,
        "Origin": origin,
        "Referer": main_url,
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest"
    }
    
    resp = r.post(url1, headers=head1, data=post_data1, allow_redirects=True)
    bsd = resp.json()
    capout = str(bsd["nonce"])
    print("Captcha Out Request Successful...")
    
    post_data2 = {"n": capout}
    url2 = milto_link
    # Content-Length is not set by hand: requests computes it from the encoded body.
    head2 = {
        "Accept": "*/*",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-GB,en-US;q=0.9,en;q=0.8",
        "Connection": "keep-alive",
        "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        "Cookie": cookie,
        "DNT": "1",
        "Host": host,
        "Origin": origin,
        "Referer": main_url,
        "Sec-Fetch-Dest": "empty",
        "Sec-Fetch-Mode": "cors",
        "Sec-Fetch-Site": "same-origin",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.198 Safari/537.36",
        "X-Requested-With": "XMLHttpRequest"
    }
    
    resp = r.post(url2, headers=head2, data=post_data2, allow_redirects=True)
    
    defg = resp.json()
    soup = BeautifulSoup(defg["email"], "html.parser")
    email = soup.find("input").get("value")
    print(email)
    Final_Email.append(email)
    
answered 2020-12-11T16:31:19.957

To scrape Craigslist, use the pyquery Python package: https://pypi.python.org/pypi/pyquery

For regular expressions that match email addresses and phone numbers, see the examples on this page: http://www.regular-expressions.info/
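As a rough illustration of the regex approach, here is a minimal sketch using only the standard `re` module. The two patterns and the `extract_contacts` helper are my own simplified assumptions (the phone pattern only covers North American style numbers), not something taken from the linked page, so treat them as a starting point rather than a complete solution.

```python
import re

# Deliberately simple patterns; real addresses and numbers vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Matches forms such as 555-123-4567 or (555) 123-4567 (assumed format).
PHONE_RE = re.compile(r"(?<!\d)\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)")


def extract_contacts(text):
    """Return (emails, phones) found in text, in order, without duplicates."""
    emails = list(dict.fromkeys(EMAIL_RE.findall(text)))
    phones = list(dict.fromkeys(PHONE_RE.findall(text)))
    return emails, phones


sample = "Call (555) 123-4567 or write to hr@example.com. Backup: hr@example.com"
emails, phones = extract_contacts(sample)
print(emails)
print(phones)
```

This only finds contact details that the poster wrote into the visible text of the post body.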

To store the email addresses, you can simply write them to a CSV file. You can read how to do that here: https://docs.python.org/2/library/csv.html
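A small sketch of the CSV step with the standard `csv` module might look like this. The `write_contacts` helper and the `post_id`/`email` column names are hypothetical choices for illustration; an in-memory buffer stands in for a real file so the example runs anywhere.

```python
import csv
import io


def write_contacts(rows, fileobj):
    """Write a header row followed by (post_id, email) rows as CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(["post_id", "email"])
    writer.writerows(rows)


# An in-memory buffer is used here instead of a file on disk.
buffer = io.StringIO()
write_contacts([("7220389776", "hr@example.com")], buffer)
print(buffer.getvalue())
```

For a real file, the same helper can be called with `open("emails.csv", "w", newline="")`, where the file name is again just a placeholder.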

If you want to use these email addresses and send messages to them, you may want to check out this add-on: https://addons.mozilla.org/en-US/thunderbird/addon/mail-merge/

Also, I would recommend staying positive and reading more here: http://en.wikipedia.org/wiki/Morality

answered 2020-11-17T23:48:10.147