import scrapy
import logging

class CountriesSpider(scrapy.Spider):
    name = 'countries'
    allowed_domains = ['www.worldometers.info']
    start_urls = ['https://www.worldometers.info/world-population/population-by-country/']

    def parse(self, response):
        countries = response.xpath("//td/a")
        for country in countries:
            name = country.xpath(".//text()").get()
            link = country.xpath(".//@href").get()
            # absolute_url = f"https://www.worldometers.info{link}"
            # absolute_url = response.urljoin(link)

            # response.follow accepts relative URLs, so the link can be passed as-is
            yield response.follow(url=link, callback=self.parse_country, meta={'country_name': name})

    def parse_country(self, response):
        name = response.request.meta['country_name']
        # Parentheses must balance: "(...)[1]" selects the first matching table
        rows = response.xpath("(//table[@class='table table-striped table-bordered table-hover table-condensed table-list'])[1]/tbody/tr")
        for row in rows:
            year = row.xpath(".//td[1]/text()").get()
            population = row.xpath(".//td[2]/strong/text()").get()
            yield {
                'country_name': name,
                'year': year,
                'population': population,
            }

(new_Virtual_workspace) SubhrajyotisAir:worldometer subhrajyotisaha$ scrapy crawl countries
2021-05-29 23:33:14 [scrapy.utils.log] INFO: Scrapy 2.4.1 started (bot: worldometer)
2021-05-29 23:33:14 [scrapy.utils.log] INFO: Versions: lxml, libxml2 2.9.10, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 21.2.0, Python 3.8.10 (default, May 19 2021, 11:01:55) - [Clang 10.0.0 ], pyOpenSSL 20.0.1 (OpenSSL 1.1.1k  25 Mar 2021), cryptography 3.4.7, Platform macOS-10.14.1-x86_64-i386-64bit
2021-05-29 23:33:14 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.selectreactor.SelectReactor
2021-05-29 23:33:14 [scrapy.crawler] INFO: Overridden settings:
{'BOT_NAME': 'worldometer',
 'NEWSPIDER_MODULE': 'worldometer.spiders',
 'SPIDER_MODULES': ['worldometer.spiders']}
2021-05-29 23:33:14 [scrapy.extensions.telnet] INFO: Telnet Password: 87f0a20eef9428d7
2021-05-29 23:33:14 [scrapy.core.engine] INFO: Spider opened
2021-05-29 23:33:14 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2021-05-29 23:33:14 [scrapy.extensions.telnet] INFO: Telnet console listening on
2021-05-29 23:33:18 [scrapy.core.engine] DEBUG: Crawled (404) <GET https://www.worldometers.info/robots.txt> (referer: None)
2021-05-29 23:33:18 [protego] DEBUG: Rule at line 2 without any user agent to enforce it on.
2021-05-29 23:33:18 [protego] DEBUG: Rule at line 10 without any user agent to enforce it on.
2021-05-29 23:33:18 [protego] DEBUG: Rule at line 12 without any user agent to enforce it on.
2021-05-29 23:33:18 [protego] DEBUG: Rule at line 14 without any user agent to enforce it on.
2021-05-29 23:33:18 [protego] DEBUG: Rule at line 16 without any user agent to enforce it on.
2021-05-29 23:33:19 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.worldometers.info/world-population/population-by-country/> (referer: None)
2021-05-29 23:33:20 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.worldometers.info/world-population/ethiopia-population/> (referer: https://www.worldometers.info/world-population/population-by-country/)
2021-05-29 23:33:20 [scrapy.core.scraper] ERROR: Spider error processing <GET https://www.worldometers.info/world-population/ethiopia-population/> (referer: https://www.worldometers.info/world-population/population-by-country/)
Traceback (most recent call last):
  File "/Users/subhrajyotisaha/opt/anaconda3/envs/new_Virtual_workspace/lib/python3.8/site-packages/parsel/selector.py", line 236, in xpath
    result = xpathev(query, namespaces=nsp,
  File "src/lxml/etree.pyx", line 1582, in lxml.etree._Element.xpath
  File "src/lxml/xpath.pxi", line 305, in lxml.etree.XPathElementEvaluator.__call__
  File "src/lxml/xpath.pxi", line 225, in lxml.etree._XPathEvaluatorBase._handle_result
lxml.etree.XPathEvalError: Invalid expression
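The `Invalid expression` error is raised while lxml evaluates the XPath string passed to `response.xpath()` in `parse_country`. In the selector as first posted, `(//table[...])[1])[1]/tbody/tr`, the parentheses do not balance: there is one more `)` than `(`, and that alone is enough for lxml to reject the expression. Because the message does not say where the problem is, a quick bracket check can help locate it. Below is a minimal, stdlib-only sketch. `first_unbalanced` is a hypothetical helper, not part of Scrapy, parsel, or lxml. It only checks `()` and `[]` pairing and unterminated quotes (skipping text inside string literals); it is not a full XPath parser.

```python
from typing import List, Optional, Tuple

# Closing bracket -> the opening bracket it must match.
PAIRS = {")": "(", "]": "["}


def first_unbalanced(expr: str) -> Optional[int]:
    """Return the index of the first bracket or quote that breaks balance
    in an XPath expression, or None if all () and [] pairs balance.

    Text inside '...' or "..." string literals is skipped; XPath 1.0
    literals have no escape sequences, so the next matching quote ends them.
    """
    stack: List[Tuple[str, int]] = []  # (opening bracket, its index)
    quote: Optional[str] = None        # quote char of the literal we are in
    quote_at = -1                      # index where that literal started
    for i, ch in enumerate(expr):
        if quote is not None:
            if ch == quote:
                quote = None           # literal closed
            continue
        if ch in ("'", '"'):
            quote, quote_at = ch, i
        elif ch in "([":
            stack.append((ch, i))
        elif ch in PAIRS:
            if not stack or stack[-1][0] != PAIRS[ch]:
                return i               # closing bracket with no matching opener
            stack.pop()
    if quote is not None:
        return quote_at                # string literal never closed
    # Any opener left over was never closed; report the outermost one.
    return stack[0][1] if stack else None


bad = ("(//table[@class='table table-striped table-bordered table-hover "
       "table-condensed table-list'])[1])[1]/tbody/tr")
good = ("(//table[@class='table table-striped table-bordered table-hover "
        "table-condensed table-list'])[1]/tbody/tr")

pos = first_unbalanced(bad)
print(bad[pos:])               # → )[1]/tbody/tr
print(first_unbalanced(good))  # → None
```

Run against the selector from the post, the check points at the stray `)[1]`. Deleting that fragment leaves `(//table[...])[1]/tbody/tr`, which lxml accepts as a valid XPath 1.0 expression: the parenthesized filter selects the first matching table, and `/tbody/tr` then steps into its rows.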


I'm using a conda virtual environment and VS Code on macOS.


