I am using Scrapy, XPath, and Python to scrape a website. The extracted results contain \r\n. A Google search suggested that I need to use normalize-space() in my XPath, but when I try it as shown below, it does not work.

item ['runs'] = stats.select((normalize-space('//tr[@class="cell1"]/td[3]/text()')[count])).extract()

I get a "Global name normalize is not defined" error.

Any ideas?

1 Answer

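normalize-space is part of XPath, not Python, so no Python function by that name exists. Outside the quotes, Python reads normalize-space(...) as normalize minus space(...), which is why it complains that the name normalize is not defined. The function has to appear inside the XPath expression string, like this (just as a sample):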

stats.select('''//tr[normalize-space(td/text()) = 'User Name']''').extract()

If you just want to drop the whitespace from a string in Python, you can use str methods. For example, strip removes leading and trailing whitespace:

>>> '\r\n\rsample\r\n'.strip()
'sample'

Something like normalize-space, which also collapses internal runs of whitespace into a single space:

>>> ' '.join('\r\ns  am  \r\n ple\r\n'.split())
's am ple'
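
Since extract() returns a list of strings, the same idea can be applied to every extracted value. Below is a minimal sketch, not part of the original answer: normalize_space and clean_all are hypothetical helper names, and raw_cells is made-up data standing in for whatever extract() returns. It uses a regular expression limited to space, tab, CR and LF, which is the whitespace set XPath 1.0 uses, whereas str.split() with no arguments also splits on other whitespace characters.

```python
import re

# XPath 1.0 treats only space, tab, carriage return and line feed as whitespace.
_XPATH_WS = re.compile(r"[ \t\r\n]+")


def normalize_space(text):
    """Approximate XPath normalize-space() for a single Python string.

    Collapses each run of whitespace into one space, then removes the
    leading and trailing space. (Hypothetical helper, for illustration.)
    """
    return _XPATH_WS.sub(" ", text).strip(" ")


def clean_all(values):
    """Apply normalize_space to every string in a list."""
    return [normalize_space(v) for v in values]


# Made-up sample data standing in for the list returned by .extract().
raw_cells = ["\r\n 12 \r\n", "  7\r\n", "\r\n3  4\r\n"]
cleaned = clean_all(raw_cells)
print(cleaned)
```

In a spider this would presumably be used along the lines of item['runs'] = clean_all(stats.select(...).extract()), assuming the selector returns plain strings.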
Answered 2013-08-06T05:30:24.747