I need to extract some text from a URL path, but I know very little about regular expressions.
import re
url = '/s/GETTHISDATA/canBeIgnored/canBeIgnored'
myData = None  # needs to equal 'GETTHISDATA'
Take a look at this:
>>> import re
>>> url = '/s/GETTHISDATA/canBeIgnored/canBeIgnored'
>>> re.findall('(?<=/).*?(?=/)', url)
['s', 'GETTHISDATA', 'canBeIgnored']
>>> re.findall('(?<=/).*?(?=/)', url)[1]
'GETTHISDATA'
>>>
This captures zero or more characters (non-greedily) between two /'s. To make it clearer, here is a breakdown:
(?<=/)  # Positive lookbehind assertion: the text must be preceded by a /
.*?     # Zero or more characters (anything except a newline), matched non-greedily
(?=/)   # Positive lookahead assertion: the text must be followed by a /
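As a variation on the pattern above, a capturing group can pick out the segment directly instead of indexing into the list that findall returns. This is only a sketch: it assumes the wanted segment always follows the literal prefix /s/, which the question does not state, and the name extract_segment is made up for illustration.

```python
import re

url = '/s/GETTHISDATA/canBeIgnored/canBeIgnored'

def extract_segment(path):
    # Hypothetical helper: assumes the path starts with the literal prefix '/s/'.
    # [^/]+ matches one or more characters that are not a slash,
    # so the group stops at the next '/' (or at the end of the string).
    match = re.match(r'/s/([^/]+)', path)
    return match.group(1) if match else None

myData = extract_segment(url)
print(myData)
```

Returning None when the prefix is missing avoids the IndexError that indexing into an empty findall result would raise.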
However, a cleaner, non-regex solution is to split on /:
>>> url.split('/')
['', 's', 'GETTHISDATA', 'canBeIgnored', 'canBeIgnored']
>>> url.split('/')[2]
'GETTHISDATA'
>>>
Personally, I would use the second solution. A regex seems like overkill here.
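If the split approach ends up in real code, it may be worth guarding against paths that are shorter than expected. The helper below is a rough sketch, not part of the original answer: the name nth_segment is hypothetical, and it drops empty segments first, so the wanted value sits at index 1 rather than index 2.

```python
def nth_segment(path, n):
    # Hypothetical helper: return the n-th non-empty path segment (0-based),
    # or None if the path does not have that many segments.
    parts = [p for p in path.split('/') if p]
    return parts[n] if 0 <= n < len(parts) else None

url = '/s/GETTHISDATA/canBeIgnored/canBeIgnored'
myData = nth_segment(url, 1)
print(myData)
```

Filtering out empty strings also makes the result independent of whether the path has a leading or trailing slash.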