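I am writing a search engine for experience and knowledge. Right now I am building the crawler and its accompanying utilities. One of them is a URL normalizer. That is what I am trying to build at the moment; more specifically, I am stuck at the point where I need a method that takes a URL and capitalizes the letters that follow a '%' sign. My code so far: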
def escape_sequence_capitalization(url):
    ''' The method that capitalizes letters in escape sequences.
    All letters within a percent-encoding triplet (e.g. '%2C') are case
    insensitive and should be capitalized.
    '''
    next_encounter = None
    url_list = []
    while True:
        next_encounter = url.find('%')
        if next_encounter == -1:
            break
        for letter in url[:next_encounter]:
            url_list.append(letter)
        new_character = url[next_encounter + 1].upper()
        url_list.append(new_character)
        url = url[next_encounter:]
    for letter in url:
        url_list.append(letter)
    return ''.join(url_list)
Could someone point out where my mistake is? I would be very grateful. Thank you.
Edit: this is what I am trying to achieve:
http://www.example.com/a%c2%b1b → http://www.example.com/a%C2%B1b
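
For reference, here is a minimal sketch of the behavior described above. It is one possible approach, not a confirmed fix. In the posted loop, `url = url[next_encounter:]` leaves the '%' at the start of the remaining string, so `find` keeps returning 0 and the loop never ends; the '%' itself is also never appended, and only one character after it is uppercased. The sketch avoids the manual scan by using a regular expression. The name `capitalize_percent_escapes` is hypothetical, and it assumes that a '%' not followed by two hex digits should be left unchanged.

```python
import re

# Matches one percent-encoding triplet: '%' followed by exactly two hex digits.
_PERCENT_TRIPLET = re.compile(r'%[0-9a-fA-F]{2}')


def capitalize_percent_escapes(url):
    """Return url with every percent-encoded triplet uppercased.

    Hypothetical helper name. Assumption: a '%' that is not followed by
    two hex digits is not a valid triplet and is left untouched.
    """
    # Uppercase the whole match; '%' and decimal digits are unaffected by upper().
    return _PERCENT_TRIPLET.sub(lambda match: match.group(0).upper(), url)


print(capitalize_percent_escapes('http://www.example.com/a%c2%b1b'))
# prints http://www.example.com/a%C2%B1b
```

Matching the full triplet rather than a single character after '%' means both hex digits are handled together, and a stray '%' at the end of the URL cannot cause an index error.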