python - Splitting a URL in Python 2.x

Question

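I have parsed a link out of some HTML code, and it looks like this:

"http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"

What I want to do is extract the second part of the link, starting from the second occurrence of http:. In the case above, I want to extract

"http://truelink.com/football/abcde.html?"

I have considered splitting the URL into segments, but I am not sure the structure of the first part will stay the same over time.

Is it possible to identify the second occurrence of "http" and then take everything from there to the end?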

score 3 · Accepted Answer

link = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"

link[link.rfind("http://"):]

Returns:

"http://truelink.com/football/abcde.html?"

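That is how I would do it. rfind looks for the last occurrence of "http://" and returns its index. In your example, that last occurrence is clearly the real, original URL. You can then extract the substring that starts at that index and runs to the end.

So if you have some string myStr, you extract substrings in Python with array-like slice expressions: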

myStr[0]    # returns the first character
myStr[0:5]  # returns the first 5 characters, so that 0 <= characterIndex < 5
myStr[5:]   # returns all characters from index 5 to the end of the string
myStr[:5]   # is the same as myStr[0:5]
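The question asks specifically for the second occurrence, which is not always the last one. It is also worth knowing that rfind returns -1 when nothing matches, and link[-1:] would then silently return the last character. A minimal sketch of a stricter variant follows, using str.find with a start offset; the function name from_second_occurrence and its marker parameter are made up for illustration.

```python
def from_second_occurrence(link, marker="http://"):
    """Return link from the second occurrence of marker, or None if there is none."""
    first = link.find(marker)
    if first == -1:
        return None  # marker does not appear at all
    # Resume the search just past the first match.
    second = link.find(marker, first + len(marker))
    if second == -1:
        return None  # marker appears only once
    return link[second:]


link = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
print(from_second_occurrence(link))
```

For a link with only one "http://", this returns None instead of a misleading one-character string.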

score 0 · Accepted Answer

I would do something like this:

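addr = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
http_part = 'http://'
# Splitting on the scheme removes it, so re-attach it to every non-empty piece.
parts = addr.split(http_part)
res = []
for part in parts:  # do not name the loop variable `str`; that shadows the built-in
    if part:
        res.append(http_part + part)
print(res)  # the parentheses keep this valid in both Python 2 and Python 3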
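If the embedded link might use https:// as well, splitting on a single fixed string no longer works. One possible sketch, assuming both schemes should count as URL starts, locates each scheme with re.finditer and slices between the start positions; the function name split_embedded_urls is hypothetical.

```python
import re


def split_embedded_urls(addr):
    """Split addr into pieces that each begin with http:// or https://."""
    # Index of every place where a scheme starts.
    starts = [m.start() for m in re.finditer(r"https?://", addr)]
    if not starts:
        return []
    # Each piece ends where the next one starts; the last runs to the end.
    ends = starts[1:] + [len(addr)]
    return [addr[s:e] for s, e in zip(starts, ends)]


addr = "http://advert.com/go/2/12345/0/http://truelink.com/football/abcde.html?"
print(split_embedded_urls(addr))
```

The last element of the returned list is the embedded target URL, and any text before the first scheme is dropped.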
