python - Simple Python / Beautiful Soup type question

Question

I'm trying to do some simple string manipulation on the href attribute of a hyperlink extracted with Beautiful Soup:

from BeautifulSoup import BeautifulSoup
soup = BeautifulSoup('<a href="http://www.some-site.com/">Some Hyperlink</a>')
href = soup.find("a")["href"]
print href
print href[href.indexOf('/'):]

All I get is:

Traceback (most recent call last):
  File "test.py", line 5, in <module>
    print href[href.indexOf('/'):]
AttributeError: 'unicode' object has no attribute 'indexOf'

How should I convert whatever href is into a normal string?

score 10 · Accepted Answer

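Python strings do not have an indexOf method.

Use href.index('/')

href.find('/') is similar, but find returns -1 if the substring is not found, whereas index raises a ValueError.

So the correct thing is to use index, because '...'[-1] returns the last character of the string, and a -1 from find would silently produce a wrong slice.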
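
The difference between the two methods can be sketched as follows. This is a minimal illustration rather than the asker's actual script: it uses Python 3 print syntax, and the href value is simply copied from the question so that the example runs without Beautiful Soup.

```python
# Sample value copied from the question (no Beautiful Soup needed here).
href = "http://www.some-site.com/"

# Both methods return the offset of the first match when there is one.
first_slash = href.index("/")
assert first_slash == href.find("/")

# Slicing from that offset keeps everything from the first slash onward.
tail = href[first_slash:]
print(tail)

# When the substring is absent, find() returns -1 ...
missing = href.find("?")
print(missing)

# ... and -1 used as a slice start silently yields the last character.
print(href[missing:])

# index() raises ValueError instead, so the failure cannot go unnoticed.
try:
    href.index("?")
except ValueError as exc:
    print("not found:", exc)
```

If a missing separator is an expected case rather than an error, find() is still usable as long as the result is compared against -1 before slicing.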

score 0 · Accepted Answer

href is a unicode string. If you need a regular string, use

regular_string = str(href)
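
One caveat worth sketching: in Python 2, str() on a unicode object encodes with the default ASCII codec, so it raises UnicodeEncodeError when the href contains non-ASCII characters. The example below assumes a hypothetical href with an accented character and shows an explicit encode() call as one possible alternative. It is an illustration under those assumptions, not part of the original answer.

```python
# Hypothetical href containing a non-ASCII character (not from the question).
href = u"http://www.some-site.com/caf\u00e9"

# Encoding explicitly avoids relying on a default codec.
encoded = href.encode("utf-8")
print(encoded)

# The ASCII codec cannot represent the accented character.
try:
    href.encode("ascii")
except UnicodeEncodeError as exc:
    print("cannot encode as ASCII:", exc.reason)
```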

score 0 · Accepted Answer

You mean find(), not indexOf().

See the Python documentation on strings.

Answered 2009-07-20T13:47:23.797
