I'm trying to use the Tumblr API, specifically PyTumblr, to scrape some images from posts with a particular tag.
This is the code I'm using, which is very simple:
import pytumblr
from bs4 import BeautifulSoup
# Authenticate via API Key
client = pytumblr.TumblrRestClient('#Here is my API Key#')
print client.posts('wergida.tumblr.com', type='photo', tag='BERND AND HILLA BECHER', limit=1, offset=0)
The result looks like this:
{
"meta": {
"status": 200,
"msg": "OK"
},
"response": {
"blog": {
"title": "W é r G i d A",
"name": "wergida",
"total_posts": 1181,
"posts": 1181,
"url": "http://wergida.tumblr.com/",
"updated": 1466319493,
"description": "Ha bárkit érdekelne",
"is_nsfw": false,
"ask": false,
"ask_page_title": "Ask me anything",
"ask_anon": false,
"share_likes": true,
"likes": 1131
},
"posts": [
{
"blog_name": "wergida",
"id": 136740690571,
"post_url": "http://wergida.tumblr.com/post/136740690571/bernhard-bernd-becher-1931-2007-and-hilla",
"slug": "bernhard-bernd-becher-1931-2007-and-hilla",
"type": "photo",
"date": "2016-01-06 11:30:23 GMT",
"timestamp": 1452079823,
"state": "published",
"format": "html",
"reblog_key": "TiOl8nWT",
"tags": [
"industrial facades",
"bernd and hilla becher",
"photography",
"eisenhüttenstadt",
"brandenburg"
],
"short_url": "https://tmblr.co/ZaE70t1-MOLgB",
"summary": "Bernhard ‘Bernd’ Becher (1931-2007) and Hilla Becher (1934-2015): Eisenhüttenstadt, Brandenburg. Industrial Facades, The MIT...",
"recommended_source": null,
"recommended_color": null,
"highlighted": [],
"note_count": 2,
"caption": "<p>Bernhard ‘Bernd’ Becher (1931-2007) and Hilla Becher (1934-2015): Eisenhüttenstadt, Brandenburg. Industrial Facades, The MIT Press, 1995.<br/></p>",
"reblog": {
"tree_html": "",
"comment": "<p>Bernhard ‘Bernd’ Becher (1931-2007) and Hilla Becher (1934-2015): Eisenhüttenstadt, Brandenburg. Industrial Facades, The MIT Press, 1995.<br></p>"
},
"trail": [
{
"blog": {
"name": "wergida",
"active": true,
"theme": {
"avatar_shape": "square",
"background_color": "#FAFAFA",
"body_font": "Helvetica Neue",
"header_bounds": "",
"header_image": "https://secure.assets.tumblr.com/images/default_header/optica_pattern_05.png?_v=671444c5f47705cce40d8aefd23df3b1",
"header_image_focused": "https://secure.assets.tumblr.com/images/default_header/optica_pattern_05.png?_v=671444c5f47705cce40d8aefd23df3b1",
"header_image_scaled": "https://secure.assets.tumblr.com/images/default_header/optica_pattern_05.png?_v=671444c5f47705cce40d8aefd23df3b1",
"header_stretch": true,
"link_color": "#529ECC",
"show_avatar": true,
"show_description": true,
"show_header_image": true,
"show_title": true,
"title_color": "#444444",
"title_font": "Gibson",
"title_font_weight": "bold"
},
"share_likes": true,
"share_following": false
},
"post": {
"id": "136740690571"
},
"content_raw": "<p>Bernhard ‘Bernd’ Becher (1931-2007) and Hilla Becher (1934-2015): Eisenhüttenstadt, Brandenburg. Industrial Facades, The MIT Press, 1995.<br></p>",
"content": "<p>Bernhard ‘Bernd’ Becher (1931-2007) and Hilla Becher (1934-2015): Eisenhüttenstadt, Brandenburg. Industrial Facades, The MIT Press, 1995.<br /></p>",
"is_current_item": true,
"is_root_item": true
}
],
"image_permalink": "http://wergida.tumblr.com/image/136740690571",
"photos": [
{
"caption": "",
"alt_sizes": [
{
"url": "https://67.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_1280.jpg",
"width": 1280,
"height": 973
},
{
"url": "https://66.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_500.jpg",
"width": 500,
"height": 380
},
{
"url": "https://66.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_400.jpg",
"width": 400,
"height": 304
},
{
"url": "https://65.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_250.jpg",
"width": 250,
"height": 190
},
{
"url": "https://66.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_100.jpg",
"width": 100,
"height": 76
},
{
"url": "https://66.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_75sq.jpg",
"width": 75,
"height": 75
}
],
"original_size": {
"url": "https://67.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_1280.jpg",
"width": 1280,
"height": 973
}
}
]
}
],
"total_posts": 223
}
}
But when I try to parse the returned data with BeautifulSoup:
soup = BeautifulSoup(client.posts('wergida.tumblr.com', type='photo', tag='BERND AND HILLA BECHER', limit=1, offset=0),"lxml")
I get:
Traceback (most recent call last):
File "tumblr_test.py", line 29, in <module>
soup = BeautifulSoup(client.posts('wergida.tumblr.com', type='photo', tag='BERND AND HILLA BECHER', limit=1, offset=0),"lxml")
File "/Users/CB/Public/scrapy/env/lib/python2.7/site-packages/bs4/__init__.py", line 199, in __init__
if markup[:5] == "http:" or markup[:6] == "https:":
TypeError: unhashable type
I also tried different parsers, such as "html.parser" and "html5lib", and still get the same error.
Thanks for any clues!
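
One possible reading of the traceback, offered as a sketch rather than a confirmed diagnosis: the failing line is `markup[:5]`, a slice of whatever was passed to BeautifulSoup. That slice would raise `TypeError: unhashable type` on a dict, which suggests `client.posts(...)` returns an already-parsed Python dict, not an HTML string. If so, the image URLs can be read straight from the dict without any HTML parser. The helper name `extract_photo_urls` and the trimmed `sample` below are hypothetical, and the optional `"response"` wrapper is an assumption based on the output shown above.

```python
def extract_photo_urls(result):
    """Collect the original-size image URL of every photo in a posts() result."""
    # Assumption: the payload may or may not be wrapped in a "response" key,
    # so fall back to the dict itself when the wrapper is absent.
    data = result.get("response", result)
    urls = []
    for post in data.get("posts", []):
        # Non-photo posts have no "photos" list, so default to an empty one.
        for photo in post.get("photos", []):
            original = photo.get("original_size") or {}
            if "url" in original:
                urls.append(original["url"])
    return urls


# Hypothetical sample, trimmed from the output above (no network access needed).
sample = {
    "meta": {"status": 200, "msg": "OK"},
    "response": {
        "posts": [
            {
                "type": "photo",
                "photos": [
                    {
                        "original_size": {
                            "url": "https://67.media.tumblr.com/ea41a17d0febfd019c7afae5fcc6c51e/tumblr_nzk87tVlqk1s5ljg4o1_1280.jpg",
                            "width": 1280,
                            "height": 973,
                        }
                    }
                ],
            }
        ]
    },
}

print(extract_photo_urls(sample))
```

Under this reading, BeautifulSoup would only be needed for fields that actually hold HTML strings, such as a post's `caption`, and changing the parser would make no difference because the error occurs before any parser runs.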