I ran into this with a rook (v1.5.7) install of ceph on a single data-bearing host with multiple OSDs.
The install ships with a default CRUSH rule, replicated_rule, which uses host as the default failure domain:
$ ceph osd crush rule dump replicated_rule
{
    "rule_id": 0,
    "rule_name": "replicated_rule",
    "ruleset": 0,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "chooseleaf_firstn",
            "num": 0,
            "type": "host"
        },
        {
            "op": "emit"
        }
    ]
}
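With a host failure domain and only a single host, CRUSH has nowhere to place additional replicas, which is exactly what leaves the pg undersized. A quick way to sanity-check the topology (the comments describe what to look for on a setup like mine):
$ ceph osd tree
# expect a single 'host' bucket under 'root default' with all OSDs beneath it;
# a host-level failure domain can then only ever satisfy one replica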
I had to find out which pool the "undersized" pg 1 belonged to; luckily, in a default rook-ceph install there is only one:
$ ceph osd pool ls
device_health_metrics
$ ceph pg ls-by-pool device_health_metrics
PG OBJECTS DEGRADED ... STATE
1.0 0 0 ... active+undersized+remapped
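If there had been more pools, the number before the dot in the pg id (the 1 in 1.0) is the pool id, so it can be mapped back to a name (here pool 1 is device_health_metrics, matching the "set pool 1" output further down):
# list pool ids and names
$ ceph osd lspools
1 device_health_metrics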
And confirm that the pg uses the default rule:
$ ceph osd pool get device_health_metrics crush_rule
crush_rule: replicated_rule
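Before adding anything, it can be worth listing the rules that already exist (on a fresh rook-ceph install this should only show the default one):
$ ceph osd crush rule ls
replicated_rule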
Rather than modify the default CRUSH rule, I opted to create a new replicated rule, but this time specifying the osd (aka device) type (docs: CRUSH map Types and Buckets), while assuming the default CRUSH root of default:
# osd crush rule create-replicated <name> <root> <type> [<class>]
$ ceph osd crush rule create-replicated replicated_rule_osd default osd
$ ceph osd crush rule dump replicated_rule_osd
{
    "rule_id": 1,
    "rule_name": "replicated_rule_osd",
    "ruleset": 1,
    "type": 1,
    "min_size": 1,
    "max_size": 10,
    "steps": [
        {
            "op": "take",
            "item": -1,
            "item_name": "default"
        },
        {
            "op": "choose_firstn",
            "num": 0,
            "type": "osd"
        },
        {
            "op": "emit"
        }
    ]
}
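As an aside, the trailing [<class>] argument can restrict the rule to a device class; the ssd class below is purely illustrative and only applies if your OSDs actually carry that class:
# optional: same rule, but limited to OSDs of device class 'ssd'
$ ceph osd crush rule create-replicated replicated_rule_osd_ssd default osd ssd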
Then assign the new rule to the existing pool:
$ ceph osd pool set device_health_metrics crush_rule replicated_rule_osd
set pool 1 crush_rule to replicated_rule_osd
$ ceph osd pool get device_health_metrics crush_rule
crush_rule: replicated_rule_osd
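You can also check where the pg actually lands now:
# shows the up/acting OSD set for the pg; it should list distinct OSDs after the rule change
$ ceph pg map 1.0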
And finally confirm the pg state:
$ ceph pg ls-by-pool device_health_metrics
PG OBJECTS DEGRADED ... STATE
1.0 0 0 ... active+clean
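The cluster-wide health should clear up as well:
$ ceph status
# the undersized/degraded warnings disappear once all pgs report active+clean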