Repairing Inconsistent Ceph PGs
My hyperconverged Ceph cluster was reporting an inconsistent placement group on one of my OSDs. Proxmox doesn’t provide any method in the UI to repair these, so I had to SSH into the node and run a repair using the Ceph CLI.
- Check for inconsistent placement groups (PGs):

```
ceph pg dump pgs | grep "inconsistent"
```
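If you'd rather not grep the full dump, `ceph health detail` also names the inconsistent PGs, and recent Ceph releases let you filter by state directly; the exact output varies by version:

```
# Health summary, including which PGs are inconsistent
ceph health detail

# List only the PGs currently in the inconsistent state
ceph pg ls inconsistent
```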
- The first column of the output lists the PGID (e.g. `13.d`). The first part (`13`) is the ID of the pool. You can see all the pools by ID and name with:

```
ceph osd pool ls detail | awk '{print $2, $3}'
```
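Since the PGID is just `<pool_id>.<pg_seq>`, plain shell can split it for you. A small sketch, using the example `13.d` from above:

```
pgid="13.d"            # example PGID from the dump above
pool_id="${pgid%%.*}"  # drop everything after the first dot -> "13"

# Resolve the pool ID to its (quoted) name, reusing the awk trick above
ceph osd pool ls detail | awk -v id="$pool_id" '$2 == id {print $3}'
```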
- You may get more info on the exact objects affected with these commands, where `<pool_name>` and `<pgid>` are from the outputs above:

```
rados list-inconsistent-pg <pool_name>
rados list-inconsistent-obj <pgid>
rados list-inconsistent-snapset <pgid>
```
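The `list-inconsistent-obj` output is JSON, so piping it through `jq` makes the errors much easier to read. A sketch using the example PGID `13.d`, assuming `jq` is installed on the node:

```
# Show each inconsistent object and why Ceph flagged it
rados list-inconsistent-obj 13.d --format=json-pretty \
  | jq '.inconsistents[] | {object: .object.name, errors: .errors}'
```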
- Start a repair on each affected PG with:

```
ceph pg repair <pgid>
```
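If more than one PG is flagged, a short loop saves retyping. A rough sketch, assuming the same `ceph pg dump pgs` output as above (PGID in the first column):

```
# Kick off a repair for every PG currently flagged inconsistent
for pgid in $(ceph pg dump pgs 2>/dev/null | awk '/inconsistent/ {print $1}'); do
  ceph pg repair "$pgid"
done
```

The repair itself runs asynchronously, so watch `ceph -w` (or run `ceph pg <pgid> query`) to confirm the PG returns to `active+clean`.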