
I know you are not supposed to create a Ceph cluster on a single node. But this is only a small private project, so I have neither the resources for, nor any need of, a real cluster.

Nevertheless I wanted to set one up, and I have run into some problems. Right now my cluster is down and reports the following health issues.

[root@rook-ceph-tools-6bdcd78654-vq7kn /]# ceph status
  cluster:
    id:     12d9fbb9-73f3-4229-9ef4-6b7670324629
    health: HEALTH_WARN
            Reduced data availability: 33 pgs inactive
            68 slow ops, oldest one blocked for 26686 sec, osd.0 has slow ops
 
  services:
    mon: 1 daemons, quorum g (age 15m)
    mgr: a(active, since 44m)
    osd: 1 osds: 1 up (since 8m), 1 in (since 9m)
 
  data:
    pools:   2 pools, 33 pgs
    objects: 0 objects, 0 B
    usage:   1.0 GiB used, 465 GiB / 466 GiB avail
    pgs:     100.000% pgs unknown
             33 unknown

[root@rook-ceph-tools-6bdcd78654-vq7kn /]# ceph health detail
HEALTH_WARN Reduced data availability: 33 pgs inactive; 68 slow ops, oldest one blocked for 26691 sec, osd.0 has slow ops
[WRN] PG_AVAILABILITY: Reduced data availability: 33 pgs inactive
    pg 2.0 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.0 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.2 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.3 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.4 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.5 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.6 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.7 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.8 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.9 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.a is stuck inactive for 44m, current state unknown, last acting []
    pg 3.b is stuck inactive for 44m, current state unknown, last acting []
    pg 3.c is stuck inactive for 44m, current state unknown, last acting []
    pg 3.d is stuck inactive for 44m, current state unknown, last acting []
    pg 3.e is stuck inactive for 44m, current state unknown, last acting []
    pg 3.f is stuck inactive for 44m, current state unknown, last acting []
    pg 3.10 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.11 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.12 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.13 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.14 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.15 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.16 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.17 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.18 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.19 is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1a is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1b is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1c is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1d is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1e is stuck inactive for 44m, current state unknown, last acting []
    pg 3.1f is stuck inactive for 44m, current state unknown, last acting []
[WRN] SLOW_OPS: 68 slow ops, oldest one blocked for 26691 sec, osd.0 has slow ops

ceph version 15.2.3 (d289bbdec69ed7c1f516e0a093594580a76b78d0) octopus (stable)

Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:58:53Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:51:04Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

kubeadm version: &version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.6", GitCommit:"dff82dc0de47299ab66c83c626e08b245ab19037", GitTreeState:"clean", BuildDate:"2020-07-15T16:56:34Z", GoVersion:"go1.13.9", Compiler:"gc", Platform:"linux/amd64"}

If anyone knows where to start or how to solve my problem, please help!


1 Answer


Yes, I agree with eblock above. If you want at least 3 replicas of every object, you should have at least 3 OSDs (a minimum of 3 disks, or 3 volumes... whatever). The contents of the objects in a placement group are stored on a set of OSDs; a placement group does not own its OSDs, it shares them with other placement groups from the same pool, or even from other pools.

  • If one OSD fails, every copy of an object it held is lost from that OSD. For all objects in the placement group, the number of replicas suddenly drops from three to two. Ceph starts recovering this placement group by choosing a new OSD on which to re-create the third copy of every object.

  • If another OSD in the same placement group fails before the new OSD is fully populated with the third copies, some objects will have only one surviving copy.

  • If a third OSD in the same placement group fails before recovery completes, and it holds the only remaining copy of an object, that object is lost permanently.
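On a single-OSD setup like the one in the question, the usual consequence of all this is that the default replica count of 3 can never be satisfied, so placement groups stay inactive. One common workaround for a throwaway private cluster is to drop the replica count to 1. The commands below are a hedged sketch: `<pool-name>` is a placeholder, not a pool from your cluster; substitute the names that `ceph osd lspools` actually reports, and note that size 1 means zero redundancy.

```shell
# List the pools in the cluster (the output shows 2 pools, 33 PGs).
ceph osd lspools

# Reduce the replica count to 1 so PGs can become active on a single OSD.
# <pool-name> is a placeholder; repeat for each pool listed above.
ceph osd pool set <pool-name> size 1
ceph osd pool set <pool-name> min_size 1
```

These are configuration commands for a live cluster, so they are shown untested; only use size 1 on data you can afford to lose.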

So choosing a correct PG number when creating a pool is very important:

Total PGs = (OSDs × 100) / pool size

where pool size is the number of replicas (3 in this example).
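The rule of thumb above can be sketched in a few lines of Python. The rounding to the next power of two is the commonly recommended extra step from the Ceph placement-group guidance, not something stated in the formula itself, so treat it as an assumption:

```python
def recommended_pg_count(num_osds: int, pool_size: int,
                         target_pgs_per_osd: int = 100) -> int:
    """Ceph rule of thumb: total PGs = (OSDs * 100) / pool_size,
    then round up to the next power of two (assumed convention)."""
    raw = (num_osds * target_pgs_per_osd) / pool_size
    power = 1
    while power < raw:
        power *= 2
    return power

# 3 OSDs, 3 replicas: raw value 100, rounded up to 128
print(recommended_pg_count(3, 3))   # -> 128
```

For the single-OSD, size-1 setup from the question this still yields 128, which shows the formula targets real multi-OSD clusters; a tiny test cluster gets by with far fewer PGs.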

Answered 2020-12-18T15:51:02.750