Using ceph-ansible

Ceph can be deployed in many ways, from the earliest manual deployments, to the later ceph-deploy, to today's popular container-based deployments. Few of these are suitable for production, however, and ceph-ansible is one of the methods that is. ceph-ansible uses the ansible tool to carry out the Ceph deployment. In production environments with dozens, hundreds, or even thousands of nodes, installing and configuring machines one by one is far too slow; ansible lets you install and configure a Ceph cluster quickly, greatly improving efficiency and freeing operators from repetitive, mechanical work.

Usage

Environment

Three nodes, one of which doubles as both the ansible node and a Ceph node.

Node IP          Role                       OS
172.30.12.137    ansible node, Ceph node    CentOS Linux release 7.7.1908
172.30.12.197    Ceph node                  CentOS Linux release 7.7.1908
172.30.12.227    Ceph node                  CentOS Linux release 7.7.1908

The ansible tool used in this article runs inside a CentOS 7.6.1810 container.

Installation

ceph-ansible is built on ansible, so ansible has to be installed first. Before installing it, the first question is which version to install.

The Ceph version deployed in this article is Mimic. The official documentation recommends the stable-3.1 or stable-3.2 branch of ceph-ansible; I chose the newer stable-3.2, which requires ansible 2.6. (For the full branch/version mapping, see the official documentation.)

With the ansible version settled, it can be installed in one of two ways:

  • pip install
  • yum/apt install

Because yum/apt offers only a narrow choice of versions, I recommend installing with pip. (See the pypa site for how to install pip itself.)

First, clone the ceph-ansible project:

# git clone git@github.com:ceph/ceph-ansible.git
# git -C ceph-ansible checkout -b 3.2.0 v3.2.0

Then install the matching version of ansible using the requirements.txt shipped with ceph-ansible:

pip install -r ./ceph-ansible/requirements.txt

After the installation completes, verify the ansible version:

# ansible --version
ansible 2.6.19
  config file = /root/ceph-ansible/ansible.cfg
  configured module search path = [u'/root/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Oct 30 2018, 23:45:53) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]

Configuration

ansible configuration

A custom ansible inventory can be placed anywhere; just pass -i {inventory host path} when running ansible to point at it. If no inventory file is specified, ansible looks for /etc/ansible/hosts.

Inventory configuration:

[mons]
172.30.12.197 ansible_ssh_user=root ansible_ssh_pass=1234\$\#

[osds]
172.30.12.137 ansible_ssh_user=root ansible_ssh_pass=1234\$\#
172.30.12.197 ansible_ssh_user=root ansible_ssh_pass=1234\$\#

[mgrs]
172.30.12.137 ansible_ssh_user=root ansible_ssh_pass=1234\$\#

If the password contains special characters such as $ or #, they must be escaped with a backslash (\).
Once the inventory is ready, use ansible all -i {inventory host} -m ping to test connectivity to the nodes:

# ansible all -i dummy-ansible-hosts -m ping
172.30.12.197 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
172.30.12.227 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
172.30.12.137 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
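Note that logging in with a password via ansible_ssh_pass requires the sshpass program on the ansible control node; if the ad-hoc ping fails with an error mentioning sshpass, install it first. A minimal sketch, assuming a CentOS control node:

# ansible needs sshpass when ansible_ssh_pass is used instead of SSH keys
yum install -y sshpass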

ceph-ansible configuration

Configuring ceph-ansible mainly means setting the group variables; in typical scenarios there is no need to modify the roles themselves.

Configure all.yml

First:

cp ./group_vars/all.yml.sample ./group_vars/all.yml

Then edit the settings in all.yml as follows:

# Inventory host group variables
mon_group_name: mons
osd_group_name: osds
#rgw_group_name: rgws
#mds_group_name: mdss
#nfs_group_name: nfss
#restapi_group_name: restapis
#rbdmirror_group_name: rbdmirrors
#client_group_name: clients
#iscsi_gw_group_name: iscsigws
#mgr_group_name: mgrs
# The defaults of the inventory host group variables above match the group names used in the inventory, so it makes no difference whether these lines are uncommented.
...
# If configure_firewall is true, then ansible will try to configure the
# appropriate firewalling rules so that Ceph daemons can communicate
# with each others.
#configure_firewall: True
configure_firewall: False
# If you are not confident configuring the firewall rules, turn this off to save yourself trouble.
...
# Set type of NTP client daemon to use, valid entries are chronyd, ntpd or timesyncd
# Note that this selection is currently ignored on containerized deployments
#ntp_daemon_type: timesyncd
ntp_daemon_type: chronyd
# Ceph needs time synchronization; pick whichever daemon fits your environment. Three are supported: chronyd, ntpd and timesyncd.
...
# ORIGIN SOURCE
#
# Choose between:
# - 'repository' means that you will get ceph installed through a new repository. Later below choose between 'community', 'rhcs', 'dev' or 'obs'
# - 'distro' means that no separate repo file will be added
# you will get whatever version of Ceph is included in your Linux distro.
# 'local' means that the ceph binaries will be copied over from the local machine
#ceph_origin: dummy
ceph_origin: repository
#valid_ceph_origins:
# - repository
# - distro
# - local

ceph_repository: community
#valid_ceph_repository:
# - community
# - rhcs
# - dev
# - uca
# - custom
# - obs

# REPOSITORY: COMMUNITY VERSION
#
# Enabled when ceph_repository == 'community'
#
#ceph_mirror: http://download.ceph.com
#ceph_stable_key: https://download.ceph.com/keys/release.asc
#ceph_stable_release: dummy
#ceph_stable_repo: "{{ ceph_mirror }}/debian-{{ ceph_stable_release }}"
ceph_mirror: http://mirrors.163.com/ceph
ceph_stable_key: https://mirrors.163.com/ceph/keys/release.asc
ceph_stable_release: mimic
ceph_stable_repo: "{{ ceph_mirror }}/rpm-{{ ceph_stable_release }}"
# How the Ceph packages get installed. There are three options: 'repository' installs from a newly added repo;
# 'distro' installs whatever Ceph version ships in your Linux distribution's own repos; 'local' copies the ceph
# binaries over from the local machine. Choose according to your own requirements.
...
## Monitor options
#
# You must define either monitor_interface, monitor_address or monitor_address_block.
# These variables must be defined at least in all.yml and overrided if needed (inventory host file or group_vars/*.yml).
# Eg. If you want to specify for each monitor which address the monitor will bind to you can set it in your **inventory host file** by using 'monitor_address' variable.
# Preference will go to monitor_address if both monitor_address and monitor_interface are defined.
#monitor_interface: interface
monitor_interface: ens33
#monitor_address: 0.0.0.0
#monitor_address_block: subnet
# set to either ipv4 or ipv6, whichever your network is using
#ip_version: ipv4
#mon_use_fqdn: false # if set to true, the MON name used will be the fqdn in the ceph.conf
# Monitor settings: exactly one of monitor_interface, monitor_address or monitor_address_block must be defined.
# (For further usage details, read the comments above carefully.)
...
## OSD options
#
#is_hci: false
#hci_safety_factor: 0.2
#non_hci_safety_factor: 0.7
#osd_memory_target: 4294967296
#journal_size: 5120 # OSD journal size in MB
journal_size: 1024 # OSD journal size in MB
#block_db_size: -1 # block db size in bytes for the ceph-volume lvm batch. -1 means use the default of 'as big as possible'.
#public_network: 0.0.0.0/0
public_network: 172.30.12.0/24
cluster_network: 172.30.12.0/24
#cluster_network: "{{ public_network | regex_replace(' ', '') }}"
#osd_mkfs_type: xfs
#osd_mkfs_options_xfs: -f -i size=2048
#osd_mount_options_xfs: noatime,largeio,inode64,swalloc
#osd_objectstore: bluestore
osd_objectstore: filestore
# Size the journal according to the storage media and speed of your disks; set the public and cluster networks;
# choose filestore or bluestore as the objectstore.
...
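Before moving on, it is worth checking that the interface name and subnet written into all.yml actually exist on the monitor node, since interface names (ens33 here) vary from machine to machine. A quick sanity check, run on the monitor node:

# confirm the interface named in monitor_interface exists and carries an address in public_network
ip -4 addr show ens33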

Configure osds.yml

First:

cp ./group_vars/osds.yml.sample ./group_vars/osds.yml

Then edit the settings in osds.yml as follows:

...
# Even though OSD nodes should not have the admin key
# at their disposal, some people might want to have it
# distributed on OSD nodes. Setting 'copy_admin_key' to 'true'
# will copy the admin key to the /etc/ceph/ directory
#copy_admin_key: false
copy_admin_key: true
# A matter of personal preference; I like every OSD node to have the admin key.
...
# Declare devices to be used as OSDs
# All scenario(except 3rd) inherit from the following device declaration
# Note: This scenario uses the ceph-disk tool to provision OSDs

devices:
- /dev/sdb
- /dev/sdc
# - /dev/sdd
# - /dev/sde

#devices: []
# Configure the device paths according to the disks present on your hosts.
...
osd_scenario: collocated
#valid_osd_scenarios:
# - collocated
# - non-collocated
# - lvm
# 'collocated' puts the journal and the data on the same disk; 'non-collocated' puts them on separate disks;
# both of these scenarios provision the OSDs with ceph-disk.
# 'lvm' creates the OSDs with ceph-volume and requires VG/LV details to be specified.
# For the specifics, see osds.yml.sample, which is commented in great detail.
...
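Because the devices list applies to every host in the [osds] group, it is worth confirming before deploying that /dev/sdb and /dev/sdc really exist (and hold nothing you care about) on each OSD node. A quick check with an ad-hoc ansible command, assuming the same inventory file as above:

# list the target disks on every OSD node; an error here means the devices list needs adjusting
ansible osds -i dummy-ansible-hosts -m shell -a "lsblk /dev/sdb /dev/sdc"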

Configure mgrs.yml

First:

cp ./group_vars/mgrs.yml.sample ./group_vars/mgrs.yml

Then edit the settings in mgrs.yml as follows:

...
###########
# MODULES #
###########
# Ceph mgr modules to enable, current modules available are: status,dashboard,localpool,restful,zabbix,prometheus,influx
ceph_mgr_modules: [status]
...
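Once the cluster is running you can check which mgr modules actually got enabled, and more modules can be turned on later without redeploying. For example, on a mon/mgr node:

# list available and enabled mgr modules
ceph mgr module ls
# enable an extra module later, e.g. the dashboard (which may need further setup, such as certificates)
ceph mgr module enable dashboard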

Deployment

Initial cluster deployment

First:

cp ./site.yml.sample ./site.yml

Then:

ansible-playbook -i {inventory host} site.yml
...

The deployment downloads and installs RPM packages, so network speed has a large impact on how long it takes. Be patient.
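Once the playbook finishes, a quick way to confirm the deployment succeeded is to check the cluster status from a monitor node:

# run on a mon node; the expected number of mons/osds/mgrs and HEALTH_OK (or a transient
# HEALTH_WARN while OSDs settle) indicate the playbook did its job
ceph -s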

Adding an OSD node

First:

cp ./infrastructure-playbooks/add-osd.yml ./add-osd.yml

Then edit the inventory host file and add a line for 172.30.12.227 under [osds], which amounts to adding a new OSD node:

[mons]
172.30.12.197 ansible_ssh_user=root ansible_ssh_pass=1234\$\#

[osds]
172.30.12.137 ansible_ssh_user=root ansible_ssh_pass=1234\$\#
172.30.12.197 ansible_ssh_user=root ansible_ssh_pass=1234\$\#
172.30.12.227 ansible_ssh_user=root ansible_ssh_pass=1234\$\#

[mgrs]
172.30.12.137 ansible_ssh_user=root ansible_ssh_pass=1234\$\#

Finally:

ansible-playbook -i {inventory host} add-osd.yml
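When the playbook completes, the new OSDs on 172.30.12.227 should appear in the CRUSH tree, which can be verified from a monitor node:

# the new host and its OSDs should be listed with status "up"
ceph osd tree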

Purging the cluster

First:

cp ./infrastructure-playbooks/purge-cluster.yml ./purge-cluster.yml

Then:

ansible-playbook -i {inventory host} purge-cluster.yml
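The purge playbook is meant to stop the daemons and remove the Ceph packages, data, and configuration from all nodes. A simple way to spot leftovers afterwards, assuming the same inventory file as above:

# check that no ceph packages remain on any node after the purge
ansible all -i dummy-ansible-hosts -m shell -a "rpm -qa | grep ceph || true"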

Distributing the Ceph configuration file

To distribute the configuration file to all nodes:

ansible all -i dummy-ansible-hosts -m copy -a "src=/root/ceph.conf dest=/etc/ceph/"
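To confirm that every node received the same file, compare checksums with another ad-hoc command:

# every node should report the same md5 for /etc/ceph/ceph.conf
ansible all -i dummy-ansible-hosts -m shell -a "md5sum /etc/ceph/ceph.conf"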

References & Acknowledgements