使用 RabbitMQ 3.7.0 开始内置的 rabbitmq_peer_discovery_k8s 插件来实现各实例自动发现,最终用 StatefulSet 部署集群。

一、支持 RabbitMQ 在 k8s 内自动发现

官方文档此处说明:https://www.rabbitmq.com/cluster-formation.html#peer-discovery-k8s

集群内部署用到了内置的 rabbitmq_peer_discovery_k8s 插件来实现各实例自动发现。RabbitMQ 3.7.0 及之后版本镜像自带有,更旧的版本需要参考文档说明手动安装环境(制作 Docker 镜像)。

首先在 k8s 内创建包含 get endpoint 的 RBAC 使 rabbitmq 可以发现其他节点地址,文件参考:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
# filename: rabbitmq_rbac.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rabbitmq
  namespace: rabbitmq
automountServiceAccountToken: true
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
  namespace: rabbitmq
rules:
- apiGroups: [""]
  resources: ["endpoints"]
  verbs: ["get"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1beta1
metadata:
  name: endpoint-reader
  namespace: rabbitmq
subjects:
- kind: ServiceAccount
  name: rabbitmq
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: endpoint-reader

二、利用 Configmap 添加配置文件

包含两个文件,enabled_plugins 指明了初始化时要加载的插件,rabbitmq.conf 重要在使用 hostname 发现。

configmap 定义文件参考:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
# filename: rabbitmq-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
  namespace: rabbitmq
data:
  enabled_plugins: | [rabbitmq_management,rabbitmq_peer_discovery_k8s,rabbitmq_shovel,rabbitmq_shovel_management].

  rabbitmq.conf: |
      ## Cluster formation. See https://www.rabbitmq.com/cluster-formation.html to learn more.
      cluster_formation.peer_discovery_backend  = rabbit_peer_discovery_k8s
      cluster_formation.k8s.host = kubernetes.default.svc.cluster.local
      ## Should RabbitMQ node name be computed from the pod's hostname or IP address?
      ## IP addresses are not stable, so using [stable] hostnames is recommended when possible.
      ## Set to "hostname" to use pod hostnames.
      ## When this value is changed, so should the variable used to set the RABBITMQ_NODENAME
      ## environment variable.
      cluster_formation.k8s.address_type = hostname
      ## How often should node cleanup checks run?
      cluster_formation.node_cleanup.interval = 30
      ## Set to false if automatic removal of unknown/absent nodes
      ## is desired. This can be dangerous, see
      ##  * https://www.rabbitmq.com/cluster-formation.html#node-health-checks-and-cleanup
      ##  * https://groups.google.com/forum/#!msg/rabbitmq-users/wuOfzEywHXo/k8z_HWIkBgAJ
      cluster_formation.node_cleanup.only_log_warning = true
      cluster_partition_handling = autoheal
      ## See https://www.rabbitmq.com/ha.html#master-migration-data-locality
      queue_master_locator=min-masters
      ## This is just an example.
      ## This enables remote access for the default user with well known credentials.
      ## Consider deleting the default user and creating a separate user with a set of generated
      ## credentials instead.
      ## Learn more at https://www.rabbitmq.com/access-control.html#loopback-users
      loopback_users.guest = false
      #
      # memory limit
      vm_memory_high_watermark.absolute = 1024M
      # disk usage limit
      disk_free_limit.absolute = 2GB      
---

三、部署 Rabbitmq StatefulSet 集群

首先参考下面 StatefulSet 声明文件 创建 workload,然后等待 rabbitmq 三节点启动完成。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
kind: Service
apiVersion: v1
metadata:
  namespace: rabbitmq
  name: rabbitmq
  labels:
    app: rabbitmq
spec:
  type: NodePort
  ports:
   - name: http
     protocol: TCP
     port: 15672
     targetPort: 15672
     nodePort: 15672
   - name: amqp
     protocol: TCP
     port: 5672
     targetPort: 5672
     nodePort: 5672
  selector:
    app: rabbitmq
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: rabbitmq
  namespace: rabbitmq
spec:
  serviceName: rabbitmq
  replicas: 3
  template:
    metadata:
      labels:
        app: rabbitmq
    # annotations:
      # prometheus.io/path: /metrics    # prometheus 支持
      # prometheus.io/port: "15692"
      # prometheus.io/scrape: "true"
    spec:
      serviceAccountName: rabbitmq
      terminationGracePeriodSeconds: 10
      containers:
      - name: rabbitmq
        image: rabbitmq:3.7.24
        resources:
          limits:
            cpu: 2.0
            memory: 2.0Gi
          requests:
            cpu: 0.5
            memory: 2.0Gi
        volumeMounts:
          - name: config-volume
            mountPath: /etc/rabbitmq
        ports:
          - name: http
            protocol: TCP
            containerPort: 15672
          - name: amqp
            protocol: TCP
            containerPort: 5672
        readinessProbe:
          exec:
            command: ["rabbitmq-diagnostics", "status"]
          initialDelaySeconds: 20
          periodSeconds: 60
          timeoutSeconds: 10
        env:
          - name: MY_POD_NAME
            valueFrom:
              fieldRef:
                apiVersion: v1
                fieldPath: metadata.name
          - name: MY_POD_NAMESPACE
            valueFrom:
              fieldRef:
                fieldPath: metadata.namespace
          - name: RABBITMQ_USE_LONGNAME
            value: "true"
          - name: K8S_SERVICE_NAME
            value: rabbitmq
          - name: RABBITMQ_NODENAME
            value: rabbit@$(MY_POD_NAME).$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
          - name: K8S_HOSTNAME_SUFFIX
            value: .$(K8S_SERVICE_NAME).$(MY_POD_NAMESPACE).svc.cluster.local
          - name: RABBITMQ_ERLANG_COOKIE
            value: "xxx-rabbitmq"
      volumes:
        - name: config-volume
          configMap:
            name: rabbitmq-config
            items:
            - key: rabbitmq.conf
              path: rabbitmq.conf
            - key: enabled_plugins
              path: enabled_plugins

切换为高可用镜像模式,可以进入其中一个 POD,使用下面命令设置镜像策略:

1
rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all", "ha-sync-mode":"automatic"}'

最终效果如下图,在 dashboard 中可以查看到 ha-all 策略生效。policies 冲可以看到具体策略信息

四、监控相关(待补充)

RabbitMQ 3.8.0 及之后版本内置 Prometheus 和 Grafana 支持插件,3.7.0 没有适配插件需要更改旧版本插件编译安装。

参考链接

https://github.com/rabbitmq/rabbitmq-peer-discovery-k8s/tree/master/examples