Wednesday, June 28, 2017

[Cluster] Pacemaker and HAProxy - HA Configuration

Source: https://m.blog.naver.com/PostView.nhn?blogId=sunsync&logNo=220802338953

-- Install on every node in the cluster --
-- VIP : 192.168.56.202
-- VIP : 192.168.56.203
-- VIP : 192.168.56.204
-- node1: root@cent7_m_hp1 server - 192.168.56.131
-- node2: root@cent7_m_hp2 server - 192.168.56.132
-- Servers: CentOS 7, haproxy-1.5.14-3.el7.x86_64, pacemaker-1.1.13-10.el7_2.4.x86_64
-- For the HAProxy configuration itself, see the separate 'HAProxy 설정' (HAProxy configuration) post in this blog.
-- All tests were run on virtual servers created with Oracle VM on a PC.
1. Time synchronization
- Keeping clocks in sync is essential when several servers communicate; NTP is used here (see the sketch after the install command below).
# yum install ntp
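-- A minimal follow-up sketch (not spelled out in the original post): start ntpd, enable it at boot, and check
-- that it can reach its peers. The default servers in /etc/ntp.conf are assumed to be reachable.
# systemctl enable ntpd
# systemctl start ntpd
# ntpq -p        <-- the peer list should show at least one reachable server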
2. Firewall configuration
- corosync uses UDP ports 5404-5406, so allow them in the firewall.
ex)
# iptables -A INPUT -i eth1 -p udp -m multiport --dports 5404,5405,5406 -m conntrack --ctstate NEW,ESTABLISHED -j ACCEPT
# iptables -A OUTPUT -o eth1 -p udp -m multiport --sports 5404,5405,5406 -m conntrack --ctstate ESTABLISHED -j ACCEPT
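-- On CentOS 7 the default firewall is firewalld rather than raw iptables; if firewalld is in use, the predefined
-- 'high-availability' service (which includes the corosync UDP ports and pcsd's TCP 2224) can be opened instead:
# firewall-cmd --permanent --add-service=high-availability
# firewall-cmd --reload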
3. Register host names (identical on all nodes)
[root@cent7_m_L1 ~]# vi /etc/hosts
127.0.0.1 localhost
::1 localhost
192.168.56.131 ha-master.exam.com ha-master
192.168.56.132 ha-slave.exam.com ha-slave
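-- Quick sanity check (assumption: run on each node after editing /etc/hosts) that both names resolve correctly:
# getent hosts ha-master.exam.com ha-slave.exam.com
# ping -c 1 ha-slave.exam.com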


4. Install pacemaker (identical on all nodes)
-- Installing pacemaker pulls in corosync as a dependency.
-- pcs is the tool used to configure pacemaker and corosync.
-- Installing pcs also installs pcsd.
-- pcsd is an OpenSSL-based daemon written in Ruby; it manages pcs authentication between nodes.
-- The authentication files are located in /var/lib/pcsd.

[root@cent7_m_hp1 /var/log]# yum install pacemaker pcs
[root@cent7_m_hp2 /var/log]# yum install pacemaker pcs
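-- Optional check (not part of the original steps): confirm both nodes ended up with matching package versions.
# rpm -q pacemaker corosync pcs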


5. Start the pcsd service (on all nodes)
[root@cent7_m_hp1 /var/log]# systemctl start pcsd
[root@cent7_m_hp2 /var/log]# systemctl start pcsd


6. Configure authentication between the cluster nodes (on all nodes)
-- Installing these packages creates a new system user, hacluster, which is used to:
-- 1) authenticate the nodes that will form the cluster (set the same user/password on every node),
-- 2) configure and synchronize the cluster nodes,
-- 3) start and stop cluster services on the cluster nodes.
[root@cent7_m_hp1 /etc/corosync]# passwd hacluster
[root@cent7_m_hp2 /etc/corosync]# passwd hacluster
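-- The passwd prompt above is interactive; on RHEL/CentOS the --stdin option gives a non-interactive sketch
-- ('hacluster_password' is only a placeholder - use the same real password on both nodes):
# echo 'hacluster_password' | passwd --stdin hacluster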


7. Authenticate the cluster nodes (run on the master node only)
[root@cent7_m_hp1 /var/lib]# pcs cluster auth ha-master.exam.com ha-slave.exam.com -u hacluster -p
Username: hacluster
Password:
ha-slave.exam.com: Authorized
ha-master.exam.com: Authorized
[root@cent7_m_hp1 /var/lib]#

-- After running the above, token information is written to /var/lib/pcsd/tokens.


8. Create a cluster named 'Main_Cluster' and synchronize the corosync config across the nodes (on the master node)
[root@cent7_m_hp1 /var/lib]# pcs cluster setup --name Main_Cluster ha-master.exam.com ha-slave.exam.com
Shutting down pacemaker/corosync services...
Redirecting to /bin/systemctl stop pacemaker.service
Redirecting to /bin/systemctl stop corosync.service
Killing any remaining services...
Removing all cluster configuration files...
ha-master.exam.com: Succeeded
ha-slave.exam.com: Succeeded
Synchronizing pcsd certificates on nodes ha-master.exam.com, ha-slave.exam.com...
ha-slave.exam.com: Success
ha-master.exam.com: Success

Restarting pcsd on the nodes in order to reload the certificates...
ha-slave.exam.com: Success
ha-master.exam.com: Success
[root@cent7_m_hp1 /var/lib]#

9. Start the cluster (on the master node)
[root@cent7_m_hp1 /etc/corosync]# pcs cluster start --all
ha-master.exam.com: Starting Cluster...
ha-slave.exam.com: Starting Cluster...
[root@cent7_m_hp1 /etc/corosync]# pcs status
Cluster name: Main_Cluster
WARNING: no stonith devices and stonith-enabled is not false
Last updated: Wed Aug 31 15:57:03 2016 Last change: Wed Aug 31 15:56:51 2016 by hacluster via crmd on ha-slave.exam.com
Stack: corosync
Current DC: ha-slave.exam.com (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 0 resources configured

Online: [ ha-master.exam.com ha-slave.exam.com ]

Full list of resources:


PCSD Status:
ha-master.exam.com: Online
ha-slave.exam.com: Online

Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@cent7_m_hp1 /etc/corosync]#
[root@cent7_m_hp1 /etc/corosync]# pcs status corosync
Membership information
----------------------
Nodeid Votes Name
1 1 ha-master.exam.com (local)
2 1 ha-slave.exam.com
[root@cent7_m_hp1 /etc/corosync]#

-- To stop the cluster:
# pcs cluster stop [--all] [node] [...]

-- To force-stop cluster services on the local node:
# pcs cluster kill

-- To put a node into standby mode, or bring it back out of standby:
# pcs cluster standby node_name | --all
# pcs cluster unstandby node_name | --all

-- To stop the cluster and wipe all cluster configuration files and state:
# pcs cluster stop
# pcs cluster destroy


10. Disabling STONITH and Ignoring Quorum
-- What is STONITH (Shoot The Other Node In The Head) ?
You will see a warning in the output of pcs status that no STONITH devices are configured
and STONITH is not disabled:

-- What is Quorum?
A cluster has quorum when more than half of the nodes are online.
Pacemaker's default behavior is to stop all resources if the cluster does not have quorum.
However, this does not make sense in a two-node cluster; the cluster will lose quorum if one node fails.
For this tutorial, we will tell Pacemaker to ignore quorum by setting the no-quorum-policy:

[root@cent7_m_hp1 /etc/corosync]# pcs property set stonith-enabled=false

# set to "ignore"; quorum enforcement is not needed for a 2-node cluster
[root@cent7_m_hp1 /etc/corosync]# pcs property set no-quorum-policy=ignore


11. Configure virtual IPs and create a management group (run on one node only, usually the master)
-- Add the virtual IP resources.
-- The 'ocf:heartbeat:IPaddr2' resource agent (the default) is used for this.
-- Every resource agent name consists of two or three fields:
-- first  - the resource class/standard, e.g. OCF (Open Cluster Framework)
   second - the provider (present or not depending on the standard)
   third  - the resource agent name
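-- To see which agents a provider offers and which parameters an agent takes, the standard pcs queries below can
-- be used (shown for the IPaddr2 agent configured in this step):
# pcs resource list ocf:heartbeat              <-- agents under the OCF 'heartbeat' provider
# pcs resource describe ocf:heartbeat:IPaddr2  <-- parameters such as ip, cidr_netmask, nic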

-- The following creates a resource named 'Main_VIP' with VIP 192.168.56.202, a /32 netmask, and a 10-second monitoring interval.
-- 'Sub_VIP' and 'Etc_VIP' are created only for testing.

[root@cent7_m_hp1 /etc/corosync]# pcs resource create Main_VIP IPaddr2 ip=192.168.56.202 cidr_netmask=32 op monitor interval=10s
[root@cent7_m_hp1 /etc/corosync]# pcs resource create Sub_VIP IPaddr2 ip=192.168.56.203 cidr_netmask=32 op monitor interval=10s
[root@cent7_m_hp1 /etc/corosync]# pcs resource create Etc_VIP IPaddr2 ip=192.168.56.204 cidr_netmask=32 op monitor interval=10s
[root@cent7_m_hp1 /etc/corosync]# pcs status

-- Create a group
[root@cent7_m_hp1 /etc/corosync]# pcs resource group add VIP_Group Main_VIP Sub_VIP Etc_VIP
[root@cent7_m_hp1 /etc/corosync]# pcs status
Cluster name: Main_Cluster
Last updated: Thu Sep 1 15:22:36 2016 Last change: Thu Sep 1 15:19:53 2016 by root via cibadmin on ha-master.exam.com
Stack: corosync
Current DC: ha-slave.exam.com (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 3 resources configured

Online: [ ha-master.exam.com ha-slave.exam.com ]

Full list of resources:

Resource Group: VIP_Group
Main_VIP (ocf::heartbeat:IPaddr2): Started ha-master.exam.com
Sub_VIP (ocf::heartbeat:IPaddr2): Started ha-master.exam.com
Etc_VIP (ocf::heartbeat:IPaddr2): Started ha-master.exam.com

PCSD Status:
ha-master.exam.com: Online
ha-slave.exam.com: Online

Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/enabled
[root@cent7_m_hp1 /etc/corosync]#


12. Check the status on the slave node as well
[root@cent7_s_hp2 /etc/corosync]# pcs status



13. Enable the services to start at boot (apply on all nodes)
[root@cent7_m_hp1 /etc/corosync]# systemctl enable pcsd
[root@cent7_m_hp1 /etc/corosync]# systemctl enable corosync
[root@cent7_m_hp1 /etc/corosync]# systemctl enable pacemaker
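-- Optional verification sketch: confirm on each node that the units are now enabled for boot.
# systemctl is-enabled pcsd corosync pacemaker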


14. Check each config file
[root@cent7_m_hp1 /etc/corosync]# more corosync.conf
[root@cent7_m_hp1 /var/lib/pcsd]# more pcs_settings.conf
[root@cent7_m_hp1 /var/lib/pcsd]# more pcs_users.conf
[root@cent7_m_hp1 /var/lib/pcsd]# more tokens
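-- For orientation only: a corosync.conf generated by 'pcs cluster setup' on CentOS 7 looks roughly like the sketch
-- below. Exact contents (transport, logging options, node ids) depend on the environment, so check the real file.
totem {
    version: 2
    secauth: off
    cluster_name: Main_Cluster
    transport: udpu
}
nodelist {
    node {
        ring0_addr: ha-master.exam.com
        nodeid: 1
    }
    node {
        ring0_addr: ha-slave.exam.com
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1
}
logging {
    to_syslog: yes
}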


15. Verify the virtual IPs
-- The VIP is attached as a secondary address: it shows up in 'ip addr' output but not in plain 'ifconfig' output.
[root@cent7_m_hp1 /var/lib/pcsd]# ip addr
...
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:ce:a2:a1 brd ff:ff:ff:ff:ff:ff
inet 192.168.56.131/24 brd 192.168.56.255 scope global enp0s8
valid_lft forever preferred_lft forever
inet 192.168.56.202/32 brd 192.168.56.255 scope global secondary enp0s8
valid_lft forever preferred_lft forever
[root@cent7_m_hp1 /var/lib/pcsd]#
[root@cent7_m_hp1 /var/lib/pcsd]# ifconfig
...
enp0s8: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 192.168.56.131 netmask 255.255.255.0 broadcast 192.168.56.255
ether 08:00:27:ce:a2:a1 txqueuelen 1000 (Ethernet)
RX packets 46174 bytes 5641875 (5.3 MiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 64794 bytes 8655261 (8.2 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[root@cent7_m_hp1 /var/lib/pcsd]#


16. Create an HAProxy load-balancer resource and register it in a group
-- With HA load balancing, the VIP group and the LB group can end up running on different nodes.
-- A constraint (step 17) is used to force them onto the same node; the haproxy prerequisites are sketched below.
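-- Prerequisite sketch (implied by the package list at the top of this post, not an extra step of its own):
-- haproxy must be installed on both nodes with a valid config, and it should not be enabled in systemd,
-- because Pacemaker itself starts and stops the systemd:haproxy resource.
# yum install haproxy
# haproxy -c -f /etc/haproxy/haproxy.cfg       <-- syntax-check the config from the separate 'HAProxy 설정' post
# systemctl disable haproxy
# systemctl stop haproxy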

[root@cent7_m_hp1 /etc/corosync]# pcs resource create HAProxy_LB systemd:haproxy op monitor interval=10s
[root@cent7_m_hp1 /etc/corosync]# pcs resource group add LB_Group HAProxy_LB
[root@cent7_m_hp1 /etc/corosync]# pcs status
Cluster name: Main_Cluster
Last updated: Thu Sep 1 15:30:16 2016 Last change: Thu Sep 1 15:28:04 2016 by root via cibadmin on ha-master.exam.com
Stack: corosync
Current DC: ha-slave.exam.com (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 4 resources configured

Online: [ ha-master.exam.com ha-slave.exam.com ]

Full list of resources:

Resource Group: VIP_Group
Main_VIP (ocf::heartbeat:IPaddr2): Started ha-master.exam.com
Sub_VIP (ocf::heartbeat:IPaddr2): Started ha-master.exam.com
Etc_VIP (ocf::heartbeat:IPaddr2): Started ha-master.exam.com
Resource Group: LB_Group
HAProxy_LB (systemd:haproxy): Started ha-slave.exam.com

PCSD Status:
ha-master.exam.com: Online
ha-slave.exam.com: Online

Daemon Status:
corosync: active/enabled
pacemaker: active/enabled
pcsd: active/enabled
[root@cent7_m_hp1 /etc/corosync]#

-- Creating resource groups
-- Resources in a group are started in order and stopped in reverse order.
1) Create:
# pcs resource group add group_name resource_id...
2) Remove (caution: if no resource_id is given, the group itself is deleted):
# pcs resource group remove group_name resource_id...
3) List groups:
# pcs resource group list



17. Create constraints
-- Resources that must run on the same node need a colocation constraint.
-- Here, all of the VIP resources work together with the LB resource.
-- There is no need to map HAProxy_LB to each individual VIP; linking it to one VIP is enough.
[root@cent7_m_hp1 /etc/corosync]# pcs constraint colocation add HAProxy_LB with Main_VIP score=INFINITY

-- The order constraints are checked VIPs first, then the LB daemon.
-- The last VIP is then ordered before the LB so the whole chain stays linked.
[root@cent7_m_hp1 /etc/corosync]# pcs constraint order set Main_VIP Sub_VIP Etc_VIP
[root@cent7_m_hp1 /etc/corosync]# pcs constraint order set Etc_VIP HAProxy_LB

-- Set the location so the main services run on the master node (some could instead be set to prefer the slave).
[root@cent7_m_hp1 /etc/corosync]# pcs constraint location Main_VIP prefers ha-master.exam.com=INFINITY
[root@cent7_m_hp1 /etc/corosync]# pcs constraint location Sub_VIP prefers ha-master.exam.com=INFINITY
[root@cent7_m_hp1 /etc/corosync]# pcs constraint location Etc_VIP prefers ha-master.exam.com=INFINITY
[root@cent7_m_hp1 /etc/corosync]# pcs constraint location HAProxy_LB prefers ha-master.exam.com=INFINITY

-- Delete an order constraint
[root@cent7_m_hp1 /etc/corosync]# pcs constraint order remove [constraint id]

-- Check the constraints
[root@cent7_m_hp1 /etc/corosync]# pcs constraint --full



18. Moving resources
1) To move every resource running on one node to another node, put that node into standby mode.
[root@cent7_m_hp1 /etc/haproxy]# pcs cluster standby node_name

Once the resources have moved, bring the cluster node back:
[root@cent7_m_hp1 /etc/haproxy]# pcs cluster unstandby node_name


2) To move a single running resource, use pcs resource move resource_id [destination_node].
-- 'pcs resource relocate run' moves resources to their preferred nodes and removes the temporary constraints it created; 'pcs resource move' instead leaves a location constraint behind (see the clean-up note after these commands).
[root@cent7_m_hp1 /etc/haproxy]# pcs resource move Etc_VIP ha-slave.exam.com
[root@cent7_m_hp1 /etc/haproxy]# pcs resource relocate run Etc_VIP
[root@cent7_m_hp1 /etc/haproxy]# pcs resource relocate clear
[root@cent7_m_hp1 /etc/haproxy]# pcs resource relocate show
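-- Clean-up note (hedged): 'pcs resource move' leaves behind the location constraint it created; depending on the
-- pcs version it can be removed with 'pcs resource clear', or by deleting the constraint id directly.
# pcs resource clear Etc_VIP
# pcs constraint --full                        <-- look up the constraint id created by the move
# pcs constraint remove constraint_id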



-------------- Miscellaneous -----------------------------------

1. cluster node stop
[root@cent7_m_hp1 /var/lib/pcsd]# pcs cluster stop ha-slave.exam.com
-- If the stop does not complete cleanly:
1) # pcs cluster disable node_name       <-- disable the target node
2) # pcs status                          <-- check the status and any errors
3) # pcs resource cleanup                <-- clear the error messages
   # pcs resource cleanup HAProxy --node proxy2
4) # pcs status                          <-- check again
5) # pcs cluster stop ha-slave.exam.com
6) # pcs status                          <-- final check


2. Create a new cluster resource
[root@cent7_m_hp1 /var/lib/pcsd]# pcs resource create Web_VIP ocf:heartbeat:IPaddr2 ip=192.168.56.203 cidr_netmask=32 op monitor interval=10s
[root@cent7_m_hp1 /var/lib/pcsd]#
[root@cent7_m_hp1 /var/lib/pcsd]# pcs status
Cluster name: MainWeb_VIP
WARNING: corosync and pacemaker node names do not match (IPs used in setup?)
Last updated: Mon Aug 29 13:15:06 2016 Last change: Mon Aug 29 13:15:04 2016 by root via cibadmin on cent7_m_hp1
Stack: corosync
Current DC: cent7_s_hp2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 2 resources configured

Online: [ cent7_m_hp1 cent7_s_hp2 ]

Full list of resources:

Cluster_VIP (ocf::heartbeat:IPaddr2): Started cent7_s_hp2
Web_VIP (ocf::heartbeat:IPaddr2): Started cent7_m_hp1

PCSD Status:
cent7_m_hp1 (192.168.56.131): Online
cent7_s_hp2 (192.168.56.132): Online

Daemon Status:
corosync: active/disabled
pacemaker: active/disabled
pcsd: active/disabled
[root@cent7_m_hp1 /var/lib/pcsd]#


3. Remove one of the cluster nodes
[root@cent7_m_hp1 /var/lib/pcsd]# pcs cluster node remove node_name


4. Re-add the removed node
-- Adding the node does not start the cluster daemons on it.
-- The newly added node has to be started explicitly (a pcs-based boot-enable sketch follows the commands below).
[root@cent7_m_hp1 /etc/corosync]# pcs cluster node add node_name
[root@cent7_m_hp1 /etc/corosync]# pcs cluster start node_name
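-- If the re-added node should also start cluster services at boot, pcs can enable them (roughly equivalent to
-- the systemctl enable corosync/pacemaker commands in step 13):
# pcs cluster enable node_name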



-- Other commands --

1. Check the status
-- Verify that both servers are online.
[root@cent7_m_L1 /var/lib/pcsd]# pcs status


2. Check for config errors
[root@cent7_m_L1 /etc/corosync]# crm_verify -L -V
error: unpack_resources: Resource start-up disabled since no STONITH resources have been defined
error: unpack_resources: Either configure some or disable STONITH with the stonith-enabled option
error: unpack_resources: NOTE: Clusters with shared data need STONITH to ensure data integrity
Errors found during check: config not valid
-- If the errors above appear:
-- (If no-quorum-policy=ignore is not set, failover between the cluster nodes may not behave smoothly.)
[root@cent7_m_L1 /etc/corosync]# pcs property set stonith-enabled=false
[root@cent7_m_L1 /etc/corosync]# pcs property set no-quorum-policy=ignore

[root@cent7_m_L1 /etc/corosync]# crm_verify -L -V

[root@cent7_m_L1 /etc/corosync]# pcs property
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: WebCluster
dc-version: 1.1.13-10.el7_2.4-44eb2dd
have-watchdog: false
no-quorum-policy: ignore
stonith-enabled: false


3. Register a service (a resource or virtual IP)
-- Run on the master node only.
[root@cent7_m_L1 /etc/corosync]# pcs resource create MainWebVirtualIP IPaddr2 ip=192.168.56.201 cidr_netmask=32

4. Show all resources in detail
[root@cent7_m_hp1 /etc/corosync]# pcs resource --full

5. Modify a resource
[root@cent7_m_hp1 /etc/corosync]# pcs resource update Cluster_VIP ip=192.168.56.202 cidr_netmask=32 op monitor interval=10s


6. Constraints
[root@cent7_m_hp1 /etc/corosync]# pcs constraint colocation add HAProxy_Main HAProxy_Main_VIP INFINITY
[root@cent7_m_hp1 /etc/corosync]# pcs constraint order HAProxy_Main_VIP then HAProxy_Main
[root@cent7_m_hp1 /etc/corosync]# pcs constraint location HAProxy_Main_VIP prefers node_id
[root@cent7_m_hp1 /etc/corosync]# pcs constraint location HAProxy_Main prefers node_id

-- Check the constraints
[root@cent7_m_hp1 /etc/corosync]# pcs constraint
[root@cent7_m_hp1 /etc/corosync]# pcs constraint list
[root@cent7_m_hp1 /etc/corosync]# pcs constraint list --full


7. Enable, disable resource
-- pcs resource disable resource_id [--wait[=n]]
-- pcs resource enable resource_id [--wait[=n]]

[root@cent7_m_hp1 /etc/haproxy]# pcs resource disable HAProxy_Main
[root@cent7_m_hp1 /etc/haproxy]# pcs resource enable HAProxy_Main
[root@cent7_m_hp1 /etc/haproxy]# pcs resource disable HAProxy_Main --wait=5
[root@cent7_m_hp1 /etc/haproxy]# pcs resource enable HAProxy_Main


8. pcs status - clearing failed action messages
-- Remove the failed-action messages shown in pcs status.

[root@cent7_m_hp1 /etc/haproxy]# pcs resource cleanup