Thursday, December 30, 2021

One or more standby databases are not in sync with the primary database. The following node(s) returned a WAL LSN that does not match the primary

                 SWITCHOVER/ FAILOVER IN POSTGRES/EDB POSTGRES USING EFM


Error Reported :


On switchover/failover using EFM utility, noticed below error on command efm cluster-status efm.

/usr/edb/efm-4.2/bin/efm cluster-status efm

One or more standby databases are not in sync with the primary database. The following node(s) returned a WAL LSN that does not match the primary: 10.0.1.40

Resolution: 

1. Run EFM cluster status command to confirm the error.

/var/lib/edb-as/13/main# /usr/edb/efm-4.2/bin/efm cluster-status efm
Cluster Status: efm

        Agent Type  Address              DB       VIP
        ----------------------------------------------------------------
        Standby     172.30.21.227           UP
        Primary     172.30.21.226        UP
        Witness     172.30.21.228         N/A

Allowed node host list:
        172.30.21.226 172.30.21.228 172.30.21.227

Membership coordinator: 172.30.21.226

Standby priority host list:
        172.30.21.227

Promote Status:

        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
        ---------------------------------------------------------------------------
        Primary     172.30.21.226                           8F/2901DE38
        Standby     172.30.21.227           8F/29000000        8F/290001C0

        One or more standby databases are not in sync with the primary database. The following node(s) returned a WAL LSN that does not match the primary: 172.30.21.227


2. Check Error Reported in log file of database.



In Secondary DB log
2021-12-30 11:03:13.593 UTC [25226] FATAL:  could not start WAL streaming: ERROR:  replication slot "repuser" does not exist


3. Created Replication Slot on Primary DB


root@ip-172-30-21-226:~# su - enterprisedb
enterprisedb@ip-172-30-21-226:~$ psql -d edb
psql (13.5 (Ubuntu 13.5-1.pgdg18.04+1), server 13.3.7 (Ubuntu 13.3.7-1+ubuntu4))
Type "help" for help.

edb=# SELECT * FROM pg_create_physical_replication_slot('repuser');
 slot_name | lsn
-----------+-----
 repuser   |
(1 row)

edb=#


4. Check the status again by running EFM command


/var/lib/edb-as/13/main/log# /usr/edb/efm-4.2/bin/efm cluster-status efm
Cluster Status: efm

        Agent Type  Address              DB       VIP
        ----------------------------------------------------------------
        Standby     172.30.21.227           UP
        Primary     172.30.21.226        UP
        Witness     172.30.21.228         N/A

Allowed node host list:
        172.30.21.226 172.30.21.228 172.30.21.227

Membership coordinator: 172.30.21.226

Standby priority host list:
        172.30.21.227

Promote Status:

        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
        ---------------------------------------------------------------------------
        Primary     172.30.21.226                           8F/2901DE38
        Standby     172.30.21.227           8F/2901DE38        8F/2901DE38

        Standby database(s) in sync with primary. It is safe to promote.
root@ip-10-0-1-40:/var/lib/edb-as/13/main/log#


Note:- The important things related to this issue is the error reported in database logs and efm logs. In my case the issue was related to replication slot. In your case it could be different but you will get the actual error by looking DB and EFM logs. 


No comments:

Post a Comment

EFM Switchover Command

1. Check the Current Status using EFM command root@ip-192.162.1.1:~# root@ip-192.162.1.1:~# /usr/edb/efm-4.2/bin/efm cluster-status efm Clus...