Thursday, December 30, 2021

EFM Switchover Command


1. Check the Current Status Using the EFM Command

root@ip-192.162.1.1:~#

root@ip-192.162.1.1:~# /usr/edb/efm-4.2/bin/efm cluster-status efm

Cluster Status: efm


        Agent Type  Address              DB       VIP

        ----------------------------------------------------------------

        Standby     192.162.1.2            UP

        Primary     192.162.1.1        UP

        Witness     192.162.1.3         N/A


Allowed node host list:

        192.162.1.1 192.162.1.3 192.162.1.2


Membership coordinator: 192.162.1.1


Standby priority host list:

        192.162.1.2


Promote Status:


        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info

        ---------------------------------------------------------------------------

        Primary     192.162.1.1                           8F/2901E170

        Standby     192.162.1.2            8F/2901E170        8F/2901E170


        Standby database(s) in sync with primary. It is safe to promote.


2. Run the Switchover Command

root@ip-192.162.1.1:~# /usr/edb/efm-4.2/bin/efm promote efm -switchover

Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original primary. Run the 'cluster-status' command for information about the new cluster state.


3. Switchover Completed Successfully, Check Status

root@ip-192.162.1.1:~# /usr/edb/efm-4.2/bin/efm cluster-status efm

Cluster Status: efm


        Agent Type  Address              DB       VIP

        ----------------------------------------------------------------

        Primary     192.162.1.2            UP

        Idle        192.162.1.1        UNKNOWN

        Witness     192.162.1.3         N/A


Allowed node host list:

        192.162.1.1 192.162.1.3 192.162.1.2


Membership coordinator: 192.162.1.1


Standby priority host list:

        (List is empty.)


Promote Status:


        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info

        ---------------------------------------------------------------------------

        Primary     192.162.1.2                               8F/2901E300


        No standby databases were found.


Idle Node Status (idle nodes ignored in WAL LSN comparisons):


        Address              WAL Received LSN   WAL Replayed LSN   Info

        ---------------------------------------------------------------

        192.162.1.1        8F/2901E300        8F/2901E300        DB is in recovery.

root@ip-192.162.1.1:~#

root@ip-192.162.1.1:~#

root@ip-192.162.1.1:~# /usr/edb/efm-4.2/bin/efm cluster-status efm

Cluster Status: efm


        Agent Type  Address              DB       VIP

        ----------------------------------------------------------------

        Primary     192.162.1.2            UP

        Standby     192.162.1.1        UP

        Witness     192.162.1.3         N/A


Allowed node host list:

        192.162.1.1 192.162.1.3 192.162.1.2


Membership coordinator: 192.162.1.1


Standby priority host list:

        192.162.1.1


Promote Status:


        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info

        ---------------------------------------------------------------------------

        Primary     192.162.1.2                               8F/2901E300

        Standby     192.162.1.1        8F/2901E300        8F/2901E300


        Standby database(s) in sync with primary. It is safe to promote.

root@ip-192.162.1.1:~#

root@ip-192.162.1.1:~#

root@ip-192.162.1.1:~#
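As an optional cross-check that is not part of the EFM output above, you can ask each database for its role directly: pg_is_in_recovery() returns false on a primary and true on a standby. The psql calls below assume the EPAS defaults used elsewhere in this post (database edb, port 5444, run as the enterprisedb OS user):

# On the new primary (192.162.1.2) - expect 'f'
psql -d edb -p 5444 -c "SELECT pg_is_in_recovery();"

# On the old primary (192.162.1.1), now reconfigured as a standby - expect 't'
psql -d edb -p 5444 -c "SELECT pg_is_in_recovery();"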



                 SWITCHOVER/FAILOVER IN POSTGRES/EDB POSTGRES USING EFM


Error Reported:

During a switchover/failover using the EFM utility, the following error was noticed in the output of the efm cluster-status command.

/usr/edb/efm-4.2/bin/efm cluster-status efm

One or more standby databases are not in sync with the primary database. The following node(s) returned a WAL LSN that does not match the primary: 10.0.1.40

Resolution: 

1. Run the EFM cluster-status command to confirm the error.

/var/lib/edb-as/13/main# /usr/edb/efm-4.2/bin/efm cluster-status efm
Cluster Status: efm

        Agent Type  Address              DB       VIP
        ----------------------------------------------------------------
        Standby     172.30.21.227           UP
        Primary     172.30.21.226        UP
        Witness     172.30.21.228         N/A

Allowed node host list:
        172.30.21.226 172.30.21.228 172.30.21.227

Membership coordinator: 172.30.21.226

Standby priority host list:
        172.30.21.227

Promote Status:

        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
        ---------------------------------------------------------------------------
        Primary     172.30.21.226                           8F/2901DE38
        Standby     172.30.21.227           8F/29000000        8F/290001C0

        One or more standby databases are not in sync with the primary database. The following node(s) returned a WAL LSN that does not match the primary: 172.30.21.227


2. Check the error reported in the database log file.



In the standby (secondary) DB log:
2021-12-30 11:03:13.593 UTC [25226] FATAL:  could not start WAL streaming: ERROR:  replication slot "repuser" does not exist
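The slot name the standby asks for comes from its own configuration. On PostgreSQL/EPAS 12 and later it is the primary_slot_name setting, so as an extra check (my addition, not in the original steps) you can confirm which slot the standby expects before creating it:

# Run on the standby; database edb and port 5444 are the EPAS defaults used in this post
psql -d edb -p 5444 -c "SHOW primary_slot_name;"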


3. Create the Missing Replication Slot on the Primary DB


root@ip-172-30-21-226:~# su - enterprisedb
enterprisedb@ip-172-30-21-226:~$ psql -d edb
psql (13.5 (Ubuntu 13.5-1.pgdg18.04+1), server 13.3.7 (Ubuntu 13.3.7-1+ubuntu4))
Type "help" for help.

edb=# SELECT * FROM pg_create_physical_replication_slot('repuser');
 slot_name | lsn
-----------+-----
 repuser   |
(1 row)

edb=#
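Once the standby reconnects, a quick follow-up query (my addition, using the standard pg_replication_slots view) confirms the slot is now active and advancing:

edb=# SELECT slot_name, slot_type, active, restart_lsn FROM pg_replication_slots;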


4. Check the status again by running the EFM command


/var/lib/edb-as/13/main/log# /usr/edb/efm-4.2/bin/efm cluster-status efm
Cluster Status: efm

        Agent Type  Address              DB       VIP
        ----------------------------------------------------------------
        Standby     172.30.21.227           UP
        Primary     172.30.21.226        UP
        Witness     172.30.21.228         N/A

Allowed node host list:
        172.30.21.226 172.30.21.228 172.30.21.227

Membership coordinator: 172.30.21.226

Standby priority host list:
        172.30.21.227

Promote Status:

        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
        ---------------------------------------------------------------------------
        Primary     172.30.21.226                           8F/2901DE38
        Standby     172.30.21.227           8F/2901DE38        8F/2901DE38

        Standby database(s) in sync with primary. It is safe to promote.
root@ip-10-0-1-40:/var/lib/edb-as/13/main/log#


Note: The key to this issue is the error reported in the database and EFM logs. In my case the problem was a missing replication slot; in your case it could be something different, but the DB and EFM logs will show you the actual error.
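For reference, these are the logs I tail in such cases. The database log path matches the Debian/Ubuntu EPAS 13 layout used in this post; the EFM path is the default for a 4.2 install, and the file name may differ in your setup:

# Database server log
tail -n 100 /var/log/edb-as/edb-as-13-main.log

# EFM agent log (default directory for EFM 4.2; adjust the file name to your cluster)
tail -n 100 /var/log/efm-4.2/efm.log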


Monday, December 27, 2021

EDB Postgres Database Installation

 


# Log in with root/sudo access

# Set up the EDB repository

sudo su -c 'echo "deb [arch=amd64] https://apt.enterprisedb.com/$(lsb_release -cs)-edb/ $(lsb_release -cs) main" > /etc/apt/sources.list.d/edb-$(lsb_release -cs).list'

 

# Replace '<USERNAME>' and '<PASSWORD>' below with your username and password for the EDB repositories. Visit https://www.enterprisedb.com/user to get your username and password

sudo su -c 'echo "machine apt.enterprisedb.com login <USERNAME> password <PASSWORD>" > /etc/apt/auth.conf.d/edb.conf'

 

# Add support for secure APT repositories

sudo apt-get -y install apt-transport-https

 

# Add the EDB signing key

sudo wget -q -O - https://apt.enterprisedb.com/edb-deb.gpg.key | sudo apt-key add -

 

# Update the repository metadata

sudo apt-get update

 

# Install selected packages

sudo apt-get -y install edb-as13-server            ## Server

sudo apt-get -y install postgresql-client-13       ## Client

sudo apt-get -y install postgresql-client-common   ## Client common tools

 

# Connect to the database server

 sudo su - enterprisedb

 psql postgres

 

# Check if the service is running

 /usr/bin/epas_lsclusters

Ver Cluster Port Status Owner        Data directory          Log file

13  main    5444 online enterprisedb /var/lib/edb-as/13/main /var/log/edb-as/edb-as-13-main.log
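A quick sanity check I like to add at this point (port 5444 and database edb are the EPAS defaults, matching the epas_lsclusters output above):

 sudo su - enterprisedb

 psql -p 5444 -d edb -c "SELECT version();"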


 

FATAL: could not read pid file

  Error: I faced this issue the first time I installed edb-pgpool42. I installed Pgpool 4.2 and started the pgpool service successfully, but after rebooting the pgpool server it did not come back up. The error reported in the Pgpool log file was "FATAL:  could not read pid file".


Issue:

Error reported in the edb-pgpool42.log file:

root@ip:/var/run/runfile# cd /var/log/edb/pgpool4.2/

root@ip:/var/log/edb/pgpool4.2# ls -lrth

total 4.0K

-rw-r--r-- 1 enterprisedb enterprisedb 202 Dec 25 12:39 edb-pgpool42.log

root@ip:/var/log/edb/pgpool4.2# cat edb-pgpool42.log

2021-12-25 12:39:25: pid 8228: WARNING:  "pool_conn_dbname" is depreciated, use "health_check_database" to configure health check database

2021-12-25 12:39:25: pid 8228: FATAL:  could not read pid file

root@ip:/var/log/edb/pgpool4.2#

Solution:

1. Check the /etc/edb/edb-pgpool42/pgpool.conf file for the location of pid_file_name.

pid_file_name = '/var/run/edb/pgpool4.2/edb-pgpool-4.2.pid'

2. Change the pid_file_name directory in the pgpool.conf file and then start/restart the service.

e.g. pid_file_name = '/var/run/any_directory/pgpool.pid'

In my case: pid_file_name = '/var/run/anylocation/edb-pgpool-4.2.pid'

3. Give ownership and full permissions on that directory to the enterprisedb (or postgres) user.

root@ip:/var/run# pwd

/var/run

drwxr-xr-x  2 enterprisedb enterprisedb   80 Dec 25 15:25 anylocation
root@ip:/var/run# cd anylocation
root@ip:/var/run# ls
-rw------- 1 enterprisedb enterprisedb 6 Dec 25 15:20 edb-pgpool-4.2.pid

4. Check that every directory mentioned in the pgpool.conf file exists and has the proper permissions. In my case the directory /var/run/edb-as/ was missing.

enterprisedb@ip:~$ /usr/edb/pgpool4.2/bin/pgpool -n -d
pid 28631: WARNING:  "pool_conn_dbname" is depreciated, use "health_check_database" to configure health check database
pid 28631: DEBUG:  initializing pool configuration
pid 28631: DETAIL:  num_backends: 1 total_weight: 1.000000
pid 28631: DEBUG:  initializing pool configuration
pid 28631: DETAIL:  backend 0 weight: 2147483647.000000 flag: 0000
pid 28631: DEBUG:  pool_coninfo_size: num_init_children (32) * max_pool (4) * MAX_NUM_BACKENDS (128) * sizeof(ConnectionInfo) (136) = 2228224 bytes requested for shared memory
pid 28631: LOG:  health_check_stats_shared_memory_size: requested size: 12288
pid 28631: LOG:  memory cache initialized
pid 28631: DETAIL:  memcache blocks :64
pid 28631: LOG:  allocating (136571712) bytes of shared memory segment
pid 28631: LOG:  allocating shared memory segment of size: 136571712
pid 28631: DEBUG:  pool_coninfo_size: num_init_children (32) * max_pool (4) * MAX_NUM_BACKENDS (128) * sizeof(ConnectionInfo) (136) = 2228224 bytes requested for shared memory
pid 28631: LOG:  health_check_stats_shared_memory_size: requested size: 12288
pid 28631: LOG:  health_check_stats_shared_memory_size: requested size: 12288
pid 28631: LOG:  memory cache initialized
pid 28631: DETAIL:  memcache blocks :64
pid 28631: DEBUG:  memory cache request size : 67108864
pid 28631: LOG:  pool_discard_oid_maps: discarded memqcache oid maps
pid 28631: FATAL:  failed to bind a socket: "/var/run/edb-as/.s.PGSQL.9999"
pid 28631: DETAIL:  bind socket failed with error: "No such file or directory"
pid 28631: LOG:  shutting down
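To avoid missing any of them, a simple grep over the configuration pulls out the path-related settings in one go (a quick sketch; the parameter names are the standard pgpool-II ones):

grep -E '^(pid_file_name|socket_dir|pcp_socket_dir|logdir)' /etc/edb/edb-pgpool42/pgpool.conf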


5. Create the missing directory and verify the other directories too.

enterprisedb@ip:~$ cd /var/run/edb-as/
-bash: cd: /var/run/edb-as/: No such file or directory
root@ip:/tmp# mkdir -p /var/run/edb-as/
root@ip:/tmp# chown enterprisedb:enterprisedb /var/run/edb-as/
root@ip:/tmp# su - enterprisedb

6. Start the service; it now comes up successfully.

root@ip:/tmp# systemctl start edb-pgpool42.service
root@ip:/tmp# systemctl status edb-pgpool42.service
● edb-pgpool42.service - pgpool-II service script for EDB Postgres Advanced Server
     Loaded: loaded (/lib/systemd/system/edb-pgpool42.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2021-12-25 15:20:33 UTC; 2s ago
    Process: 29671 ExecStart=/bin/bash -c /usr/edb/pgpool4.2/bin/edb-pgpool42.sh (code=exited, status=0/SUCCESS)
   Main PID: 29672 (pgpool)
      Tasks: 33 (limit: 1147)
     Memory: 143.5M
     CGroup: /system.slice/edb-pgpool42.service
             ├─29672 /usr/edb/pgpool4.2/bin/pgpool -D -f /etc/edb/edb-pgpool42/pgpool.conf -n -F /etc/edb/edb-pgpool42/pcp.conf>
             ├─29676 pgpool: wait for connection request
             ├─29677 pgpool: wait for connection request
             ├─29678 pgpool: wait for connection request
             ├─29679 pgpool: wait for connection request
             ├─29680 pgpool: wait for connection request
             ├─29681 pgpool: wait for connection request
             ├─29682 pgpool: wait for connection request
             ├─29683 pgpool: wait for connection request
             ├─29684 pgpool: wait for connection request
             ├─29685 pgpool: wait for connection request
             ├─29686 pgpool: wait for connection request
             ├─29687 pgpool: wait for connection request
             ├─29688 pgpool: wait for connection request
             ├─29689 pgpool: wait for connection request





FATAL: failed to bind a socket: "/var/run/edb-as/.s.PGSQL.9999" bind socket failed with error: "No such file or directory"

 Error reported after a restart of the pgpool service or the pgpool host.


Error:

pid 28631: FATAL:  failed to bind a socket: "/var/run/edb-as/.s.PGSQL.9999"

pid 28631: DETAIL:  bind socket failed with error: "No such file or directory"

Solution:

To resolve this issue, change the default socket directory to another directory with the correct permissions.

In my case I modified the socket_dir setting in the pgpool.conf file.

1. In the pgpool.conf file, check the location of socket_dir.


socket_dir = '/var/run/edb-as'


2. The socket_dir location should not be under /var/run (on most systems /var/run is a tmpfs that is emptied at reboot), so I created a directory /ahc at the filesystem root and changed its owner to enterprisedb.

root@ip-172-31-80-35:/var/run# mkdir /ahc

root@ip-172-31-80-35:/var/run# chown enterprisedb:enterprisedb /ahc

root@ip-172-31-80-35:/var/run# cd /etc/edb/edb-pgpool42

root@ip-172-31-80-35:/etc/edb/edb-pgpool42# vi pgpool.conf


socket_dir = '/ahc'

e.g. change socket_dir = '/var/run/edb-as' to socket_dir = '/ahc'
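An alternative, if you would rather keep the socket directory under /var/run, is to let systemd recreate it at every boot via tmpfiles.d. This is only a sketch, assuming your system uses systemd-tmpfiles; the path and owner match the setup in this post:

# Recreate /var/run/edb-as with the right owner on every boot
sudo bash -c 'echo "d /var/run/edb-as 0755 enterprisedb enterprisedb -" > /etc/tmpfiles.d/edb-pgpool.conf'

# Apply it immediately, without a reboot
sudo systemd-tmpfiles --create /etc/tmpfiles.d/edb-pgpool.conf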

3. Now you can restart the service.

root@ip-172-31-80-35:/etc/edb/edb-pgpool42# systemctl start edb-pgpool42.service
root@ip-172-31-80-35:/etc/edb/edb-pgpool42# systemctl status edb-pgpool42.service
● edb-pgpool42.service - pgpool-II service script for EDB Postgres Advanced Server
     Loaded: loaded (/lib/systemd/system/edb-pgpool42.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-12-26 09:06:13 UTC; 6s ago
    Process: 1780 ExecStart=/bin/bash -c /usr/edb/pgpool4.2/bin/edb-pgpool42.sh (code=exited, status=0/SUCCESS)
   Main PID: 1781 (pgpool)
      Tasks: 33 (limit: 1147)
     Memory: 143.6M
     CGroup: /system.slice/edb-pgpool42.service
             ├─1781 /usr/edb/pgpool4.2/bin/pgpool -D -f /etc/edb/edb-pgpool42/pgpool.conf -n -F /etc/edb/edb-pgpool42/pcp.conf >
             ├─1791 pgpool: wait for connection request
             ├─1792 pgpool: wait for connection request
             ├─1793 pgpool: wait for connection request
             ├─1794 pgpool: wait for connection request
             ├─1795 pgpool: wait for connection request
             ├─1796 pgpool: wait for connection request
             ├─1797 pgpool: wait for connection request
             ├─1798 pgpool: wait for connection request
             ├─1799 pgpool: wait for connection request
             ├─1800 pgpool: wait for connection request
             ├─1801 pgpool: wait for connection request
             ├─1802 pgpool: wait for connection request
             ├─1803 pgpool: wait for connection request
             ├─1804 pgpool: wait for connection request
             ├─1805 pgpool: wait for connection request
             ├─1806 pgpool: wait for connection request
             ├─1807 pgpool: wait for connection request
             ├─1808 pgpool: wait for connection request
             ├─1809 pgpool: wait for connection request
             ├─1810 pgpool: wait for connection request
             ├─1811 pgpool: wait for connection request
             ├─1812 pgpool: wait for connection request
             ├─1813 pgpool: wait for connection request
             ├─1814 pgpool: wait for connection request
             ├─1815 pgpool: wait for connection request
             ├─1816 pgpool: wait for connection request
             ├─1817 pgpool: wait for connection request
root@ip-172-31-80-35:/etc/edb/edb-pgpool42#


FATAL: authentication failed for user / DETAIL: username and/or password does not match (pgpool)

 

POSTGRES FAILOVER/FAILBACK + PGPOOL


My Postgres environment, which consists of one primary and one standby database, went through a failover. After the failover, the old primary became the standby and the old standby became the new primary.

The issue was that pgpool could not route connections to the new primary node, and at the application side we started receiving the error below after the failover/failback command (switchover/failover).

Error Reported At Client End

psql: error: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.


Error Reported In Pgpool.log File


FATAL:  unable to read from client
DETAIL:  pcp_read failed with error : "Success"
LOG:  PCP process with pid: 47261 exit with SUCCESS.
LOG:  PCP process with pid: 47261 exits with status 256
FATAL:  pgpool is not accepting any new connections
DETAIL:  all backend nodes are down, pgpool requires at least one valid node
HINT:  repair the backend nodes and restart pgpool
LOG:  child process with pid: 47254 exits with status 256
LOG:  fork a new child process with pid: 47316
LOG:  forked new pcp worker, pid=47323 socket=7
FATAL:  unable to read from client
DETAIL:  pcp_read failed with error : "Success"
LOG:  PCP process with pid: 47323 exit with SUCCESS.
LOG:  PCP process with pid: 47323 exits with status 256


Resolution 

1. Check the pgpool log file to find the reason for this error; the lines below stood out.

cat /var/log/edb/pgpool4.2/edb-pgpool42.log

DETAIL:  all backend nodes are down, pgpool requires at least one valid node
HINT:  repair the backend nodes and restart pgpool
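Before re-attaching anything, you can also ask pgpool which backend it currently considers down with pcp_node_info (my addition; it uses the same PCP credentials as the pcp_attach_node command below, and node id 0 matches this post's setup):

export PCPPASSFILE=/var/lib/edb-as/.pcppass
/usr/edb/pgpool4.2/bin/pcp_node_info -U pcpuser -h **** -w -n 0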

2. Looking at this error, it became clear that pcp_attach_node had not run properly. But I had configured things so that it should run automatically after a switchover/failover, so what went wrong in my case? (A sketch of that configuration follows the command below.)

/usr/edb/pgpool4.2/bin/pcp_attach_node  -U pcpuser -h **** -w -n 0
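For context, the "run automatically" part mentioned above is normally wired up in pgpool.conf through follow_primary_command (pgpool-II 4.2 renamed it from follow_master_command). A minimal sketch of such a setting; the script name follow_primary.sh is a placeholder, and the exact set of % placeholders you pass depends on what your script needs:

follow_primary_command = '/etc/edb/edb-pgpool42/follow_primary.sh %d %h %p %D %m %H %M %P %r %R'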

3. Run the pcp_attach_node command manually to find out why the node was not attached once the standby database (the former primary) came back online.

export PCPPASSFILE=/var/lib/edb-as/.pcppass
enterprisedb:~$ /usr/edb/pgpool4.2/bin/pcp_attach_node  -U pcpuser -h *** -w -n 0
FATAL:  authentication failed for user "pcpuser"
DETAIL:  username and/or password does not match

4. Check the pcp.conf file and the .pcppass file to confirm the password entries match.

enterprisedb@:/etc/edb/edb-pgpool42$ cat pcp.conf
pcpuser:6a01bfa30172639e770a6aacb78a3ed4

enterprisedb@:/etc/edb/edb-pgpool42# cat /var/lib/edb-as/.pcppass
*******:9898:pcpuser:**

It was okay.
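For reference, the value after pcpuser: in pcp.conf is just the MD5 of the PCP password, typically generated with the pg_md5 utility shipped with pgpool (the path below assumes the EDB layout used throughout this post):

/usr/edb/pgpool4.2/bin/pg_md5 <your_pcp_password>
# paste the printed hash into pcp.conf as:  pcpuser:<md5hash>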

5. Finally, check the permissions and owner of the pcp.conf and .pcppass files. In my case the .pcppass file permissions were the issue.


enterprisedb@ip:/etc/edb/edb-pgpool42$  ls -lrth  /var/lib/edb-as/.pcppass
-rw-rw-r-- 1 enterprisedb enterprisedb 30 Dec 27 09:20 /var/lib/edb-as/.pcppass
root@ip:/etc/edb/edb-pgpool42# chmod 0600 /var/lib/edb-as/.pcppass
root@ip:/etc/edb/edb-pgpool42# ls -lrth  /var/lib/edb-as/.pcppass
-rw------- 1 enterprisedb enterprisedb 30 Dec 27 09:20 /var/lib/edb-as/.pcppass
enterprisedb@ip:~$ export PCPPASSFILE=/var/lib/edb-as/.pcppass
enterprisedb@ip:~$ /usr/edb/pgpool4.2/bin/pcp_attach_node  -U pcpuser -h ***** -w -n 0
pcp_attach_node -- Command Successful
enterprisedb@ip-172-31-80-35:~$









1. Check the Current Status using EFM command root@ip-192.162.1.1:~# root@ip-192.162.1.1:~# /usr/edb/efm-4.2/bin/efm cluster-status efm Clus...