Thursday, December 30, 2021

EFM Switchover Command


1. Check the Current Status Using the EFM Command

root@ip-192.162.1.1:~#

root@ip-192.162.1.1:~# /usr/edb/efm-4.2/bin/efm cluster-status efm

Cluster Status: efm


        Agent Type  Address              DB       VIP

        ----------------------------------------------------------------

        Standby     192.162.1.2            UP

        Primary     192.162.1.1        UP

        Witness     192.162.1.3         N/A


Allowed node host list:

        192.162.1.1 192.162.1.3 192.162.1.2


Membership coordinator: 192.162.1.1


Standby priority host list:

        192.162.1.2


Promote Status:


        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info

        ---------------------------------------------------------------------------

        Primary     192.162.1.1                           8F/2901E170

        Standby     192.162.1.2            8F/2901E170        8F/2901E170


        Standby database(s) in sync with primary. It is safe to promote.


2. Run the Switchover Command

root@ip-192.162.1.1:~# /usr/edb/efm-4.2/bin/efm promote efm -switchover

Promote/switchover command accepted by local agent. Proceeding with promotion and will reconfigure original primary. Run the 'cluster-status' command for information about the new cluster state.


3. Switchover Completed Successfully, Check Status

root@ip-192.162.1.1:~# /usr/edb/efm-4.2/bin/efm cluster-status efm

Cluster Status: efm


        Agent Type  Address              DB       VIP

        ----------------------------------------------------------------

        Primary     192.162.1.2            UP

        Idle        192.162.1.1        UNKNOWN

        Witness     192.162.1.3         N/A


Allowed node host list:

        192.162.1.1 192.162.1.3 192.162.1.2


Membership coordinator: 192.162.1.1


Standby priority host list:

        (List is empty.)


Promote Status:


        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info

        ---------------------------------------------------------------------------

        Primary     192.162.1.2                               8F/2901E300


        No standby databases were found.


Idle Node Status (idle nodes ignored in WAL LSN comparisons):


        Address              WAL Received LSN   WAL Replayed LSN   Info

        ---------------------------------------------------------------

        192.162.1.1        8F/2901E300        8F/2901E300        DB is in recovery.

root@ip-192.162.1.1:~#

root@ip-192.162.1.1:~#

root@ip-192.162.1.1:~# /usr/edb/efm-4.2/bin/efm cluster-status efm

Cluster Status: efm


        Agent Type  Address              DB       VIP

        ----------------------------------------------------------------

        Primary     192.162.1.2            UP

        Standby     192.162.1.1        UP

        Witness     192.162.1.3         N/A


Allowed node host list:

        192.162.1.1 192.162.1.3 192.162.1.2


Membership coordinator: 192.162.1.1


Standby priority host list:

        192.162.1.1


Promote Status:


        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info

        ---------------------------------------------------------------------------

        Primary     192.162.1.2                               8F/2901E300

        Standby     192.162.1.1        8F/2901E300        8F/2901E300


        Standby database(s) in sync with primary. It is safe to promote.

root@ip-192.162.1.1:~#

root@ip-192.162.1.1:~#

root@ip-192.162.1.1:~#
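As an optional cross-check that is not part of the EFM output above, you can ask each database for its role directly: pg_is_in_recovery() returns false on a primary and true on a standby. The psql calls below assume the EPAS defaults used elsewhere in this post (database edb, port 5444, run as the enterprisedb OS user):

# On the new primary (192.162.1.2) - expect 'f'
psql -d edb -p 5444 -c "SELECT pg_is_in_recovery();"

# On the old primary (192.162.1.1), now reconfigured as a standby - expect 't'
psql -d edb -p 5444 -c "SELECT pg_is_in_recovery();"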



                 SWITCHOVER/FAILOVER IN POSTGRES/EDB POSTGRES USING EFM


Error Reported:

During a switchover/failover using the EFM utility, the following error was noticed in the output of the efm cluster-status command.

/usr/edb/efm-4.2/bin/efm cluster-status efm

One or more standby databases are not in sync with the primary database. The following node(s) returned a WAL LSN that does not match the primary: 10.0.1.40

Resolution: 

1. Run the EFM cluster-status command to confirm the error.

/var/lib/edb-as/13/main# /usr/edb/efm-4.2/bin/efm cluster-status efm
Cluster Status: efm

        Agent Type  Address              DB       VIP
        ----------------------------------------------------------------
        Standby     172.30.21.227           UP
        Primary     172.30.21.226        UP
        Witness     172.30.21.228         N/A

Allowed node host list:
        172.30.21.226 172.30.21.228 172.30.21.227

Membership coordinator: 172.30.21.226

Standby priority host list:
        172.30.21.227

Promote Status:

        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
        ---------------------------------------------------------------------------
        Primary     172.30.21.226                           8F/2901DE38
        Standby     172.30.21.227           8F/29000000        8F/290001C0

        One or more standby databases are not in sync with the primary database. The following node(s) returned a WAL LSN that does not match the primary: 172.30.21.227


2. Check the error reported in the database log file.



In the standby (secondary) DB log:
2021-12-30 11:03:13.593 UTC [25226] FATAL:  could not start WAL streaming: ERROR:  replication slot "repuser" does not exist
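The slot name the standby asks for comes from its own configuration. On PostgreSQL/EPAS 12 and later it is the primary_slot_name setting, so as an extra check (my addition, not in the original steps) you can confirm which slot the standby expects before creating it:

# Run on the standby; database edb and port 5444 are the EPAS defaults used in this post
psql -d edb -p 5444 -c "SHOW primary_slot_name;"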


3. Create the Missing Replication Slot on the Primary DB


root@ip-172-30-21-226:~# su - enterprisedb
enterprisedb@ip-172-30-21-226:~$ psql -d edb
psql (13.5 (Ubuntu 13.5-1.pgdg18.04+1), server 13.3.7 (Ubuntu 13.3.7-1+ubuntu4))
Type "help" for help.

edb=# SELECT * FROM pg_create_physical_replication_slot('repuser');
 slot_name | lsn
-----------+-----
 repuser   |
(1 row)

edb=#
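Once the standby reconnects, a quick follow-up query (my addition, using the standard pg_replication_slots view) confirms the slot is now active and advancing:

edb=# SELECT slot_name, slot_type, active, restart_lsn FROM pg_replication_slots;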


4. Check the status again by running the EFM command


/var/lib/edb-as/13/main/log# /usr/edb/efm-4.2/bin/efm cluster-status efm
Cluster Status: efm

        Agent Type  Address              DB       VIP
        ----------------------------------------------------------------
        Standby     172.30.21.227           UP
        Primary     172.30.21.226        UP
        Witness     172.30.21.228         N/A

Allowed node host list:
        172.30.21.226 172.30.21.228 172.30.21.227

Membership coordinator: 172.30.21.226

Standby priority host list:
        172.30.21.227

Promote Status:

        DB Type     Address              WAL Received LSN   WAL Replayed LSN   Info
        ---------------------------------------------------------------------------
        Primary     172.30.21.226                           8F/2901DE38
        Standby     172.30.21.227           8F/2901DE38        8F/2901DE38

        Standby database(s) in sync with primary. It is safe to promote.
root@ip-10-0-1-40:/var/lib/edb-as/13/main/log#


Note: The key to this issue is the error reported in the database and EFM logs. In my case the problem was a missing replication slot; in your case it could be something different, but the DB and EFM logs will show you the actual error.
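For reference, these are the logs I tail in such cases. The database log path matches the Debian/Ubuntu EPAS 13 layout used in this post; the EFM path is the default for a 4.2 install, and the file name may differ in your setup:

# Database server log
tail -n 100 /var/log/edb-as/edb-as-13-main.log

# EFM agent log (default directory for EFM 4.2; adjust the file name to your cluster)
tail -n 100 /var/log/efm-4.2/efm.log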


Monday, December 27, 2021

EDB Postgres Database Installation

 


# Log in with root/sudo access

# Set up the EDB repository

sudo su -c 'echo "deb [arch=amd64] https://apt.enterprisedb.com/$(lsb_release -cs)-edb/ $(lsb_release -cs) main" > /etc/apt/sources.list.d/edb-$(lsb_release -cs).list'

 

# Replace '<USERNAME>' and '<PASSWORD>' below with your username and password for the EDB repositories. Visit https://www.enterprisedb.com/user to get your username and password

sudo su -c 'echo "machine apt.enterprisedb.com login <USERNAME> password <PASSWORD>" > /etc/apt/auth.conf.d/edb.conf'

 

# Add support for secure APT repositories

sudo apt-get -y install apt-transport-https

 

# Add the EDB signing key

sudo wget -q -O - https://apt.enterprisedb.com/edb-deb.gpg.key | sudo apt-key add -

 

# Update the repository metadata

sudo apt-get update

 

# Install selected packages

sudo apt-get -y install edb-as13-server            ## Server

sudo apt-get -y install postgresql-client-13       ## Client

sudo apt-get -y install postgresql-client-common   ## Client common tools

 

# Connect to the database server

 sudo su - enterprisedb

 psql postgres

 

# Check if the service is running

 /usr/bin/epas_lsclusters

Ver Cluster Port Status Owner        Data directory          Log file

13  main    5444 online enterprisedb /var/lib/edb-as/13/main /var/log/edb-as/edb-as-13-main.log
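A quick sanity check I like to add at this point (port 5444 and database edb are the EPAS defaults, matching the epas_lsclusters output above):

 sudo su - enterprisedb

 psql -p 5444 -d edb -c "SELECT version();"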


 

FATAL: could not read pid file

  Error: I faced this issue the first time I installed edb-pgpool42. I installed Pgpool 4.2 and started the pgpool service successfully, but after rebooting the pgpool server it did not come back up. The error reported in the Pgpool log file was "FATAL:  could not read pid file".


Issue:

Error reported in the edb-pgpool42.log file:

root@ip:/var/run/runfile# cd /var/log/edb/pgpool4.2/

root@ip:/var/log/edb/pgpool4.2# ls -lrth

total 4.0K

-rw-r--r-- 1 enterprisedb enterprisedb 202 Dec 25 12:39 edb-pgpool42.log

root@ip:/var/log/edb/pgpool4.2# cat edb-pgpool42.log

2021-12-25 12:39:25: pid 8228: WARNING:  "pool_conn_dbname" is depreciated, use "health_check_database" to configure health check database

2021-12-25 12:39:25: pid 8228: FATAL:  could not read pid file

root@ip:/var/log/edb/pgpool4.2#

Solution:

1. Check the /etc/edb/edb-pgpool42/pgpool.conf file for the location of pid_file_name.

pid_file_name = '/var/run/edb/pgpool4.2/edb-pgpool-4.2.pid'

2. Change the pid_file_name directory in the pgpool.conf file and then start/restart the service.

e.g. pid_file_name = '/var/run/any_directory/pgpool.pid'

In my case: pid_file_name = '/var/run/anylocation/edb-pgpool-4.2.pid'

3. Give ownership and full permissions on that directory to the enterprisedb (or postgres) user.

root@ip:/var/run# pwd

/var/run

drwxr-xr-x  2 enterprisedb enterprisedb   80 Dec 25 15:25 anylocation
root@ip:/var/run# cd anylocation
root@ip:/var/run# ls
-rw------- 1 enterprisedb enterprisedb 6 Dec 25 15:20 edb-pgpool-4.2.pid

4. Check that every directory mentioned in the pgpool.conf file exists and has the proper permissions. In my case the directory /var/run/edb-as/ was missing.

enterprisedb@ip:~$ /usr/edb/pgpool4.2/bin/pgpool -n -d
pid 28631: WARNING:  "pool_conn_dbname" is depreciated, use "health_check_database" to configure health check database
pid 28631: DEBUG:  initializing pool configuration
pid 28631: DETAIL:  num_backends: 1 total_weight: 1.000000
pid 28631: DEBUG:  initializing pool configuration
pid 28631: DETAIL:  backend 0 weight: 2147483647.000000 flag: 0000
pid 28631: DEBUG:  pool_coninfo_size: num_init_children (32) * max_pool (4) * MAX_NUM_BACKENDS (128) * sizeof(ConnectionInfo) (136) = 2228224 bytes requested for shared memory
pid 28631: LOG:  health_check_stats_shared_memory_size: requested size: 12288
pid 28631: LOG:  memory cache initialized
pid 28631: DETAIL:  memcache blocks :64
pid 28631: LOG:  allocating (136571712) bytes of shared memory segment
pid 28631: LOG:  allocating shared memory segment of size: 136571712
pid 28631: DEBUG:  pool_coninfo_size: num_init_children (32) * max_pool (4) * MAX_NUM_BACKENDS (128) * sizeof(ConnectionInfo) (136) = 2228224 bytes requested for shared memory
pid 28631: LOG:  health_check_stats_shared_memory_size: requested size: 12288
pid 28631: LOG:  health_check_stats_shared_memory_size: requested size: 12288
pid 28631: LOG:  memory cache initialized
pid 28631: DETAIL:  memcache blocks :64
pid 28631: DEBUG:  memory cache request size : 67108864
pid 28631: LOG:  pool_discard_oid_maps: discarded memqcache oid maps
pid 28631: FATAL:  failed to bind a socket: "/var/run/edb-as/.s.PGSQL.9999"
pid 28631: DETAIL:  bind socket failed with error: "No such file or directory"
pid 28631: LOG:  shutting down
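To avoid missing any of them, a simple grep over the configuration pulls out the path-related settings in one go (a quick sketch; the parameter names are the standard pgpool-II ones):

grep -E '^(pid_file_name|socket_dir|pcp_socket_dir|logdir)' /etc/edb/edb-pgpool42/pgpool.conf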


5. Create the missing directory and verify the other directories too.

enterprisedb@ip:~$ cd /var/run/edb-as/
-bash: cd: /var/run/edb-as/: No such file or directory
root@ip:/tmp# mkdir -p /var/run/edb-as/
root@ip:/tmp# chown enterprisedb:enterprisedb /var/run/edb-as/
root@ip:/tmp# su - enterprisedb

6. Start the service; it now comes up successfully.

root@ip:/tmp# systemctl start edb-pgpool42.service
root@ip:/tmp# systemctl status edb-pgpool42.service
● edb-pgpool42.service - pgpool-II service script for EDB Postgres Advanced Server
     Loaded: loaded (/lib/systemd/system/edb-pgpool42.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2021-12-25 15:20:33 UTC; 2s ago
    Process: 29671 ExecStart=/bin/bash -c /usr/edb/pgpool4.2/bin/edb-pgpool42.sh (code=exited, status=0/SUCCESS)
   Main PID: 29672 (pgpool)
      Tasks: 33 (limit: 1147)
     Memory: 143.5M
     CGroup: /system.slice/edb-pgpool42.service
             ├─29672 /usr/edb/pgpool4.2/bin/pgpool -D -f /etc/edb/edb-pgpool42/pgpool.conf -n -F /etc/edb/edb-pgpool42/pcp.conf>
             ├─29676 pgpool: wait for connection request
             ├─29677 pgpool: wait for connection request
             ├─29678 pgpool: wait for connection request
             ├─29679 pgpool: wait for connection request
             ├─29680 pgpool: wait for connection request
             ├─29681 pgpool: wait for connection request
             ├─29682 pgpool: wait for connection request
             ├─29683 pgpool: wait for connection request
             ├─29684 pgpool: wait for connection request
             ├─29685 pgpool: wait for connection request
             ├─29686 pgpool: wait for connection request
             ├─29687 pgpool: wait for connection request
             ├─29688 pgpool: wait for connection request
             ├─29689 pgpool: wait for connection request





FATAL: failed to bind a socket: "/var/run/edb-as/.s.PGSQL.9999" bind socket failed with error: "No such file or directory"

 Error reported after a restart of the pgpool service or the pgpool host.


Error:

pid 28631: FATAL:  failed to bind a socket: "/var/run/edb-as/.s.PGSQL.9999"

pid 28631: DETAIL:  bind socket failed with error: "No such file or directory"

Solution:

To resolve this issue, change the default socket directory to another directory with the correct permissions.

In my case I modified the socket_dir setting in the pgpool.conf file.

1. In the pgpool.conf file, check the location of socket_dir.


socket_dir = '/var/run/edb-as'


2. The socket_dir location should not be under /var/run (on most systems /var/run is a tmpfs that is emptied at reboot), so I created a directory /ahc at the filesystem root and changed its owner to enterprisedb.

root@ip-172-31-80-35:/var/run# mkdir /ahc

root@ip-172-31-80-35:/var/run# chown enterprisedb:enterprisedb /ahc

root@ip-172-31-80-35:/var/run# cd /etc/edb/edb-pgpool42

root@ip-172-31-80-35:/etc/edb/edb-pgpool42# vi pgpool.conf


socket_dir = '/ahc'

e.g. change socket_dir = '/var/run/edb-as' to socket_dir = '/ahc'
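An alternative, if you would rather keep the socket directory under /var/run, is to let systemd recreate it at every boot via tmpfiles.d. This is only a sketch, assuming your system uses systemd-tmpfiles; the path and owner match the setup in this post:

# Recreate /var/run/edb-as with the right owner on every boot
sudo bash -c 'echo "d /var/run/edb-as 0755 enterprisedb enterprisedb -" > /etc/tmpfiles.d/edb-pgpool.conf'

# Apply it immediately, without a reboot
sudo systemd-tmpfiles --create /etc/tmpfiles.d/edb-pgpool.conf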

3. Now you can restart the service.

root@ip-172-31-80-35:/etc/edb/edb-pgpool42# systemctl start edb-pgpool42.service
root@ip-172-31-80-35:/etc/edb/edb-pgpool42# systemctl status edb-pgpool42.service
● edb-pgpool42.service - pgpool-II service script for EDB Postgres Advanced Server
     Loaded: loaded (/lib/systemd/system/edb-pgpool42.service; enabled; vendor preset: enabled)
     Active: active (running) since Sun 2021-12-26 09:06:13 UTC; 6s ago
    Process: 1780 ExecStart=/bin/bash -c /usr/edb/pgpool4.2/bin/edb-pgpool42.sh (code=exited, status=0/SUCCESS)
   Main PID: 1781 (pgpool)
      Tasks: 33 (limit: 1147)
     Memory: 143.6M
     CGroup: /system.slice/edb-pgpool42.service
             ├─1781 /usr/edb/pgpool4.2/bin/pgpool -D -f /etc/edb/edb-pgpool42/pgpool.conf -n -F /etc/edb/edb-pgpool42/pcp.conf >
             ├─1791 pgpool: wait for connection request
             ├─1792 pgpool: wait for connection request
             ├─1793 pgpool: wait for connection request
             ├─1794 pgpool: wait for connection request
             ├─1795 pgpool: wait for connection request
             ├─1796 pgpool: wait for connection request
             ├─1797 pgpool: wait for connection request
             ├─1798 pgpool: wait for connection request
             ├─1799 pgpool: wait for connection request
             ├─1800 pgpool: wait for connection request
             ├─1801 pgpool: wait for connection request
             ├─1802 pgpool: wait for connection request
             ├─1803 pgpool: wait for connection request
             ├─1804 pgpool: wait for connection request
             ├─1805 pgpool: wait for connection request
             ├─1806 pgpool: wait for connection request
             ├─1807 pgpool: wait for connection request
             ├─1808 pgpool: wait for connection request
             ├─1809 pgpool: wait for connection request
             ├─1810 pgpool: wait for connection request
             ├─1811 pgpool: wait for connection request
             ├─1812 pgpool: wait for connection request
             ├─1813 pgpool: wait for connection request
             ├─1814 pgpool: wait for connection request
             ├─1815 pgpool: wait for connection request
             ├─1816 pgpool: wait for connection request
             ├─1817 pgpool: wait for connection request
root@ip-172-31-80-35:/etc/edb/edb-pgpool42#


FATAL: authentication failed for user / DETAIL: username and/or password does not match (pgpool)

 

POSTGRES FAILOVER/FAILBACK + PGPOOL


My Postgres environment, which consists of one primary and one standby database, went through a failover. After the failover, the old primary became the standby and the old standby became the new primary.

The issue was that pgpool could not route connections to the new primary node, and at the application side we started receiving the error below after the failover/failback command (switchover/failover).

Error Reported At Client End

psql: error: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.


Error Reported In Pgpool.log File


FATAL:  unable to read from client
DETAIL:  pcp_read failed with error : "Success"
LOG:  PCP process with pid: 47261 exit with SUCCESS.
LOG:  PCP process with pid: 47261 exits with status 256
FATAL:  pgpool is not accepting any new connections
DETAIL:  all backend nodes are down, pgpool requires at least one valid node
HINT:  repair the backend nodes and restart pgpool
LOG:  child process with pid: 47254 exits with status 256
LOG:  fork a new child process with pid: 47316
LOG:  forked new pcp worker, pid=47323 socket=7
FATAL:  unable to read from client
DETAIL:  pcp_read failed with error : "Success"
LOG:  PCP process with pid: 47323 exit with SUCCESS.
LOG:  PCP process with pid: 47323 exits with status 256


Resolution 

1. Check the pgpool log file to find the reason for this error; the lines below stood out.

cat /var/log/edb/pgpool4.2/edb-pgpool42.log

DETAIL:  all backend nodes are down, pgpool requires at least one valid node
HINT:  repair the backend nodes and restart pgpool
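Before re-attaching anything, you can also ask pgpool which backend it currently considers down with pcp_node_info (my addition; it uses the same PCP credentials as the pcp_attach_node command below, and node id 0 matches this post's setup):

export PCPPASSFILE=/var/lib/edb-as/.pcppass
/usr/edb/pgpool4.2/bin/pcp_node_info -U pcpuser -h **** -w -n 0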

2. Looking at this error, it became clear that pcp_attach_node had not run properly. But I had configured things so that it should run automatically after a switchover/failover, so what went wrong in my case? (A sketch of that configuration follows the command below.)

/usr/edb/pgpool4.2/bin/pcp_attach_node  -U pcpuser -h **** -w -n 0
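For context, the "run automatically" part mentioned above is normally wired up in pgpool.conf through follow_primary_command (pgpool-II 4.2 renamed it from follow_master_command). A minimal sketch of such a setting; the script name follow_primary.sh is a placeholder, and the exact set of % placeholders you pass depends on what your script needs:

follow_primary_command = '/etc/edb/edb-pgpool42/follow_primary.sh %d %h %p %D %m %H %M %P %r %R'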

3. Run the pcp_attach_node command manually to find out why the node was not attached once the standby database (the former primary) came back online.

export PCPPASSFILE=/var/lib/edb-as/.pcppass
enterprisedb:~$ /usr/edb/pgpool4.2/bin/pcp_attach_node  -U pcpuser -h *** -w -n 0
FATAL:  authentication failed for user "pcpuser"
DETAIL:  username and/or password does not match

4. Check the pcp.conf file and the .pcppass file to confirm the password entries match.

enterprisedb@:/etc/edb/edb-pgpool42$ cat pcp.conf
pcpuser:6a01bfa30172639e770a6aacb78a3ed4

enterprisedb@:/etc/edb/edb-pgpool42# cat /var/lib/edb-as/.pcppass
*******:9898:pcpuser:**

It was okay.
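For reference, the value after pcpuser: in pcp.conf is just the MD5 of the PCP password, typically generated with the pg_md5 utility shipped with pgpool (the path below assumes the EDB layout used throughout this post):

/usr/edb/pgpool4.2/bin/pg_md5 <your_pcp_password>
# paste the printed hash into pcp.conf as:  pcpuser:<md5hash>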

5. Finally, check the permissions and owner of the pcp.conf and .pcppass files. In my case the .pcppass file permissions were the issue.


enterprisedb@ip:/etc/edb/edb-pgpool42$  ls -lrth  /var/lib/edb-as/.pcppass
-rw-rw-r-- 1 enterprisedb enterprisedb 30 Dec 27 09:20 /var/lib/edb-as/.pcppass
root@ip:/etc/edb/edb-pgpool42# chmod 0600 /var/lib/edb-as/.pcppass
root@ip:/etc/edb/edb-pgpool42# ls -lrth  /var/lib/edb-as/.pcppass
-rw------- 1 enterprisedb enterprisedb 30 Dec 27 09:20 /var/lib/edb-as/.pcppass
enterprisedb@ip:~$ export PCPPASSFILE=/var/lib/edb-as/.pcppass
enterprisedb@ip:~$ /usr/edb/pgpool4.2/bin/pcp_attach_node  -U pcpuser -h ***** -w -n 0
pcp_attach_node -- Command Successful
enterprisedb@ip-172-31-80-35:~$









1. Check the Current Status using EFM command root@ip-192.162.1.1:~# root@ip-192.162.1.1:~# /usr/edb/efm-4.2/bin/efm cluster-status efm Clus...