Tutorial

How To Set Up Physical Streaming Replication with PostgreSQL 12 on Ubuntu 20.04

How To Set Up Physical Streaming Replication with PostgreSQL 12 on Ubuntu 20.04

The author selected the COVID-19 Relief Fund to receive a donation as part of the Write for DOnations program.

Introduction

Streaming replication is a popular method you can use to horizontally scale your relational databases. It uses two or more copies of the same database cluster running on separate machines. One database cluster is referred to as the primary and serves both read and write operations; the others, referred to as the replicas, serve only read operations. You can also use streaming replication to provide high availability of a system. If the primary database cluster or server were to unexpectedly fail, the replicas are able to continue serving read operations, or (one of the replicas) become the new primary cluster.

PostgreSQL is a widely used relational database that supports both logical and physical replication. Logical replication streams high-level changes from the primary database cluster to the replica databases. Using logical replication, you can stream changes to just a single database or table in a database. However, in physical replication, changes to the WAL (Write-Ahead-Logging) log file are streamed and replicated in the replica clusters. As a result, you can’t replicate specific areas of a primary database cluster, but instead all changes to the primary are replicated.

In this tutorial, you will set up physical streaming replication with PostgreSQL 12 on Ubuntu 20.04 using two separate machines running two separate PostgreSQL 12 clusters. One machine will be the primary and the other, the replica.

Prerequisites

To complete this tutorial, you will need the following:

  • Two separate machines Ubuntu 20.04 machines; one referred to as the primary and the other referred to as the replica. You can set these up with our Initial Server Setup Guide, including non-root users with sudo permissions and a firewall.

Step 1 — Configuring the Primary Database to Accept Connections

In this first step, you’ll configure the primary database to allow your replica database(s) to connect. By default, PostgreSQL only listens to the localhost (127.0.0.1) for connections. To change this, you’ll first edit the listen_addresses configuration parameter on the primary database.

On your primary server, open the PostgreSQL configuration file, postgresql.conf, found in the /etc/postgresql/12/main/ directory:

  1. sudo nano /etc/postgresql/12/main/postgresql.conf

Once the file is open, locate the listen_addresses variable and change the value from localhost to the IP address of your primary server. You will also need to remove the # character from the start of the line like the following:

/etc/postgresql/12/main/postgresql.conf
. . .
listen_addresses = 'your_primary_IP_address'
. . .

Save and exit the file.

Once you’re done, your primary database will now be ready to accept connections from other machines on the IP address you entered. Next, you’ll create a role with the appropriate permissions that the replica will use when connecting to the primary.

Step 2 — Creating a Special Role with Replication Permissions

Now, you need to create a role in the primary database that has permission to replicate the database. Your replica will use this role when connecting to the primary. Creating a separate role just for replication also has security benefits. Your replica won’t be able to manipulate any data on the primary; it will only be able to replicate the data.

First, connect to the database cluster as the postgres user with the following command:

  1. sudo -u postgres psql

To create a role, you need to run the CREATE ROLE command on the cluster as follows:

  1. CREATE ROLE test WITH REPLICATION PASSWORD 'testpassword' LOGIN;

You’ll receive the following output:

Output
CREATE ROLE

This command creates a role named test with the password 'testpassword', which has permission to replicate the database cluster.

PostgreSQL has a special replication pseudo-database that the replica connects to, but you first need to edit the /etc/postgresql/12/main/pg_hba.conf configuration file to allow your replica to access it. So, exit the PostgreSQL command prompt by running:

  1. \q

Now that you’re back at your terminal command prompt, open the /etc/postgresql/12/main/pg_hba.conf configuration file using nano:

  1. sudo nano /etc/postgresql/12/main/pg_hba.conf

Append the following line to the end of the pg_hba.conf file:

/etc/postgresql/12/main/pg_hba.conf
. . .
host    replication     test    your-replica-IP/32   md5

This ensures that your primary allows your replica to connect to the replication pseudo-database using the role, test, you created earlier. The host value means to accept non-local connections via plain or SSL-encrypted TCP/IP sockets. replication is the name of the special pseudo-database that PostgreSQL uses for replication. Finally, the value md5 is the type of authentication used. If you want to have more than one replica, just add the same line again to the end of the file with the IP address of your other replica.

To ensure these changes to the configuration file are implemented, you need to restart the primary cluster using:

  1. sudo systemctl restart postgresql@12-main

If your primary cluster restarted successfully, it is correctly set up and ready to start streaming once your replica connects. Next, you’ll move on to setting up your replica cluster.

Step 3 — Backing Up the Primary Cluster on the Replica

As you are setting up physical replication with PostgreSQL in this tutorial, you need to perform a physical backup of the primary cluster’s data files into the replica’s data directory. To do this, you’ll first clear out all the files in the replica’s data directory. The default data directory for PostgreSQL on Ubuntu is /var/lib/postgresql/12/main/.

You can also find PostgreSQL’s data directory by running the following command on the replica’s database:

  1. SHOW data_directory;

Once you have the location of the data directory, run the following command to remove everything:

  1. sudo -u postgres rm -r /var/lib/postgresql/12/main/*

Since the default owner of the files in the directory is the postgres user, you will need to run the command as postgres using sudo -u postgres.

Note: If in the exceedingly rare case a file in the directory is corrupted and the command does not work, remove the main directory all together and recreate it with the appropriate permissions as follows:

  1. sudo -u postgres rm -r /var/lib/postgresql/12/main
  2. sudo -u postgres mkdir /var/lib/postgresql/12/main
  3. sudo -u postgres chmod 700 /var/lib/postgresql/12/main

Now that the replica’s data directory is empty, you can perform a physical backup of the primary’s data files. PostgreSQL conveniently has the utility pg_basebackup that simplifies the process. It even allows you to put the server into standby mode using the -R option.

Execute the pg_basebackup command on the replica as follows:

  1. sudo -u postgres pg_basebackup -h primary-ip-addr -p 5432 -U test -D /var/lib/postgresql/12/main/ -Fp -Xs -R
  • The -h option specifies a non-local host. Here, you need to enter the IP address of your server with the primary cluster.

  • The -p option specifies the port number it connects to on the primary server. By default, PostgreSQL uses port :5432.

  • The -U option allows you to specify the user you connect to the primary cluster as. This is the role you created in the previous step.

  • The -D flag is the output directory of the backup. This is your replica’s data directory that you emptied just before.

  • The -Fp specifies the data to be outputted in the plain format instead of as a tar file.

  • -Xs streams the contents of the WAL log as the backup of the primary is performed.

  • Lastly, -R creates an empty file, named standby.signal, in the replica’s data directory. This file lets your replica cluster know that it should operate as a standby server. The -R option also adds the connection information about the primary server to the postgresql.auto.conf file. This is a special configuration file that is read whenever the regular postgresql.conf file is read, but the values in the .auto file override the values in the regular configuration file.

When the pg_basebackup command connects to the primary, you will be prompted to enter the password for the role you created in the previous step. Depending on the size of your primary database cluster, it may take some time to copy all the files.

Your replica will now have all the data files from the primary that it requires to begin replication. Next, you’ll be putting the replica into standby mode and start replicating.

Step 4 — Restarting and Testing the Clusters

Now that the primary cluster’s data files have been successfully backed up on the replica, the next step is to restart the replica database cluster to put it into standby mode. To restart the replica database, run the following command:

  1. sudo systemctl restart postgresql@12-main

If your replica cluster restarted in standby mode successfully, it should have already connected to the primary database cluster on your other machine. To check if the replica has connected to the primary and the primary is streaming, connect to the primary database cluster by running:

  1. sudo -u postgres psql

Now query the pg_stat_replication table on the primary database cluster as follows:

  1. SELECT client_addr, state FROM pg_stat_replication;

Running this query on the primary cluster will output something similar to the following:

Output
client_addr | state ------------------+----------- your_replica_IP | streaming

If you have similar output, then the primary is correctly streaming to the replica.

Conclusion

You now have two Ubuntu 20.04 servers each with a PostgreSQL 12 database cluster running with physical streaming between them. Any changes now made to the primary database cluster will also appear in the replica cluster.

You can also add more replicas to your setup if your databases need to handle more traffic.

If you wish to learn more about physical streaming replication including how to set up synchronous replication to ensure zero chance of losing any mission-critical data, you can read the entry in the official PostgreSQL docs.

You can check out our PostgreSQL topic page for more tutorials and content.

Thanks for learning with the DigitalOcean Community. Check out our offerings for compute, storage, networking, and managed databases.

Learn more about us


About the authors


Still looking for an answer?

Ask a questionSearch for more help

Was this helpful?
 
4 Comments


This textbox defaults to using Markdown to format your answer.

You can type !ref in this text area to quickly search our full set of tutorials, documentation & marketplace offerings and insert the link!

What is the best practice when you have around 20GB on the primary Server running postgres and the standby node is clean/newly created. How to configure streaming replication in this case? Thanks for this nice and usefull top.

concise and easy to follow, i used two Centos nodes instead of Ubuntu, was a brease…thanks for the article

Great article. Thanks for creating.

ALTER SYSTEM SET listen_addresses TO 'your_replica_IP_addr';

This should be 'your_master_internal_IP_addr'

Try DigitalOcean for free

Click below to sign up and get $200 of credit to try our products over 60 days!

Sign up

Join the Tech Talk
Success! Thank you! Please check your email for further details.

Please complete your information!

Get our biweekly newsletter

Sign up for Infrastructure as a Newsletter.

Hollie's Hub for Good

Working on improving health and education, reducing inequality, and spurring economic growth? We'd like to help.

Become a contributor

Get paid to write technical tutorials and select a tech-focused charity to receive a matching donation.

Welcome to the developer cloud

DigitalOcean makes it simple to launch in the cloud and scale up as you grow — whether you're running one virtual machine or ten thousand.

Learn more
DigitalOcean Cloud Control Panel