Help with PostgreSQL replication

July 21, 2014

The master and slave servers are set up as per the tutorial at https://www.digitalocean.com/community/tutorials/how-to-set-up-master-slave-replication-on-postgresql-on-an-ubuntu-12-04-vps. Both servers run Ubuntu 12.04 x64 with PostgreSQL 9.1.

Below are the commands I am running and the errors I am getting. Any help would be appreciated.

MASTER SERVER

$ sudo su - postgres
$ psql -c "select pg_start_backup('initial_backup');"
$ rsync -cva --inplace --exclude=*pg_xlog* /var/lib/postgresql/9.1/main/ MAS.TER.SER.VER:/var/lib/postgresql/9.1/main/
$ psql -c "select pg_stop_backup();"
NOTICE:  pg_stop_backup cleanup done, waiting for required WAL segments to be archived
WARNING:  pg_stop_backup still waiting for all required WAL segments to be archived (60 seconds elapsed)
HINT:  Check that your archive_command is executing properly.  pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.
WARNING:  pg_stop_backup still waiting for all required WAL segments to be archived (120 seconds elapsed)
HINT:  Check that your archive_command is executing properly.  pg_stop_backup can be canceled safely, but the database backup will not be usable without all the WAL segments.
^CCancel request sent
ERROR:  canceling statement due to user request
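
The HINT above points at archive_command. As a minimal sketch, the script below checks whether the archiving-related settings appear on uncommented lines of postgresql.conf. The helper name setting_is_active is hypothetical, and the config path is an assumption based on the /etc/postgresql/9.1/main/ directory used elsewhere in this post.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if KEY is assigned on an
# uncommented line of FILE (leading whitespace is allowed).
setting_is_active() {
    file=$1
    key=$2
    [ -r "$file" ] || return 2
    grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"
}

# Assumed config location for PostgreSQL 9.1 on Ubuntu; override with CONF=...
CONF=${CONF:-/etc/postgresql/9.1/main/postgresql.conf}

for key in wal_level archive_mode archive_command; do
    if setting_is_active "$CONF" "$key"; then
        echo "$key: active"
    else
        echo "$key: not active (missing, commented out, or file unreadable)"
    fi
done
```

On a running server, psql -c "SHOW archive_command;" shows the value the server is actually using, which is a useful cross-check after a restart.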

SLAVE SERVER

After running the above on the master I then set up the recovery.conf as per the tutorial and attempt to start the slave server.

$ sudo su - postgres
$ nano /etc/postgresql/9.1/main/recovery.conf

Edit as per tutorial

$ service postgresql start
 * Starting PostgreSQL 9.1 database server
 * Removed stale pid file.
The PostgreSQL server failed to start. Please check the log output:
2014-07-21 22:31:12 BST LOG:  database system was interrupted; last known up at 2014-07-21 22:21:43 BST
2014-07-21 22:31:12 BST LOG:  could not open file "pg_xlog/000000010000000000000018" (log file 0, segment 24): No such file or directory
2014-07-21 22:31:12 BST LOG:  invalid checkpoint record
2014-07-21 22:31:12 BST FATAL:  could not locate required checkpoint record
2014-07-21 22:31:12 BST HINT:  If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.1/main/backup_label".
2014-07-21 22:31:12 BST LOG:  startup process (PID 8274) exited with exit code 1
2014-07-21 22:31:12 BST LOG:  aborting startup due to startup process failure
1 Answer

In the command:

rsync -cva --inplace --exclude=*pg_xlog* /var/lib/postgresql/9.1/main/ MAS.TER.SER.VER:/var/lib/postgresql/9.1/main/

I assume MAS.TER.SER.VER is the IP address of the master postgres server? If so, that should actually be the IP of the slave server.
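
To make the destination harder to mix up, the slave's address can be kept in a single variable. The sketch below only prints the three commands from the question with that variable substituted; it does not run them. SLAVE_HOST and the helper rsync_command are hypothetical names, and SL.AVE.SER.VER is a placeholder for the real address.

```shell
#!/bin/sh
# Placeholder address; set SLAVE_HOST to the slave's real IP before use.
SLAVE_HOST=${SLAVE_HOST:-SL.AVE.SER.VER}
DATA_DIR=/var/lib/postgresql/9.1/main

# Hypothetical helper: build the rsync command line for a destination host.
rsync_command() {
    printf '%s\n' "rsync -cva --inplace --exclude=*pg_xlog* $DATA_DIR/ $1:$DATA_DIR/"
}

# Print the sequence; the lines are meant to be run by hand
# as the postgres user on the master.
printf '%s\n' "psql -c \"select pg_start_backup('initial_backup');\""
rsync_command "$SLAVE_HOST"
printf '%s\n' "psql -c \"select pg_stop_backup();\""
```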

  • Thanks for the reply. Sorry, the IP should be the slave's; that was my mistake when masking the address for the post. To confirm, I am executing the command on the master server and the destination is the SLAVE IP address. Here is the latest update:

    After looking through the logs again on the master server, it seems that archive_command was set but commented out. I have changed this and restarted.

    Here is the output from the MASTER server:

    SL.AVE.SER.VER is the IP address of the slave server; the commands are executed as the postgres user.

    $ psql -c "select pg_start_backup('initial_backup');"
    pg_start_backup
    0/1D000020
    (1 row)
    
    $ rsync -cva --inplace --exclude=pg_xlog /var/lib/postgresql/9.1/main/ SL.AVE.SER.VER:/var/lib/postgresql/9.1/main/
    sending incremental file list
    ./
    backup_label
    postmaster.opts
    postmaster.pid
    rsync: failed to set times on "/var/lib/postgresql/9.1/main/server.crt": Operation not permitted (1)
    rsync: failed to set times on "/var/lib/postgresql/9.1/main/server.key": Operation not permitted (1)
    base/11951/
    base/11951/pg_internal.init
    base/25911/
    base/26683/
    base/26683/pg_internal.init
    global/
    global/pg_control
    global/pg_internal.init
    pg_notify/
    pg_stat_tmp/
    pg_stat_tmp/pgstat.stat
    
    sent 293815 bytes received 836 bytes 45330.92 bytes/sec
    total size is 38129968 speedup is 129.41
    rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1070) [sender=3.0.9]
    $ psql -c "select pg_stop_backup();"
    NOTICE: pg_stop_backup complete, all required WAL segments have been archived
    
    pg_stop_backup
    0/1D0000D8
    (1 row)
    

    Previously I got

    WARNING: pg_stop_backup still waiting for all required WAL segments to be archived (60 seconds elapsed)
    

    So this seems to be fixed. However, on the SLAVE server this is the output:

    $ service postgresql start
     * Starting PostgreSQL 9.1 database server
     * Removed stale pid file.
    The PostgreSQL server failed to start. Please check the log output:
    2014-07-22 20:07:39 BST LOG:  database system was interrupted; last known up at 2014-07-22 20:07:02 BST
    2014-07-22 20:07:39 BST LOG:  could not open file "pg_xlog/00000001000000000000001D" (log file 0, segment 29): No such file or directory
    2014-07-22 20:07:39 BST LOG:  invalid checkpoint record
    2014-07-22 20:07:39 BST FATAL:  could not locate required checkpoint record
    2014-07-22 20:07:39 BST HINT:  If you are not restoring from a backup, try removing the file "/var/lib/postgresql/9.1/main/backup_label".
    2014-07-22 20:07:39 BST LOG:  startup process (PID 10375) exited with exit code 1
    2014-07-22 20:07:39 BST LOG:  aborting startup due to startup process failure
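
One thing that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis: PostgreSQL 9.1 reads recovery.conf from the data directory, while the steps above edit it under /etc/postgresql/9.1/main/. The sketch below reports where a recovery.conf actually exists on the slave. The helper name has_recovery_conf is hypothetical, and both paths are taken from the commands in this thread.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if recovery.conf exists in the given directory.
has_recovery_conf() {
    [ -f "$1/recovery.conf" ]
}

# Data directory as used in the rsync command above.
DATA_DIR=${DATA_DIR:-/var/lib/postgresql/9.1/main}
# Directory where the post edits recovery.conf.
CONF_DIR=${CONF_DIR:-/etc/postgresql/9.1/main}

for dir in "$DATA_DIR" "$CONF_DIR"; do
    if has_recovery_conf "$dir"; then
        echo "recovery.conf found in $dir"
    else
        echo "no recovery.conf in $dir"
    fi
done
```

If the file turns up only under /etc, placing a copy owned by postgres in the data directory and starting the slave again may be worth a try.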
    
    