Using Master-master database replication for strong high-availability

It is possible to obtain strong, resilient high availability over a wide area network by setting up automated master-master replication between two MySQL servers.

Imagine this scenario: you have four Asterisk servers physically located at two sites connected over a WAN; you want cluster monitoring to be available from either location, and you want each site to keep a usable (local) system even if the WAN becomes unavailable.

        SITE A                    SITE B
     IPs 10.10.1.x             IPs 10.10.2.x

      +---------+               +---------+
      |  QM A   |               |  QM  B  |
      +---------+               +---------+
           |                         |
           |                         |
      +---------+  == A->B ==>  +---------+
      | MySQL A |               | MySQL B |
      +---------+  <== B->A ==  +---------+
        |     |                   |     |
        |     |                   |     |
+---------+ +---------+   +---------+ +---------+
| PBX A1  | | PBX A2  |   | PBX B1  | |  PBX B2 |
+---------+ +---------+   +---------+ +---------+

What we do is this: we connect the two MySQL servers in a master-master fashion, so that data written to one is automatically replicated to the other with minimal delay.

In case the WAN goes down, data from the local PBXs is still available in real time; when the WAN comes back online, the two databases sync automatically and make all data available to all users again.

How it works:

  • Each PBX has a local qloaderd that uploads data to the local MySQL database; each PBX uploads data to a partition that has its own name in it (e.g. PBX A1 uploads data to partition PA1, A2 to PA2, B1 to PB1, B2 to PB2)

  • Each MySQL server is configured to insert rows with a unique id that is always odd for server A and always even for server B; this way the same table can be shared on insert with no issues

  • Each MySQL server holds all the data; some coming from local Qloaderds and some from the replica of the other database

  • Each QM server is able to monitor all four PBXs at once, using a cluster of all partitions - so it does not need to know which PBX is at which site
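As a sketch of the per-PBX partition setup, the Qloader configuration on each PBX could look like the fragment below. The file location and variable names are illustrative (they vary between Qloader versions and distributions - check the Qloader documentation for your install); the point is that each PBX names its own partition and points at its site-local MySQL server:

```
# Qloader settings on PBX A1 (illustrative names, adjust to your install)
PARTITION=PA1                          # unique partition per PBX: PA1, PA2, PB1, PB2
QUEUELOG=/var/log/asterisk/queue_log   # the local Asterisk queue_log to tail
MYSQLHOST=10.10.1.11                   # the site-local MySQL server (site A here)
```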

Prerequisites

  • Two clustered QueueMetrics licenses.

  • All server clocks aligned to a sub-second difference via NTP

  • MySQL server version 5 or later

In this tutorial, we assume that we have a working QueueMetrics database on MySQL server "A" and nothing on server "B". During setup, we will clone the database from server A to the new server B.

                    Server A        Server B

QM IP address       10.10.1.10      10.10.2.10
MySQL IP address    10.10.1.11      10.10.2.11
Asterisk 1          10.10.1.12      10.10.2.12
Asterisk 2          10.10.1.13      10.10.2.13
QM database         queuemetrics    queuemetrics

Changes to MySQL

Changes to the insert order

On server A, you add the following lines to your /etc/my.cnf configuration file, under the [mysqld] section:

auto_increment_increment= 2
auto_increment_offset   = 1

This way all auto-generated IDs on server A will be odd.

You do the same on server B, but with:

auto_increment_increment= 2
auto_increment_offset   = 2

In this case all auto-generated IDs will be even.
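To see why the two settings above can never collide, here is a small Python sketch (an illustration only, not part of the setup) that generates IDs the simplified way each server would:

```python
def auto_ids(offset, increment, n):
    """Simulate (in simplified form) MySQL's auto_increment sequence
    for one server with the given offset and increment."""
    return [offset + i * increment for i in range(n)]

# Server A: auto_increment_offset = 1, increment = 2 -> odd IDs
ids_a = auto_ids(1, 2, 5)   # [1, 3, 5, 7, 9]
# Server B: auto_increment_offset = 2, increment = 2 -> even IDs
ids_b = auto_ids(2, 2, 5)   # [2, 4, 6, 8, 10]

# The two sequences never overlap, so rows inserted on either server
# replicate to the other without primary-key conflicts.
assert set(ids_a).isdisjoint(ids_b)
```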

On both servers, make sure that the MySQL server is reachable over the network by editing 'my.cnf' - by default it may be bound to localhost only:

bind-address = 0.0.0.0

Restart both MySQL servers.

Configure replica from A (master) to B (slave)

First, upload the QM default database on MySQL server A.

On server A, we create the user that server B will use to replicate (note that REPLICATION SLAVE is a global privilege, so it must be granted ON *.*):

GRANT REPLICATION SLAVE ON *.*
   TO 'slave_b'@'%'
   IDENTIFIED BY 'slave_b_pass';
FLUSH PRIVILEGES;

then we edit server B’s my.cnf file and set:

[mysqld]
server-id = 2
replicate-same-server-id = 0
auto_increment_increment= 2
auto_increment_offset   = 2

master-host = 10.10.1.11
master-user = slave_b
master-password = slave_b_pass
master-connect-retry = 60
replicate-do-db = queuemetrics

log-bin = /var/log/mysql/mysql-bin.log
binlog-do-db = queuemetrics
log-slave-updates

relay-log = /var/lib/mysql/slave-relay.log
relay-log-index = /var/lib/mysql/slave-relay-log.index

expire_logs_days        = 10
max_binlog_size         = 500M

Note that server-id must be unique across the cluster, so server A's my.cnf needs its own distinct server-id (e.g. 1) and binary logging (log-bin) enabled as well, or B will have nothing to replicate from. Also note that the master-* options above were removed in MySQL 5.5; on newer versions, leave them out and supply the same values through CHANGE MASTER TO instead.

After this, we lock the tables on server A with FLUSH TABLES WITH READ LOCK, create a dump of the queuemetrics database, and record the master's binlog coordinates with SHOW MASTER STATUS, as in:

mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000010 |     1067 | queuemetrics |                  |
+------------------+----------+--------------+------------------+
1 row in set (0.00 sec)
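The lock-dump-copy sequence itself can look like the following sketch (hostnames and credentials are placeholders). Keep the session that holds the read lock open until the dump has finished - the lock is released as soon as that client disconnects:

```shell
# Terminal 1, on server A: take the lock and note the coordinates,
# then LEAVE THIS SESSION OPEN until the dump is done.
mysql -u root -p
mysql> FLUSH TABLES WITH READ LOCK;
mysql> SHOW MASTER STATUS;

# Terminal 2, on server A: dump the database and copy it to B.
mysqldump -u root -p queuemetrics > queuemetrics.sql
scp queuemetrics.sql root@10.10.2.11:/tmp/

# On server B: create the database and load the dump.
mysql -u root -p -e "CREATE DATABASE IF NOT EXISTS queuemetrics"
mysql -u root -p queuemetrics < /tmp/queuemetrics.sql
```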

Once the dump is complete, unlock the tables on A:

mysql> UNLOCK TABLES;

and run the following command on server B to make it a slave of server A (it is important that you replace the values of MASTER_LOG_FILE and MASTER_LOG_POS with the values returned by the SHOW MASTER STATUS command we ran on server A!):

mysql> CHANGE MASTER TO
				MASTER_HOST='10.10.1.11',
				MASTER_USER='slave_b',
				MASTER_PASSWORD='slave_b_pass',
				MASTER_LOG_FILE='mysql-bin.000010',
				MASTER_LOG_POS=1067;

The values for 'MASTER_LOG_FILE' and 'MASTER_LOG_POS' come from the SHOW MASTER STATUS query above.

Finally start the slave:

START SLAVE;

Then check the slave status: it is important that both 'Slave_IO_Running' and 'Slave_SQL_Running' have the value Yes in the output (otherwise something went wrong, and you should check your setup again and take a look at '/var/log/syslog' or the MySQL logs to find out about any errors):

mysql> SHOW SLAVE STATUS \G
*************************** 1. row ***************************
             Slave_IO_State: Waiting for master to send event
                Master_Host: 10.10.1.11
                Master_User: slave_b
                Master_Port: 3306
              Connect_Retry: 60
            Master_Log_File: mysql-bin.000010
        Read_Master_Log_Pos: 1067
             Relay_Log_File: slave-relay.000002
              Relay_Log_Pos: 235
      Relay_Master_Log_File: mysql-bin.000010
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
            Replicate_Do_DB: queuemetrics
        Replicate_Ignore_DB:
         Replicate_Do_Table:
     Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
                 Last_Errno: 0
                 Last_Error:
               Skip_Counter: 0
        Exec_Master_Log_Pos: 1067
            Relay_Log_Space: 235
            Until_Condition: None
             Until_Log_File:
              Until_Log_Pos: 0
         Master_SSL_Allowed: No
         Master_SSL_CA_File:
         Master_SSL_CA_Path:
            Master_SSL_Cert:
          Master_SSL_Cipher:
             Master_SSL_Key:
      Seconds_Behind_Master: 0
1 row in set (0.00 sec)
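Checking the two flags by eye is error-prone, so a small script can do it for you. The sketch below (illustrative, not part of QueueMetrics) parses the output of SHOW SLAVE STATUS \G - as produced, for example, by mysql -u root -p -e "SHOW SLAVE STATUS \G" - and reports whether both replication threads are running:

```python
def slave_healthy(status_text):
    """Parse `SHOW SLAVE STATUS \\G` output and return True when both
    the IO and SQL replication threads are running."""
    fields = {}
    for line in status_text.splitlines():
        if ":" in line:
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return (fields.get("Slave_IO_Running") == "Yes"
            and fields.get("Slave_SQL_Running") == "Yes")

sample = """
           Slave_IO_Running: Yes
          Slave_SQL_Running: Yes
                 Last_Errno: 0
"""
print(slave_healthy(sample))  # True
```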

Configure replica from B (master) to A (slave)

Now repeat the same procedure in the opposite direction. On server B, create the replication user that server A will connect as (again, REPLICATION SLAVE must be granted ON *.*):

GRANT REPLICATION SLAVE ON *.*
   TO 'slave_a'@'%'
   IDENTIFIED BY 'slave_a_pass';
FLUSH PRIVILEGES;
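The remaining steps mirror the A-to-B setup; the fragment below is a sketch using the same naming conventions as above. No dump is needed this time, because both servers already hold the same data:

```
# /etc/my.cnf on server A, [mysqld] section (mirror of server B's config)
server-id = 1
replicate-same-server-id = 0
auto_increment_increment= 2
auto_increment_offset   = 1
replicate-do-db = queuemetrics
log-bin = /var/log/mysql/mysql-bin.log
binlog-do-db = queuemetrics
log-slave-updates
relay-log = /var/lib/mysql/slave-relay.log
relay-log-index = /var/lib/mysql/slave-relay-log.index
```

Then run SHOW MASTER STATUS on server B, and on server A point the slave at B with the coordinates it returned (placeholders shown here):

mysql> CHANGE MASTER TO
				MASTER_HOST='10.10.2.11',
				MASTER_USER='slave_a',
				MASTER_PASSWORD='slave_a_pass',
				MASTER_LOG_FILE='...',
				MASTER_LOG_POS=...;
mysql> START SLAVE;
mysql> SHOW SLAVE STATUS \G

As before, both 'Slave_IO_Running' and 'Slave_SQL_Running' must show Yes.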

Changes to QueueMetrics

Just point each QM instance to its local copy of the database, as you would for two distinct instances. Verify that changes made on one server become visible on the other one almost immediately. Also, make sure that each qloaderd instance writes to a distinct partition.
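As a sketch of the cluster definition, QM A's configuration could name the four partitions along these lines. The property names below follow typical QueueMetrics cluster setups but are given here as an assumption - check the QueueMetrics User Manual for the exact syntax of your version:

```
# configuration.properties on QM A (illustrative values)
cluster.servers=pbxA1|pbxA2|pbxB1|pbxB2
cluster.pbxA1.queue_log_file=sql:PA1
cluster.pbxA2.queue_log_file=sql:PA2
cluster.pbxB1.queue_log_file=sql:PB1
cluster.pbxB2.queue_log_file=sql:PB2
```

QM B would carry the same cluster definition, pointed at its own local MySQL copy.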