Removing duplicate rows from the queue_log table

If multiple instances of qloaderd were run at the same time, data may have been loaded multiple times. The correct way to handle this is to use the 'Data Queue Partial Update' mode - see http://manuals.loway.ch/QLoader-chunked/ar01s03.html - and reload the queue_log files involved.

If this is not possible, for example because the data was rotated out of the system and the original queue_log files are unavailable, the operation can be performed at the SQL level.

If you do not feel comfortable doing this, Loway offers 'Per-Incident Support Tickets' so that a qualified technician can connect to your QueueMetrics system and perform this procedure for you.

Prerequisites

  • A working QueueMetrics instance with MySQL storage

  • Free disk space of about twice the current size of the 'queue_log' table

  • All qloaderd instances stopped

Make sure you have enough disk space to carry out these operations - one of the few things MySQL does not handle gracefully is running out of disk space on a live system. You should have at least enough free space to make a full copy of the queue_log table, indexes included.
The following procedure should be run when the system is idle, as it is I/O-intensive on large tables and may lock them. On a very large table each query might take tens of minutes to run, so plan accordingly.
You need to make a complete database backup before attempting this.

Reality check

Before you attempt this procedure, make sure you run the following query:

 SELECT `partition`, time_id, unique_row_count, count(*)
   FROM queue_log
  GROUP BY `partition`, time_id, unique_row_count
 HAVING count(*) > 1;

It should return exactly zero rows. If it does not, the following procedure may not work correctly.
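To see why well-formed data passes this check even when events were loaded twice, here is a minimal sketch using SQLite as a stand-in for MySQL; it keeps only the three queue_log columns the query touches, and the sample rows are invented:

```python
import sqlite3

# Cut-down queue_log with only the columns the reality check uses
# (SQLite stands in for MySQL; "partition" is quoted as it is a keyword).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE queue_log (
        "partition" TEXT, time_id INTEGER, unique_row_count INTEGER
    )""")
# A double load produces distinct unique_row_count values, so the
# (partition, time_id, unique_row_count) triple remains unique.
con.executemany("INSERT INTO queue_log VALUES (?, ?, ?)", [
    ("P001", 1000, 1),
    ("P001", 1000, 2),   # same event loaded twice, different counter
    ("P001", 1001, 3),
])
check = con.execute("""
    SELECT "partition", time_id, unique_row_count, count(*)
      FROM queue_log
     GROUP BY "partition", time_id, unique_row_count
    HAVING count(*) > 1""").fetchall()
print(check)   # an empty list: the check passes
```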

Also, check how much data is present and in which partitions:

 SELECT `partition`, count(*)
   FROM queue_log
  GROUP BY `partition`
  ORDER BY `partition` ASC;

Make sure that you have no partitions whose names start with "o_".

Spotting duplicate rows

If you want to know which rows are duplicates, they are easy to find in SQL:

 SELECT  `partition`, `time_id`, `call_id`, `queue`,
         `agent`, `verb`, `data1`, `data2`, `data3`, `data4`, `data5`, `serverid`
   FROM queue_log
  WHERE `partition` = "P001"
    AND `time_id`   > 0
  GROUP BY `partition`, `time_id`, `call_id`, `queue`,
           `agent`, `verb`, `data1`, `data2`, `data3`, `data4`, `data5`, `serverid`
 HAVING count(*) > 1;

Make sure you set the partition correctly and specify a minimum time_id, so that the query does not scan the whole table, which might be very slow.
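As a sanity check of how this query behaves, here is a small SQLite sketch (standing in for MySQL) with invented rows; the grouping covers every column except unique_row_count, so each group with count(*) > 1 is a physically duplicated event:

```python
import sqlite3

# SQLite stand-in for the MySQL queue_log table; sample rows are invented.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE queue_log (
        "partition" TEXT, time_id INTEGER, call_id TEXT, queue TEXT,
        agent TEXT, verb TEXT,
        data1 TEXT, data2 TEXT, data3 TEXT, data4 TEXT, data5 TEXT,
        serverid INTEGER, unique_row_count INTEGER
    )""")
rows = [
    # the same event loaded twice by two concurrent qloaderd instances
    ("P001", 1000, "C1", "q1", "a1", "ENTERQUEUE", "", "", "", "", "", 0, 1),
    ("P001", 1000, "C1", "q1", "a1", "ENTERQUEUE", "", "", "", "", "", 0, 2),
    # an event loaded only once
    ("P001", 1001, "C2", "q1", "a1", "CONNECT", "", "", "", "", "", 0, 3),
]
con.executemany(
    "INSERT INTO queue_log VALUES (" + ",".join("?" * 13) + ")", rows)

# Group on every column except unique_row_count: groups with more than
# one member are duplicated events.
dupes = con.execute("""
    SELECT "partition", time_id, call_id, verb, count(*)
      FROM queue_log
     WHERE "partition" = 'P001' AND time_id > 0
     GROUP BY "partition", time_id, call_id, queue, agent, verb,
              data1, data2, data3, data4, data5, serverid
    HAVING count(*) > 1""").fetchall()
print(dupes)   # only the twice-loaded ENTERQUEUE event is reported
```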

Loading unique rows

First, we make a backup of the queue_log table into a temporary table (this is not strictly needed, as you should already have a full database backup before attempting this, but it may come in handy in case you need to restore quickly):

 CREATE TABLE queue_log_backup AS
 SELECT *
   FROM queue_log;

Then we move all data from its current partition to a partition of the same name prefixed with "o_". From now on, QueueMetrics will not find any data in the database until the procedure completes.

 UPDATE queue_log
    SET `partition` = concat( "o_", `partition` );

Now, for each set of duplicate rows in the "o_" partitions, we keep the one with the highest unique_row_count and load it back under the original partition name:

 INSERT INTO queue_log
 SELECT substring( `partition`, 3 ) AS `partition`, `time_id`, `call_id`, `queue`,
        `agent`, `verb`, `data1`, `data2`, `data3`, `data4`, `data5`, `serverid`,
        max(`unique_row_count`)
   FROM queue_log
  WHERE `partition` LIKE 'o_%'
  GROUP BY `partition`, `time_id`, `call_id`, `queue`,
           `agent`, `verb`, `data1`, `data2`, `data3`, `data4`, `data5`, `serverid`;

This query might take a while to run. Data will reappear in the "correct" partition, and is still available in the temporary "o_" partition as well.
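The two statements, moving rows to "o_" partitions and then re-inserting one copy per group, can be simulated end to end with SQLite (as a stand-in for MySQL) on invented sample data:

```python
import sqlite3

# End-to-end sketch of the de-duplication steps, with SQLite standing in
# for MySQL; schema follows queue_log, sample rows are invented.
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE queue_log (
        "partition" TEXT, time_id INTEGER, call_id TEXT, queue TEXT,
        agent TEXT, verb TEXT,
        data1 TEXT, data2 TEXT, data3 TEXT, data4 TEXT, data5 TEXT,
        serverid INTEGER, unique_row_count INTEGER
    )""")
rows = [
    ("P001", 1000, "C1", "q1", "a1", "ENTERQUEUE", "", "", "", "", "", 0, 1),
    ("P001", 1000, "C1", "q1", "a1", "ENTERQUEUE", "", "", "", "", "", 0, 2),
    ("P001", 1001, "C2", "q1", "a1", "CONNECT", "", "", "", "", "", 0, 3),
]
con.executemany(
    "INSERT INTO queue_log VALUES (" + ",".join("?" * 13) + ")", rows)

# Step 1: move everything to an "o_"-prefixed partition.
con.execute("""UPDATE queue_log SET "partition" = 'o_' || "partition" """)

# Step 2: re-insert one copy per group under the original partition name,
# keeping the highest unique_row_count of each duplicate group.
con.execute("""
    INSERT INTO queue_log
    SELECT substr("partition", 3), time_id, call_id, queue, agent, verb,
           data1, data2, data3, data4, data5, serverid,
           max(unique_row_count)
      FROM queue_log
     WHERE "partition" LIKE 'o_%'
     GROUP BY "partition", time_id, call_id, queue, agent, verb,
              data1, data2, data3, data4, data5, serverid""")

restored = con.execute("""
    SELECT call_id, unique_row_count FROM queue_log
     WHERE "partition" = 'P001' ORDER BY time_id""").fetchall()
print(restored)   # the duplicate pair collapsed into a single row
```

On the real system the same flow runs entirely inside MySQL, so no data leaves the database; the sketch only mirrors the relational logic.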

Now check with QueueMetrics that you have no more duplicate rows.

If you still do, restore from your database backup.

Cleaning up

First we remove the temporary table:

 DROP TABLE queue_log_backup;

Then we remove data from temporary partitions:

 DELETE FROM queue_log
  WHERE `partition` LIKE 'o_%';

After large inserts and deletes, it is better to re-sort the table and optimize it, so that QueueMetrics can access it efficiently:

 ALTER TABLE queue_log
    ORDER BY `partition` ASC, `time_id` ASC, `unique_row_count` ASC;

 OPTIMIZE TABLE queue_log;

Do not forget to restart the 'qloaderd' when done.