Fixing broken indexes on the queue_log table

Sometimes indexes on the queue_log table end up in a broken state: they still exist, but the database engine can no longer use them. What you see is that queries fetching large datasets become extremely slow, to the point that the system is nearly unusable. The database uses a lot of CPU and I/O while QueueMetrics sits idle most of the time.

If you do not feel comfortable doing this, Loway offers 'Per-Incident Support Tickets' so that a qualified technician can connect to your QueueMetrics system and perform this procedure for you.

Does this apply to you?

The first symptoms of this issue are:

  • very high load on the database

  • all of a sudden, reports that used to take seconds start taking minutes

To investigate further, you first need to set up slow query logging - see e.g. https://stackoverflow.com/questions/2403793/how-can-i-enable-mysqls-slow-query-log-without-restarting-mysql - so that a log file is created listing all slow queries.
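On recent MySQL/MariaDB versions the slow query log can usually be enabled at runtime, without restarting the server. A minimal sketch (the 5-second threshold and the file path are just examples - adjust them to your system):

```sql
-- Enable the slow query log at runtime (requires SUPER/admin privileges).
SET GLOBAL slow_query_log = 'ON';

-- Log any query that takes longer than 5 seconds (example threshold).
SET GLOBAL long_query_time = 5;

-- Optional: where the log file is written (example path).
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log';
```

Settings made with SET GLOBAL do not survive a server restart; to make them permanent, add the equivalent options to your my.cnf.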

When you do, you will find entries like the following:

# Time: 180624  1:56:34
# User@Host: queuemetrics[queuemetrics] @ localhost [127.0.0.1]
# Thread_id: 18  Schema: queuemetrics  QC_hit: No
# Query_time: 289.292460  Lock_time: 0.000124  Rows_sent: 1063140  Rows_examined: 47432990
SET timestamp=1529819794;
SELECT `time_id` , `call_id` , `queue` , `agent` , `verb` , `data1` , `data2` , `data3` , `data4` , `data5` , `unique_row_count`   FROM queue_log  WHERE  `partition` = 'P001' AND       (`time_id` >= '1527804000' AND `time_id`<= '1529819505' )  AND `queue` IN ( '', 'NONE'  ,  '100'  , '200'  , '303'  )   ORDER BY `time_id` ASC , `unique_row_count` ASC;

In this case, the database scanned 47M rows to return only about 1M, and took almost five minutes. Not good. In theory, all accesses to the table should go through an index, so the number of rows examined and the number of rows sent should be roughly the same.

This problem is often misleading: if you run an "EXPLAIN", the database will report that a multiple-column index exists and that it is being considered and in fact used. Yet the query is still slow, and there is a big difference between rows sent and rows examined.
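You can check this yourself by running EXPLAIN on the offending query taken from the slow log, for example:

```sql
EXPLAIN SELECT `time_id`, `call_id`, `queue`, `agent`, `verb`,
       `data1`, `data2`, `data3`, `data4`, `data5`, `unique_row_count`
  FROM queue_log
 WHERE `partition` = 'P001'
   AND `time_id` >= '1527804000' AND `time_id` <= '1529819505'
   AND `queue` IN ('', 'NONE', '100', '200', '303')
 ORDER BY `time_id` ASC, `unique_row_count` ASC;
```

Even when the "possible_keys" and "key" columns show the multi-column index being picked, the symptom described above - rows examined vastly exceeding rows sent - tells you the index is not actually doing its job.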

Prerequisites

  • A working QueueMetrics instance with MySQL storage

  • A backup of the current database

  • At least 2x the current size of the 'queue_log' table as free disk space

  • Make sure all Qloaderd/Uniloader instances are stopped. If they are not, you risk losing data.

  • QueueMetrics is switched off.

Make sure you have enough disk space to carry out these operations - one of the few things MySQL does not tolerate well is running out of disk space on a live system. So you should have at least enough free space to make a full copy of the queue_log table, indexes and all.
The following procedure is to be run when the system is idle, as it uses a lot of I/O for large tables and may lock tables. On a very large table each query might take tens of minutes to run, so beware.
You need to make a complete database backup before attempting this.
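To estimate how much free space you will need, you can ask information_schema for the current size of the table, data plus indexes. A sketch, assuming your database schema is called 'queuemetrics' (adjust the name if yours differs):

```sql
-- Current on-disk size of queue_log (data + indexes), in megabytes.
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS total_mb
  FROM information_schema.TABLES
 WHERE table_schema = 'queuemetrics'
   AND table_name   = 'queue_log';
```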

How it works

To force the database to build a fresh index, we create a new queue_log_2 table with the correct indexing schema, copy all data into it, and rename it to queue_log. After it is renamed, we restart QueueMetrics, which will start reading from the new table.

Procedure

First, we create a copy of the queue_log table as queue_log_2.

CREATE TABLE queue_log_2 LIKE queue_log;
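CREATE TABLE … LIKE copies the column and index definitions but not the data. Before copying, you can double-check that the new table carries the expected indexes:

```sql
-- Shows the full DDL of the new table, including all index definitions.
SHOW CREATE TABLE queue_log_2;
```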

We copy all data from the old table to the new one. As the schema is identical, we do not need to define the exact fields:

INSERT INTO queue_log_2
     SELECT *
       FROM queue_log
   ORDER BY `partition` ASC, `time_id` ASC, `unique_row_count` ASC;

The query above might take a significant amount of time, as data will be copied and indexes rebuilt.
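Before swapping the tables, it is worth verifying that the copy is complete - for example by comparing row counts:

```sql
-- Both counts should be identical if the copy completed successfully.
SELECT (SELECT COUNT(*) FROM queue_log)   AS old_rows,
       (SELECT COUNT(*) FROM queue_log_2) AS new_rows;
```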

Now we rename the old table to queue_log_old, and rename the new table to queue_log.

RENAME TABLE queue_log TO queue_log_old;

RENAME TABLE queue_log_2 TO queue_log;
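If you prefer, MySQL lets you perform both renames in a single atomic statement, so there is no instant in which no queue_log table exists:

```sql
-- Atomic swap: both renames happen as one operation.
RENAME TABLE queue_log   TO queue_log_old,
             queue_log_2 TO queue_log;
```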

We can restart QueueMetrics and Uniloader/QLoader.

Cleaning up

After testing that everything is okay and that queries are now way quicker, we can drop the queue_log_old table:

DROP TABLE queue_log_old;