Wednesday, January 4, 2012

Selecting fewer columns yields faster queries

While writing about ORDER BY and LIMIT in my last post, I start to notice an increase in speed when the number columns requests was limited to just a few. So I decided to run some test to see if request only a few columns yields faster results. My gut is telling its a no-brainer and the answer is yes, but lets run a few test just to show how much faster.
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)

mysql> SELECT p.pat_id,p.revision_number,p.gurantor_id,p.extern_id1,p.first_name,p.first_name_phn,p.last_name,p.last_name_phn,p.middle_name,p.preferred_first_name,p.title,p.degree,p.suffix,p.address1,p.address2,p.address3,p.county,p.city,p.state,p.zip_code,p.country,p.home_phone,p.work_phone,p.cell_phone,p.fax_number,p.alternate_phone,p.email,p.race,p.sex,p.ssn,p.sin,p.marital_status,p.spouse_name,p.spouse_birthdate,p.emergency_contact,p.emergency_phone,p.employment_status,p.employer_uid,p.employer_name,p.employer_addr1,p.employer_addr2,p.employer_city,p.employer_state,p.employer_zipcode,p.employer_country,p.interface,p.birth_date,p.death_indicator,p.death_date,p.chart_online,p.active,p.is_patient,p.is_tmp, p.signature_date,p.create_date,p.edit_date,p.revised_by, pp.name, pp.partition, pm.mrnumber, IF(pl.user_id IS NULL OR pul.user_id IS NOT NULL, 0, 1) AS locked 
    -> FROM patient_mrns AS pm 
    -> LEFT JOIN patient_partitions AS pp ON pm.partition=pp.partition 
    -> LEFT JOIN patients AS p ON p.pat_id=pm.pat_id 
    -> LEFT JOIN patient_restrictions AS pl ON p.pat_id=pl.pat_id 
    -> LEFT JOIN patient_restrictions AS pul ON p.pat_id=pul.pat_id AND pul.user_id=8554 
    -> WHERE pp.echart_opts != 'H' AND pm.mrnumber LIKE '1%' AND pp.echart_opts!='H' 
    -> GROUP BY pm.partition,pm.mrnumber ORDER BY p.last_name,p.first_name,pp.part_order ASC LIMIT 10;

...

10 rows in set (3.17 sec)

mysql> show status like 'Handler%';
+----------------------------+--------+
| Variable_name              | Value  |
+----------------------------+--------+
| Handler_commit             | 1      |
| Handler_delete             | 0      |
| Handler_discover           | 0      |
| Handler_prepare            | 0      |
| Handler_read_first         | 1      |
| Handler_read_key           | 575855 |
| Handler_read_next          | 760721 |
| Handler_read_prev          | 0      |
| Handler_read_rnd           | 10     |
| Handler_read_rnd_next      | 255326 |
| Handler_rollback           | 0      |
| Handler_savepoint          | 0      |
| Handler_savepoint_rollback | 0      |
| Handler_update             | 0      |
| Handler_write              | 255325 |
+----------------------------+--------+
15 rows in set (0.00 sec)

mysql> show status like 'Created_tmp_%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 1     |
| Created_tmp_files       | 3     |
| Created_tmp_tables      | 1     |
+-------------------------+-------+
3 rows in set (0.00 sec)

mysql> show status like 'Sort_%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Sort_merge_passes | 4     |
| Sort_range        | 0     |
| Sort_rows         | 10    |
| Sort_scan         | 1     |
+-------------------+-------+
4 rows in set (0.00 sec)
Now with just one selected column:
mysql> flush status;
Query OK, 0 rows affected (0.00 sec)
 
mysql> SELECT p.pat_id
    -> FROM patient_mrns AS pm 
    -> LEFT JOIN patient_partitions AS pp ON pm.partition=pp.partition 
    -> LEFT JOIN patients AS p ON p.pat_id=pm.pat_id 
    -> LEFT JOIN patient_restrictions AS pl ON p.pat_id=pl.pat_id 
    -> LEFT JOIN patient_restrictions AS pul ON p.pat_id=pul.pat_id AND pul.user_id=8554 
    -> WHERE pp.echart_opts != 'H' AND pm.mrnumber LIKE '1%' AND pp.echart_opts!='H' 
    -> GROUP BY pm.partition,pm.mrnumber ORDER BY p.last_name,p.first_name,pp.part_order ASC LIMIT 10;

...

10 rows in set (1.91 sec)

mysql> 
mysql> show status like 'Handler%';
+----------------------------+--------+
| Variable_name              | Value  |
+----------------------------+--------+
| Handler_commit             | 1      |
| Handler_delete             | 0      |
| Handler_discover           | 0      |
| Handler_prepare            | 0      |
| Handler_read_first         | 1      |
| Handler_read_key           | 575852 |
| Handler_read_next          | 760721 |
| Handler_read_prev          | 0      |
| Handler_read_rnd           | 10     |
| Handler_read_rnd_next      | 192145 |
| Handler_rollback           | 0      |
| Handler_savepoint          | 0      |
| Handler_savepoint_rollback | 0      |
| Handler_update             | 0      |
| Handler_write              | 192144 |
+----------------------------+--------+
15 rows in set (0.00 sec)

mysql> show status like 'Created_tmp_%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 0     |
| Created_tmp_files       | 3     |
| Created_tmp_tables      | 1     |
+-------------------------+-------+
3 rows in set (0.00 sec)

mysql> show status like 'Sort_%';
+-------------------+-------+
| Variable_name     | Value |
+-------------------+-------+
| Sort_merge_passes | 4     |
| Sort_range        | 0     |
| Sort_rows         | 10    |
| Sort_scan         | 1     |
+-------------------+-------+
4 rows in set (0.00 sec)

Conclusion

  • Query 1: 10 rows in set (3.17 sec)
  • Query 2: 10 rows in set (1.91 sec)
Thats 1.26 faster by simply reducing the number of columns in the result.

You can also see that on the second query the Created_tmp_disk_tables=0 and Handler_write=192144, (255325 - 192144) = 63181 less writes than the first query.

So - the next time you decided to return all the columns in several tables while doing GROUP BY and ORDER BY, think twice. If your schema is in 3rd normal form, then one method of fixing this is by using a DERIVED table that only returns a unique or primary key and if you can add a LIMIT. This will make the tmp table very small for sorting. Then join the table with the columns needed for the result using the primary key.

No comments:

Post a Comment