π [en] MySQL Tuple Comparison (Row Constructor): Use with Caution
Using tuple comparison-based queries in MySQL can cause unexpected performance issues. (very slow)
Row Constructor
- In MySQL, you can group multiple columns into a single tuple-like value using the ROW(col1, col2, col3, β¦) or (col1, col2, β¦) syntax for comparison.
SELECT *
FROM your_table_name
WHERE (col1, col2) = (x, y);
This method can make your queries easier to read, especially when handling with complex conditions involving multiple columns.
However, such queries can become very slow.
- Related report on MySQL
- An issue where using tuple comparison causes indexes to be ignored, leading to a Full Table Scan.
Letβs look an example.
- Table Schema
create table comment
(
comment_id bigint not null
primary key,
content varchar(3000) not null,
article_id bigint not null,
parent_comment_id bigint not null,
writer_id bigint not null,
is_deleted tinyint(1) not null,
created_at datetime not null
);
create index idx_article_id_parent_comment_id_comment_id
on comment (article_id, parent_comment_id, comment_id);
- mysql base image: mysql:8.0.38
- test dataset: ~= 8 million rows
Case 1. tuple comparison (slow case)
explain analyze
select comment.comment_id,
comment.parent_comment_id,
comment.article_id,
comment.writer_id,
comment.content,
comment.is_deleted,
comment.created_at
from comment
where article_id = 1
and (parent_comment_id, comment_id) > (142539921307124354, 142539921307124350)
order by parent_comment_id, comment_id
limit 30;
- Result
-> Limit: 30 row(s) (cost=542979 rows=30) (actual time=8620..8620 rows=30 loops=1) -> Filter: ((`comment`.comment_id,`comment`.parent_comment_id) > (142539921307124354,142539921307124350)) (cost=542979 rows=4.01e+6) (actual time=8620..8620 rows=30 loops=1) -> Index lookup on comment using idx_article_id_parent_comment_id_comment_id (article_id=1) (cost=542979 rows=4.01e+6) (actual time=1.83..8251 rows=8e+6 loops=1)
As shown, an index full scan occurs. It scans all 8 million rows and takes about 8 seconds to run the query.
The tuple comparison (a, b) > (x, y) can be decomposed into (a > x) OR (a = x AND b > y).
Case 2. Decomposed Conditions (fast case)
explain analyze
select comment.comment_id,
comment.parent_comment_id,
comment.article_id,
comment.writer_id,
comment.content,
comment.is_deleted,
comment.created_at
from comment
where article_id = 1
and (
parent_comment_id > 142539921307124354
or
(parent_comment_id = 142539921307124354 and comment_id > 142539921307124350)
)
order by parent_comment_id, comment_id
limit 30;
- Result
-> Limit: 30 row(s) (cost=416 rows=30) (actual time=0.252..0.727 rows=30 loops=1) -> Index range scan on comment using idx_article_id_parent_comment_id_comment_id over (article_id = 1 AND parent_comment_id = 142539921307124354 AND 142539921307124350 < comment_id) OR (article_id = 1 AND 142539921307124354 < parent_comment_id), with index condition: ((`comment`.article_id = 1) and ((`comment`.parent_comment_id > 142539921307124354) or ((`comment`.parent_comment_id = 142539921307124354) and (`comment`.comment_id > 142539921307124350)))) (cost=416 rows=358) (actual time=0.232..0.705 rows=30 loops=1)
MySQL performs an Index Range Scan, making the query run fast in under 0.7 seconds.
Conclusion
- Use Explicit Conditions
- Avoid using
Row Constructors
for tuple comparisons with inequalities. Instead, decompose the conditions. This helps MySQLβs optimizer use indexes effectively.
- Avoid using
- Always use
EXPLAIN
to understand query plan- Before using any query, use EXPLAIN ANALYZE to see the query plan.
- Ensure that indexes are being used as expected.
By following these tips, I hope you can prevent performance issues related to tuple comparisons in MySQL.
Leave a comment