MYSQL Optimization Tips: 5 Major Steps + 10 Cases

General Steps for SQL Optimization

1 Locate SQL statements with low execution efficiency via slow query log, etc.

2 Use explain to analyze the SQL execution plan

Focus on type, rows, filtered, and extra.

type from top to bottom, efficiency increases

ALL full table scan
index full index scan
range index range scan, commonly used with <, <=, >=, between, in, etc.
ref non-unique index scan or unique index prefix scan, returns single row, often appears in join queries
eq_ref similar to ref, but uses a unique index, used in join queries with primary key
const/system single row, the system treats other columns of the matching row as constants, e.g., primary key or unique index query
null MySQL does not access any table or index, returns result directly

Although efficiency increases from top to bottom, according to the cost model, suppose there are two indexes idx1(a, b, c) and idx2(a, c), and the SQL is select * from t where a = 1 and b in (1, 2) order by c;. If it goes through idx1, the type is range; if it goes through idx2, the type is ref. When the number of rows to be scanned using idx2 is about 5 times more than using idx1, idx1 will be used; otherwise, idx2 will be used.

Extra

Using filesort: MySQL needs an extra pass to find out how to retrieve rows in sorted order. It sorts by browsing all rows according to the join type and saving the sort key and row pointer for all rows that match the WHERE clause. Then the keys are sorted, and rows are retrieved in sorted order.
Using temporary: a temporary table is used to save intermediate results, performance is particularly poor, needs key optimization.
Using index: indicates that a covering index is used in the corresponding SELECT operation, avoiding accessing the data rows of the table, good efficiency! If "Using where" also appears, it means the index lookup alone cannot query the data that meets the conditions.
Using index condition: ICP (Index Condition Pushdown) added after MySQL 5.6, "Using index condition" means ICP is used, filtering data at the storage engine layer instead of the server layer, using existing index data to reduce table lookups.

3 Use show profile for analysis

Understand the state and time consumption of SQL execution threads.

It is disabled by default; enable it with set profiling = 1;

SHOW PROFILES ;
SHOW PROFILE FOR QUERY  #{id};

4 trace

The trace analyzes how the optimizer chooses the execution plan, and by looking at the trace file, we can further understand why the optimizer chose execution plan A over plan B.

set optimizer_trace="enabled=on";
set optimizer_trace_max_mem_size=1000000;
select * from information_schema.optimizer_trace;

5 Identify the problem and take appropriate measures

Optimize indexes
Optimize SQL statements: modify SQL, segment IN queries, segment time queries, filter based on previous data
Use alternative implementations: ES, data warehouse, etc.
Data fragmentation handling

Based on the microservice philosophy, a project practice built in a B2C e-commerce scenario. The core technology stack is Spring Boot + Dubbo. In the future, it will be refactored to Spring Cloud Alibaba.

Project address: https://github.com/YunaiV/onemall

Scenario Analysis

Case 1: Leftmost Prefix Rule

Index:

KEY `idx_shopid_orderno` (`shop_id`,`order_no`)

SQL statement:

select * from _t where orderno=''

Query matching proceeds from left to right. To use order_no to go through the index, the query condition must include shop_id, or the order of (shop_id, order_no) in the index must be swapped.

Case 2: Implicit Conversion

Index:

KEY `idx_mobile` (`mobile`)

SQL statement:

select * from _user where mobile=12345678901

Implicit conversion is equivalent to performing an operation on the index, causing the index to become invalid. mobile is a character type, but a number is used. String matching should be used; otherwise, MySQL will perform an implicit conversion, leading to index invalidation.

Case 3: Large Pagination

Index:

KEY `idx_a_b_c` (`a`, `b`, `c`)

SQL statement:

select * from _t where a = 1 and b = 2 order by c desc limit 10000, 10;

For large pagination scenarios, it is preferable to let the product team optimize the requirements. If no optimization is performed, there are two optimization methods:

Pass the last row's data from the previous page, i.e., the c value above, and then do a c < xxx condition. However, this usually requires changing the interface protocol and may not be feasible.
Another method is to use a deferred join to reduce table lookups. However, the index must be a covering index for it to be effective. The modified SQL is as follows:

SELECT
 t1.*
FROM
 _t t1,
 ( SELECT id FROM _t WHERE a = 1 AND b = 2 ORDER BY c DESC LIMIT 10000, 10 ) t2
WHERE
 t1.id = t2.id;

Case 4: IN + ORDER BY

Index:

KEY `idx_shopid_status_created` (`shop_id`, `order_status`, `created_at`)

SQL statement:

SELECT
 *
FROM
 _order
WHERE
 shop_id = 1
 AND order_status IN ( 1, 2, 3 )
ORDER BY
 created_at DESC
 LIMIT 10

In the MySQL internals, an IN query uses an n*m search method, similar to UNION, but more efficient than UNION.

When calculating the cost for an IN query (cost = number of tuples * average IO), MySQL retrieves the number of tuples by querying each value in the IN list one by one. Therefore, this calculation process can be slow. MySQL sets a threshold (eq_range_index_dive_limit). After version 5.6, if this threshold is exceeded, the cost for that column is no longer calculated.

This can lead to inaccurate execution plan selection. The default is 200; if the IN condition contains more than 200 values, the cost calculation for the IN clause may be problematic, potentially causing MySQL to choose an inaccurate index.

A solution is to swap the order of (order_status, created_at) and adjust the SQL to use a deferred join.

Case 5: Range Query Blocks Subsequent Columns from Using Index

Index:

KEY `idx_shopid_created_status` (`shop_id`, `created_at`, `order_status`)

SQL statement:

SELECT
 *
FROM
 _order
WHERE
 shop_id = 1
 AND created_at > '2021-01-01 00:00:00'
 AND order_status = 10

Range queries also include "IN" and "between".

Case 6: Not Equal, Not Contains Cannot Use Index Fast Search (Can Use ICP)

select * from _order where shop_id=1 and order_status not in (1,2)

select * from _order where shop_id=1 and order_status != 1

Avoid using NOT, !=, <>, !<, !>, NOT EXISTS, NOT IN, NOT LIKE, etc., on indexed columns.

Case 7: Optimizer Chooses Not to Use the Index

If the amount of data to be accessed is very small, the optimizer will still choose a secondary index. However, if the accessed data constitutes a large portion of the entire table (generally around 20%), the optimizer will choose to look up data through the clustered index.

select * from _order where  order_status = 1

This query retrieves all unpaid orders. Usually, such orders are very few, and even if an index is built, it cannot be used.

Case 8: Complex Queries

select sum(amt) from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01';

select * from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01' limit 10;

If it is for statistical data, consider switching to a data warehouse solution.

If the business requires such complex queries, it may be advisable not to continue using SQL and instead adopt other methods, such as using ES.

Case 9: Mixing ASC and DESC

select * from _t where a=1 order by b desc, c asc
Mixing desc and asc can cause index invalidation.

Case 10: Large Data Volume

For push business data storage, the data volume may be very large. If the chosen solution ultimately stores data in MySQL and retains it for a validity period such as 7 days, be aware that frequent data cleanup can cause data fragmentation. Contact the DBA for data fragmentation handling.

MYSQL Optimization Tips: 5 Major Steps + 10 Cases

General Steps for SQL Optimization

1 Locate SQL statements with low execution efficiency via slow query log, etc.

2 Use explain to analyze the SQL execution plan

3 Use show profile for analysis

4 trace

5 Identify the problem and take appropriate measures

Scenario Analysis

Case 1: Leftmost Prefix Rule

Case 2: Implicit Conversion

Case 3: Large Pagination

Case 4: IN + ORDER BY

Case 5: Range Query Blocks Subsequent Columns from Using Index

Case 6: Not Equal, Not Contains Cannot Use Index Fast Search (Can Use ICP)

Case 7: Optimizer Chooses Not to Use the Index

Case 8: Complex Queries

Case 9: Mixing ASC and DESC

Case 10: Large Data Volume

Related Reading

mysql 8.0.28 x64 for Windows ultra-compact portable version, only 16MB

19 MySQL Optimizations Commonly Used in Projects

No more LIKE+% for MySQL fuzzy queries

Running MySQL in Docker? Congratulations, you're about to be laid off!