
General Steps for SQL Optimization
1 Locate SQL statements with low execution efficiency via slow query log, etc.
2 Use explain to analyze the SQL execution plan
Focus on type, rows, filtered, and extra.
type from top to bottom, efficiency increases
- ALL full table scan
- index full index scan
- range index range scan, commonly used with <, <=, >=, between, in, etc.
- ref non-unique index scan or unique index prefix scan, returns single row, often appears in join queries
- eq_ref similar to ref, but uses a unique index, used in join queries with primary key
- const/system single row, the system treats other columns of the matching row as constants, e.g., primary key or unique index query
- null MySQL does not access any table or index, returns result directly
Although efficiency increases from top to bottom, according to the cost model, suppose there are two indexes idx1(a, b, c) and idx2(a, c), and the SQL is select * from t where a = 1 and b in (1, 2) order by c;. If it goes through idx1, the type is range; if it goes through idx2, the type is ref. When the number of rows to be scanned using idx2 is about 5 times more than using idx1, idx1 will be used; otherwise, idx2 will be used.
Extra
- Using filesort: MySQL needs an extra pass to find out how to retrieve rows in sorted order. It sorts by browsing all rows according to the join type and saving the sort key and row pointer for all rows that match the WHERE clause. Then the keys are sorted, and rows are retrieved in sorted order.
- Using temporary: a temporary table is used to save intermediate results, performance is particularly poor, needs key optimization.
- Using index: indicates that a covering index is used in the corresponding SELECT operation, avoiding accessing the data rows of the table, good efficiency! If "Using where" also appears, it means the index lookup alone cannot query the data that meets the conditions.
- Using index condition: ICP (Index Condition Pushdown) added after MySQL 5.6, "Using index condition" means ICP is used, filtering data at the storage engine layer instead of the server layer, using existing index data to reduce table lookups.
3 Use show profile for analysis
Understand the state and time consumption of SQL execution threads.
It is disabled by default; enable it with set profiling = 1;
SHOW PROFILES ;
SHOW PROFILE FOR QUERY #{id};
4 trace
The trace analyzes how the optimizer chooses the execution plan, and by looking at the trace file, we can further understand why the optimizer chose execution plan A over plan B.
set optimizer_trace="enabled=on";
set optimizer_trace_max_mem_size=1000000;
select * from information_schema.optimizer_trace;
5 Identify the problem and take appropriate measures
- Optimize indexes
- Optimize SQL statements: modify SQL, segment IN queries, segment time queries, filter based on previous data
- Use alternative implementations: ES, data warehouse, etc.
- Data fragmentation handling
Based on the microservice philosophy, a project practice built in a B2C e-commerce scenario. The core technology stack is Spring Boot + Dubbo. In the future, it will be refactored to Spring Cloud Alibaba.
Project address: https://github.com/YunaiV/onemall
Scenario Analysis
Case 1: Leftmost Prefix Rule
Index:
KEY `idx_shopid_orderno` (`shop_id`,`order_no`)
SQL statement:
select * from _t where orderno=''
Query matching proceeds from left to right. To use order_no to go through the index, the query condition must include shop_id, or the order of (shop_id, order_no) in the index must be swapped.
Case 2: Implicit Conversion
Index:
KEY `idx_mobile` (`mobile`)
SQL statement:
select * from _user where mobile=12345678901
Implicit conversion is equivalent to performing an operation on the index, causing the index to become invalid. mobile is a character type, but a number is used. String matching should be used; otherwise, MySQL will perform an implicit conversion, leading to index invalidation.
Case 3: Large Pagination
Index:
KEY `idx_a_b_c` (`a`, `b`, `c`)
SQL statement:
select * from _t where a = 1 and b = 2 order by c desc limit 10000, 10;
For large pagination scenarios, it is preferable to let the product team optimize the requirements. If no optimization is performed, there are two optimization methods:
- Pass the last row's data from the previous page, i.e., the
cvalue above, and then do ac < xxxcondition. However, this usually requires changing the interface protocol and may not be feasible. - Another method is to use a deferred join to reduce table lookups. However, the index must be a covering index for it to be effective. The modified SQL is as follows:
SELECT
t1.*
FROM
_t t1,
( SELECT id FROM _t WHERE a = 1 AND b = 2 ORDER BY c DESC LIMIT 10000, 10 ) t2
WHERE
t1.id = t2.id;
Case 4: IN + ORDER BY
Index:
KEY `idx_shopid_status_created` (`shop_id`, `order_status`, `created_at`)
SQL statement:
SELECT
*
FROM
_order
WHERE
shop_id = 1
AND order_status IN ( 1, 2, 3 )
ORDER BY
created_at DESC
LIMIT 10
In the MySQL internals, an IN query uses an n*m search method, similar to UNION, but more efficient than UNION.
When calculating the cost for an IN query (cost = number of tuples * average IO), MySQL retrieves the number of tuples by querying each value in the IN list one by one. Therefore, this calculation process can be slow. MySQL sets a threshold (eq_range_index_dive_limit). After version 5.6, if this threshold is exceeded, the cost for that column is no longer calculated.
This can lead to inaccurate execution plan selection. The default is 200; if the IN condition contains more than 200 values, the cost calculation for the IN clause may be problematic, potentially causing MySQL to choose an inaccurate index.
A solution is to swap the order of (order_status, created_at) and adjust the SQL to use a deferred join.
Case 5: Range Query Blocks Subsequent Columns from Using Index
Index:
KEY `idx_shopid_created_status` (`shop_id`, `created_at`, `order_status`)
SQL statement:
SELECT
*
FROM
_order
WHERE
shop_id = 1
AND created_at > '2021-01-01 00:00:00'
AND order_status = 10
Range queries also include "IN" and "between".
Case 6: Not Equal, Not Contains Cannot Use Index Fast Search (Can Use ICP)
select * from _order where shop_id=1 and order_status not in (1,2)
select * from _order where shop_id=1 and order_status != 1
Avoid using NOT, !=, <>, !<, !>, NOT EXISTS, NOT IN, NOT LIKE, etc., on indexed columns.
Case 7: Optimizer Chooses Not to Use the Index
If the amount of data to be accessed is very small, the optimizer will still choose a secondary index. However, if the accessed data constitutes a large portion of the entire table (generally around 20%), the optimizer will choose to look up data through the clustered index.
select * from _order where order_status = 1
This query retrieves all unpaid orders. Usually, such orders are very few, and even if an index is built, it cannot be used.
Case 8: Complex Queries
select sum(amt) from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01';
select * from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01' limit 10;
If it is for statistical data, consider switching to a data warehouse solution.
If the business requires such complex queries, it may be advisable not to continue using SQL and instead adopt other methods, such as using ES.
Case 9: Mixing ASC and DESC
select * from _t where a=1 order by b desc, c asc
Mixing desc and asc can cause index invalidation.
Case 10: Large Data Volume
For push business data storage, the data volume may be very large. If the chosen solution ultimately stores data in MySQL and retains it for a validity period such as 7 days, be aware that frequent data cleanup can cause data fragmentation. Contact the DBA for data fragmentation handling.