
General steps for SQL optimization
1 Locate SQL statements that are inefficient by slowly checking logs
2 explain Analyze the execution plan of SQL
You need to focus on type, rows, filtered, and extra.
type From top to bottom, efficiency is getting higher and higher
- ALL Full Table Scan
- index full scan
- range Index range scanning, common words <,<=,>=,between,in and other operations
- ref uses a non-unique index scan or a unique index prefix scan to return a single record, which is often found in associated queries
- eq_ref is similar to ref, with the difference that it uses a unique index and uses an associated query of the primary key
- const/system For a single record, the system treats other columns in the matching row as constants, such as a primary key or unique index query
- null MySQL does not access any tables or indexes and returns results directly
虽然上至下,效率越来越高,但是根据 cost 模型,假设有两个索引idx1(a, b, c),idx2(a, c),SQL 为select * from t where a = 1 and b in (1, 2) order by c;如果走 idx1,那么是 type 为 range,如果走 idx2,那么 type 是 ref;当需要扫描的行数,使用 idx2 大约是 idx1 的 5 倍以上时,会用 idx1,否则会用 idx2
Extra
- Using filesort: MySQL requires an extra pass to find out how to retrieve rows in sort order. Sort is done by browsing all rows based on the join type and saving the sort key and pointer to the row for all rows that match the WHERE clause. The keywords are then sorted and rows are retrieved in sorted order.
- Using temporary: A temporary table is used to store intermediate results, which has extremely poor performance and requires key optimization
- Using index: It means that the corresponding select operation uses the Coveing Index to avoid accessing the data rows of the table. It is efficient! If using where appears at the same time, it means that eligible data cannot be directly queried through index lookup.
- Using index condition: The ICP added after MySQL 5.6 uses ICP (Index Push Down) to filter data at the storage engine level rather than at the service level, and uses index existing data to reduce the data returned to the table.
3 show profile analysis
Understand the status and time consumed by SQL execution threads.
It is closed by default, and the statement "set profiling = 1;" is turned on
SHOW PROFILES ;
SHOW PROFILE FOR QUERY #{id};
4 trace
Trace analyzes how the optimizer selects the execution plan. Through the trace file, you can further understand why the coupon selects the execution plan A rather than the execution plan B.
set optimizer_trace="enabled=on";
set optimizer_trace_max_mem_size=1000000;
select * from information_schema.optimizer_trace;
5 Identify problems and adopt corresponding measures
- optimizing index
- Optimize SQL statements: modify SQL, IN query segmentation, time query segmentation, filtering based on previous data
- Switch to other implementation methods: ES, digital warehouse, etc.
- Data fragmentation
Based on the idea of microservices, we build actual project operations under the B2C e-commerce scenario. The core technology stack is Spring Boot + Dubbo. In the future, Spring Cloud Alibaba will be rebuilt.
Project address: github.com/YunaiV/onemall
scene analysis
Case 1. Leftmost match
index
KEY `idx_shopid_orderno` (`shop_id`,`order_no`)
SQL statement
select * from _t where orderno=''
查询匹配从左往右匹配,要使用 order_no 走索引,必须查询条件携带 shop_id 或者索引(shop_id,order_no)调换前后顺序。
Case 2. Implicit conversion
index
KEY `idx_mobile` (`mobile`)
SQL statement
select * from _user where mobile=12345678901
Implicit conversion is equivalent to doing an operation on an index, which will invalidate the index. Mobile is a character type and uses numbers. String matching should be used, otherwise MySQL will use implicit substitutions, which will cause the index to be invalid.
Case 3. Big paging
index
KEY `idx_a_b_c` (`a`, `b`, `c`)
SQL statement
select * from _t where a = 1 and b = 2 order by c desc limit 10000, 10;
For large paging scenarios, you can give priority to product optimization requirements. If there is no optimization, there are two optimization methods:
- One is to transfer the last piece of data from the previous time, that is, the c above, and then do "c < xxx" processing, but this generally requires changing the interface protocol and may not be feasible.
- The other is to use delayed association to reduce SQL table returns, but remember that the index needs to be completely overwritten to be effective. SQL changes are as follows
SELECT
t1.*
FROM
_t t1,
( SELECT id FROM _t WHERE a = 1 AND b = 2 ORDER BY c DESC LIMIT 10000, 10 ) t2
WHERE
t1.id = t2.id;
Case 4. in + order by
index
KEY `idx_shopid_status_created` (`shop_id`, `order_status`, `created_at`)
SQL statement
SELECT
*
FROM
_order
WHERE
shop_id = 1
AND order_status IN ( 1, 2, 3 )
ORDER BY
created_at DESC
LIMIT 10
in 查询在 MySQL 底层是通过n*m的方式去搜索,类似 union,但是效率比 union 高。
in 查询在进行 cost 代价计算时(代价 = 元组数 * IO平均值),是通过将 in 包含的数值,一条条去查询获取元组数的,因此这个计算过程会比较的慢,所以 MySQL 设置了个临界值(eq_range_index_dive_limit),5.6 之后超过这个临界值后该列的 cost 就不参与计算了。
This will lead to inaccurate execution plan selection. The default is 200, which means that the in condition exceeds 200 data, which will cause problems in the calculation of the cost of in and may cause the index selected by Mysql to be inaccurate.
处理方式,可以(order_status, created_at)互换前后顺序,并且调整 SQL 为延迟关联。
Case 5. Range query is blocked, and subsequent fields cannot be indexed
index
KEY `idx_shopid_created_status` (`shop_id`, `created_at`, `order_status`)
SQL statement
SELECT
*
FROM
_order
WHERE
shop_id = 1
AND created_at > '2021-01-01 00:00:00'
AND order_status = 10
Range query also includes "IN, between"
Case 6. Does not equal or include quick searches that cannot be indexed. (ICP can be used)
select * from _order where shop_id=1 and order_status not in (1,2)
select * from _order where shop_id=1 and order_status != 1
Avoid using NOT,!=, <>、! <、!>、NOT EXISTS, NOT IN, NOT LIKE, etc.
Case 7. The optimizer chooses not to use index
If the amount of data required to be accessed is small, the optimizer will still choose the secondary index. However, when the data accessed accounts for a large part of the data in the entire table (usually about 20%), the optimizer will choose to use the clustered index to find the data.
select * from _order where order_status = 1
Check out all unpaid orders. Generally, there are very few such orders, and even if an index is established, the index cannot be used.
Case 8. Complex query
select sum(amt) from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01';
select * from _t where a = 1 and b in (1, 2, 3) and c > '2020-01-01' limit 10;
If you are counting certain data, you may use several warehouses to solve it;
If there are such complex queries in the business, it may not be recommended to continue using SQL, but to solve them in other ways, such as using ES.
Case 9. Mixing asc and desc
select * from _t where a=1 order by b desc, c asc
desc 和asc混用时会导致索引失效
Case 10. Big Data
For the data storage of push services, the amount of data may be large. If you choose the solution, you will finally choose to store it on MySQL and save it for a valid period of 7 days.
Then you need to note that frequent cleaning of data will turn into data fragments, and you need to contact the DBA to process the data fragments.