Parallel query design of tree table

Parallel query design of tree table

Discussion of tree table design

最后更新 5/24/2022 6:53 PM
长空X
预计阅读 9 分钟
分类
share
标签
architecture design

本文由网友长空X投稿,欢迎转载、分享

Original author: Changkong X (CSDN has the same name as "Changkong X", author of CkTools, github: https://github.com/hjkl950217)

Original link: https://www.example.com

causes

今天在和懒得勤快聊天时谈到了树形表的处理时,发现目前我俩知道的查树形表都得递归查询,这种方式查询效率是非常底下且不好维护的,那么有没有一种又简单能平行查询的方式呢?后面我俩还真讨论了一种,他快速的修改到他的网站中了。

懒得勤快官网

statement

文章中的几个方案是我们的讨论结果和一部分网络资料总结。设计方式千万种,文章中介绍的设计方式是针对大部分需要树形表的情况而不代表最优解!最优解已经是集合设计方式人员水平业务情况等因素综合之后的方案,这篇分享只是加速找到你的最优解

What is a tree table?

关系型数据库表中,存放树形结构的表。例如某个字段需要选择分类,有一级、二级、...N 级,可以这样设计:

ID PID Name or content
1 comment 1
2 1 comment 2
3 1 remarks 3
4 3 comment 4

这样的数据可以组合成我们大学数据结构中的,用来表达层级关系。这里的Id一般情况下用数字最好,但也有不是数字的情况,这点对选择方案可能有影响,后面会提到这一点。 The entity definition of this data structure is generally as follows:

class CommentEntity
{
	public int ID {get;set;}
	public int PID {get;set;}
	//.. 若干数据字段
	public CommentEntity ParentNode {get;set;}
	public List<CommentEntity> ChildNode {get;set;}
}

实体定义ParentNode指向父节点,ChildNode指向若干子节点。如果你有数据结构中的链表知识,能看出这 2 个字段起指针域的作用。

数据在数据库中按存储,如果我们将数据获取出来后组装好ParentNodeChildNode中的指向,然后就能按你的实际业务情况使用了。

What's the use?

Everything with affiliation can be stored in this way, such as: authority relationship, classification, type, level division, administrative division, comments, etc...

但他麻烦之处在于查询不方便。比如想要查询一级分类下面的所有数据,按传统方式需要先查到id=1的一级分类,再查询PID=1的数据,再查询PID=刚才查询的数据ID 这样递归查询多次直到结束

Target

Let's take comments as an example

Needs to meet:

  1. 进页面时分页查询出主评论,然后按层次关系显示回评
  2. You can query all subordinates 'comments based on a certain comment
  3. Parallel queries rather than recursive queries
  4. Each comment data can be a main judge or a sub-comment

Scenario 1: Use tag tree

这个方案是添加一个字段tag来标记整颗树,结构如下:

ID PID Tag content
1 Article Id 1 comment 1
2 1 Article Id 1 comment 2
3 1 Article Id 1 remarks 3
4 3 Article Id 1 comment 4

Tag用于数据库查询,ID和PID用于内存中组装数据,同时对Tag这一列建立非聚集索引。

    • Inquiry method **:

这里新增的字段在每课树中都是一样的,最多查询 2 次数据库即可,然后自己在内存中用Pid重新排列引用关系,修剪掉不需要的数据。

The first query: Use the comment id to find the article id (if there is an article id, go to the second step)

Second query: Use article id to query all data

Paged query: prunes unnecessary data in memory after query

    • This design is based on these considerations **:
  1. Id 是数字的情况下,连续的数据大概率在磁盘上是连续存储,这能提高磁盘 IO 的效率。如果 Id 不是数字,用文章Id创建非聚集索引后也能快速查询。
  2. 在内存中组装引用关系是非常快的,而且不需要递归就能搞定.(遍历时用 PID 去查找,找到后直接向ChildNode添加,同时向ParentNode赋值)
  3. The design logic is simple and people above the intern level can easily maintain this code
    • Disadvantages **: If a comment tree has 1000 layers, it will undoubtedly obtain a huge amount of useless data

Improvement: Use level to mark levels

Add level fields:

ID PID tag level content
1 Article Id 1 1 comment 1
2 1 Article Id 1 2 comment 2
3 1 Article Id 1 2 remarks 3
4 3 Article Id 1 3 comment 4

查询时附加上level,能减少一部分无用数据的传输,最后复用上面的组装代码。

Scenario 2: Use path to mark dependent paths

借用网上的一张图直接说明思路(未找到出处,侵权删除):

Combine the transformation mentioned above:

ID PID Tag Path content
1 Article Id 1 comment 1
2 1 Article Id 1 1 comment 2
3 1 Article Id 1 1 remarks 3
4 3 Article Id 1 1,2 comment 4

在写入子节点时需要知道父节点的 path,但一般来说这点是能满足的。Tag和Path用于数据库查询,ID和PID用于内存中组装数据。

    • Inquiry method **:

查询全部: 仍文章 id 查询所有数据,然后在内存中用Pid组装

Query data with id 2 and the following:

First query: Query path with id = 2

Second query: query id = 2 or startwith $",2 "

Paginated query:

先用文章 id 按时间排序后查询前 X 个,然后进行第 2 次查询获取楼中楼的数据,第 2 次查询时可以拼多个 startwith

同时也建议按需冗余level字段以减少查询,path 中虽然隐含了级别数据,但在查询时并不友好。

    • This design is based on these considerations **:
  1. Similar to Option 1, and understanding that the cost is lower
    • Disadvantages **: Not a special shortcoming. When querying child node data using path filtering, the index cannot be utilized.

Option 3: Do not design the building in the building

Learn from Zhihu's design and understand the series at a glance:

Zhihu's structure only includes comments and feedback reviews, and feedback reviews only need to save the id of the previous comment. This method is not only simple in design, but also has an excellent reading experience (it's not that it's unsightly if the building is deep)

ID PID GroupID Tag content
1 1 Article Id 1 comment 1
2 1 1 Article Id 1 comment 2
3 1 1 Article Id 1 remarks 3
4 3 1 Article Id 1 comment 4
5 2 * * Article Id 1 ** * * Comment 5 **
    • Inquiry method **:

查询全部: 仍文章 id 查询所有PID is null的数据,然后在内存中用PID组装

查询 id 为 1 及下面的数据: 查询 GroupID = 1的数据。这种设计时不会单独查询回评的数据

Pros: Very low understanding costs and low storage pressure

Option 4: Using recursion

Didn't we say before that we don't use recursion? Why is it mentioned here? Because:

  1. Some people in some teams will stubbornly believe that the database should not return additional data or add redundant nodes
  2. MySQL 8.0 added RECURSIVE to implement recursion at the database level
  3. Other helplessness

So if none of the previous three solutions is suitable for your situation, you may have to go back to the recursion route. I won't mention the details here. There are many such articles on the Internet.

summary

方案 123 都是通过冗余字段来降低查询成本和理解成本,并且利用不同存储的特性(数据库不适合运算、内存适合快速读写)来实现目标

方案 3 也是,同时也通过分析优化业务实现技术成本客户体验的共赢。

Option 4 is a bottom-up plan.

我个人比较推崇level+path的组合,这个组合不光能处理评论,也能很好的处理其他的树形结构,毕竟开发人员不能总是有机会影响业务需求不是?

If you have a better plan, please leave a message and discuss it~

Keep Exploring

延伸阅读

更多文章