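1. Preface
Let me start with a bit of .NET publishing history. The old .NET Framework did not natively support publishing the final build output as a single file (you had to rely on third-party tools). I created a simple ASP.NET Core project here; after publishing, the output directory looks like the figure below, containing many *.dll files and various other files.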

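Single-file publishing was introduced in the .NET Core 3.0 era. Just add the -p:PublishSingleFile=true parameter to the publish command. From then on the publish folder no longer holds that many files: only one *.exe file, the corresponding configuration files, and the *.pdb file used for debugging, as shown below: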

However, at this point a .NET program still needs a .NET Runtime of roughly 50 to 130 MB installed in order to run. This does not help when distributing programs in client scenarios. You can probably recall having to install the .NET Framework first before installing certain software.

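Alongside single-file publishing, the --self-contained true parameter can bundle the runtime into the published files, so the target machine no longer needs a .NET Runtime installed. But because it carries its own runtime, the publish folder becomes large, arguably even larger than installing the .NET Runtime (a full 82.4 MB).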

A program is, after all, just files, so we can also shrink it by compression: add the -p:EnableCompressionInSingleFile=true parameter, and the 80 MB program compresses to around 44 MB.

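Single-file publishes are large because they include every dependency that might be used at run time, yet many of those are never used by our program. Adding the -p:PublishTrimmed=true parameter removes unused dependencies at publish time, which reduces the size considerably (from 44 MB to 35 MB).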

Of course, trimming unused dependencies and compression can be combined, which makes the published output smaller still, only about 20 MB.
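The publish switches mentioned so far can be combined on one command line. The following is a minimal sketch that only assembles the argument list; the `publish_command` helper, the `-c Release` configuration, and the `win-x64` runtime identifier are assumptions for illustration and are not taken from the original test setup. Note that the article only uses trimming and compression together with a self-contained publish.

```python
def publish_command(single_file=False, self_contained=False,
                    trimmed=False, compressed=False, runtime="win-x64"):
    """Assemble a `dotnet publish` argument list from the switches discussed above."""
    args = ["dotnet", "publish", "-c", "Release"]  # Release configuration is an assumption
    if single_file or self_contained:
        # These publishes target one concrete platform, so pass a runtime identifier
        # and state explicitly whether the runtime is bundled.
        args += ["-r", runtime, "--self-contained", "true" if self_contained else "false"]
    if single_file:
        args.append("-p:PublishSingleFile=true")
    if trimmed:
        args.append("-p:PublishTrimmed=true")
    if compressed:
        args.append("-p:EnableCompressionInSingleFile=true")
    return args


# The smallest variant from the text: single file + self-contained + trim + compress.
smallest = publish_command(single_file=True, self_contained=True,
                           trimmed=True, compressed=True)
print(" ".join(smallest))
```

Building the list instead of a shell string keeps each switch as a separate argument, which makes it easy to hand to `subprocess.run` later.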

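At this point a .NET program still has to carry its own runtime, and running it involves the JIT: at application startup the JIT needs some time to compile MSIL into machine code for the target platform. .NET then introduced a preview of Native AOT, which compiles the code directly into machine code for the target platform at build time to speed up startup. And because it no longer needs to bundle the full runtime, the overall size also becomes small.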

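With AOT, the pdb file used for debugging becomes very large, but a real release does not need this file, so it can be dropped. The size after AOT is only about 20 MB. AOT is no silver bullet, though: without the JIT, many of the optimizations the JIT performs at run time are no longer possible. When Java's GraalVM was released, it came with a pentagon chart that clearly illustrates the trade-offs between JIT and AOT.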

AOT offers faster startup, a lower memory footprint, and a smaller program size; on the other hand, its throughput and maximum latency are not as good (it also loses many dynamic features, which costs some programming productivity).
That left me with a question: do these publishing methods affect the program's performance? AOT is said to make startup faster, but how much faster?
2. Evaluation results
I decided to spend some time looking into it, and over the weekend I designed a set of tests around the questions above. Time was short, so there are surely many loose ends; treat this as just for fun, and I hope readers will point out mistakes and bear with me. There are 12 groups in total, mainly comparing single-file publishing, AOT publishing, and normal publishing. I also added JIT parameters such as PGO, TC, OSR, and OSA to see the impact of different JIT settings.
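PGO: PGO stands for Profile Guided Optimization. It collects run-time information to guide how the JIT optimizes code, enabling many optimizations that were hard to do before PGO. See hez's blog, as well as Link 1, Link 2, and Link 3.
TC: TC stands for Tiered Compilation, a technique for optimizing code at run time. Every C# method is compiled by the JIT into machine code for the target platform. To get a method running quickly, the JIT usually compiles it in a rough way first (not optimal; the generated code is relatively inefficient). With TC, when a method is called frequently, the JIT compiles a better version of it, so subsequent calls run more efficiently. For more about .NET tiered compilation, see this link.
OSR: OSR stands for On-Stack Replacement, a technique that replaces the stack frame of a function/method while it is running. It was introduced for tiered compilation: sometimes the running method is an endless loop such as while(true), and tiered compilation never gets an opportunity to swap the low-optimization code for the high-optimization code. With on-stack replacement, the method can be switched to the better version while it is still running. Link 1, Link 2.
OSA: OSA stands for Object Stack Allocation. In .NET, reference-type objects are allocated on the heap by default; reclaiming them requires the garbage collector, and allocation must initialize the memory (all zeroed). If an object's lifetime is controllable, it can be allocated on the stack instead. This reduces GC pressure (the object is released automatically when the method's stack frame ends) and improves performance (scalar replacement becomes possible and access is faster). Link 1.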
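The idea behind tiered compilation can be illustrated with a toy model. This is only a sketch of the concept in Python, not how the .NET runtime is implemented: the `TieredMethod` class, the two stand-in functions, and the threshold of 30 calls are all assumptions chosen for illustration.

```python
class TieredMethod:
    """Toy model: start with a quick version, swap in a better one once the method is hot."""

    def __init__(self, quick, optimized, threshold=30):
        self.quick = quick          # stands in for the cheaply compiled first version
        self.optimized = optimized  # stands in for the recompiled, better version
        self.threshold = threshold  # calls needed before the method counts as hot
        self.calls = 0
        self.tier = 0

    def __call__(self, *args):
        self.calls += 1
        if self.tier == 0 and self.calls > self.threshold:
            self.tier = 1           # promotion happens between calls, never inside one
        impl = self.optimized if self.tier == 1 else self.quick
        return impl(*args)


def sum_quick(n):
    total = 0
    for i in range(n):              # straightforward loop, like unoptimized code
        total += i
    return total


def sum_optimized(n):
    return n * (n - 1) // 2         # closed form with the same result


sum_to = TieredMethod(sum_quick, sum_optimized, threshold=30)
results = [sum_to(100) for _ in range(40)]
print(sum_to.tier, results[0] == results[-1])
```

In this model a method that is called once and then loops for a long time never reaches the promotion check, which is the gap that on-stack replacement is meant to close.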
The names and parameters for each group are as follows.
| Project | Remarks |
|---|---|
| Normal | Normal publish, control group |
| Normal-WksGC | Normal publish, using WorkstationGC |
| Normal_PGO | Normal publish, using PGO |
| Normal_PGO_OSR | Normal publish, using PGO+OSR |
| Normal_PGO_OSR_OSA | Normal publish, using PGO+OSR+OSA |
| SingleFilePublish | Plain single-file publish |
| SingleFilePublish-SelfContained | Single-file publish with bundled runtime |
| SingleFilePublish-SelfContained-Trim | Single-file publish with bundled runtime + trimmed assemblies |
| SingleFilePublish-SelfContained-Compress | Single-file publish with bundled runtime + compressed assemblies |
| SingleFilePublish-SelfContained-Trim-Compress | Single-file publish with bundled runtime + trimmed and compressed assemblies |
| AOT-Size | AOT compilation, Size mode |
| AOT-Speed | AOT compilation, Speed mode |
The subsections below describe how each item was measured and the results. Each item was run 5 times and the average was taken.
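Averaging over five runs can be sketched as follows. The `average_over_runs` helper is hypothetical, and the readings in the usage lines are placeholder values rather than measurements from this test.

```python
import statistics


def average_over_runs(measure, runs=5):
    """Call `measure` `runs` times and return (mean, individual samples)."""
    samples = [measure() for _ in range(runs)]
    return statistics.mean(samples), samples


# Placeholder readings standing in for five real measurements.
fake_readings = iter([10.0, 12.0, 11.0, 13.0, 14.0])
mean, samples = average_over_runs(lambda: next(fake_readings))
print(mean, samples)
```

Returning the individual samples alongside the mean makes it easy to spot a single outlier run that would otherwise skew the average.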
2.1 Publish-related
In this section the Normal groups all share the same build parameters, so their results barely differ; there is no need to pay attention to the differences between them.
2.1.1 Publish time
Publish time records how long dotnet publish takes. Folders such as /bin and /obj are cleaned beforehand to avoid any effect from caching.
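A measurement like this could be scripted roughly as follows. This is a sketch under assumptions, not the author's actual harness: the `timed_publish` helper is hypothetical, and it simply deletes `bin` and `obj` under the project directory before timing whatever command it is given.

```python
import shutil
import subprocess
import time
from pathlib import Path


def timed_publish(project_dir, command):
    """Delete bin/ and obj/ under project_dir, run `command` there, return elapsed seconds."""
    project_dir = Path(project_dir)
    for name in ("bin", "obj"):
        # Remove cached build output so every run starts from a clean state.
        shutil.rmtree(project_dir / name, ignore_errors=True)
    start = time.perf_counter()
    subprocess.run(command, cwd=project_dir, check=True)  # raises if the command fails
    return time.perf_counter() - start
```

For example, `timed_publish(".", ["dotnet", "publish", "-c", "Release"])` would time one clean publish of the project in the current directory, assuming the .NET SDK is installed and on the PATH.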

Single-file publishing and AOT publishing are clearly expensive at publish time. AOT in particular: publishing a simple ASP.NET Core project takes nearly 30 seconds, on par with the compile times of some Rust and C++ projects, and a larger project can be expected to take longer. A normal publish, by contrast, is still very fast and finishes in a second or two.
2.1.2 Directory size
Directory size is the disk space occupied by the publish directory, measured directly. Note: for the Normal publishes, the 67.5 MB occupied by the .NET Runtime is included.

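Why is the AOT directory so large? Mainly because, as mentioned above, the pdb file used for debugging becomes very large: after AOT the program itself lacks much of the data needed for debugging, so that data can only live in the pdb file. This does not affect usage, and you can also pass -p:DebugType=false and -p:DebugSymbols=false when publishing so that no pdb file is generated.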
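To see how much of a publish directory is taken up by pdb files, the size can be computed with and without them. A minimal sketch; the `directory_size` and `to_mb` helpers are hypothetical names, and the article does not say which tool was actually used for its figures.

```python
from pathlib import Path


def directory_size(path, skip_suffixes=()):
    """Total size in bytes of all files under `path`, ignoring the given suffixes."""
    total = 0
    for file in Path(path).rglob("*"):
        if file.is_file() and file.suffix.lower() not in skip_suffixes:
            total += file.stat().st_size
    return total


def to_mb(size_bytes):
    """Convert bytes to megabytes (1 MB = 1024 * 1024 bytes)."""
    return size_bytes / (1024 * 1024)
```

Calling `directory_size(publish_dir)` gives the full directory size, while `directory_size(publish_dir, skip_suffixes=(".pdb",))` leaves the debug symbols out, where `publish_dir` is whatever folder the publish step wrote to.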
2.1.3 Program size
Program size counts only the files in the publish output that are needed to run the program. It matters for distribution: the smaller the program, the easier it is to distribute. Note: for the Normal publishes, the 67.5 MB occupied by the .NET Runtime is included.

If the target platform already has the .NET Runtime installed, a normal publish is actually the most efficient, at just over 100 KB. Next is single file + self-contained runtime + trimming + compression, at only about 20 MB, which is also fairly easy to distribute. AOT does equally well.
2.2 Runtime-related
There are three runtime-related metrics: program startup time, application startup time, and memory usage. No CPU metric is included here because CPU usage right after startup is basically 0, which has little reference value. The flow chart below shows when each metric is collected.

2.2.1 Startup time
The results of the program's startup time are as follows.

We can see two extremes. The slowest, single file + self-contained runtime + compression, takes as long as 170 ms to start: the assemblies are not trimmed, so there are many dependencies to decompress. The fastest, AOT-Speed, starts in only 16.8 ms. Without JIT compilation and assembly loading, startup is much faster.
2.2.2 Application startup time

Application startup time follows basically the same pattern as program startup time. For example, single file + self-contained runtime + compression takes more than 0.5 s to start the application, while AOT takes only 70 ms, a difference of seven to eight times. A normal publish also starts quickly, in less than 200 ms.
2.2.3 Memory usage

Memory usage differs little between the methods, but the results are a reminder that WorkstationGC mode helps if you want a smaller memory footprint. Enabling JIT enhancements such as dynamic PGO uses correspondingly more memory.
2.3 Load testing
Machine configuration:
CPU: Intel Core i7-8750H, Hyper-Threading disabled
RAM: 48GB
Client: CPU affinity set, bound to 3 cores
Server: CPU affinity set, bound to 2 cores
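Because my machine is limited, the client and server were not isolated in separate environments; I only pinned them to CPU cores, so the resulting data is for reference only.
2.3.1 Load test QPS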

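As you can see, the methods do not differ much: all reach more than 47,000 QPS, and the gap between the highest and lowest is within 4%. Because this is an IO-intensive workload, the advantages of the JIT and PGO do not show. Compute-intensive workloads would be worth trying later, or see hez's blog directly (linked in the PGO introduction above).
2.3.2 Time per request
In the chart below, the larger number inside each bar is the maximum time per request (MAX), and the 0.x number outside the bar is the average time per request (AVG). The unit is ms.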

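The average time is around 0.3ms across the board. AOT and single file + self-contained runtime + trimming + compression stand out, at only about 370ms.
2.3.3 Load test memory usage
In the chart below, the dark bars are peak memory usage (MAX) and the light bars are average memory usage (AVG). The unit is MB.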

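Apart from AOT, memory usage is roughly the same across methods. Needing only about 25MB of memory at 47,000 QPS is quite good, and the small differences can be treated as noise. Enabling the JIT features does require more memory. AOT uses noticeably more memory; perhaps the GC is not yet well optimized for the AOT environment.
2.3.4 Load test CPU usage
In the chart below, the dark bars are peak CPU usage (MAX) and the light bars are average CPU usage (AVG). The unit is percent: one CPU core is 100%, so using 5 CPU cores would be 500%.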

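The groups are basically the same, except that AOT uses much less CPU; after all, it has no JIT step.
3. Summary
Take these conclusions as just for fun. After all, AOT has not been officially released yet (it has been merged into the main branch and will ship officially with .NET 7), and there is still a lot to optimize. Features such as OSR and OSA are not fully settled either. Below are some percentage figures relative to the control group; the raw data and test code are on GitHub. Once .NET 7 is officially released, I will run the tests again.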
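A percentage comparison against a baseline group can be computed as in the sketch below. The `percent_vs_baseline` helper is hypothetical. The article compares against the Normal control group, but the text quotes no Normal startup figure, so this example uses the two startup times quoted in section 2.2.1 and takes the slower one as the baseline purely for illustration.

```python
def percent_vs_baseline(results, baseline):
    """Return each group's change relative to `baseline`, in percent (negative = lower)."""
    base = results[baseline]
    return {name: (value - base) / base * 100 for name, value in results.items()}


# Startup times in ms as quoted in section 2.2.1.
startup_ms = {
    "SingleFilePublish-SelfContained-Compress": 170.0,
    "AOT-Speed": 16.8,
}
changes = percent_vs_baseline(startup_ms, "SingleFilePublish-SelfContained-Compress")
for name, change in changes.items():
    print(f"{name}: {change:+.1f}%")
```

For metrics where lower is better, such as startup time or memory usage, a negative percentage is an improvement over the baseline; for QPS the reading is reversed.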


To answer the question raised at the beginning: in general, AOT does a great deal to reduce program size and speed up application startup, but for now it takes a long time to publish and uses more memory.
In addition, JIT features such as PGO use more memory than a normal publish, and their performance advantages do not show in this IO-intensive scenario.
Finally, a few more words. I have always thought that C# is a good language and .NET is a good platform. .NET dates from 2002, so this year is its 20th. New features keep arriving, and its performance already ranks in the top tier. I hope to see it develop further.
PS: In the Benchmarks Game data updated a few days ago, C# .NET is already the fastest among languages with a JIT, behind only compiled languages such as C, C++, and Rust. See Link 1 and Link 2 for details.

Original author: InCerry
Original title: Impact of single file release on program performance
Original link: www.cnblogs.com/InCerry/p/Single-File-And-AOT-Publish.html