Optimizing binary serialization of C# objects: bit fields for extreme compression


Demonstrates how to convert C# objects to binary form and optimize them to reduce packet size during network transmission.

Last updated 1/22/2024 12:33 AM
Author: 沙漠尽头的狼
Estimated reading time: 19 minutes
Category
.NET
Tags
.NET C# binary

1. Introduction

Process information is crucial for system monitoring and performance analysis. Suppose we need to develop a monitoring program that captures process information from the current operating system and efficiently transfers it to other endpoints (such as servers or monitoring stations). The key question is how to convert the captured process objects into binary data and keep the resulting packets small. This article works through that question step by step, using bit fields to optimize the binary serialization of C# objects.

Operating system process information

First, here is an example field definition for a process object. To transmit this object over the network (TCP/UDP), we need to convert it to a binary format. The challenge is to keep the packet as small as possible.

Field name | Description | Example
PID | Process ID | 10565
Name | Process name | Mafang
Publisher | Publisher | Wolf at the end of the desert
CommandLine | Command line | dotnet CodeWF.Tools.dll
CPU | CPU (total processing utilization of all cores) | 2.3%
Memory | Memory (physical memory occupied by the process) | 0.1%
Disk | Disk (total utilization of all physical drives) | 0.1 MB/sec
Network | Network (current network utilization on the major network) | 0 Mbps
GPU | GPU (highest utilization of all GPU engines) | 2.2%
GPUEngine | GPU engine | GPU 0 - 3D
PowerUsage | Power usage (impact of CPU, disk and GPU on power consumption) | low
PowerUsageTrend | Power usage trend (impact of CPU, disk and GPU on power consumption over time) | very low
Type | Process type | application
Status | Process status | efficiency mode

2. Optimization process

2.1. Process object definition and preliminary analysis

We determined the data type for each field based on its example values.

Field name | Data type | Description | Example
PID | int | Process ID | 10565
Name | string? | Process name | Mafang
Publisher | string? | Publisher | Wolf at the end of the desert
CommandLine | string? | Command line | dotnet CodeWF.Tools.dll
CPU | string? | CPU (total processing utilization of all cores) | 2.3%
Memory | string? | Memory (physical memory occupied by the process) | 0.1%
Disk | string? | Disk (total utilization of all physical drives) | 0.1 MB/sec
Network | string? | Network (current network utilization on the major network) | 0 Mbps
GPU | string? | GPU (highest utilization of all GPU engines) | 2.2%
GPUEngine | string? | GPU engine | GPU 0 - 3D
PowerUsage | string? | Power usage (impact of CPU, disk and GPU on power consumption) | low
PowerUsageTrend | string? | Power usage trend (impact of CPU, disk and GPU on power consumption over time) | very low
Type | string? | Process type | application
Status | string? | Process status | efficiency mode

Create a C# class SystemProcess to represent the process information:

public class SystemProcess
{
    public int PID { get; set; }
    public string? Name { get; set; }
    public string? Publisher { get; set; }
    public string? CommandLine { get; set; }
    public string? CPU { get; set; }
    public string? Memory { get; set; }
    public string? Disk { get; set; }
    public string? Network { get; set; }
    public string? GPU { get; set; }
    public string? GPUEngine { get; set; }
    public string? PowerUsage { get; set; }
    public string? PowerUsageTrend { get; set; }
    public string? Type { get; set; }
    public string? Status { get; set; }
}

Define the test data:

private SystemProcess _codeWFObject = new SystemProcess()
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    CPU = "2.3%",
    Memory = "0.1%",
    Disk = "0.1 MB/秒",
    Network = "0 Mbps",
    GPU = "2.2%",
    GPUEngine = "GPU 0 - 3D",
    PowerUsage = "低",
    PowerUsageTrend = "非常低",
    Type = "应用",
    Status = "效率模式"
};

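2.2. Ruling out JSON serialization

Converting an object to a JSON string is the most common approach in web development, because it is concise and easy to handle on both the front end and the back end: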

using System.Text;
using System.Text.Json;
using Xunit;
using Xunit.Abstractions;

public class SysteProcessUnitTest
{
    private readonly ITestOutputHelper _testOutputHelper;

    // _codeWFObject is the SystemProcess test instance defined above; omitted here

    public SysteProcessUnitTest(ITestOutputHelper testOutputHelper)
    {
        _testOutputHelper = testOutputHelper;
    }

    /// <summary>
    /// JSON serialization size test
    /// </summary>
    [Fact]
    public void Test_SerializeJsonData_Success()
    {
        var jsonData = JsonSerializer.Serialize(_codeWFObject);
        _testOutputHelper.WriteLine($"JSON length: {jsonData.Length}");

        var jsonDataBytes = Encoding.UTF8.GetBytes(jsonData);
        _testOutputHelper.WriteLine($"JSON binary length: {jsonDataBytes.Length}");
    }
}
Standard output:
JSON length: 366
JSON binary length: 366

JSON serialization is popular in web development because it is concise and easy to handle, but in TCP/UDP network transmission it inflates the packet unnecessarily, since every field name is repeated in the output. We therefore rule out JSON serialization and look for a more efficient binary serialization method. The serialized JSON looks like this:

{
  "PID": 10565,
  "Name": "\u7801\u754C\u5DE5\u574A",
  "Publisher": "\u6C99\u6F20\u5C3D\u5934\u7684\u72FC",
  "CommandLine": "dotnet CodeWF.Tools.dll",
  "CPU": "2.3%",
  "Memory": "0.1%",
  "Disk": "0.1 MB/\u79D2",
  "Network": "0 Mbps",
  "GPU": "2.2%",
  "GPUEngine": "GPU 0 - 3D",
  "PowerUsage": "\u4F4E",
  "PowerUsageTrend": "\u975E\u5E38\u4F4E",
  "Type": "\u5E94\u7528",
  "Status": "\u6548\u7387\u6A21\u5F0F"
}
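The two lengths above are identical (366 and 366) because the default System.Text.Json encoder escapes every non-ASCII character as a six-character \uXXXX sequence, so the output is pure ASCII. A minimal sketch of that effect on a single value follows; the relaxed encoder is shown only for comparison and is not something the article's test uses.

```csharp
using System;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

// Default options: each non-ASCII character becomes a \uXXXX escape.
string escaped = JsonSerializer.Serialize("低");

// Relaxed encoder (comparison only): the character is written as-is.
var relaxedOptions = new JsonSerializerOptions
{
    Encoder = JavaScriptEncoder.UnsafeRelaxedJsonEscaping
};
string raw = JsonSerializer.Serialize("低", relaxedOptions);

// Two quotes + six escape characters = 8 bytes.
Console.WriteLine(Encoding.UTF8.GetByteCount(escaped));
// Two quotes + three UTF-8 bytes = 5 bytes.
Console.WriteLine(Encoding.UTF8.GetByteCount(raw));
```

Even with relaxed escaping, the field names and punctuation remain in every object, which is the overhead binary serialization removes.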

2.3. Binary serialization using BinaryWriter

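We use the binary serialization helper class SerializeHelper from the author's previous article. It uses BinaryWriter to convert an object to binary data (deserialization uses BinaryReader).

First, we make the SystemProcess class implement an empty interface, INetObject, and add the NetHeadAttribute attribute to the class. This attribute defines a packet header so that different network objects can be told apart during deserialization; the serialized output grows by a few bytes, mainly for auxiliary fields such as the system Id, the network object Id, and the object version number.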

/// <summary>
/// Network object serialization interface
/// </summary>
public interface INetObject
{
}
[NetHead(1, 1)]
public class SystemProcess : INetObject
{
    // field definitions omitted
}

We then wrote a test method to verify the correctness of serialization and deserialization, and printed the serialized binary data length.

/// <summary>
/// Binary serialization test
/// </summary>
[Fact]
public void Test_SerializeToBytes_Success()
{
    var buffer = SerializeHelper.SerializeByNative(_codeWFObject, 1);
    _testOutputHelper.WriteLine($"Binary length after serialization: {buffer.Length}");

    var deserializeObj = SerializeHelper.DeserializeByNative<SystemProcess>(buffer);
    Assert.Equal("码坊", deserializeObj.Name);
}
Standard output:
Binary length after serialization: 152

The result is less than half the size of the JSON output (366 bytes down to 152), even though the packet header adds a few auxiliary fields. The unit test above also checks that the data is correct after deserialization, so we continue optimizing on this basis.
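To illustrate where the saving comes from, here is a minimal sketch that writes two of the fields directly with BinaryWriter. It is not the SerializeHelper implementation (which also writes a header); the Serialize and Deserialize helpers are hypothetical names used only for this example.

```csharp
using System;
using System.IO;
using System.Text;

byte[] bytes = Serialize(10565, "码坊");
Console.WriteLine(bytes.Length); // 11: 4 (int) + 1 (length prefix) + 6 (UTF-8)

var (readPid, readName) = Deserialize(bytes);
Console.WriteLine($"{readPid} {readName}");

// Write the values back to back; no field names are stored.
static byte[] Serialize(int pid, string name)
{
    using var stream = new MemoryStream();
    using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
    {
        writer.Write(pid);   // 4 bytes
        writer.Write(name);  // 7-bit-encoded byte count, then the UTF-8 bytes
    }
    return stream.ToArray();
}

// Read the values in the same order they were written.
static (int Pid, string Name) Deserialize(byte[] data)
{
    using var reader = new BinaryReader(new MemoryStream(data), Encoding.UTF8);
    return (reader.ReadInt32(), reader.ReadString());
}
```

The same two fields cost 33 characters in the JSON output above ("PID": 10565 and "Name": "..." with its escapes), because the names and escape sequences travel with the values.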

2.4. Data type adjustment

To shrink the binary data further, we adjust the data types. The example process data shows that some fields can be represented more compactly. For example, CPU utilization can be sent as just the number (such as 2.3) without the percent sign, and the process type can be sent as an enumeration value instead of a descriptive string. These adjustments reduce the packet size.

Field name | Data type | Description | Example
PID | int | Process ID | 10565
Name | string? | Process name | Mafang
Publisher | string? | Publisher | Wolf at the end of the desert
CommandLine | string? | Command line | dotnet CodeWF.Tools.dll
CPU | float | CPU (total processing utilization of all cores) | 2.3
Memory | float | Memory (physical memory occupied by the process) | 0.1
Disk | float | Disk (total utilization of all physical drives) | 0.1
Network | float | Network (current network utilization on the major network) | 0
GPU | float | GPU (highest utilization of all GPU engines) | 2.2
GPUEngine | byte | GPU engine, 0: None, 1: GPU 0 - 3D | 1
PowerUsage | byte | Power usage (impact of CPU, disk and GPU on power consumption), 0: very low, 1: low, 2: medium, 3: high, 4: very high | 1
PowerUsageTrend | byte | Power usage trend (impact of CPU, disk and GPU on power consumption over time), 0: very low, 1: low, 2: medium, 3: high, 4: very high | 0
Type | byte | Process type, 0: application, 1: background process | 0
Status | byte | Process status, 0: normal operation, 1: efficiency mode, 2: suspended | 1

Update the class and test data definitions:

[NetHead(1, 2)]
public class SystemProcess2 : INetObject
{
    public int PID { get; set; }
    public string? Name { get; set; }
    public string? Publisher { get; set; }
    public string? CommandLine { get; set; }
    public float CPU { get; set; }
    public float Memory { get; set; }
    public float Disk { get; set; }
    public float Network { get; set; }
    public float GPU { get; set; }
    public byte GPUEngine { get; set; }
    public byte PowerUsage { get; set; }
    public byte PowerUsageTrend { get; set; }
    public byte Type { get; set; }
    public byte Status { get; set; }
}
/// <summary>
/// Ordinary optimization of field data types
/// </summary>
private SystemProcess2 _codeWFObject2 = new SystemProcess2()
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    CPU = 2.3f,
    Memory = 0.1f,
    Disk = 0.1f,
    Network = 0,
    GPU = 2.2f,
    GPUEngine = 1,
    PowerUsage = 1,
    PowerUsageTrend = 0,
    Type = 0,
    Status = 1
};

Add unit tests as follows:

/// <summary>
/// Binary serialization test
/// </summary>
[Fact]
public void Test_SerializeToBytes2_Success()
{
    var buffer = SerializeHelper.SerializeByNative(_codeWFObject2, 1);
    _testOutputHelper.WriteLine($"Binary length after serialization: {buffer.Length}");

    var deserializeObj = SerializeHelper.DeserializeByNative<SystemProcess2>(buffer);
    Assert.Equal("码坊", deserializeObj.Name);
    Assert.Equal(2.2f, deserializeObj.GPU);
}

Test results:

Standard output:
Binary length after serialization: 99

The packet shrinks by another third, from 152 bytes to 99 bytes, as a result of changing some fields from string? to float and byte.
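The 53-byte difference can be checked by hand. The sketch below assumes SerializeHelper stores each string the way BinaryWriter.Write(string) does (a length prefix followed by UTF-8 bytes); StringSize is a hypothetical helper for this example.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

// The ten display strings that were replaced by float and byte fields.
string[] displayValues =
{
    "2.3%", "0.1%", "0.1 MB/秒", "0 Mbps", "2.2%",
    "GPU 0 - 3D", "低", "非常低", "应用", "效率模式"
};

int asStrings = displayValues.Sum(v => StringSize(v));
int asCompactTypes = 5 * sizeof(float) + 5 * sizeof(byte);

Console.WriteLine($"{asStrings} -> {asCompactTypes}, saved {asStrings - asCompactTypes}");
// prints "78 -> 25, saved 53"

// Bytes emitted by BinaryWriter.Write(string): length prefix plus UTF-8 payload.
static int StringSize(string value)
{
    using var stream = new MemoryStream();
    using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
    {
        writer.Write(value);
    }
    return (int)stream.Length;
}
```

Under that assumption the saving of 53 bytes matches the measured drop from 152 to 99.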

2.5. Further data type adjustment and bit-field optimization

Next, we introduce bit fields. Bit fields give us finer-grained control over how fields are laid out in memory, which further reduces the size of the binary data. We redefine the field rules and use bit fields to represent the enumeration-valued fields, which shrinks the packet significantly.

Comparing the previous table with the one below, there are two main kinds of data type adjustment. The rules are as follows:

  • First: some fields are just enumeration values stored in a byte, i.e. 8 bits. The process type, for example, has only 2 states (0: application, 1: background process), so 1 bit (0 or 1) is exactly enough; power usage has just 5 states, which fit in 3 bits (3 bits can represent 8 states);

  • Second: some of the float fields. In practice we only need 1 decimal place of precision, and a percentage never exceeds 100.0%. So instead of sending 23.3 for 23.3%, we multiply by 10 and send 233; the maximum is 1000 (i.e. 100.0%). The receiving process parses the value and divides it by 10. A float of 4 bytes (32 bits) can thus be reduced to 10 bits (maximum value 1023).
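The two rules above can be sketched as follows. Encode, Decode, and BitsNeeded are hypothetical helper names for this illustration, not part of the article's library.

```csharp
using System;

Console.WriteLine(Encode(23.3f));     // 233
Console.WriteLine(BitsNeeded(1000));  // 10 (scaled percentage, 0..1000)
Console.WriteLine(BitsNeeded(4));     // 3  (five states, 0..4)
Console.WriteLine(BitsNeeded(1));     // 1  (two states, 0..1)

// Sender side: keep one decimal place, 23.3 -> 233.
static int Encode(float percent) => (int)Math.Round(percent * 10f);

// Receiver side: 233 -> 23.3.
static float Decode(int scaled) => scaled / 10f;

// Smallest number of bits that can hold every value in 0..maxValue.
static int BitsNeeded(int maxValue)
{
    int bits = 1;
    while ((1 << bits) - 1 < maxValue)
    {
        bits++;
    }
    return bits;
}
```

Values outside the expected range (for example a reading above 102.3) would not fit in 10 bits, so a real sender would need to clamp or reject them.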

According to this rule, we redefine the field rules as follows:

Field name | Data type | Description | Example
PID | int | Process ID | 10565
Name | string? | Process name | Mafang
Publisher | string? | Publisher | Wolf at the end of the desert
CommandLine | string? | Command line | dotnet CodeWF.Tools.dll
Data | byte[8] | The fixed-size fields packed into 8 bytes (see the detailed description below). Note: the serialized form also carries 4 extra bytes for the byte[] length, so the Data field takes up 12 bytes in total |

A detailed description of the fixed field (Data) is as follows:

Field name | Offset (bits) | Size (bits) | Description | Example
CPU | 0 | 10 | CPU (total processing utilization of all cores); the last digit is the decimal place, e.g. 23 represents 2.3% | 23
Memory | 10 | 10 | Memory (physical memory occupied by the process); the last digit is the decimal place, e.g. 1 represents 0.1%; the value can be calculated from the basic information | 1
Disk | 20 | 10 | Disk (total utilization of all physical drives); the last digit is the decimal place, e.g. 1 represents 0.1%; the value can be calculated from the basic information | 1
Network | 30 | 10 | Network (current network utilization on the major network); the last digit is the decimal place, e.g. 253 represents 25.3%; the value can be calculated from the basic information | 0
GPU | 40 | 10 | GPU (highest utilization of all GPU engines); the last digit is the decimal place, e.g. 253 represents 25.3% | 22
GPUEngine | 50 | 1 | GPU engine, 0: None, 1: GPU 0 - 3D | 1
PowerUsage | 51 | 3 | Power usage (impact of CPU, disk and GPU on power consumption), 0: very low, 1: low, 2: medium, 3: high, 4: very high | 1
PowerUsageTrend | 54 | 3 | Power usage trend (impact of CPU, disk and GPU on power consumption over time), 0: very low, 1: low, 2: medium, 3: high, 4: very high | 0
Type | 57 | 1 | Process type, 0: application, 1: background process | 0
Status | 58 | 2 | Process status, 0: normal operation, 1: efficiency mode, 2: suspended | 1

The table above is the bit-field layout for the fixed-size fields. Offset is the position of the field within the Data byte array and Size is the width of the field, both measured in bits. For example, the Memory field occupies bits 10 through 19 of Data.

As a result, the 10 fixed-size fields, originally 25 bytes, shrink to 8 bytes: each of the 5 float fields goes from 32 bits to 10 bits, and each of the 5 byte fields goes from 8 bits to 1, 2, or 3 bits. In total, 200 bits (25 * 8) become 60 bits, which is rounded up to 64 bits (8 bytes) because the smallest unit of network transmission is the byte.
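As a rough illustration of this layout, the sketch below packs the example values into a single 64-bit word with shifts and masks, then writes it out least-significant byte first. Put and Get are hypothetical helpers; the article's own code works on a byte[] through attributes instead.

```csharp
using System;
using System.Buffers.Binary;

ulong packed = 0;
packed = Put(packed, 23, 0, 10);   // CPU = 2.3
packed = Put(packed, 1, 10, 10);   // Memory = 0.1
packed = Put(packed, 1, 20, 10);   // Disk = 0.1
packed = Put(packed, 0, 30, 10);   // Network = 0
packed = Put(packed, 22, 40, 10);  // GPU = 2.2
packed = Put(packed, 1, 50, 1);    // GPUEngine
packed = Put(packed, 1, 51, 3);    // PowerUsage
packed = Put(packed, 0, 54, 3);    // PowerUsageTrend
packed = Put(packed, 0, 57, 1);    // Type
packed = Put(packed, 1, 58, 2);    // Status

byte[] data = new byte[8];
BinaryPrimitives.WriteUInt64LittleEndian(data, packed);
Console.WriteLine(BitConverter.ToString(data)); // 17-04-10-00-00-16-0C-04
Console.WriteLine(Get(packed, 40, 10));         // 22

// Place the low `size` bits of value at bit position `offset`.
static ulong Put(ulong word, ulong value, int offset, int size)
{
    ulong mask = (1UL << size) - 1;
    return word | ((value & mask) << offset);
}

// Extract `size` bits starting at bit position `offset`.
static ulong Get(ulong word, int offset, int size) =>
    (word >> offset) & ((1UL << size) - 1);
```

A single ulong only works here because the whole layout fits in 64 bits; larger layouts need a byte-array approach.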

Modify the class definition as follows; note the comments in the code:

[NetHead(1, 3)]
public class SystemProcess3 : INetObject
{
    public int PID { get; set; }
    public string? Name { get; set; }
    public string? Publisher { get; set; }
    public string? CommandLine { get; set; }
    private byte[]? _data;
    /// <summary>
    /// Serialized member: this is the data that actually gets serialized
    /// </summary>
    public byte[]? Data
    {
        get => _data;
        set
        {
            _data = value;

            // Key point: on deserialization, convert the byte[] into an object for convenient use in the program (bit-field operation)
            _processData = _data?.ToFieldObject<SystemProcessData>();
        }
    }

    private SystemProcessData? _processData;

    /// <summary>
    /// Process data; members marked with NetIgnoreMember are ignored during serialization
    /// </summary>
    [NetIgnoreMember]
    public SystemProcessData? ProcessData
    {
        get => _processData;
        set
        {
            _processData = value;

            // Key point: convert the object into a byte[] (bit-field serialization)
            _data = _processData?.FieldObjectBuffer();
        }
    }
}

public record SystemProcessData
{
    [NetFieldOffset(0, 10)] public short CPU { get; set; }
    [NetFieldOffset(10, 10)] public short Memory { get; set; }
    [NetFieldOffset(20, 10)] public short Disk { get; set; }
    [NetFieldOffset(30, 10)] public short Network { get; set; }
    [NetFieldOffset(40, 10)] public short GPU { get; set; }
    [NetFieldOffset(50, 1)] public byte GPUEngine { get; set; }
    [NetFieldOffset(51, 3)] public byte PowerUsage { get; set; }
    [NetFieldOffset(54, 3)] public byte PowerUsageTrend { get; set; }
    [NetFieldOffset(57, 1)] public byte Type { get; set; }
    [NetFieldOffset(58, 2)] public byte Status { get; set; }
}

Add unit tests as follows:

/// <summary>
/// Extreme optimization of field data types
/// </summary>
private SystemProcess3 _codeWFObject3 = new SystemProcess3()
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    ProcessData = new SystemProcessData()
    {
        CPU = 23,
        Memory = 1,
        Disk = 1,
        Network = 0,
        GPU = 22,
        GPUEngine = 1,
        PowerUsage = 1,
        PowerUsageTrend = 0,
        Type = 0,
        Status = 1
    }
};

/// <summary>
/// Extreme binary serialization test
/// </summary>
[Fact]
public void Test_SerializeToBytes3_Success()
{
    var buffer = SerializeHelper.SerializeByNative(_codeWFObject3, 1);
    _testOutputHelper.WriteLine($"Binary length after serialization: {buffer.Length}");

    var deserializeObj = SerializeHelper.DeserializeByNative<SystemProcess3>(buffer);
    Assert.Equal("码坊", deserializeObj.Name);
    Assert.Equal(23, deserializeObj.ProcessData.CPU);
    Assert.Equal(1, deserializeObj.ProcessData.PowerUsage);
}

Test output:

Standard output:
Binary length after serialization: 86

Going from 99 to 86 bytes saves 13 bytes per object, which matters on a constrained network: for 1 million objects, that is about 12.4 MB saved. The bit-field serialization and deserialization code is not explained in detail here; it is rather dry, and the author may not be able to explain it clearly. For reference, the code looks like this:

public partial class SerializeHelper
{
    public static byte[] FieldObjectBuffer<T>(this T obj) where T : class
    {
        var properties = typeof(T).GetProperties();
        var totalSize = 0;

        // Compute the total length in bits
        foreach (var property in properties)
        {
            if (!Attribute.IsDefined(property, typeof(NetFieldOffsetAttribute)))
            {
                continue;
            }

            var offsetAttribute =
                (NetFieldOffsetAttribute)property.GetCustomAttribute(typeof(NetFieldOffsetAttribute))!;
            totalSize = Math.Max(totalSize, offsetAttribute.Offset + offsetAttribute.Size);
        }

        var bufferLength = (int)Math.Ceiling((double)totalSize / 8);
        var buffer = new byte[bufferLength];

        foreach (var property in properties)
        {
            if (!Attribute.IsDefined(property, typeof(NetFieldOffsetAttribute)))
            {
                continue;
            }

            var offsetAttribute =
                (NetFieldOffsetAttribute)property.GetCustomAttribute(typeof(NetFieldOffsetAttribute))!;
            dynamic value = property.GetValue(obj)!; // use dynamic to fetch the property value
            SetBitValue(ref buffer, value, offsetAttribute.Offset, offsetAttribute.Size);
        }

        return buffer;
    }

    public static T ToFieldObject<T>(this byte[] buffer) where T : class, new()
    {
        var obj = new T();
        var properties = typeof(T).GetProperties();

        foreach (var property in properties)
        {
            if (!Attribute.IsDefined(property, typeof(NetFieldOffsetAttribute)))
            {
                continue;
            }

            var offsetAttribute =
                (NetFieldOffsetAttribute)property.GetCustomAttribute(typeof(NetFieldOffsetAttribute))!;
            dynamic value = GetValueFromBit(buffer, offsetAttribute.Offset, offsetAttribute.Size,
                property.PropertyType);
            property.SetValue(obj, value);
        }

        return obj;
    }

    /// <summary>
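    /// Writes a value into the buffer at the given bit offset.
    /// Assumes the target bits are still zero and the field spans at most two bytes.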
    /// </summary>
    /// <param name="buffer"></param>
    /// <param name="value"></param>
    /// <param name="offset"></param>
    /// <param name="size"></param>
    private static void SetBitValue(ref byte[] buffer, int value, int offset, int size)
    {
        var mask = (1 << size) - 1;
        buffer[offset / 8] |= (byte)((value & mask) << (offset % 8));
        if (offset % 8 + size > 8)
        {
            buffer[offset / 8 + 1] |= (byte)((value & mask) >> (8 - offset % 8));
        }
    }

    /// <summary>
    /// Reads a value from the buffer at the given bit offset.
    /// Assumes the field spans at most two bytes.
    /// </summary>
    /// <param name="buffer"></param>
    /// <param name="offset"></param>
    /// <param name="size"></param>
    /// <param name="propertyType"></param>
    /// <returns></returns>
    private static dynamic GetValueFromBit(byte[] buffer, int offset, int size, Type propertyType)
    {
        var mask = (1 << size) - 1;
        var bitValue = (buffer[offset / 8] >> (offset % 8)) & mask;
        if (offset % 8 + size > 8)
        {
            bitValue |= (buffer[offset / 8 + 1] << (8 - offset % 8)) & mask;
        }

        dynamic result = Convert.ChangeType(bitValue, propertyType); // convert to the property's type
        return result;
    }
}
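SetBitValue and GetValueFromBit touch at most two bytes per field, which is enough for the layout in this article. A field that starts late in a byte can spill into a third byte (for example, 10 bits starting at bit 7). A minimal sketch of a bit-by-bit variant that handles any span follows; WriteBits and ReadBits are hypothetical names, not part of SerializeHelper.

```csharp
using System;

byte[] buffer = new byte[4];
WriteBits(buffer, value: 1000, offset: 7, size: 10); // spans bytes 0, 1 and 2
Console.WriteLine(ReadBits(buffer, offset: 7, size: 10)); // 1000

// Copy the low `size` bits of value into the buffer, one bit at a time.
static void WriteBits(byte[] buffer, int value, int offset, int size)
{
    for (int i = 0; i < size; i++)
    {
        int bit = offset + i;
        int bitMask = 1 << (bit % 8);
        if (((value >> i) & 1) != 0)
        {
            buffer[bit / 8] |= (byte)bitMask;
        }
        else
        {
            // Clear the bit so a reused buffer does not leak old data.
            buffer[bit / 8] &= (byte)(~bitMask & 0xFF);
        }
    }
}

// Rebuild the value from `size` bits starting at `offset`.
static int ReadBits(byte[] buffer, int offset, int size)
{
    int result = 0;
    for (int i = 0; i < size; i++)
    {
        int bit = offset + i;
        if ((buffer[bit / 8] & (1 << (bit % 8))) != 0)
        {
            result |= 1 << i;
        }
    }
    return result;
}
```

The per-bit loop is slower than the shift-and-mask version, so it trades some speed for generality.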

3. Optimization effect and summary

Through step-by-step optimization, we reduced the packet from 366 bytes with JSON serialization to 152 bytes with ordinary binary serialization, then to 99 bytes by adjusting data types, and finally to 86 bytes with bit fields. This saving is significant in network transmission, especially when large amounts of data need to be sent.
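For a quick sense of scale, the measured sizes can be compared against the JSON baseline with a few lines of arithmetic. PercentOf and MiBSaved are hypothetical helpers for this sketch; the sizes are the ones reported by the tests above.

```csharp
using System;

int[] sizes = { 366, 152, 99, 86 };
string[] steps = { "JSON", "BinaryWriter", "float/byte fields", "bit fields" };

for (int i = 0; i < sizes.Length; i++)
{
    double percent = PercentOf(sizes[i], sizes[0]);
    Console.WriteLine($"{steps[i]}: {sizes[i]} bytes, {percent:F1}% of JSON");
}

// Saving of the last step (99 -> 86 bytes) over one million objects.
Console.WriteLine($"{MiBSaved(99 - 86, 1_000_000):F1} MiB saved");

// Size as a percentage of the baseline.
static double PercentOf(int size, int baseline) => 100.0 * size / baseline;

// Total bytes saved, expressed in MiB (1024 * 1024 bytes).
static double MiBSaved(int bytesPerObject, int count) =>
    (double)bytesPerObject * count / (1024 * 1024);
```

The three binary variants come out at roughly 41.5%, 27.0%, and 23.5% of the JSON size, and the last step alone saves about 12.4 MiB per million objects.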

This article explored how to optimize the binary serialization of C# objects through a worked example. Using bit fields, we compress the packet to the extreme and improve the efficiency of network transmission. This is a welcome result when developing client/server (C/S) programs and reflects the pursuit of ultimate performance.

Finally, we provide a GitHub link to the test source code for this article for reference.

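Bonus: the repository also contains the sample code for the previous article, "In-depth analysis of serializing a million C# objects: how to strike the perfect balance between speed and size in network transmission" (dotnet9.com), along with TCP and UDP server and client integration test programs.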
