1. Introduction
In operating systems, process information is essential for system monitoring and performance analysis. Suppose we need to build a monitoring program that captures process information from the current operating system and transfers it efficiently to other endpoints (such as servers or monitoring consoles). A key question is how to convert the captured process objects into binary data and make the resulting packets as small as possible. This article works through the problem step by step and shows how bit fields can be used to optimize the binary serialization of C# objects.

First, here are the fields of a process object, with example values. To transmit this object over the network (TCP/UDP), we need to convert it to binary format, and the challenge is to make the packet as small as possible.
| field name | description | example |
|---|---|---|
| PID | process ID | 10565 |
| Name | process name | Mafang |
| Publisher | publisher | Wolf at the end of the desert |
| CommandLine | command line | dotnet CodeWF.Tools.dll |
| CPU | CPU (total processing utilization of all cores) | 2.3% |
| Memory | Memory (physical memory occupied by the process) | 0.1% |
| Disk | Disk (total utilization of all physical drives) | 0.1 MB/s |
| Network | Network (current network utilization on the primary network) | 0 Mbps |
| GPU | GPU (highest utilization of all GPU engines) | 2.2% |
| GPUEngine | GPU engine | GPU 0 - 3D |
| PowerUsage | Power usage (impact of CPU, disk and GPU on power consumption) | low |
| PowerUsageTrend | Power usage trends (impact of CPU, disk, and GPU on power consumption over time) | very low |
| Type | process type | application |
| Status | process state | efficiency mode |
2. Optimization process
2.1. Process object definition and preliminary analysis
We determined the data type for each field based on its example values.
| field name | data type | description | example |
|---|---|---|---|
| PID | int | process ID | 10565 |
| Name | string? | process name | Mafang |
| Publisher | string? | publisher | Wolf at the end of the desert |
| CommandLine | string? | command line | dotnet CodeWF.Tools.dll |
| CPU | string? | CPU (total processing utilization of all cores) | 2.3% |
| Memory | string? | Memory (physical memory occupied by the process) | 0.1% |
| Disk | string? | Disk (total utilization of all physical drives) | 0.1 MB/s |
| Network | string? | Network (current network utilization on the primary network) | 0 Mbps |
| GPU | string? | GPU (highest utilization of all GPU engines) | 2.2% |
| GPUEngine | string? | GPU engine | GPU 0 - 3D |
| PowerUsage | string? | Power usage (impact of CPU, disk and GPU on power consumption) | low |
| PowerUsageTrend | string? | Power usage trends (impact of CPU, disk, and GPU on power consumption over time) | very low |
| Type | string? | process type | application |
| Status | string? | process state | efficiency mode |
Create a C# class, SystemProcess, to represent the process information:
```csharp
public class SystemProcess
{
    public int PID { get; set; }
    public string? Name { get; set; }
    public string? Publisher { get; set; }
    public string? CommandLine { get; set; }
    public string? CPU { get; set; }
    public string? Memory { get; set; }
    public string? Disk { get; set; }
    public string? Network { get; set; }
    public string? GPU { get; set; }
    public string? GPUEngine { get; set; }
    public string? PowerUsage { get; set; }
    public string? PowerUsageTrend { get; set; }
    public string? Type { get; set; }
    public string? Status { get; set; }
}
```
Define the test data:

```csharp
private SystemProcess _codeWFObject = new SystemProcess()
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    CPU = "2.3%",
    Memory = "0.1%",
    Disk = "0.1 MB/秒",
    Network = "0 Mbps",
    GPU = "2.2%",
    GPUEngine = "GPU 0 - 3D",
    PowerUsage = "低",
    PowerUsageTrend = "非常低",
    Type = "应用",
    Status = "效率模式"
};
```
2.2. Ruling out JSON serialization
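Converting the object to a JSON string is the most common approach in web development, because JSON is concise and easy to handle on both the front end and the back end:

```csharp
using System.Text;
using System.Text.Json;
using Xunit;
using Xunit.Abstractions;

public class SystemProcessUnitTest
{
    private readonly ITestOutputHelper _testOutputHelper;
    private SystemProcess _codeWFObject; // = new SystemProcess { ... } as defined above, omitted here

    public SystemProcessUnitTest(ITestOutputHelper testOutputHelper)
    {
        _testOutputHelper = testOutputHelper;
    }

    /// <summary>
    /// JSON serialization size test
    /// </summary>
    [Fact]
    public void Test_SerializeJsonData_Success()
    {
        var jsonData = JsonSerializer.Serialize(_codeWFObject);
        _testOutputHelper.WriteLine($"JSON length: {jsonData.Length}");
        var jsonDataBytes = Encoding.UTF8.GetBytes(jsonData);
        _testOutputHelper.WriteLine($"JSON binary length: {jsonDataBytes.Length}");
    }
}
```

Standard output:

```text
JSON length: 366
JSON binary length: 366
```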
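Although JSON is popular in web development because it is concise and easy to handle, in TCP/UDP transmission it inflates packets unnecessarily, because every object repeats all of its field names. Therefore, we ruled out JSON and looked for a more efficient binary serialization method.

The serialized JSON is shown below, pretty-printed for readability (the serializer's actual output is compact, and that compact form is what the 366 bytes measure). Because System.Text.Json escapes non-ASCII characters as \uXXXX by default, the JSON text is pure ASCII, which is why the string length and the UTF-8 byte length are both 366. Note also that the escaped Name value decodes to "码界工坊" rather than the "码坊" in the test data, so the measurements in this article appear to have been taken with that longer name.

```json
{
  "PID": 10565,
  "Name": "\u7801\u754C\u5DE5\u574A",
  "Publisher": "\u6C99\u6F20\u5C3D\u5934\u7684\u72FC",
  "CommandLine": "dotnet CodeWF.Tools.dll",
  "CPU": "2.3%",
  "Memory": "0.1%",
  "Disk": "0.1 MB/\u79D2",
  "Network": "0 Mbps",
  "GPU": "2.2%",
  "GPUEngine": "GPU 0 - 3D",
  "PowerUsage": "\u4F4E",
  "PowerUsageTrend": "\u975E\u5E38\u4F4E",
  "Type": "\u5E94\u7528",
  "Status": "\u6548\u7387\u6A21\u5F0F"
}
```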
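To put a number on the field-name overhead, here is a minimal sketch (a standalone top-level C# program, assuming .NET 6 or later; it is not part of the article's test project) that serializes the same values once as a JSON object and once as a positional JSON array. The difference between the two is exactly the cost of the property names plus their quotes and colons, 136 characters for these 14 fields. That figure does not depend on the values, so the sketch simply reuses the "码坊" test data.

```csharp
using System;
using System.Text;
using System.Text.Json;

// The same values with property names (a JSON object)...
var asObject = JsonSerializer.Serialize(new
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    CPU = "2.3%",
    Memory = "0.1%",
    Disk = "0.1 MB/秒",
    Network = "0 Mbps",
    GPU = "2.2%",
    GPUEngine = "GPU 0 - 3D",
    PowerUsage = "低",
    PowerUsageTrend = "非常低",
    Type = "应用",
    Status = "效率模式"
});

// ...and without them (a positional JSON array, values only).
var asArray = JsonSerializer.Serialize(new object[]
{
    10565, "码坊", "沙漠尽头的狼", "dotnet CodeWF.Tools.dll",
    "2.3%", "0.1%", "0.1 MB/秒", "0 Mbps", "2.2%", "GPU 0 - 3D",
    "低", "非常低", "应用", "效率模式"
});

int objectBytes = Encoding.UTF8.GetByteCount(asObject);
int arrayBytes = Encoding.UTF8.GetByteCount(asArray);

// Non-ASCII characters are escaped as \uXXXX, so characters and bytes match.
Console.WriteLine($"object: {objectBytes} bytes, array: {arrayBytes} bytes");
Console.WriteLine($"spent on property names: {objectBytes - arrayBytes} bytes");
```

A positional array is not being proposed as a wire format here; it only isolates the name overhead that a binary format avoids by relying on a fixed field order.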
2.3. Binary serialization using BinaryWriter
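We use SerializeHelper, the binary serialization helper class the webmaster wrote in a previous article. It converts the object to binary data with BinaryWriter (and deserializes it with BinaryReader).

First, we make the SystemProcess class implement an empty interface, INetObject, and decorate the class with the NetHeadAttribute attribute. The attribute adds a packet header so that the receiver can tell which network object it is deserializing when several kinds are in use. The header costs a few extra bytes after serialization, mainly auxiliary fields such as the system ID, the network object ID, and the object version number.

```csharp
/// <summary>
/// Network object serialization interface
/// </summary>
public interface INetObject
{
}

[NetHead(1, 1)]
public class SystemProcess : INetObject
{
    // Property definitions omitted
}
```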
We then wrote a test method to verify the correctness of serialization and deserialization, and printed the serialized binary data length.
```csharp
/// <summary>
/// Binary serialization test
/// </summary>
[Fact]
public void Test_SerializeToBytes_Success()
{
    var buffer = SerializeHelper.SerializeByNative(_codeWFObject, 1);
    _testOutputHelper.WriteLine($"Serialized binary length: {buffer.Length}");
    var deserializeObj = SerializeHelper.DeserializeByNative<SystemProcess>(buffer);
    Assert.Equal("码坊", deserializeObj.Name);
}
```

Standard output:

```text
Serialized binary length: 152
```
The result is less than half the size of the JSON (152 bytes versus 366), even though the packet header adds a few extra bytes. The unit test above also checks that the data is still correct after deserialization, so we will continue optimizing from here.
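Most of the saving comes from how BinaryWriter encodes values: an int is written as 4 little-endian bytes, and a string as a 7-bit-encoded length prefix (the UTF-8 byte count) followed by its UTF-8 bytes, with no field names at all. A minimal sketch writing just PID and Name with a plain BinaryWriter (a standalone top-level C# program; this is not SerializeHelper itself, and its packet header is not reproduced):

```csharp
using System;
using System.IO;

// BinaryWriter encodes strings as UTF-8 by default and never writes field names.
using var stream = new MemoryStream();
using var writer = new BinaryWriter(stream);

writer.Write(10565);  // PID: int -> 4 bytes, little-endian
writer.Write("码坊");  // Name: length prefix 6 (one byte), then 6 UTF-8 bytes
writer.Flush();

byte[] bytes = stream.ToArray();
Console.WriteLine($"{bytes.Length} bytes: {BitConverter.ToString(bytes)}");
// 11 bytes: 45-29-00-00-06-E7-A0-81-E5-9D-8A
```

The same two fields cost 27 characters in the escaped JSON ("PID":10565 plus "Name":"\u7801\u574A") before counting the separators.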
2.4. Data type adjustment
To shrink the binary data further, we adjusted the data types. The example data shows that several fields can be represented more compactly. For example, CPU utilization only needs its numeric part (such as 2.3), without the percent sign, and the process type only needs an enumeration value instead of a display string. This adjustment reduces the size of the data packet.
| field name | data type | description | example |
|---|---|---|---|
| PID | int | process ID | 10565 |
| Name | string? | process name | Mafang |
| Publisher | string? | publisher | Wolf at the end of the desert |
| CommandLine | string? | command line | dotnet CodeWF.Tools.dll |
| CPU | float | CPU (total processing utilization of all cores) | 2.3 |
| Memory | float | Memory (physical memory occupied by the process) | 0.1 |
| Disk | float | Disk (total utilization of all physical drives) | 0.1 |
| Network | float | Network (current network utilization on the primary network) | 0 |
| GPU | float | GPU (highest utilization of all GPU engines) | 2.2 |
| GPUEngine | byte | GPU engine, 0: None, 1: GPU 0 - 3D | 1 |
| PowerUsage | byte | Power usage (impact of CPU, disk and GPU on power consumption), 0: very low, 1: low, 2: medium, 3: high, 4: very high | 1 |
| PowerUsageTrend | byte | Power usage trends (impact of CPU, disk and GPU on power consumption over time), 0: very low, 1: low, 2: medium, 3: high, 4: very high | 0 |
| Type | byte | Process type, 0: application, 1: background process | 0 |
| Status | byte | Process status, 0: Normal operation, 1: Efficiency mode, 2: Suspend | 1 |
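On the monitoring side, turning the display strings into these compact types amounts to parsing plus a table lookup. A minimal sketch (a standalone top-level C# program; the dictionaries and ParseLeadingNumber are hypothetical helpers, and the English keys stand in for whatever, possibly localized, strings the operating system actually reports):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical lookup tables using the codes from the table above.
var powerLevels = new Dictionary<string, byte>
{
    ["very low"] = 0, ["low"] = 1, ["medium"] = 2, ["high"] = 3, ["very high"] = 4
};
var processTypes = new Dictionary<string, byte>
{
    ["application"] = 0, ["background process"] = 1
};
var statuses = new Dictionary<string, byte>
{
    ["normal operation"] = 0, ["efficiency mode"] = 1, ["suspended"] = 2
};

// Hypothetical helper: keep only the leading number of "2.3%", "0.1 MB/秒", "0 Mbps", ...
static float ParseLeadingNumber(string text)
{
    var number = new string(text.TakeWhile(c => char.IsDigit(c) || c == '.').ToArray());
    return float.Parse(number, CultureInfo.InvariantCulture);
}

float cpu = ParseLeadingNumber("2.3%");        // 4 bytes as a float
float disk = ParseLeadingNumber("0.1 MB/秒");  // unit text is dropped
byte powerUsage = powerLevels["low"];          // 1 byte instead of a string
byte status = statuses["efficiency mode"];
byte type = processTypes["application"];

Console.WriteLine($"{powerUsage} {status} {type}"); // 1 1 0
```

Parsing with CultureInfo.InvariantCulture keeps "2.3" from being misread on systems whose decimal separator is a comma.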
Modify the class and test data definitions:

```csharp
[NetHead(1, 2)]
public class SystemProcess2 : INetObject
{
    public int PID { get; set; }
    public string? Name { get; set; }
    public string? Publisher { get; set; }
    public string? CommandLine { get; set; }
    public float CPU { get; set; }
    public float Memory { get; set; }
    public float Disk { get; set; }
    public float Network { get; set; }
    public float GPU { get; set; }
    public byte GPUEngine { get; set; }
    public byte PowerUsage { get; set; }
    public byte PowerUsageTrend { get; set; }
    public byte Type { get; set; }
    public byte Status { get; set; }
}

/// <summary>
/// Test data with ordinarily optimized field data types
/// </summary>
private SystemProcess2 _codeWFObject2 = new SystemProcess2()
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    CPU = 2.3f,
    Memory = 0.1f,
    Disk = 0.1f,
    Network = 0,
    GPU = 2.2f,
    GPUEngine = 1,
    PowerUsage = 1,
    PowerUsageTrend = 0,
    Type = 0,
    Status = 1
};
```
Add a unit test as follows:

```csharp
/// <summary>
/// Binary serialization test with adjusted data types
/// </summary>
[Fact]
public void Test_SerializeToBytes2_Success()
{
    var buffer = SerializeHelper.SerializeByNative(_codeWFObject2, 1);
    _testOutputHelper.WriteLine($"Serialized binary length: {buffer.Length}");
    var deserializeObj = SerializeHelper.DeserializeByNative<SystemProcess2>(buffer);
    Assert.Equal("码坊", deserializeObj.Name);
    Assert.Equal(2.2f, deserializeObj.GPU);
}
```

Test results (standard output):

```text
Serialized binary length: 99
```
The packet shrinks by roughly another third, from 152 bytes to 99, because several fields changed from string? to float or byte.
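The 53-byte saving can be accounted for field by field, assuming SerializeHelper encodes strings the way BinaryWriter.Write(string) does (a one-byte length prefix for strings this short, plus the UTF-8 bytes). A minimal sketch (a standalone top-level C# program; StringSize is a hypothetical helper):

```csharp
using System;
using System.IO;
using System.Linq;

// Hypothetical helper: bytes BinaryWriter.Write(string) produces for one string
// (7-bit-encoded length prefix + UTF-8 payload).
static int StringSize(string s)
{
    using var stream = new MemoryStream();
    using var writer = new BinaryWriter(stream);
    writer.Write(s);
    writer.Flush();
    return (int)stream.Length;
}

// The ten SystemProcess values that SystemProcess2 re-types.
string[] oldValues =
{
    "2.3%", "0.1%", "0.1 MB/秒", "0 Mbps", "2.2%",    // become float (4 bytes each)
    "GPU 0 - 3D", "低", "非常低", "应用", "效率模式"   // become byte (1 byte each)
};

int before = oldValues.Sum(v => StringSize(v));
int after = 5 * sizeof(float) + 5 * sizeof(byte);

Console.WriteLine($"before: {before}, after: {after}, saved: {before - after}");
// before: 78, after: 25, saved: 53
```

152 - 53 = 99, which matches the measured packet size; each CJK character costs 3 UTF-8 bytes, so the short Chinese labels are more expensive than they look.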
2.5. Further data type adjustment and bit field optimization
Next, we introduce bit fields. Bit fields give us fine-grained, bit-level control over where each field sits in the serialized data, which shrinks the binary data further. We redefined the field rules and packed the enumerated fields into bit fields, which reduces the packet size significantly.
Comparing the previous table with the one below, there are two main kinds of data type adjustment. The rules are as follows:
First: some fields are just enumeration values stored in a byte, that is, 8 bits. The process type, for example, has only 2 states (0: application, 1: background process), so 1 bit (0 or 1) is enough. Power usage has only 5 states, so 3 bits suffice (3 bits can represent 8 states).

Second: some float fields. In practice we only need one decimal place of precision, and the percentage never exceeds 100.0%. We can therefore scale and round: 23.3% is sent as 23.3 × 10 = 233, and the largest possible value is 1000 (100.0%). The receiving process parses the value and divides it by 10. A 10-bit field holds values from 0 to 1023, so a 4-byte (32-bit) float can be reduced to 10 bits.
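Both rules reduce to simple arithmetic: n states need the smallest bit width b with 2^b ≥ n, and the one-decimal values travel as integers in tenths. A minimal sketch (a standalone top-level C# program; BitsNeeded, ToTenths and FromTenths are hypothetical helper names):

```csharp
using System;

// Hypothetical helper: smallest bit width b with 2^b >= states.
static int BitsNeeded(int states)
{
    int bits = 1;
    while ((1 << bits) < states) bits++;
    return bits;
}

// Hypothetical helpers: one-decimal values are sent as integers in tenths.
static int ToTenths(float value) => (int)MathF.Round(value * 10);
static float FromTenths(int tenths) => tenths / 10f;

Console.WriteLine(BitsNeeded(2));    // 1: process type (application / background)
Console.WriteLine(BitsNeeded(3));    // 2: status (normal / efficiency mode / suspended)
Console.WriteLine(BitsNeeded(5));    // 3: power usage (very low .. very high)
Console.WriteLine(BitsNeeded(1001)); // 10: tenths of a percent, 0 .. 1000
Console.WriteLine(ToTenths(23.3f));  // 233: the value that goes on the wire
float decoded = FromTenths(233);     // 23.3f again on the receiving side
```

Rounding (rather than truncating) matters: 23.3f is stored as 23.2999992..., so a plain cast of 23.3f * 10 could yield 232.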
According to this rule, we redefine the field rules as follows:
| field name | data type | description | example |
|---|---|---|---|
| PID | int | process ID | 10565 |
| Name | string? | process name | Mafang |
| Publisher | string? | publisher | Wolf at the end of the desert |
| CommandLine | string? | command line | dotnet CodeWF.Tools.dll |
| Data | byte[8] | Several fixed-size fields packed into 8 bytes (note: serialization also writes 4 bytes for the byte[] length, so the Data field occupies 12 bytes in total) | see the table below |
A detailed description of the fixed field (Data) is as follows:
| field name | Offset (bits) | Size (bits) | description | example |
|---|---|---|---|---|
| CPU | 0 | 10 | CPU (total processing utilization of all cores), stored in tenths, e.g. 23 for 2.3% | 23 |
| Memory | 10 | 10 | Memory (physical memory occupied by the process), stored in tenths, e.g. 1 for 0.1%; the value can be calculated from the basic system information | 1 |
| Disk | 20 | 10 | Disk (total utilization of all physical drives), stored in tenths, e.g. 1 for 0.1%; the value can be calculated from the basic system information | 1 |
| Network | 30 | 10 | Network (network utilization on the current primary network), stored in tenths, e.g. 253 for 25.3%; the value can be calculated from the basic system information | 0 |
| GPU | 40 | 10 | GPU (highest utilization of all GPU engines), stored in tenths, e.g. 253 for 25.3% | 22 |
| GPUEngine | 50 | 1 | GPU engine, 0: None, 1: GPU 0 - 3D | 1 |
| PowerUsage | 51 | 3 | Power usage (impact of CPU, disk and GPU on power consumption), 0: very low, 1: low, 2: medium, 3: high, 4: very high | 1 |
| PowerUsageTrend | 54 | 3 | Power usage trends (impact of CPU, disk and GPU on power consumption over time), 0: very low, 1: low, 2: medium, 3: high, 4: very high | 0 |
| Type | 57 | 1 | Process type, 0: application, 1: background process | 0 |
| Status | 58 | 2 | Process status, 0: Normal operation, 1: Efficiency mode, 2: Suspend | 1 |
The table above is the bit field layout of the fixed-size example fields. Offset is the position of the field within the Data byte array, and Size is the field's width; both are measured in bits. For example, the Memory field occupies bits 10 through 19 of Data.
As a result, the 10 fixed-size fields, originally 25 bytes, are packed into 8 bytes. Each of the 5 floats shrinks from 32 bits to 10 bits, and each of the 5 single-byte fields shrinks from 8 bits to 1, 2, or 3 bits. In total, 200 bits (25 × 8) become 60 bits (5 × 10 + 1 + 3 + 3 + 1 + 2). Because the smallest unit of network transmission is the byte, the 60 bits are rounded up to 8 bytes (64 bits).
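The bit count can be checked mechanically from the layout table. This minimal sketch (a standalone top-level C# program; the tuple array simply copies the table) verifies that the fields are contiguous and computes the buffer size with the same max(Offset + Size) rule that FieldObjectBuffer uses:

```csharp
using System;
using System.Linq;

// (name, offset, size) in bits, copied from the layout table above.
var layout = new (string Name, int Offset, int Size)[]
{
    ("CPU", 0, 10), ("Memory", 10, 10), ("Disk", 20, 10), ("Network", 30, 10),
    ("GPU", 40, 10), ("GPUEngine", 50, 1), ("PowerUsage", 51, 3),
    ("PowerUsageTrend", 54, 3), ("Type", 57, 1), ("Status", 58, 2)
};

// Fields must not overlap or leave gaps: each one starts where the previous one ends.
for (int i = 1; i < layout.Length; i++)
{
    if (layout[i].Offset != layout[i - 1].Offset + layout[i - 1].Size)
        throw new InvalidOperationException($"{layout[i].Name} does not follow {layout[i - 1].Name}");
}

int totalBits = layout.Max(f => f.Offset + f.Size); // end of the last field
int totalBytes = (totalBits + 7) / 8;               // round up to whole bytes
int originalBits = 5 * 32 + 5 * 8;                  // 5 floats + 5 bytes before packing

Console.WriteLine($"{originalBits} bits -> {totalBits} bits -> {totalBytes} bytes");
// 200 bits -> 60 bits -> 8 bytes
```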
Modify the class definition as follows; note the comments in the code:

```csharp
[NetHead(1, 3)]
public class SystemProcess3 : INetObject
{
    public int PID { get; set; }
    public string? Name { get; set; }
    public string? Publisher { get; set; }
    public string? CommandLine { get; set; }

    private byte[]? _data;

    /// <summary>
    /// The data that is actually serialized
    /// </summary>
    public byte[]? Data
    {
        get => _data;
        set
        {
            _data = value;
            // Key point: on deserialization, unpack the bytes into an object
            // (bit field read) so the rest of the program can use it conveniently
            _processData = _data?.ToFieldObject<SystemProcessData>();
        }
    }

    private SystemProcessData? _processData;

    /// <summary>
    /// Process data; marked with NetIgnoreMember so the serializer skips it
    /// </summary>
    [NetIgnoreMember]
    public SystemProcessData? ProcessData
    {
        get => _processData;
        set
        {
            _processData = value;
            // Key point: pack the object into a byte[] (bit field write)
            _data = _processData?.FieldObjectBuffer();
        }
    }
}

public record SystemProcessData
{
    [NetFieldOffset(0, 10)] public short CPU { get; set; }
    [NetFieldOffset(10, 10)] public short Memory { get; set; }
    [NetFieldOffset(20, 10)] public short Disk { get; set; }
    [NetFieldOffset(30, 10)] public short Network { get; set; }
    [NetFieldOffset(40, 10)] public short GPU { get; set; }
    [NetFieldOffset(50, 1)] public byte GPUEngine { get; set; }
    [NetFieldOffset(51, 3)] public byte PowerUsage { get; set; }
    [NetFieldOffset(54, 3)] public byte PowerUsageTrend { get; set; }
    [NetFieldOffset(57, 1)] public byte Type { get; set; }
    [NetFieldOffset(58, 2)] public byte Status { get; set; }
}
```
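To see what the Data bytes look like, here is a minimal sketch that packs the test values by hand into a 64-bit integer, using the offsets from SystemProcessData, and writes it out little-endian. It assumes bit 0 is the least significant bit of byte 0, which is how SetBitValue places bits, so FieldObjectBuffer should produce the same 8 bytes for this test object (a standalone top-level C# program, independent of the article's helpers):

```csharp
using System;
using System.Buffers.Binary;

// Test values from _codeWFObject3, shifted to the offsets in SystemProcessData.
ulong packed = 0;
packed |= 23UL;         // CPU             bits 0-9   (2.3%)
packed |= 1UL << 10;    // Memory          bits 10-19 (0.1%)
packed |= 1UL << 20;    // Disk            bits 20-29
packed |= 0UL << 30;    // Network         bits 30-39
packed |= 22UL << 40;   // GPU             bits 40-49 (2.2%)
packed |= 1UL << 50;    // GPUEngine       bit 50     (GPU 0 - 3D)
packed |= 1UL << 51;    // PowerUsage      bits 51-53 (low)
packed |= 0UL << 54;    // PowerUsageTrend bits 54-56 (very low)
packed |= 0UL << 57;    // Type            bit 57     (application)
packed |= 1UL << 58;    // Status          bits 58-59 (efficiency mode)

// Little-endian: bit n of `packed` lands in byte n / 8, bit n % 8.
var data = new byte[8];
BinaryPrimitives.WriteUInt64LittleEndian(data, packed);
Console.WriteLine(BitConverter.ToString(data)); // 17-04-10-00-00-16-0C-04

// Reading a field back is a shift and a mask, e.g. PowerUsage (3 bits at offset 51):
ulong powerUsage = (packed >> 51) & 0b111;
Console.WriteLine(powerUsage); // 1
```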
Add a unit test as follows:

```csharp
/// <summary>
/// Test data with field data types optimized to the limit
/// </summary>
private SystemProcess3 _codeWFObject3 = new SystemProcess3()
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    ProcessData = new SystemProcessData()
    {
        CPU = 23,
        Memory = 1,
        Disk = 1,
        Network = 0,
        GPU = 22,
        GPUEngine = 1,
        PowerUsage = 1,
        PowerUsageTrend = 0,
        Type = 0,
        Status = 1
    }
};

/// <summary>
/// Binary serialization test with bit fields
/// </summary>
[Fact]
public void Test_SerializeToBytes3_Success()
{
    var buffer = SerializeHelper.SerializeByNative(_codeWFObject3, 1);
    _testOutputHelper.WriteLine($"Serialized binary length: {buffer.Length}");
    var deserializeObj = SerializeHelper.DeserializeByNative<SystemProcess3>(buffer);
    Assert.Equal("码坊", deserializeObj.Name);
    Assert.Equal(23, deserializeObj.ProcessData!.CPU);
    Assert.Equal(1, deserializeObj.ProcessData!.PowerUsage);
}
```

Test output (standard output):

```text
Serialized binary length: 86
```
Going from 99 to 86 bytes saves 13 bytes per object, which matters on a constrained network: for 1 million objects that is 13,000,000 bytes, about 12.4 MiB. The bit field serialization and deserialization code is not explained in detail here; it is rather tedious, and the webmaster may not be able to explain it clearly. This is what it looks like:
```csharp
using System;
using System.Reflection;

// NetFieldOffsetAttribute (Offset and Size, both in bits) is defined elsewhere in the repository.
// Extension methods must live in a static class.
public static partial class SerializeHelper
{
    /// <summary>
    /// Packs every property marked with NetFieldOffset into a byte array (bit field write)
    /// </summary>
    public static byte[] FieldObjectBuffer<T>(this T obj) where T : class
    {
        var properties = typeof(T).GetProperties();
        var totalSize = 0;

        // Compute the total length in bits: the end of the last field
        foreach (var property in properties)
        {
            var offsetAttribute = property.GetCustomAttribute<NetFieldOffsetAttribute>();
            if (offsetAttribute == null)
            {
                continue;
            }

            totalSize = Math.Max(totalSize, offsetAttribute.Offset + offsetAttribute.Size);
        }

        // Round the bit count up to whole bytes
        var buffer = new byte[(totalSize + 7) / 8];
        foreach (var property in properties)
        {
            var offsetAttribute = property.GetCustomAttribute<NetFieldOffsetAttribute>();
            if (offsetAttribute == null)
            {
                continue;
            }

            // Field types are small integers (short, byte), so widen them to int
            var value = Convert.ToInt32(property.GetValue(obj));
            SetBitValue(buffer, value, offsetAttribute.Offset, offsetAttribute.Size);
        }

        return buffer;
    }

    /// <summary>
    /// Unpacks a byte array into a new T (bit field read)
    /// </summary>
    public static T ToFieldObject<T>(this byte[] buffer) where T : class, new()
    {
        var obj = new T();
        foreach (var property in typeof(T).GetProperties())
        {
            var offsetAttribute = property.GetCustomAttribute<NetFieldOffsetAttribute>();
            if (offsetAttribute == null)
            {
                continue;
            }

            var value = GetValueFromBit(buffer, offsetAttribute.Offset, offsetAttribute.Size,
                property.PropertyType);
            property.SetValue(obj, value);
        }

        return obj;
    }

    /// <summary>
    /// Writes the low bits of a value into the buffer, least significant bit first
    /// </summary>
    /// <param name="buffer">Target buffer</param>
    /// <param name="value">Value to write</param>
    /// <param name="offset">Start position, in bits</param>
    /// <param name="size">Field width, in bits</param>
    /// <remarks>Assumes the field spans at most two bytes (offset % 8 + size &lt;= 16),
    /// which holds for every field in SystemProcessData.</remarks>
    private static void SetBitValue(byte[] buffer, int value, int offset, int size)
    {
        var mask = (1 << size) - 1;
        buffer[offset / 8] |= (byte)((value & mask) << (offset % 8));
        if (offset % 8 + size > 8)
        {
            buffer[offset / 8 + 1] |= (byte)((value & mask) >> (8 - offset % 8));
        }
    }

    /// <summary>
    /// Reads a bit field from the buffer and converts it to the property type
    /// </summary>
    /// <param name="buffer">Source buffer</param>
    /// <param name="offset">Start position, in bits</param>
    /// <param name="size">Field width, in bits</param>
    /// <param name="propertyType">Type of the target property (short, byte, ...)</param>
    /// <returns>The field value, boxed as propertyType</returns>
    private static object GetValueFromBit(byte[] buffer, int offset, int size, Type propertyType)
    {
        var mask = (1 << size) - 1;
        var bitValue = (buffer[offset / 8] >> (offset % 8)) & mask;
        if (offset % 8 + size > 8)
        {
            bitValue |= (buffer[offset / 8 + 1] << (8 - offset % 8)) & mask;
        }

        // Convert according to the property type
        return Convert.ChangeType(bitValue, propertyType);
    }
}
```
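SetBitValue and GetValueFromBit handle a field that spans at most two bytes. That is enough for SystemProcessData (the tightest case is Network, bits 30 to 39, where offset % 8 + size = 16), but a field such as 10 bits at offset 7 would touch three bytes, and its top bit would be lost. A minimal sketch of a general alternative that walks the field bit by bit (a standalone top-level C# program; WriteBits and ReadBits are hypothetical names, not part of SerializeHelper):

```csharp
using System;

// Hypothetical helper: writes the low `size` bits of `value` starting at bit `offset`,
// least significant bit first, so a field may span any number of bytes.
static void WriteBits(byte[] buffer, int offset, int size, long value)
{
    for (int i = 0; i < size; i++)
    {
        int bit = offset + i;
        if (((value >> i) & 1) != 0)
            buffer[bit / 8] |= (byte)(1 << (bit % 8));   // set the bit
        else
            buffer[bit / 8] &= (byte)~(1 << (bit % 8));  // clear it, so reused buffers work
    }
}

// Hypothetical helper: reads `size` bits starting at bit `offset`, in the same order.
static long ReadBits(byte[] buffer, int offset, int size)
{
    long value = 0;
    for (int i = 0; i < size; i++)
    {
        int bit = offset + i;
        if ((buffer[bit / 8] & (1 << (bit % 8))) != 0)
            value |= 1L << i;
    }
    return value;
}

// A 10-bit field at offset 7 touches bytes 0, 1 and 2.
var bytes = new byte[3];
WriteBits(bytes, 7, 10, 1000);
long roundTrip = ReadBits(bytes, 7, 10);
Console.WriteLine(roundTrip); // 1000
```

The bit-by-bit loop is slower than the shift-and-mask version but has no span limit, and because it uses the same least-significant-bit-first order it stays compatible with buffers written by SetBitValue.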
3. Optimization effect and summary
Through step-by-step optimization, we reduced the packet from 366 bytes with JSON serialization to 152 bytes with plain binary serialization, then to 99 bytes by adjusting data types, and finally to 86 bytes with bit fields. These savings add up quickly in network transmission, especially when large amounts of data need to be sent.
This article used a concrete example to show how to optimize the binary serialization of C# objects. With bit fields, we compressed the packet size dramatically and improved network transmission efficiency, which is satisfying when developing client/server (C/S) programs and reflects the pursuit of maximum performance.
Finally, we provide a GitHub link to the test source code for this article for readers to refer and learn.
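Bonus: the repository also contains the sample code for the previous article, 《C#百万对象序列化深度剖析:如何在网络传输中实现速度与体积的完美平衡 (dotnet9.com)》 ("An in-depth analysis of serializing a million C# objects: how to balance speed and size in network transmission"), along with TCP and UDP server and client programs for joint debugging and testing.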