C# Object Binary Serialization Optimization: Achieving Extreme Compression with Bitfield Techniques

1. Introduction

In operating systems, process information is crucial for system monitoring and performance analysis. Suppose we need to develop a monitoring program that captures process information from the current operating system and efficiently transmits it to other endpoints (such as a server or monitoring console). The key challenge is converting the captured process objects into binary data while keeping each packet as small as possible. This article walks step by step through how bit-fields can be used to optimize the binary serialization of C# objects.

Operating system process information

First, we provide an example of field definitions for a process object. To transmit this object over the network (TCP/UDP), we need to convert it into binary format. The challenge lies in achieving the smallest possible packet size during this conversion.

Field Name	Description	Example
PID	Process ID	10565
Name	Process name	码坊
Publisher	Publisher	沙漠尽头的狼
CommandLine	Command line	`dotnet CodeWF.Tools.dll`
CPU	CPU (total processing utilization across all cores)	2.3%
Memory	Memory (physical memory used by the process)	0.1%
Disk	Disk (total utilization across all physical drives)	0.1 MB/s
Network	Network (network utilization on the primary network)	0 Mbps
GPU	GPU (highest utilization across all GPU engines)	2.2%
GPUEngine	GPU engine	GPU 0 - 3D
PowerUsage	Power usage (impact of CPU, disk, and GPU on power consumption)	Low
PowerUsageTrend	Power usage trend (impact of CPU, disk, and GPU over time)	Very low
Type	Process type	Application
Status	Process status	Efficiency mode

2. Optimization Process

2.1. Process Object Definition and Preliminary Analysis

We determined the data type for each field based on the example values.

Field Name	Data Type	Description	Example
PID	int	Process ID	10565
Name	string?	Process name	码坊
Publisher	string?	Publisher	沙漠尽头的狼
CommandLine	string?	Command line	`dotnet CodeWF.Tools.dll`
CPU	string?	CPU (total processing utilization across all cores)	2.3%
Memory	string?	Memory (physical memory used by the process)	0.1%
Disk	string?	Disk (total utilization across all physical drives)	0.1 MB/s
Network	string?	Network (network utilization on the primary network)	0 Mbps
GPU	string?	GPU (highest utilization across all GPU engines)	2.2%
GPUEngine	string?	GPU engine	GPU 0 - 3D
PowerUsage	string?	Power usage (impact of CPU, disk, and GPU on power consumption)	Low
PowerUsageTrend	string?	Power usage trend (impact of CPU, disk, and GPU over time)	Very low
Type	string?	Process type	Application
Status	string?	Process status	Efficiency mode

Create a C# class SystemProcess to represent process information:

public class SystemProcess
{
    public int PID { get; set; }
    public string? Name { get; set; }
    public string? Publisher { get; set; }
    public string? CommandLine { get; set; }
    public string? CPU { get; set; }
    public string? Memory { get; set; }
    public string? Disk { get; set; }
    public string? Network { get; set; }
    public string? GPU { get; set; }
    public string? GPUEngine { get; set; }
    public string? PowerUsage { get; set; }
    public string? PowerUsageTrend { get; set; }
    public string? Type { get; set; }
    public string? Status { get; set; }
}

Define the test data:

private SystemProcess _codeWFObject = new SystemProcess()
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    CPU = "2.3%",
    Memory = "0.1%",
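    // Localized (Chinese) display strings are kept as-is; they affect the byte counts measured below
    Disk = "0.1 MB/秒",          // "0.1 MB/s"
    Network = "0 Mbps",
    GPU = "2.2%",
    GPUEngine = "GPU 0 - 3D",
    PowerUsage = "低",           // "Low"
    PowerUsageTrend = "非常低",   // "Very low"
    Type = "应用",               // "Application"
    Status = "效率模式"           // "Efficiency mode"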
};

2.2. Excluding JSON Serialization

Converting the object to a JSON string is the most common approach in web development because it is concise and easy to handle on both the frontend and backend:

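using System.Text;
using System.Text.Json;
using Xunit;
using Xunit.Abstractions;

public class SystemProcessUnitTest
{
    private readonly ITestOutputHelper _testOutputHelper;

    // Same test data as defined earlier; initializer omitted here
    private SystemProcess _codeWFObject = new SystemProcess { /* ... */ };

    public SystemProcessUnitTest(ITestOutputHelper testOutputHelper)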
    {
        _testOutputHelper = testOutputHelper;
    }

    /// <summary>
    /// JSON serialization size test
    /// </summary>
    [Fact]
    public void Test_SerializeJsonData_Success()
    {
        var jsonData = JsonSerializer.Serialize(_codeWFObject);
        _testOutputHelper.WriteLine($"Json length: {jsonData.Length}");

        var jsonDataBytes = Encoding.UTF8.GetBytes(jsonData);
        _testOutputHelper.WriteLine($"JSON binary length: {jsonDataBytes.Length}");
    }
}

Standard Output: 
Json length: 366
JSON binary length: 366

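Although JSON is popular in web development for its simplicity, in raw TCP/UDP transmission it inflates the packet unnecessarily: every message repeats every field name. In addition, System.Text.Json escapes each non-ASCII character as a six-character \uXXXX sequence by default, which is why the string length and the UTF-8 byte count are both 366. We therefore rule out JSON and look for a more compact binary serialization method. The serialized JSON is shown below, indented for readability (the measured string is compact):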

{
  "PID": 10565,
  "Name": "\u7801\u754C\u5DE5\u574A",
  "Publisher": "\u6C99\u6F20\u5C3D\u5934\u7684\u72FC",
  "CommandLine": "dotnet CodeWF.Tools.dll",
  "CPU": "2.3%",
  "Memory": "0.1%",
  "Disk": "0.1 MB/\u79D2",
  "Network": "0 Mbps",
  "GPU": "2.2%",
  "GPUEngine": "GPU 0 - 3D",
  "PowerUsage": "\u4F4E",
  "PowerUsageTrend": "\u975E\u5E38\u4F4E",
  "Type": "\u5E94\u7528",
  "Status": "\u6548\u7387\u6A21\u5F0F"
}
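To see the escaping effect in isolation, the following illustrative check (not part of the original test project; JsonEscapingDemo is a made-up name) serializes one of the Chinese strings above with System.Text.Json's default options, which escape non-ASCII text, and again with an encoder that allows all Unicode ranges:

```csharp
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

public static class JsonEscapingDemo
{
    // Default options: JavaScriptEncoder.Default escapes non-ASCII characters as \uXXXX
    public static readonly JsonSerializerOptions Escaped = new();

    // Allowing all Unicode ranges keeps CJK characters as raw UTF-8 (3 bytes each)
    public static readonly JsonSerializerOptions Unescaped = new()
    {
        Encoder = JavaScriptEncoder.Create(UnicodeRanges.All)
    };

    // Returns the JSON string length in chars and its UTF-8 size in bytes
    public static (int Chars, int Bytes) Measure<T>(T value, JsonSerializerOptions options)
    {
        var json = JsonSerializer.Serialize(value, options);
        return (json.Length, Encoding.UTF8.GetByteCount(json));
    }
}
```

For `new { Name = "沙漠尽头的狼" }` the default options give 47 characters and 47 bytes, while the relaxed encoder gives 17 characters and 29 bytes. Each Chinese character drops from 6 bytes to 3, but every message still repeats its field names, so JSON remains larger than a positional binary layout.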

2.3. Binary Serialization Using BinaryWriter

We use the binary serialization helper class SerializeHelper from the site owner's previous article. It uses BinaryWriter to convert objects to binary data and BinaryReader to deserialize them.

First, we make the SystemProcess class implement an empty marker interface, INetObject, and decorate it with the NetHead attribute (NetHeadAttribute). The attribute defines a packet header, so that the receiver can tell which network object a packet carries when several kinds of objects are deserialized. The header costs a few extra bytes per packet, mainly for the system ID, the network object ID, the object version number, and other auxiliary serialization fields.

/// <summary>
/// Network object serialization interface
/// </summary>
public interface INetObject
{
}

[NetHead(1, 1)]
public class SystemProcess : INetObject
{
    // Field definitions omitted
}

Then, we write a test method to verify the correctness of serialization and deserialization, and print the length of the serialized binary data.

/// <summary>
/// Binary serialization test
/// </summary>
[Fact]
public void Test_SerializeToBytes_Success()
{
    var buffer = SerializeHelper.SerializeByNative(_codeWFObject, 1);
    _testOutputHelper.WriteLine($"Binary length after serialization: {buffer.Length}");

    var deserializeObj = SerializeHelper.DeserializeByNative<SystemProcess>(buffer);
    Assert.Equal("码坊", deserializeObj.Name);
}

Standard Output: 
Binary length after serialization: 152

This is less than half the size of the JSON payload (152 vs. 366 bytes, about 58% smaller), even though the binary form also carries a few extra header fields. The unit test above also verifies that the data is correct after deserialization. We will continue optimizing from this foundation.
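SerializeHelper's source is not reproduced in this article. As a rough illustration of why a BinaryWriter-based format beats JSON, here is a minimal hypothetical sketch. BinaryWriterSketch and its field order are assumptions and not the site owner's implementation. Values are written in a fixed order with no field names, and BinaryWriter.Write(string) prefixes each string with its UTF-8 byte count as a 7-bit variable-length integer, so a short string costs only one length byte:

```csharp
using System.IO;
using System.Text;

public static class BinaryWriterSketch
{
    // Hypothetical layout: PID first, then each string field in a fixed order, no field names
    public static byte[] Write(int pid, params string[] fields)
    {
        using var stream = new MemoryStream();
        using (var writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
        {
            writer.Write(pid); // 4 bytes, little-endian
            foreach (var field in fields)
            {
                // 7-bit encoded byte-count prefix + UTF-8 bytes; null is sent as "" in this sketch
                writer.Write(field ?? string.Empty);
            }
        }

        return stream.ToArray();
    }

    // The reader must use the same field order and count as the writer
    public static (int Pid, string[] Fields) Read(byte[] buffer, int fieldCount)
    {
        using var reader = new BinaryReader(new MemoryStream(buffer), Encoding.UTF8);
        var pid = reader.ReadInt32();
        var fields = new string[fieldCount];
        for (var i = 0; i < fieldCount; i++)
        {
            fields[i] = reader.ReadString();
        }

        return (pid, fields);
    }
}
```

For example, `Write(10565, "码坊", "dotnet CodeWF.Tools.dll")` produces 35 bytes: 4 for the PID, 1 + 6 for the Chinese name, and 1 + 23 for the command line. Both sides must agree on the field order, which is the price of dropping field names; a header like the one NetHead defines is what tells the receiver which layout to expect.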

2.4. Data Type Adjustments

To shrink the binary data further, we adjusted the data types. Analysis of the example process data shows that several fields can be represented more compactly. For example, CPU utilization can be sent as just the number (e.g., 2.3) without the percent sign, and the process type can be sent as an enum value instead of a display string. Such adjustments reduce packet size.

Field Name	Data Type	Description	Example
PID	int	Process ID	10565
Name	string?	Process name	码坊
Publisher	string?	Publisher	沙漠尽头的狼
CommandLine	string?	Command line	`dotnet CodeWF.Tools.dll`
CPU	float	CPU (total processing utilization across all cores)	2.3
Memory	float	Memory (physical memory used by the process)	0.1
Disk	float	Disk (total utilization across all physical drives)	0.1
Network	float	Network (network utilization on the primary network)	0
GPU	float	GPU (highest utilization across all GPU engines)	2.2
GPUEngine	byte	GPU engine, 0: None, 1: GPU 0 - 3D	1
PowerUsage	byte	Power usage (impact of CPU, disk, and GPU), 0: Very low, 1: Low, 2: Medium, 3: High, 4: Very high	1
PowerUsageTrend	byte	Power usage trend (impact over time), 0: Very low, 1: Low, 2: Medium, 3: High, 4: Very high	0
Type	byte	Process type, 0: Application, 1: Background process	0
Status	byte	Process status, 0: Normal, 1: Efficiency mode, 2: Suspended	1
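One way to express these byte-sized fields in code is as byte-backed enums, with the numeric fields parsed from the display strings on the capturing side. The enum names and the ParsePercent helper below are hypothetical; the article itself simply stores the raw byte values:

```csharp
using System.Globalization;

// Hypothetical byte-backed enums mirroring the value tables above (1 byte each on the wire)
public enum PowerLevel : byte { VeryLow = 0, Low = 1, Medium = 2, High = 3, VeryHigh = 4 }

public enum ProcessType : byte { Application = 0, BackgroundProcess = 1 }

public enum ProcessStatus : byte { Normal = 0, EfficiencyMode = 1, Suspended = 2 }

public static class FieldParsing
{
    // "2.3%" -> 2.3f: drop the percent sign and parse with the invariant culture,
    // so a decimal comma in the local culture cannot break parsing
    public static float ParsePercent(string text) =>
        float.Parse(text.Trim().TrimEnd('%'), NumberStyles.Float, CultureInfo.InvariantCulture);
}
```

PowerUsage and PowerUsageTrend share the same five-level scale, so a single enum can serve both.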

Define a new class, SystemProcess2, with the adjusted types, along with matching test data:

[NetHead(1, 2)]
public class SystemProcess2 : INetObject
{
    public int PID { get; set; }
    public string? Name { get; set; }
    public string? Publisher { get; set; }
    public string? CommandLine { get; set; }
    public float CPU { get; set; }
    public float Memory { get; set; }
    public float Disk { get; set; }
    public float Network { get; set; }
    public float GPU { get; set; }
    public byte GPUEngine { get; set; }
    public byte PowerUsage { get; set; }
    public byte PowerUsageTrend { get; set; }
    public byte Type { get; set; }
    public byte Status { get; set; }
}

/// <summary>
/// Test data using the adjusted (non-string) field types
/// </summary>
private SystemProcess2 _codeWFObject2 = new SystemProcess2()
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    CPU = 2.3f,
    Memory = 0.1f,
    Disk = 0.1f,
    Network = 0,
    GPU = 2.2f,
    GPUEngine = 1,
    PowerUsage = 1,
    PowerUsageTrend = 0,
    Type = 0,
    Status = 1
};

Add a unit test as follows:

/// <summary>
/// Binary serialization test
/// </summary>
[Fact]
public void Test_SerializeToBytes2_Success()
{
    var buffer = SerializeHelper.SerializeByNative(_codeWFObject2, 1);
    _testOutputHelper.WriteLine($"Binary length after serialization: {buffer.Length}");

    var deserializeObj = SerializeHelper.DeserializeByNative<SystemProcess2>(buffer);
    Assert.Equal("码坊", deserializeObj.Name);
    Assert.Equal(2.2f, deserializeObj.GPU);
}

Test result:

Standard Output: 
Binary length after serialization: 99

Packet size is further reduced by about one-third, from 152 bytes to 99 bytes, thanks to adjusting some field data types from string? to float or byte.

2.5. Further Data Type Adjustments and Bit-Field Optimization

Going a step further, we introduce bit-fields. A bit-field stores each value in exactly as many bits as its range requires instead of in whole bytes, which shrinks the binary data further. We redefined the field rules and packed the enum-like fields and the scaled numeric fields into bit-fields, which significantly reduces packet size.

Comparing the previous table with the ones below, the adjustments fall into two kinds of data type changes, following these rules:

First type: Some fields are just enum values stored in a byte (8 bits) but need far fewer bits. The process type has only two states (0: Application, 1: Background process), so 1 bit (0 or 1) is enough. Power usage has only 5 states, which fit in 3 bits (3 bits can represent 8 states).
Second type: Some float fields only need one decimal place of precision in practice. CPU, memory, and GPU are percentages, so they never exceed 100.0. We can therefore scale them to integers: 23.3% is sent as 23.3 × 10 = 233, and the maximum is 1000 (100.0%). The receiving side divides by 10 to recover the value. Ten bits can hold values from 0 to 1023, so each of these fields shrinks from a 4-byte (32-bit) float to 10 bits. Disk and Network use the same 10-bit encoding, which assumes their readings stay at or below 102.3 (in MB/s and Mbps respectively).
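Both rules reduce to small arithmetic. The sketch below uses hypothetical helper names and assumes one decimal place and non-negative values. It computes the number of bits needed for a given number of states, and the scaled 10-bit encoding of a one-decimal value:

```csharp
using System;

public static class BitFieldMath
{
    // Smallest number of bits that can hold `states` distinct values (0 .. states - 1)
    public static int BitsFor(int states)
    {
        var bits = 0;
        while ((1 << bits) < states)
        {
            bits++;
        }

        return bits;
    }

    // Scales a one-decimal value (e.g. 23.3) to tenths (233) and clamps it to what `bits` can hold
    public static short ToTenths(double value, int bits = 10)
    {
        var max = (1 << bits) - 1; // 10 bits -> 1023
        var scaled = (int)Math.Round(value * 10, MidpointRounding.AwayFromZero);
        return (short)Math.Clamp(scaled, 0, max);
    }

    // Reverses ToTenths on the receiving side: 233 -> 23.3
    public static double FromTenths(int tenths) => tenths / 10.0;
}
```

BitsFor(2), BitsFor(3), and BitsFor(5) give 1, 2, and 3 bits (Type, Status, and PowerUsage), and BitsFor(1001) gives 10 for the scaled values 0 to 1000. Clamping is a choice of this sketch. Without it, an out-of-range value would be cut down to its low 10 bits by the bit-field writer's mask and decode to a wrong number.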

According to this rule, we redefine the field rules as follows:

Field Name	Data Type	Description	Example
PID	int	Process ID	10565
Name	string?	Process name	码坊
Publisher	string?	Publisher	沙漠尽头的狼
CommandLine	string?	Command line	`dotnet CodeWF.Tools.dll`
Data	byte[8]	Fixed-size fields packed as bit-fields (why 8 bytes is explained below). Note: during serialization an additional 4 bytes are written for the byte[] length, so the Data field occupies 12 bytes in total

Detailed description of the fixed fields (Data):

Field Name	Offset	Size	Description	Example
CPU	0	10	CPU (total processing utilization across all cores), stored in tenths of a percent, e.g., 23 means 2.3%	23
Memory	10	10	Memory (physical memory used by the process), stored in tenths of a percent, e.g., 1 means 0.1%; value computed from basic info	1
Disk	20	10	Disk (total utilization across all physical drives), stored in tenths, e.g., 1 means 0.1 MB/s	1
Network	30	10	Network (network utilization on the primary network), stored in tenths, e.g., 253 means 25.3 Mbps	0
GPU	40	10	GPU (highest utilization across all GPU engines), stored in tenths of a percent, e.g., 253 means 25.3%	22
GPUEngine	50	1	GPU engine, 0: None, 1: GPU 0 - 3D	1
PowerUsage	51	3	Power usage (impact of CPU, disk, and GPU), 0: Very low, 1: Low, 2: Medium, 3: High, 4: Very high	1
PowerUsageTrend	54	3	Power usage trend (impact over time), 0: Very low, 1: Low, 2: Medium, 3: High, 4: Very high	0
Type	57	1	Process type, 0: Application, 1: Background process	0
Status	58	2	Process status, 0: Normal, 1: Efficiency mode, 2: Suspended	1

The table above defines the bit-field layout of the fixed fields. Offset is the field's starting position within the Data byte array, and Size is the number of bits the field occupies. Both are measured in bits. For example, the Memory field occupies bits 10 through 19 of Data.

Thus, the 10 fixed-size fields shrink from 25 bytes to 8 bytes. The 5 float fields go from 4 bytes (32 bits) to 10 bits each, and the 5 single-byte fields go from 8 bits to 1, 3, 3, 1, and 2 bits respectively. In total, 200 bits (25 × 8) become 60 bits, which are rounded up to 64 bits (8 bytes) because the byte is the smallest unit sent over the network. With the 4-byte length prefix, Data takes 12 bytes in the packet, 13 bytes fewer than the 25 bytes used by SystemProcess2.
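A layout like this is easy to get wrong by hand, so it can help to check it mechanically. The following hypothetical sketch copies the offsets and sizes from the table instead of reading NetFieldOffset attributes. It confirms that the fields do not overlap and need 60 bits (8 bytes), and it reports which bytes each field touches:

```csharp
using System.Linq;

public static class BitLayoutCheck
{
    // (name, offset, size) in bits, copied from the Data layout table
    public static readonly (string Name, int Offset, int Size)[] Fields =
    {
        ("CPU", 0, 10), ("Memory", 10, 10), ("Disk", 20, 10), ("Network", 30, 10),
        ("GPU", 40, 10), ("GPUEngine", 50, 1), ("PowerUsage", 51, 3),
        ("PowerUsageTrend", 54, 3), ("Type", 57, 1), ("Status", 58, 2),
    };

    // Total bits = the furthest field end; bytes = bits rounded up to whole bytes
    public static int TotalBits() => Fields.Max(f => f.Offset + f.Size);

    public static int TotalBytes() => (TotalBits() + 7) / 8;

    // True if no two fields share a bit
    public static bool HasNoOverlap()
    {
        var ordered = Fields.OrderBy(f => f.Offset).ToArray();
        for (var i = 1; i < ordered.Length; i++)
        {
            if (ordered[i - 1].Offset + ordered[i - 1].Size > ordered[i].Offset)
            {
                return false;
            }
        }

        return true;
    }

    // Indexes of the first and last byte a field touches, e.g. Memory (10, 10) -> (1, 2)
    public static (int First, int Last) ByteSpan(int offset, int size) =>
        (offset / 8, (offset + size - 1) / 8);
}
```

Every field in this layout touches at most two bytes, but a 10-bit field starting at bit 7 would touch three (ByteSpan(7, 10) returns (0, 2)). A bit-field writer that only handles two adjacent bytes therefore works for this layout but not for every layout.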

Modify the class definition as follows; pay attention to the comments in the code:

[NetHead(1, 3)]
public class SystemProcess3 : INetObject
{
    public int PID { get; set; }
    public string? Name { get; set; }
    public string? Publisher { get; set; }
    public string? CommandLine { get; set; }
    private byte[]? _data;
    /// <summary>
    /// Serialization: this is the actual data that needs to be serialized
    /// </summary>
    public byte[]? Data
    {
        get => _data;
        set
        {
            _data = value;

            // Key: convert byte array to object during deserialization for program use (bit-field operations)
            _processData = _data?.ToFieldObject<SystemProcessData>();
        }
    }

    private SystemProcessData? _processData;

    /// <summary>
    /// Process data: adding NetIgnoreMember will ignore this field during serialization
    /// </summary>
    [NetIgnoreMember]
    public SystemProcessData? ProcessData
    {
        get => _processData;
        set
        {
            _processData = value;

            // Key: convert object to byte array (bit-field serialization)
            _data = _processData?.FieldObjectBuffer();
        }
    }
}

public record SystemProcessData
{
    [NetFieldOffset(0, 10)] public short CPU { get; set; }
    [NetFieldOffset(10, 10)] public short Memory { get; set; }
    [NetFieldOffset(20, 10)] public short Disk { get; set; }
    [NetFieldOffset(30, 10)] public short Network { get; set; }
    [NetFieldOffset(40, 10)] public short GPU { get; set; }
    [NetFieldOffset(50, 1)] public byte GPUEngine { get; set; }
    [NetFieldOffset(51, 3)] public byte PowerUsage { get; set; }
    [NetFieldOffset(54, 3)] public byte PowerUsageTrend { get; set; }
    [NetFieldOffset(57, 1)] public byte Type { get; set; }
    [NetFieldOffset(58, 2)] public byte Status { get; set; }
}

Add the test data and a unit test as follows:

/// <summary>
/// Test data using the bit-field packed layout
/// </summary>
private SystemProcess3 _codeWFObject3 = new SystemProcess3()
{
    PID = 10565,
    Name = "码坊",
    Publisher = "沙漠尽头的狼",
    CommandLine = "dotnet CodeWF.Tools.dll",
    ProcessData = new SystemProcessData()
    {
        CPU = 23,
        Memory = 1,
        Disk = 1,
        Network = 0,
        GPU = 22,
        GPUEngine = 1,
        PowerUsage = 1,
        PowerUsageTrend = 0,
        Type = 0,
        Status = 1
    }
};

/// <summary>
/// Extreme binary serialization test
/// </summary>
[Fact]
public void Test_SerializeToBytes3_Success()
{
    var buffer = SerializeHelper.SerializeByNative(_codeWFObject3, 1);
    _testOutputHelper.WriteLine($"Binary length after serialization: {buffer.Length}");

    var deserializeObj = SerializeHelper.DeserializeByNative<SystemProcess3>(buffer);
    Assert.Equal("码坊", deserializeObj.Name);
    Assert.Equal(23, deserializeObj.ProcessData.CPU);
    Assert.Equal(1, deserializeObj.ProcessData.PowerUsage);
}

Test output:

Standard Output: 
Binary length after serialization: 86

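The packet shrinks from 99 to 86 bytes. That is a saving of 13 bytes per packet, which matters in bandwidth-constrained network environments: across 1 million packets, it adds up to 13 million bytes, about 12.4 MB. The bit-field serialization and deserialization code is fairly dry, so it is not explained line by line here. In short, for each property marked with [NetFieldOffset], it writes or reads the value's low Size bits starting at bit Offset, least significant bit first. The code looks like this: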

using System;
using System.Reflection;

// Extension methods must live in a static class
public static partial class SerializeHelper
{
    /// <summary>
    /// Pack every [NetFieldOffset] property of obj into a byte array (bit-field serialization)
    /// </summary>
    public static byte[] FieldObjectBuffer<T>(this T obj) where T : class
    {
        var properties = typeof(T).GetProperties();
        var totalSize = 0;

        // Calculate total bit length: the furthest end (Offset + Size) of any field
        foreach (var property in properties)
        {
            var offsetAttribute = property.GetCustomAttribute<NetFieldOffsetAttribute>();
            if (offsetAttribute == null)
            {
                continue;
            }

            totalSize = Math.Max(totalSize, offsetAttribute.Offset + offsetAttribute.Size);
        }

        // Round up to whole bytes, e.g. 60 bits -> 8 bytes
        var buffer = new byte[(totalSize + 7) / 8];

        foreach (var property in properties)
        {
            var offsetAttribute = property.GetCustomAttribute<NetFieldOffsetAttribute>();
            if (offsetAttribute == null)
            {
                continue;
            }

            // Widen byte/short/int property values to int before packing
            var value = Convert.ToInt32(property.GetValue(obj));
            SetBitValue(buffer, value, offsetAttribute.Offset, offsetAttribute.Size);
        }

        return buffer;
    }

    /// <summary>
    /// Unpack a byte array produced by FieldObjectBuffer back into a new T instance
    /// </summary>
    public static T ToFieldObject<T>(this byte[] buffer) where T : class, new()
    {
        var obj = new T();

        foreach (var property in typeof(T).GetProperties())
        {
            var offsetAttribute = property.GetCustomAttribute<NetFieldOffsetAttribute>();
            if (offsetAttribute == null)
            {
                continue;
            }

            var value = GetValueFromBit(buffer, offsetAttribute.Offset, offsetAttribute.Size,
                property.PropertyType);
            property.SetValue(obj, value);
        }

        return obj;
    }

    /// <summary>
    /// Write the low <paramref name="size"/> bits of value into buffer starting at bit
    /// <paramref name="offset"/>, least significant bit first. Unlike a two-byte version,
    /// this also handles fields that span three or more bytes (size up to 31).
    /// </summary>
    private static void SetBitValue(byte[] buffer, int value, int offset, int size)
    {
        for (var i = 0; i < size; i++)
        {
            if (((value >> i) & 1) == 0)
            {
                continue;
            }

            var bitIndex = offset + i;
            buffer[bitIndex / 8] |= (byte)(1 << (bitIndex % 8));
        }
    }

    /// <summary>
    /// Read <paramref name="size"/> bits starting at bit <paramref name="offset"/> and
    /// convert them to the property's type (byte, short, int, ...)
    /// </summary>
    private static object GetValueFromBit(byte[] buffer, int offset, int size, Type propertyType)
    {
        var bitValue = 0;
        for (var i = 0; i < size; i++)
        {
            var bitIndex = offset + i;
            if ((buffer[bitIndex / 8] & (1 << (bitIndex % 8))) != 0)
            {
                bitValue |= 1 << i;
            }
        }

        return Convert.ChangeType(bitValue, propertyType);
    }
}
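Because the whole layout fits in 60 bits, a reflection-free alternative is to pack everything into a single ulong and write it little-endian. This is a different technique from the reflection-based helper above. It is shown as a hypothetical sketch, and the PackedProcessData type is made up. With least-significant-bit-first ordering, it should produce the same 8 bytes as the helper for this layout:

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical value type with the same bit layout as SystemProcessData (not part of SerializeHelper)
public readonly record struct PackedProcessData(
    short Cpu, short Memory, short Disk, short Network, short Gpu,
    byte GpuEngine, byte PowerUsage, byte PowerUsageTrend, byte ProcessType, byte Status)
{
    // Keep the low `size` bits of value and move them to bit `offset` (LSB-first)
    private static ulong Put(long value, int offset, int size) =>
        ((ulong)value & ((1UL << size) - 1)) << offset;

    // Extract `size` bits starting at bit `offset`
    private static int Take(ulong bits, int offset, int size) =>
        (int)((bits >> offset) & ((1UL << size) - 1));

    public byte[] ToBytes()
    {
        var bits = Put(Cpu, 0, 10) | Put(Memory, 10, 10) | Put(Disk, 20, 10)
                 | Put(Network, 30, 10) | Put(Gpu, 40, 10) | Put(GpuEngine, 50, 1)
                 | Put(PowerUsage, 51, 3) | Put(PowerUsageTrend, 54, 3)
                 | Put(ProcessType, 57, 1) | Put(Status, 58, 2);

        // Little-endian: bits 0-7 land in buffer[0], bits 8-15 in buffer[1], and so on
        var buffer = new byte[8];
        BinaryPrimitives.WriteUInt64LittleEndian(buffer, bits);
        return buffer;
    }

    public static PackedProcessData FromBytes(ReadOnlySpan<byte> buffer)
    {
        var bits = BinaryPrimitives.ReadUInt64LittleEndian(buffer);
        return new PackedProcessData(
            (short)Take(bits, 0, 10), (short)Take(bits, 10, 10), (short)Take(bits, 20, 10),
            (short)Take(bits, 30, 10), (short)Take(bits, 40, 10),
            (byte)Take(bits, 50, 1), (byte)Take(bits, 51, 3), (byte)Take(bits, 54, 3),
            (byte)Take(bits, 57, 1), (byte)Take(bits, 58, 2));
    }
}
```

For the sample values in the unit test (CPU = 23, Memory = 1, Disk = 1, Network = 0, GPU = 22, GPUEngine = 1, PowerUsage = 1, PowerUsageTrend = 0, Type = 0, Status = 1), this yields the bytes 17 04 10 00 00 16 0C 04 (hex). The trade-off is that this version avoids reflection but hard-codes the offsets. The attribute-driven helper is more convenient when layouts change often.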

3. Optimization Results and Summary

Through step-by-step optimization, we reduced the packet from 366 bytes with JSON serialization to 152 bytes with ordinary binary serialization. Adjusting the data types brought it down to 99 bytes, and bit-fields brought it to 86 bytes, less than a quarter of the JSON size. This matters for network transmission, especially when large amounts of data need to be transferred.

This article explored how to optimize the binary serialization of C# objects through a worked example. Bit-fields compress the packet size substantially and improve network transmission efficiency. This is especially valuable in client/server (C/S) applications that aim for maximum performance.

Finally, we provide the GitHub link to the test source code for readers' reference and study.

https://github.com/dotnet9/CsharpSocketTest

Bonus: The repository also contains the case code from the previous article "C# Million Object Serialization Deep Analysis: How to Achieve a Perfect Balance Between Speed and Volume in Network Transmission", as well as TCP/UDP server and client debugging programs.
