[.NET] Comparing Serialization Techniques with Ten Years of Stock Prices

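1. Introduction

My previous company worked with stocks. Back then we quite recklessly read ten years of prices for a stock (plus assorted indicators) straight from the server and displayed them on a chart in the client. Because it was a desktop client, the data was transferred in an equally crude way, using SOAP serialization. The quote interface looked roughly like this: it takes three parameters, symbol, beginDate and endDate, and returns the prices of a stock over that period: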

public IEnumerable<StockPrice> LoadStockPrices(string symbol, DateTime beginDate, DateTime endDate)
{
    //some code
}

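Later we built a mobile client with Xamarin.Forms, and on a phone we no longer dared to be so careless. Mobile clients are sensitive to data usage, and displaying that much data is not practical either, so requests for such long periods were disallowed and choosing a new serialization format was put on the agenda. I was about to leave the company at the time, though, so I did not follow up on it.
Last week I came across the article 【开源】C#.NET股票历史数据采集,【附18年历史数据和源代码】 ("[Open source] C#/.NET stock history data collection, with 18 years of historical data and source code"), and on a whim I decided to implement the old requirement with the commonly used serialization techniques.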

2. Data structure

[Serializable]
[ProtoContract]
[DataContract]
public class StockPrice
{
    [ProtoMember(1)]
    [DataMember]
    public double ClosePrice { get; set; }

    [ProtoMember(2)]
    [DataMember]
    public DateTime Date { get; set; }

    [ProtoMember(3)]
    [DataMember]
    public double HighPrice { get; set; }

    [ProtoMember(4)]
    [DataMember]
    public double LowPrice { get; set; }

    [ProtoMember(5)]
    [DataMember]
    public double OpenPrice { get; set; }

    [ProtoMember(6)]
    [DataMember]
    public double PrvClosePrice { get; set; }

    [ProtoMember(7)]
    [DataMember]
    public string Symbol { get; set; }

    [ProtoMember(8)]
    [DataMember]
    public double Turnover { get; set; }

    [ProtoMember(9)]
    [DataMember]
    public double Volume { get; set; }
}

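Above is the data structure for a stock price. It contains the stock symbol, the date, OHLC (open, high, low and close prices), the previous close price (PrvClosePrice), the turnover (Turnover) and the volume (Volume). The attributes required by the serializers have already been added.

The test data is ten years of prices for 長和 (00001) starting from 2003, 2,717 records in total. To make testing easier they have been exported from the database to a text file, which is only about 200 KB.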


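3. Serialization techniques

Serialization in .NET touches on many concerns, such as network transport, security and .NET Remoting remote objects. Here we consider only serialization itself.

3.1 Binary serialization

Binary serialization converts an object's public and private fields, together with the name of the class (including the assembly that contains it), into a byte stream. Deserializing that stream creates an exact clone of the original object. Apart from the types that .NET already marks as serializable, the simplest way to make a type serializable is to mark it with SerializableAttribute.

In .NET, binary serialization is implemented by BinaryFormatter:

public override byte[] Serialize(List<StockPrice> instance)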
{
    using (var stream = new MemoryStream())
    {
        IFormatter formatter = new BinaryFormatter();
        formatter.Serialize(stream, instance);
        return stream.ToArray();
    }
}


public override List<StockPrice> Deserialize(byte[] source)
{
    using (var stream = new MemoryStream(source))
    {
        IFormatter formatter = new BinaryFormatter();
        var target = formatter.Deserialize(stream);
        return target as List<StockPrice>;
    }
}

Result:

Name              Serialize(ms)  Deserialize(ms)  Bytes
BinarySerializer  117            12               242,460

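3.2 XML

XML serialization converts an object's public fields and properties, or the parameters and return values of methods, into an XML stream that conforms to a specific XML Schema definition language (XSD) document. Because XML is an open standard, the XML stream can be processed by any application as needed, regardless of platform.

In .NET, XML serialization can be performed with XmlSerializer:

public override byte[] Serialize(List<StockPrice> instance)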
{
    using (var stream = new MemoryStream())
    {
        var serializer = new System.Xml.Serialization.XmlSerializer(typeof(List<StockPrice>));
        serializer.Serialize(stream, instance);
        return stream.ToArray();
    }
}

public override List<StockPrice> Deserialize(byte[] source)
{
    using (var stream = new MemoryStream(source))
    {
        var serializer = new System.Xml.Serialization.XmlSerializer(typeof(List<StockPrice>));
        var target = serializer.Deserialize(stream);
        return target as List<StockPrice>;
    }
}

The result is below. XML carries redundant text for the sake of readability, so the output is much larger:

Name           Serialize(ms)  Deserialize(ms)  Bytes
XmlSerializer  133            26               922,900

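3.3 SOAP

XML serialization can also be used to serialize objects into XML streams that conform to the SOAP specification. SOAP is an XML-based protocol designed specifically for transporting procedure calls using XML; anyone familiar with WCF will have come across it.

In .NET, SoapFormatter implements this serialization. SoapFormatter does not support generic types, so the list is converted to an array before serializing and back to a list after deserializing:

public override byte[] Serialize(List<StockPrice> instance)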
{
    using (var stream = new MemoryStream())
    {
        IFormatter formatter = new SoapFormatter();
        formatter.Serialize(stream, instance.ToArray());
        return stream.ToArray();
    }
}

public override List<StockPrice> Deserialize(byte[] source)
{
    using (var stream = new MemoryStream(source))
    {
        IFormatter formatter = new SoapFormatter();
        var target = formatter.Deserialize(stream);
        return (target as StockPrice[]).ToList();
    }
}

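The result is below. Because of the nature of the format, the size balloons even more alarmingly (if I remember correctly, WCF uses SOAP by default?):

Name            Serialize(ms)  Deserialize(ms)  Bytes
SoapSerializer  105            123              2,858,416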

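3.4 JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format conceived and designed by Douglas Crockford. It is based on human-readable text and is used to transmit data objects made up of attribute-value pairs and sequences of values.

.NET provides DataContractJsonSerializer, although Json.NET is the more popular choice. The code below uses DataContractJsonSerializer:

public override byte[] Serialize(List<StockPrice> instance)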
{
    using (var stream = new MemoryStream())
    {
        var serializer = new DataContractJsonSerializer(typeof(List<StockPrice>));
        serializer.WriteObject(stream, instance);
        return stream.ToArray();
    }
}

public override List<StockPrice> Deserialize(byte[] source)
{
    using (var stream = new MemoryStream(source))
    {
        var serializer = new DataContractJsonSerializer(typeof(List<StockPrice>));
        var target = serializer.ReadObject(stream);
        return target as List<StockPrice>;
    }
}

The result is below. JSON is much smaller than XML:

Name            Serialize(ms)  Deserialize(ms)  Bytes
JsonSerializer  40             60               504,320

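3.5 Protobuf

In fact, my colleagues and I knew from the start that Protobuf would be the best choice.

Protocol Buffers is a data serialization mechanism from Google. It is fast and produces compact output, but it achieves this with a binary encoding, so the serialized data is not human-readable.

With protobuf-net, the objects to be serialized must be marked with ProtoContractAttribute and ProtoMemberAttribute. The serialization and deserialization code is as follows:

public override byte[] Serialize(List<StockPrice> instance)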
{
    using (var stream = new MemoryStream())
    {
        Serializer.Serialize(stream, instance);
        return stream.ToArray();
    }
}

public override List<StockPrice> Deserialize(byte[] source)
{
    using (var stream = new MemoryStream(source))
    {
        return Serializer.Deserialize<List<StockPrice>>(stream);
    }
}

The result is excellent:

Name                Serialize(ms)  Deserialize(ms)  Bytes
ProtobufSerializer  93             18               211,926

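3.6 Comparison

Name                Serialize(ms)  Deserialize(ms)  Bytes
BinarySerializer    117            12               242,460
XmlSerializer       133            26               922,900
SoapSerializer      105            123              2,858,416
JsonSerializer      40             60               504,320
ProtobufSerializer  93             18               211,926

Listing the results side by side, Protobuf produces the smallest output. Even so, the Protobuf output (211,926 bytes) is still larger than the 200 KB text file, so we might as well transfer the text file directly.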
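The article does not show how the Serialize(ms), Deserialize(ms) and Bytes columns were collected. The following is only a sketch of one plausible way to do it with Stopwatch; BenchmarkResult, SerializerBenchmark and Measure are hypothetical names that do not come from the original project, and a trivial UTF-8 line serializer stands in for the real ones.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

// Hypothetical result record; the names are illustrative only.
public sealed class BenchmarkResult
{
    public string Name;
    public long SerializeMs;
    public long DeserializeMs;
    public int Bytes;
}

public static class SerializerBenchmark
{
    // Times one serialize call and one deserialize call and records the payload size.
    public static BenchmarkResult Measure<T>(string name, T data,
        Func<T, byte[]> serialize, Func<byte[], T> deserialize)
    {
        var watch = Stopwatch.StartNew();
        byte[] payload = serialize(data);
        watch.Stop();
        long serializeMs = watch.ElapsedMilliseconds;

        watch.Restart();
        deserialize(payload);
        watch.Stop();

        return new BenchmarkResult
        {
            Name = name,
            SerializeMs = serializeMs,
            DeserializeMs = watch.ElapsedMilliseconds,
            Bytes = payload.Length
        };
    }

    public static void Main()
    {
        // Stand-in serializer: joins strings with '\n' and encodes them as UTF-8.
        var data = new List<string> { "95.05", "102.4" };
        var result = Measure("Utf8Lines", data,
            list => Encoding.UTF8.GetBytes(string.Join("\n", list)),
            bytes => new List<string>(Encoding.UTF8.GetString(bytes).Split('\n')));

        Console.WriteLine("{0}: {1} ms / {2} ms / {3:N0} bytes",
            result.Name, result.SerializeMs, result.DeserializeMs, result.Bytes);
    }
}
```

A single run like this also includes JIT warm-up, so averaging several runs would likely give steadier timings than a one-shot measurement.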

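4. Optimizing the data structure

There is actually a lot of room to optimize the data structure being transferred.

First, the stock symbol, Symbol. As mentioned earlier, the interface for fetching prices looks roughly like this: IEnumerable<StockPrice> LoadStockPrices(string symbol, DateTime beginDate, DateTime endDate). Since the caller already knows which symbol it asked for, the Symbol property of StockPrice is entirely redundant.

Second, OHLC and PrvClosePrice. Quotes for Hong Kong stocks (I do not remember whether other markets work the same way) always have four significant digits (for example 95.05 and 102.4), so float, with roughly seven significant decimal digits, is precise enough and double is unnecessary.

Finally, Date. Only the date is needed, not the time of day, so storing the number of days since 1970-01-01 should be enough:

private static readonly DateTime _beginDate = new DateTime(1970, 1, 1);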

public DateTime Date
{
    get => _beginDate.AddDays(DaysFrom1970);
    set => DaysFrom1970 = (short) Math.Floor((value - _beginDate).TotalDays);
}

[ProtoMember(2)]
[DataMember]
public short DaysFrom1970 { get; set; }

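Do not assume Volume can be changed to int. Some penny stocks occasionally trade billions of shares in a day, which exceeds the maximum int value of 2,147,483,647 (incidentally, the maximum Int32 value is 2^31 - 1, which sometimes comes up in interviews).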
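The two type decisions in this section (float for prices, no int for Volume) can be checked with a few lines. This is a minimal sketch; FieldTypeChecks and its methods are hypothetical helpers, and the 3-billion volume is a made-up illustration rather than a figure from the test data.

```csharp
using System;

public static class FieldTypeChecks
{
    // Stores a price as float and rounds it back to the given number of decimals.
    public static double RoundTripAsFloat(double price, int decimals)
    {
        float stored = (float)price;
        return Math.Round((double)stored, decimals);
    }

    // True when a volume can be stored in an Int32 without overflow.
    public static bool FitsInInt32(double volume)
    {
        return volume >= int.MinValue && volume <= int.MaxValue;
    }

    public static void Main()
    {
        // Prices with four significant digits survive the trip through float.
        Console.WriteLine(RoundTripAsFloat(95.05, 2));
        Console.WriteLine(RoundTripAsFloat(102.4, 2));

        // Int32.MaxValue is 2^31 - 1.
        Console.WriteLine(int.MaxValue == (1L << 31) - 1);

        // Hypothetical 3-billion-share day: too large for int.
        Console.WriteLine(FitsInInt32(3000000000d));
    }
}
```

If an integer type is preferred for Volume, long would be the natural candidate, though the article keeps it as double.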

The class after these changes looks like this:

[Serializable]
[ProtoContract]
[DataContract]
public class StockPriceSlim
{
    [ProtoMember(1)]
    [DataMember]
    public float ClosePrice { get; set; }

    private static DateTime _beginDate = new DateTime(1970, 1, 1);

    public DateTime Date
    {
        get => _beginDate.AddDays(DaysFrom1970);
        set => DaysFrom1970 = (short) Math.Floor((value - _beginDate).TotalDays);
    }

    [ProtoMember(2)]
    [DataMember]
    public short DaysFrom1970 { get; set; }

    [ProtoMember(3)]
    [DataMember]
    public float HighPrice { get; set; }

    [ProtoMember(4)]
    [DataMember]
    public float LowPrice { get; set; }

    [ProtoMember(5)]
    [DataMember]
    public float OpenPrice { get; set; }

    [ProtoMember(6)]
    [DataMember]
    public float PrvClosePrice { get; set; }

    [ProtoMember(8)]
    [DataMember]
    public double Turnover { get; set; }

    [ProtoMember(9)]
    [DataMember]
    public double Volume { get; set; }
}

The serialized size drops considerably for most serializers:

Name                Serialize(ms)  Deserialize(ms)  Bytes
BinarySerializer    11             12               141,930
XmlSerializer       42             24               977,248
SoapSerializer      48             89               2,586,720
JsonSerializer      17             33               411,942
ProtobufSerializer  7              3                130,416

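There was this much room for optimization for several reasons. First, the transferred objects were generated by the ORM and had never been optimized for network transfer. Second, the data feeds from the various brokers transfer data in much the same way. Finally, the interface was originally meant for a desktop client, so nobody bothered to think about payload size.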

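5. Custom serialization

Because the stock data structure is fairly stable and this interface does not need to be general-purpose, we can implement serialization ourselves. The serialized properties of StockPriceSlim add up to 38 bytes (one short, five floats and two doubles: 2 + 5 × 4 + 2 × 8). The test data has 2,717 quotes, which gives 103,246 bytes, less than Protobuf's 130,416 bytes. To store exactly 38 bytes per quote, simply write each property value at a fixed position:
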
public override byte[] SerializeSlim(List<StockPriceSlim> instance)
{
    var list = new List<byte>();
    foreach (var item in instance)
    {
        var bytes = BitConverter.GetBytes(item.DaysFrom1970);
        list.AddRange(bytes);

        bytes = BitConverter.GetBytes(item.OpenPrice);
        list.AddRange(bytes);

        bytes = BitConverter.GetBytes(item.HighPrice);
        list.AddRange(bytes);

        bytes = BitConverter.GetBytes(item.LowPrice);
        list.AddRange(bytes);

        bytes = BitConverter.GetBytes(item.ClosePrice);
        list.AddRange(bytes);

        bytes = BitConverter.GetBytes(item.PrvClosePrice);
        list.AddRange(bytes);

        bytes = BitConverter.GetBytes(item.Volume);
        list.AddRange(bytes);

        bytes = BitConverter.GetBytes(item.Turnover);
        list.AddRange(bytes);
    }

    return list.ToArray();
}


public override List<StockPriceSlim> DeserializeSlim(byte[] source)
{
    var result = new List<StockPriceSlim>();
    var index = 0;
    using (var stream = new MemoryStream(source))
    {
        while (index < source.Length)
        {
            var price = new StockPriceSlim();
            var bytes = new byte[sizeof(short)];
            stream.Read(bytes, 0, sizeof(short));
            var days = BitConverter.ToInt16(bytes, 0);
            price.DaysFrom1970 = days;
            index += bytes.Length;

            bytes = new byte[sizeof(float)];
            stream.Read(bytes, 0, sizeof(float));
            var value = BitConverter.ToSingle(bytes, 0);
            price.OpenPrice = value;
            index += bytes.Length;

            stream.Read(bytes, 0, sizeof(float));
            value = BitConverter.ToSingle(bytes, 0);
            price.HighPrice = value;
            index += bytes.Length;

            stream.Read(bytes, 0, sizeof(float));
            value = BitConverter.ToSingle(bytes, 0);
            price.LowPrice = value;
            index += bytes.Length;

            stream.Read(bytes, 0, sizeof(float));
            value = BitConverter.ToSingle(bytes, 0);
            price.ClosePrice = value;
            index += bytes.Length;

            stream.Read(bytes, 0, sizeof(float));
            value = BitConverter.ToSingle(bytes, 0);
            price.PrvClosePrice = value;
            index += bytes.Length;

            bytes = new byte[sizeof(double)];
            stream.Read(bytes, 0, sizeof(double));
            var volume = BitConverter.ToDouble(bytes, 0);
            price.Volume = volume;
            index += bytes.Length;

            bytes = new byte[sizeof(double)];
            stream.Read(bytes, 0, sizeof(double));
            var turnover = BitConverter.ToDouble(bytes, 0);
            price.Turnover = turnover;
            index += bytes.Length;

            result.Add(price);
        }
        return result;
    }
}

Result:

Name              Serialize(ms)  Deserialize(ms)  Bytes
CustomSerializer  5              1                103,246
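The 103,246 bytes reported for CustomSerializer can be reproduced from the field sizes alone. The snippet below is a small arithmetic check; RecordSize is a hypothetical helper, not part of the original code.

```csharp
using System;

public static class RecordSize
{
    // Bytes per record: one short (DaysFrom1970), five floats (prices), two doubles (Volume, Turnover).
    public const int BytesPerRecord =
        sizeof(short) + 5 * sizeof(float) + 2 * sizeof(double);

    // Total payload size for a given number of fixed-size records.
    public static int TotalBytes(int recordCount)
    {
        return recordCount * BytesPerRecord;
    }

    public static void Main()
    {
        Console.WriteLine(BytesPerRecord);    // 38
        Console.WriteLine(TotalBytes(2717));  // 103246
    }
}
```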

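This hand-written approach not only produces the smallest output, its serialization and deserialization are also very fast. The code, however, is ugly and not extensible. Let's try to improve it with reflection:

public override byte[] SerializeSlim(List<StockPriceSlim> instance)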
{
    var result = new List<byte>();
    foreach (var item in instance)
        foreach (var property in typeof(StockPriceSlim).GetProperties())
        {
            if (property.GetCustomAttribute(typeof(DataMemberAttribute)) == null)
                continue;

            var value = property.GetValue(item);
            byte[] bytes = null;
            if (property.PropertyType == typeof(int))
                bytes = BitConverter.GetBytes((int)value);
            else if (property.PropertyType == typeof(short))
                bytes = BitConverter.GetBytes((short)value);
            else if (property.PropertyType == typeof(float))
                bytes = BitConverter.GetBytes((float)value);
            else if (property.PropertyType == typeof(double))
                bytes = BitConverter.GetBytes((double)value);
            result.AddRange(bytes);
        }

    return result.ToArray();
}

public override List<StockPriceSlim> DeserializeSlim(byte[] source)
{
    using (var stream = new MemoryStream(source))
    {
        var result = new List<StockPriceSlim>();
        var index = 0;

        while (index < source.Length)
        {
            var price = new StockPriceSlim();
            foreach (var property in typeof(StockPriceSlim).GetProperties())
            {
                if (property.GetCustomAttribute(typeof(DataMemberAttribute)) == null)
                    continue;

                byte[] bytes = null;
                object value = null;

                if (property.PropertyType == typeof(int))
                {
                    bytes = new byte[sizeof(int)];
                    stream.Read(bytes, 0, bytes.Length);
                    value = BitConverter.ToInt32(bytes, 0);
                }
                else if (property.PropertyType == typeof(short))
                {
                    bytes = new byte[sizeof(short)];
                    stream.Read(bytes, 0, bytes.Length);
                    value = BitConverter.ToInt16(bytes, 0);
                }
                else if (property.PropertyType == typeof(float))
                {
                    bytes = new byte[sizeof(float)];
                    stream.Read(bytes, 0, bytes.Length);
                    value = BitConverter.ToSingle(bytes, 0);
                }
                else if (property.PropertyType == typeof(double))
                {
                    bytes = new byte[sizeof(double)];
                    stream.Read(bytes, 0, bytes.Length);
                    value = BitConverter.ToDouble(bytes, 0);
                }

                property.SetValue(price, value);
                index += bytes.Length;
            }


            result.Add(price);
        }
        return result;
    }
}

Result:

Name                  Serialize(ms)  Deserialize(ms)  Bytes
ReflectionSerializer  413            431              103,246

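The code looks a bit better, but performance drops sharply. I seem to remember someone saying that .NET caches reflection, so there is no need to worry about its performance cost; apparently my understanding was off. So let's simply cache some of the reflection results ourselves:

private readonly IEnumerable<PropertyInfo> _properties;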

public ExtendReflectionSerializer()
{
    _properties = typeof(StockPriceSlim).GetProperties().Where(p => p.GetCustomAttribute(typeof(DataMemberAttribute)) != null).ToList();
}

Result:

Name                        Serialize(ms)  Deserialize(ms)  Bytes
ExtendReflectionSerializer  11             11               103,246

After this improvement the performance is acceptable.
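As an aside, the fixed 38-byte layout used by the hand-written serializer in this section could also be expressed with BinaryWriter and BinaryReader, which avoids the intermediate byte arrays without resorting to reflection. This is a swapped-in alternative and not the author's code; SlimPrice is a trimmed, hypothetical stand-in for StockPriceSlim with the attributes omitted.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Trimmed stand-in for the article's StockPriceSlim (attributes omitted).
public class SlimPrice
{
    public short DaysFrom1970;
    public float OpenPrice, HighPrice, LowPrice, ClosePrice, PrvClosePrice;
    public double Volume, Turnover;
}

public static class BinaryWriterSerializer
{
    public static byte[] Serialize(List<SlimPrice> prices)
    {
        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream))
        {
            foreach (var p in prices)
            {
                // Same field order as the hand-written serializer: 2 + 5*4 + 2*8 = 38 bytes.
                writer.Write(p.DaysFrom1970);
                writer.Write(p.OpenPrice);
                writer.Write(p.HighPrice);
                writer.Write(p.LowPrice);
                writer.Write(p.ClosePrice);
                writer.Write(p.PrvClosePrice);
                writer.Write(p.Volume);
                writer.Write(p.Turnover);
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    public static List<SlimPrice> Deserialize(byte[] source)
    {
        var result = new List<SlimPrice>();
        using (var stream = new MemoryStream(source))
        using (var reader = new BinaryReader(stream))
        {
            while (stream.Position < stream.Length)
            {
                // Object initializer members are evaluated in textual order,
                // so the reads happen in the same order as the writes.
                result.Add(new SlimPrice
                {
                    DaysFrom1970 = reader.ReadInt16(),
                    OpenPrice = reader.ReadSingle(),
                    HighPrice = reader.ReadSingle(),
                    LowPrice = reader.ReadSingle(),
                    ClosePrice = reader.ReadSingle(),
                    PrvClosePrice = reader.ReadSingle(),
                    Volume = reader.ReadDouble(),
                    Turnover = reader.ReadDouble()
                });
            }
        }
        return result;
    }

    public static void Main()
    {
        var prices = new List<SlimPrice>
        {
            new SlimPrice { DaysFrom1970 = 12000, OpenPrice = 95.05f, ClosePrice = 102.4f, Volume = 3e9 }
        };
        var bytes = Serialize(prices);
        Console.WriteLine(bytes.Length);
        Console.WriteLine(Deserialize(bytes)[0].ClosePrice);
    }
}
```

BinaryWriter always writes little-endian, while BitConverter follows the machine's byte order, so the two produce identical bytes only on little-endian hardware. No timings are claimed for this variant, since it was not part of the original measurements.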

6. Finally, trying compression

As a last step, let's apply some simple compression on top of serialization:

public byte[] SerializeWithZip(List<StockPriceSlim> instance)
{
    var bytes = SerializeSlim(instance);

    using (var memoryStream = new MemoryStream())
    {
        using (var deflateStream = new DeflateStream(memoryStream, CompressionLevel.Fastest))
        {
            deflateStream.Write(bytes, 0, bytes.Length);
        }
        return memoryStream.ToArray();
    }
}

public List<StockPriceSlim> DeserializeWithZip(byte[] source)
{
    using (var originalFileStream = new MemoryStream(source))
    {
        using (var memoryStream = new MemoryStream())
        {
            using (var decompressionStream = new DeflateStream(originalFileStream, CompressionMode.Decompress))
            {
                decompressionStream.CopyTo(memoryStream);
            }
            var bytes = memoryStream.ToArray();
            return DeserializeSlim(bytes);
        }
    }
}

The results look good:

Name                        Serialize(ms)  Deserialize(ms)  Bytes      Serialize With Zip(ms)  Deserialize With Zip(ms)  Bytes With Zip
BinarySerializer            11             12               141,930    22                      12                        72,954
XmlSerializer               42             24               977,248    24                      28                        108,839
SoapSerializer              48             89               2,586,720  61                      87                        140,391
JsonSerializer              17             33               411,942    24                      35                        90,125
ProtobufSerializer          7              3                130,416    7                       6                         65,644
CustomSerializer            5              1                103,246    9                       3                         57,697
ReflectionSerializer        413            431              103,246    401                     376                       59,285
ExtendReflectionSerializer  11             11               103,246    13                      14                        59,285
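To compare how much Deflate helps each format, the Bytes and Bytes With Zip columns can be turned into a percentage saved. The helper below is a hypothetical sketch that only does this arithmetic on figures taken from the table.

```csharp
using System;

public static class CompressionRatio
{
    // Percentage of bytes saved by compression, rounded to one decimal place.
    public static double SavedPercent(int rawBytes, int zippedBytes)
    {
        return Math.Round(100.0 * (rawBytes - zippedBytes) / rawBytes, 1);
    }

    public static void Main()
    {
        // Byte counts copied from the table above.
        Console.WriteLine("SoapSerializer:     {0}%", SavedPercent(2586720, 140391));
        Console.WriteLine("ProtobufSerializer: {0}%", SavedPercent(130416, 65644));
        Console.WriteLine("CustomSerializer:   {0}%", SavedPercent(103246, 57697));
    }
}
```

The verbose text formats shrink the most, yet the compressed custom format is still the smallest payload in absolute terms.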

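7. Conclusion

This satisfied my curiosity and was a good chance to review the various serialization options.

Because the original requirement was so narrow, I did not compare the serializers across different data volumes.

Although Protobuf is excellent, when storing serialized files locally I usually choose XML or JSON for readability.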

8. References

Binary serialization
XML and SOAP serialization
Json.NET
Protocol Buffers – Google’s data interchange format

9. Source code

StockDataSample

Original source: Dino.C

Original link: https://www.cnblogs.com/dino623/p/Serialize.html

