• 字符串压缩(二)之LZ4


    一、LZ4压缩与解压

      LZ4有两个压缩函数。默认压缩函数原型:

      int LZ4_compress_default(const char* src, char* dst, int srcSize, int dstCapacity);

      快速压缩函数原型:

      int LZ4_compress_fast (const char* src, char* dst, int srcSize, int dstCapacity, int acceleration);

      快速压缩函数acceleration的参数范围:[1 ~ LZ4_ACCELERATION_MAX],其中LZ4_ACCELERATION_MAX为65537。什么意思呢,简单的说就是acceleration值越大,压缩速率越快,但是压缩比就越低,后面我会用实验数据来进行说明。

      另外,当acceleration = 1时,就是简化版的LZ4_compress_defaultLZ4_compress_default函数默认acceleration = 1。

      LZ4也有两个解缩函数。安全解缩函数原型:

      int LZ4_decompress_safe (const char* src, char* dst, int compressedSize, int dstCapacity);

      快速解缩函数原型:
      int LZ4_decompress_fast (const char* src, char* dst, int originalSize);

      快速解压函数不建议使用。因为LZ4_decompress_fast 缺少被压缩后的文本长度参数,被认为是不安全的,LZ4建议使用LZ4_decompress_safe。

      同样,我们先来看看LZ4的压缩与解压缩示例。

     1 #include 
     2 #include 
     3 #include 
     4 #include 
     5 #include 
     6 #include 
     7 
     8 using namespace std;
     9 
    10 int main()
    11 {
    12     char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot \
    13     play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, \
    14     run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.\
    15     Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.\
    16     Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy \
    17     puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, \
    18     George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having \
    19     a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, \
    20     George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.\
    21     Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. \
    22     Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.\
    23     Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. \
    24     Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?\
    25     Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. \
    26     You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy \
    27     puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, \
    28     it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, \
    29     when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play \
    30     in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are \
    31     wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping \
    32     up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: \
    33     It's only mud.";
    34 
    35     size_t com_space_size;
    36     size_t peppa_pig_text_size;
    37 
    38     char *com_ptr = NULL;
    39 
    40     // compress
    41     peppa_pig_text_size = strlen(peppa_pig_buf);
    42     com_space_size = LZ4_compressBound(peppa_pig_text_size);
    43     
    44     com_ptr = (char *)malloc(com_space_size);
    45     if(NULL == com_ptr) {
    46         cout << "compress malloc failed" << endl;
    47         return -1;
    48     }
    49 
    50     memset(com_ptr, 0, com_space_size);
    51 
    52     size_t com_size;
    53     //com_size = LZ4_compress_default(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size);
    54     com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, 1);
    55     cout << "peppa pig text size:" << peppa_pig_text_size << endl;
    56     cout << "compress text size:" << com_size << endl;
    57     cout << "compress ratio:" << (float)peppa_pig_text_size / (float)com_size << endl << endl;
    58 
    59 
    60     // decompress
    61     size_t decom_size;
    62     char* decom_ptr = NULL;
    63     
    64     decom_ptr = (char *)malloc((size_t)peppa_pig_text_size);
    65     if(NULL == decom_ptr) {
    66         cout << "decompress malloc failed" << endl;
    67         return -1;
    68     }
    69 
    70     decom_size = LZ4_decompress_safe(com_ptr, decom_ptr, com_size, peppa_pig_text_size);
    71     cout << "decompress text size:" << decom_size << endl;
    72 
    73     // use decompress buf compare with origin buf
    74     if(strncmp(peppa_pig_buf, decom_ptr, peppa_pig_text_size)) {
    75         cout << "decompress text is not equal peppa pig text" << endl;
    76     }
    77     
    78     free(com_ptr);
    79     free(decom_ptr);
    80     return 0;
    81 }

    执行结果:

      从结果可以发现,压缩之前的peppa pig文本长度为1848,压缩后的文本长度为1125(上一篇ZSTD为759),压缩率为1.6,解压后的长度与压缩前相等。相同文本情况下,压缩率低于ZSTD的2.4。从文本被压缩后的长度表现来说,LZ4比ZSTD要差。

      下图图1是LZ4随着acceleration的递增,文本被压缩后的长度与acceleration的关系。随着acceleration的递增,文本被压缩后的长度越来越长。

    图1

      图2是LZ4随着acceleration的递增,压缩率与acceleration的关系。随着acceleration的递增,压缩率也越来越低。

     图2

      这是为什么呢?还是上一篇提到的 鱼(性能)和熊掌(压缩比)的关系。获得了压缩的高性能,失去了算法的压缩率。

    二、LZ4压缩性能探索

      接下来摸索一下LZ4的压缩性能,以及LZ4在不同acceleration级别下的压缩性能。

      测试方法是,使用LZ4_compress_fast,连续压缩同一段文本并持续10秒。每一次分别使用不同的acceleration级别,最后得到每一种acceleration级别下每秒的平均压缩速率。测试压缩性能的代码示例如下:

     1 #include 
     2 #include 
     3 #include 
     4 #include 
     5 #include 
     6 #include 
     7 
     8 using namespace std;
     9 
    10 int main()
    11 {
    12     char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot \
    13     play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, \
    14     run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.\
    15     Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.\
    16     Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy \
    17     puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, \
    18     George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having \
    19     a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, \
    20     George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.\
    21     Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. \
    22     Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.\
    23     Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. \
    24     Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?\
    25     Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. \
    26     You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy \
    27     puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, \
    28     it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, \
    29     when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play \
    30     in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are \
    31     wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping \
    32     up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: \
    33     It's only mud.";
    34 
    35     int cnt = 0;
    36     
    37     size_t com_size;
    38     size_t com_space_size;
    39     size_t peppa_pig_text_size;
    40 
    41     timeval st, et;
    42     char *com_ptr = NULL;
    43 
    44     peppa_pig_text_size = strlen(peppa_pig_buf);
    45     com_space_size = LZ4_compressBound(peppa_pig_text_size);
    46 
    47     int test_times = 6;
    48     int acceleration = 1;
    49     
    50     // compress performance test
    51     while(test_times >= 1) {
    52     
    53         gettimeofday(&st, NULL);
    54         while(1) {
    55         
    56             com_ptr = (char *)malloc(com_space_size);
    57             if(NULL == com_ptr) {
    58                 cout << "compress malloc failed" << endl;
    59                 return -1;
    60             }
    61             
    62             com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, acceleration);
    63             if(com_size <= 0) {
    64                 cout << "compress failed, error code:" << com_size << endl;
    65                 free(com_ptr);
    66                 return -1;
    67             }
    68             
    69             free(com_ptr);
    70         
    71             cnt++;
    72             gettimeofday(&et, NULL);
    73             if(et.tv_sec - st.tv_sec >= 10) {
    74                 break;
    75             }
    76         }
    77         
    78         cout << "acceleration:" << acceleration << ", compress per second:" << cnt/10 << " times" << endl;
    79 
    80         ++acceleration;
    81         --test_times;
    82     }
    83 
    84     return 0;
    85 }

    执行结果:

     

      结果可以总结为两点:一是acceleration为默认值1时,即LZ4_compress_default函数的默认值时,每秒的压缩性能在20W+;二是随着acceleration的递增,每秒的压缩性能也在递增,但是代价就是获得更低的压缩率。

      acceleration递增与压缩速率的关系如下图所示:

     图3

    三、LZ4解压性能探索

      接下来继续了解一下LZ4的解压性能。

      测试方法是先使用LZ4_compress_fastacceleration = 1压缩文本,再使用安全解压函数LZ4_decompress_safe,连续解压同一段文本并持续10秒,最后得到每秒的平均解压速率。测试解压性能的代码示例如下:

     1 #include 
     2 #include 
     3 #include 
     4 #include 
     5 #include 
     6 #include 
     7 
     8 using namespace std;
     9 
    10 int main()
    11 {
    12     char peppa_pig_buf[2048] = "Narrator: It is raining today. So, Peppa and George cannot \
    13     play outside.Peppa: Daddy, it's stopped raining. Can we go out to play?Daddy: Alright, \
    14     run along you two.Narrator: Peppa loves jumping in muddy puddles.Peppa: I love muddy puddles.\
    15     Mummy: Peppa. If you jumping in muddy puddles, you must wear your boots.Peppa: Sorry, Mummy.\
    16     Narrator: George likes to jump in muddy puddles, too.Peppa: George. If you jump in muddy \
    17     puddles, you must wear your boots.Narrator: Peppa likes to look after her little brother, \
    18     George.Peppa: George, let's find some more pud dles.Narrator: Peppa and George are having \
    19     a lot of fun. Peppa has found a lttle puddle. George hasfound a big puddle.Peppa: Look, \
    20     George. There's a really big puddle.Narrator: George wants to jump into the big puddle first.\
    21     Peppa: Stop, George. | must check if it's safe for you. Good. It is safe for you. \
    22     Sorry, George. It'sonly mud.Narrator: Peppa and George love jumping in muddy puddles.\
    23     Peppa: Come on, George. Let's go and show Daddy.Daddy: Goodness me.Peppa: Daddy. Daddy. \
    24     Guess what we' ve been doing.Daddy: Let me think... Have you been wa tching television?\
    25     Peppa: No. No. Daddy.Daddy: Have you just had a bath?Peppa: No. No.Daddy: | know. \
    26     You've been jumping in muddy puddles.Peppa: Yes. Yes. Daddy. We've been jumping in muddy \
    27     puddles.Daddy: Ho. Ho. And look at the mess you're in.Peppa: Oooh....Daddy: Oh, well, \
    28     it's only mud. Let's clean up quickly before Mummy sees the mess.Peppa: Daddy, \
    29     when we've cleaned up, will you and Mummy Come and play, too?Daddy: Yes, we can all play \
    30     in the garden.Narrator: Peppa and George are wearing their boots. Mummy and Daddy are \
    31     wearing their boots.Peppa loves jumping up and down in muddy puddles. Everyone loves jumping \
    32     up and down inmuddy puddles.Mummy: Oh, Daddy pig, look at the mess you're in. .Peppa: \
    33     It's only mud.";
    34 
    35     int cnt = 0;
    36     
    37     size_t com_size;
    38     size_t com_space_size;
    39     size_t peppa_pig_text_size;
    40 
    41     timeval st, et;
    42     char *com_ptr = NULL;
    43 
    44     // compress
    45     peppa_pig_text_size = strlen(peppa_pig_buf);
    46     com_space_size = LZ4_compressBound(peppa_pig_text_size);
    47 
    48     com_ptr = (char *)malloc(com_space_size);
    49     if(NULL == com_ptr) {
    50         cout << "compress malloc failed" << endl;
    51         return -1;
    52     }
    53 
    54     com_size = LZ4_compress_fast(peppa_pig_buf, com_ptr, peppa_pig_text_size, com_space_size, 1);
    55     if(com_size <= 0) {
    56         cout << "compress failed, error code:" << com_size << endl;
    57         free(com_ptr);
    58         return -1;
    59     }
    60 
    61     // decompress
    62     size_t decom_size;
    63     char* decom_ptr = NULL;
    64     
    65     // decompress performance test
    66     gettimeofday(&st, NULL);
    67     while(1) {
    68 
    69         decom_ptr = (char *)malloc((size_t)peppa_pig_text_size);
    70         if(NULL == decom_ptr) {
    71             cout << "decompress malloc failed" << endl;
    72             free(com_ptr);
    73             return -1;
    74         }
    75         
    76         decom_size = LZ4_decompress_safe(com_ptr, decom_ptr, com_size, peppa_pig_text_size);
    77         if(decom_size <= 0) {
    78             cout << "decompress failed, error code:" << decom_size << endl;
    79             free(com_ptr);
    80             free(decom_ptr);
    81             return -1;
    82         }
    83 
    84         free(decom_ptr);
    85 
    86         cnt++;
    87         gettimeofday(&et, NULL);
    88         if(et.tv_sec - st.tv_sec >= 10) {
    89             break;
    90         }
    91     }
    92 
    93     free(com_ptr);
    94     cout << "decompress per second:" << cnt/10 << " times" << endl;
    95     
    96     return 0;
    97 }

    执行结果:

       结果显示LZ4的解压性能大概在每秒54W次左右,解压速率还是非常可观。

    四、LZ4对比ZSTD

      使用相同的待压缩文本,分别使用ZSTD与LZ4进行压缩、解压、压缩性能、解压性能测试后有表1的数据。

    表1

      

      抛开算法的优劣对比,从实验结果来看,ZSTD更加侧重于压缩率,LZ4(acceleration = 1)更加侧重于压缩性能。

    五、总结

      无论任何算法,都很难做到既有高性能压缩的同时,又有特别高的压缩率。两者必须要做一个取舍,或者找到一个合适的平衡点。

      如果在性能可以接受的情况下,选择具有更高压缩率的ZSTD将更加节约存储空间(通过线程池进行多线程压缩可以进一步提升性能);如果对压缩率不是特别看中,追求更高的压缩性能,那LZ4也是一个不错的选择。

  • 相关阅读:
    Tableau 合集2:Table Extension通过python做词云图
    华为云国际站账号购买:WeLink安全服务,为企业数字化转型保驾护航
    混合IP-SDN网络实现统一智能管理目前需要解决的问题
    如何不加锁地将数据并发写入Apache Hudi?
    jar 命令启动java 指定配置文件路径 jar如何启动
    数仓数据同步策略
    python 对长页面进行截屏拼接成长图
    设计模式——行为型模式
    网络安全笔记 -- 文件操作(文件包含漏洞)
    Android面试题大全
  • 原文地址:https://blog.csdn.net/jh035512/article/details/128090609