From reading the Spark SQL source code (3.1.3), I found how the compression format of Hive table output files is set:
Output file format | Table property (`a > b` means `a` takes precedence over `b`) |
---|---|
text | compression |
csv | compression > codec |
json | compression |
parquet | compression > parquet.compression |
orc | compression > orc.compress |

Output file format | Configuration option |
---|---|
orc | spark.sql.orc.compression.codec; allowed values: "none", "uncompressed", "snappy", "zlib", "lzo" |
parquet | spark.sql.parquet.compression.codec; allowed values: "none", "uncompressed", "snappy", "gzip", "lzo", "lz4", "brotli", "zstd" |
anything other than orc and parquet | hive.exec.compress.output; allowed values: "true", "false"; compression type, allowed values: "RECORD", "BLOCK", "NONE" |
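
The lookup order described by the two tables can be sketched in plain Python. This is only an illustrative model, not Spark's actual code: the function name `resolve_codec` and the two dictionaries are hypothetical, and it assumes that a table property, when present, wins over the session-level configuration option. It also ignores case-insensitive key handling and value validation.

```python
from typing import Mapping, Optional

# Table-property keys per output format, highest precedence first
# (taken from the first table).
TABLE_PROPERTY_KEYS = {
    "text": ["compression"],
    "csv": ["compression", "codec"],
    "json": ["compression"],
    "parquet": ["compression", "parquet.compression"],
    "orc": ["compression", "orc.compress"],
}

# Session-level configuration keys that name a codec (from the second table).
SESSION_CONF_KEYS = {
    "orc": "spark.sql.orc.compression.codec",
    "parquet": "spark.sql.parquet.compression.codec",
}


def resolve_codec(
    fmt: str,
    table_props: Mapping[str, str],
    session_conf: Mapping[str, str],
) -> Optional[str]:
    """Return the first codec found, or None if nothing is configured."""
    # 1. Table properties, in precedence order.
    for key in TABLE_PROPERTY_KEYS.get(fmt, []):
        if key in table_props:
            return table_props[key]
    # 2. Assumed fallback: the session-level option (orc and parquet only).
    conf_key = SESSION_CONF_KEYS.get(fmt)
    if conf_key is not None:
        return session_conf.get(conf_key)
    # Other formats have no codec-valued session option in the table;
    # hive.exec.compress.output is only an on/off switch.
    return None


if __name__ == "__main__":
    props = {"parquet.compression": "SNAPPY"}
    conf = {"spark.sql.parquet.compression.codec": "gzip"}
    print(resolve_codec("parquet", props, conf))
```

With the sample inputs, the table property `parquet.compression` is found before the session option is consulted, so the session value `gzip` is not used.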

Spark SQL source (3.1.3): org.apache.spark.sql.hive.execution.SaveAsHiveFile
Spark SQL source (3.1.3): org.apache.spark.sql.hive.execution.HiveOptions
Spark SQL source (3.1.3): org.apache.spark.sql.execution.datasources.parquet.ParquetOptions
Spark SQL source (3.1.3): org.apache.spark.sql.execution.datasources.orc.OrcOptions
Spark SQL source (3.1.3): org.apache.spark.sql.execution.datasources.text.TextFileFormat
Spark SQL source (3.1.3): org.apache.spark.sql.execution.datasources.text.TextOptions
Spark SQL source (3.1.3): org.apache.spark.sql.catalyst.util.CompressionCodecs