hive acid及事务表踩坑学习实录

表类型	ACID	文件格式	插入	更新/删除
托管表：CRUD事务	是	ORC	是	是
托管表：仅插入式事务	是	任意格式	是	没有
托管表：临时	没有	任意格式	是	没有
外部表	没有	任意格式	是	没有

三、事务表的适用场景

事务表适用于数仓中的行级数据更新删除需求比较频繁的表

如果事务表太多，并且存在大量的更新操作，metastore后台启动的合并线程会定期的提交MapReduce Job，也会一定程度上增重集群的负担

注意事项

1.不支持 BEGIN、COMMIT、ROLLBACK 等语句，所有的语句都是自动提交

2.仅支持ORC格式

四、开启事务的配置

客户端方面的修改
会话中设置或者修改HiveServer2的配置文件：

hive.support.concurrency – true
hive.enforce.bucketing – true (Hive 2.0之后就不用专门设置了)
hive.exec.dynamic.partition.mode – nonstrict
hive.txn.manager – org.apache.hadoop.hive.ql.lockmgr.DbTxnManager

服务端方面
主要修改MetaStore上的配置

hive.compactor.initiator.on – true
hive.compactor.worker.threads – 压缩任务的数量

五、参数详解

Configuration key	Values	Location	Notes
hive.txn.manager	Default: org.apache.hadoop.hive.ql.lockmgr.DummyTxnManager Value required for transactions: org.apache.hadoop.hive.ql.lockmgr.DbTxnManager	Client/ HiveServer2	DummyTxnManager replicates pre Hive-0.13 behavior and provides no transactions.
hive.txn.strict.locking.mode	Default: true	Client/ HiveServer2	In strict mode non-ACID resources use standard R/W lock semantics, e.g. INSERT will acquire exclusive lock. In non-strict mode, for non-ACID resources, INSERT will only acquire shared lock, which allows two concurrent writes to the same partition but still lets lock manager prevent DROP TABLE etc. when the table is being written to (as of Hive 2.2.0).
hive.txn.timeout	Default: 300	Client/ HiveServer2/ Metastore	Time after which transactions are declared aborted if the client has not sent a heartbeat, in seconds. It's critical that this property has the same value for all components/services.5
hive.txn.heartbeat.threadpool.size	Default: 5	Client/ HiveServer2	The number of threads to use for heartbeating (as of Hive 1.3.0 and 2.0.0).
hive.timedout.txn.reaper.start	Default: 100s	Metastore	Time delay of first reaper (the process which aborts timed-out transactions) run after the metastore starts (as of Hive 1.3.0). Controls AcidHouseKeeperServcie above.
hive.timedout.txn.reaper.interval	Default: 180s	Metastore	Time interval describing how often the reaper (the process which aborts timed-out transactions) runs (as of Hive 1.3.0). Controls AcidHouseKeeperServcie above.
hive.txn.max.open.batch	Default: 1000	Client	Maximum number of transactions that can be fetched in one call to open_txns().1
hive.max.open.txns	Default: 100000	HiveServer2/ Metastore	Maximum number of open transactions. If current open transactions reach this limit, future open transaction requests will be rejected, until the number goes below the limit. (As of Hive 1.3.0 and 2.1.0.)
hive.count.open.txns.interval	Default: 1s	HiveServer2/ Metastore	Time in seconds between checks to count open transactions (as of Hive 1.3.0 and 2.1.0).
hive.txn.retryable.sqlex.regex	Default: "" (empty string)	HiveServer2/ Metastore	Comma separated list of regular expression patterns for SQL state, error code, and error message of retryable SQLExceptions, that's suitable for the Hive metastore database (as of Hive 1.3.0 and 2.1.0). For an example, see Configuration Properties.
hive.compactor.initiator.on	Default: false Value required for transactions: true (for exactly one instance of the Thrift metastore service)	Metastore	Whether to run the initiator and cleaner threads on this metastore instance. Prior to Hive 1.3.0 it's critical that this is enabled on exactly one standalone metastore service instance (not enforced yet). As of Hive 1.3.0 this property may be enabled on any number of standalone metastore instances.
hive.compactor.worker.threads	Default: 0 Value required for transactions: > 0 on at least one instance of the Thrift metastore service	Metastore	How many compactor worker threads to run on this metastore instance.2
hive.compactor.worker.timeout	Default: 86400	Metastore	Time in seconds after which a compaction job will be declared failed and the compaction re-queued.
hive.compactor.cleaner.run.interval	Default: 5000	Metastore	Time in milliseconds between runs of the cleaner thread. (Hive 0.14.0 and later.)
hive.compactor.check.interval	Default: 300	Metastore	Time in seconds between checks to see if any tables or partitions need to be compacted.3
hive.compactor.delta.num.threshold	Default: 10	Metastore	Number of delta directories in a table or partition that will trigger a minor compaction.
hive.compactor.delta.pct.threshold	Default: 0.1	Metastore	Percentage (fractional) size of the delta files relative to the base that will trigger a major compaction. 1 = 100%, so the default 0.1 = 10%.
hive.compactor.abortedtxn.threshold	Default: 1000	Metastore	Number of aborted transactions involving a given table or partition that will trigger a major compaction.
hive.compactor.aborted.txn.time.threshold	Default: 12h	Metastore	Age of table/partition's oldest aborted transaction when compaction will be triggered. Default time unit is: hours. Set to a negative number to disable.
hive.compactor.max.num.delta	Default: 500	Metastore	Maximum number of delta files that the compactor will attempt to handle in a single job (as of Hive 1.3.0).4
hive.compactor.job.queue	Default: "" (empty string)	Metastore	Used to specify name of Hadoop queue to which Compaction jobs will be submitted. Set to empty string to let Hadoop choose the queue (as of Hive 1.3.0).
Compaction History
hive.compactor.history.retention.succeeded	Default: 3	Metastore	Number of successful compaction entries to retain in history (per partition).
hive.compactor.history.retention.failed	Default: 3	Metastore	Number of failed compaction entries to retain in history (per partition).
hive.compactor.history.retention.attempted	Default: 2	Metastore	Number of attempted compaction entries to retain in history (per partition).
hive.compactor.initiator.failed.compacts.threshold	Default: 2	Metastore	Number of of consecutive failed compactions for a given partition after which the Initiator will stop attempting to schedule compactions automatically. It is still possible to use ALTER TABLE to initiate compaction. Once a manually initiated compaction succeeds auto initiated compactions will resume. Note that this must be less than hive.compactor.history.retention.failed.
hive.compactor.history.reaper.interval	Default: 2m	Metastore	Controls how often the process to purge historical record of compactions runs.

相关阅读:
Leetcode 116. Populating Next Right Pointers in Each Node (BFS 题)
【二叉树的最近公共祖先】【后序遍历】Leetcode 236. 二叉树的最近公共祖先
 springboot中注解介绍
 巨好看的登录注册界面源码
 ubuntu22.04备份系统的完整操作过程
 万字长文：从计算机本源深入探寻volatile和Java内存模型
 idea 使用技巧
 redis的实际使用
 时间序列-ARIMA
JDK动态代理为什么必须要基于接口？
原文地址：https://blog.csdn.net/chimchim66/article/details/128079185