Using only HMS (the Hive Metastore Server) and treating Hive as a standalone table format layer, e.g. Spark/Flink on Hive: "Hive as a table storage layer: These users have direct access to HDFS and the metastore server (which provides an API for metadata access)";
Using Hive in full, i.e. HiveServer2 + HMS, e.g. Hive on MR/Tez/Spark: "Hive as a SQL query engine: These users have all data/metadata access happening through HiveServer2. They don't have direct access to HDFS or the metastore."
In the first mode (HMS only), applications such as Spark access HDFS directly, so data authorization relies on HDFS permissions: "HDFS access is authorized through the use of HDFS permissions. (Metadata access needs to be authorized using Hive configuration.)"
Hive Old Default Authorization was the default authorization model before Hive 2.0. It supports an RDBMS-style mechanism for granting database/table privileges to users, groups and roles: "authorization based on users, groups and roles and granting them permissions to do operations on database or table".
However, Hive Old Default Authorization is not a complete access control model: "leaving many security gaps unaddressed, for example, the permissions needed to grant privileges for a user are not defined, and any user can grant themselves access to a table or database."
Since Hive 2.0, the default authorization model has been switched to SQL Standards Based Authorization (HIVE-12429);
2.2 Hive Authorization in Detail - Storage Based Authorization in the Metastore Server
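HMS provides access to the metadata stored in the Hive metastore DB. To protect this metadata from being incorrectly accessed or modified by HMS clients such as Spark/Presto/Flink, HMS cannot rely entirely on the authentication and authorization mechanisms of those clients. For this reason, Hive 0.10 added authorization to HMS through HIVE-3705, namely Storage Based Authorization;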
Storage Based Authorization uses HDFS permissions as its source of truth: "it uses the file system permissions for folders corresponding to the different metadata objects as the source of truth for the authorization policy."
Storage Based Authorization protects the underlying metadata when clients access HMS directly: "To control metadata access on the metadata objects such as Databases, Tables and Partitions, it checks if you have permission on corresponding directories on the file system";
Through impersonation (hive.server2.enable.doAs=true), Storage Based Authorization can also protect the underlying metadata and data when clients access Hive through HiveServer2: "You can also protect access through HiveServer2 by ensuring that the queries run as the end user";
Storage Based Authorization gains flexibility from HDFS ACLs: "Through the use of HDFS ACL, you have a lot of flexibility in controlling access to the file system, which in turn provides more flexibility with Storage Based Authorization";
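The directory-based check described above can be illustrated with a small sketch. This is a simplified model of the basic owner/group/other rule only, not Hive's or HDFS's actual implementation: the names `DirStatus` and `is_allowed` are hypothetical, the table directory and users are made up, and HDFS ACLs, superusers and parent-directory checks are deliberately ignored.

```python
from dataclasses import dataclass

# Bit masks for a single rwx triplet.
READ, WRITE, EXECUTE = 4, 2, 1


@dataclass(frozen=True)
class DirStatus:
    """Owner, group and mode bits of a (hypothetical) table directory."""
    owner: str
    group: str
    mode: int  # POSIX-style mode, e.g. 0o750


def is_allowed(status: DirStatus, user: str, groups: set, wanted: int) -> bool:
    """Return True if `user` holds every bit in `wanted` on the directory.

    Owner triplet if the user owns the directory, else the group triplet
    if the user belongs to the directory's group, else the "other" triplet.
    """
    if user == status.owner:
        bits = (status.mode >> 6) & 7
    elif status.group in groups:
        bits = (status.mode >> 3) & 7
    else:
        bits = status.mode & 7
    return (bits & wanted) == wanted


# Hypothetical warehouse directory for one table: rwxr-x---
table_dir = DirStatus(owner="etl", group="analysts", mode=0o750)

print(is_allowed(table_dir, "etl", set(), READ | WRITE))   # owner
print(is_allowed(table_dir, "bob", {"analysts"}, READ))    # group member, read
print(is_allowed(table_dir, "bob", {"analysts"}, WRITE))   # group member, write
print(is_allowed(table_dir, "eve", {"guests"}, READ))      # everyone else
```

In this model a metadata operation that modifies the table would be treated as needing WRITE on the table directory and a read-only one as needing READ; that mapping is an assumption made for illustration.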
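Because of how it works, Storage Based Authorization also has limitations in practice:
Storage Based Authorization depends on the user's access permissions on the data in the underlying storage system, and the identity seen by that storage system differs depending on whether HiveServer2 impersonation is enabled. It is therefore mainly used to protect the underlying metadata when users access HMS directly;
When users go through HiveServer2, the operations that can be executed in HiveServer2 need to be restricted. Storage Based Authorization alone is not enough here; it has to be combined with "SQL Standards Based Authorization" or "Authorization using Apache Ranger & Sentry", or used together with FallbackHiveAuthorizer.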
2.3 Hive Authorization in Detail - SQL Standards Based Authorization in HiveServer2
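Storage Based Authorization builds on the permission model of file system directories and files, so it can only provide access control at database/table/partition granularity. To provide finer-grained control, such as row-level, column-level and view-level control, Hive 0.13.0 introduced SQL Standards Based Authorization through HIVE-5837;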
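SQL Standards Based Authorization is scoped to HiveServer2. It can, and should, be combined with Storage Based Authorization on the HMS side to provide comprehensive protection of Hive metadata and data;
Because SQL Standards Based Authorization is scoped to HiveServer2, users who access the underlying HDFS data directly, e.g. through Spark on Hive, Hive CLI or the hadoop jar command, must be controlled through Storage Based Authorization on the HMS side;
SQL Standards Based Authorization restricts the commands that can be submitted in HiveServer2;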
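The command restriction can be pictured as a filter on the first keyword of each statement. The sketch below is illustrative only: `is_command_allowed` is a hypothetical helper, not a Hive API, and the disabled set (dfs, add, delete, compile, reset) follows the Hive documentation for this mode as best understood here, so it should be verified against the Hive version in use.

```python
# Commands assumed to be disabled under SQL Standards Based Authorization.
# Assumption: taken from the Hive documentation; verify for your version.
DISABLED_COMMANDS = frozenset({"dfs", "add", "delete", "compile", "reset"})


def is_command_allowed(statement: str) -> bool:
    """Return False if the statement starts with a disabled command keyword."""
    tokens = statement.strip().split()
    if not tokens:
        return False  # nothing to run
    return tokens[0].lower() not in DISABLED_COMMANDS


for stmt in ("dfs -ls /warehouse", "ADD JAR /tmp/udf.jar", "SELECT * FROM t"):
    print(stmt, "->", "allowed" if is_command_allowed(stmt) else "rejected")
```

Regular SQL statements pass this filter and are then subject to the privilege checks of the authorization model itself, which this sketch does not cover.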
So when the underlying infrastructure (HDFS/YARN) performs permission checks, does it check the hive system user (the HiveServer2 service usually runs as the Linux OS user hive) or the end business user, such as hundsun? This is controlled by the parameter hive.server2.enable.doAs (hive.server2.enable.impersonation in older versions), which can be set to false or true: "Setting this property to true will have HiveServer2 execute Hive operations as the user making the calls to it.";
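The effect of this switch on the identity that HDFS/YARN see can be summarized in a few lines. This is a conceptual sketch, not HiveServer2 code: `effective_user` is a hypothetical function, and the service user name `hive` is the common default mentioned above rather than a fixed value.

```python
def effective_user(end_user: str, do_as: bool, service_user: str = "hive") -> str:
    """Return the identity that HDFS/YARN would check for a query.

    do_as=True  -> operations run as the calling end user (impersonation).
    do_as=False -> operations run as the HiveServer2 service user.
    """
    return end_user if do_as else service_user


print(effective_user("hundsun", do_as=True))
print(effective_user("hundsun", do_as=False))
```

With doAs disabled, every query reaches HDFS as the service user, so file system permissions can no longer distinguish between end users; this is why Storage Based Authorization alone is insufficient in that setup.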
hive.metastore.pre.event.listeners: The pre-event listener classes to be loaded on the metastore side to run code whenever databases, tables, and partitions are created, altered, or dropped. Set this configuration property to org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener in hive-site.xml to turn on Hive metastore-side security;
hive.security.metastore.authorization.manager: This tells Hive which metastore-side authorization provider to use. Defaults to org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider;
hive.security.metastore.authenticator.manager: The authenticator manager class name to be used in the metastore for authentication, defaults to org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator.
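As a rough illustration, the three properties above can be rendered into the `<property>` layout that hive-site.xml uses. The property names and class names are taken from the text; the helper `build_configuration` is hypothetical, and in a real deployment the output would need to be merged into the existing hive-site.xml rather than replacing it.

```python
import xml.etree.ElementTree as ET

# Metastore-side security settings as listed above.
METASTORE_SECURITY_PROPS = {
    "hive.metastore.pre.event.listeners":
        "org.apache.hadoop.hive.ql.security.authorization.AuthorizationPreEventListener",
    "hive.security.metastore.authorization.manager":
        "org.apache.hadoop.hive.ql.security.authorization.StorageBasedAuthorizationProvider",
    "hive.security.metastore.authenticator.manager":
        "org.apache.hadoop.hive.ql.security.HadoopDefaultMetastoreAuthenticator",
}


def build_configuration(props: dict) -> ET.Element:
    """Build a <configuration> element with one <property> per entry."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return root


xml_text = ET.tostring(build_configuration(METASTORE_SECURITY_PROPS), encoding="unicode")
print(xml_text)
```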