• Hadoop (3.3.1) Capacity Scheduler: using resource queues to provide resource isolation between workloads, queue elasticity, and queue permissions


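    Configuring YARN resource queues isolates the resources of different workloads. Each queue can also be given an elastic range, so that when it runs short of resources it can borrow resources from other queues.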

    Official documentation: Hadoop CapacityScheduler

    I. A look at the official documentation (optional)

    1. Overview

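    Let's first get a picture of the Capacity Scheduler: it is suited to multi-tenant scenarios. Put simply, it lets you plan for different workloads to use the resources of different queues.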

    The CapacityScheduler is designed to run Hadoop applications as a shared, multi-tenant cluster in an operator-friendly manner while maximizing the throughput and the utilization of the cluster.

    2. Configuration

    2.1. Setting up ResourceManager to use CapacityScheduler

    In yarn-site.xml, set:

    Property: yarn.resourcemanager.scheduler.class
    Value: org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler

    2.2. Setting capacity-scheduler.xml

    etc/hadoop/capacity-scheduler.xml is the configuration file for the CapacityScheduler.

    Configuring capacity-scheduler.xml

    1. setting up Queue

    All queues we configure below are descendants of the root queue. The child queues of a queue are given as a comma-separated list.

    The CapacityScheduler has a predefined queue called root. All queues in the system are children of the root queue.

    Further queues can be setup by configuring yarn.scheduler.capacity.root.queues with a list of comma-separated child queues.

    The queue path concept: a queue path identifies a queue. A full queue path starts at root and uses . (dot) to express the parent-child relationship between queues.

    yarn.scheduler.capacity.<queue-path>.queues
    The configuration for CapacityScheduler uses a concept called queue path to configure the hierarchy of queues. The queue path is the full path of the queue’s hierarchy, starting at root, with . (dot) as the delimiter.

    For example:

    <property>
      <name>yarn.scheduler.capacity.root.queues</name>
      <value>a,b,c</value>
      <description>The queues at this level (root is the root queue).
      </description>
    </property>
    
    <property>
      <name>yarn.scheduler.capacity.root.a.queues</name>
      <value>a1,a2</value>
      <description>The queues at this level (children of root.a).
      </description>
    </property>
    
    <property>
      <name>yarn.scheduler.capacity.root.b.queues</name>
      <value>b1,b2,b3</value>
      <description>The queues at this level (children of root.b).
      </description>
    </property>
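
    To make the queue path idea concrete, here is a minimal Python sketch that expands the `yarn.scheduler.capacity.<queue-path>.queues` properties into the full list of queue paths. The helper names `load_properties` and `queue_paths` are hypothetical and only illustrate the naming scheme; this is not how the ResourceManager itself parses the file.

```python
import xml.etree.ElementTree as ET

PREFIX = "yarn.scheduler.capacity."

# The three properties from the example above, condensed.
EXAMPLE = """
<configuration>
  <property><name>yarn.scheduler.capacity.root.queues</name><value>a,b,c</value></property>
  <property><name>yarn.scheduler.capacity.root.a.queues</name><value>a1,a2</value></property>
  <property><name>yarn.scheduler.capacity.root.b.queues</name><value>b1,b2,b3</value></property>
</configuration>
"""


def load_properties(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name:
            props[name] = value
    return props


def queue_paths(props, path="root"):
    """Recursively expand '<queue-path>.queues' entries into full queue paths."""
    paths = [path]
    children = props.get(PREFIX + path + ".queues", "")
    for child in (c.strip() for c in children.split(",")):
        if child:
            paths.extend(queue_paths(props, path + "." + child))
    return paths


if __name__ == "__main__":
    for queue_path in queue_paths(load_properties(EXAMPLE)):
        print(queue_path)
```

    A queue without a `.queues` entry of its own (root.c, root.a.a1, and so on) is a leaf queue; applications are submitted to leaf queues.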
    

    2. Queue Properties

    	Resource Allocation
    	Resource Allocation using Absolute Resources configuration
    	Running and Pending Application Limits
    	Queue Administration & Permissions
    	Queue Mapping based on User or Group, Application Name or user defined placement rules
    	Queue lifetime for applications
    

    3. application priority
    Application priority works only along with FIFO ordering policy. Default ordering policy is FIFO.

    4. Capacity Scheduler container preemption
    The Capacity Scheduler can preempt containers from queues whose resource usage exceeds their guaranteed capacity.

    5. Reservation Properties

    6. Configuring ReservationSystem with CapacityScheduler

    7. Dynamic Auto-Creation and Management of Leaf Queues
    The CapacityScheduler supports automatically creating leaf queues under a parent queue through queue mappings.

    8. Other Properties

     

    3. Changing Queue Configuration

    This behavior can be changed via yarn.scheduler.configuration.store.class in yarn-site.xml. Possible values are file, which allows modifying properties via file; memory, which allows modifying properties via API, but does not persist changes across restart; leveldb, which allows modifying properties via API and stores changes in leveldb backing store; and zk, which allows modifying properties via API and stores changes in zookeeper backing store. The default value is file.

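    There are two ways to configure queues: through the API or through the file. Queue changes made through the API are lost on restart when the memory store is used (they can be persisted with the leveldb or zk store), so this article configures the queues through the file:

    1. Edit capacity-scheduler.xml and yarn-site.xml.
    2. Run yarn rmadmin -refreshQueues to make the queue configuration take effect.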

    4. Updating a Container (Experimental - API may change in the future)

    Something to look forward to.

     
     

    II. Configuring the queues hands-on

    1. Enable the Capacity Scheduler

    Edit yarn-site.xml:

    <!-- Use the Capacity Scheduler -->
    <property>
      <name>yarn.resourcemanager.scheduler.class</name>
      <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    </property>
    

     

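    2. Configure capacity-scheduler.xml

    2.1. Set the queue resources

    1. Define the child queues: split the cluster resources into three queues, default, online and offline.
    2. Set each queue's capacity: for example 20%, 30% and 50% respectively. The capacities must add up to exactly 100%.
    3. Make the queues elastic: for example, the online queue is guaranteed 30% and capped at 50% of the cluster, so when other queues are idle it can grow to use up to 50% of the cluster's resources.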


    [root@bigdata01 hadoop]# vi capacity-scheduler.xml
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default,online,offline</value>
        <description>List of queues; multiple queues are separated by commas</description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>20</value>
        <description>default queue: 20%</description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.online.capacity</name>
        <value>30</value>
        <description>online queue: 30%</description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.offline.capacity</name>
        <value>50</value>
        <description>offline queue: 50%</description>
      </property>
      <!-- Elastic queues: resource upper limits -->
      <property>
        <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
        <value>40</value>
        <description>Upper limit of resources the default queue can use.</description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.online.maximum-capacity</name>
        <value>50</value>
        <description>Upper limit of resources the online queue can use.</description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.offline.maximum-capacity</name>
        <value>60</value>
        <description>Upper limit of resources the offline queue can use.</description>
      </property>
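
    Before running refreshQueues it can help to sanity-check the numbers. The following is a minimal Python sketch, not part of Hadoop, that checks two rules for the queues directly under root: the capacities add up to 100, and each maximum-capacity is at least the queue's capacity. The function name `check_root_queues` is hypothetical, the embedded CONFIG mirrors the 20/30/50 split described above, and treating a missing or -1 maximum-capacity as 100% is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

PREFIX = "yarn.scheduler.capacity.root."

# Condensed copy of the default/online/offline example (20/30/50, caps 40/50/60).
CONFIG = """
<configuration>
  <property><name>yarn.scheduler.capacity.root.queues</name><value>default,online,offline</value></property>
  <property><name>yarn.scheduler.capacity.root.default.capacity</name><value>20</value></property>
  <property><name>yarn.scheduler.capacity.root.online.capacity</name><value>30</value></property>
  <property><name>yarn.scheduler.capacity.root.offline.capacity</name><value>50</value></property>
  <property><name>yarn.scheduler.capacity.root.default.maximum-capacity</name><value>40</value></property>
  <property><name>yarn.scheduler.capacity.root.online.maximum-capacity</name><value>50</value></property>
  <property><name>yarn.scheduler.capacity.root.offline.maximum-capacity</name><value>60</value></property>
</configuration>
"""


def check_root_queues(xml_text):
    """Return a list of human-readable problems; an empty list means OK."""
    props = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        props[(prop.findtext("name") or "").strip()] = (prop.findtext("value") or "").strip()

    queues = [q.strip() for q in props.get(PREFIX + "queues", "").split(",") if q.strip()]
    problems = []
    total = 0.0
    for queue in queues:
        cap_key = PREFIX + queue + ".capacity"
        if cap_key not in props:
            problems.append("queue '%s' is listed but has no capacity" % queue)
            continue
        capacity = float(props[cap_key])
        total += capacity
        # Assumption: a missing or -1 maximum-capacity means 100%.
        max_capacity = float(props.get(PREFIX + queue + ".maximum-capacity", "-1"))
        if max_capacity < 0:
            max_capacity = 100.0
        if max_capacity < capacity:
            problems.append("queue '%s': maximum-capacity is below capacity" % queue)
    if abs(total - 100.0) > 1e-6:
        problems.append("capacities under root sum to %g, expected 100" % total)
    return problems


if __name__ == "__main__":
    print(check_root_queues(CONFIG) or "configuration looks consistent")
```

    A queue list that does not match the queues that actually have capacities is exactly the kind of mistake this catches: each listed queue without a capacity is reported, and so is the resulting sum below 100.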
    
    

     

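    2.2. Unified access control

    Once resources are assigned to queues, access should be strictly controlled: only authorized users may submit jobs to a queue or manage jobs in it.
    Access control is split into submit permission and administer permission:

    • Submit permission: required to submit jobs to the queue;
    • Administer permission: required to kill jobs in the queue.

    Submit permission: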

      <!-- Configure three queues -->
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default,online,offline</value>
        <description>The queues at this level (root is the root queue).</description>
      </property>

      <!-- queue-path = root: a single space means nobody may submit jobs to the root queue -->
      <property>
        <name>yarn.scheduler.capacity.root.acl_submit_applications</name>
        <value> </value>
      </property>
      <!-- queue-path = root.default: only users test and b1 may submit jobs -->
      <property>
        <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
        <value>test,b1</value>
      </property>
      <!-- queue-path = root.online: only user test may submit jobs -->
      <property>
        <name>yarn.scheduler.capacity.root.online.acl_submit_applications</name>
        <value>test</value>
      </property>
      <!-- queue-path = root.offline: only user b1 may submit jobs -->
      <property>
        <name>yarn.scheduler.capacity.root.offline.acl_submit_applications</name>
        <value>b1</value>
      </property>
    

    Administer permission:

      <!-- queue-path = root: ACLs are inherited, so the parent queue must be restricted as well -->
      <property>
        <name>yarn.scheduler.capacity.root.acl_administer_queue</name>
        <value> </value>
      </property>
      <!-- queue-path = root.default: only users test and a1 may stop jobs in the default queue -->
      <property>
        <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
        <value>test,a1</value>
      </property>
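
    The reason the root queue is set to a single space is that a user who is admitted by a queue or by any of its parent queues is allowed in. The Python sketch below models that rule in a simplified way, as an illustration rather than YARN's actual implementation. It assumes the standard Hadoop ACL format ("user1,user2 group1,group2", where * means everyone and a single space means nobody); the names `parse_acl`, `is_allowed` and the ACLS dictionary are hypothetical, and queues without an ACL entry are simply skipped instead of modelling Hadoop's defaults.

```python
def parse_acl(acl):
    """Split a Hadoop ACL string into (allow_all, users, groups)."""
    if acl.strip() == "*":
        return True, set(), set()
    # Users come before the first space, groups after it.
    users_part, _, groups_part = acl.partition(" ")
    users = {u for u in users_part.split(",") if u}
    groups = {g for g in groups_part.strip().split(",") if g}
    return False, users, groups


def is_allowed(acls, queue_path, user, groups=()):
    """True if the ACL of the queue or of any ancestor queue admits the user."""
    parts = queue_path.split(".")
    for depth in range(len(parts), 0, -1):
        acl = acls.get(".".join(parts[:depth]))
        if acl is None:
            continue  # no entry in this sketch: look at the parent instead
        allow_all, users, acl_groups = parse_acl(acl)
        if allow_all or user in users or acl_groups & set(groups):
            return True
    return False


# Submit ACLs from the example above, keyed by queue path.
ACLS = {
    "root": " ",
    "root.default": "test,b1",
    "root.online": "test",
    "root.offline": "b1",
}

if __name__ == "__main__":
    print(is_allowed(ACLS, "root.online", "test"))  # test is listed on root.online
    print(is_allowed(ACLS, "root.online", "b1"))    # b1 is not, and root admits nobody
    # With root left open to everyone, the child restriction has no effect:
    print(is_allowed(dict(ACLS, root="*"), "root.online", "b1"))
```

    The same check applies to acl_administer_queue; only the property name differs.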
    

     

    3. Apply the changes

    `yarn rmadmin -refreshQueues` 
    

     

    Complete configuration example

    
    <configuration>
    
      <property>
        <name>yarn.scheduler.capacity.maximum-applications</name>
        <value>10000</value>
        <description>
          Maximum number of applications that can be pending and running.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
        <value>0.1</value>
        <description>
          Maximum percent of resources in the cluster which can be used to run 
          application masters i.e. controls number of concurrent running
          applications.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.resource-calculator</name>
        <value>org.apache.hadoop.yarn.util.resource.DefaultResourceCalculator</value>
        <description>
          The ResourceCalculator implementation to be used to compare 
          Resources in the scheduler.
          The default i.e. DefaultResourceCalculator only uses Memory while
          DominantResourceCalculator uses dominant-resource to compare 
          multi-dimensional resources such as Memory, CPU etc.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.queues</name>
        <value>default,test1,test2</value>
        <description>
          The queues at this level (root is the root queue).
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.default.capacity</name>
        <value>30</value>
        <description>Default queue target capacity.</description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.test1.capacity</name>
        <value>30</value>
        <description>test1 queue target capacity.</description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.test2.capacity</name>
        <value>40</value>
        <description>test2 queue target capacity.</description>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.default.user-limit-factor</name>
        <value>1</value>
        <description>
          Default queue user limit a percentage from 0.0 to 1.0.
        </description>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.test1.user-limit-factor</name>
        <value>1</value>
        <description>
          test1 queue user limit a percentage from 0.0 to 1.0.
        </description>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.test2.user-limit-factor</name>
        <value>1</value>
        <description>
          test2 queue user limit a percentage from 0.0 to 1.0.
        </description>
      </property>
      
    
      <property>
        <name>yarn.scheduler.capacity.root.default.maximum-capacity</name>
        <value>70</value>
        <description>
          The maximum capacity of the default queue. 
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.test1.maximum-capacity</name>
        <value>70</value>
        <description>
          The maximum capacity of the test1 queue. 
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.test2.maximum-capacity</name>
        <value>70</value>
        <description>
          The maximum capacity of the test2 queue. 
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.default.state</name>
        <value>RUNNING</value>
        <description>
          The state of the default queue. State can be one of RUNNING or STOPPED.
        </description>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.test1.state</name>
        <value>RUNNING</value>
        <description>
          The state of the test1 queue. State can be one of RUNNING or STOPPED.
        </description>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.test2.state</name>
        <value>RUNNING</value>
        <description>
          The state of the test2 queue. State can be one of RUNNING or STOPPED.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.default.acl_submit_applications</name>
        <value>*</value>
        <description>
          The ACL of who can submit jobs to the default queue.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.test1.acl_submit_applications</name>
        <value>*</value>
        <description>
          The ACL of who can submit jobs to the test1 queue.
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.test2.acl_submit_applications</name>
        <value>*</value>
        <description>
          The ACL of who can submit jobs to the test2 queue.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.default.acl_administer_queue</name>
        <value>*</value>
        <description>
          The ACL of who can administer jobs on the default queue.
        </description>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.test1.acl_administer_queue</name>
        <value>*</value>
        <description>
          The ACL of who can administer jobs on the test1 queue.
        </description>
      </property>
      
      <property>
        <name>yarn.scheduler.capacity.root.test2.acl_administer_queue</name>
        <value>*</value>
        <description>
          The ACL of who can administer jobs on the test2 queue.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.root.default.acl_application_max_priority</name>
        <value>*</value>
        <description>
          The ACL of who can submit applications with configured priority.
          For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.test1.acl_application_max_priority</name>
        <value>*</value>
        <description>
          The ACL of who can submit applications with configured priority.
          For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
        </description>
      </property>
      <property>
        <name>yarn.scheduler.capacity.root.test2.acl_application_max_priority</name>
        <value>*</value>
        <description>
          The ACL of who can submit applications with configured priority.
          For e.g, [user={name} group={name} max_priority={priority} default_priority={priority}]
        </description>
      </property>
      
    
       <property>
         <name>yarn.scheduler.capacity.root.default.maximum-application-lifetime</name>
         <value>-1</value>
         <description>
            Maximum lifetime of an application which is submitted to a queue
            in seconds. Any value less than or equal to zero will be considered as
            disabled.
            This will be a hard time limit for all applications in this
            queue. If positive value is configured then any application submitted
            to this queue will be killed after exceeds the configured lifetime.
            User can also specify lifetime per application basis in
            application submission context. But user lifetime will be
            overridden if it exceeds queue maximum lifetime. It is point-in-time
            configuration.
            Note : Configuring too low value will result in killing application
            sooner. This feature is applicable only for leaf queue.
         </description>
       </property>
    
       <property>
         <name>yarn.scheduler.capacity.root.default.default-application-lifetime</name>
         <value>-1</value>
         <description>
            Default lifetime of an application which is submitted to a queue
            in seconds. Any value less than or equal to zero will be considered as
            disabled.
            If the user has not submitted application with lifetime value then this
            value will be taken. It is point-in-time configuration.
            Note : Default lifetime can't exceed maximum lifetime. This feature is
            applicable only for leaf queue.
         </description>
       </property>
    
      <property>
        <name>yarn.scheduler.capacity.node-locality-delay</name>
        <value>40</value>
        <description>
          Number of missed scheduling opportunities after which the CapacityScheduler 
          attempts to schedule rack-local containers.
          When setting this parameter, the size of the cluster should be taken into account.
          We use 40 as the default value, which is approximately the number of nodes in one rack.
          Note, if this value is -1, the locality constraint in the container request
          will be ignored, which disables the delay scheduling.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.rack-locality-additional-delay</name>
        <value>-1</value>
        <description>
          Number of additional missed scheduling opportunities over the node-locality-delay
          ones, after which the CapacityScheduler attempts to schedule off-switch containers,
          instead of rack-local ones.
          Example: with node-locality-delay=40 and rack-locality-delay=20, the scheduler will
          attempt rack-local assignments after 40 missed opportunities, and off-switch assignments
          after 40+20=60 missed opportunities.
          When setting this parameter, the size of the cluster should be taken into account.
          We use -1 as the default value, which disables this feature. In this case, the number
          of missed opportunities for assigning off-switch containers is calculated based on
          the number of containers and unique locations specified in the resource request,
          as well as the size of the cluster.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.queue-mappings</name>
        <value></value>
        <description>
          A list of mappings that will be used to assign jobs to queues
          The syntax for this list is [u|g]:[name]:[queue_name][,next mapping]*
          Typically this list will be used to map users to queues,
          for example, u:%user:%user maps all users to queues with the same name
          as the user.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.queue-mappings-override.enable</name>
        <value>false</value>
        <description>
          If a queue mapping is present, will it override the value specified
          by the user? This can be used by administrators to place jobs in queues
          that are different than the one specified by the user.
          The default is false.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.per-node-heartbeat.maximum-offswitch-assignments</name>
        <value>1</value>
        <description>
          Controls the number of OFF_SWITCH assignments allowed
          during a node's heartbeat. Increasing this value can improve
          scheduling rate for OFF_SWITCH containers. Lower values reduce
          "clumping" of applications on particular nodes. The default is 1.
          Legal values are 1-MAX_INT. This config is refreshable.
        </description>
      </property>
    
    
      <property>
        <name>yarn.scheduler.capacity.application.fail-fast</name>
        <value>false</value>
        <description>
          Whether RM should fail during recovery if previous applications'
          queue is no longer valid.
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.workflow-priority-mappings</name>
        <value></value>
        <description>
          A list of mappings that will be used to override application priority.
          The syntax for this list is
          [workflowId]:[full_queue_name]:[priority][,next mapping]*
          where an application submitted (or mapped to) queue "full_queue_name"
          and workflowId "workflowId" (as specified in application submission
          context) will be given priority "priority".
        </description>
      </property>
    
      <property>
        <name>yarn.scheduler.capacity.workflow-priority-mappings-override.enable</name>
        <value>false</value>
        <description>
          If a priority mapping is present, will it override the value specified
          by the user? This can be used by administrators to give applications a
          priority that is different than the one specified by the user.
          The default is false.
        </description>
      </property>
    
    </configuration>
    
    
  • Original article: https://blog.csdn.net/hiliang521/article/details/126501634