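Impala has three main components: statestore, catalog and impalad. Every Impalad node can accept query requests from clients. For each query submitted through a given Impalad, that node also acts as the Coordinator, which costs it extra memory and CPU. To keep resource usage even across nodes, the Impalad nodes in the cluster should be load balanced: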
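To make the imbalance concrete, here is a toy model. It is not how Impala or HAProxy are implemented, and it assumes one query per client connection. It counts how many queries each Impalad ends up coordinating with and without round-robin distribution, and shows the selection rule behind HAProxy's `leastconn`. The host names linux121 to linux123 are the ones used in the configuration later on. The function names are hypothetical.

```python
from collections import Counter
from itertools import cycle

IMPALADS = ["linux121", "linux122", "linux123"]  # Impalad hosts used in this article


def coordinator_load(n_queries: int, nodes: list[str], balanced: bool = True) -> Counter:
    """Toy model: count how many queries each Impalad coordinates."""
    if not balanced:
        # Assumption: without a balancer, all clients point at the same
        # Impalad, which then coordinates every query.
        return Counter({nodes[0]: n_queries})
    # 'balance roundrobin': connections are handed to the nodes in turn.
    rotation = cycle(nodes)
    return Counter(next(rotation) for _ in range(n_queries))


def pick_least_conn(active: dict[str, int]) -> str:
    """'balance leastconn': choose the node with the fewest open connections
    (ties go to the node listed first)."""
    return min(active, key=active.get)
```

Round robin spreads connections evenly by count. `leastconn` is the better fit when some queries run much longer than others, because it looks at how many connections are still open on each node.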
HAProxy approach
(1) Install HAProxy
yum install haproxy -y
(2) Edit the configuration file
vim /etc/haproxy/haproxy.cfg
(3) Configuration contents
#---------------------------------------------------------------------
# Example configuration for a possible web application.  See the
# full configuration options online.
#
#   http://haproxy.1wt.eu/download/1.4/doc/configuration.txt
#
#---------------------------------------------------------------------

#---------------------------------------------------------------------
# Global settings
#---------------------------------------------------------------------
global
    log         127.0.0.1 local2
    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#---------------------------------------------------------------------
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#---------------------------------------------------------------------
defaults
    mode                    http   # mode { tcp|http|health }: tcp = layer 4, http = layer 7, health = health checks only
    log                     global
    option                  httplog
    option                  dontlognull
    #option http-server-close
    #option forwardfor       except 127.0.0.0/8
    #option abortonclose     # drop queued requests whose client has already closed the connection
    option                  redispatch   # on connection failure, allow the retry to go to another server
    retries                 3      # retry a failed server connection up to 3 times (taking a server out of rotation is decided by health checks, 'fall', default 3)
    timeout http-request    10s
    timeout queue           1m
    #timeout connect         10s
    #timeout client          1m
    #timeout server          1m
    timeout connect         1d     # max time to establish the TCP connection to a server
    timeout client          1d     # client/server inactivity timeouts: important, very long values keep
    timeout server          1d     # long-running queries (e.g. via the hive2 JDBC driver) open until they return results
    timeout http-keep-alive 10s
    timeout check           10s    # health-check timeout
    maxconn                 3000   # maximum number of connections

listen status                    # stats (admin) page
    bind 0.0.0.0:1080            # IP and port of the stats page
    mode http                    # protocol used by the stats page
    option httplog
    maxconn 5000                 # maximum number of connections
    stats refresh 30s            # auto-refresh every 30 seconds
    stats uri /stats

listen impalashell
    bind 0.0.0.0:25003           # IP and port HAProxy listens on as the proxy
    mode tcp                     # layer-4 proxying; important, since impala-shell speaks Thrift, not HTTP
    option tcplog
    balance roundrobin           # scheduling: 'leastconn' (fewest connections) or 'roundrobin' (in turn)
    server impalashell_1 linux121:21000 check
    server impalashell_2 linux122:21000 check
    server impalashell_3 linux123:21000 check

listen impalajdbc
    bind 0.0.0.0:25004           # IP and port HAProxy listens on as the proxy
    mode tcp                     # layer-4 proxying; important, since JDBC (HiveServer2) speaks Thrift, not HTTP
    option tcplog
    balance roundrobin           # scheduling: 'leastconn' (fewest connections) or 'roundrobin' (in turn)
    server impalajdbc_1 linux121:21050 check
    server impalajdbc_2 linux122:21050 check
    server impalajdbc_3 linux123:21050 check

#---------------------------------------------------------------------
# main frontend which proxies to the backends
# NOTE: this frontend and the 'static'/'app' backends below are left over
# from the stock example file; Impala does not use them, so they can be removed.
#---------------------------------------------------------------------
frontend  main *:5000
    acl url_static       path_beg       -i /static /images /javascript /stylesheets
    acl url_static       path_end       -i .jpg .gif .png .css .js
    use_backend static          if url_static
    default_backend             app

#---------------------------------------------------------------------
# static backend for serving up images, stylesheets and such
#---------------------------------------------------------------------
backend static
    balance     roundrobin
    server      static 127.0.0.1:4331 check

#---------------------------------------------------------------------
# round robin balancing between the various backends
#---------------------------------------------------------------------
backend app
    balance     roundrobin
    server  app1 127.0.0.1:5001 check
    server  app2 127.0.0.1:5002 check
    server  app3 127.0.0.1:5003 check
    server  app4 127.0.0.1:5004 check
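`haproxy -c -f /etc/haproxy/haproxy.cfg` checks the file's syntax. However, two `server` lines pointing at the same host:port are still valid syntax. With such a duplicate, one Impalad silently receives double traffic while another receives none. The following minimal sketch flags such duplicates per `listen`/`backend` section. The function name is hypothetical, and it simplifies comment handling: it treats everything after `#` as a comment, while real HAProxy also honours escaped or quoted `#`.

```python
from collections import defaultdict

SECTION_KEYWORDS = ("global", "defaults", "frontend", "backend", "listen")


def find_duplicate_servers(cfg_text: str) -> dict[str, list[str]]:
    """Return {section name: [host:port listed more than once]} for an haproxy.cfg text."""
    servers: dict[str, list[str]] = defaultdict(list)
    section = None
    for raw in cfg_text.splitlines():
        # Simplification (assumption): everything after '#' is a comment.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        words = line.split()
        if words[0] in SECTION_KEYWORDS:
            # 'listen impalajdbc' -> 'impalajdbc'; a bare 'global' -> 'global'
            section = words[1] if len(words) > 1 else words[0]
        elif words[0] == "server" and section is not None and len(words) >= 3:
            servers[section].append(words[2])  # 'server <name> <address> [options]'
    duplicates = {}
    for sec, addrs in servers.items():
        repeated = sorted({a for a in addrs if addrs.count(a) > 1})
        if repeated:
            duplicates[sec] = repeated
    return duplicates
```

For example, `find_duplicate_servers(open("/etc/haproxy/haproxy.cfg").read())` returns an empty dict when every section lists distinct targets.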
(4) Start and stop
Start: service haproxy start
Stop: service haproxy stop
Restart: service haproxy restart
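After starting HAProxy, the `listen status` section serves a stats page on port 1080 under `/stats`. That page shows whether each Impalad passes its health check. HAProxy can also export this page as CSV when `;csv` is appended to the stats URI. The following sketch parses that CSV and reports the status of every server in the two Impala proxies. It assumes HAProxy runs on linux123, the host used in the connection examples. The function names are hypothetical.

```python
import csv
import io
import urllib.request

# Assumption: HAProxy runs on linux123; port 1080 and /stats come from 'listen status'.
STATS_CSV_URL = "http://linux123:1080/stats;csv"


def fetch_stats_csv(url: str = STATS_CSV_URL, timeout: float = 5.0) -> str:
    """Download the HAProxy stats page in CSV form."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def server_status(csv_text: str,
                  proxies: tuple[str, ...] = ("impalashell", "impalajdbc")) -> dict[tuple[str, str], str]:
    """Map (proxy, server) -> status such as 'UP' or 'DOWN' for the given proxies."""
    # The CSV header line starts with '# ' (e.g. '# pxname,svname,...');
    # strip it so DictReader sees plain column names.
    if csv_text.startswith("# "):
        csv_text = csv_text[2:]
    rows = csv.DictReader(io.StringIO(csv_text))
    return {
        (row["pxname"], row["svname"]): row["status"]
        for row in rows
        # FRONTEND/BACKEND rows are aggregates, not individual Impalad servers
        if row["pxname"] in proxies and row["svname"] not in ("FRONTEND", "BACKEND")
    }
```

Typical use is `server_status(fetch_stats_csv())`. Any server whose status is not `UP` has failed its health check and is not receiving new connections.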
(5) Usage
Access through impala-shell:
impala-shell -i linux123:25003
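This is very convenient: the only difference is that the client connects to the HAProxy host and port (linux123:25003) instead of an individual Impalad. Everything else stays the same.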
JDBC access:
jdbc:hive2://linux123:25004/default;auth=noSasl
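A quick way to confirm that both proxy ports are listening is a plain TCP connect. The following sketch assumes HAProxy runs on linux123, as in the examples above. Its ports are the `bind` ports of the two `listen` sections, and the function names are hypothetical. Because the proxies run in `mode tcp`, HAProxy accepts the client connection itself. A successful connect therefore only proves that HAProxy is up, not that an Impalad behind it is healthy; backend health is shown on the stats page.

```python
import socket
from typing import Optional

PROXY_HOST = "linux123"                               # assumption: host running HAProxy
PROXY_PORTS = {"impala-shell": 25003, "jdbc": 25004}  # 'bind' ports of the listen sections


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def check_proxy_ports(host: str = PROXY_HOST,
                      ports: Optional[dict[str, int]] = None) -> dict[str, bool]:
    """Check each HAProxy front port; maps a label to True (open) or False."""
    ports = PROXY_PORTS if ports is None else ports
    return {name: port_open(host, port) for name, port in ports.items()}
```

Calling `check_proxy_ports()` from any client machine should report `True` for both entries once `service haproxy start` has succeeded.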
Give the Impala cluster as much memory as you can. If memory does not meet the workload's needs, Impala queries are very likely to fail with errors!