• Troubleshooting and recovering cluster DRS in vCenter


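            In vCenter environments you will often see vCLS creation fail and then be retried over and over, with tasks reporting "The task was canceled by a user". These tasks make other log entries in the environment harder to review and, more importantly, they usually point to a serious problem in the environment.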

            From vSphere 7.0 U1 onward, cluster DRS depends on the availability of vCLS, so this situation calls for a thorough investigation.

    1. Check the vCenter cluster alarms

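            When a virtual machine misbehaves, for example power-on fails with the message "Virtual machine is pinned to a host", first check the alarms for that cluster. Navigate to Cluster > Monitor > Issues and Alarms > All Issues. An alarm there reports that DRS is not working because vCLS is unavailable, which confirms that vCLS is the problem.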

    2. Log in to vCenter and check the vmware-vsan-health service

    First log in over SSH as root, then type shell to enter the Bash shell:

    Check the service status:

    service-control --status vmware-vsan-health

    The service is indeed in the Stopped state:

    root@vcsa [ /storage/log/vmware ]# service-control --status vmware-vsan-health

    Stopped:

     vmware-vsan-health
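If this check needs to be scripted, the output can be parsed instead of read by eye. The following is a minimal sketch, assuming the output keeps the layout shown above (a header line such as `Stopped:` followed by indented service names) and that other states use the same header style. The function name `service_state` is made up for illustration.

```shell
#!/bin/bash
# Sketch: report which section of `service-control --status` output lists a service.
# Assumes the layout shown in the article: a "<State>:" header line, then service names.

# Reads status output on stdin; prints the header (without the colon)
# under which service $1 appears, e.g. "Stopped".
service_state() {
    awk -v svc="$1" '
        # A header is a single word followed by a colon, alone on its line.
        /^[A-Za-z]+:[[:space:]]*$/ { state = $1; sub(/:$/, "", state); next }
        # Any other line may list one or more service names.
        { for (i = 1; i <= NF; i++) if ($i == svc) { print state; exit } }
    '
}

# Example use on the appliance (not executed here):
#   service-control --status vmware-vsan-health | service_state vmware-vsan-health
```

The parser only looks at exact service names, so stray traceback lines mixed into the output should not be mistaken for a state header.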

    Start it with the following command:

    service-control --start vmware-vsan-health

    This produces an error, with output like the following:

    --- Logging error ---

    Traceback (most recent call last):

      File "/usr/lib/python3.7/logging/__init__.py", line 1029, in emit

        self.flush()

      File "/usr/lib/python3.7/logging/__init__.py", line 1009, in flush

        self.stream.flush()

    OSError: [Errno 28] No space left on device

    Call stack:

      File "/usr/bin/service-control", line 181, in <module>

        logging.info('********** Start %s **********', sys.argv[1:])

    Message: '********** Start %s **********'

    Arguments: (['--status', 'vmware-vsan-health'],)

    --- Logging error ---

    Traceback (most recent call last):

      File "/usr/lib/python3.7/logging/__init__.py", line 1029, in emit

        self.flush()

      File "/usr/lib/python3.7/logging/__init__.py", line 1009, in flush

        self.stream.flush()

    OSError: [Errno 28] No space left on device

    Call stack:

      File "/usr/bin/service-control", line 183, in <module>

        exit(main())

      File "/usr/bin/service-control", line 167, in main

        process_arguments(parsed_args)

      File "/usr/bin/service-control", line 121, in process_arguments

        svcs_status = get_services_status(svc_names, ignore_err=ignore_err)

      File "/usr/lib/vmware/site-packages/cis/svcsController.py", line 643, in get_services_status

        quiet=_quiet)

      File "/usr/lib/vmware/site-packages/cis/svcsController.py", line 62, in _svc_log

        logging.info(msg)

    Message: "Get services status, svcnames=['vmware-vsan-health']"

    Arguments: ()

     --- Logging error ---

    Traceback (most recent call last):

      File "/usr/lib/python3.7/logging/__init__.py", line 1029, in emit

        self.flush()

      File "/usr/lib/python3.7/logging/__init__.py", line 1009, in flush

        self.stream.flush()

    OSError: [Errno 28] No space left on device

    Call stack:

      File "/usr/bin/service-control", line 183, in <module>

        exit(main())

      File "/usr/bin/service-control", line 167, in main

        process_arguments(parsed_args)

      File "/usr/bin/service-control", line 149, in process_arguments

        logging.info(msg)

    Message: 'Stopped:\n vmware-vsan-health'

    Arguments: ()

    Stopped:

     vmware-vsan-health

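    A closer look shows that the command failed because the system had run out of disk space (OSError: [Errno 28] No space left on device), which may also be the root cause of the service problem. Checking disk usage shows that /storage/log is 100% full. Delete some of the oldest logs and move others elsewhere with mv until there is enough free space.

            Note: the log directory on the VMware vCenter appliance fills up fairly often; free disk space as appropriate for your environment.

     Run the start command again: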
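The disk check described above can be sketched with standard tools. This is an illustrative outline rather than the exact commands used in the original troubleshooting: the function names and the 30-day age cutoff are assumptions, and the script only lists candidates, it never deletes anything.

```shell
#!/bin/bash
# Sketch: inspect log partition usage before deleting or moving anything.
# LOG_DIR defaults to the path from the article; the 30-day cutoff is assumed.

# Print the Use% (as a bare integer) of the filesystem that holds $1.
usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Print the N largest entries directly under $1 (sizes in KiB, largest first).
largest_entries() {
    du -xsk "$1"/* 2>/dev/null | sort -rn | head -n "${2:-10}"
}

# List (never delete) regular files under $1 not modified for more than $2 days.
old_files() {
    find "$1" -xdev -type f -mtime +"${2:-30}" -print
}

LOG_DIR="${LOG_DIR:-/storage/log}"
if [ -d "$LOG_DIR" ]; then
    echo "$LOG_DIR filesystem usage: $(usage_pct "$LOG_DIR")%"
    largest_entries "$LOG_DIR" 10
    old_files "$LOG_DIR" 30
fi
```

Reviewing the listing by hand before removing or moving files keeps recent logs that may still be needed for the investigation.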

    root@vcsa [ /storage/log/vmware ]# service-control --start vmware-vsan-health

    Operation not cancellable. Please wait for it to finish...

    Performing start operation on service vsan-health...

    Successfully started service vsan-health

    OK, the service is running normally again.
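Because a full log partition made service-control itself fail while writing its log, it can help to check free space before attempting a start. The wrapper below is a hedged sketch: `start_when_space`, the `SERVICE_CONTROL` variable, and the 95% limit are hypothetical choices for illustration, not part of vCenter.

```shell
#!/bin/bash
# Sketch: refuse to start a service while the log filesystem is nearly full.
# SERVICE_CONTROL, the default directory and the 95% limit are assumptions.
SERVICE_CONTROL="${SERVICE_CONTROL:-service-control}"

# Print the Use% (as a bare integer) of the filesystem that holds $1.
usage_pct() {
    df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# start_when_space SERVICE [LOG_DIR] [MAX_PCT]
# Returns 1 if the filesystem is at or above MAX_PCT, 2 if usage is unreadable.
start_when_space() {
    local svc dir max pct
    svc="$1"
    dir="${2:-/storage/log}"
    max="${3:-95}"
    pct="$(usage_pct "$dir")"
    case "$pct" in
        ''|*[!0-9]*)
            echo "could not read disk usage for $dir" >&2
            return 2
            ;;
    esac
    if [ "$pct" -ge "$max" ]; then
        echo "$dir is ${pct}% full; free space before starting $svc" >&2
        return 1
    fi
    "$SERVICE_CONTROL" --start "$svc"
}

# Example use on the appliance (not executed here):
#   start_when_space vmware-vsan-health /storage/log 95
```

Keeping the command name in a variable makes it possible to dry-run the wrapper, for example with `SERVICE_CONTROL=echo`, before using it on a live appliance.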

    3. Check the cluster DRS status

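            Back in vCenter, check the cluster alarms again. The DRS-related alarms have cleared, and the vCLS VMs are once again created successfully and powered on. DRS has recovered, and the affected virtual machines can power on normally.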

  • Original article: https://blog.csdn.net/m0_58983558/article/details/127594456