• How to get the runtime stack of a process on Linux


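    This usually comes up when a program in production is stuck in an infinite loop or a deadlock. The first idea is typically to attach gdb to the running process, but that requires manual interaction. After some searching and experimenting, there are several options: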

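    • Option 1: pstack <PID>    (pstack is a shell script implemented on top of gdb)
    • Option 2: gcore <PID>    (gcore is also a script implemented on top of gdb)
    • (So the ready-made tools all stand on the shoulders of gdb, unless we write our own with the ptrace() API, using the gdb source as a reference.)
    • Option 3: use fork() to create a child that inherits the process's memory, then raise a fatal signal in the child so that it produces a coredump.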
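    Description, pros and cons of each approach:

    pstack: depends on the gdb installed on the system and briefly pauses the program.

    Pros: needs no change to the existing program; the runtime state can be inspected directly.

    Cons: depends on gdb.

    gcore: depends on the gdb installed on the system and briefly pauses the program.

    Pros: needs no change to the existing program; the runtime state can be inspected directly.

    Cons: depends on gdb.

    fork(): requires modifying the existing program, adding event-handling code that triggers the fork() call.

    Pros: does not depend on gdb.

    Cons: the source code has to be modified.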

    Related code:

    1. pstack

    [root@localhost ~]# cat /usr/bin/gstack
    #!/bin/sh
    if test $# -ne 1; then
        echo "Usage: `basename $0 .sh` <process-id>" 1>&2
        exit 1
    fi
    if test ! -r /proc/$1; then
        echo "Process $1 not found." 1>&2
        exit 1
    fi
    # GDB doesn't allow "thread apply all bt" when the process isn't
    # threaded; need to peek at the process to determine if that or the
    # simpler "bt" should be used.
    backtrace="bt"
    if test -d /proc/$1/task ; then
        # Newer kernel; has a task/ directory.
        if test `/bin/ls /proc/$1/task | /usr/bin/wc -l` -gt 1 2>/dev/null ; then
            backtrace="thread apply all bt"
        fi
    elif test -f /proc/$1/maps ; then
        # Older kernel; go by it loading libpthread.
        if /bin/grep -e libpthread /proc/$1/maps > /dev/null 2>&1 ; then
            backtrace="thread apply all bt"
        fi
    fi
    GDB=${GDB:-/usr/bin/gdb}
    if $GDB -nx --quiet --batch --readnever > /dev/null 2>&1; then
        readnever=--readnever
    else
        readnever=
    fi
    # Run GDB, strip out unwanted noise.
    $GDB --quiet $readnever -nx /proc/$1/exe $1 <<EOF 2>&1 |
    $backtrace
    EOF
    /bin/sed -n \
        -e 's/^(gdb) //' \
        -e '/^#/p' \
        -e '/^Thread/p'

    2. gcore

    root@xxx:/App/Log# cat /usr/bin/gcore
    #!/bin/sh
    # Copyright (C) 2003-2016 Free Software Foundation, Inc.
    # This program is free software; you can redistribute it and/or modify
    # it under the terms of the GNU General Public License as published by
    # the Free Software Foundation; either version 3 of the License, or
    # (at your option) any later version.
    #
    # This program is distributed in the hope that it will be useful,
    # but WITHOUT ANY WARRANTY; without even the implied warranty of
    # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
    # GNU General Public License for more details.
    #
    # You should have received a copy of the GNU General Public License
    # along with this program. If not, see <http://www.gnu.org/licenses/>.
    ##############################
    # check /opt/tmp/corefile and keep only the latest gcore files!
    # check /App/corefile and keep only the latest gcore.tar.gz files!
    ##############################
    # NOTE: `function` and arrays are bash features; this part works only
    # where /bin/sh is bash.
    function check_gcore_files () {
        dir_list=(/opt/tmp/corefile /App/corefile)
        for item in ${dir_list[*]}
        do
            echo "$item"
            cd $item
            corecounts=0
            # Files are listed newest first; everything after the fifth is removed.
            for file in $(ls -t gcore-*)
            do
                #echo file=$file
                corecounts=`expr $corecounts + 1`;
                #echo corecounts=$corecounts
                # remove the surplus files
                if [ $corecounts -gt 5 ]; then
                    rm $file
                    echo "rm $file"
                fi
            done
            # go back to the previous path:
            cd -
        done
    }
    #
    # Script to generate a core file of a running program.
    # It starts up gdb, attaches to the given PID and invokes the gcore command.
    #
    if [ "$#" -eq "0" ]
    then
        echo "usage: $0 [-o filename] pid"
        exit 2
    fi
    # Need to check for -o option, but set default basename to "core".
    name_tail=`date +"%Y-%m-%d-%H.%M.%S"`
    tmp_name=gcore-"$name_tail"
    name=gcore-"$name_tail"
    if [ "$1" = "-o" ]
    then
        if [ "$#" -lt "3" ]
        then
            # Not enough arguments.
            echo "usage: gcore [-o filename] pid"
            exit 2
        fi
        name=$2
        # Shift over to start of pid list
        shift; shift
    fi
    echo "tmpfile:$tmp_name, outfile:$name"
    # Attempt to fetch the absolute path to the gcore script that was
    # called.
    #binary_path=`dirname "$0"`
    binary_path="/usr/bin"
    if test "x$binary_path" = x. ; then
        # We got "." back as a path. This means the user executed
        # the gcore script locally (i.e. ./gcore) or called the
        # script via a shell interpreter (i.e. sh gcore).
        binary_basename=`basename "$0"`
        # If the gcore script was called like "sh gcore" and the script
        # lives in the current directory, "which" will not give us "gcore".
        # So first we check if the script is in the current directory
        # before using the output of "which".
        if test -f "$binary_basename" ; then
            # We have a local gcore script in ".". This covers the case of
            # doing "./gcore" or "sh gcore".
            binary_path="."
        else
            # The gcore script was not found in ".", which means the script
            # was called from somewhere else in $PATH by "sh gcore".
            # Extract the correct path now.
            binary_path_from_env=`which "$0"`
            binary_path=`dirname "$binary_path_from_env"`
        fi
    fi
    # Check if the GDB binary is in the expected path. If not, just
    # quit with a message.
    if [ ! -f "$binary_path"/gdb ]; then
        echo "gcore: GDB binary (${binary_path}/gdb) not found"
        exit 1
    fi
    # Initialise return code.
    rc=0
    echo "---------------------------"
    # Loop through pids
    for pid in $*
    do
        # `</dev/null' to avoid touching interactive terminal if it is
        # available but not accessible as GDB would get stopped on SIGTTIN.
        date
        $binary_path/gdb </dev/null --nx --batch \
            -ex "set pagination off" -ex "set height 0" -ex "set width 0" \
            -ex "attach $pid" -ex "gcore /tmp/$tmp_name.$pid" -ex detach -ex quit
        if [ -r "/tmp/$tmp_name.$pid" ] ; then
            rc=0
            echo "------------------"
            date
            tar -czvPf $name.$pid.tar.gz "/tmp/$tmp_name.$pid"
            echo "------------------"
            date
            echo "------------------"
            rm -rf "/tmp/$tmp_name.$pid"
            date
        else
            echo "gcore: failed to create $name.$pid"
            rc=1
            break
        fi
        check_gcore_files
    done
    echo "------------------"
    exit $rc

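    Note: the size of the coredump file that gcore produces can be controlled through a few settings, such as the per-process /proc/<pid>/coredump_filter bitmask.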
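
    As one illustration of such a setting: on Linux, /proc/<pid>/coredump_filter is a bitmask that selects which kinds of memory mappings go into a core file (see core(5); the usual default is 0x33), and as far as I know gdb's gcore command honors it while its `use-coredump-filter` setting is on. The sketch below is not from the original post. It shows a process adjusting its own mask through /proc/self/coredump_filter, and the helper names `set_coredump_filter` and `get_coredump_filter` are made up for this example.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write `mask` to this process's coredump filter.
 * Bits per core(5): 0 = anonymous private, 1 = anonymous shared,
 * 2 = file-backed private, 3 = file-backed shared, 4 = ELF headers,
 * 5 = private huge pages, 6 = shared huge pages.
 * Returns 0 on success, -1 on error.
 */
static int set_coredump_filter(unsigned int mask)
{
    FILE *f = fopen("/proc/self/coredump_filter", "w");
    if (f == NULL)
        return -1;
    int ok = fprintf(f, "0x%x\n", mask) > 0;
    /* fclose() flushes the buffer, so a rejected write shows up here. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: read the current mask back (the kernel prints it
 * as hexadecimal without a 0x prefix).
 * Returns 0 on success, -1 on error.
 */
static int get_coredump_filter(unsigned int *mask)
{
    FILE *f = fopen("/proc/self/coredump_filter", "r");
    if (f == NULL)
        return -1;
    int n = fscanf(f, "%x", mask);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

    For example, `set_coredump_filter(0x31)` keeps anonymous private mappings, ELF headers and private huge pages but leaves out anonymous shared mappings, which can shrink the dump noticeably for a process with large shared-memory segments. For a process that is already running, the same value can be written from a shell with sufficient privileges into /proc/<pid>/coredump_filter before calling gcore.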

    3. Using fork()  (code omitted)
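
    Since the post leaves this code out, here is a minimal sketch of the idea, under stated assumptions rather than the author's actual implementation. The trigger signal (SIGUSR1) and the names `dump_requested`, `on_dump_signal`, `install_dump_trigger` and `dump_core_via_fork` are all invented for illustration. The child created by fork() is a copy-on-write snapshot of the parent's memory, so when it aborts, its core file reflects the parent's state at the moment of the fork while the parent keeps running. In a multithreaded program only the calling thread exists in the child, so only that thread's registers end up in the core.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the signal handler; meant to be polled by the program's main loop. */
static volatile sig_atomic_t dump_requested = 0;

/* Hypothetical trigger: the handler only records that a dump was requested. */
static void on_dump_signal(int signo)
{
    (void)signo;
    dump_requested = 1;
}

/* Install the handler for the trigger signal (SIGUSR1 is an arbitrary choice). */
static int install_dump_trigger(void)
{
    struct sigaction sa;
    sa.sa_handler = on_dump_signal;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;
    return sigaction(SIGUSR1, &sa, NULL);
}

/*
 * Fork a child that aborts immediately so the kernel writes a core file
 * for it. Returns the signal that terminated the child (SIGABRT when
 * everything worked), or -1 on error.
 */
static int dump_core_via_fork(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: raise the soft core-size limit as far as the hard limit allows. */
        struct rlimit rl;
        if (getrlimit(RLIMIT_CORE, &rl) == 0) {
            rl.rlim_cur = rl.rlim_max;
            setrlimit(RLIMIT_CORE, &rl);
        }
        /* Make sure no inherited SIGABRT handler runs before the dump. */
        signal(SIGABRT, SIG_DFL);
        abort();
    }

    /* Parent: reap the child and report how it ended. */
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

    A program would call `install_dump_trigger()` once at startup, check `dump_requested` in its main loop, and call `dump_core_via_fork()` when the flag is set; `kill -USR1 <PID>` then requests a dump. Whether and where the core file actually appears depends on the core size limit and on /proc/sys/kernel/core_pattern. Note that a main loop that is itself stuck never polls the flag, so for the deadlock or infinite-loop case the check probably has to live in a dedicated watcher thread.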


  • Original article: https://blog.csdn.net/paky_du/article/details/128068814