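While reading the skywalking-agent source code, I noticed that the agent collects the state of each thread. But which scenarios put a thread into each of these states? And when the thread count exceeds a threshold, how can we locate the problem faster? The analysis below starts from these questions.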

The nested enum State inside java.lang.Thread defines the six states of a Java thread:
public enum State {
/**
* Thread state for a thread which has not yet started.
*/
NEW,
/**
* Thread state for a runnable thread. A thread in the runnable
* state is executing in the Java virtual machine but it may
* be waiting for other resources from the operating system
* such as processor.
*/
RUNNABLE,
/**
* Thread state for a thread blocked waiting for a monitor lock.
* A thread in the blocked state is waiting for a monitor lock
* to enter a synchronized block/method or
* reenter a synchronized block/method after calling
* {@link Object#wait() Object.wait}.
*/
BLOCKED,
/**
* Thread state for a waiting thread.
* A thread is in the waiting state due to calling one of the
* following methods:
*
* - {@link Object#wait() Object.wait} with no timeout
* - {@link #join() Thread.join} with no timeout
* - {@link LockSupport#park() LockSupport.park}
*
*
* A thread in the waiting state is waiting for another thread to
* perform a particular action.
*
* For example, a thread that has called Object.wait()
* on an object is waiting for another thread to call
* Object.notify() or Object.notifyAll() on
* that object. A thread that has called Thread.join()
* is waiting for a specified thread to terminate.
*/
WAITING,
/**
* Thread state for a waiting thread with a specified waiting time.
* A thread is in the timed waiting state due to calling one of
* the following methods with a specified positive waiting time:
*
* - {@link #sleep Thread.sleep}
* - {@link Object#wait(long) Object.wait} with timeout
* - {@link #join(long) Thread.join} with timeout
* - {@link LockSupport#parkNanos LockSupport.parkNanos}
* - {@link LockSupport#parkUntil LockSupport.parkUntil}
*
*/
TIMED_WAITING,
/**
* Thread state for a terminated thread.
* The thread has completed execution.
*/
TERMINATED;
}
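The states that need no cooperation from another thread can be read directly with Thread.getState(). The following is a minimal sketch (the class name ThreadStateBasics and the method observe are made up for illustration): it reads the state of a thread before start() and after join() has returned.

```java
public class ThreadStateBasics {

    /** Returns the state seen before start() and the state seen after join(). */
    public static Thread.State[] observe() {
        Thread t = new Thread(() -> { /* no work needed for this demo */ }, "demo");
        Thread.State beforeStart = t.getState();   // not started yet
        t.start();
        try {
            t.join();                              // wait until the thread has finished
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        Thread.State afterJoin = t.getState();     // thread has completed execution
        return new Thread.State[] { beforeStart, afterJoin };
    }

    public static void main(String[] args) {
        Thread.State[] s = observe();
        System.out.println(s[0] + " -> " + s[1]);  // prints "NEW -> TERMINATED"
        // A thread that inspects itself is by definition executing, i.e. RUNNABLE.
        System.out.println(Thread.currentThread().getState());
    }
}
```

WAITING, TIMED_WAITING and BLOCKED cannot be observed this way, because each of them needs a second thread or a lock to get into that state.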
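At the operating-system level, a thread has five kinds of states: new, ready, running, waiting, and terminated.

[Figure: OS-level thread state transitions]

2.1 Java's RUNNABLE covers both RUNNING and READY at the OS level
2.2 Java's WAITING, TIMED_WAITING, and BLOCKED all correspond to WAITING at the OS level
The tests below simulate the scenarios described in the Javadoc of java.lang.Thread$State, focusing on WAITING, TIMED_WAITING, and BLOCKED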
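// Object.wait() scenario: thread1 waits on the monitor (WAITING) until thread2 calls notifyAll()
@Test
public void testSynchronizedWaitBlock() {
    Object monitor = new Object();
    Thread thread1 = new Thread(() -> {
        synchronized (monitor) {
            try {
                // Releases the monitor and waits to be notified
                monitor.wait();
            } catch (InterruptedException e) {
                System.out.println("caught InterruptedException");
                throw new RuntimeException(e);
            }
            System.out.println("running thread1");
        }
    }, "thread1");
    Thread thread2 = new Thread(() -> {
        synchronized (monitor) {
            // thread1 cannot re-enter the monitor until thread2 leaves this block
            monitor.notifyAll();
            System.out.println("running thread2");
        }
    }, "thread2");
    thread1.start();
    // Note: if thread2 happens to run first, the notification is lost and thread1 stays in WAITING
    thread2.start();
    System.out.println("set a breakpoint here to pause the main thread");
}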
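The test above depends on a breakpoint and on thread scheduling. As a hedged alternative, the sketch below polls Thread.getState() so the states can be checked programmatically. The class WaitStateDemo, the helper awaitState and the ready flag are all assumptions added for illustration, not part of the original test. The BLOCKED observation relies on the documented behaviour that a notified thread has to re-enter the monitor after Object.wait().

```java
import java.util.Arrays;

public class WaitStateDemo {

    /** Polls until the thread reaches the wanted state or about 5 s pass; returns the last state seen. */
    static Thread.State awaitState(Thread t, Thread.State wanted) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        Thread.State s = t.getState();
        while (s != wanted && System.nanoTime() < deadline) {
            Thread.yield();
            s = t.getState();
        }
        return s;
    }

    public static Thread.State[] run() {
        final Object monitor = new Object();
        final boolean[] ready = { false };          // guarded by monitor
        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                while (!ready[0]) {                 // loop guards against spurious wakeups
                    try {
                        monitor.wait();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        }, "thread1");
        waiter.start();

        // Inside monitor.wait() with no timeout
        Thread.State inWait = awaitState(waiter, Thread.State.WAITING);

        Thread.State afterNotify;
        synchronized (monitor) {
            ready[0] = true;
            monitor.notifyAll();
            // The waiter is notified but cannot re-enter while this thread holds the monitor
            afterNotify = awaitState(waiter, Thread.State.BLOCKED);
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Thread.State[] { inWait, afterNotify, waiter.getState() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```

Polling with a deadline is a deliberate choice here: it avoids fixed sleeps and still terminates if the expected state is never reached.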



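Translation of the WAITING Javadoc: a thread in the waiting state is waiting for another thread to perform a particular action. A thread enters the waiting state by calling one of the following methods:
1.1 Object.wait with no timeout
1.2 Thread.join with no timeout
1.3 LockSupport.park
Scenario 1: Thread.join with no timeout
2.1 The simulation code is as follows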
// Thread.join waits for this thread to die
@Test
public void testJoin() {
    Thread thread1 = new Thread(() -> {
        System.out.println("running thread1.....");
    }, "thread1");
    Thread thread2 = new Thread(() -> {
        try {
            // thread2 stays in WAITING until thread1 terminates
            thread1.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        System.out.println("running thread2.....");
    }, "thread2");
    thread1.start();
    thread2.start();
    System.out.println("set a breakpoint here to pause the main thread");
}
2.2 thread2 calls thread1.join() and waits for thread1 to finish; while it waits, thread2 is in the WAITING state
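Because thread1 in the test above finishes almost immediately, the WAITING state of thread2 is easy to miss. A minimal sketch that keeps thread1 alive until the state has been read could look like this. The class JoinStateDemo, the helper awaitState and the CountDownLatch used to hold thread1 are assumptions introduced for the example.

```java
import java.util.concurrent.CountDownLatch;

public class JoinStateDemo {

    /** Polls until the thread reaches the wanted state or about 5 s pass; returns the last state seen. */
    static Thread.State awaitState(Thread t, Thread.State wanted) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        Thread.State s = t.getState();
        while (s != wanted && System.nanoTime() < deadline) {
            Thread.yield();
            s = t.getState();
        }
        return s;
    }

    /** Returns the state of the joining thread while its target is still alive. */
    public static Thread.State run() {
        CountDownLatch release = new CountDownLatch(1);
        Thread thread1 = new Thread(() -> {
            try {
                release.await();                    // stay alive until the test lets us go
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "thread1");
        Thread thread2 = new Thread(() -> {
            try {
                thread1.join();                     // no timeout
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "thread2");
        thread1.start();
        thread2.start();

        Thread.State joining = awaitState(thread2, Thread.State.WAITING);

        release.countDown();                        // let thread1 finish, which ends the join
        try {
            thread2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return joining;
    }

    public static void main(String[] args) {
        System.out.println("thread2 while joining: " + run());
    }
}
```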

Scenario 2: LockSupport.park
3.1 The simulation code is as follows
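/**
 * Disables the current thread for thread scheduling purposes unless the permit is available.
 * --> If the permit is not available, the current thread is disabled for thread scheduling
 *     purposes and lies dormant until one of three things happens:
 * 1. Some other thread invokes unpark with the current thread as the target
 * 2. Some other thread interrupts the current thread
 * 3. The call spuriously (that is, for no reason) returns
 */
// requires: import java.util.concurrent.locks.LockSupport;
@Test
public void testLockSupport() {
    Thread thread1 = new Thread(() -> {
        System.out.println("do something start");
        LockSupport.park();
        System.out.println("do something end");
    }, "thread1");
    thread1.start();
    System.out.println("give the child thread thread1 a permit");
    // If unpark runs before park, the permit is already available and park returns immediately
    LockSupport.unpark(thread1);
    System.out.println("main thread finished");
}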
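3.2 thread1 calls LockSupport.park(), which puts the current thread (thread1) to sleep; its state is WAITING

3.3 The main thread calls LockSupport.unpark(thread1) to wake thread1; thread1 resumes execution and its state becomes RUNNABLE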
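Two properties of park/unpark are worth checking in code: a parked thread reports WAITING, and a permit handed out before park() is remembered, so park() then returns at once. The sketch below illustrates both. The class ParkStateDemo, its method names, the helper awaitState and the released flag are hypothetical names added for this example.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

public class ParkStateDemo {

    /** Polls until the thread reaches the wanted state or about 5 s pass; returns the last state seen. */
    static Thread.State awaitState(Thread t, Thread.State wanted) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        Thread.State s = t.getState();
        while (s != wanted && System.nanoTime() < deadline) {
            Thread.yield();
            s = t.getState();
        }
        return s;
    }

    /** Returns the state of a thread that is parked without a timeout. */
    public static Thread.State observeParked() {
        AtomicBoolean released = new AtomicBoolean(false);
        Thread thread1 = new Thread(() -> {
            // park() may return spuriously, so re-check the condition in a loop
            while (!released.get()) {
                LockSupport.park();
            }
        }, "thread1");
        thread1.start();

        Thread.State parked = awaitState(thread1, Thread.State.WAITING);

        released.set(true);
        LockSupport.unpark(thread1);                // hand over the permit and wake thread1
        try {
            thread1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return parked;
    }

    /** unpark() before park(): the permit is kept, so park() does not block. */
    public static boolean permitIsRemembered() {
        LockSupport.unpark(Thread.currentThread());
        LockSupport.park();                         // consumes the permit and returns immediately
        return true;                                // reached only because park() returned
    }

    public static void main(String[] args) {
        System.out.println("parked thread state: " + observeParked());
        System.out.println("park returned after early unpark: " + permitIsRemembered());
    }
}
```

The second method explains why thread1 in the original test may never be seen in WAITING: if the main thread reaches unpark first, the later park() call has nothing to wait for.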
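/**
 * Causes the currently executing thread to sleep (temporarily cease execution) for the
 * specified number of milliseconds, subject to the precision and accuracy of system timers
 * and schedulers. The thread does not lose ownership of any monitors.
 */
@Test
public void testSleep() {
    Object monitor = new Object();
    Thread thread1 = new Thread(() -> {
        synchronized (monitor) {
            try {
                // Sleeps while still holding the monitor: thread1 is TIMED_WAITING
                Thread.sleep(1000 * 10);
            } catch (InterruptedException e) {
                System.out.println("caught InterruptedException");
                throw new RuntimeException(e);
            }
            System.out.println("running thread1");
        }
    }, "thread1");
    Thread thread2 = new Thread(() -> {
        // Cannot enter until thread1 releases the monitor: thread2 is BLOCKED
        synchronized (monitor) {
            System.out.println("running thread2");
        }
    }, "thread2");
    thread1.start();
    thread2.start();
    System.out.println("set a breakpoint here to pause the main thread");
}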
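The sleep test shows two states at once: the sleeping thread is TIMED_WAITING, and because sleep() keeps the monitor, the second thread is BLOCKED. A hedged sketch that checks both without a breakpoint is given below. The class SleepBlockedDemo and the helper awaitState are illustrative names, and thread2 is started only after thread1 is asleep so that the lock order is deterministic.

```java
import java.util.Arrays;

public class SleepBlockedDemo {

    /** Polls until the thread reaches the wanted state or about 5 s pass; returns the last state seen. */
    static Thread.State awaitState(Thread t, Thread.State wanted) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        Thread.State s = t.getState();
        while (s != wanted && System.nanoTime() < deadline) {
            Thread.yield();
            s = t.getState();
        }
        return s;
    }

    /** Returns { state of the sleeping lock holder, state of the thread waiting for the lock }. */
    public static Thread.State[] run() {
        final Object monitor = new Object();
        Thread thread1 = new Thread(() -> {
            synchronized (monitor) {
                try {
                    Thread.sleep(10_000);           // keeps ownership of the monitor while asleep
                } catch (InterruptedException e) {
                    // woken early by the demo; fall through and release the monitor
                }
            }
        }, "thread1");
        Thread thread2 = new Thread(() -> {
            synchronized (monitor) {
                System.out.println("thread2 entered the monitor");
            }
        }, "thread2");

        thread1.start();
        // thread1 only sleeps after it has entered the synchronized block
        Thread.State sleeping = awaitState(thread1, Thread.State.TIMED_WAITING);

        thread2.start();
        Thread.State contending = awaitState(thread2, Thread.State.BLOCKED);

        thread1.interrupt();                        // cut the sleep short so the demo ends quickly
        try {
            thread1.join();
            thread2.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Thread.State[] { sleeping, contending };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run()));
    }
}
```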


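In my view, thread statistics are collected to monitor the health of the system. When the thread count reaches a preset threshold, the thread stack traces at that moment should be dumped and saved (to preserve the scene) and an alert should be sent promptly, so that the problem can be located more easily afterwards.
(I could not find such a setting in SkyWalking; pointers from anyone more experienced are welcome.)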
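As a rough illustration of the idea above, the sketch below uses only the standard java.lang.management API. It is not a SkyWalking feature or configuration. The class ThreadThresholdDump, its method names and the threshold parameter are assumptions; a real setup would also need to persist the dump and send the alert.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

public class ThreadThresholdDump {

    /** Counts the live threads of this JVM per Thread.State. */
    public static Map<Thread.State, Integer> countByState() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        for (ThreadInfo info : bean.getThreadInfo(bean.getAllThreadIds())) {
            if (info == null) {
                continue;                           // thread ended between the two calls
            }
            counts.merge(info.getThreadState(), 1, Integer::sum);
        }
        return counts;
    }

    /** Returns a text dump of all thread stacks if the live thread count exceeds the threshold, else null. */
    public static String dumpIfAbove(int threshold) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean.getThreadCount() <= threshold) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        // false, false: skip locked monitors and synchronizers to keep the dump cheap
        for (ThreadInfo info : bean.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(countByState());
        String dump = dumpIfAbove(0);               // threshold 0 forces a dump for the demo
        System.out.println(dump);
    }
}
```

Building the text by hand instead of calling ThreadInfo.toString() is intentional, since toString() only prints a limited number of stack frames. In practice the returned string would be written to a file and the check would run on a schedule.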