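Today our DBA performed a master/slave switchover on the database, after which the application kept failing because it could not obtain a DB connection. Druid reported the following error: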
Could not open JDBC Connection for transaction; nested exception is com.alibaba.druid.pool.GetConnectionTimeoutException: wait millis 15000, active 20
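The active connection count is only 20 (active=20), while the application is configured with maxActive=100, so the pool is nowhere near its limit. Why, then, can no new connection be created?
Looking through the error logs related to DB connections turned up the following: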
ERROR com.alibaba.druid.pool.DruidDataSource$CreateConnectionThread (DruidDataSource.java:1713) - create connection holder error
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
The last packet successfully received from the server was 0 milliseconds ago.
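The log shows that the error was raised in CreateConnectionThread, the thread that creates connections. My guess was that this thread had terminated, so no new connections could be created. A thread dump taken with jstack confirmed that the CreateConnectionThread thread no longer existed. That settles it: once CreateConnectionThread terminates, the pool can never create another connection.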
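The jstack check can also be done from inside the running JVM, which is handy for a health check or an alert. The following is a minimal sketch, not part of Druid: the class name ThreadPresenceCheck and the method hasLiveThread are made up for illustration, and the thread-name prefix "Druid-ConnectionPool-Create" is an assumption that should be confirmed against a real thread dump of your application.

```java
import java.util.Set;

class ThreadPresenceCheck {

    /** Returns true if any live thread's name starts with the given prefix. */
    static boolean hasLiveThread(String namePrefix) {
        // Thread.getAllStackTraces() returns a snapshot keyed by the live
        // threads of this JVM, roughly the set a jstack dump would list.
        Set<Thread> live = Thread.getAllStackTraces().keySet();
        for (Thread t : live) {
            if (t.isAlive() && t.getName().startsWith(namePrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumed prefix; verify the actual thread name in a jstack dump.
        String prefix = args.length > 0 ? args[0] : "Druid-ConnectionPool-Create";
        System.out.println(prefix + " alive: " + hasLiveThread(prefix));
    }
}
```

The check only sees threads of the JVM it runs in, so it has to be called from within the application itself (for example from a scheduled task), not from a separate process.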
Inspecting the code of CreateConnectionThread:
    public void run() {
        initedLatch.countDown();
        int errorCount = 0;
        for (;;) {
            // addLast
            try {
                lock.lockInterruptibly();
            } catch (InterruptedException e2) {
                break;
            }
            try {
                // Only create a connection when there are threads waiting for one
                if (poolingCount >= notEmptyWaitThreadCount) {
                    empty.await();
                }
                // Prevent creating more than maxActive connections
                if (activeCount + poolingCount >= maxActive) {
                    empty.await();
                    continue;
                }
            } catch (InterruptedException e) {
                lastCreateError = e;
                lastErrorTimeMillis = System.currentTimeMillis();
                break;
            } finally {
                lock.unlock();
            }
            Connection connection = null;
            try {
                connection = createPhysicalConnection();
            } catch (SQLException e) {
                LOG.error("create connection error", e);
                errorCount++;
                if (errorCount > connectionErrorRetryAttempts && timeBetweenConnectErrorMillis > 0) {
                    if (breakAfterAcquireFailure) {
                        break;
                    }
                    try {
                        Thread.sleep(timeBetweenConnectErrorMillis);
                    } catch (InterruptedException interruptEx) {
                        break;
                    }
                }
            } catch (RuntimeException e) {
                LOG.error("create connection error", e);
                continue;
            } catch (Error e) {
                LOG.error("create connection error", e);
                break;
            }
            if (connection == null) {
                continue;
            }
            DruidConnectionHolder holder = null;
            try {
                holder = new DruidConnectionHolder(DruidDataSource.this, connection);
            } catch (SQLException ex) {
                // During the master/slave switchover an exception was thrown here;
                // the break below then terminated the thread for good
                LOG.error("create connection holder error", ex);
                break;
            }
            lock.lock();
            try {
                connections[poolingCount++] = holder;
                if (poolingCount > poolingPeak) {
                    poolingPeak = poolingCount;
                    poolingPeakTime = System.currentTimeMillis();
                }
                errorCount = 0; // reset errorCount
                notEmpty.signal();
                notEmptySignalCount++;
            } finally {
                lock.unlock();
            }
        }
    }
} // closes the enclosing CreateConnectionThread class (declaration not shown in this excerpt)
I was using druid 1.7. After upgrading to the latest version, 1.15, this bug no longer occurs.
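The failure mode in run() can be reproduced in miniature. The sketch below is a toy model, not Druid's code: CreatorThreadDemo, borrowSucceedsAfterOneFailure and the simulated IllegalStateException are all invented for illustration. A single creator thread hits one transient error; if it handles the error with break (as the holder-creation catch block does), borrowers time out even though the "database" has recovered, whereas retrying on the next iteration lets the pool recover. Whether the newer Druid version fixes the bug in exactly this way is not something this sketch establishes.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class CreatorThreadDemo {

    /**
     * Runs a toy pool whose single creator thread fails once, then reports
     * whether a borrower can still obtain a connection within 500 ms.
     */
    static boolean borrowSucceedsAfterOneFailure(boolean breakOnError) {
        // Capacity 1 so the creator blocks in put() once the pool is full.
        BlockingQueue<String> pool = new LinkedBlockingQueue<>(1);
        AtomicInteger attempts = new AtomicInteger();

        Thread creator = new Thread(() -> {
            for (;;) {
                try {
                    // Simulated transient failure on the first attempt only,
                    // standing in for the error seen during the switchover.
                    if (attempts.incrementAndGet() == 1) {
                        throw new IllegalStateException("simulated link failure");
                    }
                    pool.put("conn-" + attempts.get());
                } catch (IllegalStateException e) {
                    if (breakOnError) {
                        break; // creator exits; nothing will ever refill the pool
                    }
                    // otherwise fall through and retry on the next iteration
                } catch (InterruptedException e) {
                    break; // normal shutdown path
                }
            }
        }, "toy-creator");
        creator.setDaemon(true);
        creator.start();

        try {
            // The borrower waits with a timeout, like a pool with maxWait set.
            return pool.poll(500, TimeUnit.MILLISECONDS) != null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            creator.interrupt(); // stop the creator if it is still running
        }
    }

    public static void main(String[] args) {
        // prints "break on error -> borrow ok: false"
        System.out.println("break on error -> borrow ok: " + borrowSucceedsAfterOneFailure(true));
        // prints "retry on error -> borrow ok: true"
        System.out.println("retry on error -> borrow ok: " + borrowSucceedsAfterOneFailure(false));
    }
}
```

In the break case the borrower's timeout mirrors the original symptom: the pool is far below its limit, yet every request waits out the full timeout because the only thread that can create connections is gone.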