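In day-to-day database work we often run into timeouts of various kinds, especially when the network is unstable or the workload is highly concurrent.
Understanding how these timeouts work, which timeout parameters each database exposes, and how to set them makes troubleshooting much easier when something goes wrong. Configuring these timeouts sensibly also shortens the time an application needs to recover from a failure, which improves RTO and RPO and helps meet SLA requirements.
In this article, we walk through these timeouts together.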
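Database timeouts fall into the following categories: the transaction timeout, the query (statement) timeout, and the socket-level timeouts of the database connection (login timeout, connect timeout, and the regular socket timeout).
Besides these database timeout parameters, note that the servers hosting the application and the database can also be configured with an OS-level socket timeout detection mechanism.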
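A transaction timeout limits the total processing time of all statements within a single transaction. Put simply: transaction timeout = statement (query) timeout * number of statements in the transaction + other overhead (such as business code execution time and GC pauses).
The transaction timeout is usually configured in the application framework. In Spring, for example, it can be set through the timeout attribute of the @Transactional annotation.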
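One way to relate the transaction timeout to the per-statement timeouts is to fix a deadline when the transaction begins and give each statement only the time that is left. The following is a minimal sketch of that idea, not the implementation of Spring or of any driver; the class name `TransactionDeadline` and its methods are hypothetical.

```java
import java.util.function.LongSupplier;

/** Hypothetical helper: tracks one transaction's deadline and derives per-statement timeouts from it. */
final class TransactionDeadline {
    private final long deadlineMillis;
    private final LongSupplier clock; // injectable millisecond clock, so the logic can be tested

    TransactionDeadline(long timeoutSeconds, LongSupplier clock) {
        this.clock = clock;
        this.deadlineMillis = clock.getAsLong() + timeoutSeconds * 1000L;
    }

    /** Convenience constructor that uses the system clock. */
    TransactionDeadline(long timeoutSeconds) {
        this(timeoutSeconds, System::currentTimeMillis);
    }

    /**
     * Seconds left before the transaction deadline, rounded up so that a
     * statement never receives 0 (which JDBC interprets as "no limit").
     */
    int remainingSeconds() {
        long remainingMillis = deadlineMillis - clock.getAsLong();
        if (remainingMillis <= 0) {
            throw new IllegalStateException("transaction timed out");
        }
        return (int) ((remainingMillis + 999) / 1000);
    }
}
```

Before each statement runs, the caller would pass `remainingSeconds()` to `java.sql.Statement#setQueryTimeout`, so business code and GC pauses between statements automatically eat into the shared budget.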
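A query timeout, sometimes also called a statement timeout, limits the maximum execution time of a single statement (which may be an INSERT, DELETE, UPDATE, or SELECT). If the SQL statement has not returned a result within the timeout, the database driver on the application side throws a timeout exception and sends a cancel signal to the remote DBMS, which then cancels the statement.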

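The query timeout mechanism differs slightly across database management systems and drivers, but the principle is similar. In most cases a separate thread tracks the statement's execution time; when it exceeds the configured timeout, the application side raises a timeout error, and the driver sends a cancel signal over the underlying database connection so that the DBMS cancels the statement.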
In Oracle, for example, the query timeout works roughly as follows, in this order:
- Create the statement by calling Connection.createStatement().
- Trigger execution by calling Statement.executeQuery().
- The statement transmits the query to the Oracle DBMS over its own underlying connection.
- The statement registers itself with the timeout-handling thread OracleTimeoutPollingThread (one per classloader).
- A timeout occurs during execution.
- OracleTimeoutPollingThread calls OracleStatement.cancel().
- A cancel message is sent over the statement's underlying connection, and the DBMS cancels the query being executed.
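The register / time out / cancel cycle above can be sketched with plain JDK classes. This is an illustration of the general technique only, not the code of the Oracle driver: `QueryWatchdog` and `runWithTimeout` are hypothetical names, and the query and the cancel action are modeled as `Runnable`s rather than real JDBC calls.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical watchdog that mimics what a driver's timeout thread does. */
final class QueryWatchdog {
    // One daemon timer thread shared by all statements, similar in spirit to a polling thread.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "query-timeout-watchdog");
                t.setDaemon(true);
                return t;
            });

    /**
     * Runs {@code query}; if it is still running after {@code timeoutMillis},
     * invokes {@code cancel} (for JDBC this would wrap Statement.cancel()).
     * Returns true if the watchdog fired.
     */
    static boolean runWithTimeout(Runnable query, Runnable cancel, long timeoutMillis) {
        AtomicBoolean timedOut = new AtomicBoolean(false);
        // "Register" the statement with the timer thread.
        ScheduledFuture<?> task = TIMER.schedule(() -> {
            timedOut.set(true);
            cancel.run(); // ask the DBMS to stop the statement
        }, timeoutMillis, TimeUnit.MILLISECONDS);
        try {
            query.run(); // blocks until a result arrives or the cancel takes effect
        } finally {
            task.cancel(false); // the statement is done: stop tracking it
        }
        return timedOut.get();
    }
}
```

Note that the cancel request travels over the same kind of network path as the query itself, so this scheme can only work while the connection to the DBMS is still healthy.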

As another example, the query timeout in MySQL works roughly as follows:

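The socket timeouts of a database connection comprise the login timeout (loginTimeout), the connect timeout (connectTimeout, sometimes loosely called the network timeout), and the regular socket timeout (socketTimeout). Each is described below.
The login timeout (loginTimeout) is the maximum time allowed for a database user to log in to the database server successfully.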


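The connect timeout (connectTimeout), sometimes also called the network timeout, is the maximum time the driver may take to establish the TCP connection underlying the JDBC connection. Note that, despite the similar name, java.sql.Connection#setNetworkTimeout limits how long the driver waits for the database to reply to a request on an established connection, so it behaves like the socket timeout below rather than like a connect timeout.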

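The regular socket timeout (socketTimeout) is the maximum time the client waits for a response from the server on an established connection.
The differences and relationships among the login timeout, the connect timeout, and the regular socket timeout are as follows: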
- The loginTimeout specifies how long the whole process of logging in to the database may take. It covers both connecting and authenticating to the DBMS server: establishing a TCP connection, followed by one or more packet exchanges for the handshake and authentication.
- The connectTimeout specifies how long to wait for the TCP connection to be established. Because establishing a TCP connection is only one part of establishing a database connection and does not guarantee a successful login, loginTimeout >= connectTimeout.
- The socketTimeout specifies how long the client waits for a response from the server before throwing an error. It bounds the time a socket may block while reading, and it applies to all reads from the server: not only during connect but also during later interaction (e.g., executing queries). You may therefore want to set it higher than the time you are willing to wait for login to complete, to allow for operations that take a long time to respond.
- A socketTimeout can serve both as a brute-force global query timeout and as a way to detect network problems.
- The loginTimeout and connectTimeout relate to establishing a connection, while the socketTimeout is relevant for the whole database session.
- The connectTimeout and socketTimeout are timeouts on low-level socket operations, while the loginTimeout operates at a higher level, the database level.
- When an application hangs because of network issues, it is usually blocked in Socket.read(). Depending on the network setup or the error type, it can occasionally block in Socket.write() instead. When the application calls Socket.write(), the data is copied to the OS kernel buffer and control returns to the application immediately, so Socket.write() succeeds as long as the data fits in the kernel buffer. If a network error causes the kernel buffer to fill up, however, even Socket.write() can block.
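At the socket level, the difference between the connect timeout and the socket (read) timeout can be shown with plain `java.net.Socket`, which is what JDBC drivers typically build on. In this sketch a local server accepts the TCP connection but never sends a byte, playing the role of an unresponsive DBMS; the class name `SocketTimeoutDemo` is made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SocketTimeoutDemo {
    /**
     * Connects to a local server that completes the TCP handshake but never replies,
     * then reports whether the blocked read was ended by the read timeout.
     */
    static boolean readTimesOut(int connectTimeoutMillis, int readTimeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            // Connect timeout: bounds only the TCP handshake.
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()),
                    connectTimeoutMillis);
            // Socket (read) timeout: bounds every later blocking read on this connection.
            client.setSoTimeout(readTimeoutMillis);
            try {
                client.getInputStream().read(); // blocks: the server never sends anything
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the `setSoTimeout` call the read would block indefinitely, which is exactly the hang that a driver-level socket timeout is meant to prevent.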
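Developers often complain that a query timeout they configured for some SQL statement simply does not seem to take effect. This usually happens because the underlying network has failed, and the query timeout mechanism does not work under network failures, for the following reasons: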
- A higher-level timeout depends on the lower-level timeout and operates normally only if the lower-level timeout does. If the JDBC driver's socket timeout does not work properly, higher-level timeouts such as the statement timeout and the transaction timeout will not work properly either.
- The statement timeout does not handle timeouts caused by network failure. It does only one thing: it restricts the execution time of a single statement. Handling timeouts caused by network failure is the job of the JDBC driver.
- A socket timeout for the JDBC driver is necessary when the DBMS terminates abruptly or a network error occurs (equipment malfunction, etc.).
- Because of the way TCP/IP works, a socket has no means of detecting network errors, so the application cannot detect that it has been disconnected from the DBMS. If no socket timeout is configured, the application may wait for results from the DBMS indefinitely. (Such a connection is also called a "dead connection.") To prevent dead connections, a timeout must be configured for the socket.
- The socket timeout can be configured through the JDBC driver. Setting it prevents infinite waiting when a network error occurs and shortens the failure time.
- Using the socket timeout to limit statement execution time is not recommended, so the socket timeout must be higher than the statement timeout.
- If the socket timeout is smaller than the statement timeout, the socket timeout fires first, and the statement timeout becomes meaningless because it never gets a chance to fire.
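The ordering rules stated in this article (loginTimeout >= connectTimeout, socket timeout above statement timeout, statement timeout within the transaction timeout) can be turned into a small configuration check. The sketch below is hypothetical: `TimeoutConfigCheck` is not part of any driver, all values are assumed to be in milliseconds, and 0 is taken to mean "not set".

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sanity check for a set of timeout values, all in milliseconds (0 = not set). */
final class TimeoutConfigCheck {
    static List<String> validate(long connectMs, long loginMs, long statementMs,
                                 long socketMs, long transactionMs) {
        List<String> problems = new ArrayList<>();
        if (socketMs <= 0) {
            // Without a socket timeout, a dead connection can block a read forever.
            problems.add("socket timeout is not set");
        } else if (statementMs > 0 && socketMs <= statementMs) {
            // Otherwise the socket timeout fires first and the statement timeout is meaningless.
            problems.add("socket timeout must be greater than statement timeout");
        }
        if (loginMs > 0 && connectMs > loginMs) {
            problems.add("connect timeout should not exceed login timeout");
        }
        if (transactionMs > 0 && statementMs > transactionMs) {
            problems.add("statement timeout should not exceed transaction timeout");
        }
        return problems;
    }
}
```

For example, `validate(20_000, 30_000, 30_000, 50_000, 120_000)` returns an empty list, while raising the statement timeout to 60 seconds with the socket timeout left at 50 seconds is reported as a problem.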


The following summarizes how to configure the socket connect timeout and the read/write timeout in common databases:


# Configuration parameters
// Requires: import java.sql.*; import java.util.Properties; import oracle.jdbc.pool.OracleDataSource;
final static String url = "jdbc:oracle:thin:@myhost:1521/myorcldbservicename";
final static String user = "hr";
final static String password = "hr";
final static String CONNECT_TIMEOUT = "20000"; // milliseconds
final static String READ_TIMEOUT = "50000";    // milliseconds
# Obtain a connection via DataSource
Properties connectionProperties = new Properties();
connectionProperties.put("oracle.net.CONNECT_TIMEOUT", CONNECT_TIMEOUT);
connectionProperties.put("oracle.jdbc.ReadTimeout", READ_TIMEOUT);
OracleDataSource ods = new OracleDataSource();
ods.setURL(url);
ods.setUser(user);
ods.setPassword(password);
ods.setConnectionProperties(connectionProperties);
Connection con = ods.getConnection();
# Obtain a connection via DriverManager
Class<?> oracleDriverClass = Class.forName("oracle.jdbc.driver.OracleDriver");
Properties connectionProperties = new Properties();
connectionProperties.put("oracle.net.CONNECT_TIMEOUT", CONNECT_TIMEOUT);
connectionProperties.put("oracle.jdbc.ReadTimeout", READ_TIMEOUT);
// The timeouts can also be set as system properties; this must happen before the connection is opened
// System.setProperty("oracle.net.CONNECT_TIMEOUT", CONNECT_TIMEOUT);
// System.setProperty("oracle.jdbc.ReadTimeout", READ_TIMEOUT);
connectionProperties.put("user", user);
connectionProperties.put("password", password);
Connection con = DriverManager.getConnection(url, connectionProperties);
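For comparison, MySQL Connector/J exposes the same two socket-level timeouts through its `connectTimeout` and `socketTimeout` connection properties, both in milliseconds. The sketch below only builds the properties; the URL, schema name, and the class `MySqlTimeoutConfig` are placeholders introduced for illustration, and opening a real connection additionally assumes the MySQL driver is on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Properties;

final class MySqlTimeoutConfig {
    // Placeholder URL: host, port, and schema are assumptions for illustration.
    static final String URL = "jdbc:mysql://myhost:3306/mydb";

    /** Builds connection properties carrying both socket-level timeouts (milliseconds). */
    static Properties timeoutProperties(String user, String password,
                                        int connectTimeoutMillis, int socketTimeoutMillis) {
        Properties props = new Properties();
        props.setProperty("user", user);
        props.setProperty("password", password);
        props.setProperty("connectTimeout", Integer.toString(connectTimeoutMillis));
        props.setProperty("socketTimeout", Integer.toString(socketTimeoutMillis));
        return props;
    }

    /** Opens a connection; needs the MySQL driver on the classpath and a reachable server. */
    static Connection open(Properties props) throws SQLException {
        return DriverManager.getConnection(URL, props);
    }
}
```

The same two properties can also be appended to the JDBC URL as query parameters instead of being passed in a `Properties` object.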
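Besides the database timeout parameters above, note that the servers hosting the application and the database can also be configured with OS-level socket timeout detection.
# Query kernel parameters
- sysctl -a // list all currently available kernel parameters
- sysctl net.ipv4.tcp_keepalive_time // query a single kernel parameter
- cat /proc/sys/net/ipv4/tcp_keepalive_time // query a single kernel parameter
# Modify kernel parameters
- sysctl net.ipv4.tcp_keepalive_time=3600 // change a kernel parameter at runtime
- vim /etc/sysctl.conf // change kernel parameters in the configuration file
- sysctl -p // reload kernel parameters from the configuration file sysctl.conf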
- If neither the socket timeout nor the connect timeout is configured, applications usually cannot detect network errors, so they wait indefinitely until the connection is established or data can be read.
- To prevent this, a socket timeout can be configured at the OS level, so that the Linux server itself checks the network connection.
- If the KeepAlive check interval on the Linux server is set to 30 minutes, then even if someone sets the JDBC driver's socket timeout to 0 (no timeout), a connection hang caused by a network failure lasts no longer than about 30 minutes: the JDBC connection recovers roughly 30 minutes after the failure. In other words, the OS-level socket timeout configuration bounds the effective socket timeout of the JDBC driver.
- As described earlier, a hang usually occurs in Socket.read(), but Socket.write() can also block when a network error causes the OS kernel buffer to fill up. In that case the OS keeps trying to resend the packets for a certain amount of time and reports an error once it reaches its limit.
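Two details are worth making concrete here. First, on Linux the kernel keepalive settings apply only to sockets that have SO_KEEPALIVE enabled. Second, the time to detect a dead peer is the idle time before the first probe plus the time spent on unanswered probes. The sketch below assumes the Linux parameters `net.ipv4.tcp_keepalive_intvl` and `net.ipv4.tcp_keepalive_probes` in addition to `tcp_keepalive_time`; those two names, like the helper class `KeepAliveEstimate`, do not appear in the text above.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

final class KeepAliveEstimate {
    /**
     * Worst-case seconds before the kernel declares an idle connection dead:
     * idle time before the first probe, plus one interval per unanswered probe.
     */
    static long detectionSeconds(long keepaliveTime, long keepaliveIntvl, long keepaliveProbes) {
        return keepaliveTime + keepaliveIntvl * keepaliveProbes;
    }

    /** Opts a socket in to kernel keepalive probing and reports whether the option is now on. */
    static boolean enableKeepAlive(Socket socket) {
        try {
            socket.setKeepAlive(true);
            return socket.getKeepAlive();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With the usual Linux defaults (7200 s idle time, 75 s interval, 9 probes) `detectionSeconds` gives 7875 seconds, a little over two hours, which is why the keepalive time is often lowered on database and application servers.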
# Related JDBC API classes and methods
java.sql.DriverManager#setLoginTimeout
javax.sql.CommonDataSource#setLoginTimeout
java.sql.Connection#getNetworkTimeout
java.sql.Connection#setNetworkTimeout
java.sql.Statement#setQueryTimeout
# Related Oracle JDBC driver classes and methods
oracle.jdbc.OracleDriver
oracle.jdbc.pool.OracleDataSource#setLoginTimeout
oracle.jdbc.OracleConnection
oracle.jdbc.OracleConnection#CONNECTION_PROPERTY_THIN_READ_TIMEOUT
oracle.jdbc.OracleConnection#CONNECTION_PROPERTY_THIN_NET_CONNECT_TIMEOUT
oracle.jdbc.OracleConnectionWrapper#setNetworkTimeout
oracle.jdbc.driver.PhysicalConnection#setNetworkTimeout
oracle.jdbc.driver.OracleStatement#setQueryTimeout
oracle.jdbc.driver.OracleStatement#doExecuteWithTimeout
oracle.jdbc.driver.OraclePreparedStatement#executeForRowsWithTimeout
oracle.jdbc.driver.OracleTimeoutPollingThread
# Related MySQL JDBC driver classes and methods
com.mysql.cj.jdbc.Driver
com.mysql.cj.jdbc.MysqlDataSource#setLoginTimeout
com.mysql.cj.jdbc.ConnectionImpl#setNetworkTimeout
com.mysql.cj.jdbc.ConnectionWrapper#setNetworkTimeout
com.mysql.cj.jdbc.StatementImpl#setQueryTimeout
com.mysql.cj.jdbc.StatementWrapper#setQueryTimeout
# References
- https://www.cubrid.org/blog/3826470