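The problem occurs during the network connection phase, so we can trace straight into the interceptor that talks to the server, CallServerInterceptor: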
@Override public Response intercept(Chain chain) throws IOException {
...
responseBuilder = exchange.readResponseHeaders(true); //(1)
...
}
//okhttp3.internal.http1.Http1ExchangeCodec#readResponseHeaders():
@Override public Response.Builder readResponseHeaders(boolean expectContinue) throws IOException {
...
try {
StatusLine statusLine = StatusLine.parse(readHeaderLine());
Response.Builder responseBuilder = new Response.Builder()
.protocol(statusLine.protocol)
.code(statusLine.code)
.message(statusLine.message)
.headers(readHeaders());
...
return responseBuilder;
} catch (EOFException e) {
...
// (1) This is where the error is thrown
throw new IOException("unexpected end of stream on "
+ address, e);
}
}
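The catch block above only fires when the stream ends before a complete status line has been read. The following is a minimal sketch of that failure mode in plain Java, not OkHttp's actual code: the class StatusLineEofSketch and its helpers are hypothetical names, and an empty ByteArrayInputStream stands in for a socket that the peer has already closed.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration class; not part of OkHttp.
class StatusLineEofSketch {

    // Reads one line terminated by LF (optionally preceded by CR).
    // Throws EOFException if the stream ends before the terminator is seen.
    static String readHeaderLine(InputStream in) throws IOException {
        StringBuilder line = new StringBuilder();
        int b;
        while ((b = in.read()) != -1) {
            if (b == '\n') {
                int len = line.length();
                if (len > 0 && line.charAt(len - 1) == '\r') {
                    line.setLength(len - 1); // drop the trailing CR
                }
                return line.toString();
            }
            line.append((char) b);
        }
        throw new EOFException("stream ended before end of line");
    }

    // Mirrors the shape of the catch block above: EOF is wrapped in an IOException.
    static String readStatusLine(InputStream in, String address) throws IOException {
        try {
            return readHeaderLine(in);
        } catch (EOFException e) {
            throw new IOException("unexpected end of stream on " + address, e);
        }
    }

    public static void main(String[] args) throws IOException {
        // A healthy response: the status line is read normally.
        InputStream healthy = new ByteArrayInputStream(
                "HTTP/1.1 200 OK\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println(readStatusLine(healthy, "example-host"));

        // A connection the peer already closed: no bytes arrive at all.
        InputStream closed = new ByteArrayInputStream(new byte[0]);
        try {
            readStatusLine(closed, "example-host");
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this sketch the second call fails even though the request itself may have been perfectly valid, which is why the investigation turns to how connections are kept and reused.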
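First, consider connection reuse. The place to look is OkHttp's connection pool, ConnectionPool, whose implementation class is RealConnectionPool. Following the logic that puts connections into the pool and removes them shows that the eviction decision is made in the cleanup() method: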
long cleanup(long now) {
int inUseConnectionCount = 0;
int idleConnectionCount = 0;
RealConnection longestIdleConnection = null;
long longestIdleDurationNs = Long.MIN_VALUE;
// Find either a connection to evict, or the time that the next eviction is due.
synchronized (this) {
for (Iterator<RealConnection> i = connections.iterator(); i.hasNext(); ) {
RealConnection connection = i.next();
// If the connection is in use, keep searching.
if (pruneAndGetAllocationCount(connection, now) > 0) {
inUseConnectionCount++;
continue;
}
idleConnectionCount++;
// If the connection is ready to be evicted, we're done.
long idleDurationNs = now - connection.idleAtNanos;
if (idleDurationNs > longestIdleDurationNs) {
longestIdleDurationNs = idleDurationNs;
longestIdleConnection = connection;
}
}
if (longestIdleDurationNs >= this.keepAliveDurationNs
|| idleConnectionCount > this.maxIdleConnections) {
// (1) A connection to evict was found: remove it from the pool
connections.remove(longestIdleConnection);
} else if (idleConnectionCount > 0) {
// A connection will be ready to evict soon.
return keepAliveDurationNs - longestIdleDurationNs;
} else if (inUseConnectionCount > 0) {
// All connections are in use. It'll be at least the keep alive duration 'til we run again.
return keepAliveDurationNs;
} else {
// No connections, idle or in use.
cleanupRunning = false;
return -1;
}
}
// (2) Close the evicted connection's socket outside the synchronized block
closeQuietly(longestIdleConnection.socket());
// Cleanup again immediately.
return 0;
}
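The eviction rule in cleanup() can be isolated into a small, self-contained sketch. This is a simplification under stated assumptions, not OkHttp's implementation: it only looks at connections that are already idle, represents each one by its idle duration in nanoseconds, and uses the hypothetical names IdleEvictionSketch and pickConnectionToEvict.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration class; not part of OkHttp.
class IdleEvictionSketch {

    // Returns the index of the idle connection to evict, or -1 if none is due.
    // A connection is due when the longest-idle one has reached the keep-alive
    // duration, or when there are more idle connections than allowed.
    static int pickConnectionToEvict(long[] idleDurationsNs,
                                     long keepAliveDurationNs,
                                     int maxIdleConnections) {
        int longestIdleIndex = -1;
        long longestIdleDurationNs = Long.MIN_VALUE;
        for (int i = 0; i < idleDurationsNs.length; i++) {
            if (idleDurationsNs[i] > longestIdleDurationNs) {
                longestIdleDurationNs = idleDurationsNs[i];
                longestIdleIndex = i;
            }
        }
        boolean due = longestIdleDurationNs >= keepAliveDurationNs
                || idleDurationsNs.length > maxIdleConnections;
        return due ? longestIdleIndex : -1;
    }

    public static void main(String[] args) {
        long minute = TimeUnit.MINUTES.toNanos(1);
        long keepAlive = 5 * minute;

        // One connection has been idle for 6 minutes, past the 5 minute keep-alive.
        System.out.println(pickConnectionToEvict(
                new long[] {minute, 6 * minute, 2 * minute}, keepAlive, 5));

        // Nothing has been idle long enough and the idle count is within the limit.
        System.out.println(pickConnectionToEvict(
                new long[] {minute, 2 * minute}, keepAlive, 5));
    }
}
```

The point to take away is that the pool only evicts on its own clock. If the server or an intermediary drops an idle connection sooner than keepAliveDurationNs, the pool can still hand that connection out for reuse.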
Where to configure the Keep-Alive timeout:
When building the OkHttpClient, the Builder exposes connectionPool() so that callers can supply their own settings.
For example:
OkHttpClient okHttpClient = new OkHttpClient.Builder()
.readTimeout(1000, TimeUnit.SECONDS)
.writeTimeout(1000, TimeUnit.SECONDS)
// Custom connection pool: at most 5 idle connections, 60 second keep-alive
.connectionPool(new ConnectionPool(5, 60, TimeUnit.SECONDS))
.build();
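To verify the problem, we set aside the extra logic in the real project (OkHttp customizations and so on) and reproduced the condition locally to explore whether it can be fixed.
Enable the retry-on-connection-failure switch when building the OkHttpClient: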
OkHttpClient client = new OkHttpClient.Builder()
...
.retryOnConnectionFailure(true) // Enable retry when the connection fails
.build();
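After this change the problem no longer reproduced in local testing. The production environment is more complex, though, and network conditions may differ considerably between regions, so the actual improvement still has to be confirmed by analyzing production telemetry data.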
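To make the idea of retrying on a connection failure concrete, here is a minimal application-level sketch in plain Java. It is not how OkHttp implements retryOnConnectionFailure internally; RetryOnceSketch and callWithRetry are hypothetical names, and the sketch assumes the request is safe to send more than once.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical illustration class; not part of OkHttp.
class RetryOnceSketch {

    // Runs the request up to maxAttempts times, retrying only on IOException.
    // If every attempt fails, the last IOException is rethrown.
    static <T> T callWithRetry(Callable<T> request, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.call();
            } catch (IOException e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // The first attempt simulates a stale connection; the second succeeds.
        String result = callWithRetry(() -> {
            if (calls[0]++ == 0) {
                throw new IOException("unexpected end of stream on example-host");
            }
            return "200 OK";
        }, 2);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A wrapper like this should be limited to idempotent requests, since a request that failed while reading the response may already have been processed by the server.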