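Under non-congestive random packet loss, BBRv2 performs rather poorly. The analysis and conclusions follow:
BBRv2 does not allow probing up during the cruise phase. From the draft section "When not Probing for Bandwidth":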
When not explicitly accelerating to probe for bandwidth (Drain, ProbeRTT, ProbeBW_DOWN, ProbeBW_CRUISE), BBR responds to loss by slowing down to some extent. This is because loss suggests that the available bandwidth and safe volume of in-flight data may have decreased recently (this is not necessarily true; non-congestive loss has to be considered), and the flow needs to adapt, slowing down toward the latest delivery process. BBR flows implement this response by reducing the short-term model parameters, BBR.bw_lo and BBR.inflight_lo.
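BBRv2 reacts aggressively to loss, swinging to the opposite extreme from BBRv1:
If it runs into random loss, BBRv2 keeps slowing down throughout the cruise phase, and nothing within the cycle compensates for it. Throughput keeps falling, bandwidth utilization drops, and the flow ends up being unfair to itself.
On loss, BBRv2 tries to prop up bw_lo and inflight_lo with the largest bw and inflight observed within the round: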
	bbr->inflight_lo =
		max_t(u32, bbr->inflight_latest,
		      (u64)bbr->inflight_lo *
		      (BBR_UNIT - beta) >> BBR_SCALE);
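But this rarely works. inflight_lo, which bounds cwnd, has already been lowered, so there are not enough packets in flight to sustain a higher bw. This is a circular dependency. A single multiplicative-increase (MI) probe-up can hardly offset repeated multiplicative decreases (MD) over roughly 60 rounds.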
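To make the circular dependency concrete, here is a minimal sketch of the loss response applied round after round. It assumes beta is 30% of BBR_UNIT and uses a hypothetical traffic model in which every round loses a fixed number of packets, so the amount delivered in a round can never exceed inflight_lo minus the losses. The helper names md_step and cruise_rounds are illustrative and do not exist in the kernel.

```c
#include <stdint.h>

#define BBR_SCALE 8
#define BBR_UNIT  (1 << BBR_SCALE)
#define BETA      (BBR_UNIT * 30 / 100)	/* assumed loss-response beta, about 0.3 */

/* One loss response: keep the larger of what was just delivered
 * and (1 - beta) times the previous bound.
 */
static uint32_t md_step(uint32_t inflight_lo, uint32_t inflight_latest)
{
	uint32_t decayed =
		(uint32_t)(((uint64_t)inflight_lo * (BBR_UNIT - BETA)) >> BBR_SCALE);

	return inflight_latest > decayed ? inflight_latest : decayed;
}

/* Hypothetical model: each round loses `lost_per_round` packets, and the
 * delivered volume is capped by inflight_lo because inflight_lo bounds cwnd.
 */
static uint32_t cruise_rounds(uint32_t inflight_lo, uint32_t lost_per_round,
			      int rounds)
{
	for (int i = 0; i < rounds; i++) {
		uint32_t latest = inflight_lo > lost_per_round ?
				  inflight_lo - lost_per_round : 0;

		inflight_lo = md_step(inflight_lo, latest);
	}
	return inflight_lo;
}
```

Under these assumptions, cruise_rounds(100, 1, 60) ends at 40: a single lost packet per round is enough to erode the bound steadily, because inflight_latest can only ever report what the already reduced bound allowed to be sent.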
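What follows is only an attempt at a remedy. First, a schematic of the process:
Take inflight_hi - headroom as the inflight baseline for the current cycle (similar to "holding on to what probe-up gained" in BBRv1, except that BBRv2 also yields the headroom), and do two things during the cruise phase:
This is both fair and preserves bandwidth utilization.
Next, the sketch code, My-TCP BBRv2: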
// 0. Define u8 cruise_inc; reset it to 0 whenever there is loss.
// 1. On entering cruise, record this cycle's bandwidth.
	bbr->round_start_bw = bbr_bw(sk);
	bbr2_set_cycle_idx(sk, BBR_BW_PROBE_CRUISE);
// 2. When a new loss round starts, check whether the previous loss round saw any loss.
	if (!bbr->loss_in_round) {
		bbr->no_loss_in_prev_round = 1;
		bbr->cruise_inc = 1;
	}
	bbr->loss_in_round = 0;
// 3. During cruise, if a round had no loss, raise inflight_lo by a growing
//    step (u8 wraparound: fast first, then slow) and restore bw.
	case BBR_BW_PROBE_CRUISE:
		if (bbr2_check_time_to_probe_bw(sk, rs))
			return;		/* already decided state transition */
		if (bbr->loss_round_start == 1 && bbr->no_loss_in_prev_round == 1) {
			if (bbr->inflight_lo != ~0U)
				bbr->inflight_lo =
					min_t(u32, bbr->inflight_lo + bbr->cruise_inc,
					      bbr2_inflight_with_headroom(sk));
			/* Only adjust bw_lo once it has been set; ~0U means unset. */
			if (bbr->bw_lo != ~0U)
				bbr->bw_lo =
					min_t(u32, bbr->bw_lo + bbr->cruise_inc,
					      bbr->round_start_bw);
			bbr->cruise_inc *= 2;
		}
		break;
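The growth pattern of step 3 can be tried outside the kernel with a small standalone model. This is only a sketch under stated assumptions: struct cruise_model and cruise_clean_round are hypothetical names, inflight_cap stands in for the value returned by bbr2_inflight_with_headroom(sk), and the function is assumed to be called once per loss-free round.

```c
#include <stdint.h>

/* Hypothetical stand-in for the few BBR fields used by step 3. */
struct cruise_model {
	uint32_t inflight_lo;	/* short-term inflight bound */
	uint32_t inflight_cap;	/* assumed: inflight_hi - headroom */
	uint8_t  cruise_inc;	/* doubling step, wraps to 0 after 128 */
};

/* Apply one loss-free round: additive step toward the cap, then double
 * the step. The u8 wraps 128 -> 0, after which the bound stops growing
 * until the step is re-armed.
 */
static void cruise_clean_round(struct cruise_model *m)
{
	uint32_t next;

	if (m->inflight_lo == UINT32_MAX)	/* unset bound, nothing to do */
		return;
	next = m->inflight_lo + m->cruise_inc;
	m->inflight_lo = next < m->inflight_cap ? next : m->inflight_cap;
	m->cruise_inc = (uint8_t)(m->cruise_inc * 2);
}
```

Starting from inflight_lo = 40 with a cap of 100 and cruise_inc = 1, six loss-free rounds are enough to reach the cap (41, 43, 47, 55, 71, 100), and after eight rounds the step has wrapped to 0.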
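Beyond what the code shows, there is one more detail: as soon as a smaller RTT is sampled, inflight_lo can be pulled back toward inflight_hi - headroom and bw_lo can start climbing back to round_start_bw. A lower RTT means queuing has eased, so even if there is slight loss at that moment, it is most likely non-congestive.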
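One possible reading of this RTT rule is sketched below. It is not taken from the modified kernel code: struct rtt_model and on_rtt_sample are hypothetical, the sketch snaps inflight_lo straight to the cap rather than ramping it, and it re-arms the increment so that the bw_lo ramp can restart.

```c
#include <stdint.h>

/* Hypothetical state for the RTT-triggered recovery. */
struct rtt_model {
	uint32_t min_rtt_us;	/* smallest RTT seen so far */
	uint32_t inflight_lo;	/* short-term inflight bound */
	uint32_t inflight_cap;	/* assumed: inflight_hi - headroom */
	uint8_t  cruise_inc;	/* step used to ramp bw_lo back up */
};

/* Returns 1 if the sample lowered the min RTT and triggered recovery. */
static int on_rtt_sample(struct rtt_model *m, uint32_t rtt_us)
{
	if (rtt_us >= m->min_rtt_us)
		return 0;	/* queue has not eased, keep the bounds */

	m->min_rtt_us = rtt_us;
	if (m->inflight_lo < m->inflight_cap)
		m->inflight_lo = m->inflight_cap;	/* assumption: snap back at once */
	m->cruise_inc = 1;	/* re-arm the bw_lo ramp */
	return 1;
}
```

Snapping back in one step is the more aggressive choice. A gentler variant could reuse the doubling step instead, at the cost of a slower recovery.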
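That covers the first point. The second is the length of BBRv2's maxbw filter.
The RFC draft says 2 ProbeBW cycles. But I have always wondered: why not 1 cycle? In theory, "one probe-up" should count as one effective period, roughly as in BBRv1. Yet BBRv2 offers an unrealistic ideal as its justification and stretches the maxbw filter to 2 ProbeBW cycles. From the draft section "BBR.max_bw Max Filter":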
- It is long enough to cover at least one entire ProbeBW cycle (see the “ProbeBW” section). This ensures that the window contains at least some delivery rate samples that are the result of data transmitted with a super-unity pacing_gain (a pacing_gain larger than 1.0). Such super-unity delivery rate samples are instrumental in revealing the path’s underlying available bandwidth even when there is noise from delivery rate shortfalls due to aggregation delays, queuing delays from variable cross traffic, lossy link layers with uncorrected losses, or short-term buffer exhaustion (e.g., brief coincident bursts in a shallow buffer).
- It aims to be long enough to cover short-term fluctuations in the network’s delivery rate due to the aforementioned sources of noise. In particular, the delivery rate for radio link layers (e.g., wifi and cellular technologies) can be highly variable, and the filter window needs to be long enough to remember “good” delivery rate samples in order to be robust to such variations. (Trying to absorb radio signal jitter with two cycles is far-fetched.)
- It aims to be short enough to respond in a timely manner to sustained reductions in the bandwidth available to a flow, whether this is because other flows are using a larger share of the bottleneck, or the bottleneck link service rate has reduced due to layer 1 or layer 2 changes, policy changes, or routing changes. In any of these cases, existing BBR flows traversing the bottleneck should, in a timely manner, reduce their BBR.max_bw estimates and thus pacing rate and in-flight data, in order to match the sending behavior to the new available bandwidth.
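The fix from the first part of this post, which stops inflight and bw from falling continuously during cruise, also makes it safe to pull the maxbw filter length back to 1 ProbeBW cycle, because converging inflight_lo toward inflight_hi - headroom during cruise already absorbs bandwidth jitter.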
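The difference between a 1-cycle and a 2-cycle window can be seen with a toy per-cycle max filter. This is a simplified sketch, not the kernel's filter: struct maxbw_filter and its helpers are hypothetical, each slot holds the maximum sample of one ProbeBW cycle, and maxbw_advance is assumed to be called once at each cycle boundary.

```c
#include <stdint.h>

#define MAXBW_SLOTS 2

/* Toy max filter: one slot per ProbeBW cycle, window of 1 or 2 cycles. */
struct maxbw_filter {
	uint32_t bw[MAXBW_SLOTS];	/* bw[len - 1] is the current cycle */
	int len;			/* window length in cycles: 1 or 2 */
};

static void maxbw_init(struct maxbw_filter *f, int len)
{
	f->bw[0] = 0;
	f->bw[1] = 0;
	f->len = len;
}

/* Record a delivery-rate sample in the current cycle's slot. */
static void maxbw_update(struct maxbw_filter *f, uint32_t sample)
{
	uint32_t *cur = &f->bw[f->len - 1];

	if (sample > *cur)
		*cur = sample;
}

/* Cycle boundary: age out the oldest slot and start a fresh one. */
static void maxbw_advance(struct maxbw_filter *f)
{
	if (f->len == 2)
		f->bw[0] = f->bw[1];
	f->bw[f->len - 1] = 0;
}

/* Current estimate: the maximum over the slots inside the window. */
static uint32_t maxbw_get(const struct maxbw_filter *f)
{
	uint32_t max = 0;

	for (int i = 0; i < f->len; i++)
		if (f->bw[i] > max)
			max = f->bw[i];
	return max;
}
```

If the best sample drops from 100 in one cycle to 60 in the next, the 2-cycle window still reports 100 for a whole extra cycle, while the 1-cycle window reports 60 as soon as the new cycle has a sample.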
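Code adapted from: https://github.com/google/bbr/blob/v2alpha/net/ipv4/tcp_bbr2.c
Source tree, build, and install instructions: https://github.com/google/bbr/tree/v2alpha
This post covers test results left over from last Sunday plus an idea, of which I roughly implemented a first version in no more than 30 lines of changes (mostly because I can't really program and can't write much more). This morning's write-up explains it all.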
In Wenzhou, Zhejiang, the leather shoes get wet; rain gets in, but they won't swell.