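A while back I spent some time developing vhost-user-nvme on top of SPDK, which lets a VM use NVMe storage. While working on QEMU live migration and hot upgrade for vhost-user-nvme devices, I ran into an interesting problem that is worth writing up.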
QEMU version: https://github.com/spdk/qemu.git (preferably https://review.gerrithub.io/#/c/spdk/qemu/+/406011/)
SPDK version: https://github.com/spdk/spdk
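Symptom: after roughly 1000 QEMU migrations, the vhost process started by SPDK crashes with a segmentation fault.
The immediate cause lies in the SPDK_NVME_OPC_ABORT branch of spdk_vhost_nvme_admin_passthrough: the segfault occurs while vhost is handling an NVMe command that QEMU has sent it.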
case SPDK_NVME_OPC_DOORBELL_BUFFER_CONFIG:
ret = vhost_nvme_doorbell_buffer_config(nvme, req, cpl);
break;
case SPDK_NVME_OPC_ABORT:
// the faulting code is here
sq_tail = nvme->dbbuf_dbs[sq_offset(1, 1)] & 0xffffu;
cq_head = nvme->dbbuf_dbs[cq_offset(1, 1)] & 0xffffu;
SPDK_NOTICELOG("ABORT: CID %u, SQ_TAIL %u, CQ_HEAD %u\n", (req->cdw10 >> 16) & 0xffffu, sq_tail, cq_head);
Clearly, dbbuf_dbs is already NULL at this point. A closer look at the code shows why: dbbuf_dbs is only released once destroy_device_poller_cb has run. destroy_device_poller_cb is invoked by spdk_vhost_nvme_stop_device, which in turn is the stop_device member of the spdk_vhost_nvme_device_backend structure:
static const struct spdk_vhost_dev_backend spdk_vhost_nvme_device_backend = {
.start_device = spdk_vhost_nvme_start_device,
.stop_device = spdk_vhost_nvme_stop_device,
.dump_info_json = spdk_vhost_nvme_dump_info_json,
.write_config_json = spdk_vhost_nvme_write_config_json,
.remove_device = spdk_vhost_nvme_dev_remove,
};
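To make the failure mode concrete, here is a minimal, self-contained sketch of the ordering described above: a stop callback releases the shadow doorbell buffer, and an ABORT that arrives afterwards reads it. Everything prefixed with sketch_ is hypothetical and is not SPDK code. The offset helpers assume interleaved SQ/CQ doorbells, and the NULL check is only one possible guard, not necessarily the fix that was finally adopted.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the vhost NVMe device; only the shadow
 * doorbell pointer is modelled here. */
struct sketch_nvme_dev {
	uint32_t *dbbuf_dbs; /* NULL once the device has been stopped */
};

/* Assumed layout: SQ and CQ doorbells of a queue pair are interleaved. */
static uint32_t sketch_sq_offset(uint32_t qid, uint32_t stride)
{
	return qid * 2 * stride;
}

static uint32_t sketch_cq_offset(uint32_t qid, uint32_t stride)
{
	return (qid * 2 + 1) * stride;
}

/* Plays the role of start_device: allocate n zeroed doorbell slots. */
static int sketch_start_device(struct sketch_nvme_dev *dev, size_t n)
{
	dev->dbbuf_dbs = calloc(n, sizeof(uint32_t));
	return dev->dbbuf_dbs != NULL ? 0 : -1;
}

/* Plays the role of stop_device: release the buffer and clear the pointer. */
static void sketch_stop_device(struct sketch_nvme_dev *dev)
{
	free(dev->dbbuf_dbs);
	dev->dbbuf_dbs = NULL;
}

/* ABORT handling with a guard: fail the command instead of dereferencing
 * a buffer that the stop path has already released. */
static int sketch_handle_abort(const struct sketch_nvme_dev *dev,
			       uint32_t *sq_tail, uint32_t *cq_head)
{
	assert(sq_tail != NULL && cq_head != NULL);
	if (dev->dbbuf_dbs == NULL) {
		return -1; /* device stopped; without this check we would segfault */
	}
	*sq_tail = dev->dbbuf_dbs[sketch_sq_offset(1, 1)] & 0xffffu;
	*cq_head = dev->dbbuf_dbs[sketch_cq_offset(1, 1)] & 0xffffu;
	return 0;
}
```

Calling sketch_start_device, then sketch_stop_device, then sketch_handle_abort reproduces the ordering in miniature: with the guard the handler returns -1, and without it the two array reads would dereference NULL.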
Going one level further up, we reach stop_device, which calls spdk_vhost_event_send(vdev, vdev->backend->stop_device, 3, "stop device"). stop_device is likewise a function assigned to a member of another structure:
const struct vhost_device_ops g_spdk_vhost_ops = {
.new_device = start_device,