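I recently spent some time developing vhost-user-nvme on SPDK, which lets a VM use NVMe storage. While working on QEMU live migration and hot upgrade for vhost-user-nvme devices, I ran into an interesting problem that is worth writing up.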
QEMU version: https://github.com/spdk/qemu.git (preferably use https://review.gerrithub.io/#/c/spdk/qemu/+/406011/)
SPDK version: https://github.com/spdk/spdk
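Symptom: after roughly 1000 QEMU migrations, the vhost process started by SPDK crashes with a segmentation fault.
The immediate cause is in the SPDK_NVME_OPC_ABORT branch of spdk_vhost_nvme_admin_passthrough: the segmentation fault occurs when QEMU sends an NVMe command to vhost.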
    case SPDK_NVME_OPC_DOORBELL_BUFFER_CONFIG:
        ret = vhost_nvme_doorbell_buffer_config(nvme, req, cpl);
        break;
    case SPDK_NVME_OPC_ABORT:
        // the faulty code is here
        sq_tail = nvme->dbbuf_dbs[sq_offset(1, 1)] & 0xffffu;
        cq_head = nvme->dbbuf_dbs[cq_offset(1, 1)] & 0xffffu;
        SPDK_NOTICELOG("ABORT: CID %u, SQ_TAIL %u, CQ_HEAD %u\n",
                       (req->cdw10 >> 16) & 0xffffu, sq_tail, cq_head);
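Clearly, dbbuf_dbs is already NULL at this point. A careful read of the code makes the reason clear: dbbuf_dbs is released only after destroy_device_poller_cb has run. spdk_vhost_nvme_stop_device calls destroy_device_poller_cb, and spdk_vhost_nvme_stop_device in turn is what the stop_device member of the spdk_vhost_nvme_device_backend structure points to: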
    static const struct spdk_vhost_dev_backend spdk_vhost_nvme_device_backend = {
        .start_device = spdk_vhost_nvme_start_device,
        .stop_device = spdk_vhost_nvme_stop_device,
        .dump_info_json = spdk_vhost_nvme_dump_info_json,
        .write_config_json = spdk_vhost_nvme_write_config_json,
        .remove_device = spdk_vhost_nvme_dev_remove,
    };
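To make the failure mode concrete, here is a minimal single-threaded sketch of the pattern described above: a stop callback reached through a backend table releases the doorbell buffer, and an ABORT handler later reads it. This is not SPDK code and not necessarily the fix that was applied. Every name prefixed with toy_ or TOY_ is hypothetical, the index constants are chosen only for illustration, and the NULL check merely shows one way such a handler could avoid the dereference. It does not model the threading or event ordering of the real vhost process.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Illustrative sizes and slots only; not taken from SPDK. */
#define TOY_DB_ENTRIES 8
#define TOY_SQ1_IDX 2
#define TOY_CQ1_IDX 3

/* Hypothetical stand-in for the vhost NVMe device. */
struct toy_nvme_dev {
    uint32_t *dbbuf_dbs; /* doorbell buffer; NULL while the device is stopped */
};

/* Hypothetical ops table shaped like a device backend. */
struct toy_backend {
    int (*start_device)(struct toy_nvme_dev *dev);
    int (*stop_device)(struct toy_nvme_dev *dev);
};

static int toy_start_device(struct toy_nvme_dev *dev)
{
    assert(dev != NULL);
    dev->dbbuf_dbs = calloc(TOY_DB_ENTRIES, sizeof(uint32_t));
    return dev->dbbuf_dbs != NULL ? 0 : -1;
}

/* Plays the role of the stop path: the buffer is gone after this returns. */
static int toy_stop_device(struct toy_nvme_dev *dev)
{
    assert(dev != NULL);
    free(dev->dbbuf_dbs);
    dev->dbbuf_dbs = NULL;
    return 0;
}

static const struct toy_backend toy_nvme_backend = {
    .start_device = toy_start_device,
    .stop_device = toy_stop_device,
};

/*
 * Guarded ABORT handling: returns 0 and fills the outputs when the doorbell
 * buffer exists, or -1 when the device has already been stopped. Without the
 * NULL check, the reads below would dereference a NULL pointer.
 */
static int toy_handle_abort(const struct toy_nvme_dev *dev,
                            uint16_t *sq_tail, uint16_t *cq_head)
{
    assert(dev != NULL && sq_tail != NULL && cq_head != NULL);
    if (dev->dbbuf_dbs == NULL) {
        return -1;
    }
    *sq_tail = (uint16_t)(dev->dbbuf_dbs[TOY_SQ1_IDX] & 0xffffu);
    *cq_head = (uint16_t)(dev->dbbuf_dbs[TOY_CQ1_IDX] & 0xffffu);
    return 0;
}
```

In this sketch, calling toy_handle_abort after toy_nvme_backend.stop_device returns -1 instead of crashing. A check like this only hides the symptom, though: the underlying question is why an ABORT command arrives after the device has been stopped at all.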
Going one level further up, we reach stop_device, which does spdk_vhost_event_send(vdev, vdev->backend->stop_device, 3, "stop device"). This stop_device is likewise a function assigned to a member of another structure:
const struct vhost_device_ops g_spdk_vhost_ops = { .new_device = start_device,