nvdimm: add 'unarmed' option

Currently the only vNVDIMM backend that can guarantee guest write
persistence is device DAX on Linux, because no host-side kernel cache
is involved in guest access to it. Detecting whether the backend is
device DAX requires access to sysfs, which may not work with SELinux.

Instead, we add the 'unarmed' option to device 'nvdimm', so that users
or management utils, which have enough knowledge about the backend,
can control the unarmed flag in guest ACPI NFIT via this option. The
guest Linux NVDIMM driver, for example, will mark the corresponding
vNVDIMM device read-only if the unarmed flag in guest NFIT is set.

The default value of the 'unarmed' option is 'off' in order to keep
backwards compatibility.

Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com>
Message-Id: <20171211072806.2812-4-haozhong.zhang@intel.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Eduardo Habkost <ehabkost@redhat.com>

Authored by Haozhong Zhang and committed by Eduardo Habkost
cb836434 da6789c2

+57
+15
docs/nvdimm.txt
···
 
 -object memory-backend-file,id=mem1,share=on,mem-path=/dev/dax0.0,size=4G,align=2M
 -device nvdimm,id=nvdimm1,memdev=mem1
+
+Guest Data Persistence
+----------------------
+
+Though QEMU supports multiple types of vNVDIMM backends on Linux,
+currently the only one that can guarantee the guest write persistence
+is the device DAX on the real NVDIMM device (e.g., /dev/dax0.0), to
+which all guest access do not involve any host-side kernel cache.
+
+When using other types of backends, it's suggested to set 'unarmed'
+option of '-device nvdimm' to 'on', which sets the unarmed flag of the
+guest NVDIMM region mapping structure. This unarmed flag indicates
+guest software that this vNVDIMM device contains a region that cannot
+accept persistent writes. In result, for example, the guest Linux
+NVDIMM driver, marks such vNVDIMM device as read-only.
+7
hw/acpi/nvdimm.c
···
 } QEMU_PACKED;
 typedef struct NvdimmNfitMemDev NvdimmNfitMemDev;
 
+#define ACPI_NFIT_MEM_NOT_ARMED (1 << 3)
+
 /*
  * NVDIMM Control Region Structure
  *
···
 nvdimm_build_structure_memdev(GArray *structures, DeviceState *dev)
 {
     NvdimmNfitMemDev *nfit_memdev;
+    NVDIMMDevice *nvdimm = NVDIMM(OBJECT(dev));
     uint64_t size = object_property_get_uint(OBJECT(dev), PC_DIMM_SIZE_PROP,
                                              NULL);
     int slot = object_property_get_int(OBJECT(dev), PC_DIMM_SLOT_PROP,
···
 
     /* Only one interleave for PMEM. */
     nfit_memdev->interleave_ways = cpu_to_le16(1);
+
+    if (nvdimm->unarmed) {
+        nfit_memdev->flags |= cpu_to_le16(ACPI_NFIT_MEM_NOT_ARMED);
+    }
 }
 
 /*
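
The hunk above ORs bit 3 into the little-endian flags field of the memory device structure when 'unarmed' is on. The following is a minimal standalone sketch of that idea, not QEMU code: SketchNfitMemDev, sketch_swap_le16, sketch_build_memdev and sketch_guest_sees_unarmed are hypothetical names introduced here for illustration, and the guest-side check is only an assumption about how a consumer could decode the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 3 of the memory device state flags, same value as in the patch. */
#define ACPI_NFIT_MEM_NOT_ARMED (1 << 3)

/* Hypothetical stand-in for the flags field of NvdimmNfitMemDev. */
typedef struct {
    uint16_t flags; /* kept in little-endian byte order, as in the table */
} SketchNfitMemDev;

/*
 * Stand-in for cpu_to_le16()/le16_to_cpu(): a no-op on little-endian
 * hosts, a byte swap on big-endian hosts. The same operation converts
 * in both directions.
 */
static uint16_t sketch_swap_le16(uint16_t v)
{
    const uint16_t probe = 1;
    bool host_is_le = *(const uint8_t *)&probe == 1;

    return host_is_le ? v : (uint16_t)((v >> 8) | (v << 8));
}

/* Host side: set the not-armed bit only when 'unarmed' is on. */
static void sketch_build_memdev(SketchNfitMemDev *memdev, bool unarmed)
{
    assert(memdev != NULL);

    memdev->flags = 0;
    if (unarmed) {
        memdev->flags |= sketch_swap_le16(ACPI_NFIT_MEM_NOT_ARMED);
    }
}

/* Consumer side (assumed): decode the field and test the bit. */
static bool sketch_guest_sees_unarmed(const SketchNfitMemDev *memdev)
{
    assert(memdev != NULL);

    return (sketch_swap_le16(memdev->flags) & ACPI_NFIT_MEM_NOT_ARMED) != 0;
}
```

Building the structure with unarmed set to true and then calling sketch_guest_sees_unarmed() on it yields true; with false the bit stays clear, which matches the default 'off' behaviour described in the commit message.
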
+26
hw/mem/nvdimm.c
···
 #include "qemu/osdep.h"
 #include "qapi/error.h"
 #include "qapi/visitor.h"
+#include "qapi-visit.h"
 #include "hw/mem/nvdimm.h"
 
 static void nvdimm_get_label_size(Object *obj, Visitor *v, const char *name,
···
     error_propagate(errp, local_err);
 }
 
+static bool nvdimm_get_unarmed(Object *obj, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+
+    return nvdimm->unarmed;
+}
+
+static void nvdimm_set_unarmed(Object *obj, bool value, Error **errp)
+{
+    NVDIMMDevice *nvdimm = NVDIMM(obj);
+    Error *local_err = NULL;
+
+    if (memory_region_size(&nvdimm->nvdimm_mr)) {
+        error_setg(&local_err, "cannot change property value");
+        goto out;
+    }
+
+    nvdimm->unarmed = value;
+
+ out:
+    error_propagate(errp, local_err);
+}
+
 static void nvdimm_init(Object *obj)
 {
     object_property_add(obj, NVDIMM_LABLE_SIZE_PROP, "int",
                         nvdimm_get_label_size, nvdimm_set_label_size, NULL,
                         NULL, NULL);
+    object_property_add_bool(obj, NVDIMM_UNARMED_PROP,
+                             nvdimm_get_unarmed, nvdimm_set_unarmed, NULL);
 }
 
 static MemoryRegion *nvdimm_get_memory_region(PCDIMMDevice *dimm, Error **errp)
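
The setter above rejects changes once the NVDIMM memory region has a non-zero size. A minimal standalone sketch of that guard follows. It does not use the QEMU object model: SketchNvdimm, its region_size field (standing in for memory_region_size(&nvdimm->nvdimm_mr)) and sketch_set_unarmed are hypothetical names, and returning a message string replaces the Error ** plumbing purely for brevity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for NVDIMMDevice. */
typedef struct {
    uint64_t region_size; /* 0 until the backing region is set up */
    bool unarmed;
} SketchNvdimm;

/*
 * Returns NULL on success, or a static error message when the region
 * already exists and the property must no longer change.
 */
static const char *sketch_set_unarmed(SketchNvdimm *nvdimm, bool value)
{
    assert(nvdimm != NULL);

    if (nvdimm->region_size != 0) {
        return "cannot change property value";
    }

    nvdimm->unarmed = value;
    return NULL;
}
```

In this sketch a call made before the region is sized succeeds and stores the value, while a later call leaves the stored value untouched and reports the same message the patch uses.
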
+9
include/hw/mem/nvdimm.h
···
     TYPE_NVDIMM)
 
 #define NVDIMM_LABLE_SIZE_PROP "label-size"
+#define NVDIMM_UNARMED_PROP "unarmed"
 
 struct NVDIMMDevice {
     /* private */
···
      * guest via ACPI NFIT and _FIT method if NVDIMM hotplug is supported.
      */
     MemoryRegion nvdimm_mr;
+
+    /*
+     * The 'on' value results in the unarmed flag set in ACPI NFIT,
+     * which can be used to notify guest implicitly that the host
+     * backend (e.g., files on HDD, /dev/pmemX, etc.) cannot guarantee
+     * the guest write persistence.
+     */
+    bool unarmed;
 };
 typedef struct NVDIMMDevice NVDIMMDevice;
 