qemu with hax to log dma reads & writes jcs.org/2018/11/12/vfio

vfio: Convert to ram_block_discard_disable()

VFIO is incompatible with discarding of RAM, except for devices without
a physical IOMMU and some mediated devices: the kernel pins basically
all VM memory. Let's convert to ram_block_discard_disable(), which, in
contrast to qemu_balloon_inhibit(), can fail.

Leave the "x-balloon-allowed" property named as it is for now.

Reviewed-by: Tony Krowiak <akrowiak@linux.ibm.com>
Acked-by: Cornelia Huck <cohuck@redhat.com>
Cc: Cornelia Huck <cohuck@redhat.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Tony Krowiak <akrowiak@linux.ibm.com>
Cc: Halil Pasic <pasic@linux.ibm.com>
Cc: Pierre Morel <pmorel@linux.ibm.com>
Cc: Eric Farman <farman@linux.ibm.com>
Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20200626072248.78761-4-david@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

Authored by David Hildenbrand and committed by Michael S. Tsirkin.
aff92b82 d24f31db

5 files changed, +44 -38

hw/vfio/ap.c (+4 -4)
@@ -105,12 +105,12 @@
     vapdev->vdev.dev = dev;
 
     /*
-     * vfio-ap devices operate in a way compatible with
-     * memory ballooning, as no pages are pinned in the host.
+     * vfio-ap devices operate in a way compatible with discarding of
+     * memory in RAM blocks, as no pages are pinned in the host.
      * This needs to be set before vfio_get_device() for vfio common to
-     * handle the balloon inhibitor.
+     * handle ram_block_discard_disable().
      */
-    vapdev->vdev.balloon_allowed = true;
+    vapdev->vdev.ram_block_discard_allowed = true;
 
     ret = vfio_get_device(vfio_group, mdevid, &vapdev->vdev, errp);
     if (ret) {

hw/vfio/ccw.c (+6 -5)
@@ -574,12 +574,13 @@
 
     /*
      * All vfio-ccw devices are believed to operate in a way compatible with
-     * memory ballooning, ie. pages pinned in the host are in the current
-     * working set of the guest driver and therefore never overlap with pages
-     * available to the guest balloon driver. This needs to be set before
-     * vfio_get_device() for vfio common to handle the balloon inhibitor.
+     * discarding of memory in RAM blocks, ie. pages pinned in the host are
+     * in the current working set of the guest driver and therefore never
+     * overlap e.g., with pages available to the guest balloon driver. This
+     * needs to be set before vfio_get_device() for vfio common to handle
+     * ram_block_discard_disable().
      */
-    vcdev->vdev.balloon_allowed = true;
+    vcdev->vdev.ram_block_discard_allowed = true;
 
     if (vfio_get_device(group, vcdev->cdev.mdevid, &vcdev->vdev, errp)) {
         goto out_err;

hw/vfio/common.c (+29 -24)
@@ -33,7 +33,6 @@
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
 #include "qemu/range.h"
-#include "sysemu/balloon.h"
 #include "sysemu/kvm.h"
 #include "sysemu/reset.h"
 #include "trace.h"
@@ -1215,31 +1214,36 @@
     space = vfio_get_address_space(as);
 
     /*
-     * VFIO is currently incompatible with memory ballooning insofar as the
+     * VFIO is currently incompatible with discarding of RAM insofar as the
      * madvise to purge (zap) the page from QEMU's address space does not
      * interact with the memory API and therefore leaves stale virtual to
      * physical mappings in the IOMMU if the page was previously pinned. We
-     * therefore add a balloon inhibit for each group added to a container,
+     * therefore set discarding broken for each group added to a container,
      * whether the container is used individually or shared. This provides
      * us with options to allow devices within a group to opt-in and allow
-     * ballooning, so long as it is done consistently for a group (for instance
+     * discarding, so long as it is done consistently for a group (for instance
      * if the device is an mdev device where it is known that the host vendor
      * driver will never pin pages outside of the working set of the guest
-     * driver, which would thus not be ballooning candidates).
+     * driver, which would thus not be discarding candidates).
      *
      * The first opportunity to induce pinning occurs here where we attempt to
      * attach the group to existing containers within the AddressSpace. If any
-     * pages are already zapped from the virtual address space, such as from a
-     * previous ballooning opt-in, new pinning will cause valid mappings to be
+     * pages are already zapped from the virtual address space, such as from
+     * previous discards, new pinning will cause valid mappings to be
      * re-established. Likewise, when the overall MemoryListener for a new
      * container is registered, a replay of mappings within the AddressSpace
      * will occur, re-establishing any previously zapped pages as well.
      *
-     * NB. Balloon inhibiting does not currently block operation of the
-     * balloon driver or revoke previously pinned pages, it only prevents
-     * calling madvise to modify the virtual mapping of ballooned pages.
+     * Especially virtio-balloon is currently only prevented from discarding
+     * new memory, it will not yet set ram_block_discard_set_required() and
+     * therefore, neither stops us here or deals with the sudden memory
+     * consumption of inflated memory.
      */
-    qemu_balloon_inhibit(true);
+    ret = ram_block_discard_disable(true);
+    if (ret) {
+        error_setg_errno(errp, -ret, "Cannot set discarding of RAM broken");
+        return ret;
+    }
 
     QLIST_FOREACH(container, &space->containers, next) {
         if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
@@ -1405,7 +1409,7 @@
     close(fd);
 
 put_space_exit:
-    qemu_balloon_inhibit(false);
+    ram_block_discard_disable(false);
     vfio_put_address_space(space);
 
     return ret;
@@ -1526,8 +1530,8 @@
         return;
     }
 
-    if (!group->balloon_allowed) {
-        qemu_balloon_inhibit(false);
+    if (!group->ram_block_discard_allowed) {
+        ram_block_discard_disable(false);
     }
     vfio_kvm_device_del_group(group);
     vfio_disconnect_container(group);
@@ -1565,22 +1569,23 @@
     }
 
     /*
-     * Clear the balloon inhibitor for this group if the driver knows the
-     * device operates compatibly with ballooning. Setting must be consistent
-     * per group, but since compatibility is really only possible with mdev
-     * currently, we expect singleton groups.
+     * Set discarding of RAM as not broken for this group if the driver knows
+     * the device operates compatibly with discarding. Setting must be
+     * consistent per group, but since compatibility is really only possible
+     * with mdev currently, we expect singleton groups.
      */
-    if (vbasedev->balloon_allowed != group->balloon_allowed) {
+    if (vbasedev->ram_block_discard_allowed !=
+        group->ram_block_discard_allowed) {
         if (!QLIST_EMPTY(&group->device_list)) {
-            error_setg(errp,
-                       "Inconsistent device balloon setting within group");
+            error_setg(errp, "Inconsistent setting of support for discarding "
+                       "RAM (e.g., balloon) within group");
             close(fd);
             return -1;
         }
 
-        if (!group->balloon_allowed) {
-            group->balloon_allowed = true;
-            qemu_balloon_inhibit(false);
+        if (!group->ram_block_discard_allowed) {
+            group->ram_block_discard_allowed = true;
+            ram_block_discard_disable(false);
         }
     }
 

hw/vfio/pci.c (+3 -3)
@@ -2789,7 +2789,7 @@
     }
 
     /*
-     * Mediated devices *might* operate compatibly with memory ballooning, but
+     * Mediated devices *might* operate compatibly with discarding of RAM, but
      * we cannot know for certain, it depends on whether the mdev vendor driver
      * stays in sync with the active working set of the guest driver. Prevent
      * the x-balloon-allowed option unless this is minimally an mdev device.
@@ -2802,7 +2802,7 @@
 
     trace_vfio_mdev(vdev->vbasedev.name, is_mdev);
 
-    if (vdev->vbasedev.balloon_allowed && !is_mdev) {
+    if (vdev->vbasedev.ram_block_discard_allowed && !is_mdev) {
         error_setg(errp, "x-balloon-allowed only potentially compatible "
                    "with mdev devices");
         vfio_put_group(group);
@@ -3156,7 +3156,7 @@
                     VFIO_FEATURE_ENABLE_IGD_OPREGION_BIT, false),
     DEFINE_PROP_BOOL("x-no-mmap", VFIOPCIDevice, vbasedev.no_mmap, false),
     DEFINE_PROP_BOOL("x-balloon-allowed", VFIOPCIDevice,
-                     vbasedev.balloon_allowed, false),
+                     vbasedev.ram_block_discard_allowed, false),
     DEFINE_PROP_BOOL("x-no-kvm-intx", VFIOPCIDevice, no_kvm_intx, false),
     DEFINE_PROP_BOOL("x-no-kvm-msi", VFIOPCIDevice, no_kvm_msi, false),
     DEFINE_PROP_BOOL("x-no-kvm-msix", VFIOPCIDevice, no_kvm_msix, false),

include/hw/vfio/vfio-common.h (+2 -2)
@@ -108,7 +108,7 @@
     bool reset_works;
     bool needs_reset;
     bool no_mmap;
-    bool balloon_allowed;
+    bool ram_block_discard_allowed;
     VFIODeviceOps *ops;
     unsigned int num_irqs;
     unsigned int num_regions;
@@ -128,7 +128,7 @@
     QLIST_HEAD(, VFIODevice) device_list;
     QLIST_ENTRY(VFIOGroup) next;
     QLIST_ENTRY(VFIOGroup) container_next;
-    bool balloon_allowed;
+    bool ram_block_discard_allowed;
 } VFIOGroup;
 
 typedef struct VFIODMABuf {