qemu with hax to log dma reads & writes jcs.org/2018/11/12/vfio

docs/system: Convert security.texi to rST format

security.texi is included from qemu-doc.texi but is not used
in the qemu.1 manpage. So we can do a straightforward conversion
of the contents, which go into the system manual.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-id: 20200228153619.9906-17-peter.maydell@linaro.org
Message-id: 20200226113034.6741-16-pbonzini@redhat.com

+174
+1
docs/system/index.rst
···
 .. toctree::
    :maxdepth: 2
 
+   security
    vfio-ap
+173
docs/system/security.rst
+Security
+========
+
+Overview
+--------
+
+This chapter explains the security requirements that QEMU is designed to meet
+and principles for securely deploying QEMU.
+
+Security Requirements
+---------------------
+
+QEMU supports many different use cases, some of which have stricter security
+requirements than others. The community has agreed on the overall security
+requirements that users may depend on. These requirements define what is
+considered supported from a security perspective.
+
+Virtualization Use Case
+'''''''''''''''''''''''
+
+The virtualization use case covers cloud and virtual private server (VPS)
+hosting, as well as traditional data center and desktop virtualization. These
+use cases rely on hardware virtualization extensions to execute guest code
+safely on the physical CPU at close-to-native speed.
+
+The following entities are untrusted, meaning that they may be buggy or
+malicious:
+
+- Guest
+- User-facing interfaces (e.g. VNC, SPICE, WebSocket)
+- Network protocols (e.g. NBD, live migration)
+- User-supplied files (e.g. disk images, kernels, device trees)
+- Passthrough devices (e.g. PCI, USB)
+
+Bugs affecting these entities are evaluated on whether they can cause damage in
+real-world use cases and treated as security bugs if this is the case.
+
+Non-virtualization Use Case
+'''''''''''''''''''''''''''
+
+The non-virtualization use case covers emulation using the Tiny Code Generator
+(TCG). In principle the TCG and device emulation code used in conjunction with
+the non-virtualization use case should meet the same security requirements as
+the virtualization use case. However, for historical reasons much of the
+non-virtualization use case code was not written with these security
+requirements in mind.
+
+Bugs affecting the non-virtualization use case are not considered security
+bugs at this time. Users with non-virtualization use cases must not rely on
+QEMU to provide guest isolation or any security guarantees.
+
+Architecture
+------------
+
+This section describes the design principles that ensure the security
+requirements are met.
+
+Guest Isolation
+'''''''''''''''
+
+Guest isolation is the confinement of guest code to the virtual machine. When
+guest code gains control of execution on the host this is called escaping the
+virtual machine. Isolation also includes resource limits such as throttling of
+CPU, memory, disk, or network. Guests must be unable to exceed their resource
+limits.
+
+QEMU presents an attack surface to the guest in the form of emulated devices.
+The guest must not be able to gain control of QEMU. Bugs in emulated devices
+could allow malicious guests to gain code execution in QEMU. At this point the
+guest has escaped the virtual machine and is able to act in the context of the
+QEMU process on the host.
+
+Guests often interact with other guests and share resources with them. A
+malicious guest must not gain control of other guests or access their data.
+Disk image files and network traffic must be protected from other guests unless
+explicitly shared between them by the user.
+
+Principle of Least Privilege
+''''''''''''''''''''''''''''
+
+The principle of least privilege states that each component only has access to
+the privileges necessary for its function. In the case of QEMU this means that
+each process only has access to resources belonging to the guest.
+
+The QEMU process should not have access to any resources that are inaccessible
+to the guest. This way the guest does not gain anything by escaping into the
+QEMU process since it already has access to those same resources from within
+the guest.
+
+Following the principle of least privilege immediately fulfills guest isolation
+requirements. For example, guest A only has access to its own disk image file
+``a.img`` and not guest B's disk image file ``b.img``.
+
+In reality certain resources are inaccessible to the guest but must be
+available to QEMU to perform its function. For example, host system calls are
+necessary for QEMU but are not exposed to guests. A guest that escapes into
+the QEMU process can then begin invoking host system calls.
+
+New features must be designed to follow the principle of least privilege.
+Should this not be possible for technical reasons, the security risk must be
+clearly documented so users are aware of the trade-off of enabling the feature.
+
+Isolation mechanisms
+''''''''''''''''''''
+
+Several isolation mechanisms are available to realize this architecture of
+guest isolation and the principle of least privilege. With the exception of
+Linux seccomp, these mechanisms are all deployed by management tools that
+launch QEMU, such as libvirt. They are also platform-specific so they are only
+described briefly for Linux here.
+
+The fundamental isolation mechanism is that QEMU processes must run as
+unprivileged users. Sometimes it seems more convenient to launch QEMU as
+root to give it access to host devices (e.g. ``/dev/net/tun``) but this poses a
+huge security risk. File descriptor passing can be used to give an otherwise
+unprivileged QEMU process access to host devices without running QEMU as root.
+It is also possible to launch QEMU as a non-root user and configure UNIX groups
+for access to ``/dev/kvm``, ``/dev/net/tun``, and other device nodes.
+Some Linux distros already ship with UNIX groups for these devices by default.
+
+- SELinux and AppArmor make it possible to confine processes beyond the
+  traditional UNIX process and file permissions model. They restrict the QEMU
+  process from accessing processes and files on the host system that are not
+  needed by QEMU.
+
+- Resource limits and cgroup controllers provide throughput and utilization
+  limits on key resources such as CPU time, memory, and I/O bandwidth.
+
+- Linux namespaces can be used to make process, file system, and other system
+  resources unavailable to QEMU. A namespaced QEMU process is restricted to only
+  those resources that were granted to it.
+
+- Linux seccomp is available via the QEMU ``--sandbox`` option. It disables
+  system calls that are not needed by QEMU, thereby reducing the host kernel
+  attack surface.
+
+Sensitive configurations
+------------------------
+
+There are aspects of QEMU that can have security implications which users &
+management applications must be aware of.
+
+Monitor console (QMP and HMP)
+'''''''''''''''''''''''''''''
+
+The monitor console (whether used with QMP or HMP) provides an interface
+to dynamically control many aspects of QEMU's runtime operation. Many of the
+commands exposed will instruct QEMU to access content on the host file system
+and/or trigger spawning of external processes.
+
+For example, the ``migrate`` command allows for the spawning of arbitrary
+processes for the purpose of tunnelling the migration data stream. The
+``blockdev-add`` command instructs QEMU to open arbitrary files, exposing
+their content to the guest as a virtual disk.
+
+Unless QEMU is otherwise confined using technologies such as SELinux, AppArmor,
+or Linux namespaces, the monitor console should be considered to have privileges
+equivalent to those of the user account QEMU is running under.
+
+It is further important to consider the security of the character device backend
+over which the monitor console is exposed. It needs to have protection against
+malicious third parties which might try to make unauthorized connections, or
+perform man-in-the-middle attacks. Many of the character device backends do not
+satisfy this requirement and so must not be used for the monitor console.
+
+The general recommendation is that the monitor console should be exposed over
+a UNIX domain socket backend to the local host only. Use of the TCP based
+character device backend is inappropriate unless configured to use both TLS
+encryption and authorization control policy on client connections.
+
+In summary, the monitor console is considered a privileged control interface to
+QEMU and as such should only be made accessible to a trusted management
+application or user.