
Merge remote-tracking branch 'remotes/cohuck/tags/s390x-20200227' into staging

Includes a headers update against 5.6-current.
- add missing vcpu reset functionality
- rstfy some s390 documentation
- fixes and enhancements

# gpg: Signature made Thu 27 Feb 2020 11:50:08 GMT
# gpg: using RSA key C3D0D66DC3624FF6A8C018CEDECF6B93C6F02FAF
# gpg: issuer "cohuck@redhat.com"
# gpg: Good signature from "Cornelia Huck <conny@cornelia-huck.de>" [marginal]
# gpg: aka "Cornelia Huck <huckc@linux.vnet.ibm.com>" [full]
# gpg: aka "Cornelia Huck <cornelia.huck@de.ibm.com>" [full]
# gpg: aka "Cornelia Huck <cohuck@kernel.org>" [marginal]
# gpg: aka "Cornelia Huck <cohuck@redhat.com>" [marginal]
# Primary key fingerprint: C3D0 D66D C362 4FF6 A8C0 18CE DECF 6B93 C6F0 2FAF

* remotes/cohuck/tags/s390x-20200227:
s390x: Rename and use constants for short PSW address and mask
docs: rstfy vfio-ap documentation
docs: rstfy s390 dasd ipl documentation
s390/sclp: improve special wait psw logic
s390x: Add missing vcpu reset functions
linux-headers: update
target/s390x/translate: Fix RNSBG instruction

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

+1206 -1032
+2 -2
MAINTAINERS
··· 1259 1259 F: hw/s390x/ipl.* 1260 1260 F: pc-bios/s390-ccw/ 1261 1261 F: pc-bios/s390-ccw.img 1262 - F: docs/devel/s390-dasd-ipl.txt 1262 + F: docs/devel/s390-dasd-ipl.rst 1263 1263 T: git https://github.com/borntraeger/qemu.git s390-next 1264 1264 L: qemu-s390x@nongnu.org 1265 1265 ··· 1570 1570 F: include/hw/s390x/ap-device.h 1571 1571 F: include/hw/s390x/ap-bridge.h 1572 1572 F: hw/vfio/ap.c 1573 - F: docs/vfio-ap.txt 1573 + F: docs/system/vfio-ap.rst 1574 1574 L: qemu-s390x@nongnu.org 1575 1575 1576 1576 vhost
+1
docs/devel/index.rst
··· 25 25 tcg-plugins 26 26 bitops 27 27 reset 28 + s390-dasd-ipl
+138
docs/devel/s390-dasd-ipl.rst
··· 1 + Booting from real channel-attached devices on s390x 2 + =================================================== 3 + 4 + s390 hardware IPL 5 + ----------------- 6 + 7 + The s390 hardware IPL process consists of the following steps. 8 + 9 + 1. A READ IPL ccw is constructed in memory location ``0x0``. 10 + This ccw, by definition, reads the IPL1 record which is located on the disk 11 + at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw 12 + so when it is complete another ccw will be fetched and executed from memory 13 + location ``0x08``. 14 + 15 + 2. Execute the Read IPL ccw at ``0x00``, thereby reading IPL1 data into ``0x00``. 16 + IPL1 data is 24 bytes in length and consists of the following pieces of 17 + information: ``[psw][read ccw][tic ccw]``. When the machine executes the Read 18 + IPL ccw it causes the 24 bytes of IPL1 to be read into memory starting at 19 + location ``0x0``. Then the ccw program at ``0x08`` which consists of a read 20 + ccw and a tic ccw is automatically executed because of the chain flag from 21 + the original READ IPL ccw. The read ccw will read the IPL2 data into memory 22 + and the TIC (Transfer In Channel) will transfer control to the channel 23 + program contained in the IPL2 data. The TIC channel command is the 24 + equivalent of a branch/jump/goto instruction for channel programs. 25 + 26 + NOTE: The ccws in IPL1 are defined by the architecture to be format 0. 27 + 28 + 3. Execute IPL2. 29 + The TIC ccw instruction at the end of the IPL1 channel program will begin 30 + the execution of the IPL2 channel program. IPL2 is stage-2 of the boot 31 + process and will contain a larger channel program than IPL1. The point of 32 + IPL2 is to find and load either the operating system or a small program that 33 + loads the operating system from disk. At the end of this step all or some of 34 + the real operating system is loaded into memory and we are ready to hand 35 + control over to the guest operating system. 
At this point the guest 36 + operating system is entirely responsible for loading any more data it might 37 + need to function. 38 + 39 + NOTE: The IPL2 channel program might read data into memory 40 + location ``0x0`` thereby overwriting the IPL1 psw and channel program. This is ok 41 + as long as the data placed in location ``0x0`` contains a psw whose instruction 42 + address points to the guest operating system code to execute at the end of 43 + the IPL/boot process. 44 + 45 + NOTE: The ccws in IPL2 are defined by the architecture to be format 0. 46 + 47 + 4. Start executing the guest operating system. 48 + The psw that was loaded into memory location ``0x0`` as part of the ipl process 49 + should contain the needed flags for the operating system we have loaded. The 50 + psw's instruction address will point to the location in memory where we want 51 + to start executing the operating system. This psw is loaded (via LPSW 52 + instruction) causing control to be passed to the operating system code. 53 + 54 + In a non-virtualized environment this process, handled entirely by the hardware, 55 + is kicked off by the user initiating a "Load" procedure from the hardware 56 + management console. This "Load" procedure crafts a special "Read IPL" ccw in 57 + memory location 0x0 that reads IPL1. It then executes this ccw thereby kicking 58 + off the reading of IPL1 data. Since the channel program from IPL1 will be 59 + written immediately after the special "Read IPL" ccw, the IPL1 channel program 60 + will be executed immediately (the special read ccw has the chaining bit turned 61 + on). The TIC at the end of the IPL1 channel program will cause the IPL2 channel 62 + program to be executed automatically. After this sequence completes the "Load" 63 + procedure then loads the psw from ``0x0``. 
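The format-0 ccws mentioned in the notes above have a fixed 8-byte layout. As a rough sketch (field layout per the z/Architecture Principles of Operation; the helper name and the read command code used here are illustrative assumptions, not QEMU code), the READ IPL ccw could be modeled like this:

```python
import struct

# Flag bits in a format-0 ccw (illustrative model, not QEMU's actual code)
CCW_FLAG_CD = 0x80  # chain data
CCW_FLAG_CC = 0x40  # chain command: fetch and execute the next ccw

def build_ccw0(cmd, addr, flags, count):
    """Pack a format-0 ccw: 1-byte command code, 3-byte (24-bit) data
    address, 1-byte flags, a reserved zero byte, 2-byte count."""
    assert addr < (1 << 24), "format-0 ccws carry only a 24-bit address"
    return struct.pack(">B3sBBH", cmd, addr.to_bytes(3, "big"), flags, 0, count)

# The READ IPL ccw at 0x0: read the 24-byte IPL1 record to address 0x0,
# with command chaining on so the ccw at 0x08 executes next.
READ_CMD = 0x02  # basic read command code (assumed here for illustration)
read_ipl = build_ccw0(READ_CMD, 0x0, CCW_FLAG_CC, 24)
assert read_ipl == bytes([0x02, 0, 0, 0, 0x40, 0, 0, 24])
```

The chain-command bit (``0x40``) set in this sketch is what makes the channel fetch the next ccw at ``0x08`` when the read completes.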
64 + 65 + How this all pertains to QEMU (and the kernel) 66 + ---------------------------------------------- 67 + 68 + In theory we should merely have to do the following to IPL/boot a guest 69 + operating system from a DASD device: 70 + 71 + 1. Place a "Read IPL" ccw into memory location ``0x0`` with chaining bit on. 72 + 2. Execute channel program at ``0x0``. 73 + 3. LPSW ``0x0``. 74 + 75 + However, our emulation of the machine's channel program logic within the kernel 76 + is missing one key feature that is required for this process to work: 77 + non-prefetch of ccw data. 78 + 79 + When we start a channel program we pass the channel subsystem parameters via an 80 + ORB (Operation Request Block). One of those parameters is a prefetch bit. If the 81 + bit is on then the vfio-ccw kernel driver is allowed to read the entire channel 82 + program from guest memory before it starts executing it. This means that any 83 + channel commands that read additional channel commands will not work as expected 84 + because the newly read commands will only exist in guest memory and NOT within 85 + the kernel's channel subsystem memory. The kernel vfio-ccw driver currently 86 + requires this bit to be on for all channel programs. This is a problem because 87 + the IPL process consists of transferring control from the "Read IPL" ccw 88 + immediately to the IPL1 channel program that was read by "Read IPL". 89 + 90 + Not being able to turn off prefetch will also prevent the TIC at the end of the 91 + IPL1 channel program from transferring control to the IPL2 channel program. 92 + 93 + Lastly, in some cases (the zipl bootloader for example) the IPL2 program also 94 + transfers control to another channel program segment immediately after reading 95 + it from the disk. So we need to be able to handle this case. 
96 + 97 + What QEMU does 98 + -------------- 99 + 100 + Since we are forced to live with prefetch we cannot use the very simple IPL 101 + procedure we defined in the preceding section. So we compensate by doing the 102 + following. 103 + 104 + 1. Place "Read IPL" ccw into memory location ``0x0``, but turn off chaining bit. 105 + 2. Execute "Read IPL" at ``0x0``. 106 + 107 + So now IPL1's psw is at ``0x0`` and IPL1's channel program is at ``0x08``. 108 + 109 + 3. Write a custom channel program that will seek to the IPL2 record and then 110 + execute the READ and TIC ccws from IPL1. Normally the seek is not required 111 + because after reading the IPL1 record the disk is automatically positioned 112 + to read the very next record which will be IPL2. But since we are not reading 113 + both IPL1 and IPL2 as part of the same channel program we must manually set 114 + the position. 115 + 116 + 4. Grab the target address of the TIC instruction from the IPL1 channel program. 117 + This address is where the IPL2 channel program starts. 118 + 119 + Now IPL2 is loaded into memory somewhere, and we know the address. 120 + 121 + 5. Execute the IPL2 channel program at the address obtained in step #4. 122 + 123 + Because this channel program can be dynamic, we must use a special algorithm 124 + that detects a READ immediately followed by a TIC and breaks the ccw chain 125 + by turning off the chain bit in the READ ccw. When control is returned from 126 + the kernel/hardware to the QEMU bios code we immediately issue another start 127 + subchannel to execute the remaining TIC instruction. This causes the entire 128 + channel program (starting from the TIC) and all needed data to be refetched 129 + thereby stepping around the limitation that would otherwise prevent this 130 + channel program from executing properly. 
131 + 132 + Now the operating system code is loaded somewhere in guest memory and the psw 133 + in memory location ``0x0`` will point to entry code for the guest operating 134 + system. 135 + 136 + 6. LPSW ``0x0`` 137 + 138 + LPSW transfers control to the guest operating system and we're done.
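The READ-immediately-followed-by-TIC detection described in step 5 can be sketched as follows. The constants are architectural values (TIC is command code X'08'); the function and field names are illustrative, not QEMU's actual identifiers:

```python
# Sketch of the chain-breaking workaround described above (assumed names).
CCW_FLAG_CC = 0x40  # command-chaining flag
CMD_TIC = 0x08      # Transfer In Channel command code

def break_chain_at_tic(ccws):
    """Scan a channel program (list of {"cmd", "flags"} dicts). On the
    first chained ccw immediately followed by a TIC, clear the chain bit
    so execution stops before the TIC; return the TIC's index so the
    caller can restart the subchannel there, forcing the remainder of the
    program (and its data) to be refetched."""
    for i in range(len(ccws) - 1):
        cur, nxt = ccws[i], ccws[i + 1]
        if cur["flags"] & CCW_FLAG_CC and nxt["cmd"] == CMD_TIC:
            cur["flags"] &= ~CCW_FLAG_CC  # break the chain before the TIC
            return i + 1
    return None

program = [{"cmd": 0x02, "flags": CCW_FLAG_CC},  # READ, chained...
           {"cmd": CMD_TIC, "flags": 0x00}]      # ...into a TIC
resume_at = break_chain_at_tic(program)
assert resume_at == 1 and program[0]["flags"] == 0
```

Restarting the subchannel at the returned index is what steps around the prefetch limitation: the kernel refetches everything from the TIC onward after the READ's data is actually in memory.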
-133
docs/devel/s390-dasd-ipl.txt
··· 1 - ***************************** 2 - ***** s390 hardware IPL ***** 3 - ***************************** 4 - 5 - The s390 hardware IPL process consists of the following steps. 6 - 7 - 1. A READ IPL ccw is constructed in memory location 0x0. 8 - This ccw, by definition, reads the IPL1 record which is located on the disk 9 - at cylinder 0 track 0 record 1. Note that the chain flag is on in this ccw 10 - so when it is complete another ccw will be fetched and executed from memory 11 - location 0x08. 12 - 13 - 2. Execute the Read IPL ccw at 0x00, thereby reading IPL1 data into 0x00. 14 - IPL1 data is 24 bytes in length and consists of the following pieces of 15 - information: [psw][read ccw][tic ccw]. When the machine executes the Read 16 - IPL ccw it read the 24-bytes of IPL1 to be read into memory starting at 17 - location 0x0. Then the ccw program at 0x08 which consists of a read 18 - ccw and a tic ccw is automatically executed because of the chain flag from 19 - the original READ IPL ccw. The read ccw will read the IPL2 data into memory 20 - and the TIC (Transfer In Channel) will transfer control to the channel 21 - program contained in the IPL2 data. The TIC channel command is the 22 - equivalent of a branch/jump/goto instruction for channel programs. 23 - NOTE: The ccws in IPL1 are defined by the architecture to be format 0. 24 - 25 - 3. Execute IPL2. 26 - The TIC ccw instruction at the end of the IPL1 channel program will begin 27 - the execution of the IPL2 channel program. IPL2 is stage-2 of the boot 28 - process and will contain a larger channel program than IPL1. The point of 29 - IPL2 is to find and load either the operating system or a small program that 30 - loads the operating system from disk. At the end of this step all or some of 31 - the real operating system is loaded into memory and we are ready to hand 32 - control over to the guest operating system. 
At this point the guest 33 - operating system is entirely responsible for loading any more data it might 34 - need to function. NOTE: The IPL2 channel program might read data into memory 35 - location 0 thereby overwriting the IPL1 psw and channel program. This is ok 36 - as long as the data placed in location 0 contains a psw whose instruction 37 - address points to the guest operating system code to execute at the end of 38 - the IPL/boot process. 39 - NOTE: The ccws in IPL2 are defined by the architecture to be format 0. 40 - 41 - 4. Start executing the guest operating system. 42 - The psw that was loaded into memory location 0 as part of the ipl process 43 - should contain the needed flags for the operating system we have loaded. The 44 - psw's instruction address will point to the location in memory where we want 45 - to start executing the operating system. This psw is loaded (via LPSW 46 - instruction) causing control to be passed to the operating system code. 47 - 48 - In a non-virtualized environment this process, handled entirely by the hardware, 49 - is kicked off by the user initiating a "Load" procedure from the hardware 50 - management console. This "Load" procedure crafts a special "Read IPL" ccw in 51 - memory location 0x0 that reads IPL1. It then executes this ccw thereby kicking 52 - off the reading of IPL1 data. Since the channel program from IPL1 will be 53 - written immediately after the special "Read IPL" ccw, the IPL1 channel program 54 - will be executed immediately (the special read ccw has the chaining bit turned 55 - on). The TIC at the end of the IPL1 channel program will cause the IPL2 channel 56 - program to be executed automatically. After this sequence completes the "Load" 57 - procedure then loads the psw from 0x0. 
58 - 59 - ********************************************************** 60 - ***** How this all pertains to QEMU (and the kernel) ***** 61 - ********************************************************** 62 - 63 - In theory we should merely have to do the following to IPL/boot a guest 64 - operating system from a DASD device: 65 - 66 - 1. Place a "Read IPL" ccw into memory location 0x0 with chaining bit on. 67 - 2. Execute channel program at 0x0. 68 - 3. LPSW 0x0. 69 - 70 - However, our emulation of the machine's channel program logic within the kernel 71 - is missing one key feature that is required for this process to work: 72 - non-prefetch of ccw data. 73 - 74 - When we start a channel program we pass the channel subsystem parameters via an 75 - ORB (Operation Request Block). One of those parameters is a prefetch bit. If the 76 - bit is on then the vfio-ccw kernel driver is allowed to read the entire channel 77 - program from guest memory before it starts executing it. This means that any 78 - channel commands that read additional channel commands will not work as expected 79 - because the newly read commands will only exist in guest memory and NOT within 80 - the kernel's channel subsystem memory. The kernel vfio-ccw driver currently 81 - requires this bit to be on for all channel programs. This is a problem because 82 - the IPL process consists of transferring control from the "Read IPL" ccw 83 - immediately to the IPL1 channel program that was read by "Read IPL". 84 - 85 - Not being able to turn off prefetch will also prevent the TIC at the end of the 86 - IPL1 channel program from transferring control to the IPL2 channel program. 87 - 88 - Lastly, in some cases (the zipl bootloader for example) the IPL2 program also 89 - transfers control to another channel program segment immediately after reading 90 - it from the disk. So we need to be able to handle this case. 
91 - 92 - ************************** 93 - ***** What QEMU does ***** 94 - ************************** 95 - 96 - Since we are forced to live with prefetch we cannot use the very simple IPL 97 - procedure we defined in the preceding section. So we compensate by doing the 98 - following. 99 - 100 - 1. Place "Read IPL" ccw into memory location 0x0, but turn off chaining bit. 101 - 2. Execute "Read IPL" at 0x0. 102 - 103 - So now IPL1's psw is at 0x0 and IPL1's channel program is at 0x08. 104 - 105 - 4. Write a custom channel program that will seek to the IPL2 record and then 106 - execute the READ and TIC ccws from IPL1. Normally the seek is not required 107 - because after reading the IPL1 record the disk is automatically positioned 108 - to read the very next record which will be IPL2. But since we are not reading 109 - both IPL1 and IPL2 as part of the same channel program we must manually set 110 - the position. 111 - 112 - 5. Grab the target address of the TIC instruction from the IPL1 channel program. 113 - This address is where the IPL2 channel program starts. 114 - 115 - Now IPL2 is loaded into memory somewhere, and we know the address. 116 - 117 - 6. Execute the IPL2 channel program at the address obtained in step #5. 118 - 119 - Because this channel program can be dynamic, we must use a special algorithm 120 - that detects a READ immediately followed by a TIC and breaks the ccw chain 121 - by turning off the chain bit in the READ ccw. When control is returned from 122 - the kernel/hardware to the QEMU bios code we immediately issue another start 123 - subchannel to execute the remaining TIC instruction. This causes the entire 124 - channel program (starting from the TIC) and all needed data to be refetched 125 - thereby stepping around the limitation that would otherwise prevent this 126 - channel program from executing properly. 
127 - 128 - Now the operating system code is loaded somewhere in guest memory and the psw 129 - in memory location 0x0 will point to entry code for the guest operating 130 - system. 131 - 132 - 7. LPSW 0x0. 133 - LPSW transfers control to the guest operating system and we're done.
+1
docs/system/index.rst
··· 15 15 :maxdepth: 2 16 16 17 17 qemu-block-drivers 18 + vfio-ap
+916
docs/system/vfio-ap.rst
··· 1 + Adjunct Processor (AP) Device 2 + ============================= 3 + 4 + .. contents:: 5 + 6 + Introduction 7 + ------------ 8 + 9 + The IBM Adjunct Processor (AP) Cryptographic Facility is comprised 10 + of three AP instructions and from 1 to 256 PCIe cryptographic adapter cards. 11 + These AP devices provide cryptographic functions to all CPUs assigned to a 12 + linux system running in an IBM Z system LPAR. 13 + 14 + On s390x, AP adapter cards are exposed via the AP bus. This document 15 + describes how those cards may be made available to KVM guests using the 16 + VFIO mediated device framework. 17 + 18 + AP Architectural Overview 19 + ------------------------- 20 + 21 + In order to understand the terminology used in the rest of this document, let's 22 + start with some definitions: 23 + 24 + * AP adapter 25 + 26 + An AP adapter is an IBM Z adapter card that can perform cryptographic 27 + functions. There can be from 0 to 256 adapters assigned to an LPAR depending 28 + on the machine model. Adapters assigned to the LPAR in which a linux host is 29 + running will be available to the linux host. Each adapter is identified by a 30 + number from 0 to 255; however, the maximum adapter number allowed is 31 + determined by machine model. When installed, an AP adapter is accessed by 32 + AP instructions executed by any CPU. 33 + 34 + * AP domain 35 + 36 + An adapter is partitioned into domains. Each domain can be thought of as 37 + a set of hardware registers for processing AP instructions. An adapter can 38 + hold up to 256 domains; however, the maximum domain number allowed is 39 + determined by machine model. Each domain is identified by a number from 0 to 40 + 255. 
Domains can be further classified into two types: 41 + 42 + * Usage domains are domains that can be accessed directly to process AP 43 + commands 44 + 45 + * Control domains are domains that are accessed indirectly by AP 46 + commands sent to a usage domain to control or change the domain; for 47 + example, to set a secure private key for the domain. 48 + 49 + * AP Queue 50 + 51 + An AP queue is the means by which an AP command-request message is sent to an 52 + AP usage domain inside a specific AP. An AP queue is identified by a tuple 53 + comprised of an AP adapter ID (APID) and an AP queue index (APQI). The 54 + APQI corresponds to a given usage domain number within the adapter. This tuple 55 + forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP 56 + instructions include a field containing the APQN to identify the AP queue to 57 + which the AP command-request message is to be sent for processing. 58 + 59 + * AP Instructions: 60 + 61 + There are three AP instructions: 62 + 63 + * NQAP: to enqueue an AP command-request message to a queue 64 + * DQAP: to dequeue an AP command-reply message from a queue 65 + * PQAP: to administer the queues 66 + 67 + AP instructions identify the domain that is targeted to process the AP 68 + command; this must be one of the usage domains. An AP command may modify a 69 + domain that is not one of the usage domains, but the modified domain 70 + must be one of the control domains. 71 + 72 + Start Interpretive Execution (SIE) Instruction 73 + ---------------------------------------------- 74 + 75 + A KVM guest is started by executing the Start Interpretive Execution (SIE) 76 + instruction. The SIE state description is a control block that contains the 77 + state information for a KVM guest and is supplied as input to the SIE 78 + instruction. The SIE state description contains a satellite control block called 79 + the Crypto Control Block (CRYCB). 
The CRYCB contains three fields to identify 80 + the adapters, usage domains and control domains assigned to the KVM guest: 81 + 82 + * The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned 83 + to the KVM guest. Each bit in the mask, from left to right, corresponds to 84 + an APID from 0-255. If a bit is set, the corresponding adapter is valid for 85 + use by the KVM guest. 86 + 87 + * The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains 88 + assigned to the KVM guest. Each bit in the mask, from left to right, 89 + corresponds to an AP queue index (APQI) from 0-255. If a bit is set, the 90 + corresponding queue is valid for use by the KVM guest. 91 + 92 + * The AP Domain Mask field is a bit mask that identifies the AP control domains 93 + assigned to the KVM guest. The ADM bit mask controls which domains can be 94 + changed by an AP command-request message sent to a usage domain from the 95 + guest. Each bit in the mask, from left to right, corresponds to a domain from 96 + 0-255. If a bit is set, the corresponding domain can be modified by an AP 97 + command-request message sent to a usage domain. 98 + 99 + If you recall from the description of an AP Queue, AP instructions include 100 + an APQN to identify the AP adapter and AP queue to which an AP command-request 101 + message is to be sent (NQAP and PQAP instructions), or from which a 102 + command-reply message is to be received (DQAP instruction). The validity of an 103 + APQN is defined by the matrix calculated from the APM and AQM; it is the 104 + cross product of all assigned adapter numbers (APM) with all assigned queue 105 + indexes (AQM). For example, if adapters 1 and 2 and usage domains 5 and 6 are 106 + assigned to a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for 107 + the guest. 
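The APM/AQM cross product described above is easy to sketch (the helper name is illustrative, not part of any real API):

```python
from itertools import product

def valid_apqns(apm, aqm):
    """Cross product of assigned adapter numbers (APM) and assigned
    queue indexes (AQM) yields the APQNs valid for the guest."""
    return sorted(product(apm, aqm))

# adapters 1 and 2, usage domains 5 and 6, as in the example above
print(valid_apqns([1, 2], [5, 6]))  # [(1, 5), (1, 6), (2, 5), (2, 6)]
```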
108 + 109 + The APQNs can provide secure key functionality - i.e., a private key is stored 110 + on the adapter card for each of its domains - so each APQN must be assigned to 111 + at most one guest or the linux host. 112 + 113 + Example 1: Valid configuration 114 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 115 + 116 + +----------+--------+--------+ 117 + | | Guest1 | Guest2 | 118 + +==========+========+========+ 119 + | adapters | 1, 2 | 1, 2 | 120 + +----------+--------+--------+ 121 + | domains | 5, 6 | 7 | 122 + +----------+--------+--------+ 123 + 124 + This is valid because both guests have a unique set of APQNs: 125 + 126 + * Guest1 has APQNs (1,5), (1,6), (2,5) and (2,6); 127 + * Guest2 has APQNs (1,7) and (2,7). 128 + 129 + Example 2: Valid configuration 130 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 131 + 132 + +----------+--------+--------+ 133 + | | Guest1 | Guest2 | 134 + +==========+========+========+ 135 + | adapters | 1, 2 | 3, 4 | 136 + +----------+--------+--------+ 137 + | domains | 5, 6 | 5, 6 | 138 + +----------+--------+--------+ 139 + 140 + This is also valid because both guests have a unique set of APQNs: 141 + 142 + * Guest1 has APQNs (1,5), (1,6), (2,5), (2,6); 143 + * Guest2 has APQNs (3,5), (3,6), (4,5), (4,6) 144 + 145 + Example 3: Invalid configuration 146 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 147 + 148 + +----------+--------+--------+ 149 + | | Guest1 | Guest2 | 150 + +==========+========+========+ 151 + | adapters | 1, 2 | 1 | 152 + +----------+--------+--------+ 153 + | domains | 5, 6 | 6, 7 | 154 + +----------+--------+--------+ 155 + 156 + This is an invalid configuration because both guests have access to 157 + APQN (1,6). 158 + 159 + AP Matrix Configuration on Linux Host 160 + ------------------------------------- 161 + 162 + A linux system is a guest of the LPAR in which it is running and has access to 163 + the AP resources configured for the LPAR. The LPAR's AP matrix is 164 + configured via its Activation Profile which can be edited on the HMC. 
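The validity rule illustrated by Examples 1-3 above, that no APQN may be shared between guests, can be checked with a small sketch (function and key names are illustrative):

```python
def shared_apqns(g1, g2):
    """APQNs that two guests would both own; non-empty means the
    configuration is invalid."""
    s1 = {(a, d) for a in g1["adapters"] for d in g1["domains"]}
    s2 = {(a, d) for a in g2["adapters"] for d in g2["domains"]}
    return s1 & s2

# Example 3 from the text: the guests collide on APQN (1, 6)
guest1 = {"adapters": [1, 2], "domains": [5, 6]}
guest2 = {"adapters": [1], "domains": [6, 7]}
print(shared_apqns(guest1, guest2))  # {(1, 6)}
```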
When the 165 + linux system is started, the AP bus will detect the AP devices assigned to the 166 + LPAR and create the following in sysfs:: 167 + 168 + /sys/bus/ap 169 + ... [devices] 170 + ...... xx.yyyy 171 + ...... ... 172 + ...... cardxx 173 + ...... ... 174 + 175 + Where: 176 + 177 + ``cardxx`` 178 + is AP adapter number xx (in hex) 179 + 180 + ``xx.yyyy`` 181 + is an APQN with xx specifying the APID and yyyy specifying the APQI 182 + 183 + For example, if AP adapters 5 and 6 and domains 4, 71 (0x47), 171 (0xab) and 184 + 255 (0xff) are configured for the LPAR, the sysfs representation on the linux 185 + host system would look like this:: 186 + 187 + /sys/bus/ap 188 + ... [devices] 189 + ...... 05.0004 190 + ...... 05.0047 191 + ...... 05.00ab 192 + ...... 05.00ff 193 + ...... 06.0004 194 + ...... 06.0047 195 + ...... 06.00ab 196 + ...... 06.00ff 197 + ...... card05 198 + ...... card06 199 + 200 + A set of default device drivers are also created to control each type of AP 201 + device that can be assigned to the LPAR on which a linux host is running:: 202 + 203 + /sys/bus/ap 204 + ... [drivers] 205 + ...... [cex2acard] for Crypto Express 2/3 accelerator cards 206 + ...... [cex2aqueue] for AP queues served by Crypto Express 2/3 207 + accelerator cards 208 + ...... [cex4card] for Crypto Express 4/5/6 accelerator and coprocessor 209 + cards 210 + ...... [cex4queue] for AP queues served by Crypto Express 4/5/6 211 + accelerator and coprocessor cards 212 + ...... [pcixcccard] for Crypto Express 2/3 coprocessor cards 213 + ...... 
[pcixccqueue] for AP queues served by Crypto Express 2/3 214 + coprocessor cards 215 + 216 + Binding AP devices to device drivers 217 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 218 + 219 + There are two sysfs files that specify bitmasks marking a subset of the APQN 220 + range as 'usable by the default AP queue device drivers' or 'not usable by the 221 + default device drivers' and thus available for use by the alternate device 222 + driver(s). The sysfs locations of the masks are:: 223 + 224 + /sys/bus/ap/apmask 225 + /sys/bus/ap/aqmask 226 + 227 + The ``apmask`` is a 256-bit mask that identifies a set of AP adapter IDs 228 + (APID). Each bit in the mask, from left to right (i.e., from most significant 229 + to least significant bit in big endian order), corresponds to an APID from 230 + 0-255. If a bit is set, the APID is marked as usable only by the default AP 231 + queue device drivers; otherwise, the APID is usable by the vfio_ap 232 + device driver. 233 + 234 + The ``aqmask`` is a 256-bit mask that identifies a set of AP queue indexes 235 + (APQI). Each bit in the mask, from left to right (i.e., from most significant 236 + to least significant bit in big endian order), corresponds to an APQI from 237 + 0-255. If a bit is set, the APQI is marked as usable only by the default AP 238 + queue device drivers; otherwise, the APQI is usable by the vfio_ap device 239 + driver. 240 + 241 + Take, for example, the following mask:: 242 + 243 + 0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff 244 + 245 + It indicates: 246 + 247 + 1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6 248 + belong to the vfio_ap device driver's pool. 249 + 250 + The APQN of each AP queue device assigned to the linux host is checked by the 251 + AP bus against the set of APQNs derived from the cross product of APIDs 252 + and APQIs marked as usable only by the default AP queue device drivers. 
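As a quick sketch of the left-to-right mask semantics just described (the helper is hypothetical; it only models how a full 64-hex-digit mask value is interpreted):

```python
def vfio_ap_ids(mask_hex, nbits=256):
    """IDs whose bit is 0 in a sysfs apmask/aqmask value, i.e. the IDs
    left to the vfio_ap driver. Bit 0 is the leftmost (most significant)
    bit of the 256-bit mask."""
    value = int(mask_hex, 16)
    return [i for i in range(nbits) if not (value >> (nbits - 1 - i)) & 1]

# The example mask from the text: only bits 0 and 6 are clear
mask = "0x7d" + "ff" * 31
print(vfio_ap_ids(mask))  # [0, 6]
```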
If a 253 + match is detected, only the default AP queue device drivers will be probed; 254 + otherwise, the vfio_ap device driver will be probed. 255 + 256 + By default, the two masks are set to reserve all APQNs for use by the default 257 + AP queue device drivers. There are two ways the default masks can be changed: 258 + 259 + 1. The sysfs mask files can be edited by echoing a string into the 260 + respective sysfs mask file in one of two formats: 261 + 262 + * An absolute hex string starting with 0x - like "0x12345678" - sets 263 + the mask. If the given string is shorter than the mask, it is padded 264 + with 0s on the right; for example, specifying a mask value of 0x41 is 265 + the same as specifying:: 266 + 267 + 0x4100000000000000000000000000000000000000000000000000000000000000 268 + 269 + Keep in mind that the mask reads from left to right (i.e., most 270 + significant to least significant bit in big endian order), so the mask 271 + above identifies device numbers 1 and 7 (``01000001``). 272 + 273 + If the string is longer than the mask, the operation is terminated with 274 + an error (EINVAL). 275 + 276 + * Individual bits in the mask can be switched on and off by specifying 277 + each bit number to be switched in a comma separated list. Each bit 278 + number string must be prepended with a plus (``+``) or minus (``-``) to indicate 279 + the corresponding bit is to be switched on (``+``) or off (``-``). Some 280 + valid values are:: 281 + 282 + "+0" switches bit 0 on 283 + "-13" switches bit 13 off 284 + "+0x41" switches bit 65 on 285 + "-0xff" switches bit 255 off 286 + 287 + The following example:: 288 + 289 + +0,-6,+0x47,-0xf0 290 + 291 + Switches bits 0 and 71 (0x47) on 292 + Switches bits 6 and 240 (0xf0) off 293 + 294 + Note that the bits not specified in the list remain as they were before 295 + the operation. 296 + 297 + 2. 
The masks can also be changed at boot time via parameters on the kernel 298 + command line like this:: 299 + 300 + ap.apmask=0xffff ap.aqmask=0x40 301 + 302 + This would create the following masks: 303 + 304 + apmask:: 305 + 306 + 0xffff000000000000000000000000000000000000000000000000000000000000 307 + 308 + aqmask:: 309 + 310 + 0x4000000000000000000000000000000000000000000000000000000000000000 311 + 312 + Resulting in these two pools:: 313 + 314 + default drivers pool: adapter 0-15, domain 1 315 + alternate drivers pool: adapter 16-255, domains 0, 2-255 316 + 317 + Configuring an AP matrix for a linux guest 318 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 319 + 320 + The sysfs interfaces for configuring an AP matrix for a guest are built on the 321 + VFIO mediated device framework. To configure an AP matrix for a guest, a 322 + mediated matrix device must first be created for the ``/sys/devices/vfio_ap/matrix`` 323 + device. When the vfio_ap device driver is loaded, it registers with the VFIO 324 + mediated device framework. When the driver registers, the sysfs interfaces for 325 + creating mediated matrix devices are created:: 326 + 327 + /sys/devices 328 + ... [vfio_ap] 329 + ......[matrix] 330 + ......... [mdev_supported_types] 331 + ............ [vfio_ap-passthrough] 332 + ............... create 333 + ............... [devices] 334 + 335 + A mediated AP matrix device is created by writing a UUID to the attribute file 336 + named ``create``, for example:: 337 + 338 + uuidgen > create 339 + 340 + or 341 + 342 + :: 343 + 344 + echo $uuid > create 345 + 346 + When a mediated AP matrix device is created, a sysfs directory named after 347 + the UUID is created in the ``devices`` subdirectory:: 348 + 349 + /sys/devices 350 + ... [vfio_ap] 351 + ......[matrix] 352 + ......... [mdev_supported_types] 353 + ............ [vfio_ap-passthrough] 354 + ............... create 355 + ............... [devices] 356 + .................. 
[$uuid] 357 + 358 + There will also be three sets of attribute files created in the mediated 359 + matrix device's sysfs directory to configure an AP matrix for the 360 + KVM guest:: 361 + 362 + /sys/devices 363 + ... [vfio_ap] 364 + ......[matrix] 365 + ......... [mdev_supported_types] 366 + ............ [vfio_ap-passthrough] 367 + ............... create 368 + ............... [devices] 369 + .................. [$uuid] 370 + ..................... assign_adapter 371 + ..................... assign_control_domain 372 + ..................... assign_domain 373 + ..................... matrix 374 + ..................... unassign_adapter 375 + ..................... unassign_control_domain 376 + ..................... unassign_domain 377 + 378 + ``assign_adapter`` 379 + To assign an AP adapter to the mediated matrix device, its APID is written 380 + to the ``assign_adapter`` file. This may be done multiple times to assign more 381 + than one adapter. The APID may be specified using conventional semantics 382 + as a decimal, hexadecimal, or octal number. For example, to assign adapters 383 + 4, 5 and 16 to a mediated matrix device in decimal, hexadecimal and octal 384 + respectively:: 385 + 386 + echo 4 > assign_adapter 387 + echo 0x5 > assign_adapter 388 + echo 020 > assign_adapter 389 + 390 + In order to successfully assign an adapter: 391 + 392 + * The adapter number specified must represent a value from 0 up to the 393 + maximum adapter number allowed by the machine model. If an adapter number 394 + higher than the maximum is specified, the operation will terminate with 395 + an error (ENODEV). 396 + 397 + * All APQNs that can be derived from the adapter ID being assigned and the 398 + IDs of the previously assigned domains must be bound to the vfio_ap device 399 + driver. If no domains have yet been assigned, then there must be at least 400 + one APQN with the specified APID bound to the vfio_ap driver. 
If no such
401 + APQNs are bound to the driver, the operation will terminate with an
402 + error (EADDRNOTAVAIL).
403 +
404 + * No APQN that can be derived from the adapter ID and the IDs of the
405 + previously assigned domains can be assigned to another mediated matrix
406 + device. If an APQN is assigned to another mediated matrix device, the
407 + operation will terminate with an error (EADDRINUSE).
408 +
409 + ``unassign_adapter``
410 + To unassign an AP adapter, its APID is written to the ``unassign_adapter``
411 + file. This may also be done multiple times to unassign more than one adapter.
412 +
413 + ``assign_domain``
414 + To assign a usage domain, the domain number is written into the
415 + ``assign_domain`` file. This may be done multiple times to assign more than one
416 + usage domain. The domain number is specified using conventional semantics as
417 + a decimal, hexadecimal, or octal number. For example, to assign usage domains
418 + 4, 8, and 71 to a mediated matrix device in decimal, hexadecimal and octal
419 + respectively::
420 +
421 + echo 4 > assign_domain
422 + echo 0x8 > assign_domain
423 + echo 0107 > assign_domain
424 +
425 + In order to successfully assign a domain:
426 +
427 + * The domain number specified must represent a value from 0 up to the
428 + maximum domain number allowed by the machine model. If a domain number
429 + higher than the maximum is specified, the operation will terminate with
430 + an error (ENODEV).
431 +
432 + * All APQNs that can be derived from the domain ID being assigned and the IDs
433 + of the previously assigned adapters must be bound to the vfio_ap device
434 + driver. If no adapters have yet been assigned, then there must be at least
435 + one APQN with the specified APQI bound to the vfio_ap driver. If no such
436 + APQNs are bound to the driver, the operation will terminate with an
437 + error (EADDRNOTAVAIL).
438 +
439 + * No APQN that can be derived from the domain ID being assigned and the IDs
440 + of the previously assigned adapters can be assigned to another mediated
441 + matrix device. If an APQN is assigned to another mediated matrix device,
442 + the operation will terminate with an error (EADDRINUSE).
443 +
444 + ``unassign_domain``
445 + To unassign a usage domain, the domain number is written into the
446 + ``unassign_domain`` file. This may be done multiple times to unassign more than
447 + one usage domain.
448 +
449 + ``assign_control_domain``
450 + To assign a control domain, the domain number is written into the
451 + ``assign_control_domain`` file. This may be done multiple times to
452 + assign more than one control domain. The domain number may be specified using
453 + conventional semantics as a decimal, hexadecimal, or octal number. For
454 + example, to assign control domains 4, 8, and 71 to a mediated matrix device
455 + in decimal, hexadecimal and octal respectively::
456 +
457 + echo 4 > assign_control_domain
458 + echo 0x8 > assign_control_domain
459 + echo 0107 > assign_control_domain
460 +
461 + In order to successfully assign a control domain, the domain number
462 + specified must represent a value from 0 up to the maximum domain number
463 + allowed by the machine model. If a control domain number higher than the
464 + maximum is specified, the operation will terminate with an error (ENODEV).
465 +
466 + ``unassign_control_domain``
467 + To unassign a control domain, the domain number is written into the
468 + ``unassign_control_domain`` file. This may be done multiple times to unassign
469 + more than one control domain.
470 +
471 + Note: No changes to the AP matrix will be allowed while a guest using
472 + the mediated matrix device is running. Attempts to assign an adapter,
473 + domain or control domain will be rejected and an error (EBUSY) returned.
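Taken together, the assignment rules above mean the queues usable by the guest are the cross product of the assigned adapter numbers (APIDs) and usage domain numbers (APQIs). As an illustrative sketch (not part of the patch), the implied APQNs can be listed like this:

```shell
# Illustrative sketch, not part of the patch: list the APQNs implied by
# a set of assigned adapters (APIDs) and usage domains (APQIs), i.e.
# their cross product, in the same xx.yyyy format used by the mdev's
# "matrix" attribute. Numbers are taken as decimal here.
apqns() {
    adapters=$1   # space-separated adapter numbers
    domains=$2    # space-separated usage domain numbers
    for a in $adapters; do
        for d in $domains; do
            printf '%02x.%04x\n' "$a" "$d"
        done
    done
}

# Adapters 5 and 6, usage domains 4 and 171 (0xab):
apqns "5 6" "4 171"
```

With adapters 5 and 6 and domains 4 and 171 assigned, this prints the four APQNs 05.0004, 05.00ab, 06.0004 and 06.00ab, matching the Guest1 example later in this document.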
474 +
475 + Starting a Linux Guest Configured with an AP Matrix
476 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
477 +
478 + To provide a mediated matrix device for use by a guest, the following option
479 + must be specified on the QEMU command line::
480 +
481 + -device vfio_ap,sysfsdev=$path-to-mdev
482 +
483 + The sysfsdev parameter specifies the path to the mediated matrix device.
484 + There are a number of ways to specify this path::
485 +
486 + /sys/devices/vfio_ap/matrix/$uuid
487 + /sys/bus/mdev/devices/$uuid
488 + /sys/bus/mdev/drivers/vfio_mdev/$uuid
489 + /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/devices/$uuid
490 +
491 + When the linux guest is started, the guest will open the mediated
492 + matrix device's file descriptor to get information about the mediated matrix
493 + device. The ``vfio_ap`` device driver will update the APM, AQM, and ADM fields in
494 + the guest's CRYCB with the adapter, usage domain and control domains assigned
495 + via the mediated matrix device's sysfs attribute files. Programs running on the
496 + linux guest will then:
497 +
498 + 1. Have direct access to the APQNs derived from the cross product of the AP
499 + adapter numbers (APID) and queue indexes (APQI) specified in the APM and AQM
500 + fields of the guest's CRYCB respectively. These APQNs identify the AP queues
501 + that are valid for use by the guest; meaning, AP commands can be sent by the
502 + guest to any of these queues for processing.
503 +
504 + 2. Have authorization to process AP commands to change a control domain
505 + identified in the ADM field of the guest's CRYCB. The AP command must be sent
506 + to a valid APQN (see 1 above).
507 +
508 + CPU model features:
509 +
510 + Three CPU model features are available for controlling guest access to AP
511 + facilities:
512 +
513 + 1. AP facilities feature
514 +
515 + The AP facilities feature indicates that AP facilities are installed on the
516 + guest.
This feature will be exposed for use only if the AP facilities 517 + are installed on the host system. The feature is s390-specific and is 518 + represented as a parameter of the -cpu option on the QEMU command line:: 519 + 520 + qemu-system-s390x -cpu $model,ap=on|off 521 + 522 + Where: 523 + 524 + ``$model`` 525 + is the CPU model defined for the guest (defaults to the model of 526 + the host system if not specified). 527 + 528 + ``ap=on|off`` 529 + indicates whether AP facilities are installed (on) or not 530 + (off). The default for CPU models zEC12 or newer 531 + is ``ap=on``. AP facilities must be installed on the guest if a 532 + vfio-ap device (``-device vfio-ap,sysfsdev=$path``) is configured 533 + for the guest, or the guest will fail to start. 534 + 535 + 2. Query Configuration Information (QCI) facility 536 + 537 + The QCI facility is used by the AP bus running on the guest to query the 538 + configuration of the AP facilities. This facility will be available 539 + only if the QCI facility is installed on the host system. The feature is 540 + s390-specific and is represented as a parameter of the -cpu option on the 541 + QEMU command line:: 542 + 543 + qemu-system-s390x -cpu $model,apqci=on|off 544 + 545 + Where: 546 + 547 + ``$model`` 548 + is the CPU model defined for the guest 549 + 550 + ``apqci=on|off`` 551 + indicates whether the QCI facility is installed (on) or 552 + not (off). The default for CPU models zEC12 or newer 553 + is ``apqci=on``; for older models, QCI will not be installed. 554 + 555 + If QCI is installed (``apqci=on``) but AP facilities are not 556 + (``ap=off``), an error message will be logged, but the guest 557 + will be allowed to start. It makes no sense to have QCI 558 + installed if the AP facilities are not; this is considered 559 + an invalid configuration. 560 + 561 + If the QCI facility is not installed, APQNs with an APQI 562 + greater than 15 will not be detected by the AP bus 563 + running on the guest. 564 + 565 + 3. 
Adjunct Process Facility Test (APFT) facility
566 +
567 + The APFT facility is used by the AP bus running on the guest to test the
568 + AP facilities available for a given AP queue. This facility will be available
569 + only if the APFT facility is installed on the host system. The feature is
570 + s390-specific and is represented as a parameter of the -cpu option on the
571 + QEMU command line::
572 +
573 + qemu-system-s390x -cpu $model,apft=on|off
574 +
575 + Where:
576 +
577 + ``$model``
578 + is the CPU model defined for the guest (defaults to the model of
579 + the host system if not specified).
580 +
581 + ``apft=on|off``
582 + indicates whether the APFT facility is installed (on) or
583 + not (off). The default for CPU models zEC12 and
584 + newer is ``apft=on``; for older models, APFT will not be
585 + installed.
586 +
587 + If APFT is installed (``apft=on``) but AP facilities are not
588 + (``ap=off``), an error message will be logged, but the guest
589 + will be allowed to start. It makes no sense to have APFT
590 + installed if the AP facilities are not; this is considered
591 + an invalid configuration.
592 +
593 + It also makes no sense to turn APFT off because the AP bus
594 + running on the guest will not detect CEX4 and newer devices
595 + without it. Since only CEX4 and newer devices are supported
596 + for guest usage, no AP devices can be made accessible to a
597 + guest started without APFT installed.
598 +
599 + Hot plug a vfio-ap device into a running guest
600 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
601 +
602 + Only one vfio-ap device can be attached to the virtual machine's ap-bus, so a
603 + vfio-ap device can be hot plugged if and only if no vfio-ap device is attached
604 + to the bus already, whether via the QEMU command line or a prior hot plug
605 + action.
606 + 607 + To hot plug a vfio-ap device, use the QEMU ``device_add`` command:: 608 + 609 + (qemu) device_add vfio-ap,sysfsdev="$path-to-mdev" 610 + 611 + Where the ``$path-to-mdev`` value specifies the absolute path to a mediated 612 + device to which AP resources to be used by the guest have been assigned. 613 + 614 + Note that on Linux guests, the AP devices will be created in the 615 + ``/sys/bus/ap/devices`` directory when the AP bus subsequently performs its periodic 616 + scan, so there may be a short delay before the AP devices are accessible on the 617 + guest. 618 + 619 + The command will fail if: 620 + 621 + * A vfio-ap device has already been attached to the virtual machine's ap-bus. 622 + 623 + * The CPU model features for controlling guest access to AP facilities are not 624 + enabled (see 'CPU model features' subsection in the previous section). 625 + 626 + Hot unplug a vfio-ap device from a running guest 627 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 628 + 629 + A vfio-ap device can be unplugged from a running KVM guest if a vfio-ap device 630 + has been attached to the virtual machine's ap-bus via the QEMU command line 631 + or a prior hot plug action. 632 + 633 + To hot unplug a vfio-ap device, use the QEMU ``device_del`` command:: 634 + 635 + (qemu) device_del vfio-ap,sysfsdev="$path-to-mdev" 636 + 637 + Where ``$path-to-mdev`` is the same as the path specified when the vfio-ap 638 + device was attached to the virtual machine's ap-bus. 639 + 640 + On a Linux guest, the AP devices will be removed from the ``/sys/bus/ap/devices`` 641 + directory on the guest when the AP bus subsequently performs its periodic scan, 642 + so there may be a short delay before the AP devices are no longer accessible by 643 + the guest. 644 + 645 + The command will fail if the ``$path-to-mdev`` specified on the ``device_del`` command 646 + does not match the value specified when the vfio-ap device was attached to 647 + the virtual machine's ap-bus. 
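Each of the accepted ``sysfsdev`` spellings listed earlier names the mediated device by its trailing UUID component, which makes it easy to see which mdev a given path refers to. A small sketch (the UUID below is a made-up placeholder, not one from the patch):

```shell
# Sketch: each accepted sysfsdev path names the mdev by its trailing
# UUID, so basename recovers the UUID from any of the spellings.
# The UUID here is a made-up placeholder.
mdev_uuid() {
    basename "$1"
}

uuid=669d9b23-fe1b-4ecb-be08-a2fabca99b71
mdev_uuid "/sys/devices/vfio_ap/matrix/$uuid"
mdev_uuid "/sys/bus/mdev/devices/$uuid"
mdev_uuid "/sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/devices/$uuid"
```

All three calls print the same UUID. Note, however, that the unplug command above compares the path string given at attach time, so the full path, not just the UUID, must match.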
648 + 649 + Example: Configure AP Matrices for Three Linux Guests 650 + ----------------------------------------------------- 651 + 652 + Let's now provide an example to illustrate how KVM guests may be given 653 + access to AP facilities. For this example, we will show how to configure 654 + three guests such that executing the lszcrypt command on the guests would 655 + look like this: 656 + 657 + Guest1:: 658 + 659 + CARD.DOMAIN TYPE MODE 660 + ------------------------------ 661 + 05 CEX5C CCA-Coproc 662 + 05.0004 CEX5C CCA-Coproc 663 + 05.00ab CEX5C CCA-Coproc 664 + 06 CEX5A Accelerator 665 + 06.0004 CEX5A Accelerator 666 + 06.00ab CEX5C CCA-Coproc 667 + 668 + Guest2:: 669 + 670 + CARD.DOMAIN TYPE MODE 671 + ------------------------------ 672 + 05 CEX5A Accelerator 673 + 05.0047 CEX5A Accelerator 674 + 05.00ff CEX5A Accelerator 675 + 676 + Guest3:: 677 + 678 + CARD.DOMAIN TYPE MODE 679 + ------------------------------ 680 + 06 CEX5A Accelerator 681 + 06.0047 CEX5A Accelerator 682 + 06.00ff CEX5A Accelerator 683 + 684 + These are the steps: 685 + 686 + 1. Install the vfio_ap module on the linux host. The dependency chain for the 687 + vfio_ap module is: 688 + 689 + * iommu 690 + * s390 691 + * zcrypt 692 + * vfio 693 + * vfio_mdev 694 + * vfio_mdev_device 695 + * KVM 696 + 697 + To build the vfio_ap module, the kernel build must be configured with the 698 + following Kconfig elements selected: 699 + 700 + * IOMMU_SUPPORT 701 + * S390 702 + * ZCRYPT 703 + * S390_AP_IOMMU 704 + * VFIO 705 + * VFIO_MDEV 706 + * VFIO_MDEV_DEVICE 707 + * KVM 708 + 709 + If using make menuconfig select the following to build the vfio_ap module:: 710 + -> Device Drivers 711 + -> IOMMU Hardware Support 712 + select S390 AP IOMMU Support 713 + -> VFIO Non-Privileged userspace driver framework 714 + -> Mediated device driver framework 715 + -> VFIO driver for Mediated devices 716 + -> I/O subsystem 717 + -> VFIO support for AP devices 718 + 719 + 2. 
Secure the AP queues to be used by the three guests so that the host can not 720 + access them. To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 721 + 06.0004, 06.0047, 06.00ab, and 06.00ff for use by the vfio_ap device driver, 722 + the corresponding APQNs must be removed from the default queue drivers pool 723 + as follows:: 724 + 725 + echo -5,-6 > /sys/bus/ap/apmask 726 + 727 + echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask 728 + 729 + This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 730 + 06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The 731 + sysfs directory for the vfio_ap device driver will now contain symbolic links 732 + to the AP queue devices bound to it:: 733 + 734 + /sys/bus/ap 735 + ... [drivers] 736 + ...... [vfio_ap] 737 + ......... [05.0004] 738 + ......... [05.0047] 739 + ......... [05.00ab] 740 + ......... [05.00ff] 741 + ......... [06.0004] 742 + ......... [06.0047] 743 + ......... [06.00ab] 744 + ......... [06.00ff] 745 + 746 + Keep in mind that only type 10 and newer adapters (i.e., CEX4 and later) 747 + can be bound to the vfio_ap device driver. The reason for this is to 748 + simplify the implementation by not needlessly complicating the design by 749 + supporting older devices that will go out of service in the relatively near 750 + future, and for which there are few older systems on which to test. 751 + 752 + The administrator, therefore, must take care to secure only AP queues that 753 + can be bound to the vfio_ap device driver. The device type for a given AP 754 + queue device can be read from the parent card's sysfs directory. For example, 755 + to see the hardware type of the queue 05.0004:: 756 + 757 + cat /sys/bus/ap/devices/card05/hwtype 758 + 759 + The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the 760 + vfio_ap device driver. 761 + 762 + 3. 
Create the mediated devices needed to configure the AP matrixes for the 763 + three guests and to provide an interface to the vfio_ap driver for 764 + use by the guests:: 765 + 766 + /sys/devices/vfio_ap/matrix/ 767 + ... [mdev_supported_types] 768 + ...... [vfio_ap-passthrough] (passthrough mediated matrix device type) 769 + ......... create 770 + ......... [devices] 771 + 772 + To create the mediated devices for the three guests:: 773 + 774 + uuidgen > create 775 + uuidgen > create 776 + uuidgen > create 777 + 778 + or 779 + 780 + :: 781 + 782 + echo $uuid1 > create 783 + echo $uuid2 > create 784 + echo $uuid3 > create 785 + 786 + This will create three mediated devices in the [devices] subdirectory named 787 + after the UUID used to create the mediated device. We'll call them $uuid1, 788 + $uuid2 and $uuid3 and this is the sysfs directory structure after creation:: 789 + 790 + /sys/devices/vfio_ap/matrix/ 791 + ... [mdev_supported_types] 792 + ...... [vfio_ap-passthrough] 793 + ......... [devices] 794 + ............ [$uuid1] 795 + ............... assign_adapter 796 + ............... assign_control_domain 797 + ............... assign_domain 798 + ............... matrix 799 + ............... unassign_adapter 800 + ............... unassign_control_domain 801 + ............... unassign_domain 802 + 803 + ............ [$uuid2] 804 + ............... assign_adapter 805 + ............... assign_control_domain 806 + ............... assign_domain 807 + ............... matrix 808 + ............... unassign_adapter 809 + ............... unassign_control_domain 810 + ............... unassign_domain 811 + 812 + ............ [$uuid3] 813 + ............... assign_adapter 814 + ............... assign_control_domain 815 + ............... assign_domain 816 + ............... matrix 817 + ............... unassign_adapter 818 + ............... unassign_control_domain 819 + ............... unassign_domain 820 + 821 + 4. 
The administrator now needs to configure the matrices for the mediated
822 + devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3).
823 +
824 + This is how the matrix is configured for Guest1::
825 +
826 + echo 5 > assign_adapter
827 + echo 6 > assign_adapter
828 + echo 4 > assign_domain
829 + echo 0xab > assign_domain
830 +
831 + Control domains can similarly be assigned using the assign_control_domain
832 + sysfs file.
833 +
834 + If a mistake is made configuring an adapter, domain or control domain,
835 + you can use the ``unassign_xxx`` interfaces to unassign the adapter, domain or
836 + control domain.
837 +
838 + To display the matrix configuration for Guest1::
839 +
840 + cat matrix
841 +
842 + The output will display the APQNs in the format ``xx.yyyy``, where xx is
843 + the adapter number and yyyy is the domain number. The output for Guest1
844 + will look like this::
845 +
846 + 05.0004
847 + 05.00ab
848 + 06.0004
849 + 06.00ab
850 +
851 + This is how the matrix is configured for Guest2::
852 +
853 + echo 5 > assign_adapter
854 + echo 0x47 > assign_domain
855 + echo 0xff > assign_domain
856 +
857 + This is how the matrix is configured for Guest3::
858 +
859 + echo 6 > assign_adapter
860 + echo 0x47 > assign_domain
861 + echo 0xff > assign_domain
862 +
863 + 5. Start Guest1::
864 +
865 + /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ...
866 +
867 + 6. Start Guest2::
868 +
869 + /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ...
870 +
871 + 7. Start Guest3::
872 +
873 + /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ...
874 +
875 + When the guest is shut down, the mediated matrix devices may be removed.
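The per-guest assignments of step 4 can be wrapped in a small helper. The sketch below is a dry run only: it prints the sysfs writes it would perform rather than executing them, and ``$uuid1``/``$uuid2``/``$uuid3`` remain the placeholders from step 3:

```shell
# Dry-run sketch of step 4: print the sysfs writes that would configure
# each mediated device. $uuid1/$uuid2/$uuid3 stand for the UUIDs created
# in step 3; nothing is actually written to sysfs.
configure_matrix() {
    mdev_dir=$1; adapters=$2; domains=$3
    for a in $adapters; do
        echo "echo $a > $mdev_dir/assign_adapter"
    done
    for d in $domains; do
        echo "echo $d > $mdev_dir/assign_domain"
    done
}

configure_matrix '/sys/devices/vfio_ap/matrix/$uuid1' '5 6' '4 0xab'
configure_matrix '/sys/devices/vfio_ap/matrix/$uuid2' '5'   '0x47 0xff'
configure_matrix '/sys/devices/vfio_ap/matrix/$uuid3' '6'   '0x47 0xff'
```

The Guest1 call prints exactly the four writes shown in step 4; dropping the `echo` wrapper would turn the dry run into the real configuration.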
876 + 877 + Using our example again, to remove the mediated matrix device $uuid1:: 878 + 879 + /sys/devices/vfio_ap/matrix/ 880 + ... [mdev_supported_types] 881 + ...... [vfio_ap-passthrough] 882 + ......... [devices] 883 + ............ [$uuid1] 884 + ............... remove 885 + 886 + 887 + echo 1 > remove 888 + 889 + This will remove all of the mdev matrix device's sysfs structures including 890 + the mdev device itself. To recreate and reconfigure the mdev matrix device, 891 + all of the steps starting with step 3 will have to be performed again. Note 892 + that the remove will fail if a guest using the mdev is still running. 893 + 894 + It is not necessary to remove an mdev matrix device, but one may want to 895 + remove it if no guest will use it during the remaining lifetime of the linux 896 + host. If the mdev matrix device is removed, one may want to also reconfigure 897 + the pool of adapters and queues reserved for use by the default drivers. 898 + 899 + Limitations 900 + ----------- 901 + 902 + * The KVM/kernel interfaces do not provide a way to prevent restoring an APQN 903 + to the default drivers pool of a queue that is still assigned to a mediated 904 + device in use by a guest. It is incumbent upon the administrator to 905 + ensure there is no mediated device in use by a guest to which the APQN is 906 + assigned lest the host be given access to the private data of the AP queue 907 + device, such as a private key configured specifically for the guest. 908 + 909 + * Dynamically assigning AP resources to or unassigning AP resources from a 910 + mediated matrix device - see `Configuring an AP matrix for a linux guest`_ 911 + section above - while a running guest is using it is currently not supported. 912 + 913 + * Live guest migration is not supported for guests using AP devices. 
If a guest 914 + is using AP devices, the vfio-ap device configured for the guest must be 915 + unplugged before migrating the guest (see `Hot unplug a vfio-ap device from a 916 + running guest`_ section above.)
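As a closing aside on the ``apmask``/``aqmask`` syntax described earlier: the right-padding rule for short hex strings can be sketched in a few lines of shell. This is illustrative only; nothing here touches ``/sys/bus/ap``:

```shell
# Sketch of the padding rule for /sys/bus/ap/apmask and aqmask: a hex
# value shorter than 256 bits is padded with zeros on the right.
pad_mask() {
    v=${1#0x}                 # strip the 0x prefix
    printf '0x%s' "$v"
    i=${#v}
    while [ "$i" -lt 64 ]; do # pad to 64 hex digits (256 bits)
        printf '0'
        i=$((i + 1))
    done
    printf '\n'
}

pad_mask 0x41    # expands to 0x41 followed by 62 zeros
pad_mask 0xffff  # the apmask value from the kernel command line example
```

The two calls reproduce the expanded mask strings shown in the "Binding AP devices to device drivers" discussion above.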
-876
docs/vfio-ap.txt
··· 1 - Adjunct Processor (AP) Device 2 - ============================= 3 - 4 - Contents: 5 - ========= 6 - * Introduction 7 - * AP Architectural Overview 8 - * Start Interpretive Execution (SIE) Instruction 9 - * AP Matrix Configuration on Linux Host 10 - * Starting a Linux Guest Configured with an AP Matrix 11 - * Example: Configure AP Matrices for Three Linux Guests 12 - 13 - Introduction: 14 - ============ 15 - The IBM Adjunct Processor (AP) Cryptographic Facility is comprised 16 - of three AP instructions and from 1 to 256 PCIe cryptographic adapter cards. 17 - These AP devices provide cryptographic functions to all CPUs assigned to a 18 - linux system running in an IBM Z system LPAR. 19 - 20 - On s390x, AP adapter cards are exposed via the AP bus. This document 21 - describes how those cards may be made available to KVM guests using the 22 - VFIO mediated device framework. 23 - 24 - AP Architectural Overview: 25 - ========================= 26 - In order understand the terminology used in the rest of this document, let's 27 - start with some definitions: 28 - 29 - * AP adapter 30 - 31 - An AP adapter is an IBM Z adapter card that can perform cryptographic 32 - functions. There can be from 0 to 256 adapters assigned to an LPAR depending 33 - on the machine model. Adapters assigned to the LPAR in which a linux host is 34 - running will be available to the linux host. Each adapter is identified by a 35 - number from 0 to 255; however, the maximum adapter number allowed is 36 - determined by machine model. When installed, an AP adapter is accessed by 37 - AP instructions executed by any CPU. 38 - 39 - * AP domain 40 - 41 - An adapter is partitioned into domains. Each domain can be thought of as 42 - a set of hardware registers for processing AP instructions. An adapter can 43 - hold up to 256 domains; however, the maximum domain number allowed is 44 - determined by machine model. Each domain is identified by a number from 0 to 45 - 255. 
Domains can be further classified into two types: 46 - 47 - * Usage domains are domains that can be accessed directly to process AP 48 - commands 49 - 50 - * Control domains are domains that are accessed indirectly by AP 51 - commands sent to a usage domain to control or change the domain; for 52 - example, to set a secure private key for the domain. 53 - 54 - * AP Queue 55 - 56 - An AP queue is the means by which an AP command-request message is sent to an 57 - AP usage domain inside a specific AP. An AP queue is identified by a tuple 58 - comprised of an AP adapter ID (APID) and an AP queue index (APQI). The 59 - APQI corresponds to a given usage domain number within the adapter. This tuple 60 - forms an AP Queue Number (APQN) uniquely identifying an AP queue. AP 61 - instructions include a field containing the APQN to identify the AP queue to 62 - which the AP command-request message is to be sent for processing. 63 - 64 - * AP Instructions: 65 - 66 - There are three AP instructions: 67 - 68 - * NQAP: to enqueue an AP command-request message to a queue 69 - * DQAP: to dequeue an AP command-reply message from a queue 70 - * PQAP: to administer the queues 71 - 72 - AP instructions identify the domain that is targeted to process the AP 73 - command; this must be one of the usage domains. An AP command may modify a 74 - domain that is not one of the usage domains, but the modified domain 75 - must be one of the control domains. 76 - 77 - Start Interpretive Execution (SIE) Instruction 78 - ============================================== 79 - A KVM guest is started by executing the Start Interpretive Execution (SIE) 80 - instruction. The SIE state description is a control block that contains the 81 - state information for a KVM guest and is supplied as input to the SIE 82 - instruction. The SIE state description contains a satellite control block called 83 - the Crypto Control Block (CRYCB). 
The CRYCB contains three fields to identify 84 - the adapters, usage domains and control domains assigned to the KVM guest: 85 - 86 - * The AP Mask (APM) field is a bit mask that identifies the AP adapters assigned 87 - to the KVM guest. Each bit in the mask, from left to right, corresponds to 88 - an APID from 0-255. If a bit is set, the corresponding adapter is valid for 89 - use by the KVM guest. 90 - 91 - * The AP Queue Mask (AQM) field is a bit mask identifying the AP usage domains 92 - assigned to the KVM guest. Each bit in the mask, from left to right, 93 - corresponds to an AP queue index (APQI) from 0-255. If a bit is set, the 94 - corresponding queue is valid for use by the KVM guest. 95 - 96 - * The AP Domain Mask field is a bit mask that identifies the AP control domains 97 - assigned to the KVM guest. The ADM bit mask controls which domains can be 98 - changed by an AP command-request message sent to a usage domain from the 99 - guest. Each bit in the mask, from left to right, corresponds to a domain from 100 - 0-255. If a bit is set, the corresponding domain can be modified by an AP 101 - command-request message sent to a usage domain. 102 - 103 - If you recall from the description of an AP Queue, AP instructions include 104 - an APQN to identify the AP adapter and AP queue to which an AP command-request 105 - message is to be sent (NQAP and PQAP instructions), or from which a 106 - command-reply message is to be received (DQAP instruction). The validity of an 107 - APQN is defined by the matrix calculated from the APM and AQM; it is the 108 - cross product of all assigned adapter numbers (APM) with all assigned queue 109 - indexes (AQM). For example, if adapters 1 and 2 and usage domains 5 and 6 are 110 - assigned to a guest, the APQNs (1,5), (1,6), (2,5) and (2,6) will be valid for 111 - the guest. 
112 - 113 - The APQNs can provide secure key functionality - i.e., a private key is stored 114 - on the adapter card for each of its domains - so each APQN must be assigned to 115 - at most one guest or the linux host. 116 - 117 - Example 1: Valid configuration: 118 - ------------------------------ 119 - Guest1: adapters 1,2 domains 5,6 120 - Guest2: adapter 1,2 domain 7 121 - 122 - This is valid because both guests have a unique set of APQNs: Guest1 has 123 - APQNs (1,5), (1,6), (2,5) and (2,6); Guest2 has APQNs (1,7) and (2,7). 124 - 125 - Example 2: Valid configuration: 126 - ------------------------------ 127 - Guest1: adapters 1,2 domains 5,6 128 - Guest2: adapters 3,4 domains 5,6 129 - 130 - This is also valid because both guests have a unique set of APQNs: 131 - Guest1 has APQNs (1,5), (1,6), (2,5), (2,6); 132 - Guest2 has APQNs (3,5), (3,6), (4,5), (4,6) 133 - 134 - Example 3: Invalid configuration: 135 - -------------------------------- 136 - Guest1: adapters 1,2 domains 5,6 137 - Guest2: adapter 1 domains 6,7 138 - 139 - This is an invalid configuration because both guests have access to 140 - APQN (1,6). 141 - 142 - AP Matrix Configuration on Linux Host: 143 - ===================================== 144 - A linux system is a guest of the LPAR in which it is running and has access to 145 - the AP resources configured for the LPAR. The LPAR's AP matrix is 146 - configured via its Activation Profile which can be edited on the HMC. When the 147 - linux system is started, the AP bus will detect the AP devices assigned to the 148 - LPAR and create the following in sysfs: 149 - 150 - /sys/bus/ap 151 - ... [devices] 152 - ...... xx.yyyy 153 - ...... ... 154 - ...... cardxx 155 - ...... ... 
156 - 157 - Where: 158 - cardxx is AP adapter number xx (in hex) 159 - ....xx.yyyy is an APQN with xx specifying the APID and yyyy specifying the 160 - APQI 161 - 162 - For example, if AP adapters 5 and 6 and domains 4, 71 (0x47), 171 (0xab) and 163 - 255 (0xff) are configured for the LPAR, the sysfs representation on the linux 164 - host system would look like this: 165 - 166 - /sys/bus/ap 167 - ... [devices] 168 - ...... 05.0004 169 - ...... 05.0047 170 - ...... 05.00ab 171 - ...... 05.00ff 172 - ...... 06.0004 173 - ...... 06.0047 174 - ...... 06.00ab 175 - ...... 06.00ff 176 - ...... card05 177 - ...... card06 178 - 179 - A set of default device drivers are also created to control each type of AP 180 - device that can be assigned to the LPAR on which a linux host is running: 181 - 182 - /sys/bus/ap 183 - ... [drivers] 184 - ...... [cex2acard] for Crypto Express 2/3 accelerator cards 185 - ...... [cex2aqueue] for AP queues served by Crypto Express 2/3 186 - accelerator cards 187 - ...... [cex4card] for Crypto Express 4/5/6 accelerator and coprocessor 188 - cards 189 - ...... [cex4queue] for AP queues served by Crypto Express 4/5/6 190 - accelerator and coprocessor cards 191 - ...... [pcixcccard] for Crypto Express 2/3 coprocessor cards 192 - ...... [pcixccqueue] for AP queues served by Crypto Express 2/3 193 - coprocessor cards 194 - 195 - Binding AP devices to device drivers 196 - ------------------------------------ 197 - There are two sysfs files that specify bitmasks marking a subset of the APQN 198 - range as 'usable by the default AP queue device drivers' or 'not usable by the 199 - default device drivers' and thus available for use by the alternate device 200 - driver(s). The sysfs locations of the masks are: 201 - 202 - /sys/bus/ap/apmask 203 - /sys/bus/ap/aqmask 204 - 205 - The 'apmask' is a 256-bit mask that identifies a set of AP adapter IDs 206 - (APID). 
Each bit in the mask, from left to right (i.e., from most significant 207 - to least significant bit in big endian order), corresponds to an APID from 208 - 0-255. If a bit is set, the APID is marked as usable only by the default AP 209 - queue device drivers; otherwise, the APID is usable by the vfio_ap 210 - device driver. 211 - 212 - The 'aqmask' is a 256-bit mask that identifies a set of AP queue indexes 213 - (APQI). Each bit in the mask, from left to right (i.e., from most significant 214 - to least significant bit in big endian order), corresponds to an APQI from 215 - 0-255. If a bit is set, the APQI is marked as usable only by the default AP 216 - queue device drivers; otherwise, the APQI is usable by the vfio_ap device 217 - driver. 218 - 219 - Take, for example, the following mask: 220 - 221 - 0x7dffffffffffffffffffffffffffffffffffffffffffffffffffffffffffffff 222 - 223 - It indicates: 224 - 225 - 1, 2, 3, 4, 5, and 7-255 belong to the default drivers' pool, and 0 and 6 226 - belong to the vfio_ap device driver's pool. 227 - 228 - The APQN of each AP queue device assigned to the linux host is checked by the 229 - AP bus against the set of APQNs derived from the cross product of APIDs 230 - and APQIs marked as usable only by the default AP queue device drivers. If a 231 - match is detected, only the default AP queue device drivers will be probed; 232 - otherwise, the vfio_ap device driver will be probed. 233 - 234 - By default, the two masks are set to reserve all APQNs for use by the default 235 - AP queue device drivers. There are two ways the default masks can be changed: 236 - 237 - 1. The sysfs mask files can be edited by echoing a string into the 238 - respective sysfs mask file in one of two formats: 239 - 240 - * An absolute hex string starting with 0x - like "0x12345678" - sets 241 - the mask. 
If the given string is shorter than the mask, it is padded 242 - with 0s on the right; for example, specifying a mask value of 0x41 is 243 - the same as specifying: 244 - 245 - 0x4100000000000000000000000000000000000000000000000000000000000000 246 - 247 - Keep in mind that the mask reads from left to right (i.e., most 248 - significant to least significant bit in big endian order), so the mask 249 - above identifies device numbers 1 and 7 (01000001). 250 - 251 - If the string is longer than the mask, the operation is terminated with 252 - an error (EINVAL). 253 - 254 - * Individual bits in the mask can be switched on and off by specifying 255 - each bit number to be switched in a comma separated list. Each bit 256 - number string must be prepended with a plus ('+') or minus ('-') to indicate 257 - the corresponding bit is to be switched on ('+') or off ('-'). Some 258 - valid values are: 259 - 260 - "+0" switches bit 0 on 261 - "-13" switches bit 13 off 262 - "+0x41" switches bit 65 on 263 - "-0xff" switches bit 255 off 264 - 265 - The following example: 266 - +0,-6,+0x47,-0xf0 267 - 268 - Switches bits 0 and 71 (0x47) on 269 - Switches bits 6 and 240 (0xf0) off 270 - 271 - Note that the bits not specified in the list remain as they were before 272 - the operation. 273 - 274 - 2. The masks can also be changed at boot time via parameters on the kernel 275 - command line like this: 276 - 277 - ap.apmask=0xffff ap.aqmask=0x40 278 - 279 - This would create the following masks: 280 - 281 - apmask: 282 - 0xffff000000000000000000000000000000000000000000000000000000000000 283 - 284 - aqmask: 285 - 0x4000000000000000000000000000000000000000000000000000000000000000 286 - 287 - Resulting in these two pools: 288 - 289 - default drivers pool: adapter 0-15, domain 1 290 - alternate drivers pool: adapter 16-255, domains 0, 2-255 291 - 292 - Configuring an AP matrix for a linux guest. 
293 - ------------------------------------------ 294 - The sysfs interfaces for configuring an AP matrix for a guest are built on the 295 - VFIO mediated device framework. To configure an AP matrix for a guest, a 296 - mediated matrix device must first be created for the /sys/devices/vfio_ap/matrix 297 - device. When the vfio_ap device driver is loaded, it registers with the VFIO 298 - mediated device framework. When the driver registers, the sysfs interfaces for 299 - creating mediated matrix devices are created: 300 - 301 - /sys/devices 302 - ... [vfio_ap] 303 - ......[matrix] 304 - ......... [mdev_supported_types] 305 - ............ [vfio_ap-passthrough] 306 - ............... create 307 - ............... [devices] 308 - 309 - A mediated AP matrix device is created by writing a UUID to the attribute file 310 - named 'create', for example: 311 - 312 - uuidgen > create 313 - 314 - or 315 - 316 - echo $uuid > create 317 - 318 - When a mediated AP matrix device is created, a sysfs directory named after 319 - the UUID is created in the 'devices' subdirectory: 320 - 321 - /sys/devices 322 - ... [vfio_ap] 323 - ......[matrix] 324 - ......... [mdev_supported_types] 325 - ............ [vfio_ap-passthrough] 326 - ............... create 327 - ............... [devices] 328 - .................. [$uuid] 329 - 330 - There will also be three sets of attribute files created in the mediated 331 - matrix device's sysfs directory to configure an AP matrix for the 332 - KVM guest: 333 - 334 - /sys/devices 335 - ... [vfio_ap] 336 - ......[matrix] 337 - ......... [mdev_supported_types] 338 - ............ [vfio_ap-passthrough] 339 - ............... create 340 - ............... [devices] 341 - .................. [$uuid] 342 - ..................... assign_adapter 343 - ..................... assign_control_domain 344 - ..................... assign_domain 345 - ..................... matrix 346 - ..................... unassign_adapter 347 - ..................... 
unassign_control_domain 348 - ..................... unassign_domain 349 - 350 - assign_adapter 351 - To assign an AP adapter to the mediated matrix device, its APID is written 352 - to the 'assign_adapter' file. This may be done multiple times to assign more 353 - than one adapter. The APID may be specified using conventional semantics 354 - as a decimal, hexadecimal, or octal number. For example, to assign adapters 355 - 4, 5 and 16 to a mediated matrix device in decimal, hexadecimal and octal 356 - respectively: 357 - 358 - echo 4 > assign_adapter 359 - echo 0x5 > assign_adapter 360 - echo 020 > assign_adapter 361 - 362 - In order to successfully assign an adapter: 363 - 364 - * The adapter number specified must represent a value from 0 up to the 365 - maximum adapter number allowed by the machine model. If an adapter number 366 - higher than the maximum is specified, the operation will terminate with 367 - an error (ENODEV). 368 - 369 - * All APQNs that can be derived from the adapter ID being assigned and the 370 - IDs of the previously assigned domains must be bound to the vfio_ap device 371 - driver. If no domains have yet been assigned, then there must be at least 372 - one APQN with the specified APID bound to the vfio_ap driver. If no such 373 - APQNs are bound to the driver, the operation will terminate with an 374 - error (EADDRNOTAVAIL). 375 - 376 - No APQN that can be derived from the adapter ID and the IDs of the 377 - previously assigned domains can be assigned to another mediated matrix 378 - device. If an APQN is assigned to another mediated matrix device, the 379 - operation will terminate with an error (EADDRINUSE). 380 - 381 - unassign_adapter 382 - To unassign an AP adapter, its APID is written to the 'unassign_adapter' 383 - file. This may also be done multiple times to unassign more than one adapter. 384 - 385 - assign_domain 386 - To assign a usage domain, the domain number is written into the 387 - 'assign_domain' file. 
This may be done multiple times to assign more than one 388 - usage domain. The domain number is specified using conventional semantics as 389 - a decimal, hexadecimal, or octal number. For example, to assign usage domains 390 - 4, 8, and 71 to a mediated matrix device in decimal, hexadecimal and octal 391 - respectively: 392 - 393 - echo 4 > assign_domain 394 - echo 0x8 > assign_domain 395 - echo 0107 > assign_domain 396 - 397 - In order to successfully assign a domain: 398 - 399 - * The domain number specified must represent a value from 0 up to the 400 - maximum domain number allowed by the machine model. If a domain number 401 - higher than the maximum is specified, the operation will terminate with 402 - an error (ENODEV). 403 - 404 - * All APQNs that can be derived from the domain ID being assigned and the IDs 405 - of the previously assigned adapters must be bound to the vfio_ap device 406 - driver. If no adapters have yet been assigned, then there must be at least 407 - one APQN with the specified APQI bound to the vfio_ap driver. If no such 408 - APQNs are bound to the driver, the operation will terminate with an 409 - error (EADDRNOTAVAIL). 410 - 411 - No APQN that can be derived from the domain ID being assigned and the IDs 412 - of the previously assigned adapters can be assigned to another mediated 413 - matrix device. If an APQN is assigned to another mediated matrix device, 414 - the operation will terminate with an error (EADDRINUSE). 415 - 416 - unassign_domain 417 - To unassign a usage domain, the domain number is written into the 418 - 'unassign_domain' file. This may be done multiple times to unassign more than 419 - one usage domain. 420 - 421 - assign_control_domain 422 - To assign a control domain, the domain number is written into the 423 - 'assign_control_domain' file. This may be done multiple times to 424 - assign more than one control domain. 
The domain number may be specified using 425 - conventional semantics as a decimal, hexadecimal, or octal number. For 426 - example, to assign control domains 4, 8, and 71 to a mediated matrix device 427 - in decimal, hexadecimal and octal respectively: 428 - 429 - echo 4 > assign_control_domain 430 - echo 0x8 > assign_control_domain 431 - echo 0107 > assign_control_domain 432 - 433 - In order to successfully assign a control domain, the domain number 434 - specified must represent a value from 0 up to the maximum domain number 435 - allowed by the machine model. If a control domain number higher than the 436 - maximum is specified, the operation will terminate with an error (ENODEV). 437 - 438 - unassign_control_domain 439 - To unassign a control domain, the domain number is written into the 440 - 'unassign_control_domain' file. This may be done multiple times to unassign more than 441 - one control domain. 442 - 443 - Notes: No changes to the AP matrix will be allowed while a guest using 444 - the mediated matrix device is running. Attempts to assign an adapter, 445 - domain or control domain will be rejected and an error (EBUSY) returned. 446 - 447 - Starting a Linux Guest Configured with an AP Matrix: 448 - =================================================== 449 - To provide a mediated matrix device for use by a guest, the following option 450 - must be specified on the QEMU command line: 451 - 452 - -device vfio-ap,sysfsdev=$path-to-mdev 453 - 454 - The sysfsdev parameter specifies the path to the mediated matrix device. 455 - There are a number of ways to specify this path: 456 - 457 - /sys/devices/vfio_ap/matrix/$uuid 458 - /sys/bus/mdev/devices/$uuid 459 - /sys/bus/mdev/drivers/vfio_mdev/$uuid 460 - /sys/devices/vfio_ap/matrix/mdev_supported_types/vfio_ap-passthrough/devices/$uuid 461 - 462 - When the linux guest is started, the guest will open the mediated 463 - matrix device's file descriptor to get information about the mediated matrix 464 - device. 
The vfio_ap device driver will update the APM, AQM, and ADM fields in 465 - the guest's CRYCB with the adapter, usage domain and control domains assigned 466 - via the mediated matrix device's sysfs attribute files. Programs running on the 467 - linux guest will then: 468 - 469 - 1. Have direct access to the APQNs derived from the cross product of the AP 470 - adapter numbers (APID) and queue indexes (APQI) specified in the APM and AQM 471 - fields of the guest's CRYCB respectively. These APQNs identify the AP queues 472 - that are valid for use by the guest; meaning, AP commands can be sent by the 473 - guest to any of these queues for processing. 474 - 475 - 2. Have authorization to process AP commands to change a control domain 476 - identified in the ADM field of the guest's CRYCB. The AP command must be sent 477 - to a valid APQN (see 1 above). 478 - 479 - CPU model features: 480 - 481 - Three CPU model features are available for controlling guest access to AP 482 - facilities: 483 - 484 - 1. AP facilities feature 485 - 486 - The AP facilities feature indicates that AP facilities are installed on the 487 - guest. This feature will be exposed for use only if the AP facilities 488 - are installed on the host system. The feature is s390-specific and is 489 - represented as a parameter of the -cpu option on the QEMU command line: 490 - 491 - qemu-system-s390x -cpu $model,ap=on|off 492 - 493 - Where: 494 - 495 - $model is the CPU model defined for the guest (defaults to the model of 496 - the host system if not specified). 497 - 498 - ap=on|off indicates whether AP facilities are installed (on) or not 499 - (off). The default for CPU models zEC12 or newer 500 - is ap=on. AP facilities must be installed on the guest if a 501 - vfio-ap device (-device vfio-ap,sysfsdev=$path) is configured 502 - for the guest, or the guest will fail to start. 503 - 504 - 2. 
Query Configuration Information (QCI) facility 505 - 506 - The QCI facility is used by the AP bus running on the guest to query the 507 - configuration of the AP facilities. This facility will be available 508 - only if the QCI facility is installed on the host system. The feature is 509 - s390-specific and is represented as a parameter of the -cpu option on the 510 - QEMU command line: 511 - 512 - qemu-system-s390x -cpu $model,apqci=on|off 513 - 514 - Where: 515 - 516 - $model is the CPU model defined for the guest 517 - 518 - apqci=on|off indicates whether the QCI facility is installed (on) or 519 - not (off). The default for CPU models zEC12 or newer 520 - is apqci=on; for older models, QCI will not be installed. 521 - 522 - If QCI is installed (apqci=on) but AP facilities are not 523 - (ap=off), an error message will be logged, but the guest 524 - will be allowed to start. It makes no sense to have QCI 525 - installed if the AP facilities are not; this is considered 526 - an invalid configuration. 527 - 528 - If the QCI facility is not installed, APQNs with an APQI 529 - greater than 15 will not be detected by the AP bus 530 - running on the guest. 531 - 532 - 3. Adjunct Process Facility Test (APFT) facility 533 - 534 - The APFT facility is used by the AP bus running on the guest to test the 535 - AP facilities available for a given AP queue. This facility will be available 536 - only if the APFT facility is installed on the host system. The feature is 537 - s390-specific and is represented as a parameter of the -cpu option on the 538 - QEMU command line: 539 - 540 - qemu-system-s390x -cpu $model,apft=on|off 541 - 542 - Where: 543 - 544 - $model is the CPU model defined for the guest (defaults to the model of 545 - the host system if not specified). 546 - 547 - apft=on|off indicates whether the APFT facility is installed (on) or 548 - not (off). The default for CPU models zEC12 and 549 - newer is apft=on; for older models, APFT will not be 550 - installed. 
551 - 552 - If APFT is installed (apft=on) but AP facilities are not 553 - (ap=off), an error message will be logged, but the guest 554 - will be allowed to start. It makes no sense to have APFT 555 - installed if the AP facilities are not; this is considered 556 - an invalid configuration. 557 - 558 - It also makes no sense to turn APFT off because the AP bus 559 - running on the guest will not detect CEX4 and newer devices 560 - without it. Since only CEX4 and newer devices are supported 561 - for guest usage, no AP devices can be made accessible to a 562 - guest started without APFT installed. 563 - 564 - Hot plug a vfio-ap device into a running guest: 565 - ============================================== 566 - Only one vfio-ap device can be attached to the virtual machine's ap-bus, so a 567 - vfio-ap device can be hot plugged if and only if no vfio-ap device is attached 568 - to the bus already, whether via the QEMU command line or a prior hot plug 569 - action. 570 - 571 - To hot plug a vfio-ap device, use the QEMU device_add command: 572 - 573 - (qemu) device_add vfio-ap,sysfsdev="$path-to-mdev" 574 - 575 - Where the '$path-to-mdev' value specifies the absolute path to a mediated 576 - device to which AP resources to be used by the guest have been assigned. 577 - 578 - Note that on Linux guests, the AP devices will be created in the 579 - /sys/bus/ap/devices directory when the AP bus subsequently performs its periodic 580 - scan, so there may be a short delay before the AP devices are accessible on the 581 - guest. 582 - 583 - The command will fail if: 584 - 585 - * A vfio-ap device has already been attached to the virtual machine's ap-bus. 586 - 587 - * The CPU model features for controlling guest access to AP facilities are not 588 - enabled (see 'CPU model features' subsection in the previous section). 
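The hot plug failure conditions above reduce to a simple precondition check before device_add. A hypothetical sketch (the exact feature set required is an assumption based on the 'CPU model features' subsection; this is not QEMU code):

```python
# Hypothetical model of the vfio-ap hot plug preconditions described above.
AP_CPU_FEATURES = {"ap", "apqci", "apft"}  # assumption: the three -cpu flags

def can_hotplug_vfio_ap(vfio_ap_already_attached, cpu_features):
    """Return (ok, reason) mirroring the two documented failure cases."""
    if vfio_ap_already_attached:
        return False, "a vfio-ap device is already attached to the ap-bus"
    missing = AP_CPU_FEATURES - set(cpu_features)
    if missing:
        return False, "CPU model features not enabled: " + ", ".join(sorted(missing))
    return True, "ok"

assert can_hotplug_vfio_ap(False, {"ap", "apqci", "apft"}) == (True, "ok")
assert not can_hotplug_vfio_ap(True, {"ap", "apqci", "apft"})[0]
```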
589 - 590 - Hot unplug a vfio-ap device from a running guest: 591 - ================================================ 592 - A vfio-ap device can be unplugged from a running KVM guest if a vfio-ap device 593 - has been attached to the virtual machine's ap-bus via the QEMU command line 594 - or a prior hot plug action. 595 - 596 - To hot unplug a vfio-ap device, use the QEMU device_del command: 597 - 598 - (qemu) device_del vfio-ap,sysfsdev="$path-to-mdev" 599 - 600 - Where $path-to-mdev is the same as the path specified when the vfio-ap 601 - device was attached to the virtual machine's ap-bus. 602 - 603 - On a Linux guest, the AP devices will be removed from the /sys/bus/ap/devices 604 - directory on the guest when the AP bus subsequently performs its periodic scan, 605 - so there may be a short delay before the AP devices are no longer accessible by 606 - the guest. 607 - 608 - The command will fail if the $path-to-mdev specified on the device_del command 609 - does not match the value specified when the vfio-ap device was attached to 610 - the virtual machine's ap-bus. 611 - 612 - Example: Configure AP Matrixes for Three Linux Guests: 613 - ===================================================== 614 - Let's now provide an example to illustrate how KVM guests may be given 615 - access to AP facilities. 
For this example, we will show how to configure 616 - three guests such that executing the lszcrypt command on the guests would 617 - look like this: 618 - 619 - Guest1 620 - ------ 621 - CARD.DOMAIN TYPE MODE 622 - ------------------------------ 623 - 05 CEX5C CCA-Coproc 624 - 05.0004 CEX5C CCA-Coproc 625 - 05.00ab CEX5C CCA-Coproc 626 - 06 CEX5A Accelerator 627 - 06.0004 CEX5A Accelerator 628 - 06.00ab CEX5A Accelerator 629 - 630 - Guest2 631 - ------ 632 - CARD.DOMAIN TYPE MODE 633 - ------------------------------ 634 - 05 CEX5A Accelerator 635 - 05.0047 CEX5A Accelerator 636 - 05.00ff CEX5A Accelerator 637 - 638 - Guest3 639 - ------ 640 - CARD.DOMAIN TYPE MODE 641 - ------------------------------ 642 - 06 CEX5A Accelerator 643 - 06.0047 CEX5A Accelerator 644 - 06.00ff CEX5A Accelerator 645 - 646 - These are the steps: 647 - 648 - 1. Install the vfio_ap module on the linux host. The dependency chain for the 649 - vfio_ap module is: 650 - * iommu 651 - * s390 652 - * zcrypt 653 - * vfio 654 - * vfio_mdev 655 - * vfio_mdev_device 656 - * KVM 657 - 658 - To build the vfio_ap module, the kernel build must be configured with the 659 - following Kconfig elements selected: 660 - * IOMMU_SUPPORT 661 - * S390 662 - * ZCRYPT 663 - * S390_AP_IOMMU 664 - * VFIO 665 - * VFIO_MDEV 666 - * VFIO_MDEV_DEVICE 667 - * KVM 668 - 669 - If using make menuconfig select the following to build the vfio_ap module: 670 - -> Device Drivers 671 - -> IOMMU Hardware Support 672 - select S390 AP IOMMU Support 673 - -> VFIO Non-Privileged userspace driver framework 674 - -> Mediated device driver framework 675 - -> VFIO driver for Mediated devices 676 - -> I/O subsystem 677 - -> VFIO support for AP devices 678 - 679 - 2. Secure the AP queues to be used by the three guests so that the host can not 680 - access them. 
To secure the AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 681 - 06.0004, 06.0047, 06.00ab, and 06.00ff for use by the vfio_ap device driver, 682 - the corresponding APQNs must be removed from the default queue drivers pool 683 - as follows: 684 - 685 - echo -5,-6 > /sys/bus/ap/apmask 686 - 687 - echo -4,-0x47,-0xab,-0xff > /sys/bus/ap/aqmask 688 - 689 - This will result in AP queues 05.0004, 05.0047, 05.00ab, 05.00ff, 06.0004, 690 - 06.0047, 06.00ab, and 06.00ff getting bound to the vfio_ap device driver. The 691 - sysfs directory for the vfio_ap device driver will now contain symbolic links 692 - to the AP queue devices bound to it: 693 - 694 - /sys/bus/ap 695 - ... [drivers] 696 - ...... [vfio_ap] 697 - ......... [05.0004] 698 - ......... [05.0047] 699 - ......... [05.00ab] 700 - ......... [05.00ff] 701 - ......... [06.0004] 702 - ......... [06.0047] 703 - ......... [06.00ab] 704 - ......... [06.00ff] 705 - 706 - Keep in mind that only type 10 and newer adapters (i.e., CEX4 and later) 707 - can be bound to the vfio_ap device driver. The reason for this is to 708 - simplify the implementation by not needlessly complicating the design by 709 - supporting older devices that will go out of service in the relatively near 710 - future, and for which there are few older systems on which to test. 711 - 712 - The administrator, therefore, must take care to secure only AP queues that 713 - can be bound to the vfio_ap device driver. The device type for a given AP 714 - queue device can be read from the parent card's sysfs directory. For example, 715 - to see the hardware type of the queue 05.0004: 716 - 717 - cat /sys/bus/ap/devices/card05/hwtype 718 - 719 - The hwtype must be 10 or higher (CEX4 or newer) in order to be bound to the 720 - vfio_ap device driver. 721 - 722 - 3. 
Create the mediated devices needed to configure the AP matrixes for the 723 - three guests and to provide an interface to the vfio_ap driver for 724 - use by the guests: 725 - 726 - /sys/devices/vfio_ap/matrix/ 727 - --- [mdev_supported_types] 728 - ------ [vfio_ap-passthrough] (passthrough mediated matrix device type) 729 - --------- create 730 - --------- [devices] 731 - 732 - To create the mediated devices for the three guests: 733 - 734 - uuidgen > create 735 - uuidgen > create 736 - uuidgen > create 737 - 738 - or 739 - 740 - echo $uuid1 > create 741 - echo $uuid2 > create 742 - echo $uuid3 > create 743 - 744 - This will create three mediated devices in the [devices] subdirectory named 745 - after the UUID used to create the mediated device. We'll call them $uuid1, 746 - $uuid2 and $uuid3 and this is the sysfs directory structure after creation: 747 - 748 - /sys/devices/vfio_ap/matrix/ 749 - --- [mdev_supported_types] 750 - ------ [vfio_ap-passthrough] 751 - --------- [devices] 752 - ------------ [$uuid1] 753 - --------------- assign_adapter 754 - --------------- assign_control_domain 755 - --------------- assign_domain 756 - --------------- matrix 757 - --------------- unassign_adapter 758 - --------------- unassign_control_domain 759 - --------------- unassign_domain 760 - 761 - ------------ [$uuid2] 762 - --------------- assign_adapter 763 - --------------- assign_control_domain 764 - --------------- assign_domain 765 - --------------- matrix 766 - --------------- unassign_adapter 767 - ----------------unassign_control_domain 768 - ----------------unassign_domain 769 - 770 - ------------ [$uuid3] 771 - --------------- assign_adapter 772 - --------------- assign_control_domain 773 - --------------- assign_domain 774 - --------------- matrix 775 - --------------- unassign_adapter 776 - ----------------unassign_control_domain 777 - ----------------unassign_domain 778 - 779 - 4. 
The administrator now needs to configure the matrixes for the mediated 780 - devices $uuid1 (for Guest1), $uuid2 (for Guest2) and $uuid3 (for Guest3). 781 - 782 - This is how the matrix is configured for Guest1: 783 - 784 - echo 5 > assign_adapter 785 - echo 6 > assign_adapter 786 - echo 4 > assign_domain 787 - echo 0xab > assign_domain 788 - 789 - Control domains can similarly be assigned using the assign_control_domain 790 - sysfs file. 791 - 792 - If a mistake is made configuring an adapter, domain or control domain, 793 - you can use the unassign_xxx interfaces to unassign the adapter, domain or 794 - control domain. 795 - 796 - To display the matrix configuration for Guest1: 797 - 798 - cat matrix 799 - 800 - The output will display the APQNs in the format xx.yyyy, where xx is 801 - the adapter number and yyyy is the domain number. The output for Guest1 802 - will look like this: 803 - 804 - 05.0004 805 - 05.00ab 806 - 06.0004 807 - 06.00ab 808 - 809 - This is how the matrix is configured for Guest2: 810 - 811 - echo 5 > assign_adapter 812 - echo 0x47 > assign_domain 813 - echo 0xff > assign_domain 814 - 815 - This is how the matrix is configured for Guest3: 816 - 817 - echo 6 > assign_adapter 818 - echo 0x47 > assign_domain 819 - echo 0xff > assign_domain 820 - 821 - 5. Start Guest1: 822 - 823 - /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ 824 - -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid1 ... 825 - 826 - 6. Start Guest2: 827 - 828 - /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ 829 - -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid2 ... 830 - 831 - 7. Start Guest3: 832 - 833 - /usr/bin/qemu-system-s390x ... -cpu host,ap=on,apqci=on,apft=on \ 834 - -device vfio-ap,sysfsdev=/sys/devices/vfio_ap/matrix/$uuid3 ... 835 - 836 - When the guest is shut down, the mediated matrix devices may be removed. 
837 - 838 - Using our example again, to remove the mediated matrix device $uuid1: 839 - 840 - /sys/devices/vfio_ap/matrix/ 841 - --- [mdev_supported_types] 842 - ------ [vfio_ap-passthrough] 843 - --------- [devices] 844 - ------------ [$uuid1] 845 - --------------- remove 846 - 847 - 848 - echo 1 > remove 849 - 850 - This will remove all of the mdev matrix device's sysfs structures including 851 - the mdev device itself. To recreate and reconfigure the mdev matrix device, 852 - all of the steps starting with step 3 will have to be performed again. Note 853 - that the remove will fail if a guest using the mdev is still running. 854 - 855 - It is not necessary to remove an mdev matrix device, but one may want to 856 - remove it if no guest will use it during the remaining lifetime of the linux 857 - host. If the mdev matrix device is removed, one may want to also reconfigure 858 - the pool of adapters and queues reserved for use by the default drivers. 859 - 860 - Limitations 861 - =========== 862 - * The KVM/kernel interfaces do not provide a way to prevent restoring an APQN 863 - to the default drivers pool of a queue that is still assigned to a mediated 864 - device in use by a guest. It is incumbent upon the administrator to 865 - ensure there is no mediated device in use by a guest to which the APQN is 866 - assigned lest the host be given access to the private data of the AP queue 867 - device, such as a private key configured specifically for the guest. 868 - 869 - * Dynamically assigning AP resources to or unassigning AP resources from a 870 - mediated matrix device - see 'Configuring an AP matrix for a linux guest' 871 - section above - while a running guest is using it is currently not supported. 872 - 873 - * Live guest migration is not supported for guests using AP devices. 
If a guest 874 - is using AP devices, the vfio-ap device configured for the guest must be 875 - unplugged before migrating the guest (see 'Hot unplug a vfio-ap device from a 876 - running guest' section above).
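The apmask/aqmask bit numbering and the APQN uniqueness rule described in the relocated document can be sketched in a few lines. An illustrative helper only (not part of QEMU or the kernel; Python used purely for illustration):

```python
# Sketch of the vfio-ap mask semantics and APQN rules described above.

def parse_mask(hex_str):
    """Return the set of bit numbers switched ON in a 256-bit AP mask.

    Bit 0 is the leftmost (most significant) bit, matching the
    numbering used by /sys/bus/ap/apmask and /sys/bus/ap/aqmask.
    Short strings are padded with 0s on the right, as sysfs does.
    """
    nibbles = len(hex_str) - 2  # hex digits after the "0x" prefix
    value = int(hex_str, 16) << (256 - 4 * nibbles)
    return {bit for bit in range(256) if (value >> (255 - bit)) & 1}

def apqns(adapters, domains):
    """Cross product of APIDs and APQIs: the set of APQNs a guest owns."""
    return {(a, d) for a in adapters for d in domains}

# The example apmask from the text reserves everything except
# APIDs 0 and 6 for the default drivers:
mask = "0x7d" + "f" * 62
assert set(range(256)) - parse_mask(mask) == {0, 6}

# The padding example: 0x41 identifies device numbers 1 and 7 (01000001).
assert parse_mask("0x41") == {1, 7}

# Example 3 is invalid because the guests' APQN sets intersect at (1, 6):
assert apqns({1, 2}, {5, 6}) & apqns({1}, {6, 7}) == {(1, 6)}
```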
+1 -1
hw/s390x/ipl.c
··· 179 179 /* if not Linux load the address of the (short) IPL PSW */ 180 180 ipl_psw = rom_ptr(4, 4); 181 181 if (ipl_psw) { 182 - pentry = be32_to_cpu(*ipl_psw) & 0x7fffffffUL; 182 + pentry = be32_to_cpu(*ipl_psw) & PSW_MASK_SHORT_ADDR; 183 183 } else { 184 184 error_setg(&err, "Could not get IPL PSW"); 185 185 goto error;
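The hunk replaces a magic constant with a named one; the computation itself is the classic short-PSW address extraction. A sketch, assuming PSW_MASK_SHORT_ADDR keeps the old value 0x7fffffff (the 31-bit address field, with bit 0 of the word being the addressing-mode bit):

```python
# Illustrative sketch of the short-PSW entry-point extraction in
# hw/s390x/ipl.c. Assumption: PSW_MASK_SHORT_ADDR == 0x7fffffff.
PSW_MASK_SHORT_ADDR = 0x7FFFFFFF

def ipl_entry_from_short_psw(psw):
    """psw: 8 bytes; bytes 4..7 hold the big-endian address word
    (the rom_ptr(4, 4) read in the hunk). The top bit of that word
    is the addressing-mode bit and is masked off."""
    addr_word = int.from_bytes(psw[4:8], "big")
    return addr_word & PSW_MASK_SHORT_ADDR

# Addressing-mode bit set, entry point 0x10000:
assert ipl_entry_from_short_psw(bytes.fromhex("000a000080010000")) == 0x10000
```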
+24
include/standard-headers/drm/drm_fourcc.h
··· 410 410 #define I915_FORMAT_MOD_Yf_TILED_CCS fourcc_mod_code(INTEL, 5) 411 411 412 412 /* 413 + * Intel color control surfaces (CCS) for Gen-12 render compression. 414 + * 415 + * The main surface is Y-tiled and at plane index 0, the CCS is linear and 416 + * at index 1. A 64B CCS cache line corresponds to an area of 4x1 tiles in 417 + * main surface. In other words, 4 bits in CCS map to a main surface cache 418 + * line pair. The main surface pitch is required to be a multiple of four 419 + * Y-tile widths. 420 + */ 421 + #define I915_FORMAT_MOD_Y_TILED_GEN12_RC_CCS fourcc_mod_code(INTEL, 6) 422 + 423 + /* 424 + * Intel color control surfaces (CCS) for Gen-12 media compression 425 + * 426 + * The main surface is Y-tiled and at plane index 0, the CCS is linear and 427 + * at index 1. A 64B CCS cache line corresponds to an area of 4x1 tiles in 428 + * main surface. In other words, 4 bits in CCS map to a main surface cache 429 + * line pair. The main surface pitch is required to be a multiple of four 430 + * Y-tile widths. For semi-planar formats like NV12, CCS planes follow the 431 + * Y and UV planes i.e., planes 0 and 1 are used for Y and UV surfaces, 432 + * planes 2 and 3 for the respective CCS. 433 + */ 434 + #define I915_FORMAT_MOD_Y_TILED_GEN12_MC_CCS fourcc_mod_code(INTEL, 7) 435 + 436 + /* 413 437 * Tiled, NV12MT, grouped in 64 (pixels) x 32 (lines) -sized macroblocks 414 438 * 415 439 * Macroblocks are laid in a Z-shape, and each pixel data is following the
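The fixed geometry in the new modifier comments implies a constant main-surface:CCS size ratio. A quick sketch, assuming the usual 4 KiB Y-tile (128 bytes x 32 rows; the tile size itself is not stated in the hunk):

```python
# Gen-12 CCS sizing implied by the comment: one 64 B CCS cache line
# covers a 4x1 group of main-surface Y-tiles.
Y_TILE_BYTES = 128 * 32        # assumption: Y-tile = 128 B x 32 rows = 4 KiB
CCS_LINE_BYTES = 64
TILES_PER_CCS_LINE = 4

def ccs_bytes(main_surface_bytes):
    """CCS bytes needed for a tile-aligned main surface of the given size."""
    tiles = main_surface_bytes // Y_TILE_BYTES
    return (tiles // TILES_PER_CCS_LINE) * CCS_LINE_BYTES

# 1 MiB main surface -> 256 tiles -> 64 CCS lines -> 4 KiB of CCS,
# a 256:1 ratio, consistent with "4 bits in CCS map to a main surface
# cache line pair" (4 bits per 128 B).
assert ccs_bytes(1 << 20) == 4096
```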
+11
include/standard-headers/linux/ethtool.h
··· 593 593 * @ETH_SS_RSS_HASH_FUNCS: RSS hush function names 594 594 * @ETH_SS_PHY_STATS: Statistic names, for use with %ETHTOOL_GPHYSTATS 595 595 * @ETH_SS_PHY_TUNABLES: PHY tunable names 596 + * @ETH_SS_LINK_MODES: link mode names 597 + * @ETH_SS_MSG_CLASSES: debug message class names 598 + * @ETH_SS_WOL_MODES: wake-on-lan modes 596 599 */ 597 600 enum ethtool_stringset { 598 601 ETH_SS_TEST = 0, ··· 604 607 ETH_SS_TUNABLES, 605 608 ETH_SS_PHY_STATS, 606 609 ETH_SS_PHY_TUNABLES, 610 + ETH_SS_LINK_MODES, 611 + ETH_SS_MSG_CLASSES, 612 + ETH_SS_WOL_MODES, 613 + 614 + /* add new constants above here */ 615 + ETH_SS_COUNT 607 616 }; 608 617 609 618 /** ··· 1687 1696 #define WAKE_MAGIC (1 << 5) 1688 1697 #define WAKE_MAGICSECURE (1 << 6) /* only meaningful if WAKE_MAGIC */ 1689 1698 #define WAKE_FILTER (1 << 7) 1699 + 1700 + #define WOL_MODE_COUNT 8 1690 1701 1691 1702 /* L2-L4 network traffic flow types */ 1692 1703 #define TCP_V4_FLOW 0x01 /* hash or spec (tcp_ip4_spec) */
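The new WOL_MODE_COUNT constant ties the wake-on-lan stringset to the WAKE_* flag bits. A sketch of the invariant (WAKE_PHY's value is an assumption here; that flag is not visible in the hunk):

```python
# WOL_MODE_COUNT = 8 says the wake-on-lan modes occupy bits 0..7.
WAKE_PHY = 1 << 0       # assumed first flag; not shown in the hunk
WAKE_FILTER = 1 << 7    # highest flag shown in the hunk
WOL_MODE_COUNT = 8

all_modes = (1 << WOL_MODE_COUNT) - 1   # 0xff
assert WAKE_FILTER == 1 << (WOL_MODE_COUNT - 1)
assert (WAKE_PHY | WAKE_FILTER) & ~all_modes == 0
```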
+1
include/standard-headers/linux/input.h
··· 31 31 unsigned long __sec; 32 32 #if defined(__sparc__) && defined(__arch64__) 33 33 unsigned int __usec; 34 + unsigned int __pad; 34 35 #else 35 36 unsigned long __usec; 36 37 #endif
+1
include/standard-headers/linux/pci_regs.h
··· 676 676 #define PCI_EXP_LNKCTL2_TLS_32_0GT 0x0005 /* Supported Speed 32GT/s */ 677 677 #define PCI_EXP_LNKCTL2_ENTER_COMP 0x0010 /* Enter Compliance */ 678 678 #define PCI_EXP_LNKCTL2_TX_MARGIN 0x0380 /* Transmit Margin */ 679 + #define PCI_EXP_LNKCTL2_HASD 0x0020 /* HW Autonomous Speed Disable */ 679 680 #define PCI_EXP_LNKSTA2 50 /* Link Status 2 */ 680 681 #define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52 /* v2 endpoints with link end here */ 681 682 #define PCI_EXP_SLTCAP2 52 /* Slot Capabilities 2 */
+2
linux-headers/asm-arm/unistd-common.h
··· 390 390 #define __NR_fspick (__NR_SYSCALL_BASE + 433) 391 391 #define __NR_pidfd_open (__NR_SYSCALL_BASE + 434) 392 392 #define __NR_clone3 (__NR_SYSCALL_BASE + 435) 393 + #define __NR_openat2 (__NR_SYSCALL_BASE + 437) 394 + #define __NR_pidfd_getfd (__NR_SYSCALL_BASE + 438) 393 395 394 396 #endif /* _ASM_ARM_UNISTD_COMMON_H */
+10 -2
linux-headers/asm-arm64/kvm.h
··· 220 220 #define KVM_REG_ARM_PTIMER_CVAL ARM64_SYS_REG(3, 3, 14, 2, 2) 221 221 #define KVM_REG_ARM_PTIMER_CNT ARM64_SYS_REG(3, 3, 14, 0, 1) 222 222 223 - /* EL0 Virtual Timer Registers */ 223 + /* 224 + * EL0 Virtual Timer Registers 225 + * 226 + * WARNING: 227 + * KVM_REG_ARM_TIMER_CVAL and KVM_REG_ARM_TIMER_CNT are not defined 228 + * with the appropriate register encodings. Their values have been 229 + * accidentally swapped. As this is set API, the definitions here 230 + * must be used, rather than ones derived from the encodings. 231 + */ 224 232 #define KVM_REG_ARM_TIMER_CTL ARM64_SYS_REG(3, 3, 14, 3, 1) 233 + #define KVM_REG_ARM_TIMER_CVAL ARM64_SYS_REG(3, 3, 14, 0, 2) 225 234 #define KVM_REG_ARM_TIMER_CNT ARM64_SYS_REG(3, 3, 14, 3, 2) 226 - #define KVM_REG_ARM_TIMER_CVAL ARM64_SYS_REG(3, 3, 14, 0, 2) 227 235 228 236 /* KVM-as-firmware specific pseudo-registers */ 229 237 #define KVM_REG_ARM_FW (0x0014 << KVM_REG_ARM_COPROC_SHIFT)
+1
linux-headers/asm-arm64/unistd.h
··· 19 19 #define __ARCH_WANT_NEW_STAT 20 20 #define __ARCH_WANT_SET_GET_RLIMIT 21 21 #define __ARCH_WANT_TIME32_SYSCALLS 22 + #define __ARCH_WANT_SYS_CLONE3 22 23 23 24 #include <asm-generic/unistd.h>
+2
linux-headers/asm-generic/mman-common.h
··· 11 11 #define PROT_WRITE 0x2 /* page can be written */ 12 12 #define PROT_EXEC 0x4 /* page can be executed */ 13 13 #define PROT_SEM 0x8 /* page may be used for atomic ops */ 14 + /* 0x10 reserved for arch-specific use */ 15 + /* 0x20 reserved for arch-specific use */ 14 16 #define PROT_NONE 0x0 /* page can not be accessed */ 15 17 #define PROT_GROWSDOWN 0x01000000 /* mprotect flag: extend change to start of growsdown vma */ 16 18 #define PROT_GROWSUP 0x02000000 /* mprotect flag: extend change to end of growsup vma */
+6 -1
linux-headers/asm-generic/unistd.h
··· 851 851 __SYSCALL(__NR_clone3, sys_clone3) 852 852 #endif 853 853 854 + #define __NR_openat2 437 855 + __SYSCALL(__NR_openat2, sys_openat2) 856 + #define __NR_pidfd_getfd 438 857 + __SYSCALL(__NR_pidfd_getfd, sys_pidfd_getfd) 858 + 854 859 #undef __NR_syscalls 855 - #define __NR_syscalls 436 860 + #define __NR_syscalls 439 856 861 857 862 /* 858 863 * 32 bit systems traditionally used different
+2
linux-headers/asm-mips/unistd_n32.h
··· 365 365 #define __NR_fspick (__NR_Linux + 433) 366 366 #define __NR_pidfd_open (__NR_Linux + 434) 367 367 #define __NR_clone3 (__NR_Linux + 435) 368 + #define __NR_openat2 (__NR_Linux + 437) 369 + #define __NR_pidfd_getfd (__NR_Linux + 438) 368 370 369 371 370 372 #endif /* _ASM_MIPS_UNISTD_N32_H */
+2
linux-headers/asm-mips/unistd_n64.h
··· 341 341 #define __NR_fspick (__NR_Linux + 433) 342 342 #define __NR_pidfd_open (__NR_Linux + 434) 343 343 #define __NR_clone3 (__NR_Linux + 435) 344 + #define __NR_openat2 (__NR_Linux + 437) 345 + #define __NR_pidfd_getfd (__NR_Linux + 438) 344 346 345 347 346 348 #endif /* _ASM_MIPS_UNISTD_N64_H */
+2
linux-headers/asm-mips/unistd_o32.h
··· 411 411 #define __NR_fspick (__NR_Linux + 433) 412 412 #define __NR_pidfd_open (__NR_Linux + 434) 413 413 #define __NR_clone3 (__NR_Linux + 435) 414 + #define __NR_openat2 (__NR_Linux + 437) 415 + #define __NR_pidfd_getfd (__NR_Linux + 438) 414 416 415 417 416 418 #endif /* _ASM_MIPS_UNISTD_O32_H */
+2
linux-headers/asm-powerpc/unistd_32.h
··· 418 418 #define __NR_fspick 433 419 419 #define __NR_pidfd_open 434 420 420 #define __NR_clone3 435 421 + #define __NR_openat2 437 422 + #define __NR_pidfd_getfd 438 421 423 422 424 423 425 #endif /* _ASM_POWERPC_UNISTD_32_H */
+2
linux-headers/asm-powerpc/unistd_64.h
··· 390 390 #define __NR_fspick 433 391 391 #define __NR_pidfd_open 434 392 392 #define __NR_clone3 435 393 + #define __NR_openat2 437 394 + #define __NR_pidfd_getfd 438 393 395 394 396 395 397 #endif /* _ASM_POWERPC_UNISTD_64_H */
+2
linux-headers/asm-s390/unistd_32.h
··· 408 408 #define __NR_fspick 433 409 409 #define __NR_pidfd_open 434 410 410 #define __NR_clone3 435 411 + #define __NR_openat2 437 412 + #define __NR_pidfd_getfd 438 411 413 412 414 #endif /* _ASM_S390_UNISTD_32_H */
+2
linux-headers/asm-s390/unistd_64.h
··· 356 356 #define __NR_fspick 433 357 357 #define __NR_pidfd_open 434 358 358 #define __NR_clone3 435 359 + #define __NR_openat2 437 360 + #define __NR_pidfd_getfd 438 359 361 360 362 #endif /* _ASM_S390_UNISTD_64_H */
+2
linux-headers/asm-x86/unistd_32.h
··· 426 426 #define __NR_fspick 433 427 427 #define __NR_pidfd_open 434 428 428 #define __NR_clone3 435 429 + #define __NR_openat2 437 430 + #define __NR_pidfd_getfd 438 429 431 430 432 #endif /* _ASM_X86_UNISTD_32_H */
+2
linux-headers/asm-x86/unistd_64.h
··· 348 348 #define __NR_fspick 433 349 349 #define __NR_pidfd_open 434 350 350 #define __NR_clone3 435 351 + #define __NR_openat2 437 352 + #define __NR_pidfd_getfd 438 351 353 352 354 #endif /* _ASM_X86_UNISTD_64_H */
+2
linux-headers/asm-x86/unistd_x32.h
··· 301 301 #define __NR_fspick (__X32_SYSCALL_BIT + 433) 302 302 #define __NR_pidfd_open (__X32_SYSCALL_BIT + 434) 303 303 #define __NR_clone3 (__X32_SYSCALL_BIT + 435) 304 + #define __NR_openat2 (__X32_SYSCALL_BIT + 437) 305 + #define __NR_pidfd_getfd (__X32_SYSCALL_BIT + 438) 304 306 #define __NR_rt_sigaction (__X32_SYSCALL_BIT + 512) 305 307 #define __NR_rt_sigreturn (__X32_SYSCALL_BIT + 513) 306 308 #define __NR_ioctl (__X32_SYSCALL_BIT + 514)
+5
linux-headers/linux/kvm.h
··· 1009 1009 #define KVM_CAP_PPC_GUEST_DEBUG_SSTEP 176 1010 1010 #define KVM_CAP_ARM_NISV_TO_USER 177 1011 1011 #define KVM_CAP_ARM_INJECT_EXT_DABT 178 1012 + #define KVM_CAP_S390_VCPU_RESETS 179 1012 1013 1013 1014 #ifdef KVM_CAP_IRQ_ROUTING 1014 1015 ··· 1472 1473 1473 1474 /* Available with KVM_CAP_ARM_SVE */ 1474 1475 #define KVM_ARM_VCPU_FINALIZE _IOW(KVMIO, 0xc2, int) 1476 + 1477 + /* Available with KVM_CAP_S390_VCPU_RESETS */ 1478 + #define KVM_S390_NORMAL_RESET _IO(KVMIO, 0xc3) 1479 + #define KVM_S390_CLEAR_RESET _IO(KVMIO, 0xc4) 1475 1480 1476 1481 /* Secure Encrypted Virtualization command */ 1477 1482 enum sev_cmd_id {
+14 -4
target/s390x/cpu.c
··· 78 78 S390CPU *cpu = S390_CPU(s); 79 79 uint64_t spsw = ldq_phys(s->as, 0); 80 80 81 - cpu->env.psw.mask = spsw & 0xffffffff80000000ULL; 81 + cpu->env.psw.mask = spsw & PSW_MASK_SHORT_CTRL; 82 82 /* 83 83 * Invert short psw indication, so SIE will report a specification 84 84 * exception if it was not set. 85 85 */ 86 86 cpu->env.psw.mask ^= PSW_MASK_SHORTPSW; 87 - cpu->env.psw.addr = spsw & 0x7fffffffULL; 87 + cpu->env.psw.addr = spsw & PSW_MASK_SHORT_ADDR; 88 88 89 89 s390_cpu_set_state(S390_CPU_STATE_OPERATING, cpu); 90 90 } ··· 144 144 } 145 145 146 146 /* Reset state inside the kernel that we cannot access yet from QEMU. */ 147 - if (kvm_enabled() && type != S390_CPU_RESET_NORMAL) { 148 - kvm_s390_reset_vcpu(cpu); 147 + if (kvm_enabled()) { 148 + switch (type) { 149 + case S390_CPU_RESET_CLEAR: 150 + kvm_s390_reset_vcpu_clear(cpu); 151 + break; 152 + case S390_CPU_RESET_INITIAL: 153 + kvm_s390_reset_vcpu_initial(cpu); 154 + break; 155 + case S390_CPU_RESET_NORMAL: 156 + kvm_s390_reset_vcpu_normal(cpu); 157 + break; 158 + } 149 159 } 150 160 } 151 161
+2 -1
target/s390x/cpu.h
··· 276 276 #define PSW_MASK_RI 0x0000008000000000ULL 277 277 #define PSW_MASK_64 0x0000000100000000ULL 278 278 #define PSW_MASK_32 0x0000000080000000ULL 279 - #define PSW_MASK_ESA_ADDR 0x000000007fffffffULL 279 + #define PSW_MASK_SHORT_ADDR 0x000000007fffffffULL 280 + #define PSW_MASK_SHORT_CTRL 0xffffffff80000000ULL 280 281 281 282 #undef PSW_ASC_PRIMARY 282 283 #undef PSW_ASC_ACCREG
+1 -1
target/s390x/helper.c
··· 89 89 static inline bool is_special_wait_psw(uint64_t psw_addr) 90 90 { 91 91 /* signal quiesce */ 92 - return psw_addr == 0xfffUL; 92 + return (psw_addr & 0xfffUL) == 0xfffUL; 93 93 } 94 94 95 95 void s390_handle_wait(S390CPU *cpu)
+9 -1
target/s390x/kvm-stub.c
··· 83 83 { 84 84 } 85 85 86 - void kvm_s390_reset_vcpu(S390CPU *cpu) 86 + void kvm_s390_reset_vcpu_initial(S390CPU *cpu) 87 + { 88 + } 89 + 90 + void kvm_s390_reset_vcpu_clear(S390CPU *cpu) 91 + { 92 + } 93 + 94 + void kvm_s390_reset_vcpu_normal(S390CPU *cpu) 87 95 { 88 96 } 89 97
+34 -8
target/s390x/kvm.c
··· 151 151 static int cap_ri; 152 152 static int cap_gs; 153 153 static int cap_hpage_1m; 154 + static int cap_vcpu_resets; 154 155 155 156 static int active_cmma; 156 157 ··· 342 343 cap_async_pf = kvm_check_extension(s, KVM_CAP_ASYNC_PF); 343 344 cap_mem_op = kvm_check_extension(s, KVM_CAP_S390_MEM_OP); 344 345 cap_s390_irq = kvm_check_extension(s, KVM_CAP_S390_INJECT_IRQ); 346 + cap_vcpu_resets = kvm_check_extension(s, KVM_CAP_S390_VCPU_RESETS); 345 347 346 348 if (!kvm_check_extension(s, KVM_CAP_S390_GMAP) 347 349 || !kvm_check_extension(s, KVM_CAP_S390_COW)) { ··· 406 408 return 0; 407 409 } 408 410 409 - void kvm_s390_reset_vcpu(S390CPU *cpu) 411 + static void kvm_s390_reset_vcpu(S390CPU *cpu, unsigned long type) 410 412 { 411 413 CPUState *cs = CPU(cpu); 412 414 413 - /* The initial reset call is needed here to reset in-kernel 414 - * vcpu data that we can't access directly from QEMU 415 - * (i.e. with older kernels which don't support sync_regs/ONE_REG). 416 - * Before this ioctl cpu_synchronize_state() is called in common kvm 417 - * code (kvm-all) */ 418 - if (kvm_vcpu_ioctl(cs, KVM_S390_INITIAL_RESET, NULL)) { 419 - error_report("Initial CPU reset failed on CPU %i", cs->cpu_index); 415 + /* 416 + * The reset call is needed here to reset in-kernel vcpu data that 417 + * we can't access directly from QEMU (i.e. with older kernels 418 + * which don't support sync_regs/ONE_REG). Before this ioctl 419 + * cpu_synchronize_state() is called in common kvm code 420 + * (kvm-all). 421 + */ 422 + if (kvm_vcpu_ioctl(cs, type)) { 423 + error_report("CPU reset failed on CPU %i type %lx", 424 + cs->cpu_index, type); 425 + } 426 + } 427 + 428 + void kvm_s390_reset_vcpu_initial(S390CPU *cpu) 429 + { 430 + kvm_s390_reset_vcpu(cpu, KVM_S390_INITIAL_RESET); 431 + } 432 + 433 + void kvm_s390_reset_vcpu_clear(S390CPU *cpu) 434 + { 435 + if (cap_vcpu_resets) { 436 + kvm_s390_reset_vcpu(cpu, KVM_S390_CLEAR_RESET); 437 + } else { 438 + kvm_s390_reset_vcpu(cpu, KVM_S390_INITIAL_RESET); 439 + } 440 + } 441 + 442 + void kvm_s390_reset_vcpu_normal(S390CPU *cpu) 443 + { 444 + if (cap_vcpu_resets) { 445 + kvm_s390_reset_vcpu(cpu, KVM_S390_NORMAL_RESET); 420 446 } 421 447 }
+3 -1
target/s390x/kvm_s390x.h
··· 34 34 int vq, bool assign); 35 35 int kvm_s390_cmma_active(void); 36 36 void kvm_s390_cmma_reset(void); 37 - void kvm_s390_reset_vcpu(S390CPU *cpu); 37 + void kvm_s390_reset_vcpu_clear(S390CPU *cpu); 38 + void kvm_s390_reset_vcpu_normal(S390CPU *cpu); 39 + void kvm_s390_reset_vcpu_initial(S390CPU *cpu); 38 40 int kvm_s390_set_mem_limit(uint64_t new_limit, uint64_t *hw_limit); 39 41 void kvm_s390_set_max_pagesize(uint64_t pagesize, Error **errp); 40 42 void kvm_s390_crypto_reset(void);
+1 -1
target/s390x/translate.c
··· 3874 3874 3875 3875 /* Operate. */ 3876 3876 switch (s->fields.op2) { 3877 - case 0x55: /* AND */ 3877 + case 0x54: /* AND */ 3878 3878 tcg_gen_ori_i64(o->in2, o->in2, ~mask); 3879 3879 tcg_gen_and_i64(o->out, o->out, o->in2); 3880 3880 break;