qemu with hax to log dma reads & writes jcs.org/2018/11/12/vfio

include/qemu/bswap.h: Use __builtin_memcpy() in accessor functions

In the accessor functions ld*_he_p() and st*_he_p() we use memcpy()
to perform a load or store to a pointer which might not be aligned
for the size of the type. We rely on the compiler to optimize this
memcpy() into an efficient load or store instruction where possible.
This is required for good performance, but at the moment it is also
required for correct operation, because some users of these functions
require that the access is atomic if the pointer is aligned, which
will only be the case if the compiler has optimized out the memcpy().
(The particular example where we discovered this is the virtio
vring_avail_idx() which calls virtio_lduw_phys_cached() which
eventually ends up calling lduw_he_p().)
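
As a minimal sketch of the pattern described above (the `demo_` names are hypothetical stand-ins, not the QEMU functions themselves), a host-endian 16-bit load and store through a possibly unaligned pointer can be written as below. Whether the memcpy() becomes a single load or store instruction is up to the compiler and is not guaranteed by the C language.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for lduw_he_p(): load a host-endian 16-bit value
 * from a pointer that may not be 2-byte aligned. An optimizing compiler
 * will usually turn this fixed-size memcpy() into one 16-bit load, but
 * nothing in the C language requires it to.
 */
static inline int demo_lduw_he_p(const void *ptr)
{
    uint16_t r;
    memcpy(&r, ptr, sizeof(r));
    return r;
}

/* Hypothetical stand-in for stw_he_p(): the matching 16-bit store. */
static inline void demo_stw_he_p(void *ptr, uint16_t v)
{
    memcpy(ptr, &v, sizeof(v));
}
```

Calling `demo_stw_he_p(buf + 1, 0xBEEF)` on a byte buffer and reading it back with `demo_lduw_he_p(buf + 1)` returns 0xBEEF regardless of the alignment of `buf + 1`.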

Unfortunately some compile environments, such as the fortify-source
setup used in Alpine Linux, define memcpy() to a wrapper function
in a way that inhibits this compiler optimization.

The correct long-term fix here is to add a set of functions for
doing atomic accesses into AddressSpaces (and to other relevant
families of accessor functions like the virtio_*_phys_cached()
ones), and make sure that callsites which want atomic behaviour
use the correct functions.
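
For illustration only, an explicitly atomic accessor might look something like the sketch below. The `demo_` names are hypothetical and are not an existing or proposed QEMU API. The sketch assumes the GCC/Clang `__atomic` builtins and a naturally aligned pointer, and it ignores the AddressSpace layer entirely.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a QEMU API: a 16-bit load that is requested
 * explicitly as a single atomic access instead of relying on the
 * compiler to optimize a memcpy(). The pointer must be 2-byte aligned.
 */
static inline uint16_t demo_lduw_he_atomic(const uint16_t *ptr)
{
    return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}

/* Hypothetical matching 16-bit atomic store. */
static inline void demo_stw_he_atomic(uint16_t *ptr, uint16_t v)
{
    __atomic_store_n(ptr, v, __ATOMIC_RELAXED);
}
```

The point of the design is that atomicity becomes part of the function's contract, so callers that need it no longer depend on an optimization.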

In the meantime, switch to using __builtin_memcpy() in the
bswap.h accessor functions. This will make us robust against things
like this fortify library in the short term. In the longer term
it will mean that we don't end up with these functions being really
badly-performing even if the semantics of the out-of-line memcpy()
are correct.
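
A rough way to see the difference is sketched below. The macro and the `demo_` names are hypothetical and only imitate an environment that redirects memcpy(); real fortify-source headers are implemented differently. The sketch assumes GCC or Clang, where `__builtin_memcpy()` is resolved by the compiler and is not affected by a redefinition of the name `memcpy`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counts how often the hypothetical out-of-line wrapper is reached. */
static int demo_wrapper_calls;

/* Hypothetical wrapper standing in for a fortify-style interposer. */
static void *demo_checked_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    demo_wrapper_calls++;
    while (n--) {
        *d++ = *s++;
    }
    return dst;
}

/* Demonstration only: redirect the plain name to the wrapper. */
#undef memcpy
#define memcpy(d, s, n) demo_checked_memcpy((d), (s), (n))

/* Goes through the wrapper because it uses the redirected name. */
static inline uint32_t demo_ldl_plain(const void *ptr)
{
    uint32_t r;
    memcpy(&r, ptr, sizeof(r));
    return r;
}

/* Bypasses the redirection: the builtin is handled by the compiler. */
static inline uint32_t demo_ldl_builtin(const void *ptr)
{
    uint32_t r;
    __builtin_memcpy(&r, ptr, sizeof(r));
    return r;
}
```

Both functions return the same value, but only `demo_ldl_plain()` increments `demo_wrapper_calls`.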

Reported-by: Fernando Casas Schössow <casasfernando@outlook.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20190318112938.8298-1-peter.maydell@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Authored by Peter Maydell and committed by Paolo Bonzini
77b17570 1cab4641

+16 -10
include/qemu/bswap.h
@@ -316,51 +316,57 @@
     *(uint8_t *)ptr = v;
 }
 
-/* Any compiler worth its salt will turn these memcpy into native unaligned
-   operations. Thus we don't need to play games with packed attributes, or
-   inline byte-by-byte stores. */
+/*
+ * Any compiler worth its salt will turn these memcpy into native unaligned
+ * operations. Thus we don't need to play games with packed attributes, or
+ * inline byte-by-byte stores.
+ * Some compilation environments (eg some fortify-source implementations)
+ * may intercept memcpy() in a way that defeats the compiler optimization,
+ * though, so we use __builtin_memcpy() to give ourselves the best chance
+ * of good performance.
+ */
 
 static inline int lduw_he_p(const void *ptr)
 {
     uint16_t r;
-    memcpy(&r, ptr, sizeof(r));
+    __builtin_memcpy(&r, ptr, sizeof(r));
     return r;
 }
 
 static inline int ldsw_he_p(const void *ptr)
 {
     int16_t r;
-    memcpy(&r, ptr, sizeof(r));
+    __builtin_memcpy(&r, ptr, sizeof(r));
     return r;
 }
 
 static inline void stw_he_p(void *ptr, uint16_t v)
 {
-    memcpy(ptr, &v, sizeof(v));
+    __builtin_memcpy(ptr, &v, sizeof(v));
 }
 
 static inline int ldl_he_p(const void *ptr)
 {
     int32_t r;
-    memcpy(&r, ptr, sizeof(r));
+    __builtin_memcpy(&r, ptr, sizeof(r));
     return r;
 }
 
 static inline void stl_he_p(void *ptr, uint32_t v)
 {
-    memcpy(ptr, &v, sizeof(v));
+    __builtin_memcpy(ptr, &v, sizeof(v));
 }
 
 static inline uint64_t ldq_he_p(const void *ptr)
 {
     uint64_t r;
-    memcpy(&r, ptr, sizeof(r));
+    __builtin_memcpy(&r, ptr, sizeof(r));
     return r;
 }
 
 static inline void stq_he_p(void *ptr, uint64_t v)
 {
-    memcpy(ptr, &v, sizeof(v));
+    __builtin_memcpy(ptr, &v, sizeof(v));
 }
 
 static inline int lduw_le_p(const void *ptr)