qemu with hax to log dma reads & writes jcs.org/2018/11/12/vfio

block: Exploit BDRV_BLOCK_EOF for larger zero blocks

When we have a BDS with unallocated clusters, but asking the status
of its underlying bs->file or backing layer encounters an end-of-file
condition, we know that the rest of the unallocated area will read as
zeroes. However, pre-patch, this required two separate calls to
bdrv_get_block_status(), as the first call stops at the point where
the underlying file ends. Thanks to BDRV_BLOCK_EOF, we can now widen
the results of the primary status if the secondary status already
includes BDRV_BLOCK_ZERO.
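
The widening rule described above can be sketched in isolation. This is only an illustration under stated assumptions, not the code in block/io.c: the `SK_*` flag values and the function `sk_merge_file_status()` are hypothetical names invented for the sketch, and the non-EOF branch clamps with a comparison where the real code assigns directly.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical flag values for this sketch; QEMU defines its own. */
enum {
    SK_BLOCK_DATA = 1 << 0,
    SK_BLOCK_ZERO = 1 << 1,
    SK_BLOCK_EOF  = 1 << 2,
};

/*
 * Merge the primary status (ret, covering *pnum sectors) with the
 * secondary status of the underlying file (ret2, covering file_pnum
 * sectors). Returns the merged flags and may shrink *pnum.
 */
int64_t sk_merge_file_status(int64_t ret, int *pnum,
                             int64_t ret2, int file_pnum)
{
    assert(pnum && *pnum >= 0 && file_pnum >= 0);

    if (ret2 < 0) {
        /* The secondary status is only extra information; ignore errors. */
        return ret;
    }
    if ((ret2 & SK_BLOCK_EOF) &&
        (file_pnum == 0 || (ret2 & SK_BLOCK_ZERO))) {
        /*
         * The file ends inside the range and whatever precedes EOF is
         * zero, so the whole primary range reads as zero: keep *pnum.
         */
        ret |= SK_BLOCK_ZERO;
    } else {
        /* Limit the answer to the range the file layer reported. */
        if (file_pnum < *pnum) {
            *pnum = file_pnum;
        }
        ret |= ret2 & SK_BLOCK_ZERO;
    }
    return ret;
}
```

With a primary range of 8 sectors and a file that reports 3 zero sectors followed by EOF, the sketch keeps all 8 sectors and marks them zero in one step; without the EOF flag it would stop after 3 sectors and need a second query.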

In turn, this fixes a TODO mentioned in iotest 154, where we can now
see that all sectors in a partial cluster at the end of a file read
as zero when coupling the shorter backing file's status along with our
knowledge that the remaining sectors came from an unallocated cluster.

Also, note that the loop in bdrv_co_get_block_status_above() had an
inefficient exit: in cases where the active layer sets BDRV_BLOCK_ZERO
but does NOT set BDRV_BLOCK_ALLOCATED (namely, where we know we read
zeroes merely because our unallocated clusters lie beyond the backing
file's shorter length), we still ended up probing the backing layer
even though we already had a good answer.
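
To make the new loop exit concrete, here is a small self-contained model of the backing-chain walk. It is a sketch under assumptions, not QEMU code: `struct ch_layer`, the `CH_*` flags, and `ch_status_above()` are hypothetical, and each layer simply returns a canned answer instead of inspecting an image.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flags for this sketch only. */
enum { CH_DATA = 1, CH_ZERO = 2, CH_EOF = 4 };

/* Hypothetical layer that answers every status query the same way. */
struct ch_layer {
    int64_t status;           /* flags, or a negative error code */
    int pnum;                 /* sectors the answer covers */
    struct ch_layer *backing; /* next layer down, or NULL */
    int probes;               /* number of times this layer was queried */
};

int64_t ch_status_above(struct ch_layer *top, struct ch_layer *base,
                        int nb_sectors, int *pnum)
{
    struct ch_layer *p;
    int64_t ret = 0;
    bool first = true;

    assert(top != base);
    for (p = top; p != base; p = p->backing) {
        p->probes++;
        ret = p->status;
        *pnum = p->pnum < nb_sectors ? p->pnum : nb_sectors;
        if (ret < 0) {
            break;
        }
        if ((ret & CH_ZERO) && (ret & CH_EOF) && !first) {
            /* Zeroes past EOF extend over the unallocated length
             * learned from the layers above. */
            *pnum = nb_sectors;
        }
        if (ret & (CH_ZERO | CH_DATA)) {
            /* A definite answer: no need to probe deeper layers. */
            break;
        }
        /* Unallocated here; only this prefix is known so far. */
        if (*pnum < nb_sectors) {
            nb_sectors = *pnum;
        }
        first = false;
    }
    return ret;
}
```

In this model, a top layer that reports zeroes without being allocated stops the walk immediately, and a shorter backing layer that reports zero plus EOF has its answer widened to the unallocated length seen in the layer above.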

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20170505021500.19315-3-eblake@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>

Authored by Eric Blake and committed by Fam Zheng
c61e684e fb0d8654

+28 -15
+22 -5
block/io.c
···
             /* Ignore errors.  This is just providing extra information, it
              * is useful but not necessary.
              */
-            if (!file_pnum) {
-                /* !file_pnum indicates an offset at or beyond the EOF; it is
-                 * perfectly valid for the format block driver to point to such
-                 * offsets, so catch it and mark everything as zero */
+            if (ret2 & BDRV_BLOCK_EOF &&
+                (!file_pnum || ret2 & BDRV_BLOCK_ZERO)) {
+                /*
+                 * It is valid for the format block driver to read
+                 * beyond the end of the underlying file's current
+                 * size; such areas read as zero.
+                 */
                 ret |= BDRV_BLOCK_ZERO;
             } else {
                 /* Limit request to the range reported by the protocol driver */
···
 {
     BlockDriverState *p;
     int64_t ret = 0;
+    bool first = true;
 
     assert(bs != base);
     for (p = bs; p != base; p = backing_bs(p)) {
         ret = bdrv_co_get_block_status(p, sector_num, nb_sectors, pnum, file);
-        if (ret < 0 || ret & BDRV_BLOCK_ALLOCATED) {
+        if (ret < 0) {
+            break;
+        }
+        if (ret & BDRV_BLOCK_ZERO && ret & BDRV_BLOCK_EOF && !first) {
+            /*
+             * Reading beyond the end of the file continues to read
+             * zeroes, but we can only widen the result to the
+             * unallocated length we learned from an earlier
+             * iteration.
+             */
+            *pnum = nb_sectors;
+        }
+        if (ret & (BDRV_BLOCK_ZERO | BDRV_BLOCK_DATA)) {
             break;
         }
         /* [sector_num, pnum] unallocated on this layer, which could be only
          * the first part of [sector_num, nb_sectors].  */
         nb_sectors = MIN(nb_sectors, *pnum);
+        first = false;
     }
     return ret;
 }
-4
tests/qemu-iotests/154
···
 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
 
 # Repeat with backing file holding unallocated cluster.
-# TODO: Note that this forces an allocation, because we aren't yet able to
-# quickly detect that reads beyond EOF of the backing file are always zero
 CLUSTER_SIZE=2048 TEST_IMG="$TEST_IMG.base" _make_test_img $((size + 1024))
 
 # Write at the front: sector-wise, the request is:
···
 $QEMU_IMG map --output=json "$TEST_IMG" | _filter_qemu_img_map
 
 # Repeat with backing file holding zero'd cluster
-# TODO: Note that this forces an allocation, because we aren't yet able to
-# quickly detect that reads beyond EOF of the backing file are always zero
 $QEMU_IO -c "write -z $size 512" "$TEST_IMG.base" | _filter_qemu_io
 
 # Write at the front: sector-wise, the request is:
+6 -6
tests/qemu-iotests/154.out
···
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
 wrote 512/512 bytes at offset 134219264
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
 wrote 1024/1024 bytes at offset 134218240
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
 wrote 2048/2048 bytes at offset 134217728
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
···
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
 wrote 512/512 bytes at offset 134219264
 512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
 wrote 1024/1024 bytes at offset 134218240
 1 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 2048/2048 bytes allocated at offset 128 MiB
 [{ "start": 0, "length": 134217728, "depth": 1, "zero": true, "data": false},
-{ "start": 134217728, "length": 2048, "depth": 0, "zero": false, "data": true, "offset": OFFSET}]
+{ "start": 134217728, "length": 2048, "depth": 0, "zero": true, "data": false}]
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=134219776 backing_file=TEST_DIR/t.IMGFMT.base
 wrote 2048/2048 bytes at offset 134217728
 2 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)