qemu with hax to log dma reads & writes jcs.org/2018/11/12/vfio

migration/colo.c: Use event instead of semaphore

If multiple packets miscompare in a short timeframe, the semaphore
value will be increased multiple times. This causes multiple
checkpoints even if one would be sufficient.

Fix this by using an event instead of a semaphore for triggering
checkpoints. Now, checkpoint requests will be ignored until the
checkpoint event is sent to colo-compare (which releases the
miscompared packets).

Benchmark results (iperf3):
Client-to-server TCP:
  without patch: ~66 Mbit/s
  with patch:    ~61 Mbit/s
Server-to-client TCP:
  without patch: ~702 Kbit/s
  with patch:    ~16 Mbit/s

Signed-off-by: Lukas Straub <lukasstraub2@web.de>
Message-Id: <fd601ba1beb524aada54ba66e87ebfc12cf4574b.1589193382.git.lukasstraub2@web.de>
Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

Authored by Lukas Straub and committed by Dr. David Alan Gilbert.
bb70b66e e0d138aa

2 files changed (+7 -6)

migration/colo.c (+5 -4)
@@ -436,6 +436,7 @@
         goto out;
     }
 
+    qemu_event_reset(&s->colo_checkpoint_event);
     colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT, &local_err);
     if (local_err) {
         goto out;
@@ -589,7 +590,7 @@
             goto out;
         }
 
-        qemu_sem_wait(&s->colo_checkpoint_sem);
+        qemu_event_wait(&s->colo_checkpoint_event);
 
         if (s->state != MIGRATION_STATUS_COLO) {
             goto out;
@@ -637,7 +638,7 @@
     colo_compare_unregister_notifier(&packets_compare_notifier);
     timer_del(s->colo_delay_timer);
     timer_free(s->colo_delay_timer);
-    qemu_sem_destroy(&s->colo_checkpoint_sem);
+    qemu_event_destroy(&s->colo_checkpoint_event);
 
     /*
      * Must be called after failover BH is completed,
@@ -654,7 +655,7 @@
     MigrationState *s = opaque;
     int64_t next_notify_time;
 
-    qemu_sem_post(&s->colo_checkpoint_sem);
+    qemu_event_set(&s->colo_checkpoint_event);
     s->colo_checkpoint_time = qemu_clock_get_ms(QEMU_CLOCK_HOST);
     next_notify_time = s->colo_checkpoint_time +
                     s->parameters.x_checkpoint_delay;
@@ -664,7 +665,7 @@
 void migrate_start_colo_process(MigrationState *s)
 {
     qemu_mutex_unlock_iothread();
-    qemu_sem_init(&s->colo_checkpoint_sem, 0);
+    qemu_event_init(&s->colo_checkpoint_event, false);
     s->colo_delay_timer = timer_new_ms(QEMU_CLOCK_HOST,
                                 colo_checkpoint_notify, s);
 
migration/migration.h (+2 -2)
@@ -215,8 +215,8 @@
     /* The semaphore is used to notify COLO thread that failover is finished */
     QemuSemaphore colo_exit_sem;
 
-    /* The semaphore is used to notify COLO thread to do checkpoint */
-    QemuSemaphore colo_checkpoint_sem;
+    /* The event is used to notify COLO thread to do checkpoint */
+    QemuEvent colo_checkpoint_event;
     int64_t colo_checkpoint_time;
     QEMUTimer *colo_delay_timer;
 