bundle-uri: download in creationToken order

The creationToken heuristic provides an ordering on the bundles
advertised by a bundle list. Teach the Git client to download bundles
differently when this heuristic is advertised.

The bundles in the list are sorted by their advertised creationToken
values, then downloaded in decreasing order. This avoids the previous
strategy of downloading bundles in an arbitrary order and attempting
to apply them (likely failing in the case of required commits) until
discovering the order through attempted unbundling.

During a fresh 'git clone', it may make sense to download the bundles in
increasing order, since that would prevent the need to attempt
unbundling a bundle with required commits that do not exist in our empty
object store. However, the cost of testing an unbundle is quite low, so
the chosen order instead optimizes for a future bundle download during a
'git fetch' operation with a non-empty object store.

Since the Git client continues fetching from the Git remote after
downloading and unbundling bundles, the client's object store can be
ahead of the bundle provider's object store. The next time it attempts
to download from the bundle list, it makes most sense to download only
the most-recent bundles until all tips successfully unbundle. The
strategy implemented here provides that short-circuit where the client
downloads a minimal set of bundles.

However, we are not satisfied by the naive approach of downloading
bundles until one successfully unbundles, expecting the earlier bundles
to successfully unbundle now. The example repository in t5558
demonstrates this well:

 ---------------- bundle-4

       4
      / \
 ----|---|------- bundle-3
     |   |
     |   3
     |   |
 ----|---|------- bundle-2
     |   |
     2   |
     |   |
 ----|---|------- bundle-1
      \ /
       1
       |
 (previous commits)

In this repository, if we already have the objects for bundle-1 and then
try to fetch from this list, the naive approach will fail. bundle-4
requires both bundle-3 and bundle-2, though bundle-3 will successfully
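unbundle without bundle-2. Thus, one successful unbundle is not a
stopping condition: the algorithm must retry bundle-4 and, when that
still fails, keep looking deeper in the list for bundle-2.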
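
The walk this example calls for can be sketched as a small standalone
simulation. This is not the code from bundle-uri.c: commits are modeled
as bits in a mask, every download is assumed to succeed, and the names
sim_bundle, sim_unbundle, sim_fetch_by_token, and sim_example are
hypothetical. The prerequisites given to each bundle are one reading of
the diagram above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Commit N of the diagram is bit N; "previous commits" are folded into 1. */
#define COMMIT(n) (1u << (n))

/* Hypothetical stand-in for a remote bundle. */
struct sim_bundle {
	unsigned long token;	/* advertised creationToken */
	unsigned requires;	/* commits that must already be in the store */
	unsigned provides;	/* commits added by a successful unbundle */
	int downloaded;
	int unbundled;
};

static int by_token_decreasing(const void *va, const void *vb)
{
	const struct sim_bundle *a = va, *b = vb;

	if (a->token > b->token)
		return -1;
	if (a->token < b->token)
		return 1;
	return 0;
}

/* Unbundling succeeds only if every required commit is in the store. */
static int sim_unbundle(const struct sim_bundle *b, unsigned *store)
{
	if ((b->requires & *store) != b->requires)
		return -1;
	*store |= b->provides;
	return 0;
}

/*
 * Sort by decreasing token, then walk: step toward older bundles after
 * a failed unbundle, step back toward newer ones after a success.
 * Returns 0 once the newest bundle has been applied, -1 otherwise.
 */
static int sim_fetch_by_token(struct sim_bundle *bundles, size_t nr,
			      unsigned *store, size_t *downloads)
{
	long cur = 0;
	int move_direction = 1;

	assert(bundles && store && downloads);
	*downloads = 0;
	qsort(bundles, nr, sizeof(*bundles), by_token_decreasing);

	while (cur >= 0 && (size_t)cur < nr) {
		struct sim_bundle *b = &bundles[cur];

		if (!b->downloaded) {
			b->downloaded = 1;
			(*downloads)++;
		}
		if (!b->unbundled) {
			if (sim_unbundle(b, store)) {
				move_direction = 1;	/* look deeper */
			} else {
				b->unbundled = 1;
				move_direction = -1;	/* retry newer bundles */
			}
		}
		/* Already unbundled: keep moving in the same direction. */
		cur += move_direction;
	}
	return cur >= 0 ? -1 : 0;
}

/* Fill 'out' with the four bundles from the diagram (assumed layout). */
static void sim_example(struct sim_bundle out[4])
{
	const struct sim_bundle init[4] = {
		{ 1, 0, COMMIT(1), 0, 0 },
		{ 2, COMMIT(1), COMMIT(2), 0, 0 },
		{ 3, COMMIT(1), COMMIT(3), 0, 0 },
		{ 4, COMMIT(2) | COMMIT(3), COMMIT(4), 0, 0 },
	};
	memcpy(out, init, sizeof(init));
}
```

Starting from a store that holds only commit 1, this walk downloads
bundle-4, bundle-3, and bundle-2 and never touches bundle-1. Starting
from an empty store, it downloads all four before anything unbundles.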

A later implementation detail will store the maximum creationToken seen
during such a bundle download, and the client will avoid downloading a
bundle unless its creationToken is strictly greater than that stored
value. For now, if the bundle list is unchanged since the client's
previous download, the client will download the most-recent bundle and
then stop, since that bundle's required commits are already in the
object store.
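
The planned "strictly greater" check might look roughly like the sketch
below. It is only an illustration of the rule stated above, not the
later implementation: count_bundles_to_try is a hypothetical helper,
and the use of uint64_t for creationToken values is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Given tokens already sorted in decreasing order, return how many
 * leading bundles are still candidates for download, i.e. those whose
 * creationToken is strictly greater than the stored maximum.
 */
static size_t count_bundles_to_try(const uint64_t *tokens_decreasing,
				   size_t nr, uint64_t max_seen)
{
	size_t i;

	assert(tokens_decreasing || nr == 0);
	for (i = 0; i < nr; i++) {
		if (tokens_decreasing[i] <= max_seen)
			break;	/* this and all older bundles were seen */
	}
	return i;
}
```

With tokens 4, 3, 2, 1 and a stored maximum of 2, only the first two
bundles remain candidates. With a stored maximum of 4, nothing is
downloaded at all.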

Add tests that exercise this behavior. These tests will be expanded
when incremental downloads during 'git fetch' make use of
creationToken values.

Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Authored by Derrick Stolee and committed by Junio C Hamano
7903efb7 512fccf8

+233 -9
+154 -2
bundle-uri.c
···
 	return 0;
 }

+struct bundles_for_sorting {
+	struct remote_bundle_info **items;
+	size_t alloc;
+	size_t nr;
+};
+
+static int append_bundle(struct remote_bundle_info *bundle, void *data)
+{
+	struct bundles_for_sorting *list = data;
+	list->items[list->nr++] = bundle;
+	return 0;
+}
+
+/**
+ * For use in QSORT() to get a list sorted by creationToken
+ * in decreasing order.
+ */
+static int compare_creation_token_decreasing(const void *va, const void *vb)
+{
+	const struct remote_bundle_info * const *a = va;
+	const struct remote_bundle_info * const *b = vb;
+
+	if ((*a)->creationToken > (*b)->creationToken)
+		return -1;
+	if ((*a)->creationToken < (*b)->creationToken)
+		return 1;
+	return 0;
+}
+
+static int fetch_bundles_by_token(struct repository *r,
+				  struct bundle_list *list)
+{
+	int cur;
+	int move_direction = 0;
+	struct bundle_list_context ctx = {
+		.r = r,
+		.list = list,
+		.mode = list->mode,
+	};
+	struct bundles_for_sorting bundles = {
+		.alloc = hashmap_get_size(&list->bundles),
+	};
+
+	ALLOC_ARRAY(bundles.items, bundles.alloc);
+
+	for_all_bundles_in_list(list, append_bundle, &bundles);
+
+	QSORT(bundles.items, bundles.nr, compare_creation_token_decreasing);
+
+	/*
+	 * Attempt to download and unbundle the minimum number of bundles by
+	 * creationToken in decreasing order. If we fail to unbundle (after
+	 * a successful download) then move to the next non-downloaded bundle
+	 * and attempt downloading. Once we succeed in applying a bundle,
+	 * move to the previous unapplied bundle and attempt to unbundle it
+	 * again.
+	 *
+	 * In the case of a fresh clone, we will likely download all of the
+	 * bundles before successfully unbundling the oldest one, then the
+	 * rest of the bundles unbundle successfully in increasing order
+	 * of creationToken.
+	 *
+	 * If there are existing objects, then this process may terminate
+	 * early when all required commits from "new" bundles exist in the
+	 * repo's object store.
+	 */
+	cur = 0;
+	while (cur >= 0 && cur < bundles.nr) {
+		struct remote_bundle_info *bundle = bundles.items[cur];
+		if (!bundle->file) {
+			/*
+			 * Not downloaded yet. Try downloading.
+			 *
+			 * Note that bundle->file is non-NULL if a download
+			 * was attempted, even if it failed to download.
+			 */
+			if (fetch_bundle_uri_internal(ctx.r, bundle, ctx.depth + 1, ctx.list)) {
+				/* Mark as unbundled so we do not retry. */
+				bundle->unbundled = 1;
+
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+				goto move;
+			}
+
+			/* We expect bundles when using creationTokens. */
+			if (!is_bundle(bundle->file, 1)) {
+				warning(_("file downloaded from '%s' is not a bundle"),
+					bundle->uri);
+				break;
+			}
+		}
+
+		if (bundle->file && !bundle->unbundled) {
+			/*
+			 * This was downloaded, but not successfully
+			 * unbundled. Try unbundling again.
+			 */
+			if (unbundle_from_file(ctx.r, bundle->file)) {
+				/* Try looking deeper in the list. */
+				move_direction = 1;
+			} else {
+				/*
+				 * Succeeded in unbundle. Retry bundles
+				 * that previously failed to unbundle.
+				 */
+				move_direction = -1;
+				bundle->unbundled = 1;
+			}
+		}
+
+		/*
+		 * Else case: downloaded and unbundled successfully.
+		 * Skip this by moving in the same direction as the
+		 * previous step.
+		 */
+
+move:
+		/* Move in the specified direction and repeat. */
+		cur += move_direction;
+	}
+
+	free(bundles.items);
+
+	/*
+	 * We succeed if the loop terminates because 'cur' drops below
+	 * zero. The other case is that we terminate because 'cur'
+	 * reaches the end of the list, so we have a failure no matter
+	 * which bundles we apply from the list.
+	 */
+	return cur >= 0;
+}
+
 static int download_bundle_list(struct repository *r,
 				struct bundle_list *local_list,
 				struct bundle_list *global_list,
···
 		goto cleanup;
 	}

-	if ((result = download_bundle_list(r, &list_from_bundle,
+	/*
+	 * If this list uses the creationToken heuristic, then the URIs
+	 * it advertises are expected to be bundles, not nested lists.
+	 * We can drop 'global_list' and 'depth'.
+	 */
+	if (list_from_bundle.heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN) {
+		result = fetch_bundles_by_token(r, &list_from_bundle);
+		global_list->heuristic = BUNDLE_HEURISTIC_CREATIONTOKEN;
+	} else if ((result = download_bundle_list(r, &list_from_bundle,
 					   global_list, depth)))
 		goto cleanup;

···
 	int result;
 	struct bundle_list global_list;

+	/*
+	 * If the creationToken heuristic is used, then the URIs
+	 * advertised by 'list' are not nested lists and instead
+	 * direct bundles. We do not need to use global_list.
+	 */
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		return fetch_bundles_by_token(r, list);
+
 	init_bundle_list(&global_list);

 	/* If a bundle is added to this global list, then it is required. */
···
 	if ((result = download_bundle_list(r, list, &global_list, 0)))
 		goto cleanup;

-	result = unbundle_all_bundles(r, &global_list);
+	if (list->heuristic == BUNDLE_HEURISTIC_CREATIONTOKEN)
+		result = fetch_bundles_by_token(r, list);
+	else
+		result = unbundle_all_bundles(r, &global_list);

 cleanup:
 	for_all_bundles_in_list(&global_list, unlink_bundle, NULL);
+33 -7
t/t5558-clone-bundle-uri.sh
···
 	git -C clone-list-http-2 cat-file --batch-check <oids &&

 	cat >expect <<-EOF &&
-	$HTTPD_URL/bundle-1.bundle
-	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-4.bundle
 	$HTTPD_URL/bundle-3.bundle
-	$HTTPD_URL/bundle-4.bundle
+	$HTTPD_URL/bundle-2.bundle
+	$HTTPD_URL/bundle-1.bundle
+	EOF
+
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'clone incomplete bundle list (http, creationToken)' '
+	test_when_finished rm -f trace*.txt &&
+
+	cp clone-from/bundle-*.bundle "$HTTPD_DOCUMENT_ROOT_PATH/" &&
+	cat >"$HTTPD_DOCUMENT_ROOT_PATH/bundle-list" <<-EOF &&
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "bundle-1"]
+		uri = bundle-1.bundle
+		creationToken = 1
+	EOF
+
+	GIT_TRACE2_EVENT=$(pwd)/trace-clone.txt \
+	git clone --bundle-uri="$HTTPD_URL/bundle-list" \
+		--single-branch --branch=base --no-tags \
+		"$HTTPD_URL/smart/fetch.git" clone-token-http &&
+
+	cat >expect <<-EOF &&
 	$HTTPD_URL/bundle-list
+	$HTTPD_URL/bundle-1.bundle
 	EOF

-	# Since the creationToken heuristic is not yet understood by the
-	# client, the order cannot be verified at this moment. Sort the
-	# list for consistent results.
-	test_remote_https_urls <trace-clone.txt | sort >actual &&
+	test_remote_https_urls <trace-clone.txt >actual &&
 	test_cmp expect actual
 '

+46
t/t5601-clone.sh
···
 	grep -f pattern trace.txt
 '

+test_expect_success 'auto-discover multiple bundles from HTTP clone: creationToken heuristic' '
+	test_when_finished rm -rf "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+	test_when_finished rm -rf clone-heuristic trace*.txt &&
+
+	test_commit -C src newest &&
+	git -C src bundle create "$HTTPD_DOCUMENT_ROOT_PATH/newest.bundle" HEAD~1..HEAD &&
+	git clone --bare --no-local src "$HTTPD_DOCUMENT_ROOT_PATH/repo4.git" &&
+
+	cat >>"$HTTPD_DOCUMENT_ROOT_PATH/repo4.git/config" <<-EOF &&
+	[uploadPack]
+		advertiseBundleURIs = true
+
+	[bundle]
+		version = 1
+		mode = all
+		heuristic = creationToken
+
+	[bundle "everything"]
+		uri = $HTTPD_URL/everything.bundle
+		creationtoken = 1
+
+	[bundle "new"]
+		uri = $HTTPD_URL/new.bundle
+		creationtoken = 2
+
+	[bundle "newest"]
+		uri = $HTTPD_URL/newest.bundle
+		creationtoken = 3
+	EOF
+
+	GIT_TRACE2_EVENT="$(pwd)/trace-clone.txt" \
+		git -c protocol.version=2 \
+		    -c transfer.bundleURI=true clone \
+		"$HTTPD_URL/smart/repo4.git" clone-heuristic &&
+
+	cat >expect <<-EOF &&
+	$HTTPD_URL/newest.bundle
+	$HTTPD_URL/new.bundle
+	$HTTPD_URL/everything.bundle
+	EOF
+
+	# We should fetch all bundles in the expected order.
+	test_remote_https_urls <trace-clone.txt >actual &&
+	test_cmp expect actual
+'
+
 # DO NOT add non-httpd-specific tests here, because the last part of this
 # test script is only executed when httpd is available and enabled.
