Git fork

fetch: use batched reference updates

The reference updates performed as part of 'git-fetch(1)' take place
one at a time. For each reference update, a new transaction is created
and committed. This is necessary to ensure we can allow individual
updates to fail without failing the entire command. The command also
supports an '--atomic' mode, which uses a single transaction to update
all of the references. But this mode is all-or-nothing: if a single
update fails, all updates fail.

In 23fc8e4f61 (refs: implement batch reference update support,
2025-04-08), we introduced a new mechanism to batch reference updates.
Under the hood, this uses a single transaction to perform a batch of
reference updates, while still allowing individual updates to fail
without failing the rest of the batch. Utilize this newly introduced
batch update mechanism in 'git-fetch(1)'. This provides a significant
bump in performance, especially when dealing with repositories with a
large number of references.

Adding support for batched updates only requires modifying the flow to
also create a batch update transaction in the non-atomic case.
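
The difference between the three modes described above can be sketched with a toy model. This is only an illustration under stated assumptions: `toy_mode`, `toy_result` and `toy_fetch` are hypothetical names that do not exist in Git, and the model only counts transactions and outcomes rather than touching any ref store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; none of these exist in Git. */
enum toy_mode { TOY_PER_REF, TOY_ATOMIC, TOY_BATCHED };

struct toy_result {
	int transactions; /* transactions created for the whole fetch */
	int applied;      /* updates that took effect */
	int rejected;     /* updates that did not take effect */
};

/* ok[i] is nonzero when update i would succeed on its own. */
static struct toy_result toy_fetch(enum toy_mode mode, const int *ok, size_t n)
{
	struct toy_result res = { 0, 0, 0 };
	size_t i, good = 0;

	assert(ok || n == 0);
	for (i = 0; i < n; i++)
		good += ok[i] ? 1 : 0;

	switch (mode) {
	case TOY_PER_REF:
		/* One transaction per update: failures stay isolated. */
		res.transactions = (int)n;
		res.applied = (int)good;
		res.rejected = (int)(n - good);
		break;
	case TOY_ATOMIC:
		/* One transaction, all-or-nothing. */
		res.transactions = 1;
		res.applied = (good == n) ? (int)n : 0;
		res.rejected = (good == n) ? 0 : (int)n;
		break;
	case TOY_BATCHED:
		/* One transaction, failing updates are rejected individually. */
		res.transactions = 1;
		res.applied = (int)good;
		res.rejected = (int)(n - good);
		break;
	}
	return res;
}
```

In this model, the batched mode gives the same outcome as the per-reference mode while creating a single transaction, which is the property the commit relies on.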

With the reftable backend there is a 22x performance improvement when
performing 'git-fetch(1)' with 10000 refs:

Benchmark 1: fetch: many refs (refformat = reftable, refcount = 10000, revision = master)
Time (mean ± σ): 3.403 s ± 0.775 s [User: 1.875 s, System: 1.417 s]
Range (min … max): 2.454 s … 4.529 s 10 runs

Benchmark 2: fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD)
Time (mean ± σ): 154.3 ms ± 17.6 ms [User: 102.5 ms, System: 56.1 ms]
Range (min … max): 145.2 ms … 220.5 ms 18 runs

Summary
fetch: many refs (refformat = reftable, refcount = 10000, revision = HEAD) ran
22.06 ± 5.62 times faster than fetch: many refs (refformat = reftable, refcount = 10000, revision = master)

In similar conditions, the files backend sees a 1.25x performance
improvement:

Benchmark 1: fetch: many refs (refformat = files, refcount = 10000, revision = master)
Time (mean ± σ): 605.5 ms ± 9.4 ms [User: 117.8 ms, System: 483.3 ms]
Range (min … max): 595.6 ms … 621.5 ms 10 runs

Benchmark 2: fetch: many refs (refformat = files, refcount = 10000, revision = HEAD)
Time (mean ± σ): 485.8 ms ± 4.3 ms [User: 91.1 ms, System: 396.7 ms]
Range (min … max): 477.6 ms … 494.3 ms 10 runs

Summary
fetch: many refs (refformat = files, refcount = 10000, revision = HEAD) ran
1.25 ± 0.02 times faster than fetch: many refs (refformat = files, refcount = 10000, revision = master)
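
As a sanity check, the two summary figures can be reproduced from the means and standard deviations printed above. The sketch below assumes the uncertainty is propagated to first order from the two standard deviations; that assumption is not stated in the benchmark output, but it reproduces both reported values. The small difference between 22.05 and the reported 22.06 is presumably because the tool works from unrounded means.

```latex
% Speedup ratio and first-order error propagation, from the rounded means above.
\[
r = \frac{\bar t_{\text{master}}}{\bar t_{\text{HEAD}}}, \qquad
\sigma_r \approx r \sqrt{\left(\frac{\sigma_{\text{master}}}{\bar t_{\text{master}}}\right)^2
                       + \left(\frac{\sigma_{\text{HEAD}}}{\bar t_{\text{HEAD}}}\right)^2}
\]
% reftable backend: 3.403 s +- 0.775 s versus 154.3 ms +- 17.6 ms
\[
r = \frac{3.403}{0.1543} \approx 22.05, \qquad
\sigma_r \approx 22.05 \sqrt{0.2277^2 + 0.1141^2} \approx 5.62
\]
% files backend: 605.5 ms +- 9.4 ms versus 485.8 ms +- 4.3 ms
\[
r = \frac{605.5}{485.8} \approx 1.25, \qquad
\sigma_r \approx 1.25 \sqrt{0.0155^2 + 0.0089^2} \approx 0.02
\]
```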

With this, we'll either be using a regular transaction or a batch update
transaction. This helps clean up some code which is no longer needed, as
we'll now always have some type of 'ref_transaction' object being
propagated.

One big change is that earlier, each individual update would propagate a
failure, whereas now the `ref_transaction_for_each_rejected_update`
function is called at the end of the flow to capture the exit status for
'git-fetch(1)' and also to print F/D (file/directory) conflict errors.
This does change the order of the errors being printed, but the behavior
stays the same.
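
The shape of this end-of-flow reporting can be sketched as a callback that runs once per rejected update. This is a simplified model, not Git's API: every `toy_` name is hypothetical, the callback takes fewer arguments than the real handler in the diff, and the `messages` counter exists only so the sketch can be checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not Git's real types. */
enum toy_error { TOY_OK = 0, TOY_NAME_CONFLICT, TOY_OTHER };

struct toy_update {
	const char *refname;
	enum toy_error err; /* outcome recorded when the batch was committed */
};

typedef void toy_rejected_fn(const char *refname, enum toy_error err,
			     void *cb_data);

/* Invoke fn once for every update that was rejected during commit. */
static void toy_for_each_rejected(const struct toy_update *updates, size_t n,
				  toy_rejected_fn *fn, void *cb_data)
{
	size_t i;

	assert(fn);
	for (i = 0; i < n; i++)
		if (updates[i].err != TOY_OK)
			fn(updates[i].refname, updates[i].err, cb_data);
}

struct toy_rejection_data {
	int *retcode;           /* exit status shared with the caller */
	int conflict_msg_shown; /* print the conflict hint only once */
	int messages;           /* errors printed; for this sketch only */
};

static void toy_rejection_handler(const char *refname, enum toy_error err,
				  void *cb_data)
{
	struct toy_rejection_data *data = cb_data;

	if (err == TOY_NAME_CONFLICT && !data->conflict_msg_shown) {
		fprintf(stderr, "error: some local refs could not be updated\n");
		data->conflict_msg_shown = 1;
	} else {
		fprintf(stderr, "error: fetching ref %s failed\n", refname);
	}
	data->messages++;
	*data->retcode = 1;
}
```

Because the handler only runs after the commit, errors are reported together at the end rather than interleaved with each update, which is why the order of the printed errors changes.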

Since transaction errors are now explicitly defined as part of
76e760b999 (refs: introduce enum-based transaction error types,
2025-04-08), utilize them and get rid of custom errors defined within
'builtin/fetch.c'.

Signed-off-by: Karthik Nayak <karthik.188@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

authored by Karthik Nayak and committed by Junio C Hamano
0e358de6 b3de3832

+73 -54
builtin/fetch.c
@@ -640,9 +640,6 @@
 	return ref_map;
 }
 
-#define STORE_REF_ERROR_OTHER 1
-#define STORE_REF_ERROR_DF_CONFLICT 2
-
 static int s_update_ref(const char *action,
 			struct ref *ref,
 			struct ref_transaction *transaction,
@@ -650,7 +647,6 @@
 {
 	char *msg;
 	char *rla = getenv("GIT_REFLOG_ACTION");
-	struct ref_transaction *our_transaction = NULL;
 	struct strbuf err = STRBUF_INIT;
 	int ret;
 
@@ -660,43 +656,10 @@
 		rla = default_rla.buf;
 	msg = xstrfmt("%s: %s", rla, action);
 
-	/*
-	 * If no transaction was passed to us, we manage the transaction
-	 * ourselves. Otherwise, we trust the caller to handle the transaction
-	 * lifecycle.
-	 */
-	if (!transaction) {
-		transaction = our_transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
-									    0, &err);
-		if (!transaction) {
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
-
 	ret = ref_transaction_update(transaction, ref->name, &ref->new_oid,
 				     check_old ? &ref->old_oid : NULL,
 				     NULL, NULL, 0, msg, &err);
-	if (ret) {
-		ret = STORE_REF_ERROR_OTHER;
-		goto out;
-	}
 
-	if (our_transaction) {
-		switch (ref_transaction_commit(our_transaction, &err)) {
-		case 0:
-			break;
-		case REF_TRANSACTION_ERROR_NAME_CONFLICT:
-			ret = STORE_REF_ERROR_DF_CONFLICT;
-			goto out;
-		default:
-			ret = STORE_REF_ERROR_OTHER;
-			goto out;
-		}
-	}
-
-out:
-	ref_transaction_free(our_transaction);
 	if (ret)
 		error("%s", err.buf);
 	strbuf_release(&err);
@@ -1139,7 +1102,6 @@
 	"to avoid this check\n");
 
 static int store_updated_refs(struct display_state *display_state,
-			      const char *remote_name,
 			      int connectivity_checked,
 			      struct ref_transaction *transaction, struct ref *ref_map,
 			      struct fetch_head *fetch_head,
@@ -1277,11 +1239,6 @@
 		}
 	}
 
-	if (rc & STORE_REF_ERROR_DF_CONFLICT)
-		error(_("some local refs could not be updated; try running\n"
-		      " 'git remote prune %s' to remove any old, conflicting "
-		      "branches"), remote_name);
-
 	if (advice_enabled(ADVICE_FETCH_SHOW_FORCED_UPDATES)) {
 		if (!config->show_forced_updates) {
 			warning(_(warn_show_forced_updates));
@@ -1365,9 +1322,8 @@
 	}
 
 	trace2_region_enter("fetch", "consume_refs", the_repository);
-	ret = store_updated_refs(display_state, transport->remote->name,
-				 connectivity_checked, transaction, ref_map,
-				 fetch_head, config);
+	ret = store_updated_refs(display_state, connectivity_checked,
+				 transaction, ref_map, fetch_head, config);
 	trace2_region_leave("fetch", "consume_refs", the_repository);
 
 out:
@@ -1687,6 +1643,36 @@
 	return result;
 }
 
+struct ref_rejection_data {
+	int *retcode;
+	int conflict_msg_shown;
+	const char *remote_name;
+};
+
+static void ref_transaction_rejection_handler(const char *refname,
+					      const struct object_id *old_oid UNUSED,
+					      const struct object_id *new_oid UNUSED,
+					      const char *old_target UNUSED,
+					      const char *new_target UNUSED,
+					      enum ref_transaction_error err,
+					      void *cb_data)
+{
+	struct ref_rejection_data *data = cb_data;
+
+	if (err == REF_TRANSACTION_ERROR_NAME_CONFLICT && !data->conflict_msg_shown) {
+		error(_("some local refs could not be updated; try running\n"
+			" 'git remote prune %s' to remove any old, conflicting "
+			"branches"), data->remote_name);
+		data->conflict_msg_shown = 1;
+	} else {
+		const char *reason = ref_transaction_error_msg(err);
+
+		error(_("fetching ref %s failed: %s"), refname, reason);
+	}
+
+	*data->retcode = 1;
+}
+
 static int do_fetch(struct transport *transport,
 		    struct refspec *rs,
 		    const struct fetch_config *config)
@@ -1807,6 +1793,24 @@
 			retcode = 1;
 	}
 
+	/*
+	 * If not atomic, we can still use batched updates, which would be much
+	 * more performant. We don't initiate the transaction before pruning,
+	 * since pruning must be an independent step, to avoid F/D conflicts.
+	 *
+	 * TODO: if reference transactions gain logical conflict resolution, we
+	 * can delete and create refs (with F/D conflicts) in the same transaction
+	 * and this can be moved above the 'prune_refs()' block.
+	 */
+	if (!transaction) {
+		transaction = ref_store_transaction_begin(get_main_ref_store(the_repository),
+							  REF_TRANSACTION_ALLOW_FAILURE, &err);
+		if (!transaction) {
+			retcode = -1;
+			goto cleanup;
+		}
+	}
+
 	if (fetch_and_consume_refs(&display_state, transport, transaction, ref_map,
 				   &fetch_head, config)) {
 		retcode = 1;
@@ -1838,16 +1842,31 @@
 			free_refs(tags_ref_map);
 	}
 
-	if (transaction) {
-		if (retcode)
-			goto cleanup;
+	if (retcode)
+		goto cleanup;
+
+	retcode = ref_transaction_commit(transaction, &err);
+	if (retcode) {
+		/*
+		 * Explicitly handle transaction cleanup to avoid
+		 * aborting an already closed transaction.
+		 */
+		ref_transaction_free(transaction);
+		transaction = NULL;
+		goto cleanup;
+	}
 
-		retcode = ref_transaction_commit(transaction, &err);
+	if (!atomic_fetch) {
+		struct ref_rejection_data data = {
+			.retcode = &retcode,
+			.conflict_msg_shown = 0,
+			.remote_name = transport->remote->name,
+		};
+
+		ref_transaction_for_each_rejected_update(transaction,
+							 ref_transaction_rejection_handler,
+							 &data);
 		if (retcode) {
-			/*
-			 * Explicitly handle transaction cleanup to avoid
-			 * aborting an already closed transaction.
-			 */
 			ref_transaction_free(transaction);
 			transaction = NULL;
 			goto cleanup;