Git fork

refs/files: use heuristic to decide whether to repack with `--auto`

The `--auto` flag for git-pack-refs(1) allows the ref backend to decide
whether or not a repack is in order. This switch has been introduced
mostly with the "reftable" backend in mind, which already knows to
auto-compact its tables during normal operations. When the flag is set,
then it will use the same auto-compaction mechanism and thus end up
doing nothing in most cases.

The "files" backend does not have any such heuristic yet and instead
packs any loose references unconditionally. So we rewrite the complete
"packed-refs" file even if there's only a single loose reference to be
packed.

Even worse, starting with 9f6714ab3e (builtin/gc: pack refs when using
`git maintenance run --auto`, 2024-03-25), `git pack-refs --auto` is
unconditionally executed via our auto maintenance, so we end up repacking
references every single time auto maintenance kicks in. And while that
commit already mentioned that the "files" backend unconditionally packs
refs now, the author obviously didn't quite think about the consequences
thereof. So while the idea was sound, we really should have added a
heuristic to the "files" backend before implementing it.

Introduce a heuristic that decides whether or not it is worth to pack
loose references. The important factors to decide here are the number of
loose references in comparison to the overall size of the "packed-refs"
file. The bigger the "packed-refs" file, the longer it takes to rewrite
it and thus we scale up the limit of allowed loose references before we
repack.

As is the nature of heuristics, this mechansim isn't obviously
"correct", but should rather be seen as a tradeoff between how much
resources we spend packing refs and how inefficient the ref store
becomes. For all I can say, we have successfully been using the exact
same heuristic in Gitaly for several years by now.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

authored by

Patrick Steinhardt and committed by
Junio C Hamano
c3459ae9 bd51dca3

+164 -9
+65
refs/files-backend.c
··· 1300 1300 return 0; 1301 1301 } 1302 1302 1303 + static int should_pack_refs(struct files_ref_store *refs, 1304 + struct pack_refs_opts *opts) 1305 + { 1306 + struct ref_iterator *iter; 1307 + size_t packed_size; 1308 + size_t refcount = 0; 1309 + size_t limit; 1310 + int ret; 1311 + 1312 + if (!(opts->flags & PACK_REFS_AUTO)) 1313 + return 1; 1314 + 1315 + ret = packed_refs_size(refs->packed_ref_store, &packed_size); 1316 + if (ret < 0) 1317 + die("cannot determine packed-refs size"); 1318 + 1319 + /* 1320 + * Packing loose references into the packed-refs file scales with the 1321 + * number of references we're about to write. We thus decide whether we 1322 + * repack refs by weighing the current size of the packed-refs file 1323 + * against the number of loose references. This is done such that we do 1324 + * not repack too often on repositories with a huge number of 1325 + * references, where we can expect a lot of churn in the number of 1326 + * references. 1327 + * 1328 + * As a heuristic, we repack if the number of loose references in the 1329 + * repository exceeds `log2(nr_packed_refs) * 5`, where we estimate 1330 + * `nr_packed_refs = packed_size / 100`, which scales as following: 1331 + * 1332 + * - 1kB ~ 10 packed refs: 16 refs 1333 + * - 10kB ~ 100 packed refs: 33 refs 1334 + * - 100kB ~ 1k packed refs: 49 refs 1335 + * - 1MB ~ 10k packed refs: 66 refs 1336 + * - 10MB ~ 100k packed refs: 82 refs 1337 + * - 100MB ~ 1m packed refs: 99 refs 1338 + * 1339 + * We thus allow roughly 16 additional loose refs per factor of ten of 1340 + * packed refs. This heuristic may be tweaked in the future, but should 1341 + * serve as a sufficiently good first iteration. 1342 + */ 1343 + limit = log2u(packed_size / 100) * 5; 1344 + if (limit < 16) 1345 + limit = 16; 1346 + 1347 + iter = cache_ref_iterator_begin(get_loose_ref_cache(refs, 0), NULL, 1348 + refs->base.repo, 0); 1349 + while ((ret = ref_iterator_advance(iter)) == ITER_OK) { 1350 + if (should_pack_ref(refs, iter->refname, iter->oid, 1351 + iter->flags, opts)) 1352 + refcount++; 1353 + if (refcount >= limit) { 1354 + ref_iterator_abort(iter); 1355 + return 1; 1356 + } 1357 + } 1358 + 1359 + if (ret != ITER_DONE) 1360 + die("error while iterating over references"); 1361 + 1362 + return 0; 1363 + } 1364 + 1303 1365 static int files_pack_refs(struct ref_store *ref_store, 1304 1366 struct pack_refs_opts *opts) 1305 1367 { ··· 1311 1373 struct ref_to_prune *refs_to_prune = NULL; 1312 1374 struct strbuf err = STRBUF_INIT; 1313 1375 struct ref_transaction *transaction; 1376 + 1377 + if (!should_pack_refs(refs, opts)) 1378 + return 0; 1314 1379 1315 1380 transaction = ref_store_transaction_begin(refs->packed_ref_store, &err); 1316 1381 if (!transaction)
+18
refs/packed-backend.c
··· 1250 1250 return is_lock_file_locked(&refs->lock); 1251 1251 } 1252 1252 1253 + int packed_refs_size(struct ref_store *ref_store, 1254 + size_t *out) 1255 + { 1256 + struct packed_ref_store *refs = packed_downcast(ref_store, REF_STORE_READ, 1257 + "packed_refs_size"); 1258 + struct stat st; 1259 + 1260 + if (stat(refs->path, &st) < 0) { 1261 + if (errno != ENOENT) 1262 + return -1; 1263 + *out = 0; 1264 + return 0; 1265 + } 1266 + 1267 + *out = st.st_size; 1268 + return 0; 1269 + } 1270 + 1253 1271 /* 1254 1272 * The packed-refs header line that we write out. Perhaps other traits 1255 1273 * will be added later.
+7
refs/packed-backend.h
··· 28 28 int packed_refs_is_locked(struct ref_store *ref_store); 29 29 30 30 /* 31 + * Obtain the size of the `packed-refs` file. Reports `0` as size in case there 32 + * is no packed-refs file. Returns 0 on success, negative otherwise. 33 + */ 34 + int packed_refs_size(struct ref_store *ref_store, 35 + size_t *out); 36 + 37 + /* 31 38 * Return true if `transaction` really needs to be carried out against 32 39 * the specified packed_ref_store, or false if it can be skipped 33 40 * (i.e., because it is an obvious NOOP). `ref_store` must be locked
+74 -9
t/t0601-reffiles-pack-refs.sh
··· 362 362 363 363 for command in "git pack-refs --all --auto" "git maintenance run --task=pack-refs --auto" 364 364 do 365 - test_expect_success "$command unconditionally packs loose refs" ' 365 + test_expect_success "$command does not repack below 16 refs without packed-refs" ' 366 + test_when_finished "rm -rf repo" && 367 + git init repo && 368 + ( 369 + cd repo && 370 + git config set maintenance.auto false && 371 + git commit --allow-empty --message "initial" && 372 + 373 + # Create 14 additional references, which brings us to 374 + # 15 together with the default branch. 375 + printf "create refs/heads/loose-%d HEAD\n" $(test_seq 14) >stdin && 376 + git update-ref --stdin <stdin && 377 + test_path_is_missing .git/packed-refs && 378 + git pack-refs --auto --all && 379 + test_path_is_missing .git/packed-refs && 380 + 381 + # Create the 16th reference, which should cause us to repack. 382 + git update-ref refs/heads/loose-15 HEAD && 383 + git pack-refs --auto --all && 384 + test_path_is_file .git/packed-refs 385 + ) 386 + ' 387 + 388 + test_expect_success "$command does not repack below 16 refs with small packed-refs" ' 389 + test_when_finished "rm -rf repo" && 390 + git init repo && 391 + ( 392 + cd repo && 393 + git config set maintenance.auto false && 394 + git commit --allow-empty --message "initial" && 395 + 396 + git pack-refs --all && 397 + test_line_count = 2 .git/packed-refs && 398 + 399 + # Create 15 loose references. 400 + printf "create refs/heads/loose-%d HEAD\n" $(test_seq 15) >stdin && 401 + git update-ref --stdin <stdin && 402 + git pack-refs --auto --all && 403 + test_line_count = 2 .git/packed-refs && 404 + 405 + # Create the 16th loose reference, which should cause us to repack. 406 + git update-ref refs/heads/loose-17 HEAD && 407 + git pack-refs --auto --all && 408 + test_line_count = 18 .git/packed-refs 409 + ) 410 + ' 411 + 412 + test_expect_success "$command scales with size of packed-refs" ' 366 413 test_when_finished "rm -rf repo" && 367 414 git init repo && 368 415 ( 369 416 cd repo && 370 417 git config set maintenance.auto false && 371 - test_commit initial && 418 + git commit --allow-empty --message "initial" && 419 + 420 + # Create 99 packed refs. This should cause the heuristic 421 + # to require more than the minimum amount of loose refs. 422 + test_seq 99 | 423 + while read i 424 + do 425 + printf "create refs/heads/packed-%d HEAD\n" $i || return 1 426 + done >stdin && 427 + git update-ref --stdin <stdin && 428 + git pack-refs --all && 429 + test_line_count = 101 .git/packed-refs && 430 + 431 + # Create 24 loose refs, which should not yet cause us to repack. 432 + printf "create refs/heads/loose-%d HEAD\n" $(test_seq 24) >stdin && 433 + git update-ref --stdin <stdin && 434 + git pack-refs --auto --all && 435 + test_line_count = 101 .git/packed-refs && 372 436 373 - git update-ref refs/heads/something HEAD && 374 - test_path_is_file .git/refs/heads/something && 375 - git rev-parse refs/heads/something >expect && 376 - $command && 377 - test_path_is_missing .git/refs/heads/something && 378 - git rev-parse refs/heads/something >actual && 379 - test_cmp expect actual 437 + # Create another handful of refs to cross the border. 438 + # Note that we explicitly do not check for strict 439 + # boundaries here, as this also depends on the size of 440 + # the object hash. 441 + printf "create refs/heads/addn-%d HEAD\n" $(test_seq 10) >stdin && 442 + git update-ref --stdin <stdin && 443 + git pack-refs --auto --all && 444 + test_line_count = 135 .git/packed-refs 380 445 ) 381 446 ' 382 447 done