repack: add --path-walk option

Since 'git pack-objects' supports a --path-walk option, allow passing it
through in 'git repack'. This presents interesting testing opportunities for
comparing the different repacking strategies against each other.

Add the --path-walk option to the performance tests in p5313.

For the microsoft/fluentui repo [1] checked out at a specific commit [2],
the --path-walk tests in p5313 look like this:

Test                                         this tree
-------------------------------------------------------------------------
5313.18: thin pack with --path-walk          0.08(0.06+0.02)
5313.19: thin pack size with --path-walk     18.4K
5313.20: big pack with --path-walk           2.10(7.80+0.26)
5313.21: big pack size with --path-walk      19.8M
5313.22: shallow fetch pack with --path-walk 1.62(3.38+0.17)
5313.23: shallow pack size with --path-walk  33.6M
5313.24: repack with --path-walk             81.29(96.08+0.71)
5313.25: repack size with --path-walk        142.5M

[1] https://github.com/microsoft/fluentui
[2] e70848ebac1cd720875bccaa3026f4a9ed700e08

Combining these results with the earlier --name-hash-version tests in
p5313, I've summarized the repack comparison as follows:

Repack Method  Pack Size      Time
---------------------------------------
Hash v1           439.4M    87.24s
Hash v2           161.7M    21.51s
Path Walk         142.5M    81.29s

There are a few things to notice here:

1. The benefits of --name-hash-version=2 over --name-hash-version=1 are
significant, but --path-walk still compresses better than that
option.

2. The --path-walk option still uses --name-hash-version=1 for the
second pass of delta computation, relying on that version's more
frequent name-hash collisions to find opportunistic deltas on top of
the path-focused compression.

3. The --path-walk algorithm is currently sequential and does not use
multiple threads for delta compression. Threading is planned for a
future change, which should make its computation time more
competitive.

There are small benefits in size for my copy of the Git repository:

Repack Method  Pack Size      Time
---------------------------------------
Hash v1           248.8M    30.44s
Hash v2           249.0M    30.15s
Path Walk         213.2M   142.50s

The same holds for the nodejs/node repository [3]:

Repack Method  Pack Size      Time
---------------------------------------
Hash v1           739.9M    71.18s
Hash v2           764.6M    67.82s
Path Walk         698.1M   208.10s

[3] https://github.com/nodejs/node

The benefit also holds for my copy of the Linux kernel repository:

Repack Method  Pack Size      Time
---------------------------------------
Hash v1             2.5G   554.41s
Hash v2             2.5G   549.62s
Path Walk           2.2G  1562.36s

Notably, even when the repository shape does not produce many name-hash
collisions, this method still yields a modest reduction in pack size.

Since this repacking strategy shipped in Git for Windows 2.47.0, some
users have reported cases where --path-walk compresses slightly worse
than --name-hash-version=2. In those cases it may help to combine the
two options. However, no released version of Git yet includes both
options, and I don't have access to those repositories for testing.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Authored by Derrick Stolee; committed by Junio C Hamano.
5f711504 6e95bf80

3 files changed: +18 -12
Documentation/git-repack.adoc (+4 -1)

@@ -11,7 +11,7 @@
 [verse]
 'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]
 	[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]
-	[--write-midx] [--name-hash-version=<n>]
+	[--write-midx] [--name-hash-version=<n>] [--path-walk]
 
 DESCRIPTION
 -----------
@@ -255,6 +255,9 @@
 	Provide this argument to the underlying `git pack-objects` process.
 	See linkgit:git-pack-objects[1] for full details.
 
+--path-walk::
+	Pass the `--path-walk` option to the underlying `git pack-objects`
+	process. See linkgit:git-pack-objects[1] for full details.
 
 CONFIGURATION
 -------------
builtin/repack.c (+6 -1)

@@ -43,7 +43,7 @@
 static const char *const git_repack_usage[] = {
 	N_("git repack [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [-m]\n"
 	   "[--window=<n>] [--depth=<n>] [--threads=<n>] [--keep-pack=<pack-name>]\n"
-	   "[--write-midx] [--name-hash-version=<n>]"),
+	   "[--write-midx] [--name-hash-version=<n>] [--path-walk]"),
 	NULL
 };
 
@@ -63,6 +63,7 @@
 	int quiet;
 	int local;
 	int name_hash_version;
+	int path_walk;
 	struct list_objects_filter_options filter_options;
 };
 
@@ -313,6 +314,8 @@
 		strvec_pushf(&cmd->args, "--no-reuse-object");
 	if (args->name_hash_version)
 		strvec_pushf(&cmd->args, "--name-hash-version=%d", args->name_hash_version);
+	if (args->path_walk)
+		strvec_pushf(&cmd->args, "--path-walk");
 	if (args->local)
 		strvec_push(&cmd->args, "--local");
 	if (args->quiet)
@@ -1212,6 +1215,8 @@
 				N_("pass --no-reuse-object to git-pack-objects")),
 		OPT_INTEGER(0, "name-hash-version", &po_args.name_hash_version,
 				N_("specify the name hash version to use for grouping similar objects by path")),
+		OPT_BOOL(0, "path-walk", &po_args.path_walk,
+				N_("pass --path-walk to git-pack-objects")),
 		OPT_NEGBIT('n', NULL, &run_update_server_info,
 				N_("do not run git-update-server-info"), 1),
 		OPT__QUIET(&po_args.quiet, N_("be quiet")),
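t/perf/p5313-pack-objects.sh (+8 -10)

@@ -55,23 +55,21 @@
 	test_size "shallow pack size with $parameter" '
 		test_file_size out
 	'
-}
 
-for version in 1 2
-do
-	export version
-
-	test_all_with_args --name-hash-version=$version
-
-	test_perf "repack with --name-hash-version=$version" '
-		git repack -adf --name-hash-version=$version
+	test_perf "repack with $parameter" '
+		git repack -adf $parameter
 	'
 
-	test_size "repack size with --name-hash-version=$version" '
+	test_size "repack size with $parameter" '
 		gitdir=$(git rev-parse --git-dir) &&
 		pack=$(ls $gitdir/objects/pack/pack-*.pack) &&
 		test_file_size "$pack"
 	'
+}
+
+for version in 1 2
+do
+	test_all_with_args --name-hash-version=$version
 done
 
 test_all_with_args --path-walk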
··· 55 55 test_size "shallow pack size with $parameter" ' 56 56 test_file_size out 57 57 ' 58 - } 59 58 60 - for version in 1 2 61 - do 62 - export version 63 - 64 - test_all_with_args --name-hash-version=$version 65 - 66 - test_perf "repack with --name-hash-version=$version" ' 67 - git repack -adf --name-hash-version=$version 59 + test_perf "repack with $parameter" ' 60 + git repack -adf $parameter 68 61 ' 69 62 70 - test_size "repack size with --name-hash-version=$version" ' 63 + test_size "repack size with $parameter" ' 71 64 gitdir=$(git rev-parse --git-dir) && 72 65 pack=$(ls $gitdir/objects/pack/pack-*.pack) && 73 66 test_file_size "$pack" 74 67 ' 68 + } 69 + 70 + for version in 1 2 71 + do 72 + test_all_with_args --name-hash-version=$version 75 73 done 76 74 77 75 test_all_with_args --path-walk