
prune: use bitmaps for reachability traversal

Pruning generally has to traverse the whole commit graph in order to
see which objects are reachable. This is the exact problem that
reachability bitmaps were meant to solve, so let's use them (if they're
available, of course).
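To make the idea concrete, here is a toy sketch of why bitmaps avoid the graph walk. It is not Git's implementation: it assumes at most 64 objects, each identified by a bit position, and assumes every tip already has a precomputed bitmap of everything reachable from it. The names toy_bitmap, toy_reachable_from_tips() and toy_is_reachable() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model (assumption): at most 64 objects, one bit per object. */
typedef uint64_t toy_bitmap;

/*
 * OR together the precomputed reachability bitmaps of all tips.
 * tip_bitmaps[i] is assumed to already contain every object reachable
 * from tip i, so no walk over parents or trees is needed here.
 */
static toy_bitmap toy_reachable_from_tips(const toy_bitmap *tip_bitmaps,
					  size_t nr_tips)
{
	toy_bitmap result = 0;
	size_t i;

	for (i = 0; i < nr_tips; i++)
		result |= tip_bitmaps[i];
	return result;
}

/* Return 1 if the object at bit position pos is set in the result. */
static int toy_is_reachable(toy_bitmap reachable, unsigned pos)
{
	assert(pos < 64);
	return (int)((reachable >> pos) & 1);
}
```

The cost in this model is one OR per tip rather than one visit per object. In a real repository some tips will lack bitmaps and need a partial traversal to fill in the gaps, which is why the numbers below depend on bitmap coverage.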

Here are timings on git.git:

Test                         HEAD^              HEAD
------------------------------------------------------------------------
5304.6: prune with bitmaps   3.65(3.56+0.09)    1.01(0.92+0.08) -72.3%

And on linux.git:

Test                         HEAD^                HEAD
--------------------------------------------------------------------------
5304.6: prune with bitmaps   35.05(34.79+0.23)    3.00(2.78+0.21) -91.4%

The tests show a near-optimal case: we'll have just repacked, so our
bitmaps should cover nearly all refs. But that's also realistic:
normally prune is run via "gc" right after repacking.

A few notes on the implementation:

- The change is actually in reachable.c, so it would improve
reachability traversals by "reflog expire --stale-fix", as well.
Those aren't performed regularly, though (a normal "git gc" doesn't
use --stale-fix), so they're not really worth measuring. There's a
low chance of regressing that caller, since the use of bitmaps is
totally transparent from the caller's perspective.

- The bitmap case could actually get away without creating a "struct
object", and instead the caller could just look up each object id in
the bitmap result. However, this would be a marginal improvement in
runtime, and it would make the callers much more complicated. They'd
have to handle both the bitmap and non-bitmap cases separately, and
in the case of git-prune, we'd also have to tweak prune_shallow(),
which relies on our SEEN flags.

- Because we do create real object structs, we go through a few
contortions to create ones of the right type. This isn't strictly
necessary (lookup_unknown_object() would suffice), but it's more
memory efficient to use the correct types, since we already know
them.
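The memory argument in the last note can be illustrated with a toy layout. This is only a sketch, assuming an "unknown" object has to be allocated large enough to hold any type while a typed object takes only the room its own struct needs. All struct names and fields below are hypothetical and do not match Git's real definitions.

```c
#include <stddef.h>

/* Hypothetical common header shared by every object type. */
struct toy_object {
	unsigned flags;
	unsigned char hash[32];
};

/* A blob carries nothing beyond the common header. */
struct toy_blob {
	struct toy_object object;
};

/* A commit carries extra per-commit fields. */
struct toy_commit {
	struct toy_object object;
	void *parents;
	void *tree;
	unsigned long date;
};

/* An object of unknown type must reserve room for the largest variant. */
union toy_any_object {
	struct toy_object object;
	struct toy_blob blob;
	struct toy_commit commit;
};

/* Bytes needed when the type is known up front. */
static size_t toy_typed_size(int is_commit)
{
	return is_commit ? sizeof(struct toy_commit) : sizeof(struct toy_blob);
}

/* Bytes needed when the type is not known at allocation time. */
static size_t toy_unknown_size(void)
{
	return sizeof(union toy_any_object);
}
```

In this model every blob allocated as "unknown" wastes the difference between toy_unknown_size() and toy_typed_size(0). Since the bitmap traversal reports each object's type along with its id, that waste is avoidable.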

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

Authored by Jeff King and committed by Junio C Hamano
fde67d68 d55a30bb

+53
+42
reachable.c
···
 #include "packfile.h"
 #include "worktree.h"
 #include "object-store.h"
+#include "pack-bitmap.h"

 struct connectivity_progress {
 	struct progress *progress;
···
 				      FOR_EACH_OBJECT_LOCAL_ONLY);
 }

+static void *lookup_object_by_type(struct repository *r,
+				   const struct object_id *oid,
+				   enum object_type type)
+{
+	switch (type) {
+	case OBJ_COMMIT:
+		return lookup_commit(r, oid);
+	case OBJ_TREE:
+		return lookup_tree(r, oid);
+	case OBJ_TAG:
+		return lookup_tag(r, oid);
+	case OBJ_BLOB:
+		return lookup_blob(r, oid);
+	default:
+		die("BUG: unknown object type %d", type);
+	}
+}
+
+static int mark_object_seen(const struct object_id *oid,
+			    enum object_type type,
+			    int exclude,
+			    uint32_t name_hash,
+			    struct packed_git *found_pack,
+			    off_t found_offset)
+{
+	struct object *obj = lookup_object_by_type(the_repository, oid, type);
+	if (!obj)
+		die("unable to create object '%s'", oid_to_hex(oid));
+
+	obj->flags |= SEEN;
+	return 0;
+}
+
 void mark_reachable_objects(struct rev_info *revs, int mark_reflog,
 			    timestamp_t mark_recent, struct progress *progress)
 {
 	struct connectivity_progress cp;
+	struct bitmap_index *bitmap_git;

 	/*
 	 * Set up revision parsing, and mark us as being interested
···

 	cp.progress = progress;
 	cp.count = 0;
+
+	bitmap_git = prepare_bitmap_walk(revs);
+	if (bitmap_git) {
+		traverse_bitmap_commit_list(bitmap_git, mark_object_seen);
+		free_bitmap_index(bitmap_git);
+		return;
+	}

 	/*
 	 * Set up the revision walk - this will move all commits
+11
t/perf/p5304-prune.sh
···
 	git prune
 '

+test_expect_success 'repack with bitmaps' '
+	git repack -adb
+'
+
+# We have to create the object in each trial run, since otherwise
+# runs after the first see no object and just skip the traversal entirely!
+test_perf 'prune with bitmaps' '
+	echo "probably not present in repo" | git hash-object -w --stdin &&
+	git prune
+'
+
 test_done