Git fork

Merge branch 'jk/fewer-pack-rescan'

Internally we use 0{40} as a placeholder object name to signal to a
codepath that there is no such object (e.g. the fast-forward check
that runs while "git fetch" stores a new remote-tracking ref says "we
know there is no 'old' thing pointed at by the ref, as we are creating
it anew" by passing 0{40} for the 'old' side), and we expect a codepath
that locates an in-core object to return NULL as a sign that the object
does not exist. Looking up an object that does not exist, however, is
quite costly in a repository with a large number of packfiles. This
access pattern has been optimized.

* jk/fewer-pack-rescan:
  sha1_file: fast-path null sha1 as a missing object
  everything_local: use "quick" object existence check
  p5551: add a script to test fetch pack-dir rescans
  t/perf/lib-pack: use fast-import checkpoint to create packs
  p5550: factor out nonsense-pack creation

+87 -24
+2 -1
fetch-pack.c
@@ -716,7 +716,8 @@
 	for (ref = *refs; ref; ref = ref->next) {
 		struct object *o;
 
-		if (!has_object_file(&ref->old_oid))
+		if (!has_object_file_with_flags(&ref->old_oid,
+						OBJECT_INFO_QUICK))
 			continue;
 
 		o = parse_object(&ref->old_oid);
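The hunk above swaps a full existence check for a "quick" one. The following is a minimal sketch of that idea, not Git's implementation: `toy_store`, `toy_find`, `toy_rescan`, `toy_has_object` and `TOY_INFO_QUICK` are hypothetical names made up for illustration, and the "rescan" is only a counter standing in for re-reading the pack directory after a failed lookup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a "quick" lookup flag; not Git's definition. */
#define TOY_INFO_QUICK 1u

struct toy_store {
	const char **ids;	/* object names currently known in core */
	size_t nr;		/* number of entries in ids */
	unsigned rescans;	/* how often the pack directory was "re-read" */
};

/* Search only the in-core list; this is the cheap part. */
static int toy_find(const struct toy_store *s, const char *id)
{
	size_t i;

	for (i = 0; i < s->nr; i++)
		if (!strcmp(s->ids[i], id))
			return 1;
	return 0;
}

/* Pretend to re-read the pack directory; costly with many packs. */
static void toy_rescan(struct toy_store *s)
{
	s->rescans++;
}

/* Return 1 if the object is found, 0 otherwise. */
static int toy_has_object(struct toy_store *s, const char *id, unsigned flags)
{
	if (toy_find(s, id))
		return 1;
	if (flags & TOY_INFO_QUICK)
		return 0;	/* accept a possibly stale "no" without rescanning */
	toy_rescan(s);		/* a pack may have appeared since the last scan */
	return toy_find(s, id);
}
```

In this model a miss without the flag costs one rescan, while a miss with the flag costs none. That is the trade the caller makes when it can tolerate an occasional false "missing" answer.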
+3
sha1_file.c
@@ -1164,6 +1164,9 @@
 				    lookup_replace_object(sha1) :
 				    sha1;
 
+	if (is_null_sha1(real))
+		return -1;
+
 	if (!oi)
 		oi = &blank_oi;
 
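The early return above can be illustrated in isolation. This is a sketch under stated assumptions rather than the code in sha1_file.c: `toy_is_null`, `toy_search_storage`, `toy_object_info` and `TOY_RAWSZ` are hypothetical names, and the "storage" is an always-empty stub that only counts how often it is consulted.

```c
#include <string.h>

/* A SHA-1 is 20 raw bytes, which prints as 40 hex digits. */
#define TOY_RAWSZ 20

/* Counts trips to the pretend object store. */
static unsigned toy_storage_lookups;

/* Return 1 when every byte of the object name is zero. */
static int toy_is_null(const unsigned char *sha1)
{
	static const unsigned char zero[TOY_RAWSZ];

	return !memcmp(sha1, zero, TOY_RAWSZ);
}

/* Pretend storage that holds nothing; each call stands for a costly search. */
static int toy_search_storage(const unsigned char *sha1)
{
	(void)sha1;
	toy_storage_lookups++;
	return -1;
}

/* Return 0 if the object is found, -1 if it is missing. */
static int toy_object_info(const unsigned char *sha1)
{
	if (toy_is_null(sha1))
		return -1;	/* placeholder name: report "missing" without searching */
	return toy_search_storage(sha1);
}
```

Both the all-zero name and an unknown non-zero name come back as missing here, but only the latter reaches the storage layer.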
+25
t/perf/lib-pack.sh
@@ -0,0 +1,25 @@
+# Helpers for dealing with large numbers of packs.
+
+# create $1 nonsense packs, each with a single blob
+create_packs () {
+	perl -le '
+		my ($n) = @ARGV;
+		for (1..$n) {
+			print "blob";
+			print "data <<EOF";
+			print "$_";
+			print "EOF";
+			print "checkpoint"
+		}
+	' "$@" |
+	git fast-import
+}
+
+# create a large number of packs, disabling any gc which might
+# cause us to repack them
+setup_many_packs () {
+	git config gc.auto 0 &&
+	git config gc.autopacklimit 0 &&
+	git config fastimport.unpacklimit 0 &&
+	create_packs 500
+}
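For readers who want to see the exact byte stream that `create_packs` feeds to `git fast-import`, here is a small sketch that builds the same commands in a buffer. The function name `toy_blob_stream` is hypothetical. The output is meant to match what the perl one-liner prints with `-l`, where each `print` ends in a newline. The trailing `checkpoint` after every blob is what the commit title refers to.

```c
#include <stdio.h>

/*
 * Write fast-import commands for blobs 1..n into buf.
 * Returns the number of bytes written, or -1 if buf is too small.
 */
static int toy_blob_stream(int n, char *buf, size_t len)
{
	size_t used = 0;
	int i;

	if (!len)
		return -1;
	buf[0] = '\0';
	for (i = 1; i <= n; i++) {
		/* one blob whose content is its own index, then a checkpoint */
		int w = snprintf(buf + used, len - used,
				 "blob\ndata <<EOF\n%d\nEOF\ncheckpoint\n", i);

		if (w < 0 || (size_t)w >= len - used)
			return -1;
		used += (size_t)w;
	}
	return (int)used;
}
```

Because each blob's content is just its index, every blob is distinct, so each checkpoint has something new to write.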
+2 -23
t/perf/p5550-fetch-tags.sh
@@ -20,6 +20,7 @@
 taking too long to set up and run the tests.
 '
 . ./perf-lib.sh
+. "$TEST_DIRECTORY/perf/lib-pack.sh"
 
 # make a long nonsense history on branch $1, consisting of $2 commits, each
 # with a unique file pointing to the blob at $2.
@@ -44,26 +45,6 @@
 	git update-ref --stdin
 }
 
-# create $1 nonsense packs, each with a single blob
-create_packs () {
-	perl -le '
-		my ($n) = @ARGV;
-		for (1..$n) {
-			print "blob";
-			print "data <<EOF";
-			print "$_";
-			print "EOF";
-		}
-	' "$@" |
-	git fast-import &&
-
-	git cat-file --batch-all-objects --batch-check='%(objectname)' |
-	while read sha1
-	do
-		echo $sha1 | git pack-objects .git/objects/pack/pack
-	done
-}
-
 test_expect_success 'create parent and child' '
 	git init parent &&
 	git -C parent commit --allow-empty -m base &&
@@ -84,9 +65,7 @@
 test_expect_success 'create child packs' '
 	(
 		cd child &&
-		git config gc.auto 0 &&
-		git config gc.autopacklimit 0 &&
-		create_packs 500
+		setup_many_packs
 	)
 '
 
+55
t/perf/p5551-fetch-rescan.sh
@@ -0,0 +1,55 @@
+#!/bin/sh
+
+test_description='fetch performance with many packs
+
+It is common for fetch to consider objects that we might not have, and it is an
+easy mistake for the code to use a function like `parse_object` that might
+give the correct _answer_ on such an object, but do so slowly (due to
+re-scanning the pack directory for lookup failures).
+
+The resulting performance drop can be hard to notice in a real repository, but
+becomes quite large in a repository with a large number of packs. So this
+test creates a more pathological case, since any mistakes would produce a more
+noticeable slowdown.
+'
+. ./perf-lib.sh
+. "$TEST_DIRECTORY"/perf/lib-pack.sh
+
+test_expect_success 'create parent and child' '
+	git init parent &&
+	git clone parent child
+'
+
+
+test_expect_success 'create refs in the parent' '
+	(
+		cd parent &&
+		git commit --allow-empty -m foo &&
+		head=$(git rev-parse HEAD) &&
+		test_seq 1000 |
+		sed "s,.*,update refs/heads/& $head," |
+		$MODERN_GIT update-ref --stdin
+	)
+'
+
+test_expect_success 'create many packs in the child' '
+	(
+		cd child &&
+		setup_many_packs
+	)
+'
+
+test_perf 'fetch' '
+	# start at the same state for each iteration
+	obj=$($MODERN_GIT -C parent rev-parse HEAD) &&
+	(
+		cd child &&
+		$MODERN_GIT for-each-ref --format="delete %(refname)" refs/remotes |
+		$MODERN_GIT update-ref --stdin &&
+		rm -vf .git/objects/$(echo $obj | sed "s|^..|&/|") &&
+
+		git fetch
+	)
+'
+
+test_done
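The `sed "s|^..|&/|"` expression in the `fetch` test inserts a slash after the first two hex digits of the object name, giving the path that `rm` deletes under `.git/objects/`. The sketch below shows the same mapping. `toy_loose_path` is a hypothetical helper written for this illustration, not a function from Git.

```c
#include <stdio.h>
#include <string.h>

/*
 * Write ".git/objects/xx/yyyy..." for the hex object name into buf.
 * Returns 0 on success, -1 if the name is too short or buf is too small.
 */
static int toy_loose_path(const char *hex, char *buf, size_t len)
{
	int n;

	if (strlen(hex) < 3)
		return -1;
	/* first two hex digits name the directory, the rest names the file */
	n = snprintf(buf, len, ".git/objects/%.2s/%s", hex, hex + 2);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Deleting that file before each timed run puts the child back into a state where the parent's HEAD object is missing, so every iteration of `git fetch` has real work to do.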