Git fork

Merge branch 'bc/sha1-256-interop-01'

The beginning of SHA1-SHA256 interoperability work.

* bc/sha1-256-interop-01:
t1010: use BROKEN_OBJECTS prerequisite
t: allow specifying compatibility hash
fsck: consider gpgsig headers expected in tags
rev-parse: allow printing compatibility hash
docs: add documentation for loose objects
docs: improve ambiguous areas of pack format documentation
docs: reflect actual double signature for tags
docs: update offset order for pack index v3
docs: update pack index v3 format

+255 -32
+1
Documentation/Makefile
@@ -34,6 +34,7 @@
 MAN5_TXT += gitformat-chunk.adoc
 MAN5_TXT += gitformat-commit-graph.adoc
 MAN5_TXT += gitformat-index.adoc
+MAN5_TXT += gitformat-loose.adoc
 MAN5_TXT += gitformat-pack.adoc
 MAN5_TXT += gitformat-signature.adoc
 MAN5_TXT += githooks.adoc
+6
Documentation/fsck-msgids.adoc
@@ -10,6 +10,12 @@
 `badFilemode`::
 	(INFO) A tree contains a bad filemode entry.
 
+`badGpgsig`::
+	(ERROR) A tag contains a bad (truncated) signature (e.g., `gpgsig`) header.
+
+`badHeaderContinuation`::
+	(ERROR) A continuation header (such as for `gpgsig`) is unexpectedly truncated.
+
 `badName`::
 	(ERROR) An author/committer name is empty.
 
+6 -5
Documentation/git-rev-parse.adoc
@@ -324,11 +324,12 @@
 	path of the current directory relative to the top-level
 	directory.
 
---show-object-format[=(storage|input|output)]::
-	Show the object format (hash algorithm) used for the repository
-	for storage inside the `.git` directory, input, or output. For
-	input, multiple algorithms may be printed, space-separated.
-	If not specified, the default is "storage".
+--show-object-format[=(storage|input|output|compat)]::
+	Show the object format (hash algorithm) used for the repository for storage
+	inside the `.git` directory, input, output, or compatibility. For input,
+	multiple algorithms may be printed, space-separated. If `compat` is
+	requested and no compatibility algorithm is enabled, prints an empty line. If
+	not specified, the default is "storage".
 
 --show-ref-format::
 	Show the reference storage format used for the repository.
+53
Documentation/gitformat-loose.adoc
@@ -0,0 +1,53 @@
+gitformat-loose(5)
+==================
+
+NAME
+----
+gitformat-loose - Git loose object format
+
+
+SYNOPSIS
+--------
+[verse]
+$GIT_DIR/objects/[0-9a-f][0-9a-f]/*
+
+DESCRIPTION
+-----------
+
+Loose objects are how Git stores individual objects, where every object is
+written as a separate file.
+
+Over the lifetime of a repository, objects are usually written as loose objects
+initially. Eventually, these loose objects will be compacted into packfiles
+via repository maintenance to improve disk space usage and speed up the lookup
+of these objects.
+
+== Loose objects
+
+Each loose object contains a prefix, followed immediately by the data of the
+object. The prefix contains `<type> <size>\0`. `<type>` is one of `blob`,
+`tree`, `commit`, or `tag` and `size` is the size of the data (without the
+prefix) as a decimal integer expressed in ASCII.
+
+The entire contents, prefix and data concatenated, is then compressed with zlib
+and the compressed data is stored in the file. The object ID of the object is
+the SHA-1 or SHA-256 (as appropriate) hash of the uncompressed data.
+
+The file for the loose object is stored under the `objects` directory, with the
+first two hex characters of the object ID being the directory and the remaining
+characters being the file name. This is done to shard the data and avoid too
+many files being in one directory, since some file systems perform poorly with
+many items in a directory.
+
+As an example, the empty tree contains the data (when uncompressed) `tree 0\0`
+and, in a SHA-256 repository, would have the object ID
+`6ef19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321` and would be
+stored under
+`$GIT_DIR/objects/6e/f19b41225c5369f1c104d45d8d85efa9b057b53b14b4b9b939dd74decc5321`.
+
+Similarly, a blob containing the contents `abc` would have the uncompressed
+data of `blob 3\0abc`.
+
+GIT
+---
+Part of the linkgit:git[1] suite
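Note on the loose object format described above: the following is a minimal Python sketch of the prefix, hashing, and sharding steps, not Git's implementation. The function name `loose_object` and its return shape are illustrative assumptions; only the `<type> <size>\0` prefix, zlib compression, and the two-character directory split come from the documentation in this diff.

```python
import hashlib
import zlib


def loose_object(obj_type: str, data: bytes, algo: str = "sha1"):
    """Return (object ID, path below $GIT_DIR, compressed file contents).

    Hypothetical helper for illustration only.
    """
    # The prefix is "<type> <size>\0", with the size as ASCII decimal.
    prefix = f"{obj_type} {len(data)}".encode("ascii") + b"\0"
    raw = prefix + data
    # The object ID is the hash of the uncompressed prefix plus data.
    oid = hashlib.new(algo, raw).hexdigest()
    # The first two hex characters name the directory, the rest the file.
    path = f"objects/{oid[:2]}/{oid[2:]}"
    # The file on disk holds the zlib-compressed prefix plus data.
    return oid, path, zlib.compress(raw)


oid, path, compressed = loose_object("tree", b"", "sha256")
print(oid)
print(path)
```

For the empty tree in a SHA-256 repository this prints the object ID and path quoted in the new man page.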
+19
Documentation/gitformat-pack.adoc
@@ -32,6 +32,10 @@
 and object IDs (object names) mentioned below are all computed using SHA-1.
 Similarly, in SHA-256 repositories, these values are computed using SHA-256.
 
+CRC32 checksums are always computed over the entire packed object, including
+the header (n-byte type and length); the base object name or offset, if any;
+and the entire compressed object. The CRC32 algorithm used is that of zlib.
+
 == pack-*.pack files have the following format:
 
 - A header appears at the beginning and consists of the following:
@@ -80,6 +84,16 @@
 
 Type 5 is reserved for future expansion. Type 0 is invalid.
 
+=== Object encoding
+
+Unlike loose objects, packed objects do not have a prefix containing the type,
+size, and a NUL byte. These are not necessary because they can be determined by
+the n-byte type and length that prefixes the data and so they are omitted from
+the compressed and deltified data.
+
+The computation of the object ID still uses this prefix by reconstructing it
+from the type and length as needed.
+
 === Size encoding
 
 This document uses the following "size encoding" of non-negative
@@ -91,6 +105,11 @@
 
 This size encoding should not be confused with the "offset encoding",
 which is also used in this document.
+
+When encoding the size of an undeltified object in a pack, the size is that of
+the uncompressed raw object. For deltified objects, it is the size of the
+uncompressed delta. The base object name or offset is not included in the size
+computation.
 
 === Deltified representation
 
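Note on the packed object encoding and CRC32 clarifications above: a rough Python sketch for an undeltified object, under stated assumptions. The bit layout of the n-byte type and length (3-bit type, 4 size bits in the first byte, then 7 bits per byte, least significant first) follows the existing pack format description rather than anything this diff changes. The names `encode_type_and_size` and `packed_entry` are hypothetical.

```python
import zlib

OBJ_BLOB = 3  # pack type number for a blob


def encode_type_and_size(obj_type: int, size: int) -> bytes:
    """Encode the n-byte type and length that prefixes a packed object."""
    # First byte: 3-bit type and the low 4 bits of the size.
    byte = (obj_type << 4) | (size & 0x0F)
    size >>= 4
    out = bytearray()
    while size:
        out.append(byte | 0x80)  # MSB set: more size bytes follow
        byte = size & 0x7F
        size >>= 7
    out.append(byte)
    return bytes(out)


def packed_entry(obj_type: int, data: bytes):
    """Return (entry bytes, CRC32) for an undeltified object."""
    # The size is that of the uncompressed raw object; the loose-object
    # "<type> <size>\0" prefix is not stored in the pack.
    entry = encode_type_and_size(obj_type, len(data)) + zlib.compress(data)
    # The CRC32 covers the header and the entire compressed object.
    return entry, zlib.crc32(entry)


entry, crc = packed_entry(OBJ_BLOB, b"abc")
print(entry[:1].hex(), hex(crc))
```

A deltified entry would additionally place the base object name or offset between the header and the compressed delta, and the CRC32 would cover that too.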
+1
Documentation/meson.build
@@ -173,6 +173,7 @@
   'gitformat-chunk.adoc' : 5,
   'gitformat-commit-graph.adoc' : 5,
   'gitformat-index.adoc' : 5,
+  'gitformat-loose.adoc' : 5,
   'gitformat-pack.adoc' : 5,
   'gitformat-signature.adoc' : 5,
   'githooks.adoc' : 5,
+23 -19
Documentation/technical/hash-function-transition.adoc
@@ -227,9 +227,9 @@
     ** 4-byte length in bytes of shortened object names. This is the
        shortest possible length needed to make names in the shortened
        object name table unambiguous.
-    ** 4-byte integer, recording where tables relating to this format
+    ** 8-byte integer, recording where tables relating to this format
        are stored in this index file, as an offset from the beginning.
-  * 4-byte offset to the trailer from the beginning of this file.
+  * 8-byte offset to the trailer from the beginning of this file.
   * Zero or more additional key/value pairs (4-byte key, 4-byte
     value). Only one key is supported: 'PSRC'. See the "Loose objects
     and unreachable objects" section for supported values and how this
@@ -260,12 +260,10 @@
     compressed data to be copied directly from pack to pack during
     repacking without undetected data corruption.
 
-  * A table of 4-byte offset values. For an object in the table of
-    sorted shortened object names, the value at the corresponding
-    index in this table indicates where that object can be found in
-    the pack file. These are usually 31-bit pack file offsets, but
-    large offsets are encoded as an index into the next table with the
-    most significant bit set.
+  * A table of 4-byte offset values. The index of this table in pack order
+    indicates where that object can be found in the pack file. These are
+    usually 31-bit pack file offsets, but large offsets are encoded as
+    an index into the next table with the most significant bit set.
 
   * A table of 8-byte offset entries (empty for pack files less than
     2 GiB). Pack files are organized with heavily used objects toward
@@ -276,10 +274,14 @@
   up to and not including the table of CRC32 values.
 - Zero or more NUL bytes.
 - The trailer consists of the following:
-  * A copy of the 20-byte SHA-256 checksum at the end of the
+  * A copy of the full main hash checksum at the end of the
     corresponding packfile.
 
-  * 20-byte SHA-256 checksum of all of the above.
+  * Full main hash checksum of all of the above.
+
+The "full main hash" is a full-length hash of the main (not compatibility)
+algorithm in the repository. Thus, if the main algorithm is SHA-256, this is
+a 32-byte SHA-256 hash and for SHA-1, it's a 20-byte SHA-1 hash.
 
 Loose object index
 ~~~~~~~~~~~~~~~~~~
@@ -427,17 +429,19 @@
 
 Signed Tags
 ~~~~~~~~~~~
-We add a new field "gpgsig-sha256" to the tag object format to allow
-signing tags without relying on SHA-1. Its signed payload is the
-SHA-256 content of the tag with its gpgsig-sha256 field and "-----BEGIN PGP
-SIGNATURE-----" delimited in-body signature removed.
+We add new fields "gpgsig" and "gpgsig-sha256" to the tag object format to
+allow signing tags in both formats. The in-body signature is used for the
+signature in the current hash algorithm and the header is used for the
+signature in the other algorithm. Thus, a dual-signature tag will contain both
+an in-body signature and a gpgsig-sha256 header for the SHA-1 format of an
+object or both an in-body signature and a gpgsig header for the SHA-256 format
+of an object.
 
-This means tags can be signed
+The signed payload of the tag is the content of the tag in the current
+algorithm with both its gpgsig and gpgsig-sha256 fields and
+"-----BEGIN PGP SIGNATURE-----" delimited in-body signature removed.
 
-1. using SHA-1 only, as in existing signed tag objects
-2. using both SHA-1 and SHA-256, by using gpgsig-sha256 and an in-body
-   signature.
-3. using only SHA-256, by only using the gpgsig-sha256 field.
+This means tags can be signed using one or both algorithms.
 
 Mergetag embedding
 ~~~~~~~~~~~~~~~~~~
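Note on the pack index v3 offset tables described above: a hedged Python sketch of how a reader might resolve an offset from the 4-byte table, falling back to the 8-byte table when the most significant bit is set. The function name `pack_offset`, the separate `small`/`large` byte strings, and the big-endian (network order) integers are assumptions for illustration; the diff states only the table widths and the role of the most significant bit.

```python
import struct


def pack_offset(small: bytes, large: bytes, pack_pos: int) -> int:
    """Resolve the pack file offset of the object at position pack_pos.

    small: table of 4-byte offset values, indexed in pack order.
    large: table of 8-byte offset entries.
    Byte order is assumed to be big-endian.
    """
    (value,) = struct.unpack_from(">I", small, 4 * pack_pos)
    if value & 0x80000000:
        # MSB set: the remaining 31 bits index the 8-byte table.
        (value,) = struct.unpack_from(">Q", large, 8 * (value & 0x7FFFFFFF))
    return value


small = struct.pack(">II", 12, 0x80000000)
large = struct.pack(">Q", 5 << 32)
print(pack_offset(small, large, 0), pack_offset(small, large, 1))
```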
+10 -1
builtin/rev-parse.c
@@ -1107,11 +1107,20 @@
 				const char *val = arg ? arg : "storage";
 
 				if (strcmp(val, "storage") &&
+				    strcmp(val, "compat") &&
 				    strcmp(val, "input") &&
 				    strcmp(val, "output"))
 					die(_("unknown mode for --show-object-format: %s"),
 					    arg);
-				puts(the_hash_algo->name);
+
+				if (!strcmp(val, "compat")) {
+					if (the_repository->compat_hash_algo)
+						puts(the_repository->compat_hash_algo->name);
+					else
+						putchar('\n');
+				} else {
+					puts(the_hash_algo->name);
+				}
 				continue;
 			}
 			if (!strcmp(arg, "--show-ref-format")) {
+18
fsck.c
@@ -1067,6 +1067,24 @@
 	else
 		ret = fsck_ident(&buffer, oid, OBJ_TAG, options);
 
+	if (buffer < buffer_end && (skip_prefix(buffer, "gpgsig ", &buffer) || skip_prefix(buffer, "gpgsig-sha256 ", &buffer))) {
+		eol = memchr(buffer, '\n', buffer_end - buffer);
+		if (!eol) {
+			ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_GPGSIG, "invalid format - unexpected end after 'gpgsig' or 'gpgsig-sha256' line");
+			goto done;
+		}
+		buffer = eol + 1;
+
+		while (buffer < buffer_end && starts_with(buffer, " ")) {
+			eol = memchr(buffer, '\n', buffer_end - buffer);
+			if (!eol) {
+				ret = report(options, oid, OBJ_TAG, FSCK_MSG_BAD_HEADER_CONTINUATION, "invalid format - unexpected end in 'gpgsig' or 'gpgsig-sha256' continuation line");
+				goto done;
+			}
+			buffer = eol + 1;
+		}
+	}
+
 	if (buffer < buffer_end && !starts_with(buffer, "\n")) {
 		/*
 		 * The verify_headers() check will allow
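Note on the new fsck check above: the header-skipping logic can be modelled in a few lines of Python. This is a sketch of the control flow only, not the C implementation. The function name `skip_signature_header` is hypothetical, and the returned strings reuse the message IDs documented in `fsck-msgids.adoc`.

```python
def skip_signature_header(buf: bytes, pos: int):
    """Skip an optional gpgsig/gpgsig-sha256 header starting at pos.

    Returns (new position, message id or None). Illustrative model only.
    """
    for name in (b"gpgsig ", b"gpgsig-sha256 "):
        if buf.startswith(name, pos):
            pos += len(name)
            break
    else:
        return pos, None  # no signature header at this position

    eol = buf.find(b"\n", pos)
    if eol < 0:
        return pos, "badGpgsig"  # first header line is truncated
    pos = eol + 1

    # Continuation lines begin with a single space.
    while buf.startswith(b" ", pos):
        eol = buf.find(b"\n", pos)
        if eol < 0:
            return pos, "badHeaderContinuation"
        pos = eol + 1
    return pos, None


tag = b"gpgsig -----BEGIN PGP SIGNATURE-----\n Not a valid signature\n -----END PGP SIGNATURE-----\n\nbody\n"
pos, err = skip_signature_header(tag, 0)
print(err, tag[pos:])
```

After the header is skipped, the existing check that the next line is blank still applies, which is why the `junk` line in the second new test is reported as an extra header.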
+2
fsck.h
@@ -25,9 +25,11 @@
 	FUNC(NUL_IN_HEADER, FATAL) \
 	FUNC(UNTERMINATED_HEADER, FATAL) \
 	/* errors */ \
+	FUNC(BAD_HEADER_CONTINUATION, ERROR) \
 	FUNC(BAD_DATE, ERROR) \
 	FUNC(BAD_DATE_OVERFLOW, ERROR) \
 	FUNC(BAD_EMAIL, ERROR) \
+	FUNC(BAD_GPGSIG, ERROR) \
 	FUNC(BAD_NAME, ERROR) \
 	FUNC(BAD_OBJECT_SHA1, ERROR) \
 	FUNC(BAD_PACKED_REF_ENTRY, ERROR) \
+8 -5
t/t1010-mktree.sh
@@ -11,10 +11,13 @@
 		git add "$d" || return 1
 	done &&
 	echo zero >one &&
-	git update-index --add --info-only one &&
-	git write-tree --missing-ok >tree.missing &&
-	git ls-tree $(cat tree.missing) >top.missing &&
-	git ls-tree -r $(cat tree.missing) >all.missing &&
+	if test_have_prereq BROKEN_OBJECTS
+	then
+		git update-index --add --info-only one &&
+		git write-tree --missing-ok >tree.missing &&
+		git ls-tree $(cat tree.missing) >top.missing &&
+		git ls-tree -r $(cat tree.missing) >all.missing
+	fi &&
 	echo one >one &&
 	git add one &&
 	git write-tree >tree &&
@@ -53,7 +56,7 @@
 	test_cmp tree.withsub actual
 '
 
-test_expect_success 'allow missing object with --missing' '
+test_expect_success BROKEN_OBJECTS 'allow missing object with --missing' '
 	git mktree --missing <top.missing >actual &&
 	test_cmp tree.missing actual
 '
+54
t/t1450-fsck.sh
@@ -454,6 +454,60 @@
 	test_grep "error in tag $tag.*unterminated header: NUL at offset" out
 '
 
+test_expect_success 'tag accepts gpgsig header even if not validly signed' '
+	test_oid_cache <<-\EOF &&
+	header sha1:gpgsig-sha256
+	header sha256:gpgsig
+	EOF
+	header=$(test_oid header) &&
+	sha=$(git rev-parse HEAD) &&
+	cat >good-tag <<-EOF &&
+	object $sha
+	type commit
+	tag good
+	tagger T A Gger <tagger@example.com> 1234567890 -0000
+	$header -----BEGIN PGP SIGNATURE-----
+	 Not a valid signature
+	 -----END PGP SIGNATURE-----
+
+	This is a good tag.
+	EOF
+
+	tag=$(git hash-object --literally -t tag -w --stdin <good-tag) &&
+	test_when_finished "remove_object $tag" &&
+	git update-ref refs/tags/good $tag &&
+	test_when_finished "git update-ref -d refs/tags/good" &&
+	git -c fsck.extraHeaderEntry=error fsck --tags
+'
+
+test_expect_success 'tag rejects invalid headers' '
+	test_oid_cache <<-\EOF &&
+	header sha1:gpgsig-sha256
+	header sha256:gpgsig
+	EOF
+	header=$(test_oid header) &&
+	sha=$(git rev-parse HEAD) &&
+	cat >bad-tag <<-EOF &&
+	object $sha
+	type commit
+	tag good
+	tagger T A Gger <tagger@example.com> 1234567890 -0000
+	$header -----BEGIN PGP SIGNATURE-----
+	 Not a valid signature
+	 -----END PGP SIGNATURE-----
+	junk
+
+	This is a bad tag with junk at the end of the headers.
+	EOF
+
+	tag=$(git hash-object --literally -t tag -w --stdin <bad-tag) &&
+	test_when_finished "remove_object $tag" &&
+	git update-ref refs/tags/bad $tag &&
+	test_when_finished "git update-ref -d refs/tags/bad" &&
+	test_must_fail git -c fsck.extraHeaderEntry=error fsck --tags 2>out &&
+	test_grep "error in tag $tag.*invalid format - extra header" out
+'
+
 test_expect_success 'cleaned up' '
 	git fsck >actual 2>&1 &&
 	test_must_be_empty actual
+34
t/t1500-rev-parse.sh
@@ -207,6 +207,40 @@
 	grep "unknown mode for --show-object-format: squeamish-ossifrage" err
 '
 
+
+test_expect_success 'rev-parse --show-object-format in repo with compat mode' '
+	mkdir repo &&
+	(
+		sane_unset GIT_DEFAULT_HASH &&
+		cd repo &&
+		git init --object-format=sha256 &&
+		git config extensions.compatobjectformat sha1 &&
+		echo sha256 >expect &&
+		git rev-parse --show-object-format >actual &&
+		test_cmp expect actual &&
+		git rev-parse --show-object-format=storage >actual &&
+		test_cmp expect actual &&
+		git rev-parse --show-object-format=input >actual &&
+		test_cmp expect actual &&
+		git rev-parse --show-object-format=output >actual &&
+		test_cmp expect actual &&
+		echo sha1 >expect &&
+		git rev-parse --show-object-format=compat >actual &&
+		test_cmp expect actual &&
+		test_must_fail git rev-parse --show-object-format=squeamish-ossifrage 2>err &&
+		grep "unknown mode for --show-object-format: squeamish-ossifrage" err
+	) &&
+	mkdir repo2 &&
+	(
+		sane_unset GIT_DEFAULT_HASH &&
+		cd repo2 &&
+		git init --object-format=sha256 &&
+		echo >expect &&
+		git rev-parse --show-object-format=compat >actual &&
+		test_cmp expect actual
+	)
+'
+
 test_expect_success 'rev-parse --show-ref-format' '
 	test_detect_ref_format >expect &&
 	git rev-parse --show-ref-format >actual &&
+7 -2
t/test-lib-functions.sh
@@ -1708,11 +1708,16 @@
 # Detect the hash algorithm in use.
 test_detect_hash () {
 	case "${GIT_TEST_DEFAULT_HASH:-$GIT_TEST_BUILTIN_HASH}" in
-	"sha256")
+	*:*)
+		test_hash_algo="${GIT_TEST_DEFAULT_HASH%%:*}"
+		test_compat_hash_algo="${GIT_TEST_DEFAULT_HASH##*:}"
+		test_repo_compat_hash_algo="$test_compat_hash_algo"
+		;;
+	sha256)
 		test_hash_algo=sha256
 		test_compat_hash_algo=sha1
 		;;
-	*)
+	sha1)
 		test_hash_algo=sha1
 		test_compat_hash_algo=sha256
 		;;
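Note on the `main:compat` syntax accepted above: the `%%:*` and `##*:` parameter expansions keep the text before the first colon and after the last colon, respectively. The Python model below is a sketch of that selection logic only. The function name `detect_hash` and the `None` return for unmatched values are assumptions standing in for the shell variables simply being left unset.

```python
def detect_hash(spec: str):
    """Model the case statement in test_detect_hash.

    Returns (test_hash_algo, test_compat_hash_algo,
    test_repo_compat_hash_algo), or None when no branch matches.
    """
    if ":" in spec:
        main = spec.split(":", 1)[0]     # like ${spec%%:*}
        compat = spec.rsplit(":", 1)[1]  # like ${spec##*:}
        # Only this branch sets the repository compatibility algorithm.
        return main, compat, compat
    if spec == "sha256":
        return "sha256", "sha1", ""
    if spec == "sha1":
        return "sha1", "sha256", ""
    return None


print(detect_hash("sha256:sha1"))
print(detect_hash("sha1"))
```

Because only the `main:compat` form yields a non-empty third value, only that form satisfies the `COMPAT_HASH` prerequisite added in `t/test-lib.sh`.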
+13
t/test-lib.sh
@@ -1924,6 +1924,19 @@
 test_lazy_prereq DEFAULT_REPO_FORMAT '
 	test_have_prereq SHA1,REFFILES
 '
+# BROKEN_OBJECTS is a test whether we can write deliberately broken objects and
+# expect them to work. When running using SHA-256 mode with SHA-1
+# compatibility, we cannot write such objects because there's no SHA-1
+# compatibility value for a nonexistent object.
+test_lazy_prereq BROKEN_OBJECTS '
+	! test_have_prereq COMPAT_HASH
+'
+
+# COMPAT_HASH is a test if we're operating in a repository with SHA-256 with
+# SHA-1 compatibility.
+test_lazy_prereq COMPAT_HASH '
+	test -n "$test_repo_compat_hash_algo"
+'
 
 # Ensure that no test accidentally triggers a Git command
 # that runs the actual maintenance scheduler, affecting a user's