Documentation/gitformat-pack.adoc at reftables-rust

brian m. carlson docs: improve ambiguous areas of pack format documentation 5mo ago
24d46f86
  1gitformat-pack(5)
  2=================
  3
  4NAME
  5----
  6gitformat-pack - Git pack format
  7
  8
  9SYNOPSIS
 10--------
 11[verse]
 12$GIT_DIR/objects/pack/pack-*.{pack,idx}
 13$GIT_DIR/objects/pack/pack-*.rev
 14$GIT_DIR/objects/pack/pack-*.mtimes
 15$GIT_DIR/objects/pack/multi-pack-index
 16
 17DESCRIPTION
 18-----------
 19
 20The Git pack format is how Git stores most of its primary repository
 21data. Over the lifetime of a repository, loose objects (if any) and
 22smaller packs are consolidated into larger pack(s). See
 23linkgit:git-gc[1] and linkgit:git-pack-objects[1].
 24
 25The pack format is also used over-the-wire, see
 26e.g. linkgit:gitprotocol-v2[5], as well as being a part of
 27other container formats in the case of linkgit:gitformat-bundle[5].
 28
 29== Checksums and object IDs
 30
 31In a repository using the traditional SHA-1, pack checksums, index checksums,
 32and object IDs (object names) mentioned below are all computed using SHA-1.
 33Similarly, in SHA-256 repositories, these values are computed using SHA-256.
 34
 35CRC32 checksums are always computed over the entire packed object, including
 36the header (n-byte type and length); the base object name or offset, if any;
 37and the entire compressed object.  The CRC32 algorithm used is that of zlib.
 38
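As a rough illustration of the rule above, the following Python sketch chains zlib's CRC32 over the three parts of one packed entry. The function name `entry_crc32` and its three parameters are hypothetical; the sketch assumes the caller has already sliced the raw bytes of the entry out of the pack.

```python
import zlib

def entry_crc32(header: bytes, base_ref: bytes, compressed: bytes) -> int:
    """CRC32 over one whole packed entry (hypothetical helper).

    header     -- the n-byte type and length
    base_ref   -- base object name or offset bytes, empty if undeltified
    compressed -- the zlib-compressed object or delta data
    """
    crc = zlib.crc32(header)
    crc = zlib.crc32(base_ref, crc)      # continue the running checksum
    crc = zlib.crc32(compressed, crc)
    return crc & 0xFFFFFFFF
```

Chaining the calls gives the same value as checksumming the concatenated bytes in one call.
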
 39== pack-*.pack files have the following format:
 40
 41   - A header appears at the beginning and consists of the following:
 42
 43     4-byte signature:
 44         The signature is: {'P', 'A', 'C', 'K'}
 45
 46     4-byte version number (network byte order):
 47	 Git currently accepts version number 2 or 3 but
 48         generates version 2 only.
 49
 50     4-byte number of objects contained in the pack (network byte order)
 51
 52     Observation: we cannot have more than 4G versions ;-) and
 53     more than 4G objects in a pack.
 54
 55   - The header is followed by a number of object entries, each of
 56     which looks like this:
 57
 58     (undeltified representation)
 59     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
 60     compressed data
 61
 62     (deltified representation)
 63     n-byte type and length (3-bit type, (n-1)*7+4-bit length)
 64     base object name if OBJ_REF_DELTA or a negative relative
 65	 offset from the delta object's position in the pack if this
 66	 is an OBJ_OFS_DELTA object
 67     compressed delta data
 68
 69     Observation: the length of each object is encoded in a variable
 70     length format and is not constrained to 32-bit or anything.
 71
 72  - The trailer records a pack checksum of all of the above.
 73
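The header and trailer layout described above can be sketched in Python as follows. The helper names `parse_pack_header` and `trailer_matches` are made up for this illustration, and the example pack is a synthetic one with zero objects rather than output from Git.

```python
import hashlib
import struct

def parse_pack_header(data: bytes) -> tuple[int, int]:
    """Return (version, object_count) from the 12-byte pack header."""
    signature, version, count = struct.unpack(">4sII", data[:12])
    if signature != b"PACK":
        raise ValueError("not a pack file")
    if version not in (2, 3):
        raise ValueError(f"unsupported pack version {version}")
    return version, count

def trailer_matches(data: bytes, hash_name: str = "sha1") -> bool:
    """Check that the trailing checksum covers everything before it."""
    digest_size = hashlib.new(hash_name).digest_size
    body, trailer = data[:-digest_size], data[-digest_size:]
    return hashlib.new(hash_name, body).digest() == trailer

# A synthetic, empty version-2 pack: header only, then its SHA-1 trailer.
header = struct.pack(">4sII", b"PACK", 2, 0)
empty_pack = header + hashlib.sha1(header).digest()
```
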
 74=== Object types
 75
 76Valid object types are:
 77
 78- OBJ_COMMIT (1)
 79- OBJ_TREE (2)
 80- OBJ_BLOB (3)
 81- OBJ_TAG (4)
 82- OBJ_OFS_DELTA (6)
 83- OBJ_REF_DELTA (7)
 84
 85Type 5 is reserved for future expansion. Type 0 is invalid.
 86
 87=== Object encoding
 88
 89Unlike loose objects, packed objects do not have a prefix containing the type,
 90size, and a NUL byte. These are not necessary because they can be determined by
 91the n-byte type and length that prefixes the data and so they are omitted from
 92the compressed and deltified data.
 93
 94The computation of the object ID still uses this prefix by reconstructing it
 95from the type and length as needed.
 96
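For illustration, the prefix reconstruction can be sketched in Python like this. The function name `object_id` is hypothetical; `payload` stands for the uncompressed, undeltified object contents.

```python
import hashlib

def object_id(obj_type: str, payload: bytes, hash_name: str = "sha1") -> str:
    """Hash the loose-object style prefix followed by the payload.

    The prefix is the type name, a space, the decimal size, and a NUL byte.
    """
    prefix = f"{obj_type} {len(payload)}".encode() + b"\x00"
    return hashlib.new(hash_name, prefix + payload).hexdigest()
```

With SHA-1, `object_id("blob", b"")` yields the well-known empty-blob name `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`.
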
 97=== Size encoding
 98
 99This document uses the following "size encoding" of non-negative
100integers: From each byte, the seven least significant bits are
101used to form the resulting integer. As long as the most significant
102bit is 1, this process continues; the byte with MSB 0 provides the
103last seven bits.  The seven-bit chunks are concatenated. Later
104values are more significant.
105
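A minimal Python sketch of a decoder for this size encoding follows. The name `decode_size` is an assumption of this example, not something defined by Git.

```python
def decode_size(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode one size-encoded integer; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift   # later chunks are more significant
        shift += 7
        if not byte & 0x80:               # MSB clear: this was the last byte
            return value, pos
```

For example, the two bytes `0xAC 0x02` decode to 300: the first byte contributes 0x2C (44) and the second contributes 2 shifted left by seven bits (256).
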
106This size encoding should not be confused with the "offset encoding",
107which is also used in this document.
108
109When encoding the size of an undeltified object in a pack, the size is that of
110the uncompressed raw object. For deltified objects, it is the size of the
111uncompressed delta.  The base object name or offset is not included in the size
112computation.
113
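The n-byte type and length that starts every pack entry can be decoded along the following lines. This is only a sketch; `decode_entry_header` is a hypothetical name.

```python
def decode_entry_header(buf: bytes, pos: int = 0) -> tuple[int, int, int]:
    """Return (type, size, next_pos) for the entry header starting at pos."""
    byte = buf[pos]
    pos += 1
    obj_type = (byte >> 4) & 0x07   # 3-bit type
    size = byte & 0x0F              # the four least significant size bits
    shift = 4
    while byte & 0x80:              # MSB set: another size byte follows
        byte = buf[pos]
        pos += 1
        size |= (byte & 0x7F) << shift
        shift += 7
    return obj_type, size, pos
```

The single byte `0x35` describes an OBJ_BLOB (3) of size 5, and the bytes `0x9C 0x12` describe an OBJ_COMMIT (1) of size 300.
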
114=== Deltified representation
115
116Conceptually there are only four object types: commit, tree, tag and
117blob. However, to save space, an object could be stored as a "delta" of
118another "base" object. These representations are assigned new types
119ofs-delta and ref-delta, which are only valid in a pack file.
120
121Both ofs-delta and ref-delta store the "delta" to be applied to
122another object (called 'base object') to reconstruct the object. The
123difference between them is that ref-delta directly encodes the base
124object name, whereas ofs-delta, usable only when the base object is in
125the same pack, encodes the offset of the base object in the pack instead.
126
127The base object could also be deltified if it's in the same pack.
128Ref-delta can also refer to an object outside the pack (i.e. the
129so-called "thin pack"). When stored on disk, however, the pack should
130be self-contained to avoid cyclic dependency.
131
132The delta data starts with the size of the base object and the
133size of the object to be reconstructed. These sizes are
134encoded using the size encoding from above.  The remainder of
135the delta data is a sequence of instructions to reconstruct the object
136from the base object. If the base object is deltified, it must be
137converted to canonical form first. Each instruction appends more and
138more data to the target object until it's complete. There are two
139supported instructions so far: one for copying a byte range from the
140source object and one for inserting new data embedded in the
141instruction itself.
142
143Each instruction has variable length. The instruction type is determined
144by bit seven (the most significant bit) of the first octet. The following
145diagrams follow the convention in RFC 1951 (Deflate compressed data format).
146
147==== Instruction to copy from base object
148
149  +----------+---------+---------+---------+---------+-------+-------+-------+
150  | 1xxxxxxx | offset1 | offset2 | offset3 | offset4 | size1 | size2 | size3 |
151  +----------+---------+---------+---------+---------+-------+-------+-------+
152
153This is the instruction format to copy a byte range from the source
154object. It encodes the offset to copy from and the number of bytes to
155copy. Offset and size are in little-endian order.
156
157All offset and size bytes are optional. This is to reduce the
158instruction size when encoding small offsets or sizes. The seven least
159significant bits of the first octet determine which of the next seven
160octets are present. If bit zero is set, offset1 is present. If bit one is
161set, offset2 is present, and so on.
162
163Note that a more compact instruction does not change offset and size
164encoding. For example, if only offset2 is omitted like below, offset3
165still contains bits 16-23. It does not become offset2 and contain
166bits 8-15, even though it is right next to offset1.
167
168  +----------+---------+---------+
169  | 10000101 | offset1 | offset3 |
170  +----------+---------+---------+
171
172In its most compact form, this instruction takes up only one byte
173(0x80) with both offset and size omitted; omitted bytes default to
174zero. There is one exception: a size of zero is automatically
175converted to 0x10000.
176
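The copy instruction can be decoded with a sketch like the one below. `decode_copy` is a hypothetical helper that takes the delta bytes and the position of the instruction's first octet.

```python
def decode_copy(delta: bytes, pos: int) -> tuple[int, int, int]:
    """Decode a copy instruction at pos; return (offset, size, next_pos)."""
    opcode = delta[pos]
    pos += 1
    if not opcode & 0x80:
        raise ValueError("not a copy instruction")
    offset = 0
    size = 0
    for bit in range(4):                 # bits 0-3 select offset1..offset4
        if opcode & (1 << bit):
            offset |= delta[pos] << (8 * bit)
            pos += 1
    for bit in range(3):                 # bits 4-6 select size1..size3
        if opcode & (0x10 << bit):
            size |= delta[pos] << (8 * bit)
            pos += 1
    if size == 0:                        # size zero means 0x10000
        size = 0x10000
    return offset, size, pos
```

Applied to the `10000101 | offset1 | offset3` layout shown above with offset1 = 0x34 and offset3 = 0x12, this yields offset 0x120034 and, since no size bytes are present, size 0x10000.
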
177==== Instruction to add new data
178
179  +----------+============+
180  | 0xxxxxxx |    data    |
181  +----------+============+
182
183This is the instruction to construct the target object without the base
184object. The following data is appended to the target object. The seven
185least significant bits of the first octet give the size of the data in
186bytes. The size must be non-zero.
187
188==== Reserved instruction
189
190  +----------+============
191  | 00000000 |
192  +----------+============
193
194This is the instruction reserved for future expansion.
195
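Putting the delta header and the instructions together, a delta could be applied roughly as follows. This is an illustrative sketch rather than Git's implementation; `read_size` and `apply_delta` are hypothetical names, and `base` must already be in canonical (undeltified) form.

```python
def read_size(buf: bytes, pos: int) -> tuple[int, int]:
    """Decode one size-encoded integer; return (value, next_pos)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos

def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Reconstruct the target object from base and uncompressed delta data."""
    base_size, pos = read_size(delta, 0)
    target_size, pos = read_size(delta, pos)
    if base_size != len(base):
        raise ValueError("delta does not match this base object")
    out = bytearray()
    while pos < len(delta):
        opcode = delta[pos]
        pos += 1
        if opcode & 0x80:                        # copy from base object
            offset = size = 0
            for bit in range(4):
                if opcode & (1 << bit):
                    offset |= delta[pos] << (8 * bit)
                    pos += 1
            for bit in range(3):
                if opcode & (0x10 << bit):
                    size |= delta[pos] << (8 * bit)
                    pos += 1
            size = size or 0x10000
            if offset + size > len(base):
                raise ValueError("copy range exceeds base object")
            out += base[offset:offset + size]
        elif opcode:                             # add new data
            out += delta[pos:pos + opcode]
            pos += opcode
        else:                                    # 0x00 is reserved
            raise ValueError("reserved instruction")
    if len(out) != target_size:
        raise ValueError("reconstructed size mismatch")
    return bytes(out)
```
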
196== Original (version 1) pack-*.idx files have the following format:
197
198  - The header consists of 256 4-byte network byte order
199    integers.  N-th entry of this table records the number of
200    objects in the corresponding pack, the first byte of whose
201    object name is less than or equal to N.  This is called the
202    'first-level fan-out' table.
203
204  - The header is followed by sorted entries (24 bytes each when
205    using SHA-1), one entry per object in the pack.  Each entry is:
206
207    4-byte network byte order integer, recording where the
208    object is stored in the packfile as the offset from the
209    beginning.
210
211    one object name of the appropriate size.
212
213  - The file is concluded with a trailer:
214
215    A copy of the pack checksum at the end of the corresponding
216    packfile.
217
218    Index checksum of all of the above.
219
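As a sketch of how the fan-out table narrows a lookup in a version 1 index, consider the Python below. `v1_idx_lookup` is a hypothetical name, and the entry size is derived from the length of the object name passed in (20 bytes for SHA-1).

```python
import struct
from typing import Optional

def v1_idx_lookup(idx: bytes, name: bytes) -> Optional[int]:
    """Return the pack offset recorded for name, or None if absent."""
    fanout = struct.unpack(">256I", idx[:1024])
    entry_size = 4 + len(name)           # 4-byte offset + object name
    first = name[0]
    lo = fanout[first - 1] if first else 0
    hi = fanout[first]                   # entries [lo, hi) share first byte
    while lo < hi:                       # binary search inside that range
        mid = (lo + hi) // 2
        start = 1024 + mid * entry_size
        (offset,) = struct.unpack(">I", idx[start:start + 4])
        entry_name = idx[start + 4:start + entry_size]
        if entry_name == name:
            return offset
        if entry_name < name:
            lo = mid + 1
        else:
            hi = mid
    return None
```
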
220Pack Idx file:
221
222	--  +--------------------------------+
223fanout	    | fanout[0] = 2 (for example)    |-.
224table	    +--------------------------------+ |
225	    | fanout[1]                      | |
226	    +--------------------------------+ |
227	    | fanout[2]                      | |
228	    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ |
229	    | fanout[255] = total objects    |---.
230	--  +--------------------------------+ | |
231main	    | offset                         | | |
232index	    | object name 00XXXXXXXXXXXXXXXX | | |
233table	    +--------------------------------+ | |
234	    | offset                         | | |
235	    | object name 00XXXXXXXXXXXXXXXX | | |
236	    +--------------------------------+<+ |
237	  .-| offset                         |   |
238	  | | object name 01XXXXXXXXXXXXXXXX |   |
239	  | +--------------------------------+   |
240	  | | offset                         |   |
241	  | | object name 01XXXXXXXXXXXXXXXX |   |
242	  | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~   |
243	  | | offset                         |   |
244	  | | object name FFXXXXXXXXXXXXXXXX |   |
245	--| +--------------------------------+<--+
246trailer	  | | packfile checksum              |
247	  | +--------------------------------+
248	  | | idxfile checksum               |
249	  | +--------------------------------+
250          .-------.
251                  |
252Pack file entry: <+
253
254     packed object header:
255	1-byte size extension bit (MSB)
256	       type (next 3 bit)
257	       size0 (lower 4-bit)
258        n-byte sizeN (as long as MSB is set, each 7-bit)
259		size0..sizeN form 4+7+7+..+7 bit integer, size0
260		is the least significant part, and sizeN is the
261		most significant part.
262     packed object data:
263        If it is not DELTA, then deflated bytes (the size above
264		is the size before compression).
265	If it is REF_DELTA, then
266	  base object name (the size above is the
267		size of the delta data that follows).
268          delta data, deflated.
269	If it is OFS_DELTA, then
270	  n-byte offset (see below) interpreted as a negative
271		offset from the type-byte of the header of the
272		ofs-delta entry (the size above is the size of
273		the delta data that follows).
274	  delta data, deflated.
275
276     offset encoding:
277	  n bytes with MSB set in all but the last one.
278	  The offset is then the number constructed by
279	  concatenating the lower 7 bit of each byte, and
280	  for n >= 2 adding 2^7 + 2^14 + ... + 2^(7*(n-1))
281	  to the result.
282
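The offset encoding differs from the size encoding in both byte order and the added bias. A possible decoder, with the hypothetical name `decode_ofs_delta_offset`, looks like this:

```python
def decode_ofs_delta_offset(buf: bytes, pos: int = 0) -> tuple[int, int]:
    """Decode an ofs-delta offset; return (offset, next_pos)."""
    byte = buf[pos]
    pos += 1
    value = byte & 0x7F
    while byte & 0x80:                   # MSB set in all but the last byte
        byte = buf[pos]
        pos += 1
        # Adding 1 before each shift contributes 2^7 + 2^14 + ... overall.
        value = ((value + 1) << 7) | (byte & 0x7F)
    return value, pos
```

The result is the distance to subtract from the position of the ofs-delta entry's own header to find the base entry. With this bias the one-byte form covers 0 through 127 and the two-byte form starts at 128 (`0x80 0x00`).
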
283
284
285== Version 2 pack-*.idx files support packs larger than 4 GiB, and
286   have some other reorganizations.  They have the format:
287
288  - A 4-byte magic number '\377tOc' which is an unreasonable
289    fanout[0] value.
290
291  - A 4-byte version number (= 2)
292
293  - A 256-entry fan-out table just like v1.
294
295  - A table of sorted object names.  These are packed together
296    without offset values to reduce the cache footprint of the
297    binary search for a specific object name.
298
299  - A table of 4-byte CRC32 values of the packed object data.
300    This is new in v2 so compressed data can be copied directly
301    from pack to pack during repacking without undetected
302    data corruption.
303
304  - A table of 4-byte offset values (in network byte order).
305    These are usually 31-bit pack file offsets, but large
306    offsets are encoded as an index into the next table with
307    the msbit set.
308
309  - A table of 8-byte offset entries (empty for pack files less
310    than 2 GiB).  Pack files are organized with heavily used
311    objects toward the front, so most object references should
312    not need to refer to this table.
313
314  - The same trailer as a v1 pack index file:
315
316    A copy of the pack checksum at the end of the
317    corresponding packfile.
318
319    Index checksum of all of the above.
320
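The table layout above implies fixed positions for each table once the object count is known. The following sketch resolves the pack offset of the object at a given position in the sorted name table. `v2_idx_offset` and its `hash_len` parameter (20 for SHA-1, 32 for SHA-256) are assumptions of this example.

```python
import struct

def v2_idx_offset(idx: bytes, position: int, hash_len: int = 20) -> int:
    """Return the pack offset of the object at a sorted-name position."""
    magic, version = struct.unpack(">4sI", idx[:8])
    if magic != b"\xfftOc" or version != 2:
        raise ValueError("not a version 2 pack index")
    fanout = struct.unpack(">256I", idx[8:8 + 1024])
    count = fanout[255]                          # total number of objects
    names_start = 8 + 1024
    crc_start = names_start + count * hash_len
    small_start = crc_start + count * 4          # 4-byte offset table
    large_start = small_start + count * 4        # 8-byte offset table
    (entry,) = struct.unpack_from(">I", idx, small_start + position * 4)
    if entry & 0x80000000:                       # msbit set: large offset
        row = entry & 0x7FFFFFFF
        (entry,) = struct.unpack_from(">Q", idx, large_start + row * 8)
    return entry
```
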
321== pack-*.rev files have the format:
322
323  - A 4-byte magic number '0x52494458' ('RIDX').
324
325  - A 4-byte version identifier (= 1).
326
327  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256).
328
329  - A table of index positions (one per packed object, num_objects in
330    total, each a 4-byte unsigned integer in network order), sorted by
331    their corresponding offsets in the packfile.
332
333  - A trailer, containing:
334
335    a checksum of the corresponding packfile, and
336
337    a checksum of all of the above.
338
339All 4-byte numbers are in network order.
340
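To illustrate what the table holds, the sketch below derives it from a list of pack offsets given in index order and serializes the header plus table. Both function names are hypothetical, and the two trailer checksums are left out for brevity.

```python
import struct

def build_rev_table(offsets_by_index_pos: list[int]) -> list[int]:
    """Return index positions sorted by their offset in the packfile."""
    return sorted(range(len(offsets_by_index_pos)),
                  key=offsets_by_index_pos.__getitem__)

def rev_header_and_table(offsets_by_index_pos: list[int],
                         hash_id: int = 1) -> bytes:
    """Serialize the 'RIDX' header and table (trailer omitted)."""
    table = build_rev_table(offsets_by_index_pos)
    header = struct.pack(">4sII", b"RIDX", 1, hash_id)
    return header + struct.pack(f">{len(table)}I", *table)
```

For three objects whose offsets in index order are 300, 12, and 150, the table is `[1, 2, 0]`: the object at index position 1 comes first in the pack.
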
341== pack-*.mtimes files have the format:
342
343All 4-byte numbers are in network byte order.
344
345  - A 4-byte magic number '0x4d544d45' ('MTME').
346
347  - A 4-byte version identifier (= 1).
348
349  - A 4-byte hash function identifier (= 1 for SHA-1, 2 for SHA-256).
350
351  - A table of 4-byte unsigned integers. The ith value is the
352    modification time (mtime) of the ith object in the corresponding
353    pack by lexicographic (index) order. The mtimes count standard
354    epoch seconds.
355
356  - A trailer, containing a checksum of the corresponding packfile,
357    and a checksum of all of the above (each having length according
358    to the specified hash function).
359
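Reading the header and table of a `.mtimes` file might look like the following. `parse_mtimes` is a hypothetical helper; the object count is assumed to come from the corresponding pack index, since the file itself does not record it.

```python
import struct

def parse_mtimes(data: bytes, num_objects: int) -> list[int]:
    """Return the per-object mtimes, in index order."""
    magic, version, hash_id = struct.unpack_from(">4sII", data, 0)
    if magic != b"MTME" or version != 1:
        raise ValueError("not a version 1 .mtimes file")
    if hash_id not in (1, 2):            # 1 = SHA-1, 2 = SHA-256
        raise ValueError("unknown hash function identifier")
    return list(struct.unpack_from(f">{num_objects}I", data, 12))
```
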
360== multi-pack-index (MIDX) files have the following format:
361
362The multi-pack-index files refer to multiple pack-files and loose objects.
363
364In order to allow extensions that add extra data to the MIDX, we organize
365the body into "chunks" and provide a lookup table at the beginning of the
366body. The header includes certain length values, such as the number of packs,
367the number of base MIDX files, hash lengths and types.
368
369All 4-byte numbers are in network order.
370
371HEADER:
372
373	4-byte signature:
374	    The signature is: {'M', 'I', 'D', 'X'}
375
376	1-byte version number:
377	    Git only writes or recognizes version 1.
378
379	1-byte Object Id Version
380	    We infer the length of object IDs (OIDs) from this value:
381		1 => SHA-1
382		2 => SHA-256
383	    If the hash type does not match the repository's hash algorithm,
384	    the multi-pack-index file should be ignored with a warning
385	    presented to the user.
386
387	1-byte number of "chunks"
388
389	1-byte number of base multi-pack-index files:
390	    This value is currently always zero.
391
392	4-byte number of pack files
393
394CHUNK LOOKUP:
395
396	(C + 1) * 12 bytes providing the chunk offsets:
397	    First 4 bytes describe chunk id. Value 0 is a terminating label.
398	    Other 8 bytes provide offset in current file for chunk to start.
399	    (Chunks are provided in file-order, so you can infer the length
400	    using the next chunk position if necessary.)
401
402	The CHUNK LOOKUP matches the table of contents from
403	the chunk-based file format, see linkgit:gitformat-chunk[5].
404
405	The remaining data in the body is described one chunk at a time, and
406	these chunks may be given in any order. Chunks are required unless
407	otherwise specified.
408
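The header and chunk lookup can be read with a sketch such as the one below. `parse_midx_toc` is a hypothetical name. Chunk lengths are inferred from the next row's offset, relying on the file-order property stated above.

```python
import struct

def parse_midx_toc(data: bytes) -> tuple[int, dict[bytes, tuple[int, int]]]:
    """Return (num_packs, {chunk_id: (start, end)}) from a MIDX file."""
    (sig, version, oid_version, num_chunks,
     num_base, num_packs) = struct.unpack_from(">4sBBBBI", data, 0)
    if sig != b"MIDX" or version != 1:
        raise ValueError("not a version 1 multi-pack-index")
    # (C + 1) rows of 12 bytes: 4-byte chunk id, 8-byte file offset.
    rows = [struct.unpack_from(">4sQ", data, 12 + 12 * i)
            for i in range(num_chunks + 1)]
    chunks = {}
    for (chunk_id, start), (_, end) in zip(rows, rows[1:]):
        chunks[chunk_id] = (start, end)  # the terminating row only ends a chunk
    return num_packs, chunks
```
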
409CHUNK DATA:
410
411	Packfile Names (ID: {'P', 'N', 'A', 'M'})
412	    Store the names of packfiles as a sequence of NUL-terminated
413	    strings. There is no extra padding between the filenames,
414	    and they are listed in lexicographic order. The chunk itself
415	    is padded at the end with between 0 and 3 NUL bytes to make the
416	    chunk size a multiple of 4 bytes.
417
418	Bitmapped Packfiles (ID: {'B', 'T', 'M', 'P'})
419	    Stores a table of two 4-byte unsigned integers in network order.
420	    Each table entry corresponds to a single pack (in the order that
421	    they appear above in the `PNAM` chunk). The values for each table
422	    entry are as follows:
423	    - The first bit position (in pseudo-pack order, see below) to
424	      contain an object from that pack.
425	    - The number of bits whose objects are selected from that pack.
426
427	OID Fanout (ID: {'O', 'I', 'D', 'F'})
428	    The ith entry, F[i], stores the number of OIDs with first
429	    byte at most i. Thus F[255] stores the total
430	    number of objects.
431
432	OID Lookup (ID: {'O', 'I', 'D', 'L'})
433	    The OIDs for all objects in the MIDX are stored in lexicographic
434	    order in this chunk.
435
436	Object Offsets (ID: {'O', 'O', 'F', 'F'})
437	    Stores two 4-byte values for every object.
438	    1: The pack-int-id for the pack storing this object.
439	    2: The offset within the pack.
440		If all offsets are less than 2^32, then the large offset chunk
441		will not exist and offsets are stored as in IDX v1.
442		If there is at least one offset value larger than 2^32-1, then
443		the large offset chunk must exist, and offsets larger than
444		2^31-1 must be stored in it instead. If the large offset chunk
445		exists and the 31st bit is on, then removing that bit reveals
446		the row in the large offsets containing the 8-byte offset of
447		this object.
448
449	[Optional] Object Large Offsets (ID: {'L', 'O', 'F', 'F'})
450	    8-byte offsets into large packfiles.
451
452	[Optional] Bitmap pack order (ID: {'R', 'I', 'D', 'X'})
453	    A list of MIDX positions (one per object in the MIDX, num_objects in
454	    total, each a 4-byte unsigned integer in network byte order), sorted
455	    according to their relative bitmap/pseudo-pack positions.
456
457TRAILER:
458
459	Index checksum of the above contents.
460
461== multi-pack-index reverse indexes
462
463Similar to the pack-based reverse index, the multi-pack index can also
464be used to generate a reverse index.
465
466Instead of mapping between offset, pack-, and index position, this
467reverse index maps between an object's position within the MIDX, and
468that object's position within a pseudo-pack that the MIDX describes
469(i.e., the ith entry of the multi-pack reverse index holds the MIDX
470position of ith object in pseudo-pack order).
471
472To clarify the difference between these orderings, consider a multi-pack
473reachability bitmap (which does not yet exist, but is what we are
474building towards here). Each bit needs to correspond to an object in the
475MIDX, and so we need an efficient mapping from bit position to MIDX
476position.
477
478One solution is to let bits occupy the same position in the oid-sorted
479index stored by the MIDX. But because oids are effectively random, their
480resulting reachability bitmaps would have no locality, and thus compress
481poorly. (This is the reason that single-pack bitmaps use the pack
482ordering, and not the .idx ordering, for the same purpose.)
483
484So we'd like to define an ordering for the whole MIDX based around
485pack ordering, which has far better locality (and thus compresses more
486efficiently). We can think of a pseudo-pack created by the concatenation
487of all of the packs in the MIDX. E.g., if we had a MIDX with three packs
488(a, b, c), with 10, 15, and 20 objects respectively, we can imagine an
489ordering of the objects like:
490
491    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
492
493where the ordering of the packs is defined by the MIDX's pack list,
494and then the ordering of objects within each pack is the same as the
495order in the actual packfile.
496
497Given the list of packs and their counts of objects, you can
498naïvely reconstruct that pseudo-pack ordering (e.g., the object at
499position 27, counting from zero, must be (c,2) because packs "a" and
500"b" consumed 25 of the slots). But there's a catch. Objects may be
501duplicated between packs, in which case the MIDX only stores one pointer
502to the object (and thus we'd want only one slot in the bitmap).
503
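The naïve reconstruction can be written down as a short sketch. It deliberately ignores the duplicate-object problem just described, counts positions and per-pack object numbers from zero (as the diagram does), and uses the hypothetical name `naive_pseudo_pack_entry`.

```python
from bisect import bisect_right
from itertools import accumulate

def naive_pseudo_pack_entry(packs: list[tuple[str, int]],
                            position: int) -> tuple[str, int]:
    """Map a pseudo-pack position to (pack, object number in that pack)."""
    ends = list(accumulate(count for _, count in packs))  # e.g. [10, 25, 45]
    if not 0 <= position < ends[-1]:
        raise IndexError("position outside the pseudo-pack")
    i = bisect_right(ends, position)     # first pack whose end is > position
    start = ends[i - 1] if i else 0
    return packs[i][0], position - start

packs = [("a", 10), ("b", 15), ("c", 20)]
```

With these counts, position 10 is the first object of pack "b" and position 44 is the last object of pack "c".
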
504Callers could handle duplicates themselves by reading objects in order
505of their bit-position, but that's linear in the number of objects, and
506much too expensive for ordinary bitmap lookups. Building a reverse index
507solves this, since it is the logical inverse of the index, and that
508index has already removed duplicates. But, building a reverse index on
509the fly can be expensive. Since we already have an on-disk format for
510pack-based reverse indexes, let's reuse it for the MIDX's pseudo-pack,
511too.
512
513Objects from the MIDX are ordered as follows to string together the
514pseudo-pack. Let `pack(o)` return the pack from which `o` was selected
515by the MIDX, and define an ordering of packs based on their numeric ID
516(as stored by the MIDX). Let `offset(o)` return the object offset of `o`
517within `pack(o)`. Then, compare `o1` and `o2` as follows:
518
519  - If one of `pack(o1)` and `pack(o2)` is preferred and the other
520    is not, then the preferred one sorts first.
521+
522(This is a detail that allows the MIDX bitmap to determine which
523pack should be used by the pack-reuse mechanism, since it can ask
524the MIDX for the pack containing the object at bit position 0).
525
526  - If `pack(o1) ≠ pack(o2)`, then sort the two objects in ascending
527    order based on the pack ID.
528
529  - Otherwise, `pack(o1) = pack(o2)`, and the objects are sorted in
530    pack-order (i.e., `o1` sorts ahead of `o2` exactly when `offset(o1)
531    < offset(o2)`).
532
533In short, a MIDX's pseudo-pack is the de-duplicated concatenation of
534objects in packs stored by the MIDX, laid out in pack order, and the
535packs arranged in MIDX order (with the preferred pack coming first).
536
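The comparison can be expressed as a sort key. The sketch below assumes that non-preferred packs are laid out in increasing pack-ID order, which is what the a, b, c example and the summary above imply. Each object is modelled as a hypothetical `(pack_id, offset)` pair that the MIDX has already de-duplicated.

```python
def pseudo_pack_order(objects: list[tuple[int, int]],
                      preferred_pack: int) -> list[tuple[int, int]]:
    """Sort (pack_id, offset) pairs into pseudo-pack order."""
    return sorted(
        objects,
        # False sorts before True, so the preferred pack comes first;
        # then packs by ID (assumed ascending), then objects by pack offset.
        key=lambda o: (o[0] != preferred_pack, o[0], o[1]),
    )
```
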
537The MIDX's reverse index is stored in the optional 'RIDX' chunk within
538the MIDX itself.
539
540=== `BTMP` chunk
541
542The Bitmapped Packfiles (`BTMP`) chunk encodes additional information
543about the objects in the multi-pack index's reachability bitmap. Recall
544that objects from the MIDX are arranged in "pseudo-pack" order (see
545above) for reachability bitmaps.
546
547From the example above, suppose we have packs "a", "b", and "c", with
54810, 15, and 20 objects, respectively. In pseudo-pack order, those would
549be arranged as follows:
550
551    |a,0|a,1|...|a,9|b,0|b,1|...|b,14|c,0|c,1|...|c,19|
552
553When working with single-pack bitmaps (or, equivalently, multi-pack
554reachability bitmaps with a preferred pack), linkgit:git-pack-objects[1]
555performs ``verbatim'' reuse, attempting to reuse chunks of the bitmapped
556or preferred packfile instead of adding objects to the packing list.
557
558When a chunk of bytes is reused from an existing pack, any objects
559contained therein do not need to be added to the packing list, saving
560memory and CPU time. But a chunk from an existing packfile can only be
561reused when the following conditions are met:
562
563  - The chunk contains only objects which were requested by the caller
564    (i.e. does not contain any objects which the caller didn't ask for
565    explicitly or implicitly).
566
567  - All objects stored in non-thin packs as offset- or reference-deltas
568    also include their base object in the resulting pack.
569
570The `BTMP` chunk encodes the necessary information in order to implement
571multi-pack reuse over a set of packfiles as described above.
572Specifically, the `BTMP` chunk encodes three pieces of information (all
57332-bit unsigned integers in network byte-order) for each packfile `p`
574that is stored in the MIDX, as follows:
575
576`bitmap_pos`:: The first bit position (in pseudo-pack order) in the
577  multi-pack index's reachability bitmap occupied by an object from `p`.
578
579`bitmap_nr`:: The number of bit positions (including the one at
580  `bitmap_pos`) that encode objects from that pack `p`.
581
582For example, the `BTMP` chunk corresponding to the above example (with
583packs ``a'', ``b'', and ``c'') would look like:
584
585[cols="1,2,2"]
586|===
587| |`bitmap_pos` |`bitmap_nr`
588
589|packfile ``a''
590|`0`
591|`10`
592
593|packfile ``b''
594|`10`
595|`15`
596
597|packfile ``c''
598|`25`
599|`20`
600|===
601
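The table can be reproduced with a few lines of Python. This sketch assumes each count is the number of objects the MIDX selected from that pack (so duplicates are already excluded) and that packs appear in pseudo-pack order; `btmp_rows` and `encode_btmp` are hypothetical names.

```python
import struct
from itertools import accumulate

def btmp_rows(object_counts: list[int]) -> list[tuple[int, int]]:
    """Return (bitmap_pos, bitmap_nr) for each pack."""
    starts = [0, *accumulate(object_counts)][:-1]   # running start positions
    return list(zip(starts, object_counts))

def encode_btmp(object_counts: list[int]) -> bytes:
    """Serialize the rows as pairs of 32-bit network-order integers."""
    return b"".join(struct.pack(">II", pos, nr)
                    for pos, nr in btmp_rows(object_counts))
```

For packs with 10, 15, and 20 objects, `btmp_rows` returns `[(0, 10), (10, 15), (25, 20)]`, matching the table.
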
602With this information in place, we can treat each packfile as
603individually reusable in the same fashion as verbatim pack reuse is
604performed on individual packs prior to the implementation of the `BTMP`
605chunk.
606
607== cruft packs
608
609The cruft packs feature offers an alternative to Git's traditional mechanism of
610removing unreachable objects. This document provides an overview of Git's
611pruning mechanism, and how a cruft pack can be used instead to accomplish the
612same.
613
614=== Background
615
616To remove unreachable objects from your repository, Git offers `git repack -Ad`
617(see linkgit:git-repack[1]). Quoting from the documentation:
618
619----
620[...] unreachable objects in a previous pack become loose, unpacked objects,
621instead of being left in the old pack. [...] loose unreachable objects will be
622pruned according to normal expiry rules with the next 'git gc' invocation.
623----
624
625Unreachable objects aren't removed immediately, since doing so could race with
626an incoming push which may reference an object which is about to be deleted.
627Instead, those unreachable objects are stored as loose objects and stay that way
628until they are older than the expiration window, at which point they are removed
629by linkgit:git-prune[1].
630
631Git must store these unreachable objects loose in order to keep track of their
632per-object mtimes. If these unreachable objects were written into one big pack,
633then either freshening that pack (because an object contained within it was
634re-written) or creating a new pack of unreachable objects would cause the pack's
635mtime to get updated, and the objects within it would never leave the expiration
636window. Instead, objects are stored loose in order to keep track of the
637individual object mtimes and avoid a situation where all cruft objects are
638freshened at once.
639
640This can lead to undesirable situations when a repository contains many
641unreachable objects which have not yet left the grace period. Having large
642directories in the shards of `.git/objects` can lead to decreased performance in
643the repository. But given enough unreachable objects, this can lead to inode
644starvation and degrade the performance of the whole system. Since we
645can never pack those objects, these repositories often take up a large amount of
646disk space, since we can only zlib compress them, but not store them in delta
647chains.
648
649=== Cruft packs
650
651A cruft pack eliminates the need for storing unreachable objects in a loose
652state by including the per-object mtimes in a separate file alongside a single
653pack containing all loose objects.
654
655A cruft pack is written by `git repack --cruft` when generating a new pack, using
656linkgit:git-pack-objects[1]'s `--cruft` option. Note that `git repack --cruft`
657is a classic all-into-one repack, meaning that everything in the resulting pack is
658reachable, and everything else is unreachable. Once written, the `--cruft`
659option instructs `git repack` to generate another pack containing only objects
660not packed in the previous step (which equates to packing all unreachable
661objects together). This progresses as follows:
662
663  1. Enumerate every object, marking as a traversal tip any object which
664     (a) is not contained in a kept-pack, and (b) has an mtime within the
665     grace period.
666
667  2. Perform a reachability traversal based on the tips gathered in the previous
668     step, adding every object along the way to the pack.
669
670  3. Write the pack out, along with a `.mtimes` file that records the per-object
671     timestamps.
672
673This mode is invoked internally by linkgit:git-repack[1] when instructed to
674write a cruft pack. Crucially, the set of in-core kept packs is exactly the set
675of packs which will not be deleted by the repack; in other words, they contain
676all of the repository's reachable objects.
677
678When a repository already has a cruft pack, `git repack --cruft` typically only
679adds objects to it. An exception to this is when `git repack` is given the
680`--cruft-expiration` option, which allows the generated cruft pack to omit
681expired objects instead of waiting for linkgit:git-gc[1] to expire those objects
682later on.
683
684It is linkgit:git-gc[1] that is typically responsible for removing expired
685unreachable objects.
686
687=== Alternatives
688
689Notable alternatives to this design include:
690
691  - The location of the per-object mtime data.
692
693On the location of mtime data, a new auxiliary file tied to the pack was chosen
694to avoid complicating the `.idx` format. If the `.idx` format were ever to gain
695support for optional chunks of data, it may make sense to consolidate the
696`.mtimes` format into the `.idx` itself.
697
698GIT
699---
700Part of the linkgit:git[1] suite