Git fork
at reftables-rust 195 lines 7.5 kB view raw
1gitpacking(7) 2============= 3 4NAME 5---- 6gitpacking - Advanced concepts related to packing in Git 7 8SYNOPSIS 9-------- 10gitpacking 11 12DESCRIPTION 13----------- 14 15This document aims to describe some advanced concepts related to packing 16in Git. 17 18Many concepts are currently described scattered between manual pages of 19various Git commands, including linkgit:git-pack-objects[1], 20linkgit:git-repack[1], and others, as well as linkgit:gitformat-pack[5], 21and parts of the `Documentation/technical` tree. 22 23There are many aspects of packing in Git that are not covered in this 24document that instead live in the aforementioned areas. Over time, those 25scattered bits may coalesce into this document. 26 27== Pseudo-merge bitmaps 28 29NOTE: Pseudo-merge bitmaps are considered an experimental feature, so 30the configuration and many of the ideas are subject to change. 31 32=== Background 33 34Reachability bitmaps are most efficient when we have on-disk stored 35bitmaps for one or more of the starting points of a traversal. For this 36reason, Git prefers storing bitmaps for commits at the tips of refs, 37because traversals tend to start with those points. 38 39But if you have a large number of refs, it's not feasible to store a 40bitmap for _every_ ref tip. It takes up space, and just OR-ing all of 41those bitmaps together is expensive. 42 43One way we can deal with that is to create bitmaps that represent 44_groups_ of refs. When a traversal asks about the entire group, then we 45can use this single bitmap instead of considering each ref individually. 46Because these bitmaps represent the set of objects which would be 47reachable in a hypothetical merge of all of the commits, we call them 48pseudo-merge bitmaps. 49 50=== Overview 51 52A "pseudo-merge bitmap" is used to refer to a pair of bitmaps, as 53follows: 54 55Commit bitmap:: 56 57 A bitmap whose set bits describe the set of commits included in the 58 pseudo-merge's "merge" bitmap (as below). 59 60Merge bitmap:: 61 62 A bitmap whose set bits describe the reachability closure over the set 63 of commits in the pseudo-merge's "commits" bitmap (as above). An 64 identical bitmap would be generated for an octopus merge with the same 65 set of parents as described in the commits bitmap. 66 67Pseudo-merge bitmaps can accelerate bitmap traversals when all commits 68for a given pseudo-merge are listed on either side of the traversal, 69either directly (by explicitly asking for them as part of the `HAVES` 70or `WANTS`) or indirectly (by encountering them during a fill-in 71traversal). 72 73=== Use-cases 74 75For example, suppose there exists a pseudo-merge bitmap with a large 76number of commits, all of which are listed in the `WANTS` section of 77some bitmap traversal query. When pseudo-merge bitmaps are enabled, the 78bitmap machinery can quickly determine there is a pseudo-merge which 79satisfies some subset of the wanted objects on either side of the query. 80Then, we can inflate the EWAH-compressed bitmap, and `OR` it in to the 81resulting bitmap. By contrast, without pseudo-merge bitmaps, we would 82have to repeat the decompression and `OR`-ing step over a potentially 83large number of individual bitmaps, which can take proportionally more 84time. 85 86Another benefit of pseudo-merges arises when there is some combination 87of (a) a large number of references, with (b) poor bitmap coverage, and 88(c) deep, nested trees, making fill-in traversal relatively expensive. 89For example, suppose that there are a large enough number of tags where 90bitmapping each of the tags individually is infeasible. Without 91pseudo-merge bitmaps, computing the result of, say, `git rev-list 92--use-bitmap-index --count --objects --tags` would likely require a 93large amount of fill-in traversal. But when a large quantity of those 94tags are stored together in a pseudo-merge bitmap, the bitmap machinery 95can take advantage of the fact that we only care about the union of 96objects reachable from all of those tags, and answer the query much 97faster. 98 99=== Configuration 100 101Reference tips are grouped into different pseudo-merge groups according 102to two criteria. A reference name matches one or more of the defined 103pseudo-merge patterns, and optionally one or more capture groups within 104that pattern which further partition the group. 105 106Within a group, commits may be considered "stable", or "unstable" 107depending on their age. These are adjusted by setting the 108`bitmapPseudoMerge.<name>.stableThreshold` and 109`bitmapPseudoMerge.<name>.threshold` configuration values, respectively. 110 111All stable commits are grouped into pseudo-merges of equal size 112(`bitmapPseudoMerge.<name>.stableSize`). If the `stableSize` 113configuration is set to, say, 100, then the first 100 commits (ordered 114by committer date) which are older than the `stableThreshold` value will 115form one group, the next 100 commits will form another group, and so on. 116 117Among unstable commits, the pseudo-merge machinery will attempt to 118combine older commits into large groups as opposed to newer commits 119which will appear in smaller groups. This is based on the heuristic that 120references whose tip commit is older are less likely to be modified to 121point at a different commit than a reference whose tip commit is newer. 122 123The size of groups is determined by a power-law decay function, and the 124decay parameter roughly corresponds to "k" in `f(n) = C*n^(-k/100)`, 125where `f(n)` describes the size of the `n`-th pseudo-merge group. The 126sample rate controls what percentage of eligible commits are considered 127as candidates. The threshold parameter indicates the minimum age (so as 128to avoid including too-recent commits in a pseudo-merge group, making it 129less likely to be valid). The "maxMerges" parameter sets an upper-bound 130on the number of pseudo-merge commits an individual group 131 132The "stable"-related parameters control "stable" pseudo-merge groups, 133comprised of a fixed number of commits which are older than the 134configured "stable threshold" value and may be grouped together in 135chunks of "stableSize" in order of age. 136 137The exact configuration for pseudo-merges is as follows: 138 139include::config/bitmap-pseudo-merge.adoc[] 140 141=== Examples 142 143Suppose that you have a repository with a large number of references, 144and you want a bare-bones configuration of pseudo-merge bitmaps that 145will enhance bitmap coverage of the `refs/` namespace. You may start 146with a configuration like so: 147 148---- 149[bitmapPseudoMerge "all"] 150 pattern = "refs/" 151 threshold = now 152 stableThreshold = never 153 sampleRate = 100 154 maxMerges = 64 155---- 156 157This will create pseudo-merge bitmaps for all references, regardless of 158their age, and group them into 64 pseudo-merge commits. 159 160If you wanted to separate tags from branches when generating 161pseudo-merge commits, you would instead define the pattern with a 162capture group, like so: 163 164---- 165[bitmapPseudoMerge "all"] 166 pattern = "refs/(heads/tags)/" 167---- 168 169Suppose instead that you are working in a fork-network repository, with 170each fork specified by some numeric ID, and whose refs reside in 171`refs/virtual/NNN/` (where `NNN` is the numeric ID corresponding to some 172fork) in the network. In this instance, you may instead write something 173like: 174 175---- 176[bitmapPseudoMerge "all"] 177 pattern = "refs/virtual/([0-9]+)/(heads|tags)/" 178 threshold = now 179 stableThreshold = never 180 sampleRate = 100 181 maxMerges = 64 182---- 183 184Which would generate pseudo-merge group identifiers like "1234-heads", 185and "5678-tags" (for branches in fork "1234", and tags in remote "5678", 186respectively). 187 188SEE ALSO 189-------- 190linkgit:git-pack-objects[1] 191linkgit:git-repack[1] 192 193GIT 194--- 195Part of the linkgit:git[1] suite