Merge branch 'rj/doc-technical-fixes' · freshlybakedca.ke/git@411903c

+1

Documentation/Makefile

···
123 123   TECH_DOCS += technical/commit-graph
124 124   TECH_DOCS += technical/directory-rename-detection
125 125   TECH_DOCS += technical/hash-function-transition
    126 + TECH_DOCS += technical/large-object-promisors
126 127   TECH_DOCS += technical/long-running-process-protocol
127 128   TECH_DOCS += technical/multi-pack-index
128 129   TECH_DOCS += technical/packfile-uri

+19 -10

Documentation/technical/commit-graph.adoc

···
 39  39   Values 1-4 satisfy the requirements of parse_commit_gently().
 40  40
 41  41   There are two definitions of generation number:
     42 +
 42  43   1. Corrected committer dates (generation number v2)
 43  44   2. Topological levels (generation number v1)
 44  45
···
158 159   we enable fast writes of new commit data without rewriting the entire commit
159 160   history -- at least, most of the time.
160 161
161     - ## File Layout
    162 + File Layout
    163 + ~~~~~~~~~~~
162 164
163 165   A commit-graph chain uses multiple files, and we use a fixed naming convention
164 166   to organize these files. Each commit-graph file has a name
···
170 172
171 173   For example, if the `commit-graph-chain` file contains the lines
172 174
173     - ```
    175 + ----
174 176   {hash0}
175 177   {hash1}
176 178   {hash2}
177     - ```
    179 + ----
178 180
179 181   then the commit-graph chain looks like the following diagram:
180 182
···
213 215   `graph-{hash1}.graph` contains `{hash0}` while `graph-{hash2}.graph` contains
214 216   `{hash0}` and `{hash1}`.
215 217
216     - ## Merging commit-graph files
    218 + Merging commit-graph files
    219 + ~~~~~~~~~~~~~~~~~~~~~~~~~~
217 220
218 221   If we only added a new commit-graph file on every write, we would run into a
219 222   linear search problem through many commit-graph files. Instead, we use a merge
···
225 228   the commits in `graph-{hash1}` should be combined into a new `graph-{hash3}`
226 229   file.
227 230
    231 + ....
228 232   +---------------------+
229 233   | |
230 234   | (new commits) |
···
250 254   | |
251 255   | |
252 256   +-----------------------+
    257 + ....
253 258
254 259   During this process, the commits to write are combined, sorted and we write the
255 260   contents to a temporary file, all while holding a `commit-graph-chain.lock`
···
257 262   according to the computed `{hash3}`. Finally, we write the new chain data to
258 263   `commit-graph-chain.lock`:
259 264
260     - ```
    265 + ----
261 266   {hash3}
262 267   {hash0}
263     - ```
    268 + ----
264 269
265 270   We then close the lock-file.
266 271
267     - ## Merge Strategy
    272 + Merge Strategy
    273 + ~~~~~~~~~~~~~~
268 274
269 275   When writing a set of commits that do not exist in the commit-graph stack of
270 276   height N, we default to creating a new file at level N + 1. We then decide to
···
289 295   number of commits) could be extracted into config settings for full
290 296   flexibility.
291 297
292     - ## Handling Mixed Generation Number Chains
    298 + Handling Mixed Generation Number Chains
    299 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
293 300
294 301   With the introduction of generation number v2 and generation data chunk, the
295 302   following scenario is possible:
···
318 325   rewriting split commit-graph as a single file (`--split=replace`) creates a
319 326   single layer with corrected commit dates.
320 327
321     - ## Deleting graph-{hash} files
    328 + Deleting graph-\{hash\} files
    329 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
322 330
323 331   After a new tip file is written, some `graph-{hash}` files may no longer
324 332   be part of a chain. It is important to remove these files from disk, eventually.
···
333 341   defaults to zero, but can be changed using command-line arguments or a config
334 342   setting.
335 343
336     - ## Chains across multiple object directories
    344 + Chains across multiple object directories
    345 + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
337 346
338 347   In a repo with alternates, we look for the `commit-graph-chain` file starting
339 348   in the local object directory and then in each alternate. The first file that
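
Reviewer sketch (not part of the commit): the commit-graph hunks above describe a `commit-graph-chain` file that lists one hash per line, with each layer stored as `graph-{hash}.graph`. As a rough illustration of that naming convention only, the Python below maps the text of a chain file to the layer file names it refers to. The helper name `chain_files` and the placeholder hashes are assumptions made for this example; nothing here is Git's actual implementation.

```python
def chain_files(chain_text):
    """Return the layer file names named by a commit-graph-chain file.

    Hypothetical helper for illustration; it is not part of Git.
    Assumes one hash per line and ignores blank lines.
    """
    hashes = [line.strip() for line in chain_text.splitlines() if line.strip()]
    # Each layer follows the fixed naming convention graph-{hash}.graph.
    return [f"graph-{h}.graph" for h in hashes]


# Placeholder strings standing in for {hash0}, {hash1}, {hash2}.
example_chain = "hash0\nhash1\nhash2\n"
layers = chain_files(example_chain)
print(layers)
```

The order of the chain file is preserved, so the returned list can be walked in the same order the documentation's diagram stacks the layers.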

+32 -32

Documentation/technical/large-object-promisors.adoc

··· 34 34 35 35 https://lore.kernel.org/git/xmqqbkdometi.fsf@gitster.g/ 36 36 37 - 0) Non goals 38 - ------------ 37 + Non goals 38 + --------- 39 39 40 40 - We will not discuss those client side improvements here, as they 41 41 would require changes in different parts of Git than this effort. ··· 90 90 even more to host content with larger blobs or more large blobs 91 91 than currently. 92 92 93 - I) Issues with the current situation 94 - ------------------------------------ 93 + I Issues with the current situation 94 + ----------------------------------- 95 95 96 96 - Some statistics made on GitLab repos have shown that more than 75% 97 97 of the disk space is used by blobs that are larger than 1MB and ··· 138 138 complaining that these tools require significant effort to set up, 139 139 learn and use correctly. 140 140 141 - II) Main features of the "Large Object Promisors" solution 142 - ---------------------------------------------------------- 141 + II Main features of the "Large Object Promisors" solution 142 + --------------------------------------------------------- 143 143 144 144 The main features below should give a rough overview of how the 145 145 solution may work. Details about needed elements can be found in ··· 166 166 other objects. 167 167 168 168 Note 1 169 - ++++++ 169 + ^^^^^^ 170 170 171 171 To clarify, a LOP is a normal promisor remote, except that: 172 172 ··· 178 178 itself. 179 179 180 180 Note 2 181 - ++++++ 181 + ^^^^^^ 182 182 183 183 Git already makes it possible for a main remote to also be a promisor 184 184 remote storing both regular objects and large blobs for a client that ··· 186 186 to avoid that. 187 187 188 188 Rationale 189 - +++++++++ 189 + ^^^^^^^^^ 190 190 191 191 LOPs aim to be good at handling large blobs while main remotes are 192 192 already good at handling other objects. 
193 193 194 194 Implementation 195 - ++++++++++++++ 195 + ^^^^^^^^^^^^^^ 196 196 197 197 Git already has support for multiple promisor remotes, see 198 198 link:partial-clone.html#using-many-promisor-remotes[the partial clone documentation]. ··· 213 213 underlying object storage appear like a remote to Git. 214 214 215 215 Note 216 - ++++ 216 + ^^^^ 217 217 218 218 A LOP can be a promisor remote accessed using a remote helper by 219 219 both some clients and the main remote. 220 220 221 221 Rationale 222 - +++++++++ 222 + ^^^^^^^^^ 223 223 224 224 This looks like the simplest way to create LOPs that can cheaply 225 225 handle many large blobs. 226 226 227 227 Implementation 228 - ++++++++++++++ 228 + ^^^^^^^^^^^^^^ 229 229 230 230 Remote helpers are quite easy to write as shell scripts, but it might 231 231 be more efficient and maintainable to write them using other languages ··· 247 247 storage for large files handled by Git LFS. 248 248 249 249 Rationale 250 - +++++++++ 250 + ^^^^^^^^^ 251 251 252 252 This would simplify the server side if it wants to both use a LOP and 253 253 act as a Git LFS server. ··· 259 259 LOP all its blobs with a size over a configurable threshold. 260 260 261 261 Rationale 262 - +++++++++ 262 + ^^^^^^^^^ 263 263 264 264 This makes it easy to set things up and to clean things up. For 265 265 example, an admin could use this to manually convert a repo not using ··· 268 268 to regularly make sure the large blobs are moved to the LOP. 269 269 270 270 Implementation 271 - ++++++++++++++ 271 + ^^^^^^^^^^^^^^ 272 272 273 273 Using something based on `git repack --filter=...` to separate the 274 274 blobs we want to offload from the other Git objects could be a good ··· 284 284 perhaps pushed, into it. 285 285 286 286 Rationale 287 - +++++++++ 287 + ^^^^^^^^^ 288 288 289 289 A main remote containing many oversize blobs would defeat the purpose 290 290 of LOPs. 
291 291 292 292 Implementation 293 - ++++++++++++++ 293 + ^^^^^^^^^^^^^^ 294 294 295 295 The way to offload to a LOP discussed in 4) above can be used to 296 296 regularly offload oversize blobs. About preventing oversize blobs from ··· 326 326 fetch those blobs from the LOP to be able to serve the client. 327 327 328 328 Note 329 - ++++ 329 + ^^^^ 330 330 331 331 For fetches instead of clones, a protocol negotiation might not always 332 332 happen, see the "What about fetches?" FAQ entry below for details. 333 333 334 334 Rationale 335 - +++++++++ 335 + ^^^^^^^^^ 336 336 337 337 Security, configurability and efficiency of setting things up. 338 338 339 339 Implementation 340 - ++++++++++++++ 340 + ^^^^^^^^^^^^^^ 341 341 342 342 A "promisor-remote" protocol v2 capability looks like a good way to 343 343 implement this. The way the client and server use this capability ··· 356 356 but might not need anymore, to the LOP. 357 357 358 358 Note 359 - ++++ 359 + ^^^^ 360 360 361 361 It might depend on the context if it should be OK or not for clients 362 362 to offload large blobs they have created, instead of fetched, directly ··· 367 367 implementing this feature. 368 368 369 369 Rationale 370 - +++++++++ 370 + ^^^^^^^^^ 371 371 372 372 On the client, the easiest way to deal with unneeded large blobs is to 373 373 offload them. 374 374 375 375 Implementation 376 - ++++++++++++++ 376 + ^^^^^^^^^^^^^^ 377 377 378 378 This is very similar to what 4) above is about, except on the client 379 379 side instead of the server side. So a good solution to 4) could likely ··· 385 385 a LOP, it is likely, and can easily be confirmed, that the LOP still 386 386 has them, so that they can just be removed from the client. 
387 387 388 - III) Benefits of using LOPs 389 - --------------------------- 388 + III Benefits of using LOPs 389 + -------------------------- 390 390 391 391 Many benefits are related to the issues discussed in "I) Issues with 392 392 the current situation" above: ··· 406 406 407 407 - Reduced storage needs on the client side. 408 408 409 - IV) FAQ 410 - ------- 409 + IV FAQ 410 + ------ 411 411 412 412 What about using multiple LOPs on the server and client side? 413 413 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ ··· 533 533 on a promisor remote. 534 534 535 535 Regular fetch 536 - +++++++++++++ 536 + ^^^^^^^^^^^^^ 537 537 538 538 In a regular fetch, the client will contact the main remote and a 539 539 protocol negotiation will happen between them. It's a good thing that ··· 551 551 using, or not using, the same LOP(s) as last time. 552 552 553 553 "Backfill" or "lazy" fetch 554 - ++++++++++++++++++++++++++ 554 + ^^^^^^^^^^^^^^^^^^^^^^^^^^ 555 555 556 556 When there is a backfill fetch, the client doesn't necessarily contact 557 557 the main remote first. It will try to fetch from its promisor remotes ··· 576 576 token when performing a protocol negotiation with the main remote (see 577 577 section II.6 above). 578 578 579 - V) Future improvements 580 - ---------------------- 579 + V Future improvements 580 + --------------------- 581 581 582 582 It is expected that at the beginning using LOPs will be mostly worth 583 583 it either in a corporate context where the Git version that clients

+1

Documentation/technical/meson.build

···
13 13   'commit-graph.adoc',
14 14   'directory-rename-detection.adoc',
15 15   'hash-function-transition.adoc',
   16 + 'large-object-promisors.adoc',
16 17   'long-running-process-protocol.adoc',
17 18   'multi-pack-index.adoc',
18 19   'packfile-uri.adoc',

+78 -42

Documentation/technical/remembering-renames.adoc

··· 10 10 11 11 Outline: 12 12 13 - 0. Assumptions 13 + 1. Assumptions 14 14 15 - 1. How rebasing and cherry-picking work 15 + 2. How rebasing and cherry-picking work 16 16 17 - 2. Why the renames on MERGE_SIDE1 in any given pick are *always* a 17 + 3. Why the renames on MERGE_SIDE1 in any given pick are *always* a 18 18 superset of the renames on MERGE_SIDE1 for the next pick. 19 19 20 - 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also 20 + 4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also 21 21 a rename on MERGE_SIDE1 for the next pick 22 22 23 - 4. A detailed description of the counter-examples to #3. 23 + 5. A detailed description of the counter-examples to #4. 24 24 25 - 5. Why the special cases in #4 are still fully reasonable to use to pair 25 + 6. Why the special cases in #5 are still fully reasonable to use to pair 26 26 up files for three-way content merging in the merge machinery, and why 27 27 they do not affect the correctness of the merge. 28 28 29 - 6. Interaction with skipping of "irrelevant" renames 29 + 7. Interaction with skipping of "irrelevant" renames 30 30 31 - 7. Additional items that need to be cached 31 + 8. Additional items that need to be cached 32 32 33 - 8. How directory rename detection interacts with the above and why this 33 + 9. How directory rename detection interacts with the above and why this 34 34 optimization is still safe even if merge.directoryRenames is set to 35 35 "true". 36 36 37 37 38 - === 0. Assumptions === 38 + == 1. 
Assumptions == 39 39 40 40 There are two assumptions that will hold throughout this document: 41 41 ··· 44 44 45 45 * All merges are fully automatic 46 46 47 - and a third that will hold in sections 2-5 for simplicity, that I'll later 48 - address in section 8: 47 + and a third that will hold in sections 3-6 for simplicity, that I'll later 48 + address in section 9: 49 49 50 50 * No directory renames occur 51 51 ··· 77 77 stored on disk, and thus is thrown away as soon as the rebase or cherry 78 78 pick stops for the user to resolve the operation. 79 79 80 - The third assumption makes sections 2-5 simpler, and allows people to 80 + The third assumption makes sections 3-6 simpler, and allows people to 81 81 understand the basics of why this optimization is safe and effective, and 82 - then I can go back and address the specifics in section 8. It is probably 82 + then I can go back and address the specifics in section 9. It is probably 83 83 also worth noting that if directory renames do occur, then the default of 84 84 merge.directoryRenames being set to "conflict" means that the operation 85 85 will stop for users to resolve the conflicts and the cache will be thrown ··· 88 88 users will have set merge.directoryRenames to "true" to allow the merges to 89 89 continue to proceed automatically. The optimization is still safe with 90 90 this config setting, but we have to discuss a few more cases to show why; 91 - this discussion is deferred until section 8. 91 + this discussion is deferred until section 9. 92 92 93 93 94 - === 1. How rebasing and cherry-picking work === 94 + == 2. 
How rebasing and cherry-picking work == 95 95 96 96 Consider the following setup (from the git-rebase manpage): 97 97 98 + ------------ 98 99 A---B---C topic 99 100 / 100 101 D---E---F---G main 102 + ------------ 101 103 102 104 After rebasing or cherry-picking topic onto main, this will appear as: 103 105 106 + ------------ 104 107 A'--B'--C' topic 105 108 / 106 109 D---E---F---G main 110 + ------------ 107 111 108 112 The way the commits A', B', and C' are created is through a series of 109 113 merges, where rebase or cherry-pick sequentially uses each of the three ··· 111 115 in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For 112 116 this picture, the three commits for each of the three merges would be: 113 117 118 + .... 114 119 To create A': 115 120 MERGE_BASE: E 116 121 MERGE_SIDE1: G ··· 125 130 MERGE_BASE: B 126 131 MERGE_SIDE1: B' 127 132 MERGE_SIDE2: C 133 + .... 128 134 129 135 Sometimes, folks are surprised that these three-way merges are done. It 130 136 can be useful in understanding these three-way merges to view them in a ··· 138 144 B, B', and C, at least the parts before you decide to record a commit. 139 145 140 146 141 - === 2. Why the renames on MERGE_SIDE1 in any given pick are always a === 142 - === superset of the renames on MERGE_SIDE1 for the next pick. === 147 + == 3. Why the renames on MERGE_SIDE1 in any given pick are always a superset of the renames on MERGE_SIDE1 for the next pick. == 143 148 144 149 The merge machinery uses the filenames it is fed from MERGE_BASE, 145 150 MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different ··· 156 161 First, let's remember what commits are involved in the first and second 157 162 picks of the cherry-pick or rebase sequence: 158 163 164 + .... 159 165 To create A': 160 166 MERGE_BASE: E 161 167 MERGE_SIDE1: G ··· 165 171 MERGE_BASE: A 166 172 MERGE_SIDE1: A' 167 173 MERGE_SIDE2: B 174 + .... 
168 175 169 176 So, in particular, we need to show that the renames between E and G are a 170 177 superset of those between A and A'. ··· 181 188 and G are a superset of those between A and A'. 182 189 183 190 184 - === 3. Why any rename on MERGE_SIDE1 in any given pick is _almost_ === 185 - === always also a rename on MERGE_SIDE1 for the next pick. === 191 + == 4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also a rename on MERGE_SIDE1 for the next pick. == 186 192 187 193 Let's again look at the first two picks: 188 194 195 + .... 189 196 To create A': 190 197 MERGE_BASE: E 191 198 MERGE_SIDE1: G ··· 195 202 MERGE_BASE: A 196 203 MERGE_SIDE1: A' 197 204 MERGE_SIDE2: B 205 + .... 198 206 199 207 Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e. 200 208 any given rename from E to G. Let's use the filenames 'oldfile' and 201 209 'newfile' for demonstration purposes. That first pick will function as 202 210 follows; when the rename is detected, the merge machinery will do a 203 211 three-way content merge of the following: 212 + 213 + .... 204 214 E:oldfile 205 215 G:newfile 206 216 A:oldfile 217 + .... 218 + 207 219 and produce a new result: 220 + 221 + .... 208 222 A':newfile 223 + .... 209 224 210 225 Note above that I've assumed that E->A did not rename oldfile. If that 211 226 side did rename, then we most likely have a rename/rename(1to2) conflict ··· 254 269 detectable as renames almost always. 255 270 256 271 257 - === 4. A detailed description of the counter-examples to #3. === 272 + == 5. A detailed description of the counter-examples to #4. == 258 273 259 - We already noted in section 3 that rename/rename(1to1) (i.e. both sides 274 + We already noted in section 4 that rename/rename(1to1) (i.e. both sides 260 275 renaming a file the same way) was one counter-example. 
The more 261 276 interesting bit, though, is why did we need to use the "almost" qualifier 262 277 when stating that A:oldfile and A':newfile are "almost" always detectable 263 278 as renames? 264 279 265 - Let's repeat an earlier point that section 3 made: 280 + Let's repeat an earlier point that section 4 made: 266 281 282 + .... 267 283 A':newfile was created by applying the changes between E:oldfile and 268 284 G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were 269 285 <50% of the size of E:oldfile. 286 + .... 270 287 271 288 If those changes that were <50% of the size of E:oldfile are also <50% of 272 289 the size of A:oldfile, then A:oldfile and A':newfile will be detectable as ··· 276 293 detect A:oldfile and A':newfile as renames. 277 294 278 295 Here's an example where that can happen: 296 + 279 297 * E:oldfile had 20 lines 280 298 * G:newfile added 10 new lines at the beginning of the file 281 299 * A:oldfile kept the first 3 lines of the file, and deleted all the rest 300 + 282 301 then 302 + 303 + .... 283 304 => A':newfile would have 13 lines, 3 of which matches those in A:oldfile. 284 - E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and 285 - A':newfile would not be. 305 + E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and 306 + A':newfile would not be. 307 + .... 286 308 287 309 288 - === 5. Why the special cases in #4 are still fully reasonable to use to === 289 - === pair up files for three-way content merging in the merge machinery, === 290 - === and why they do not affect the correctness of the merge. === 310 + == 6. Why the special cases in #5 are still fully reasonable to use to pair up files for three-way content merging in the merge machinery, and why they do not affect the correctness of the merge. == 291 311 292 312 In the rename/rename(1to1) case, A:newfile and A':newfile are not renames 293 313 since they use the *same* filename. 
However, files with the same filename ··· 295 315 machinery has never employed break detection). The interesting 296 316 counter-example case is thus not the rename/rename(1to1) case, but the case 297 317 where A did not rename oldfile. That was the case that we spent most of 298 - the time discussing in sections 3 and 4. The remainder of this section 318 + the time discussing in sections 4 and 5. The remainder of this section 299 319 will be devoted to that case as well. 300 320 301 321 So, even if A:oldfile and A':newfile aren't detectable as renames, why is 302 322 it still reasonable to pair them up for three-way content merging in the 303 323 merge machinery? There are multiple reasons: 304 324 305 - * As noted in sections 3 and 4, the diff between A:oldfile and A':newfile 325 + * As noted in sections 4 and 5, the diff between A:oldfile and A':newfile 306 326 is *exactly* the same as the diff between E:oldfile and G:newfile. The 307 327 latter pair were detected as renames, so it seems unlikely to surprise 308 328 users for us to treat A:oldfile and A':newfile as renames. ··· 394 414 optimization than without. 395 415 396 416 397 - === 6. Interaction with skipping of "irrelevant" renames === 417 + == 7. Interaction with skipping of "irrelevant" renames == 398 418 399 419 Previous optimizations involved skipping rename detection for paths 400 420 considered to be "irrelevant". See for example the following commits: ··· 421 441 already detected renames. 422 442 423 443 424 - === 7. Additional items that need to be cached === 444 + == 8. Additional items that need to be cached == 425 445 426 446 It turns out we have to cache more than just renames; we also cache: 427 447 448 + .... 428 449 A) non-renames (i.e. unpaired deletes) 429 450 B) counts of renames within directories 430 451 C) sources that were marked as RELEVANT_LOCATION, but which were 431 452 downgraded to RELEVANT_NO_MORE 432 453 D) the toplevel trees involved in the merge 454 + .... 
433 455 434 456 These are all stored in struct rename_info, and respectively appear in 457 + 435 458 * cached_pairs (along side actual renames, just with a value of NULL) 436 459 * dir_rename_counts 437 460 * cached_irrelevant 438 461 * merge_trees 439 462 440 - The reason for (A) comes from the irrelevant renames skipping 441 - optimization discussed in section 6. The fact that irrelevant renames 463 + The reason for `(A)` comes from the irrelevant renames skipping 464 + optimization discussed in section 7. The fact that irrelevant renames 442 465 are skipped means we only get a subset of the potential renames 443 466 detected and subsequent commits may need to run rename detection on 444 467 the upstream side on a subset of the remaining renames (to get the ··· 447 470 repeatedly check that those paths remain unpaired on the upstream side 448 471 with every commit we are transplanting. 449 472 450 - The reason for (B) is that diffcore_rename_extended() is what 473 + The reason for `(B)` is that diffcore_rename_extended() is what 451 474 generates the counts of renames by directory which is needed in 452 475 directory rename detection, and if we don't run 453 476 diffcore_rename_extended() again then we need to have the output from 454 477 it, including dir_rename_counts, from the previous run. 455 478 456 - The reason for (C) is that merge-ort's tree traversal will again think 479 + The reason for `(C)` is that merge-ort's tree traversal will again think 457 480 those paths are relevant (marking them as RELEVANT_LOCATION), but the 458 481 fact that they were downgraded to RELEVANT_NO_MORE means that 459 482 dir_rename_counts already has the information we need for directory 460 483 rename detection. (A path which becomes RELEVANT_CONTENT in a 461 484 subsequent commit will be removed from cached_irrelevant.) 
462 485 463 - The reason for (D) is that is how we determine whether the remember 486 + The reason for `(D)` is that is how we determine whether the remember 464 487 renames optimization can be used. In particular, remembering that our 465 488 sequence of merges looks like: 466 489 490 + .... 467 491 Merge 1: 468 492 MERGE_BASE: E 469 493 MERGE_SIDE1: G ··· 475 499 MERGE_SIDE1: A' 476 500 MERGE_SIDE2: B 477 501 => Creates B' 502 + .... 478 503 479 504 It is the fact that the trees A and A' appear both in Merge 1 and in 480 505 Merge 2, with A as a parent of A' that allows this optimization. So ··· 482 507 time. 483 508 484 509 485 - === 8. How directory rename detection interacts with the above and === 486 - === why this optimization is still safe even if === 487 - === merge.directoryRenames is set to "true". === 510 + == 9. How directory rename detection interacts with the above and why this optimization is still safe even if merge.directoryRenames is set to "true". == 488 511 489 512 As noted in the assumptions section: 490 513 514 + .... 491 515 """ 492 516 ...if directory renames do occur, then the default of 493 517 merge.directoryRenames being set to "conflict" means that the operation ··· 497 521 is that some users will have set merge.directoryRenames to "true" to 498 522 allow the merges to continue to proceed automatically. 499 523 """ 524 + .... 500 525 501 526 Let's remember that we need to look at how any given pick affects the next 502 527 one. So let's again use the first two picks from the diagram in section 503 528 one: 504 529 530 + .... 505 531 First pick does this three-way merge: 506 532 MERGE_BASE: E 507 533 MERGE_SIDE1: G ··· 513 539 MERGE_SIDE1: A' 514 540 MERGE_SIDE2: B 515 541 => creates B' 542 + .... 516 543 517 544 Now, directory rename detection exists so that if one side of history 518 545 renames a directory, and the other side adds a new file to the old ··· 545 572 concerned; see the assumptions section). 
Two interesting sub-notes 546 573 about these counts: 547 574 548 - * If we need to perform rename-detection again on the given side (e.g. 575 + ** If we need to perform rename-detection again on the given side (e.g. 549 576 some paths are relevant for rename detection that weren't before), 550 577 then we clear dir_rename_counts and recompute it, making use of 551 578 cached_pairs. The reason it is important to do this is optimizations ··· 556 583 easiest way to "fix up" dir_rename_counts in such cases is to just 557 584 recompute it. 558 585 559 - * If we prune rename/rename(1to1) entries from the cache, then we also 586 + ** If we prune rename/rename(1to1) entries from the cache, then we also 560 587 need to update dir_rename_counts to decrement the counts for the 561 588 involved directory and any relevant parent directories (to undo what 562 589 update_dir_rename_counts() in diffcore-rename.c incremented when the ··· 578 605 579 606 Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir 580 607 608 + .... 581 609 This case looks like this: 582 610 583 611 MERGE_BASE: E, Has olddir/ ··· 595 623 * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile 596 624 Given the cached rename noted above, the second merge can proceed as 597 625 expected without needing to perform rename detection from A -> A'. 626 + .... 598 627 599 628 Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir 600 629 630 + .... 601 631 This case looks like this: 632 + 602 633 MERGE_BASE: E oldfile, olddir/ 603 634 MERGE_SIDE1: G oldfile, olddir/ -> newdir/ 604 635 MERGE_SIDE2: A oldfile -> olddir/newfile ··· 617 648 618 649 Given the cached rename noted above, the second merge can proceed as 619 650 expected without needing to perform rename detection from A -> A'. 651 + .... 620 652 621 653 Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir 622 654 655 + .... 
623 656 This case looks like this: 624 657 625 658 MERGE_BASE: E, Has olddir/ ··· 635 668 In this case, with the optimization, note that after the first commit there 636 669 were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed. 637 670 But the second merge didn't need any renames so this is fine. 671 + .... 638 672 639 673 Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir 640 674 675 + .... 641 676 This case looks like this: 642 677 643 678 MERGE_BASE: E, Has olddir/ ··· 658 693 659 694 Given the cached rename noted above, the second merge can proceed as 660 695 expected without needing to perform rename detection from A -> A'. 696 + .... 661 697 662 698 Finally, I'll just note here that interactions with the 663 699 skip-irrelevant-renames optimization means we sometimes don't detect
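
Reviewer sketch (not part of the commit): two ideas from the remembering-renames hunks above can be checked with a few lines of Python. The first function reproduces the MERGE_BASE / MERGE_SIDE1 / MERGE_SIDE2 triples listed for each pick when rebasing A, B, C (forked from E) onto G. The second replays the section 5 counter-example (a 20-line E:oldfile, 10 lines added at the top in G:newfile, and A:oldfile keeping only the first 3 lines). The names `pick_merges` and `line_similarity` are hypothetical, and the similarity measure (shared lines divided by the larger file's line count) is a deliberate simplification assumed for this example; it is not how Git scores renames. Only the 50% threshold comes from the text.

```python
def pick_merges(onto, base, picks):
    """Yield (MERGE_BASE, MERGE_SIDE1, MERGE_SIDE2, result) for each pick.

    Hypothetical helper: commits are plain strings, and the rebased copy
    of commit X is written X'.
    """
    merge_base, side1 = base, onto
    for commit in picks:
        result = commit + "'"
        yield (merge_base, side1, commit, result)
        # The next pick uses this commit as its base and the new copy as side 1.
        merge_base, side1 = commit, result


def line_similarity(old, new):
    """Shared lines divided by the larger file's line count.

    Simplified stand-in for a rename similarity score (assumption);
    it relies on every line in the example being unique.
    """
    shared = len(set(old) & set(new))
    return shared / max(len(old), len(new))


merges = list(pick_merges(onto="G", base="E", picks=["A", "B", "C"]))
for merge in merges:
    print(merge)

added = [f"g{i}" for i in range(10)]      # 10 new lines at the top
e_old = [f"e{i}" for i in range(20)]      # E:oldfile, 20 lines
g_new = added + e_old                     # G:newfile, 30 lines
a_old = e_old[:3]                         # A:oldfile keeps the first 3 lines
a_new = added + a_old                     # A':newfile, 13 lines

print(line_similarity(e_old, g_new))      # 20 of 30 lines shared
print(line_similarity(a_old, a_new))      # 3 of 13 lines shared
```

Under this simplified measure the E to G pair stays above the 50% threshold while the A to A' pair falls well below it, which matches the text's point that the second pair would not be detected as a rename on its own.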

+362 -314

Documentation/technical/sparse-checkout.adoc

··· 14 14 * Reference Emails 15 15 16 16 17 - === Terminology === 17 + == Terminology == 18 18 19 - cone mode: one of two modes for specifying the desired subset of files 19 + *`cone mode`*:: 20 + one of two modes for specifying the desired subset of files 20 21 in a sparse-checkout. In cone-mode, the user specifies 21 22 directories (getting both everything under that directory as 22 23 well as everything in leading directories), while in non-cone 23 24 mode, the user specifies gitignore-style patterns. Controlled 24 25 by the --[no-]cone option to sparse-checkout init|set. 25 26 26 - SKIP_WORKTREE: When tracked files do not match the sparse specification and 27 + *`SKIP_WORKTREE`*:: 28 + When tracked files do not match the sparse specification and 27 29 are removed from the working tree, the file in the index is marked 28 30 with a SKIP_WORKTREE bit. Note that if a tracked file has the 29 31 SKIP_WORKTREE bit set but the file is later written by the user to 30 32 the working tree anyway, the SKIP_WORKTREE bit will be cleared at 31 33 the beginning of any subsequent Git operation. 32 - 33 - Most sparse checkout users are unaware of this implementation 34 - detail, and the term should generally be avoided in user-facing 35 - descriptions and command flags. Unfortunately, prior to the 36 - `sparse-checkout` subcommand this low-level detail was exposed, 37 - and as of time of writing, is still exposed in various places. 34 + + 35 + Most sparse checkout users are unaware of this implementation 36 + detail, and the term should generally be avoided in user-facing 37 + descriptions and command flags. Unfortunately, prior to the 38 + `sparse-checkout` subcommand this low-level detail was exposed, 39 + and as of time of writing, is still exposed in various places. 
38 40 39 - sparse-checkout: a subcommand in git used to reduce the files present in 41 + *`sparse-checkout`*:: 42 + a subcommand in git used to reduce the files present in 40 43 the working tree to a subset of all tracked files. Also, the 41 44 name of the file in the $GIT_DIR/info directory used to track 42 45 the sparsity patterns corresponding to the user's desired 43 46 subset. 44 47 45 - sparse cone: see cone mode 48 + *`sparse cone`*:: see cone mode 46 49 47 - sparse directory: An entry in the index corresponding to a directory, which 50 + *`sparse directory`*:: 51 + An entry in the index corresponding to a directory, which 48 52 appears in the index instead of all the files under that directory 49 53 that would normally appear. See also sparse-index. Something that 50 54 can cause confusion is that the "sparse directory" does NOT match ··· 52 56 working tree. May be renamed in the future (e.g. to "skipped 53 57 directory"). 54 58 55 - sparse index: A special mode for sparse-checkout that also makes the 59 + *`sparse index`*:: 60 + A special mode for sparse-checkout that also makes the 56 61 index sparse by recording a directory entry in lieu of all the 57 62 files underneath that directory (thus making that a "skipped 58 63 directory" which unfortunately has also been called a "sparse ··· 60 65 directories. Controlled by the --[no-]sparse-index option to 61 66 init|set|reapply. 62 67 63 - sparsity patterns: patterns from $GIT_DIR/info/sparse-checkout used to 68 + *`sparsity patterns`*:: 69 + patterns from $GIT_DIR/info/sparse-checkout used to 64 70 define the set of files of interest. A warning: It is easy to 65 71 over-use this term (or the shortened "patterns" term), for two 66 72 reasons: (1) users in cone mode specify directories rather than ··· 70 76 transiently differ in the working tree or index from the sparsity 71 77 patterns (see "Sparse specification vs. sparsity patterns"). 
72 78 73 - sparse specification: The set of paths in the user's area of focus. This 79 + *`sparse specification`*:: 80 + The set of paths in the user's area of focus. This 74 81 is typically just the tracked files that match the sparsity 75 82 patterns, but the sparse specification can temporarily differ and 76 83 include additional files. (See also "Sparse specification ··· 87 94 * If working with the index and the working copy, the sparse 88 95 specification is the union of the paths from above. 89 96 90 - vivifying: When a command restores a tracked file to the working tree (and 97 + *`vivifying`*:: 98 + When a command restores a tracked file to the working tree (and 91 99 hopefully also clears the SKIP_WORKTREE bit in the index for that 92 100 file), this is referred to as "vivifying" the file. 93 101 94 102 95 - === Purpose of sparse-checkouts === 103 + == Purpose of sparse-checkouts == 96 104 97 105 sparse-checkouts exist to allow users to work with a subset of their 98 106 files. ··· 120 128 half dozen different ways. 
Let's start by considering the high level 121 129 usecases: 122 130 123 - A) Users are _only_ interested in the sparse portion of the repo 124 - 125 - A*) Users are _only_ interested in the sparse portion of the repo 126 - that they have downloaded so far 127 - 128 - B) Users want a sparse working tree, but are working in a larger whole 129 - 130 - C) sparse-checkout is a behind-the-scenes implementation detail allowing 131 + [horizontal] 132 + A):: Users are _only_ interested in the sparse portion of the repo 133 + A*):: Users are _only_ interested in the sparse portion of the repo 134 + that they have downloaded so far 135 + B):: Users want a sparse working tree, but are working in a larger whole 136 + C):: sparse-checkout is a behind-the-scenes implementation detail allowing 131 137 Git to work with a specially crafted in-house virtual file system; 132 138 users are actually working with a "full" working tree that is 133 139 lazily populated, and sparse-checkout helps with the lazy population ··· 136 142 It may be worth explaining each of these in a bit more detail: 137 143 138 144 139 - (Behavior A) Users are _only_ interested in the sparse portion of the repo 145 + === (Behavior A) Users are _only_ interested in the sparse portion of the repo 140 146 141 147 These folks might know there are other things in the repository, but 142 148 don't care. They are uninterested in other parts of the repository, and ··· 163 169 after a merge or pull) can lead to worries about local repository size 164 170 growing unnecessarily[10]. 
165 171 166 - (Behavior A*) Users are _only_ interested in the sparse portion of the repo 167 - that they have downloaded so far (a variant on the first usecase) 172 + === (Behavior A*) Users are _only_ interested in the sparse portion of the repo that they have downloaded so far (a variant on the first usecase) 168 173 169 174 This variant is driven by folks who using partial clones together with 170 175 sparse checkouts and do disconnected development (so far sounding like a ··· 173 178 through history within their sparse specification may be too much, so they 174 179 only download some. They would still like operations to succeed without 175 180 network connectivity, though, so things like `git log -S${SEARCH_TERM} -p` 176 - or `git grep ${SEARCH_TERM} OLDREV ` would need to be prepared to provide 181 + or `git grep ${SEARCH_TERM} OLDREV` would need to be prepared to provide 177 182 partial results that depend on what happens to have been downloaded. 178 183 179 184 This variant could be viewed as Behavior A with the sparse specification 180 185 for history querying operations modified from "sparsity patterns" to 181 186 "sparsity patterns limited to the blobs we have already downloaded". 182 187 183 - (Behavior B) Users want a sparse working tree, but are working in a 184 - larger whole 188 + === (Behavior B) Users want a sparse working tree, but are working in a larger whole 185 189 186 190 Stolee described this usecase this way[11]: 187 191 ··· 229 233 prefer getting "unrelated" results from their history queries over having 230 234 slow commands. 231 235 232 - (Behavior C) sparse-checkout is an implementational detail supporting a 233 - special VFS. 236 + === (Behavior C) sparse-checkout is an implementational detail supporting a special VFS. 234 237 235 238 This usecase goes slightly against the traditional definition of 236 239 sparse-checkout in that it actually tries to present a full or dense ··· 255 258 all files are present. 
256 259 257 260 258 - === Usecases of primary concern === 261 + == Usecases of primary concern == 259 262 260 263 Most of the rest of this document will focus on Behavior A and Behavior 261 264 B. Some notes about the other two cases and why we are not focusing on 262 265 them: 263 266 264 - (Behavior A*) 267 + === (Behavior A*) 265 268 266 269 Supporting this usecase is estimated to be difficult and a lot of work. 267 270 There are no plans to implement it currently, but it may be a potential ··· 275 278 sparse specification to restrict it to already-downloaded blobs. The hard 276 279 part is in making commands capable of respecting that modified definition. 277 280 278 - (Behavior C) 281 + === (Behavior C) 279 282 280 283 This usecase violates some of the early sparse-checkout documented 281 284 assumptions (since files marked as SKIP_WORKTREE will be displayed to users ··· 300 303 patches that break things for the real Behavior B folks. 301 304 302 305 303 - === Oversimplified mental models === 306 + == Oversimplified mental models == 304 307 305 308 An oversimplification of the differences in the above behaviors is: 306 309 307 - Behavior A: Restrict worktree and history operations to sparse specification 308 - Behavior B: Restrict worktree operations to sparse specification; have any 309 - history operations work across all files 310 - Behavior C: Do not restrict either worktree or history operations to the 311 - sparse specification...with the exception of branch checkouts or 312 - switches which avoid writing files that will match the index so 313 - they can later lazily be populated instead. 
310 + (Behavior A):: Restrict worktree and history operations to sparse specification 311 + (Behavior B):: Restrict worktree operations to sparse specification; have any 312 + history operations work across all files 313 + (Behavior C):: Do not restrict either worktree or history operations to the 314 + sparse specification...with the exception of branch checkouts or 315 + switches which avoid writing files that will match the index so 316 + they can later lazily be populated instead. 314 317 315 318 316 - === Desired behavior === 319 + == Desired behavior == 317 320 318 321 As noted previously, despite the simple idea of just working with a subset 319 322 of files, there are a range of different behavioral changes that need to be ··· 326 329 327 330 * Commands behaving the same regardless of high-level use-case 328 331 329 - * commands that only look at files within the sparsity specification 332 + ** commands that only look at files within the sparsity specification 330 333 331 - * diff (without --cached or REVISION arguments) 332 - * grep (without --cached or REVISION arguments) 333 - * diff-files 334 + *** diff (without --cached or REVISION arguments) 335 + *** grep (without --cached or REVISION arguments) 336 + *** diff-files 334 337 335 - * commands that restore files to the working tree that match sparsity 338 + ** commands that restore files to the working tree that match sparsity 336 339 patterns, and remove unmodified files that don't match those 337 340 patterns: 338 341 339 - * switch 340 - * checkout (the switch-like half) 341 - * read-tree 342 - * reset --hard 342 + *** switch 343 + *** checkout (the switch-like half) 344 + *** read-tree 345 + *** reset --hard 343 346 344 - * commands that write conflicted files to the working tree, but otherwise 347 + ** commands that write conflicted files to the working tree, but otherwise 345 348 will omit writing files to the working tree that do not match the 346 349 sparsity patterns: 347 350 348 - * merge 349 
- * rebase 350 - * cherry-pick 351 - * revert 351 + *** merge 352 + *** rebase 353 + *** cherry-pick 354 + *** revert 352 355 353 - * `am` and `apply --cached` should probably be in this section but 356 + *** `am` and `apply --cached` should probably be in this section but 354 357 are buggy (see the "Known bugs" section below) 355 358 356 359 The behavior for these commands somewhat depends upon the merge 357 360 strategy being used: 358 - * `ort` behaves as described above 359 - * `octopus` and `resolve` will always vivify any file changed in the merge 361 + 362 + *** `ort` behaves as described above 363 + *** `octopus` and `resolve` will always vivify any file changed in the merge 360 364 relative to the first parent, which is rather suboptimal. 361 365 362 366 It is also important to note that these commands WILL update the index ··· 372 376 specification and the sparsity patterns (much like the commands in the 373 377 previous section). 374 378 375 - * commands that always ignore sparsity since commits must be full-tree 379 + ** commands that always ignore sparsity since commits must be full-tree 376 380 377 - * archive 378 - * bundle 379 - * commit 380 - * format-patch 381 - * fast-export 382 - * fast-import 383 - * commit-tree 381 + *** archive 382 + *** bundle 383 + *** commit 384 + *** format-patch 385 + *** fast-export 386 + *** fast-import 387 + *** commit-tree 384 388 385 - * commands that write any modified file to the working tree (conflicted 389 + ** commands that write any modified file to the working tree (conflicted 386 390 or not, and whether those paths match sparsity patterns or not): 387 391 388 - * stash 389 - * apply (without `--index` or `--cached`) 392 + *** stash 393 + *** apply (without `--index` or `--cached`) 390 394 391 395 * Commands that may slightly differ for behavior A vs. behavior B: 392 396 ··· 394 398 behaviors, but may differ in verbosity and types of warning and error 395 399 messages. 
396 400 397 - * commands that make modifications to which files are tracked: 398 - * add 399 - * rm 400 - * mv 401 - * update-index 401 + ** commands that make modifications to which files are tracked: 402 + 403 + *** add 404 + *** rm 405 + *** mv 406 + *** update-index 402 407 403 408 The fact that files can move between the 'tracked' and 'untracked' 404 409 categories means some commands will have to treat untracked files 405 410 differently. But if we have to treat untracked files differently, 406 411 then additional commands may also need changes: 407 412 408 - * status 409 - * clean 413 + *** status 414 + *** clean 410 415 411 416 In particular, `status` may need to report any untracked files outside 412 417 the sparsity specification as an erroneous condition (especially to ··· 420 425 may need to ignore the sparse specification by its nature. Also, its 421 426 current --[no-]ignore-skip-worktree-entries default is totally bogus. 422 427 423 - * commands for manually tweaking paths in both the index and the working tree 424 - * `restore` 425 - * the restore-like half of `checkout` 428 + ** commands for manually tweaking paths in both the index and the working tree 429 + 430 + *** `restore` 431 + *** the restore-like half of `checkout` 426 432 427 433 These commands should be similar to add/rm/mv in that they should 428 434 only operate on the sparse specification by default, and require a ··· 433 439 434 440 * Commands that significantly differ for behavior A vs. 
behavior B: 435 441 436 - * commands that query history 437 - * diff (with --cached or REVISION arguments) 438 - * grep (with --cached or REVISION arguments) 439 - * show (when given commit arguments) 440 - * blame (only matters when one or more -C flags are passed) 441 - * and annotate 442 - * log 443 - * whatchanged (may not exist anymore) 444 - * ls-files 445 - * diff-index 446 - * diff-tree 447 - * ls-tree 442 + ** commands that query history 443 + 444 + *** diff (with --cached or REVISION arguments) 445 + *** grep (with --cached or REVISION arguments) 446 + *** show (when given commit arguments) 447 + *** blame (only matters when one or more -C flags are passed) 448 + **** and annotate 449 + *** log 450 + *** whatchanged (may not exist anymore) 451 + *** ls-files 452 + *** diff-index 453 + *** diff-tree 454 + *** ls-tree 448 455 449 456 Note: for log and whatchanged, revision walking logic is unaffected 450 457 but displaying of patches is affected by scoping the command to the ··· 458 465 459 466 * Commands I don't know how to classify 460 467 461 - * range-diff 468 + ** range-diff 462 469 463 470 Is this like `log` or `format-patch`? 
464 471 465 - * cherry 472 + ** cherry 466 473 467 474 See range-diff 468 475 469 476 * Commands unaffected by sparse-checkouts 470 477 471 - * shortlog 472 - * show-branch 473 - * rev-list 474 - * bisect 478 + ** shortlog 479 + ** show-branch 480 + ** rev-list 481 + ** bisect 475 482 476 - * branch 477 - * describe 478 - * fetch 479 - * gc 480 - * init 481 - * maintenance 482 - * notes 483 - * pull (merge & rebase have the necessary changes) 484 - * push 485 - * submodule 486 - * tag 483 + ** branch 484 + ** describe 485 + ** fetch 486 + ** gc 487 + ** init 488 + ** maintenance 489 + ** notes 490 + ** pull (merge & rebase have the necessary changes) 491 + ** push 492 + ** submodule 493 + ** tag 487 494 488 - * config 489 - * filter-branch (works in separate checkout without sparse-checkout setup) 490 - * pack-refs 491 - * prune 492 - * remote 493 - * repack 494 - * replace 495 + ** config 496 + ** filter-branch (works in separate checkout without sparse-checkout setup) 497 + ** pack-refs 498 + ** prune 499 + ** remote 500 + ** repack 501 + ** replace 495 502 496 - * bugreport 497 - * count-objects 498 - * fsck 499 - * gitweb 500 - * help 501 - * instaweb 502 - * merge-tree (doesn't touch worktree or index, and merges always compute full-tree) 503 - * rerere 504 - * verify-commit 505 - * verify-tag 503 + ** bugreport 504 + ** count-objects 505 + ** fsck 506 + ** gitweb 507 + ** help 508 + ** instaweb 509 + ** merge-tree (doesn't touch worktree or index, and merges always compute full-tree) 510 + ** rerere 511 + ** verify-commit 512 + ** verify-tag 506 513 507 - * commit-graph 508 - * hash-object 509 - * index-pack 510 - * mktag 511 - * mktree 512 - * multi-pack-index 513 - * pack-objects 514 - * prune-packed 515 - * symbolic-ref 516 - * unpack-objects 517 - * update-ref 518 - * write-tree (operates on index, possibly optimized to use sparse dir entries) 514 + ** commit-graph 515 + ** hash-object 516 + ** index-pack 517 + ** mktag 518 + ** mktree 519 + ** 
multi-pack-index 520 + ** pack-objects 521 + ** prune-packed 522 + ** symbolic-ref 523 + ** unpack-objects 524 + ** update-ref 525 + ** write-tree (operates on index, possibly optimized to use sparse dir entries) 519 526 520 - * for-each-ref 521 - * get-tar-commit-id 522 - * ls-remote 523 - * merge-base (merges are computed full tree, so merge base should be too) 524 - * name-rev 525 - * pack-redundant 526 - * rev-parse 527 - * show-index 528 - * show-ref 529 - * unpack-file 530 - * var 531 - * verify-pack 527 + ** for-each-ref 528 + ** get-tar-commit-id 529 + ** ls-remote 530 + ** merge-base (merges are computed full tree, so merge base should be too) 531 + ** name-rev 532 + ** pack-redundant 533 + ** rev-parse 534 + ** show-index 535 + ** show-ref 536 + ** unpack-file 537 + ** var 538 + ** verify-pack 532 539 533 - * <Everything under 'Interacting with Others' in 'git help --all'> 534 - * <Everything under 'Low-level...Syncing' in 'git help --all'> 535 - * <Everything under 'Low-level...Internal Helpers' in 'git help --all'> 536 - * <Everything under 'External commands' in 'git help --all'> 540 + ** <Everything under 'Interacting with Others' in 'git help --all'> 541 + ** <Everything under 'Low-level...Syncing' in 'git help --all'> 542 + ** <Everything under 'Low-level...Internal Helpers' in 'git help --all'> 543 + ** <Everything under 'External commands' in 'git help --all'> 537 544 538 545 * Commands that might be affected, but who cares? 539 546 540 - * merge-file 541 - * merge-index 542 - * gitk? 547 + ** merge-file 548 + ** merge-index 549 + ** gitk? 
543 550 544 551 545 - === Behavior classes === 552 + == Behavior classes == 546 553 547 554 From the above there are a few classes of behavior: 548 555 ··· 573 580 574 581 Commands in this class generally behave like the "restrict" class, 575 582 except that: 576 - (1) they will ignore the sparse specification and write files with 577 - conflicts to the working tree (thus temporarily expanding the 578 - sparse specification to include such files.) 579 - (2) they are grouped with commands which move to a new commit, since 580 - they often create a commit and then move to it, even though we 581 - know there are many exceptions to moving to the new commit. (For 582 - example, the user may rebase a commit that becomes empty, or have 583 - a cherry-pick which conflicts, or a user could run `merge 584 - --no-commit`, and we also view `apply --index` kind of like `am 585 - --no-commit`.) As such, these commands can make changes to index 586 - files outside the sparse specification, though they'll mark such 587 - files with SKIP_WORKTREE. 583 + 584 + (1) they will ignore the sparse specification and write files with 585 + conflicts to the working tree (thus temporarily expanding the 586 + sparse specification to include such files.) 587 + (2) they are grouped with commands which move to a new commit, since 588 + they often create a commit and then move to it, even though we 589 + know there are many exceptions to moving to the new commit. (For 590 + example, the user may rebase a commit that becomes empty, or have 591 + a cherry-pick which conflicts, or a user could run `merge 592 + --no-commit`, and we also view `apply --index` kind of like `am 593 + --no-commit`.) As such, these commands can make changes to index 594 + files outside the sparse specification, though they'll mark such 595 + files with SKIP_WORKTREE. 588 596 589 597 * "restrict also specially applied to untracked files" 590 598 ··· 609 617 specification. 
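The "restrict modulo conflicts" class described in this section can be summarized as a per-path decision. The sketch below is a simplified model, assuming the sparse specification and the set of conflicted paths are already known; `plan_merge_result` is a hypothetical name and does not correspond to any Git function.

```python
def plan_merge_result(paths, sparse_spec, conflicted):
    """Model what a merge-like command does with each path it touches.

    Returns {path: (write_to_worktree, mark_skip_worktree)}. The index is
    assumed to be updated for every path; only the working tree write is
    restricted.
    """
    plan = {}
    for path in paths:
        # Conflicted files are written even when outside the sparse
        # specification, which temporarily expands it.
        write = path in sparse_spec or path in conflicted
        # Paths that are not written are marked SKIP_WORKTREE in the index.
        plan[path] = (write, not write)
    return plan


plan = plan_merge_result(
    paths=["src/a.c", "docs/b.txt", "vendor/c.h"],
    sparse_spec={"src/a.c"},
    conflicted={"docs/b.txt"},
)
```

Here `src/a.c` is written because it is in the sparse specification, `docs/b.txt` is written only because it conflicted, and `vendor/c.h` is updated in the index but left out of the working tree with SKIP_WORKTREE set.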
610 618 611 619 612 - === Subcommand-dependent defaults === 620 + == Subcommand-dependent defaults == 613 621 614 622 Note that we have different defaults depending on the command for the 615 623 desired behavior : 616 624 617 625 * Commands defaulting to "restrict": 618 - * diff-files 619 - * diff (without --cached or REVISION arguments) 620 - * grep (without --cached or REVISION arguments) 621 - * switch 622 - * checkout (the switch-like half) 623 - * reset (<commit>) 626 + 627 + ** diff-files 628 + ** diff (without --cached or REVISION arguments) 629 + ** grep (without --cached or REVISION arguments) 630 + ** switch 631 + ** checkout (the switch-like half) 632 + ** reset (<commit>) 624 633 625 - * restore 626 - * checkout (the restore-like half) 627 - * checkout-index 628 - * reset (with pathspec) 634 + ** restore 635 + ** checkout (the restore-like half) 636 + ** checkout-index 637 + ** reset (with pathspec) 629 638 630 639 This behavior makes sense; these interact with the working tree. 
631 640 632 641 * Commands defaulting to "restrict modulo conflicts": 633 - * merge 634 - * rebase 635 - * cherry-pick 636 - * revert 642 + 643 + ** merge 644 + ** rebase 645 + ** cherry-pick 646 + ** revert 637 647 638 - * am 639 - * apply --index (which is kind of like an `am --no-commit`) 648 + ** am 649 + ** apply --index (which is kind of like an `am --no-commit`) 640 650 641 - * read-tree (especially with -m or -u; is kind of like a --no-commit merge) 642 - * reset (<tree-ish>, due to similarity to read-tree) 651 + ** read-tree (especially with -m or -u; is kind of like a --no-commit merge) 652 + ** reset (<tree-ish>, due to similarity to read-tree) 643 653 644 654 These also interact with the working tree, but require slightly 645 655 different behavior either so that (a) conflicts can be resolved or (b) ··· 648 658 (See also the "Known bugs" section below regarding `am` and `apply`) 649 659 650 660 * Commands defaulting to "no restrict": 651 - * archive 652 - * bundle 653 - * commit 654 - * format-patch 655 - * fast-export 656 - * fast-import 657 - * commit-tree 658 661 659 - * stash 660 - * apply (without `--index`) 662 + ** archive 663 + ** bundle 664 + ** commit 665 + ** format-patch 666 + ** fast-export 667 + ** fast-import 668 + ** commit-tree 669 + 670 + ** stash 671 + ** apply (without `--index`) 661 672 662 673 These have completely different defaults and perhaps deserve the most 663 674 detailed explanation: ··· 679 690 sparse specification then we'll lose changes from the user. 680 691 681 692 * Commands defaulting to "restrict also specially applied to untracked files": 682 - * add 683 - * rm 684 - * mv 685 - * update-index 686 - * status 687 - * clean (?) 693 + 694 + ** add 695 + ** rm 696 + ** mv 697 + ** update-index 698 + ** status 699 + ** clean (?) 700 + 701 + .... 
702 + Our original implementation for the first three of these commands was 703 + "no restrict", but it had some severe usability issues: 688 704 689 - Our original implementation for the first three of these commands was 690 - "no restrict", but it had some severe usability issues: 691 - * `git add <somefile>` if honored and outside the sparse 692 - specification, can result in the file randomly disappearing later 693 - when some subsequent command is run (since various commands 694 - automatically clean up unmodified files outside the sparse 695 - specification). 696 - * `git rm '*.jpg'` could very negatively surprise users if it deletes 697 - files outside the range of the user's interest. 698 - * `git mv` has similar surprises when moving into or out of the cone, 699 - so best to restrict by default 705 + * `git add <somefile>` if honored and outside the sparse 706 + specification, can result in the file randomly disappearing later 707 + when some subsequent command is run (since various commands 708 + automatically clean up unmodified files outside the sparse 709 + specification). 710 + * `git rm '*.jpg'` could very negatively surprise users if it deletes 711 + files outside the range of the user's interest. 712 + * `git mv` has similar surprises when moving into or out of the cone, 713 + so best to restrict by default 700 714 701 - So, we switched `add` and `rm` to default to "restrict", which made 702 - usability problems much less severe and less frequent, but we still got 703 - complaints because commands like: 704 - git add <file-outside-sparse-specification> 705 - git rm <file-outside-sparse-specification> 706 - would silently do nothing. We should instead print an error in those 707 - cases to get usability right. 
715 + So, we switched `add` and `rm` to default to "restrict", which made 716 + usability problems much less severe and less frequent, but we still got 717 + complaints because commands like: 708 718 709 - update-index needs to be updated to match, and status and maybe clean 710 - also need to be updated to specially handle untracked paths. 719 + git add <file-outside-sparse-specification> 720 + git rm <file-outside-sparse-specification> 711 721 712 - There may be a difference in here between behavior A and behavior B in 713 - terms of verboseness of errors or additional warnings. 722 + would silently do nothing. We should instead print an error in those 723 + cases to get usability right. 724 + 725 + update-index needs to be updated to match, and status and maybe clean 726 + also need to be updated to specially handle untracked paths. 727 + 728 + There may be a difference in here between behavior A and behavior B in 729 + terms of verboseness of errors or additional warnings. 730 + .... 714 731 715 732 * Commands falling under "restrict or no restrict dependent upon behavior 716 733 A vs. 
behavior B" 717 734 718 - * diff (with --cached or REVISION arguments) 719 - * grep (with --cached or REVISION arguments) 720 - * show (when given commit arguments) 721 - * blame (only matters when one or more -C flags passed) 722 - * and annotate 723 - * log 724 - * and variants: shortlog, gitk, show-branch, whatchanged, rev-list 725 - * ls-files 726 - * diff-index 727 - * diff-tree 728 - * ls-tree 735 + ** diff (with --cached or REVISION arguments) 736 + ** grep (with --cached or REVISION arguments) 737 + ** show (when given commit arguments) 738 + ** blame (only matters when one or more -C flags passed) 739 + *** and annotate 740 + ** log 741 + *** and variants: shortlog, gitk, show-branch, whatchanged, rev-list 742 + ** ls-files 743 + ** diff-index 744 + ** diff-tree 745 + ** ls-tree 729 746 730 747 For now, we default to behavior B for these, which want a default of 731 748 "no restrict". ··· 749 766 implemented. 750 767 751 768 752 - === Sparse specification vs. sparsity patterns === 769 + == Sparse specification vs. sparsity patterns == 753 770 754 771 In a well-behaved situation, the sparse specification is given directly 755 772 by the $GIT_DIR/info/sparse-checkout file. However, it can transiently ··· 821 838 operate full-tree. 822 839 823 840 824 - === Implementation Questions === 841 + == Implementation Questions == 825 842 826 - * Do the options --scope={sparse,all} sound good to others? Are there better 827 - options? 
828 - * Names in use, or appearing in patches, or previously suggested: 829 - * --sparse/--dense 830 - * --ignore-skip-worktree-bits 831 - * --ignore-skip-worktree-entries 832 - * --ignore-sparsity 833 - * --[no-]restrict-to-sparse-paths 834 - * --full-tree/--sparse-tree 835 - * --[no-]restrict 836 - * --scope={sparse,all} 837 - * --focus/--unfocus 838 - * --limit/--unlimited 839 - * Rationale making me lean slightly towards --scope={sparse,all}: 840 - * We want a name that works for many commands, so we need a name that 843 + * Do the options --scope={sparse,all} sound good to others? Are there better options? 844 + 845 + ** Names in use, or appearing in patches, or previously suggested: 846 + 847 + *** --sparse/--dense 848 + *** --ignore-skip-worktree-bits 849 + *** --ignore-skip-worktree-entries 850 + *** --ignore-sparsity 851 + *** --[no-]restrict-to-sparse-paths 852 + *** --full-tree/--sparse-tree 853 + *** --[no-]restrict 854 + *** --scope={sparse,all} 855 + *** --focus/--unfocus 856 + *** --limit/--unlimited 857 + 858 + ** Rationale making me lean slightly towards --scope={sparse,all}: 859 + 860 + *** We want a name that works for many commands, so we need a name that 841 861 does not conflict 842 - * We know that we have more than two possible usecases, so it is best 862 + *** We know that we have more than two possible usecases, so it is best 843 863 to avoid a flag that appears to be binary. 844 - * --scope={sparse,all} isn't overly long and seems relatively 864 + *** --scope={sparse,all} isn't overly long and seems relatively 845 865 explanatory 846 - * `--sparse`, as used in add/rm/mv, is totally backwards for 866 + *** `--sparse`, as used in add/rm/mv, is totally backwards for 847 867 grep/log/etc. Changing the meaning of `--sparse` for these 848 868 commands would fix the backwardness, but possibly break existing 849 869 scripts. Using a new name pairing would allow us to treat 850 870 `--sparse` in these commands as a deprecated alias. 
851 - * There is a different `--sparse`/`--dense` pair for commands using 871 + *** There is a different `--sparse`/`--dense` pair for commands using 852 872 revision machinery, so using that naming might cause confusion 853 - * There is also a `--sparse` in both pack-objects and show-branch, which 873 + *** There is also a `--sparse` in both pack-objects and show-branch, which 854 874 don't conflict but do suggest that `--sparse` is overloaded 855 - * The name --ignore-skip-worktree-bits is a double negative, is 875 + *** The name --ignore-skip-worktree-bits is a double negative, is 856 876 quite a mouthful, refers to an implementation detail that many 857 877 users may not be familiar with, and we'd need a negation for it 858 878 which would probably be even more ridiculously long. (But we 859 879 can make --ignore-skip-worktree-bits a deprecated alias for 860 880 --no-restrict.) 861 881 862 - * If a config option is added (sparse.scope?) what should the values and 882 + ** If a config option is added (sparse.scope?) what should the values and 863 883 description be? "sparse" (behavior A), "worktree-sparse-history-dense" 864 884 (behavior B), "dense" (behavior C)? There's a risk of confusion, 865 885 because even for Behaviors A and B we want some commands to be ··· 868 888 the primary difference we are focusing is just the history-querying 869 889 commands (log/diff/grep). Previous config suggestion here: [13] 870 890 871 - * Is `--no-expand` a good alias for ls-files's `--sparse` option? 891 + ** Is `--no-expand` a good alias for ls-files's `--sparse` option? 
872 892 (`--sparse` does not map to either `--scope=sparse` or `--scope=all`,
873 893 because in non-cone mode it does nothing and in cone-mode it shows the
874 894 sparse directory entries which are technically outside the sparse
875 895 specification)
876 896
877 - * Under Behavior A:
878 - * Does ls-files' `--no-expand` override the default `--scope=all`, or
897 + ** Under Behavior A:
898 +
899 + *** Does ls-files' `--no-expand` override the default `--scope=all`, or
879 900 does it need an extra flag?
880 - * Does ls-files' `-t` option imply `--scope=all`?
881 - * Does update-index's `--[no-]skip-worktree` option imply `--scope=all`?
901 + *** Does ls-files' `-t` option imply `--scope=all`?
902 + *** Does update-index's `--[no-]skip-worktree` option imply `--scope=all`?
882 903
883 - * sparse-checkout: once behavior A is fully implemented, should we take
904 + ** sparse-checkout: once behavior A is fully implemented, should we take
884 905 an interim measure to ease people into switching the default? Namely,
885 906 if folks are not already in a sparse checkout, then require
886 907 `sparse-checkout init/set` to take a
···
892 913 is seamless for them.
893 914
894 915
895 - === Implementation Goals/Plans ===
916 + == Implementation Goals/Plans ==
896 917
897 918 * Get buy-in on this document in general.
898 919
···
910 931 request that they not trigger this bug." flag
911 932
912 933 * Flags & Config
913 - * Make `--sparse` in add/rm/mv a deprecated alias for `--scope=all`
914 - * Make `--ignore-skip-worktree-bits` in checkout-index/checkout/restore
934 +
935 + ** Make `--sparse` in add/rm/mv a deprecated alias for `--scope=all`
936 + ** Make `--ignore-skip-worktree-bits` in checkout-index/checkout/restore
915 937 a deprecated aliases for `--scope=all`
916 - * Create config option (sparse.scope?), tie it to the "Cliff notes"
938 + ** Create config option (sparse.scope?), tie it to the "Cliff notes"
917 939 overview
918 940
919 - * Add --scope=sparse (and --scope=all) flag to each of the history querying
941 + ** Add --scope=sparse (and --scope=all) flag to each of the history querying
920 942 commands. IMPORTANT: make sure diff machinery changes don't mess with
921 943 format-patch, fast-export, etc.
922 944
923 - === Known bugs ===
945 + == Known bugs ==
924 946
925 947 This list used to be a lot longer (see e.g. [1,2,3,4,5,6,7,8,9]), but we've
926 948 been working on it.
927 949
928 - 0. Behavior A is not well supported in Git. (Behavior B didn't used to
950 + 1. Behavior A is not well supported in Git. (Behavior B didn't used to
929 951 be either, but was the easier of the two to implement.)
930 952
931 - 1. am and apply:
953 + 2. am and apply:
932 954
933 955 apply, without `--index` or `--cached`, relies on files being present
934 956 in the working copy, and also writes to them unconditionally. As
···
948 970 files and then complain that those vivified files would be
949 971 overwritten by merge.
950 972
951 - 2. reset --hard:
973 + 3. reset --hard:
952 974
953 975 reset --hard provides confusing error message (works correctly, but
954 976 misleads the user into believing it didn't):
···
971 993 `git reset --hard` DID remove addme from the index and the working tree, contrary
972 994 to the error message, but in line with how reset --hard should behave.
973 995
974 - 3. read-tree
996 + 4. read-tree
975 997
976 998 `read-tree` doesn't apply the 'SKIP_WORKTREE' bit to *any* of the
977 999 entries it reads into the index, resulting in all your files suddenly
978 1000 appearing to be "deleted".
979 1001
980 - 4. Checkout, restore:
1002 + 5. Checkout, restore:
981 1003
982 1004 These command do not handle path & revision arguments appropriately:
983 1005
···
1030 1052 S tracked
1031 1053 H tracked-but-maybe-skipped
1032 1054
1033 - 5. checkout and restore --staged, continued:
1055 + 6. checkout and restore --staged, continued:
1034 1056
1035 1057 These commands do not correctly scope operations to the sparse
1036 1058 specification, and make it worse by not setting important SKIP_WORKTREE
···
1046 1068 the sparse specification, but then it will be important to set the
1047 1069 SKIP_WORKTREE bits appropriately.
1048 1070
1049 - 6. Performance issues; see:
1050 - https://lore.kernel.org/git/CABPp-BEkJQoKZsQGCYioyga_uoDQ6iBeW+FKr8JhyuuTMK1RDw@mail.gmail.com/
1071 + 7. Performance issues; see:
1051 1072
1073 + https://lore.kernel.org/git/CABPp-BEkJQoKZsQGCYioyga_uoDQ6iBeW+FKr8JhyuuTMK1RDw@mail.gmail.com/
1052 1074
1053 - === Reference Emails ===
1075 +
1076 + == Reference Emails ==
1054 1077
1055 1078 Emails that detail various bugs we've had in sparse-checkout:
1056 1079
1057 - [1] (Original descriptions of behavior A & behavior B)
1058 - https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
1059 - [2] (Fix stash applications in sparse checkouts; bugs from behavioral differences)
1060 - https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/
1061 - [3] (Present-despite-skipped entries)
1062 - https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/
1063 - [4] (Clone --no-checkout interaction)
1064 - https://lore.kernel.org/git/pull.801.v2.git.git.1591324899170.gitgitgadget@gmail.com/ (clone --no-checkout)
1065 - [5] (The need for update_sparsity() and avoiding `read-tree -mu HEAD`)
1066 - https://lore.kernel.org/git/3a1f084641eb47515b5a41ed4409a36128913309.1585270142.git.gitgitgadget@gmail.com/
1067 - [6] (SKIP_WORKTREE is advisory, not mandatory)
1068 - https://lore.kernel.org/git/844306c3e86ef67591cc086decb2b760e7d710a3.1585270142.git.gitgitgadget@gmail.com/
1069 - [7] (`worktree add` should copy sparsity settings from current worktree)
1070 - https://lore.kernel.org/git/c51cb3714e7b1d2f8c9370fe87eca9984ff4859f.1644269584.git.gitgitgadget@gmail.com/
1071 - [8] (Avoid negative surprises in add, rm, and mv)
1072 - https://lore.kernel.org/git/cover.1617914011.git.matheus.bernardino@usp.br/
1073 - https://lore.kernel.org/git/pull.1018.v4.git.1632497954.gitgitgadget@gmail.com/
1074 - [9] (Move from out-of-cone to in-cone)
1075 - https://lore.kernel.org/git/20220630023737.473690-6-shaoxuan.yuan02@gmail.com/
1076 - https://lore.kernel.org/git/20220630023737.473690-4-shaoxuan.yuan02@gmail.com/
1077 - [10] (Unnecessarily downloading objects outside sparse specification)
1078 - https://lore.kernel.org/git/CAOLTT8QfwOi9yx_qZZgyGa8iL8kHWutEED7ok_jxwTcYT_hf9Q@mail.gmail.com/
1080 + [1] (Original descriptions of behavior A & behavior B):
1081 +
1082 + https://lore.kernel.org/git/CABPp-BGJ_Nvi5TmgriD9Bh6eNXE2EDq2f8e8QKXAeYG3BxZafA@mail.gmail.com/
1083 +
1084 + [2] (Fix stash applications in sparse checkouts; bugs from behavioral differences):
1085 +
1086 + https://lore.kernel.org/git/ccfedc7140dbf63ba26a15f93bd3885180b26517.1606861519.git.gitgitgadget@gmail.com/
1087 +
1088 + [3] (Present-despite-skipped entries):
1089 +
1090 + https://lore.kernel.org/git/11d46a399d26c913787b704d2b7169cafc28d639.1642175983.git.gitgitgadget@gmail.com/
1079 1091
1080 - [11] (Stolee's comments on high-level usecases)
1081 - https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/
1092 + [4] (Clone --no-checkout interaction):
1093 +
1094 + https://lore.kernel.org/git/pull.801.v2.git.git.1591324899170.gitgitgadget@gmail.com/ (clone --no-checkout)
1095 +
1096 + [5] (The need for update_sparsity() and avoiding `read-tree -mu HEAD`):
1097 +
1098 + https://lore.kernel.org/git/3a1f084641eb47515b5a41ed4409a36128913309.1585270142.git.gitgitgadget@gmail.com/
1099 +
1100 + [6] (SKIP_WORKTREE is advisory, not mandatory):
1101 +
1102 + https://lore.kernel.org/git/844306c3e86ef67591cc086decb2b760e7d710a3.1585270142.git.gitgitgadget@gmail.com/
1103 +
1104 + [7] (`worktree add` should copy sparsity settings from current worktree):
1105 +
1106 + https://lore.kernel.org/git/c51cb3714e7b1d2f8c9370fe87eca9984ff4859f.1644269584.git.gitgitgadget@gmail.com/
1107 +
1108 + [8] (Avoid negative surprises in add, rm, and mv):
1109 +
1110 + * https://lore.kernel.org/git/cover.1617914011.git.matheus.bernardino@usp.br/
1111 + * https://lore.kernel.org/git/pull.1018.v4.git.1632497954.gitgitgadget@gmail.com/
1112 +
1113 + [9] (Move from out-of-cone to in-cone):
1114 +
1115 + * https://lore.kernel.org/git/20220630023737.473690-6-shaoxuan.yuan02@gmail.com/
1116 + * https://lore.kernel.org/git/20220630023737.473690-4-shaoxuan.yuan02@gmail.com/
1117 +
1118 + [10] (Unnecessarily downloading objects outside sparse specification):
1119 +
1120 + https://lore.kernel.org/git/CAOLTT8QfwOi9yx_qZZgyGa8iL8kHWutEED7ok_jxwTcYT_hf9Q@mail.gmail.com/
1121 +
1122 + [11] (Stolee's comments on high-level usecases):
1123 +
1124 + https://lore.kernel.org/git/1a1e33f6-3514-9afc-0a28-5a6b85bd8014@gmail.com/
1082 1125
1083 1126 [12] Others commenting on eventually switching default to behavior A:
1127 +
1084 1128 * https://lore.kernel.org/git/xmqqh719pcoo.fsf@gitster.g/
1085 1129 * https://lore.kernel.org/git/xmqqzgeqw0sy.fsf@gitster.g/
1086 1130 * https://lore.kernel.org/git/a86af661-cf58-a4e5-0214-a67d3a794d7e@github.com/
1087 1131
1088 - [13] Previous config name suggestion and description
1089 - * https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@mail.gmail.com/
1132 + [13] Previous config name suggestion and description:
1133 +
1134 + https://lore.kernel.org/git/CABPp-BE6zW0nJSStcVU=_DoDBnPgLqOR8pkTXK3dW11=T01OhA@mail.gmail.com/
1090 1135
1091 1136 [14] Tangential issue: switch to cone mode as default sparse specification mechanism:
1092 - https://lore.kernel.org/git/a1b68fd6126eb341ef3637bb93fedad4309b36d0.1650594746.git.gitgitgadget@gmail.com/
1137 +
1138 + https://lore.kernel.org/git/a1b68fd6126eb341ef3637bb93fedad4309b36d0.1650594746.git.gitgitgadget@gmail.com/
1093 1139
1094 1140 [15] Lengthy email on grep behavior, covering what should be searched:
1095 - * https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/
1141 +
1142 + https://lore.kernel.org/git/CABPp-BGVO3QdbfE84uF_3QDF0-y2iHHh6G5FAFzNRfeRitkuHw@mail.gmail.com/
1096 1143
1097 1144 [16] Email explaining sparsity patterns vs. SKIP_WORKTREE and history operations,
1098 1145 search for the parenthetical comment starting "We do not check".
1099 - https://lore.kernel.org/git/CABPp-BFsCPPNOZ92JQRJeGyNd0e-TCW-LcLyr0i_+VSQJP+GCg@mail.gmail.com/
1146 +
1147 + https://lore.kernel.org/git/CABPp-BFsCPPNOZ92JQRJeGyNd0e-TCW-LcLyr0i_+VSQJP+GCg@mail.gmail.com/
1100 1148
1101 1149 [17] https://lore.kernel.org/git/20220207190320.2960362-1-jonathantanmy@google.com/