Rebases and cherry-picks involve a sequence of merges whose results are
recorded as new single-parent commits. The first parent side of those
merges represents the "upstream" side, and often includes a far larger set
of changes than the second parent side. Traditionally, the renames on the
first-parent side of that sequence of merges were repeatedly re-detected
for every merge. This file explains why it is safe and effective during
rebases and cherry-picks to remember renames on the upstream side of
history as an optimization, assuming all merges are automatic and clean
(i.e. no conflicts and not interrupted for user input or editing).

Outline:

  1. Assumptions

  2. How rebasing and cherry-picking work

  3. Why the renames on MERGE_SIDE1 in any given pick are *always* a
     superset of the renames on MERGE_SIDE1 for the next pick.

  4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
     a rename on MERGE_SIDE1 for the next pick

  5. A detailed description of the counter-examples to #4.

  6. Why the special cases in #5 are still fully reasonable to use to pair
     up files for three-way content merging in the merge machinery, and why
     they do not affect the correctness of the merge.

  7. Interaction with skipping of "irrelevant" renames

  8. Additional items that need to be cached

  9. How directory rename detection interacts with the above and why this
     optimization is still safe even if merge.directoryRenames is set to
     "true".


== 1. Assumptions ==

There are two assumptions that will hold throughout this document:

  * The upstream side where commits are transplanted to is treated as the
    first parent side when rebase/cherry-pick call the merge machinery

  * All merges are fully automatic

and a third that will hold in sections 3-6 for simplicity, that I'll later
address in section 9:

  * No directory renames occur


Let me explain more about each assumption and why I include it:


The first assumption is merely for the purposes of making this document
clearer; the optimization implementation does not actually depend upon it.
However, the assumption does hold in all cases because it reflects the way
that both rebase and cherry-pick were implemented; and the implementation
of cherry-pick and rebase is not readily changeable for backwards
compatibility reasons (see for example the discussion of the --ours and
--theirs flags in the documentation of `git checkout`, particularly the
comments about how they behave with rebase). The optimization avoids
checking first-parent-ness, though. It checks the conditions that make the
optimization valid instead, so it would still continue working if someone
changed the parent ordering that cherry-pick and rebase use. But making
this assumption does make this document much clearer and prevents me from
having to repeat every example twice.

If the second assumption is violated, then the optimization simply is
turned off and thus isn't relevant to consider. The second assumption can
also be stated as "there is no interruption for a user to resolve conflicts
or to just further edit or tweak files".
While real rebases and
cherry-picks are often interrupted (either because it's an interactive
rebase where the user requested to stop and edit, or because there were
conflicts that the user needs to resolve), the cache of renames is not
stored on disk, and thus is thrown away as soon as the rebase or
cherry-pick stops for the user to resolve the operation.

The third assumption makes sections 3-6 simpler, and allows people to
understand the basics of why this optimization is safe and effective, and
then I can go back and address the specifics in section 9. It is probably
also worth noting that if directory renames do occur, then the default of
merge.directoryRenames being set to "conflict" means that the operation
will stop for users to resolve the conflicts and the cache will be thrown
away, and thus that there won't be an optimization to apply. So, the only
reason we need to address directory renames specifically is that some
users will have set merge.directoryRenames to "true" to allow the merges to
continue to proceed automatically. The optimization is still safe with
this config setting, but we have to discuss a few more cases to show why;
this discussion is deferred until section 9.


== 2. How rebasing and cherry-picking work ==

Consider the following setup (from the git-rebase manpage):

------------
          A---B---C topic
         /
    D---E---F---G main
------------

After rebasing or cherry-picking topic onto main, this will appear as:

------------
                  A'--B'--C' topic
                 /
    D---E---F---G main
------------

The way the commits A', B', and C' are created is through a series of
merges, where rebase or cherry-pick sequentially uses each of the three
A-B-C commits in a special merge operation. Let's label the three commits
in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2. For
this picture, the three commits for each of the three merges would be:

....
To create A':
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A

To create B':
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B

To create C':
   MERGE_BASE:   B
   MERGE_SIDE1:  B'
   MERGE_SIDE2:  C
....

Sometimes, folks are surprised that these three-way merges are done. It
can be useful in understanding these three-way merges to view them in a
slightly different light. For example, in creating C', you can view it as
either:

  * Apply the changes between B & C to B'
  * Apply the changes between B & B' to C

Conceptually the two statements above are the same as a three-way merge of
B, B', and C, at least the parts before you decide to record a commit.


== 3. Why the renames on MERGE_SIDE1 in any given pick are always a superset of the renames on MERGE_SIDE1 for the next pick. ==

The merge machinery uses the filenames it is fed from MERGE_BASE,
MERGE_SIDE1, and MERGE_SIDE2. It will only move content to a different
filename under one of three conditions:

  * To make both pieces of a conflict available to a user during conflict
    resolution (examples: directory/file conflict, add/add type conflict
    such as symlink vs. regular file)

  * When MERGE_SIDE1 renames the file.

  * When MERGE_SIDE2 renames the file.

First, let's remember what commits are involved in the first and second
picks of the cherry-pick or rebase sequence:

....
To create A':
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A

To create B':
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B
....

So, in particular, we need to show that the renames between E and G are a
superset of those between A and A'.

A' is created by the first merge. A' will only have renames for one of the
three reasons listed above.
The first case, a conflict, results in a
situation where the cache is dropped and thus this optimization doesn't
take effect, so we need not consider that case. The third case, a rename
on MERGE_SIDE2 (i.e. from E to A), will show up in A' but it also shows up
in A -- therefore when diffing A and A' that path does not show up as a
rename. The only remaining way for renames to show up in A' is for the
rename to come from MERGE_SIDE1. Therefore, all renames between A and A'
are a subset of those between E and G. Equivalently, all renames between E
and G are a superset of those between A and A'.


== 4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also a rename on MERGE_SIDE1 for the next pick. ==

Let's again look at the first two picks:

....
To create A':
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A

To create B':
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B
....

Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e.
any given rename from E to G. Let's use the filenames 'oldfile' and
'newfile' for demonstration purposes. That first pick will function as
follows; when the rename is detected, the merge machinery will do a
three-way content merge of the following:

....
    E:oldfile
    G:newfile
    A:oldfile
....

and produce a new result:

....
    A':newfile
....

Note above that I've assumed that E->A did not rename oldfile. If that
side did rename, then we most likely have a rename/rename(1to2) conflict
that will cause the rebase or cherry-pick operation to halt and drop the
in-memory cache of renames and thus doesn't need to be considered further.
In the special case that E->A does rename the file but also renames it to
newfile, then there is no conflict from the renaming and the merge can
succeed.
In this special case, the rename is not valid to cache because
the second merge will find A:newfile in the MERGE_BASE (see also the new
testcases in t6429 with "rename same file identically" in their
description). So a rename/rename(1to1) needs to be specially handled by
pruning renames from the cache and decrementing the dir_rename_counts in
the current and leading directories associated with those renames. Or,
since these are really rare, one could just take the easy way out and
disable the remembering renames optimization when a rename/rename(1to1)
happens.

The previous paragraph handled the cases for E->A renaming oldfile; let's
continue assuming that oldfile is not renamed in A.

As per the diagram for creating B', MERGE_SIDE1 involves the changes from A
to A'. So, we are curious whether A:oldfile and A':newfile will be viewed
as renames. Note that:

  * There will be no A':oldfile (because there could not have been a
    G:oldfile as we do not do break detection in the merge machinery and
    G:newfile was detected as a rename, and by the construction of the
    rename above that merged cleanly, the merge machinery will ensure there
    is no 'oldfile' in the result).

  * There will be no A:newfile (if there had been, we would have had a
    rename/add conflict).

  * Clearly A:oldfile and A':newfile are "related" (A':newfile came from a
    clean three-way content merge involving A:oldfile).

We can also expound on the third point above, by noting that three-way
content merges can also be viewed as applying the differences between the
base and one side to the other side. Thus we can view A':newfile as
having been created by taking the changes between E:oldfile and G:newfile
(which were detected as being related, i.e. <50% changed) and applying
them to A:oldfile.

Thus A:oldfile and A':newfile are just as related as E:oldfile and
G:newfile are -- they have exactly identical differences.
Since the latter
were detected as renames, A:oldfile and A':newfile should also be
detectable as renames almost always.


== 5. A detailed description of the counter-examples to #4. ==

We already noted in section 4 that rename/rename(1to1) (i.e. both sides
renaming a file the same way) was one counter-example. The more
interesting bit, though, is why did we need to use the "almost" qualifier
when stating that A:oldfile and A':newfile are "almost" always detectable
as renames?

Let's repeat an earlier point that section 4 made:

....
  A':newfile was created by applying the changes between E:oldfile and
  G:newfile to A:oldfile. The changes between E:oldfile and G:newfile were
  <50% of the size of E:oldfile.
....

If those changes that were <50% of the size of E:oldfile are also <50% of
the size of A:oldfile, then A:oldfile and A':newfile will be detectable as
renames. However, if there is a dramatic size reduction between E:oldfile
and A:oldfile (but the changes between E:oldfile, G:newfile, and A:oldfile
still somehow merge cleanly), then traditional rename detection would not
detect A:oldfile and A':newfile as renames.

Here's an example where that can happen:

  * E:oldfile had 20 lines
  * G:newfile added 10 new lines at the beginning of the file
  * A:oldfile kept the first 3 lines of the file, and deleted all the rest

then

....
  => A':newfile would have 13 lines, 3 of which match those in A:oldfile.
  E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
  A':newfile would not be.
....


== 6. Why the special cases in #5 are still fully reasonable to use to pair up files for three-way content merging in the merge machinery, and why they do not affect the correctness of the merge. ==

In the rename/rename(1to1) case, A:newfile and A':newfile are not renames
since they use the *same* filename.
However, files with the same filename
are obviously fine to pair up for three-way content merging (the merge
machinery has never employed break detection). The interesting
counter-example case is thus not the rename/rename(1to1) case, but the case
where A did not rename oldfile. That was the case that we spent most of
the time discussing in sections 4 and 5. The remainder of this section
will be devoted to that case as well.

So, even if A:oldfile and A':newfile aren't detectable as renames, why is
it still reasonable to pair them up for three-way content merging in the
merge machinery? There are multiple reasons:

  * As noted in sections 4 and 5, the diff between A:oldfile and A':newfile
    is *exactly* the same as the diff between E:oldfile and G:newfile. The
    latter pair were detected as renames, so it seems unlikely to surprise
    users for us to treat A:oldfile and A':newfile as renames.

  * In fact, "oldfile" and "newfile" were at one point detected as renames
    due to how they were constructed in the E..G chain. And we used that
    information once already in this rebase/cherry-pick. I think users
    would be unlikely to be surprised at us continuing to treat the files
    as renames and would quickly understand why we had done so.

  * Marking or declaring files as renames is *not* the end goal for merges.
    Merges use renames to determine which files make sense to be paired up
    for three-way content merges.

  * A:oldfile and A':newfile were _already_ paired up in a three-way
    content merge; that is how A':newfile was created. In fact, that
    three-way content merge was clean. So using them again in a later
    three-way content merge seems very reasonable.

However, the above is focusing on the common scenarios. Let's try to look
at all possible unusual scenarios and compare the behavior without the
optimization to the behavior with it.
Consider the following theoretical cases; we will
then dive into each to determine which of them are possible,
and if so, what they mean:

  1. Without the optimization, the second merge results in a conflict.
     With the optimization, the second merge also results in a conflict.
     Questions: Are the conflicts confusingly different? Better in one case?

  2. Without the optimization, the second merge results in NO conflict.
     With the optimization, the second merge also results in NO conflict.
     Questions: Are the merges the same?

  3. Without the optimization, the second merge results in a conflict.
     With the optimization, the second merge results in NO conflict.
     Questions: Possible? Bug, bugfix, or something else?

  4. Without the optimization, the second merge results in NO conflict.
     With the optimization, the second merge results in a conflict.
     Questions: Possible? Bug, bugfix, or something else?

I'll consider all four cases, but out of order.

The fourth case is impossible. For the code without the remembering
renames optimization to not get a conflict, B:oldfile would need to exactly
match A:oldfile -- if it doesn't, there would be a modify/delete conflict.
If A:oldfile matches B:oldfile exactly, then a three-way content merge
between A:oldfile, A':newfile, and B:oldfile would have no conflict and
just give us the version of newfile from A' as the result.

From the same logic as the above paragraph, the second case would indeed
result in identical merges. When A:oldfile exactly matches B:oldfile, an
undetected rename would say, "Oh, I see one side didn't modify 'oldfile'
and the other side deleted it. I'll delete it. And I see you have this
brand new file named 'newfile' in A', so I'll keep it."
That gives the
same results as three-way content merging A:oldfile, A':newfile, and
B:oldfile -- a removal of oldfile with the version of newfile from A'
showing up in the result.

The third case is interesting. It means that A:oldfile and A':newfile were
not just similar enough, but that the changes between them did not conflict
with the changes between A:oldfile and B:oldfile. This would validate our
hunch that the files were similar enough to be used in a three-way content
merge, and thus seems entirely correct for us to have used them that way.
(Sidenote: One particular example here may be enlightening. Let's say that
B was an immediate revert of A. B clearly would have been a clean revert
of A, since A was B's immediate parent. One would assume that if you can
pick a commit, you should also be able to cherry-pick its immediate revert.
However, this is one of those funny corner cases; without this
optimization, we just successfully picked a commit cleanly, but we are
unable to cherry-pick its immediate revert due to the size differences
between E:oldfile and A:oldfile.)

That leaves only the first case to consider -- when we get conflicts both
with and without the optimization. Without the optimization, we'll have a
modify/delete conflict, where both A':newfile and B:oldfile are left in the
tree for the user to deal with and no hints about the potential similarity
between the two. With the optimization, we'll have a three-way content
merged A:oldfile, A':newfile, and B:oldfile with conflict markers
suggesting we thought the files were related but giving the user the chance
to resolve. As noted above, I don't think users will find us treating
'oldfile' and 'newfile' as related as a surprise since they were between E
and G. In any event, though, this case shouldn't be concerning since we
hit a conflict in both cases, told the user what we know, and asked them to
resolve it.
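
The size effect that drives the unusual cases above (the 20-line example
from section 5) can be checked with a toy model. The sketch below is an
approximation under stated assumptions, not Git's algorithm:
`toy_similarity` is a hypothetical helper that counts shared lines and
divides by the line count of the larger file, whereas Git's real
similarity estimate works on byte counts rather than whole lines. It is
only meant to show why the E->G pair clears a 50% threshold while the
A->A' pair does not.

```python
from collections import Counter


def toy_similarity(src_lines, dst_lines):
    """Fraction of the larger file made up of lines present in both files.

    Hypothetical, line-based stand-in for a rename similarity score.
    """
    shared = sum((Counter(src_lines) & Counter(dst_lines)).values())
    larger = max(len(src_lines), len(dst_lines))
    return shared / larger if larger else 1.0


# Section 5 example: E:oldfile has 20 lines.
e_oldfile = [f"line {i}" for i in range(1, 21)]
# G:newfile adds 10 new lines at the beginning of the file.
added = [f"added {i}" for i in range(1, 11)]
g_newfile = added + e_oldfile
# A:oldfile keeps only the first 3 lines.
a_oldfile = e_oldfile[:3]
# A':newfile is the clean merge result: G's 10 added lines applied to A:oldfile.
a_prime_newfile = added + a_oldfile

upstream_score = toy_similarity(e_oldfile, g_newfile)      # 20 shared of 30
picked_score = toy_similarity(a_oldfile, a_prime_newfile)  # 3 shared of 13

print(f"E:oldfile -> G:newfile:  {upstream_score:.2f}")
print(f"A:oldfile -> A':newfile: {picked_score:.2f}")
```

Under this model the upstream pair scores 20/30 and the picked pair 3/13,
even though both pairs differ by exactly the same 10 added lines.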

So, in summary, case 4 is impossible, case 2 yields the same behavior, and
cases 1 and 3 seem to provide as good or better behavior with the
optimization than without.


== 7. Interaction with skipping of "irrelevant" renames ==

Previous optimizations involved skipping rename detection for paths
considered to be "irrelevant". See for example the following commits:

  * 32a56dfb99 ("merge-ort: precompute subset of sources for which we
    need rename detection", 2021-03-11)
  * 2fd9eda462 ("merge-ort: precompute whether directory rename
    detection is needed", 2021-03-11)
  * 9bd342137e ("diffcore-rename: determine which relevant_sources are
    no longer relevant", 2021-03-13)

Relevance is always determined by what the _other_ side of history has
done, in terms of modifying a file that our side renamed, or adding a
file to a directory which our side renamed. This means that a path
that is "irrelevant" when picking the first commit of a series in a
rebase or cherry-pick, may suddenly become "relevant" when picking the
next commit.

The upshot of this is that we can only cache rename detection results
for relevant paths, and need to re-check relevance in subsequent
commits. If those subsequent commits have additional paths that are
relevant for rename detection, then we will need to redo rename
detection -- though we can limit it to the paths for which we have not
already detected renames.


== 8. Additional items that need to be cached ==

It turns out we have to cache more than just renames; we also cache:

....
  A) non-renames (i.e. unpaired deletes)
  B) counts of renames within directories
  C) sources that were marked as RELEVANT_LOCATION, but which were
     downgraded to RELEVANT_NO_MORE
  D) the toplevel trees involved in the merge
....

These are all stored in struct rename_info, and respectively appear in

  * cached_pairs (alongside actual renames, just with a value of NULL)
  * dir_rename_counts
  * cached_irrelevant
  * merge_trees

The reason for `(A)` comes from the irrelevant renames skipping
optimization discussed in section 7. The fact that irrelevant renames
are skipped means we only get a subset of the potential renames
detected and subsequent commits may need to run rename detection on
the upstream side on a subset of the remaining renames (to get the
renames that are relevant for that later commit). Since unpaired
deletes are involved in rename detection too, we don't want to
repeatedly check that those paths remain unpaired on the upstream side
with every commit we are transplanting.

The reason for `(B)` is that diffcore_rename_extended() is what
generates the counts of renames by directory which is needed in
directory rename detection, and if we don't run
diffcore_rename_extended() again then we need to have the output from
it, including dir_rename_counts, from the previous run.

The reason for `(C)` is that merge-ort's tree traversal will again think
those paths are relevant (marking them as RELEVANT_LOCATION), but the
fact that they were downgraded to RELEVANT_NO_MORE means that
dir_rename_counts already has the information we need for directory
rename detection. (A path which becomes RELEVANT_CONTENT in a
subsequent commit will be removed from cached_irrelevant.)

The reason for `(D)` is that this is how we determine whether the remember
renames optimization can be used. In particular, remembering that our
sequence of merges looks like:

....
   Merge 1:
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A
   => Creates    A'

   Merge 2:
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B
   => Creates    B'
....

It is the fact that the trees A and A' appear both in Merge 1 and in
Merge 2, with A as a parent of A' that allows this optimization. So
we store the trees to compare with what we are asked to merge next
time.


== 9. How directory rename detection interacts with the above and why this optimization is still safe even if merge.directoryRenames is set to "true". ==

As noted in the assumptions section:

....
    """
    ...if directory renames do occur, then the default of
    merge.directoryRenames being set to "conflict" means that the operation
    will stop for users to resolve the conflicts and the cache will be
    thrown away, and thus that there won't be an optimization to apply.
    So, the only reason we need to address directory renames specifically,
    is that some users will have set merge.directoryRenames to "true" to
    allow the merges to continue to proceed automatically.
    """
....

Let's remember that we need to look at how any given pick affects the next
one. So let's again use the first two picks from the diagram in section
2:

....
  First pick does this three-way merge:
    MERGE_BASE:   E
    MERGE_SIDE1:  G
    MERGE_SIDE2:  A
    => creates A'

  Second pick does this three-way merge:
    MERGE_BASE:   A
    MERGE_SIDE1:  A'
    MERGE_SIDE2:  B
    => creates B'
....

Now, directory rename detection exists so that if one side of history
renames a directory, and the other side adds a new file to the old
directory, then the merge (with merge.directoryRenames=true) can move the
file into the new directory. There are two qualitatively different ways to
add a new file to an old directory: create a new file, or rename a file
into that directory.
Also, directory renames can be done on either side of
history, so there are four cases to consider:

  * MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir
  * MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir
  * MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir
  * MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir

One last note before we consider these four cases: There are some
important properties about how we implement this optimization with
respect to directory rename detection that we need to bear in mind
while considering all of these cases:

  * rename caching occurs *after* applying directory renames

  * a rename created by directory rename detection is recorded for the side
    of history that did the directory rename.

  * dir_rename_counts, the nested map of
        {oldname => {newname => count}},
    is cached between runs as well. This basically means that directory
    rename detection is also cached, though only on the side of history
    that we cache renames for (MERGE_SIDE1 as far as this document is
    concerned; see the assumptions section). Two interesting sub-notes
    about these counts:

    ** If we need to perform rename-detection again on the given side (e.g.
       some paths are relevant for rename detection that weren't before),
       then we clear dir_rename_counts and recompute it, making use of
       cached_pairs. The reason it is important to do this is that
       optimizations around RELEVANT_LOCATION exist to prevent us from
       computing unnecessary renames for directory rename detection and
       from computing dir_rename_counts for irrelevant directories; but
       those same renames or directories may become necessary for
       subsequent merges. The easiest way to "fix up" dir_rename_counts in
       such cases is to just recompute it.

    ** If we prune rename/rename(1to1) entries from the cache, then we also
       need to update dir_rename_counts to decrement the counts for the
       involved directory and any relevant parent directories (to undo what
       update_dir_rename_counts() in diffcore-rename.c incremented when the
       rename was initially found). If we instead just disable the
       remembering renames optimization when the exceedingly rare
       rename/rename(1to1) cases occur, then dir_rename_counts will get
       re-computed the next time rename detection occurs, as noted above.

  * the side with multiple commits to pick is the side of history that we
    do NOT cache renames for. Thus, there are no additional commits to
    change the number of renames in a directory, except for those done by
    directory rename detection (which always pad the majority).

  * the "renames" we cache are modified slightly by any directory rename,
    as noted below.

Now, with those notes out of the way, let's go through the four cases
in order:

Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir

....
  This case looks like this:

  MERGE_BASE:   E,   Has olddir/
  MERGE_SIDE1:  G,   Renames olddir/ -> newdir/
  MERGE_SIDE2:  A,   Adds olddir/newfile
  =>            creates A', With newdir/newfile

  MERGE_BASE:   A,   Has olddir/newfile
  MERGE_SIDE1:  A',  Has newdir/newfile
  MERGE_SIDE2:  B,   Modifies olddir/newfile
  =>            expected B', with threeway-merged newdir/newfile from above

  In this case, with the optimization, note that after the first commit:
    * MERGE_SIDE1 remembers olddir/ -> newdir/
    * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.
....

Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir

....
  This case looks like this:

  MERGE_BASE:   E    oldfile, olddir/
  MERGE_SIDE1:  G    oldfile, olddir/ -> newdir/
  MERGE_SIDE2:  A    oldfile -> olddir/newfile
  =>            creates A', With newdir/newfile representing original oldfile

  MERGE_BASE:   A    olddir/newfile
  MERGE_SIDE1:  A'   newdir/newfile
  MERGE_SIDE2:  B    modify olddir/newfile
  =>            expected B', with threeway-merged newdir/newfile from above

  In this case, with the optimization, note that after the first commit:
    * MERGE_SIDE1 remembers olddir/ -> newdir/
    * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
      (NOT oldfile -> newdir/newfile; compare to case with
       (p->status == 'R' && new_path) in possibly_cache_new_pair())

  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.
....

Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir

....
  This case looks like this:

  MERGE_BASE:   E,   Has olddir/
  MERGE_SIDE1:  G,   Adds olddir/newfile
  MERGE_SIDE2:  A,   Renames olddir/ -> newdir/
  =>            creates A', With newdir/newfile

  MERGE_BASE:   A,   Has newdir/, but no notion of newdir/newfile
  MERGE_SIDE1:  A',  Has newdir/newfile
  MERGE_SIDE2:  B,   Has newdir/, but no notion of newdir/newfile
  =>            expected B', with newdir/newfile from A'

  In this case, with the optimization, note that after the first commit there
  were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed.
  But the second merge didn't need any renames so this is fine.
....

Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir

....
  This case looks like this:

  MERGE_BASE:   E,   Has olddir/
  MERGE_SIDE1:  G,   Renames oldfile -> olddir/newfile
  MERGE_SIDE2:  A,   Renames olddir/ -> newdir/
  =>            creates A', With newdir/newfile representing original oldfile

  MERGE_BASE:   A,   Has oldfile
  MERGE_SIDE1:  A',  Has newdir/newfile
  MERGE_SIDE2:  B,   Modifies oldfile
  =>            expected B', with threeway-merged newdir/newfile from above

  In this case, with the optimization, note that after the first commit:
    * MERGE_SIDE1 remembers oldfile -> newdir/newfile
      (NOT oldfile -> olddir/newfile; compare to case of second
       block under p->status == 'R' in possibly_cache_new_pair())
    * MERGE_SIDE2 renames are tossed because only MERGE_SIDE1 is remembered

  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.
....

Finally, I'll just note here that interactions with the
skip-irrelevant-renames optimization mean we sometimes don't detect
renames for any files within a directory that was renamed, in which
case we will not have been able to detect any rename for the directory
itself. In such a case, we do not know whether the directory was
renamed; we want to be careful to avoid caching some kind of "this
directory was not renamed" statement. If we did, then a subsequent
commit being rebased could add a file to the old directory, and the
user would expect it to end up in the correct directory -- something
our erroneous "this directory was not renamed" cache would preclude.
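
The dir_rename_counts bookkeeping from this section, together with the
rule about never caching a negative answer, can be sketched roughly as
follows. This is a toy model and not the C implementation: the names
`record_rename`, `prune_rename`, and `lookup_dir_rename` are hypothetical,
only the immediate parent directory is counted (the real code also
credits leading directories), and picking the target with the highest
count is a simplifying assumption.

```python
import posixpath
from collections import defaultdict

# Toy stand-in for dir_rename_counts: {olddir => {newdir => count}}.
dir_rename_counts = defaultdict(lambda: defaultdict(int))


def record_rename(old_path, new_path):
    """Credit one file rename to its (old dir, new dir) pair."""
    old_dir = posixpath.dirname(old_path)
    new_dir = posixpath.dirname(new_path)
    if old_dir != new_dir:
        dir_rename_counts[old_dir][new_dir] += 1


def prune_rename(old_path, new_path):
    """Undo record_rename(), e.g. when a rename/rename(1to1) entry is pruned."""
    old_dir = posixpath.dirname(old_path)
    new_dir = posixpath.dirname(new_path)
    if old_dir == new_dir:
        return
    targets = dir_rename_counts[old_dir]
    targets[new_dir] -= 1
    if targets[new_dir] <= 0:
        del targets[new_dir]
    if not targets:
        del dir_rename_counts[old_dir]


def lookup_dir_rename(old_dir):
    """Return the best-supported new directory, or None for "unknown".

    None never means "not renamed"; it only means no rename inside the
    directory has been recorded, so no negative answer is ever cached.
    """
    targets = dir_rename_counts.get(old_dir)
    if not targets:
        return None
    return max(targets, key=targets.get)


record_rename("olddir/a.c", "newdir/a.c")
record_rename("olddir/b.c", "newdir/b.c")
record_rename("olddir/c.c", "elsewhere/c.c")
winner = lookup_dir_rename("olddir")
prune_rename("olddir/c.c", "elsewhere/c.c")
unknown = lookup_dir_rename("untouched")
print(winner, unknown)
```

The point of returning `None` instead of storing an explicit "not renamed"
entry is that a later pick can still trigger detection for that directory.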