Rebases and cherry-picks involve a sequence of merges whose results are
recorded as new single-parent commits.  The first parent side of those
merges represents the "upstream" side, and often includes a far larger set of
changes than the second parent side.  Traditionally, the renames on the
first-parent side of that sequence of merges were repeatedly re-detected
for every merge.  This file explains why it is safe and effective during
rebases and cherry-picks to remember renames on the upstream side of
history as an optimization, assuming all merges are automatic and clean
(i.e. no conflicts and not interrupted for user input or editing).

Outline:

  1. Assumptions

  2. How rebasing and cherry-picking work

  3. Why the renames on MERGE_SIDE1 in any given pick are *always* a
     superset of the renames on MERGE_SIDE1 for the next pick.

  4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also
     a rename on MERGE_SIDE1 for the next pick

  5. A detailed description of the counter-examples to #4.

  6. Why the special cases in #5 are still fully reasonable to use to pair
     up files for three-way content merging in the merge machinery, and why
     they do not affect the correctness of the merge.

  7. Interaction with skipping of "irrelevant" renames

  8. Additional items that need to be cached

  9. How directory rename detection interacts with the above and why this
     optimization is still safe even if merge.directoryRenames is set to
     "true".


== 1. Assumptions ==

There are two assumptions that will hold throughout this document:

  * The upstream side where commits are transplanted to is treated as the
    first parent side when rebase/cherry-pick call the merge machinery

  * All merges are fully automatic

and a third that will hold in sections 3-6 for simplicity, that I'll later
address in section 9:

  * No directory renames occur


Let me explain more about each assumption and why I include it:


The first assumption is merely for the purposes of making this document
clearer; the optimization implementation does not actually depend upon it.
However, the assumption does hold in all cases because it reflects the way
that both rebase and cherry-pick were implemented; and the implementations
of cherry-pick and rebase are not readily changeable for backwards
compatibility reasons (see for example the discussion of the --ours and
--theirs flags in the documentation of `git checkout`, particularly the
comments about how they behave with rebase).  The optimization avoids
checking first-parent-ness, though.  It checks the conditions that make the
optimization valid instead, so it would still continue working if someone
changed the parent ordering that cherry-pick and rebase use.  But making
this assumption does make this document much clearer and prevents me from
having to repeat every example twice.

If the second assumption is violated, then the optimization simply is
turned off and thus isn't relevant to consider.  The second assumption can
also be stated as "there is no interruption for a user to resolve conflicts
or to just further edit or tweak files".  While real rebases and
cherry-picks are often interrupted (either because it's an interactive
rebase where the user requested to stop and edit, or because there were
conflicts that the user needs to resolve), the cache of renames is not
stored on disk, and thus is thrown away as soon as the rebase or
cherry-pick stops for the user to intervene.

The third assumption makes sections 3-6 simpler, and allows people to
understand the basics of why this optimization is safe and effective, and
then I can go back and address the specifics in section 9.  It is probably
also worth noting that if directory renames do occur, then the default of
merge.directoryRenames being set to "conflict" means that the operation
will stop for users to resolve the conflicts and the cache will be thrown
away, and thus that there won't be an optimization to apply.  So, the only
reason we need to address directory renames specifically, is that some
users will have set merge.directoryRenames to "true" to allow the merges to
continue to proceed automatically.  The optimization is still safe with
this config setting, but we have to discuss a few more cases to show why;
this discussion is deferred until section 9.


== 2. How rebasing and cherry-picking work ==

Consider the following setup (from the git-rebase manpage):

------------
                     A---B---C topic
                    /
               D---E---F---G main
------------

After rebasing or cherry-picking topic onto main, this will appear as:

------------
                             A'--B'--C' topic
                            /
               D---E---F---G main
------------

The way the commits A', B', and C' are created is through a series of
merges, where rebase or cherry-pick sequentially uses each of the three
A-B-C commits in a special merge operation.  Let's label the three commits
in the merge operation as MERGE_BASE, MERGE_SIDE1, and MERGE_SIDE2.  For
this picture, the three commits for each of the three merges would be:

....
To create A':
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A

To create B':
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B

To create C':
   MERGE_BASE:   B
   MERGE_SIDE1:  B'
   MERGE_SIDE2:  C
....

Sometimes, folks are surprised that these three-way merges are done.  It
can be useful in understanding these three-way merges to view them in a
slightly different light.  For example, in creating C', you can view it as
either:

  * Apply the changes between B & C to B'
  * Apply the changes between B & B' to C

Conceptually the two statements above are the same as a three-way merge of
B, B', and C, at least the parts before you decide to record a commit.
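
The bookkeeping for which commits play which role in each pick can be
sketched in a few lines of Python.  This is only an illustrative model: the
function name `pick_triples` and the convention of naming each new commit
by appending a prime are assumptions of this sketch, not anything git
exposes.

```python
def pick_triples(onto, fork_point, commits):
    """Return (MERGE_BASE, MERGE_SIDE1, MERGE_SIDE2, result) for each pick.

    onto:       tip of the upstream branch being transplanted onto (G)
    fork_point: parent of the first commit being picked (E)
    commits:    the commits to pick, oldest first (A, B, C)
    """
    triples = []
    base, side1 = fork_point, onto
    for commit in commits:
        result = commit + "'"  # hypothetical name for the newly created commit
        triples.append((base, side1, commit, result))
        # The next pick uses this commit as its base and our result as side 1.
        base, side1 = commit, result
    return triples


for base, side1, side2, result in pick_triples("G", "E", ["A", "B", "C"]):
    print(f"To create {result}: MERGE_BASE={base} "
          f"MERGE_SIDE1={side1} MERGE_SIDE2={side2}")
```

Note how, after the first pick, MERGE_BASE is always the previous
MERGE_SIDE2 and MERGE_SIDE1 is always the previous result.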


== 3. Why the renames on MERGE_SIDE1 in any given pick are always a superset of the renames on MERGE_SIDE1 for the next pick. ==

The merge machinery uses the filenames it is fed from MERGE_BASE,
MERGE_SIDE1, and MERGE_SIDE2.  It will only move content to a different
filename under one of three conditions:

  * To make both pieces of a conflict available to a user during conflict
    resolution (examples: directory/file conflict, add/add type conflict
    such as symlink vs. regular file)

  * When MERGE_SIDE1 renames the file.

  * When MERGE_SIDE2 renames the file.

First, let's remember what commits are involved in the first and second
picks of the cherry-pick or rebase sequence:

....
To create A':
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A

To create B':
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B
....

So, in particular, we need to show that the renames between E and G are a
superset of those between A and A'.

A' is created by the first merge.  A' will only have renames for one of the
three reasons listed above.  The first case, a conflict, results in a
situation where the cache is dropped and thus this optimization doesn't
take effect, so we need not consider that case.  The third case, a rename
on MERGE_SIDE2 (i.e. from E to A), will show up in A' but it also shows up
in A -- therefore when diffing A and A' that path does not show up as a
rename.  The only remaining way for renames to show up in A' is for the
rename to come from MERGE_SIDE1.  Therefore, all renames between A and A'
are a subset of those between E and G.  Equivalently, all renames between E
and G are a superset of those between A and A'.
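
To make the subset argument concrete, here is a toy model in Python.  It
is a sketch under loud assumptions: trees are plain dicts from path to
content, renames are detected only for byte-identical content, and
`toy_merge` handles only clean cases where each file is touched by at most
one side (adds, deletes and conflicts are not modelled).  None of these
helpers correspond to real git functions.

```python
def exact_renames(old, new):
    """Pair each path deleted in `new` with an added path of identical content."""
    deleted = {p: c for p, c in old.items() if p not in new}
    added = {p: c for p, c in new.items() if p not in old}
    pairs = {}
    for src, content in sorted(deleted.items()):
        for dst in sorted(added):
            if added[dst] == content and dst not in pairs.values():
                pairs[src] = dst
                break
    return pairs


def toy_merge(base, side1, side2):
    """Clean merge where each base file is renamed or edited by at most one side."""
    r1, r2 = exact_renames(base, side1), exact_renames(base, side2)
    result = {}
    for path, content in base.items():
        dst = r1.get(path) or r2.get(path) or path
        on_side1 = side1[r1.get(path, path)]
        on_side2 = side2[r2.get(path, path)]
        # Whole-file three-way merge: take whichever side changed the content.
        result[dst] = on_side2 if on_side1 == content else on_side1
    return result


E = {"a": "1", "b": "2", "c": "3"}
G = {"a2": "1", "b": "2", "c": "3"}          # MERGE_SIDE1 renames a -> a2
A = {"a": "1", "b2": "2", "c": "3 edited"}   # MERGE_SIDE2 renames b -> b2, edits c
A_prime = toy_merge(E, G, A)

first_pick_side1 = exact_renames(E, G)         # {'a': 'a2'}
second_pick_side1 = exact_renames(A, A_prime)  # {'a': 'a2'}
print(first_pick_side1, second_pick_side1)
```

The rename b -> b2 made on MERGE_SIDE2 is present in both A and A', so it
does not appear when diffing A against A'; only the MERGE_SIDE1 rename
does.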


== 4. Why any rename on MERGE_SIDE1 in any given pick is _almost_ always also a rename on MERGE_SIDE1 for the next pick. ==

Let's again look at the first two picks:

....
To create A':
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A

To create B':
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B
....

Now let's look at any given rename from MERGE_SIDE1 of the first pick, i.e.
any given rename from E to G.  Let's use the filenames 'oldfile' and
'newfile' for demonstration purposes.  That first pick will function as
follows; when the rename is detected, the merge machinery will do a
three-way content merge of the following:

....
    E:oldfile
    G:newfile
    A:oldfile
....

and produce a new result:

....
    A':newfile
....

Note above that I've assumed that E->A did not rename oldfile.  If that
side did rename, then we most likely have a rename/rename(1to2) conflict
that will cause the rebase or cherry-pick operation to halt and drop the
in-memory cache of renames and thus doesn't need to be considered further.
In the special case that E->A does rename the file but also renames it to
newfile, then there is no conflict from the renaming and the merge can
succeed.  In this special case, the rename is not valid to cache because
the second merge will find A:newfile in the MERGE_BASE (see also the new
testcases in t6429 with "rename same file identically" in their
description).  So a rename/rename(1to1) needs to be specially handled by
pruning renames from the cache and decrementing the dir_rename_counts in
the current and leading directories associated with those renames.  Or,
since these are really rare, one could just take the easy way out and
disable the remembering renames optimization when a rename/rename(1to1)
happens.

The previous paragraph handled the cases where E->A renames oldfile; let's
continue assuming that oldfile is not renamed in A.

As per the diagram for creating B', MERGE_SIDE1 involves the changes from A
to A'.  So, we are curious whether A:oldfile and A':newfile will be viewed
as renames.  Note that:

  * There will be no A':oldfile (because there could not have been a
    G:oldfile as we do not do break detection in the merge machinery and
    G:newfile was detected as a rename, and by the construction of the
    rename above that merged cleanly, the merge machinery will ensure there
    is no 'oldfile' in the result).

  * There will be no A:newfile (if there had been, we would have had a
    rename/add conflict).

  * Clearly A:oldfile and A':newfile are "related" (A':newfile came from a
    clean three-way content merge involving A:oldfile).

We can also expound on the third point above, by noting that three-way
content merges can also be viewed as applying the differences between the
base and one side to the other side.  Thus we can view A':newfile as
having been created by applying the changes between E:oldfile and G:newfile
(which were detected as being related, i.e. <50% changed) to A:oldfile.

Thus A:oldfile and A':newfile are just as related as E:oldfile and
G:newfile are -- they have exactly identical differences.  Since the latter
were detected as renames, A:oldfile and A':newfile should also be
detectable as renames almost always.


== 5. A detailed description of the counter-examples to #4. ==

We already noted in section 4 that rename/rename(1to1) (i.e. both sides
renaming a file the same way) was one counter-example.  The more
interesting bit, though, is why did we need to use the "almost" qualifier
when stating that A:oldfile and A':newfile are "almost" always detectable
as renames?

Let's repeat an earlier point that section 4 made:

....
  A':newfile was created by applying the changes between E:oldfile and
  G:newfile to A:oldfile.  The changes between E:oldfile and G:newfile were
  <50% of the size of E:oldfile.
....

If those changes that were <50% of the size of E:oldfile are also <50% of
the size of A:oldfile, then A:oldfile and A':newfile will be detectable as
renames.  However, if there is a dramatic size reduction between E:oldfile
and A:oldfile (but the changes between E:oldfile, G:newfile, and A:oldfile
still somehow merge cleanly), then traditional rename detection would not
detect A:oldfile and A':newfile as renames.

Here's an example where that can happen:

  * E:oldfile had 20 lines
  * G:newfile added 10 new lines at the beginning of the file
  * A:oldfile kept the first 3 lines of the file, and deleted all the rest

then

....
  => A':newfile would have 13 lines, 3 of which match those in A:oldfile.
  E:oldfile -> G:newfile would be detected as a rename, but A:oldfile and
  A':newfile would not be.
....
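
The arithmetic behind this example can be checked with a small Python
sketch.  The `similarity` helper below is an assumption made for
illustration: it counts shared lines relative to the larger file, which
only approximates the size-based estimate that real rename detection
uses.  The 50% threshold is the one discussed above.

```python
from difflib import SequenceMatcher


def similarity(old_lines, new_lines):
    """Fraction of lines shared, relative to the larger file (line-based)."""
    matcher = SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    shared = sum(block.size for block in matcher.get_matching_blocks())
    return shared / max(len(old_lines), len(new_lines))


E_old = [f"line {i}" for i in range(1, 21)]       # E:oldfile, 20 lines
added = [f"new line {i}" for i in range(1, 11)]   # 10 lines added by G
G_new = added + E_old                             # G:newfile, 30 lines
A_old = E_old[:3]                                 # A:oldfile keeps 3 lines
A_prime_new = added + A_old                       # A':newfile, 13 lines

print(f"E:oldfile -> G:newfile   {similarity(E_old, G_new):.0%}")
print(f"A:oldfile -> A':newfile  {similarity(A_old, A_prime_new):.0%}")
```

Under this approximation the first pair shares 20 of 30 lines (about 67%)
and is above the threshold, while the second pair shares only 3 of 13
lines (about 23%) and falls below it.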


== 6. Why the special cases in #5 are still fully reasonable to use to pair up files for three-way content merging in the merge machinery, and why they do not affect the correctness of the merge. ==

In the rename/rename(1to1) case, A:newfile and A':newfile are not renames
since they use the *same* filename.  However, files with the same filename
are obviously fine to pair up for three-way content merging (the merge
machinery has never employed break detection).  The interesting
counter-example case is thus not the rename/rename(1to1) case, but the case
where A did not rename oldfile.  That was the case that we spent most of
the time discussing in sections 4 and 5.  The remainder of this section
will be devoted to that case as well.

So, even if A:oldfile and A':newfile aren't detectable as renames, why is
it still reasonable to pair them up for three-way content merging in the
merge machinery?  There are multiple reasons:

  * As noted in sections 4 and 5, the diff between A:oldfile and A':newfile
    is *exactly* the same as the diff between E:oldfile and G:newfile.  The
    latter pair were detected as renames, so it seems unlikely to surprise
    users for us to treat A:oldfile and A':newfile as renames.

  * In fact, "oldfile" and "newfile" were at one point detected as renames
    due to how they were constructed in the E..G chain.  And we used that
    information once already in this rebase/cherry-pick.  I think users
    would be unlikely to be surprised at us continuing to treat the files
    as renames and would quickly understand why we had done so.

  * Marking or declaring files as renames is *not* the end goal for merges.
    Merges use renames to determine which files make sense to be paired up
    for three-way content merges.

  * A:oldfile and A':newfile were _already_ paired up in a three-way
    content merge; that is how A':newfile was created.  In fact, that
    three-way content merge was clean.  So using them again in a later
    three-way content merge seems very reasonable.

However, the above focuses on the common scenarios.  Let's try to look
at all possible unusual scenarios and compare the behavior without the
optimization to the behavior with it.  Consider the following theoretical
cases; we will then dive into each to determine which of them are possible,
and if so, what they mean:

  1. Without the optimization, the second merge results in a conflict.
     With the optimization, the second merge also results in a conflict.
     Questions: Are the conflicts confusingly different?  Better in one case?

  2. Without the optimization, the second merge results in NO conflict.
     With the optimization, the second merge also results in NO conflict.
     Questions: Are the merges the same?

  3. Without the optimization, the second merge results in a conflict.
     With the optimization, the second merge results in NO conflict.
     Questions: Possible?  Bug, bugfix, or something else?

  4. Without the optimization, the second merge results in NO conflict.
     With the optimization, the second merge results in a conflict.
     Questions: Possible?  Bug, bugfix, or something else?

I'll consider all four cases, but out of order.

The fourth case is impossible.  For the code without the remembering
renames optimization to not get a conflict, B:oldfile would need to exactly
match A:oldfile -- if it doesn't, there would be a modify/delete conflict.
If A:oldfile matches B:oldfile exactly, then a three-way content merge
between A:oldfile, A':newfile, and B:oldfile would have no conflict and
just give us the version of newfile from A' as the result.

From the same logic as the above paragraph, the second case would indeed
result in identical merges.  When A:oldfile exactly matches B:oldfile, an
undetected rename would say, "Oh, I see one side didn't modify 'oldfile'
and the other side deleted it.  I'll delete it.  And I see you have this
brand new file named 'newfile' in A', so I'll keep it."  That gives the
same results as three-way content merging A:oldfile, A':newfile, and
B:oldfile -- a removal of oldfile with the version of newfile from A'
showing up in the result.

The third case is interesting.  It means that A:oldfile and A':newfile were
not just similar enough, but that the changes between them did not conflict
with the changes between A:oldfile and B:oldfile.  This would validate our
hunch that the files were similar enough to be used in a three-way content
merge, and thus seems entirely correct for us to have used them that way.
(Sidenote: One particular example here may be enlightening.  Let's say that
B was an immediate revert of A.  B clearly would have been a clean revert
of A, since A was B's immediate parent.  One would assume that if you can
pick a commit, you should also be able to cherry-pick its immediate revert.
However, this is one of those funny corner cases; without this
optimization, we just successfully picked a commit cleanly, but we are
unable to cherry-pick its immediate revert due to the size differences
between E:oldfile and A:oldfile.)

That leaves only the first case to consider -- when we get conflicts both
with and without the optimization.  Without the optimization, we'll have a
modify/delete conflict, where both A':newfile and B:oldfile are left in the
tree for the user to deal with and no hints about the potential similarity
between the two.  With the optimization, we'll have a three-way content
merged A:oldfile, A':newfile, and B:oldfile with conflict markers
suggesting we thought the files were related but giving the user the chance
to resolve.  As noted above, I don't think users will be surprised by us
treating 'oldfile' and 'newfile' as related, since they were treated as
related between E and G.  In any event, though, this case shouldn't be
concerning since we hit a conflict in both cases, told the user what we
know, and asked them to resolve it.

So, in summary, case 4 is impossible, case 2 yields the same behavior, and
cases 1 and 3 seem to provide as good or better behavior with the
optimization than without.
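
The reasoning for cases 2 and 4 can be illustrated with a deliberately
coarse Python sketch.  Everything here is hypothetical: `three_way` merges
whole files rather than lines, and the two `second_pick_*` functions only
model the single oldfile/newfile pair discussed above.  Because the merge
is whole-file, this sketch cannot reproduce case 3, which depends on a
real line-level content merge.

```python
def three_way(base, side1, side2):
    """Whole-file three-way merge; returns (status, content)."""
    if side1 == side2 or side2 == base:
        return ("clean", side1)
    if side1 == base:
        return ("clean", side2)
    return ("conflict", None)


def second_pick_without_cache(a_old, a_prime_new, b_old):
    """The rename A:oldfile -> A':newfile goes undetected.

    Side 1 then appears to delete oldfile and add an unrelated newfile.
    """
    if b_old == a_old:
        return ("clean", {"newfile": a_prime_new})
    return ("modify/delete conflict", None)


def second_pick_with_cache(a_old, a_prime_new, b_old):
    """The remembered pairing feeds all three versions to a content merge."""
    status, content = three_way(a_old, a_prime_new, b_old)
    return (status, {"newfile": content} if status == "clean" else None)


# B left oldfile untouched: both approaches agree (case 2).
print(second_pick_without_cache("v1", "v1+upstream", "v1"))
print(second_pick_with_cache("v1", "v1+upstream", "v1"))
# B modified oldfile: the uncached path can only report modify/delete.
print(second_pick_without_cache("v1", "v1+upstream", "v2"))
```

The uncached path is clean only when B:oldfile equals A:oldfile, and in
exactly that situation the cached path is clean too, which is why case 4
cannot occur.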


== 7. Interaction with skipping of "irrelevant" renames ==

Previous optimizations involved skipping rename detection for paths
considered to be "irrelevant".  See for example the following commits:

  * 32a56dfb99 ("merge-ort: precompute subset of sources for which we
    need rename detection", 2021-03-11)
  * 2fd9eda462 ("merge-ort: precompute whether directory rename
    detection is needed", 2021-03-11)
  * 9bd342137e ("diffcore-rename: determine which relevant_sources are
    no longer relevant", 2021-03-13)

Relevance is always determined by what the _other_ side of history has
done, in terms of modifying a file that our side renamed, or adding a
file to a directory which our side renamed.  This means that a path
that is "irrelevant" when picking the first commit of a series in a
rebase or cherry-pick, may suddenly become "relevant" when picking the
next commit.

The upshot of this is that we can only cache rename detection results
for relevant paths, and need to re-check relevance in subsequent
commits.  If those subsequent commits have additional paths that are
relevant for rename detection, then we will need to redo rename
detection -- though we can limit it to the paths for which we have not
already detected renames.
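
A minimal Python sketch of this incremental behavior follows.  The
function `detect_for_pick` and the `upstream_renames` lookup table (which
stands in for actually running rename detection) are assumptions of this
illustration.  Sources with no match are recorded as `None` so they are
not re-examined, which anticipates item (A) of section 8.

```python
def detect_for_pick(relevant_sources, cached_pairs, upstream_renames):
    """Run rename detection only for relevant sources not yet in the cache.

    cached_pairs maps source path -> destination path (or None for an
    unpaired delete) and is updated in place.  Returns the sources that
    actually needed detection for this pick.
    """
    todo = sorted(p for p in relevant_sources if p not in cached_pairs)
    for source in todo:
        # Stand-in for real rename detection on the upstream side.
        cached_pairs[source] = upstream_renames.get(source)
    return todo


upstream = {"old/a.c": "new/a.c", "old/b.c": "new/b.c"}
cache = {}

# First pick: only old/a.c is relevant.
print(detect_for_pick({"old/a.c"}, cache, upstream))
# Second pick: old/b.c and old/gone.c became relevant; old/a.c is cached.
print(detect_for_pick({"old/a.c", "old/b.c", "old/gone.c"}, cache, upstream))
print(cache)
```

The second call performs detection only for the two newly relevant paths.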


== 8. Additional items that need to be cached ==

It turns out we have to cache more than just renames; we also cache:

....
  A) non-renames (i.e. unpaired deletes)
  B) counts of renames within directories
  C) sources that were marked as RELEVANT_LOCATION, but which were
     downgraded to RELEVANT_NO_MORE
  D) the toplevel trees involved in the merge
....

These are all stored in struct rename_info, and respectively appear in

  * cached_pairs (alongside actual renames, just with a value of NULL)
  * dir_rename_counts
  * cached_irrelevant
  * merge_trees

The reason for `(A)` comes from the irrelevant renames skipping
optimization discussed in section 7.  The fact that irrelevant renames
are skipped means we only get a subset of the potential renames
detected and subsequent commits may need to run rename detection on
the upstream side on a subset of the remaining renames (to get the
renames that are relevant for that later commit).  Since unpaired
deletes are involved in rename detection too, we don't want to
repeatedly check that those paths remain unpaired on the upstream side
with every commit we are transplanting.

The reason for `(B)` is that diffcore_rename_extended() is what
generates the counts of renames by directory which are needed in
directory rename detection, and if we don't run
diffcore_rename_extended() again then we need to have the output from
it, including dir_rename_counts, from the previous run.

The reason for `(C)` is that merge-ort's tree traversal will again think
those paths are relevant (marking them as RELEVANT_LOCATION), but the
fact that they were downgraded to RELEVANT_NO_MORE means that
dir_rename_counts already has the information we need for directory
rename detection.  (A path which becomes RELEVANT_CONTENT in a
subsequent commit will be removed from cached_irrelevant.)

The reason for `(D)` is that this is how we determine whether the
remembering renames optimization can be used.  In particular, remembering
that our sequence of merges looks like:

....
   Merge 1:
   MERGE_BASE:   E
   MERGE_SIDE1:  G
   MERGE_SIDE2:  A
   => Creates    A'

   Merge 2:
   MERGE_BASE:   A
   MERGE_SIDE1:  A'
   MERGE_SIDE2:  B
   => Creates    B'
....

It is the fact that the trees A and A' appear both in Merge 1 and in
Merge 2 (A moving from MERGE_SIDE2 to MERGE_BASE, and the result A'
becoming MERGE_SIDE1) that allows this optimization.  So we store the
trees to compare with what we are asked to merge next time.
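
As a rough picture of the cached state and the tree comparison, consider
the Python sketch below.  The field names mirror the members of struct
rename_info listed above, but the class itself, the `previous_result`
field, and the `can_reuse` function are assumptions made for this
illustration and are not a transcription of the C code.  Trees are
represented by plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple


@dataclass
class RenameCache:
    """Toy stand-in for the cached parts of struct rename_info (one side)."""
    # source -> destination; None records an unpaired delete (item A)
    cached_pairs: Dict[str, Optional[str]] = field(default_factory=dict)
    # old directory -> {new directory -> number of renames} (item B)
    dir_rename_counts: Dict[str, Dict[str, int]] = field(default_factory=dict)
    # sources that were downgraded to RELEVANT_NO_MORE (item C)
    cached_irrelevant: Set[str] = field(default_factory=set)
    # (base, side1, side2) trees of the previous merge (item D)
    merge_trees: Optional[Tuple[str, str, str]] = None
    # tree produced by the previous merge (hypothetical field for this sketch)
    previous_result: Optional[str] = None


def can_reuse(cache, base, side1):
    """True when the new merge continues the sequence of the previous one."""
    if cache.merge_trees is None:
        return False
    _, _, prev_side2 = cache.merge_trees
    return base == prev_side2 and side1 == cache.previous_result


cache = RenameCache(merge_trees=("E", "G", "A"), previous_result="A'")
print(can_reuse(cache, base="A", side1="A'"))   # Merge 2 from the diagram
print(can_reuse(cache, base="X", side1="Y"))    # unrelated merge
```

The first check succeeds because Merge 2 uses the previous MERGE_SIDE2 as
its base and the previous result as its MERGE_SIDE1; the second does not.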


== 9. How directory rename detection interacts with the above and why this optimization is still safe even if merge.directoryRenames is set to "true". ==

As noted in the assumptions section:

....
    """
    ...if directory renames do occur, then the default of
    merge.directoryRenames being set to "conflict" means that the operation
    will stop for users to resolve the conflicts and the cache will be
    thrown away, and thus that there won't be an optimization to apply.
    So, the only reason we need to address directory renames specifically,
    is that some users will have set merge.directoryRenames to "true" to
    allow the merges to continue to proceed automatically.
    """
....

Let's remember that we need to look at how any given pick affects the next
one.  So let's again use the first two picks from the diagram in section
2:

....
  First pick does this three-way merge:
    MERGE_BASE:   E
    MERGE_SIDE1:  G
    MERGE_SIDE2:  A
    => creates A'

  Second pick does this three-way merge:
    MERGE_BASE:   A
    MERGE_SIDE1:  A'
    MERGE_SIDE2:  B
    => creates B'
....

Now, directory rename detection exists so that if one side of history
renames a directory, and the other side adds a new file to the old
directory, then the merge (with merge.directoryRenames=true) can move the
file into the new directory.  There are two qualitatively different ways to
add a new file to an old directory: create a new file, or rename a file
into that directory.  Also, directory renames can be done on either side of
history, so there are four cases to consider:

  * MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to   old dir
  * MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames  file into old dir
  * MERGE_SIDE1 adds new file to   old dir, MERGE_SIDE2 renames old dir
  * MERGE_SIDE1 renames  file into old dir, MERGE_SIDE2 renames old dir

One last note before we consider these four cases: There are some
important properties about how we implement this optimization with
respect to directory rename detection that we need to bear in mind
while considering all of these cases:

  * rename caching occurs *after* applying directory renames

  * a rename created by directory rename detection is recorded for the side
    of history that did the directory rename.

  * dir_rename_counts, the nested map of
        {oldname => {newname => count}},
    is cached between runs as well.  This basically means that directory
    rename detection is also cached, though only on the side of history
    that we cache renames for (MERGE_SIDE1 as far as this document is
    concerned; see the assumptions section).  Two interesting sub-notes
    about these counts:

   ** If we need to perform rename-detection again on the given side (e.g.
      some paths are relevant for rename detection that weren't before),
      then we clear dir_rename_counts and recompute it, making use of
      cached_pairs.  The reason it is important to do this is optimizations
      around RELEVANT_LOCATION exist to prevent us from computing
      unnecessary renames for directory rename detection and from computing
      dir_rename_counts for irrelevant directories; but those same renames
      or directories may become necessary for subsequent merges.  The
      easiest way to "fix up" dir_rename_counts in such cases is to just
      recompute it.

   ** If we prune rename/rename(1to1) entries from the cache, then we also
      need to update dir_rename_counts to decrement the counts for the
      involved directory and any relevant parent directories (to undo what
      update_dir_rename_counts() in diffcore-rename.c incremented when the
      rename was initially found).  If we instead just disable the
      remembering renames optimization when the exceedingly rare
      rename/rename(1to1) cases occur, then dir_rename_counts will get
      re-computed the next time rename detection occurs, as noted above.

  * the side with multiple commits to pick is the side of history that we
    do NOT cache renames for.  Thus, there are no additional commits to
    change the number of renames in a directory, except for those done by
    directory rename detection (which always pad the majority).

  * the "renames" we cache are modified slightly by any directory rename,
    as noted below.
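
The nested {oldname => {newname => count}} map and the decrement needed
when pruning an entry can be sketched in Python as follows.  This is a
simplified model, not a port of update_dir_rename_counts(): the helper
names are hypothetical, relevance filtering and renames involving the
toplevel directory are ignored, and ties between candidate directories are
not handled.

```python
import posixpath
from collections import defaultdict


def _adjust(counts, old_path, new_path, delta):
    """Add `delta` for the containing directory pair and matching leading pairs."""
    old_dir, new_dir = posixpath.dirname(old_path), posixpath.dirname(new_path)
    while old_dir and new_dir:
        counts[old_dir][new_dir] += delta
        # Keep climbing only while the trailing directory names agree, e.g.
        # a/b/c/e => a/x/e also suggests a/b/c => a/x, but nothing above that.
        if posixpath.basename(old_dir) != posixpath.basename(new_dir):
            break
        old_dir, new_dir = posixpath.dirname(old_dir), posixpath.dirname(new_dir)


def record_rename(counts, old_path, new_path):
    _adjust(counts, old_path, new_path, +1)


def prune_rename(counts, old_path, new_path):
    """Undo record_rename(), e.g. when dropping a rename/rename(1to1) pair."""
    _adjust(counts, old_path, new_path, -1)


def likely_dir_rename(counts, old_dir):
    """Directory that received the most renames from old_dir, if any."""
    targets = {d: n for d, n in counts.get(old_dir, {}).items() if n > 0}
    return max(targets, key=targets.get) if targets else None


counts = defaultdict(lambda: defaultdict(int))
record_rename(counts, "olddir/sub/a.c", "newdir/sub/a.c")
record_rename(counts, "olddir/sub/b.c", "newdir/sub/b.c")
record_rename(counts, "olddir/sub/c.c", "elsewhere/c.c")
print(likely_dir_rename(counts, "olddir/sub"))
print(likely_dir_rename(counts, "olddir"))
```

Here `olddir/sub` maps to `newdir/sub` by a two-to-one majority, and the
matching trailing component `sub` lets the two matching renames also count
toward `olddir` => `newdir`.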

Now, with those notes out of the way, let's go through the four cases
in order:

Case 1: MERGE_SIDE1 renames old dir, MERGE_SIDE2 adds new file to old dir

....
  This case looks like this:

    MERGE_BASE:   E,   Has olddir/
    MERGE_SIDE1:  G,   Renames olddir/ -> newdir/
    MERGE_SIDE2:  A,   Adds olddir/newfile
    => creates    A',  With newdir/newfile

    MERGE_BASE:   A,   Has olddir/newfile
    MERGE_SIDE1:  A',  Has newdir/newfile
    MERGE_SIDE2:  B,   Modifies olddir/newfile
    => expected   B',  with threeway-merged newdir/newfile from above

  In this case, with the optimization, note that after the first commit:
    * MERGE_SIDE1 remembers olddir/ -> newdir/
    * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.
....

Case 2: MERGE_SIDE1 renames old dir, MERGE_SIDE2 renames file into old dir

....
  This case looks like this:

    MERGE_BASE:   E    oldfile, olddir/
    MERGE_SIDE1:  G    oldfile, olddir/ -> newdir/
    MERGE_SIDE2:  A    oldfile -> olddir/newfile
    => creates    A',  With newdir/newfile representing original oldfile

    MERGE_BASE:   A    olddir/newfile
    MERGE_SIDE1:  A'   newdir/newfile
    MERGE_SIDE2:  B    modify olddir/newfile
    => expected   B',  with threeway-merged newdir/newfile from above

  In this case, with the optimization, note that after the first commit:
    * MERGE_SIDE1 remembers olddir/ -> newdir/
    * MERGE_SIDE1 has cached olddir/newfile -> newdir/newfile
                  (NOT oldfile -> newdir/newfile; compare to case with
                   (p->status == 'R' && new_path) in possibly_cache_new_pair())

  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.
....

Case 3: MERGE_SIDE1 adds new file to old dir, MERGE_SIDE2 renames old dir

....
  This case looks like this:

    MERGE_BASE:   E,   Has olddir/
    MERGE_SIDE1:  G,   Adds olddir/newfile
    MERGE_SIDE2:  A,   Renames olddir/ -> newdir/
    => creates    A',  With newdir/newfile

    MERGE_BASE:   A,   Has newdir/, but no notion of newdir/newfile
    MERGE_SIDE1:  A',  Has newdir/newfile
    MERGE_SIDE2:  B,   Has newdir/, but no notion of newdir/newfile
    => expected   B',  with newdir/newfile from A'

  In this case, with the optimization, note that after the first commit there
  were no renames on MERGE_SIDE1, and any renames on MERGE_SIDE2 are tossed.
  But the second merge didn't need any renames so this is fine.
....

Case 4: MERGE_SIDE1 renames file into old dir, MERGE_SIDE2 renames old dir

....
  This case looks like this:

    MERGE_BASE:   E,   Has oldfile, olddir/
    MERGE_SIDE1:  G,   Renames oldfile -> olddir/newfile
    MERGE_SIDE2:  A,   Renames olddir/ -> newdir/
    => creates    A',  With newdir/newfile representing original oldfile

    MERGE_BASE:   A,   Has oldfile
    MERGE_SIDE1:  A',  Has newdir/newfile
    MERGE_SIDE2:  B,   Modifies oldfile
    => expected   B',  with threeway-merged newdir/newfile from above

  In this case, with the optimization, note that after the first commit:
    * MERGE_SIDE1 remembers oldfile -> newdir/newfile
                  (NOT oldfile -> olddir/newfile; compare to case of second
                   block under p->status == 'R' in possibly_cache_new_pair())
    * MERGE_SIDE2 renames are tossed because only MERGE_SIDE1 is remembered

  Given the cached rename noted above, the second merge can proceed as
  expected without needing to perform rename detection from A -> A'.
....
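
The four cases can be summarized with a toy Python model of which pair
each side ends up remembering once a directory rename has been applied.
This is inferred from the case descriptions above and is not a
transcription of possibly_cache_new_pair(); the function names and
arguments are hypothetical.

```python
def cache_pair(cache, status, side, old_path, new_path, final_path=None):
    """Record what each side remembers for one file pair.

    cache:      {1: {}, 2: {}} mapping source -> destination per side
    status:     'A' (file added) or 'R' (file renamed) on `side`
    final_path: where a directory rename on the *other* side moved
                new_path, or None if no directory rename applied
    """
    other = 3 - side
    if status == "R":
        cache[side][old_path] = final_path or new_path
    if final_path:
        # The side that renamed the directory is credited with the extra hop.
        cache[other][new_path] = final_path


def side1_cache(status, side, old_path, new_path, final_path):
    cache = {1: {}, 2: {}}
    cache_pair(cache, status, side, old_path, new_path, final_path)
    return cache[1]  # MERGE_SIDE2 entries are tossed between picks


moved = "newdir/newfile"
print("Case 1:", side1_cache("A", 2, None, "olddir/newfile", moved))
print("Case 2:", side1_cache("R", 2, "oldfile", "olddir/newfile", moved))
print("Case 3:", side1_cache("A", 1, None, "olddir/newfile", moved))
print("Case 4:", side1_cache("R", 1, "oldfile", "olddir/newfile", moved))
```

Cases 1 and 2 leave olddir/newfile -> newdir/newfile on MERGE_SIDE1, case 3
leaves nothing, and case 4 leaves oldfile -> newdir/newfile, matching the
descriptions above.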

Finally, I'll just note here that interactions with the
skip-irrelevant-renames optimization mean we sometimes don't detect
renames for any files within a directory that was renamed, in which
case we will not have been able to detect any rename for the directory
itself.  In such a case, we do not know whether the directory was
renamed; we want to be careful to avoid caching some kind of "this
directory was not renamed" statement.  If we did, then a subsequent
commit being rebased could add a file to the old directory, and the
user would expect it to end up in the correct directory -- something
our erroneous "this directory was not renamed" cache would preclude.