Git fork

--dirstat-by-file: Make it faster and more correct

Currently, when using --dirstat-by-file, it first does the full --dirstat
analysis (using diffcore_count_changes()), and then resets 'damage' to 1,
if any damage was found by diffcore_count_changes().

But --dirstat-by-file is not interested in the file damage per se. It only
cares if the file changed at all. In that sense it only cares if the blob
object for a file has changed. We therefore only need to compare the
object names of each file pair in the diff queue and we can skip the
entire --dirstat analysis and simply set 'damage' to 1 for each entry
where the object name has changed.

This makes --dirstat-by-file faster, and also bypasses --dirstat's practice
of ignoring rearranged lines within a file.

The patch also contains an added testcase verifying that --dirstat-by-file
now detects changes that only rearrange lines within a file.

Signed-off-by: Johan Herland <johan@herland.net>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

authored by

Johan Herland and committed by
Junio C Hamano
0133dab7 204f01a2

+25 -5
+20 -5
diff.c
··· 1539 struct diff_filepair *p = q->queue[i]; 1540 const char *name; 1541 unsigned long copied, added, damage; 1542 1543 name = p->one->path ? p->one->path : p->two->path; 1544 1545 if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two)) { 1546 diff_populate_filespec(p->one, 0); 1547 diff_populate_filespec(p->two, 0); ··· 1564 /* 1565 * Original minus copied is the removed material, 1566 * added is the new material. They are both damages 1567 - * made to the preimage. In --dirstat-by-file mode, count 1568 - * damaged files, not damaged lines. This is done by 1569 - * counting only a single damaged line per file. 1570 */ 1571 damage = (p->one->size - copied) + added; 1572 - if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE) && damage > 0) 1573 - damage = 1; 1574 1575 ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc); 1576 dir.files[dir.nr].name = name; 1577 dir.files[dir.nr].changed = damage;
··· 1539 struct diff_filepair *p = q->queue[i]; 1540 const char *name; 1541 unsigned long copied, added, damage; 1542 + int content_changed; 1543 1544 name = p->one->path ? p->one->path : p->two->path; 1545 1546 + if (p->one->sha1_valid && p->two->sha1_valid) 1547 + content_changed = hashcmp(p->one->sha1, p->two->sha1); 1548 + else 1549 + content_changed = 1; 1550 + 1551 + if (DIFF_OPT_TST(options, DIRSTAT_BY_FILE)) { 1552 + /* 1553 + * In --dirstat-by-file mode, we don't really need to 1554 + * look at the actual file contents at all. 1555 + * The fact that the SHA1 changed is enough for us to 1556 + * add this file to the list of results 1557 + * (with each file contributing equal damage). 1558 + */ 1559 + damage = content_changed ? 1 : 0; 1560 + goto found_damage; 1561 + } 1562 + 1563 if (DIFF_FILE_VALID(p->one) && DIFF_FILE_VALID(p->two)) { 1564 diff_populate_filespec(p->one, 0); 1565 diff_populate_filespec(p->two, 0); ··· 1582 /* 1583 * Original minus copied is the removed material, 1584 * added is the new material. They are both damages 1585 + * made to the preimage. 1586 */ 1587 damage = (p->one->size - copied) + added; 1588 1589 + found_damage: 1590 ALLOC_GROW(dir.files, dir.nr + 1, dir.alloc); 1591 dir.files[dir.nr].name = name; 1592 dir.files[dir.nr].changed = damage;
+2
t/t4013-diff-various.sh
··· 302 diff --dirstat master~1 master~2 303 # --dirstat doesn't notice changes that simply rearrange existing lines 304 diff --dirstat initial rearrange 305 EOF 306 307 test_expect_success 'log -S requires an argument' '
··· 302 diff --dirstat master~1 master~2 303 # --dirstat doesn't notice changes that simply rearrange existing lines 304 diff --dirstat initial rearrange 305 + # ...but --dirstat-by-file does notice changes that only rearrange lines 306 + diff --dirstat-by-file initial rearrange 307 EOF 308 309 test_expect_success 'log -S requires an argument' '
+3
t/t4013/diff.diff_--dirstat-by-file_initial_rearrange
···
··· 1 + $ git diff --dirstat-by-file initial rearrange 2 + 100.0% dir/ 3 + $