Git fork

refs/packed: fix BUG when seeking refs with UTF-8 characters

It was reported that using git-pull(1) in a repository whose remote
contains branches with emojis leads to the following bug:

$ git pull
remote: Enumerating objects: 161255, done.
remote: Counting objects: 100% (55884/55884), done.
remote: Compressing objects: 100% (5518/5518), done.
remote: Total 161255 (delta 54253), reused 50509 (delta 50364),
pack-reused 105371 (from 4)
Receiving objects: 100% (161255/161255), 309.90 MiB | 16.87 MiB/s, done.
Resolving deltas: 100% (118048/118048), completed with 13416 local objects.
From github.com:github/github
97ab7ae3f3745..8fb2f9fa180ed master -> origin/master
[...snip many screenfuls of updates to origin remotes...]
BUG: refs/packed-backend.c:984: packed-refs backend yielded reference
preceding its prefix
error: fetch died of signal 6

This issue bisects to 22600c04529 (refs/iterator: implement seeking for
packed-ref iterators, 2025-03-12) where we have implemented seeking for
the packed-ref iterator. As part of that change we introduced a check
that verifies that the iterator only returns refnames bigger than the
prefix. In theory, this check should always hold: when a prefix is set
we know that we would've seeked that prefix first, so we should never
see a reference sorting before that prefix.

But in practice the check itself is misbehaving when handling unicode
characters. The particular issue triggered with a branch that got the
"shaved ice" unicode character in its name, which is composed of the
bytes "0xEE 0x90 0xBF". The bug triggers when we compare the refname
"refs/heads/<shaved-ice>" to something like "refs/heads/z", and it
specifically hits when comparing the first byte, "0xEE".

The root cause is that the most-significant bit of 0xEE is set. The
`refname` and `prefix` pointers that we use to compare bytes with one
another are both pointers to signed characters. As such, when we
dereference the 0xEE byte the result is a _negative_ value, and this
value will of course compare smaller than "z".

We can see that this issue is avoided in `cmp_packed_refname()`, where
we explicitly cast each byte to its unsigned form. Fix the bug by doing
the same in `packed_ref_iterator_advance()`.

Reported-by: Elijah Newren <newren@gmail.com>
Signed-off-by: Patrick Steinhardt <ps@pks.im>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

authored by

Patrick Steinhardt and committed by
Junio C Hamano
f1fb0644 87d297f4

+17 -2
+2 -2
refs/packed-backend.c
··· 955 955 continue; 956 956 957 957 while (prefix && *prefix) { 958 - if (*refname < *prefix) 958 + if ((unsigned char)*refname < (unsigned char)*prefix) 959 959 BUG("packed-refs backend yielded reference preceding its prefix"); 960 - else if (*refname > *prefix) 960 + else if ((unsigned char)*refname > (unsigned char)*prefix) 961 961 return ITER_DONE; 962 962 prefix++; 963 963 refname++;
+15
t/t1408-packed-refs.sh
··· 42 42 test_cmp expect actual 43 43 ' 44 44 45 + test_expect_success 'list packed refs with unicode characters' ' 46 + test_when_finished "rm -rf repo" && 47 + git init repo && 48 + ( 49 + cd repo && 50 + test_commit --no-tag A && 51 + git update-ref refs/heads/ HEAD && 52 + git update-ref refs/heads/z HEAD && 53 + git pack-refs --all && 54 + printf "%s commit\trefs/heads/z\n" $(git rev-parse HEAD) >expect && 55 + git for-each-ref refs/heads/z >actual && 56 + test_cmp expect actual 57 + ) 58 + ' 59 + 45 60 test_done