Git fork

git_inflate(): skip zlib_post_call() sanity check on Z_NEED_DICT

This fixes a case where malformed object input can cause us to hit a
BUG() call in the git-zlib.c code.

The zlib format allows the use of preset dictionaries to reduce the size
of deflated data. The checksum of the dictionary is computed by the
deflate code and goes into the stream. On the inflating side, zlib sees
the dictionary checksum and returns Z_NEED_DICT, asking the caller to
provide the dictionary data via inflateSetDictionary().

This should never happen in Git, because we never provide a dictionary
for deflating (and if we get a stream that mentions a dictionary, we
have no idea how to provide it). So normally Z_NEED_DICT is a hard error
for us. But something interesting happens if we _do_ happen to see it
(e.g., because of a corrupt or malicious input).

In git_inflate() as we loop over calls to zlib's inflate(), we translate
between our large-integer git_zstream values and zlib's native z_stream
types, copying in and out with zlib_pre_call() and zlib_post_call(). In
zlib_post_call() we have a few sanity checks, including one that checks
that the number of bytes consumed by zlib (as measured by it moving the
"next_in" pointer) is equal to the movement of its "total_in" count.

But these do not correspond when we see Z_NEED_DICT! Zlib consumes the
bytes from the input buffer but it does not increment total_in. And so
we hit the BUG("total_in mismatch") call.

There are a few options here:

- We could ditch that BUG() check. It is making too many assumptions
about how zlib updates these values. But it does have value in most
cases as a sanity check on the values we're copying.

- We could skip the zlib_post_call() entirely when we see Z_NEED_DICT.
We know that it's hard error for us, so we should just send the
status up the stack and let the caller bail.

The downside is that if we ever did want to support dictionaries,
we couldn't (the git_zstream will be out of sync, since we never
copied its values back from the z_stream).

- We could continue to call zlib_post_call(), but skip just that BUG()
check if the status is Z_NEED_DICT. This keeps git_inflate() as a
thin wrapper around inflate(), and would let us later support
dictionaries for some calls if we wanted to.

This patch uses the third approach. It seems like the least-surprising
thing to keep git_inflate() a close to inflate() as possible. And while
it makes the diff a bit larger (since we have to pass the status down to
to the zlib_post_call() function), it's a static local function, and
every caller by definition will have just made a zlib call (and so will
have a status integer).

Co-authored-by: Taylor Blau <me@ttaylorr.com>
Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

authored by

Jeff King
Taylor Blau
and committed by
Junio C Hamano
0b1493c2 b748ddb7

+48 -11
+16 -11
git-zlib.c
··· 45 45 s->z.avail_out = zlib_buf_cap(s->avail_out); 46 46 } 47 47 48 - static void zlib_post_call(git_zstream *s) 48 + static void zlib_post_call(git_zstream *s, int status) 49 49 { 50 50 unsigned long bytes_consumed; 51 51 unsigned long bytes_produced; ··· 54 54 bytes_produced = s->z.next_out - s->next_out; 55 55 if (s->z.total_out != s->total_out + bytes_produced) 56 56 BUG("total_out mismatch"); 57 - if (s->z.total_in != s->total_in + bytes_consumed) 57 + /* 58 + * zlib does not update total_in when it returns Z_NEED_DICT, 59 + * causing a mismatch here. Skip the sanity check in that case. 60 + */ 61 + if (status != Z_NEED_DICT && 62 + s->z.total_in != s->total_in + bytes_consumed) 58 63 BUG("total_in mismatch"); 59 64 60 65 s->total_out = s->z.total_out; ··· 71 76 72 77 zlib_pre_call(strm); 73 78 status = inflateInit(&strm->z); 74 - zlib_post_call(strm); 79 + zlib_post_call(strm, status); 75 80 if (status == Z_OK) 76 81 return; 77 82 die("inflateInit: %s (%s)", zerr_to_string(status), ··· 89 94 90 95 zlib_pre_call(strm); 91 96 status = inflateInit2(&strm->z, windowBits); 92 - zlib_post_call(strm); 97 + zlib_post_call(strm, status); 93 98 if (status == Z_OK) 94 99 return; 95 100 die("inflateInit2: %s (%s)", zerr_to_string(status), ··· 102 107 103 108 zlib_pre_call(strm); 104 109 status = inflateEnd(&strm->z); 105 - zlib_post_call(strm); 110 + zlib_post_call(strm, status); 106 111 if (status == Z_OK) 107 112 return; 108 113 error("inflateEnd: %s (%s)", zerr_to_string(status), ··· 121 126 ? 0 : flush); 122 127 if (status == Z_MEM_ERROR) 123 128 die("inflate: out of memory"); 124 - zlib_post_call(strm); 129 + zlib_post_call(strm, status); 125 130 126 131 /* 127 132 * Let zlib work another round, while we can still ··· 163 168 memset(strm, 0, sizeof(*strm)); 164 169 zlib_pre_call(strm); 165 170 status = deflateInit(&strm->z, level); 166 - zlib_post_call(strm); 171 + zlib_post_call(strm, status); 167 172 if (status == Z_OK) 168 173 return; 169 174 die("deflateInit: %s (%s)", zerr_to_string(status), ··· 179 184 status = deflateInit2(&strm->z, level, 180 185 Z_DEFLATED, windowBits, 181 186 8, Z_DEFAULT_STRATEGY); 182 - zlib_post_call(strm); 187 + zlib_post_call(strm, status); 183 188 if (status == Z_OK) 184 189 return; 185 190 die("deflateInit2: %s (%s)", zerr_to_string(status), ··· 210 215 211 216 zlib_pre_call(strm); 212 217 status = deflateEnd(&strm->z); 213 - zlib_post_call(strm); 218 + zlib_post_call(strm, status); 214 219 return status; 215 220 } 216 221 ··· 230 235 231 236 zlib_pre_call(strm); 232 237 status = deflateEnd(&strm->z); 233 - zlib_post_call(strm); 238 + zlib_post_call(strm, status); 234 239 return status; 235 240 } 236 241 ··· 247 252 ? 0 : flush); 248 253 if (status == Z_MEM_ERROR) 249 254 die("deflate: out of memory"); 250 - zlib_post_call(strm); 255 + zlib_post_call(strm, status); 251 256 252 257 /* 253 258 * Let zlib work another round, while we can still
+32
t/t1006-cat-file.sh
··· 836 836 test_grep "unable to unpack $blob header" err 837 837 EOT 838 838 839 + test_expect_success 'object reading handles zlib dictionary' - <<\EOT 840 + echo 'content that will be recompressed' >file && 841 + blob=$(git hash-object -w file) && 842 + objpath=.git/objects/$(test_oid_to_path "$blob") && 843 + 844 + # Recompress a loose object using a precomputed zlib dictionary. 845 + # This was originally done with: 846 + # 847 + # perl -MCompress::Raw::Zlib -e ' 848 + # binmode STDIN; 849 + # binmode STDOUT; 850 + # my $data = do { local $/; <STDIN> }; 851 + # my $in = new Compress::Raw::Zlib::Inflate; 852 + # my $de = new Compress::Raw::Zlib::Deflate( 853 + # -Dictionary => "anything" 854 + # ); 855 + # $in->inflate($data, $raw); 856 + # $de->deflate($raw, $out); 857 + # print $out; 858 + # ' <obj.bak >$objpath 859 + # 860 + # but we do not want to require the perl module for all test runs (nor 861 + # carry a custom t/helper program that uses zlib features we don't 862 + # otherwise care about). 863 + mv "$objpath" obj.bak && 864 + test_when_finished 'mv obj.bak "$objpath"' && 865 + printf '\170\273\017\112\003\143' >$objpath && 866 + 867 + test_must_fail git cat-file blob $blob 2>err && 868 + test_grep 'error: inflate: needs dictionary' err 869 + EOT 870 + 839 871 # Tests for git cat-file --follow-symlinks 840 872 test_expect_success 'prep for symlink tests' ' 841 873 echo_without_newline "$hello_content" >morx &&