revision: manage memory ownership of argv in setup_revisions()

Git fork

The setup_revisions() function takes an argc/argv pair and consumes
arguments from it, returning a reduced argc count to the caller. But it
may also overwrite entries within the argv array, as it shifts unknown
options to the front of argv (so they can be found in the range of
0..argc-1 after we return).

For a normal argc/argv coming from the operating system, this is OK.
We don't need to worry about memory ownership of the strings in those
entries. But some callers pass in allocated strings from a strvec, and
we do need to care about those.

We faced a similar issue in f92dbdbc6a (revisions API: don't leak memory
on argv elements that need free()-ing, 2022-08-02), which added an
option for callers to tell us that elements need to be freed. But the
implementation within setup_revisions() was incomplete. It only covered
the case of dropping "--", but not the movement of unknown options.

When we shift argv entries around, we should free the elements we are
about to overwrite, so they are not leaked. For example, in:

git stash show -p --invalid

we will pass this to setup_revisions():

argc = 3, argv[] = { "show", "-p", "--invalid", NULL }

which will then return:

argc = 2, argv[] = { "show", "--invalid", "--invalid", NULL }

overwriting the "-p" entry, which is leaked unless we free it at that
moment.

You can see in the output above another potential problem. We now have
two copies of the "--invalid" string. If the caller does not respect the
new argc when free-ing the strings via strvec_clear(), we'll get a
double-free. And git-stash suffers from this, and will crash with the
above command.

So it seems at first glance that the solution is to just assign the
reduced argc to the strvec.nr field in the caller. Then it would stop
after freeing only any copied entries. But that's not always right
either!

Remember that we are reducing "argc" to account for elements we've
consumed. So if there isn't an invalid option, we'd turn:

argc = 2, argv[] = { "show", "-p", NULL }

into:

argc = 1, argv[] = { "show", "-p", NULL }

In that case strvec_clear() must keep looking past the shortened argc we
return to find the original "-p" to free. It needs to use the original
argc to do that.

We can solve this by turning our argv writes into strict moves, not
copies. When we shuffle an unknown option to the front, we'll overwrite
its old position with NULL. That leaves an argv array that may have NULL
"holes" in it.

So in the "--invalid" example above we get:

argc = 2, argv[] = { "show", "--invalid", NULL, NULL }

but something like "git stash -p --invalid -p" would yield:

argc = 3, argv[] = { "show", "--invalid", NULL, "-p", NULL }

because we move "--invalid" to overwrite the first "-p", but the second
one is quietly consumed. But strvec_clear() can handle that fine (it
iterates over the "nr" field, and passing NULL to free() is OK).

To ease the implementation, I've introduced a helper function. It's a
little hacky because it must take a double-pointer to set the old
position to NULL. Which in turn means we cannot pass "&arg", our local
alias for the current entry we're parsing, but instead "&argv[i]", the
pointer in the original array. And to make it even more confusing, we
delegate some of this work to handle_revision_opt(), which is passed a
subset of the argv array, so is always working on "&argv[0]".

Likewise, because handle_revision_opt() only receives the part of argv
left to parse, it receives the array to accumulate unknown options as a
separate unkc/unkv pair. But we're always working on the same argv
array, so our strategy works fine. I suspect this would be a bit more
obvious (and avoid some pointer cleverness) if all functions saw the
full argv array and worked with positions within it (and our new helper
would take two positions, a src and dst). But that would involve
refactoring handle_revision_opt(). I punted on that, as what's here is
not too ugly and is all contained within revision.c itself.

The new test demonstrates that "git stash show -p --invalid" no longer
crashes with a double-free (because we move instead of copy). And it
passes with SANITIZE=leak because we free "-p" before overwriting.

Signed-off-by: Jeff King <peff@peff.net>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

authored by

Jeff King and committed by

Junio C Hamano 5 months ago cd439487 3ea35c64

+26 -3

2 changed files

expand all

unified split

revision.c

t3903-stash.sh

+21 -3

revision.c

··· 2304 return num; 2305 } 2306 2307 static int handle_revision_opt(struct rev_info *revs, int argc, const char **argv, 2308 int *unkc, const char **unkv, 2309 const struct setup_revision_opt* opt) ··· 2325 starts_with(arg, "--branches=") || starts_with(arg, "--tags=") || 2326 starts_with(arg, "--remotes=") || starts_with(arg, "--no-walk=")) 2327 { 2328 - unkv[(*unkc)++] = arg; 2329 return 1; 2330 } 2331 ··· 2689 } else { 2690 int opts = diff_opt_parse(&revs->diffopt, argv, argc, revs->prefix); 2691 if (!opts) 2692 - unkv[(*unkc)++] = arg; 2693 return opts; 2694 } 2695 ··· 3001 3002 if (!strcmp(arg, "--stdin")) { 3003 if (revs->disable_stdin) { 3004 - argv[left++] = arg; 3005 continue; 3006 } 3007 if (revs->read_from_stdin++)

··· 2304 return num; 2305 } 2306 2307 + static void overwrite_argv(int *argc, const char **argv, 2308 + const char **value, 2309 + const struct setup_revision_opt *opt) 2310 + { 2311 + /* 2312 + * Detect the case when we are overwriting ourselves. The assignment 2313 + * itself would be a noop either way, but this lets us avoid corner 2314 + * cases around the free() and NULL operations. 2315 + */ 2316 + if (*value != argv[*argc]) { 2317 + if (opt && opt->free_removed_argv_elements) 2318 + free((char *)argv[*argc]); 2319 + argv[*argc] = *value; 2320 + *value = NULL; 2321 + } 2322 + (*argc)++; 2323 + } 2324 + 2325 static int handle_revision_opt(struct rev_info *revs, int argc, const char **argv, 2326 int *unkc, const char **unkv, 2327 const struct setup_revision_opt* opt) ··· 2343 starts_with(arg, "--branches=") || starts_with(arg, "--tags=") || 2344 starts_with(arg, "--remotes=") || starts_with(arg, "--no-walk=")) 2345 { 2346 + overwrite_argv(unkc, unkv, &argv[0], opt); 2347 return 1; 2348 } 2349 ··· 2707 } else { 2708 int opts = diff_opt_parse(&revs->diffopt, argv, argc, revs->prefix); 2709 if (!opts) 2710 + overwrite_argv(unkc, unkv, &argv[0], opt); 2711 return opts; 2712 } 2713 ··· 3019 3020 if (!strcmp(arg, "--stdin")) { 3021 if (revs->disable_stdin) { 3022 + overwrite_argv(&left, argv, &argv[i], opt); 3023 continue; 3024 } 3025 if (revs->read_from_stdin++)

t/t3903-stash.sh

··· 1745 git stash show -- 1746 ' 1747 1748 test_done

··· 1745 git stash show -- 1746 ' 1747 1748 + test_expect_success 'controlled error return on unrecognized option' ' 1749 + test_expect_code 129 git stash show -p --invalid 2>usage && 1750 + grep -e "^usage: git stash show" usage 1751 + ' 1752 + 1753 test_done