Git fork
1= My First Object Walk
2
3== What's an Object Walk?
4
5The object walk is a key concept in Git - this is the process that underpins
6operations like object transfer and fsck. Beginning from a given commit, the
7list of objects is found by walking parent relationships between commits (commit
8X based on commit W) and containment relationships between objects (tree Y is
9contained within commit X, and blob Z is located within tree Y, giving our
10working tree for commit X something like `y/z.txt`).
11
12A related concept is the revision walk, which is focused on commit objects and
13their parent relationships and does not delve into other object types. The
14revision walk is used for operations like `git log`.
15
16=== Related Reading
17
18- `Documentation/user-manual.adoc` under "Hacking Git" contains some coverage of
19 the revision walker in its various incarnations.
20- `revision.h`
21- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists]
22 gives a good overview of the types of objects in Git and what your object
23 walk is really describing.
24
25== Setting Up
26
27Create a new branch from `master`.
28
29----
30git checkout -b revwalk origin/master
31----
32
33We'll put our fiddling into a new command. For fun, let's name it `git walken`.
34Open up a new file `builtin/walken.c` and set up the command handler:
35
36----
37/*
38 * "git walken"
39 *
40 * Part of the "My First Object Walk" tutorial.
41 */
42
43#include "builtin.h"
44#include "trace.h"
45
46int cmd_walken(int argc, const char **argv, const char *prefix, struct repository *repo)
47{
48 trace_printf(_("cmd_walken incoming...\n"));
49 return 0;
50}
51----
52
53NOTE: `trace_printf()`, defined in `trace.h`, differs from `printf()` in
54that it can be turned on or off at runtime. For the purposes of this
55tutorial, we will write `walken` as though it is intended for use as
56a "plumbing" command: that is, a command which is used primarily in
57scripts, rather than interactively by humans (a "porcelain" command).
58So we will send our debug output to `trace_printf()` instead.
59When running, enable trace output by setting the environment variable `GIT_TRACE`.
60
61Add usage text and `-h` handling, like all subcommands should consistently do
62(our test suite will notice and complain if you fail to do so).
63We'll need to include the `parse-options.h` header.
64
65----
66#include "parse-options.h"
67
68...
69
70int cmd_walken(int argc, const char **argv, const char *prefix)
71{
72 const char * const walken_usage[] = {
73 N_("git walken"),
74 NULL,
75 };
76 struct option options[] = {
77 OPT_END()
78 };
79
80 argc = parse_options(argc, argv, prefix, options, walken_usage, 0);
81
82 ...
83}
84----
85
86Also add the relevant line in `builtin.h` near `cmd_version()`:
87
88----
89int cmd_walken(int argc, const char **argv, const char *prefix, struct repository *repo);
90----
91
92Include the command in `git.c` in `commands[]` near the entry for `version`,
93maintaining alphabetical ordering:
94
95----
96{ "walken", cmd_walken, RUN_SETUP },
97----
98
99Add an entry for the new command in the both the Make and Meson build system,
100before the entry for `worktree`:
101
102- In the `Makefile`:
103----
104...
105BUILTIN_OBJS += builtin/walken.o
106...
107----
108
109- In the `meson.build` file:
110----
111builtin_sources = [
112 ...
113 'builtin/walken.c',
114 ...
115]
116----
117
118Build and test out your command, without forgetting to ensure the `DEVELOPER`
119flag is set, and with `GIT_TRACE` enabled so the debug output can be seen:
120
121----
122$ echo DEVELOPER=1 >>config.mak
123$ make
124$ GIT_TRACE=1 ./bin-wrappers/git walken
125----
126
127NOTE: For a more exhaustive overview of the new command process, take a look at
128`Documentation/MyFirstContribution.adoc`.
129
130NOTE: A reference implementation can be found at
131https://github.com/nasamuffin/git/tree/revwalk.
132
133=== `struct rev_cmdline_info`
134
135The definition of `struct rev_cmdline_info` can be found in `revision.h`.
136
137This struct is contained within the `rev_info` struct and is used to reflect
138parameters provided by the user over the CLI.
139
140`nr` represents the number of `rev_cmdline_entry` present in the array.
141
142`alloc` is used by the `ALLOC_GROW` macro. Check `alloc.h` - this variable is
143used to track the allocated size of the list.
144
145Per entry, we find:
146
147`item` is the object provided upon which to base the object walk. Items in Git
148can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.adoc`.)
149
150`name` is the object ID (OID) of the object - a hex string you may be familiar
151with from using Git to organize your source in the past. Check the tutorial
152mentioned above towards the top for a discussion of where the OID can come
153from.
154
155`whence` indicates some information about what to do with the parents of the
156specified object. We'll explore this flag more later on; take a look at
157`Documentation/revisions.adoc` to get an idea of what could set the `whence`
158value.
159
160`flags` are used to hint the beginning of the revision walk and are the first
161block under the `#include`s in `revision.h`. The most likely ones to be set in
162the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags
163can be used during the walk, as well.
164
165=== `struct rev_info`
166
167This one is quite a bit longer, and many fields are only used during the walk
168by `revision.c` - not configuration options. Most of the configurable flags in
169`struct rev_info` have a mirror in `Documentation/rev-list-options.adoc`. It's a
170good idea to take some time and read through that document.
171
172== Basic Commit Walk
173
174First, let's see if we can replicate the output of `git log --oneline`. We'll
175refer back to the implementation frequently to discover norms when performing
176an object walk of our own.
177
178To do so, we'll first find all the commits, in order, which preceded the current
179commit. We'll extract the name and subject of the commit from each.
180
181Ideally, we will also be able to find out which ones are currently at the tip of
182various branches.
183
184=== Setting Up
185
186Preparing for your object walk has some distinct stages.
187
1881. Perform default setup for this mode, and others which may be invoked.
1892. Check configuration files for relevant settings.
1903. Set up the `rev_info` struct.
1914. Tweak the initialized `rev_info` to suit the current walk.
1925. Prepare the `rev_info` for the walk.
1936. Iterate over the objects, processing each one.
194
195==== Default Setups
196
197Before examining configuration files which may modify command behavior, set up
198default state for switches or options your command may have. If your command
199utilizes other Git components, ask them to set up their default states as well.
200For instance, `git log` takes advantage of `grep` and `diff` functionality, so
201its `init_log_defaults()` sets its own state (`decoration_style`) and asks
202`grep` and `diff` to initialize themselves by calling each of their
203initialization functions.
204
205==== Configuring From `.gitconfig`
206
207Next, we should have a look at any relevant configuration settings (i.e.,
208settings readable and settable from `git config`). This is done by providing a
209callback to `repo_config()`; within that callback, you can also invoke methods
210from other components you may need that need to intercept these options. Your
211callback will be invoked once per each configuration value which Git knows about
212(global, local, worktree, etc.).
213
214Similarly to the default values, we don't have anything to do here yet
215ourselves; however, we should call `git_default_config()` if we aren't calling
216any other existing config callbacks.
217
218Add a new function to `builtin/walken.c`.
219We'll also need to include the `config.h` header:
220
221----
222#include "config.h"
223
224...
225
226static int git_walken_config(const char *var, const char *value,
227 const struct config_context *ctx, void *cb)
228{
229 /*
230 * For now, we don't have any custom configuration, so fall back to
231 * the default config.
232 */
233 return git_default_config(var, value, ctx, cb);
234}
235----
236
237Make sure to invoke `repo_config()` with it in your `cmd_walken()`:
238
239----
240int cmd_walken(int argc, const char **argv, const char *prefix, struct repository *repo)
241{
242 ...
243
244 repo_config(repo, git_walken_config, NULL);
245
246 ...
247}
248----
249
250==== Setting Up `rev_info`
251
252Now that we've gathered external configuration and options, it's time to
253initialize the `rev_info` object which we will use to perform the walk. This is
254typically done by calling `repo_init_revisions()` with the repository you intend
255to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info`
256struct.
257
258Add the `struct rev_info` and the `repo_init_revisions()` call.
259We'll also need to include the `revision.h` header:
260
261----
262#include "revision.h"
263
264...
265
266int cmd_walken(int argc, const char **argv, const char *prefix, struct repository *repo)
267{
268 /* This can go wherever you like in your declarations.*/
269 struct rev_info rev;
270 ...
271
272 /* This should go after the repo_config() call. */
273 repo_init_revisions(repo, &rev, prefix);
274
275 ...
276}
277----
278
279==== Tweaking `rev_info` For the Walk
280
281We're getting close, but we're still not quite ready to go. Now that `rev` is
282initialized, we can modify it to fit our needs. This is usually done within a
283helper for clarity, so let's add one:
284
285----
286static void final_rev_info_setup(struct rev_info *rev)
287{
288 /*
289 * We want to mimic the appearance of `git log --oneline`, so let's
290 * force oneline format.
291 */
292 get_commit_format("oneline", rev);
293
294 /* Start our object walk at HEAD. */
295 add_head_to_pending(rev);
296}
297----
298
299[NOTE]
300====
301Instead of using the shorthand `add_head_to_pending()`, you could do
302something like this:
303
304----
305 struct setup_revision_opt opt;
306
307 memset(&opt, 0, sizeof(opt));
308 opt.def = "HEAD";
309 opt.revarg_opt = REVARG_COMMITTISH;
310 setup_revisions(argc, argv, rev, &opt);
311----
312
313Using a `setup_revision_opt` gives you finer control over your walk's starting
314point.
315====
316
317Then let's invoke `final_rev_info_setup()` after the call to
318`repo_init_revisions()`:
319
320----
321int cmd_walken(int argc, const char **argv, const char *prefix, struct repository *repo)
322{
323 ...
324
325 final_rev_info_setup(&rev);
326
327 ...
328}
329----
330
331Later, we may wish to add more arguments to `final_rev_info_setup()`. But for
332now, this is all we need.
333
334==== Preparing `rev_info` For the Walk
335
336Now that `rev` is all initialized and configured, we've got one more setup step
337before we get rolling. We can do this in a helper, which will both prepare the
338`rev_info` for the walk, and perform the walk itself. Let's start the helper
339with the call to `prepare_revision_walk()`, which can return an error without
340dying on its own:
341
342----
343static void walken_commit_walk(struct rev_info *rev)
344{
345 if (prepare_revision_walk(rev))
346 die(_("revision walk setup failed"));
347}
348----
349
350NOTE: `die()` prints to `stderr` and exits the program. Since it will print to
351`stderr` it's likely to be seen by a human, so we will localize it.
352
353==== Performing the Walk!
354
355Finally! We are ready to begin the walk itself. Now we can see that `rev_info`
356can also be used as an iterator; we move to the next item in the walk by using
357`get_revision()` repeatedly. Add the listed variable declarations at the top and
358the walk loop below the `prepare_revision_walk()` call within your
359`walken_commit_walk()`:
360
361----
362#include "pretty.h"
363
364...
365
366static void walken_commit_walk(struct rev_info *rev)
367{
368 struct commit *commit;
369 struct strbuf prettybuf = STRBUF_INIT;
370
371 ...
372
373 while ((commit = get_revision(rev))) {
374 strbuf_reset(&prettybuf);
375 pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf);
376 puts(prettybuf.buf);
377 }
378 strbuf_release(&prettybuf);
379}
380----
381
382NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the
383command we expect to be machine-parsed, we're sending it directly to stdout.
384
385Give it a shot.
386
387----
388$ make
389$ ./bin-wrappers/git walken
390----
391
392You should see all of the subject lines of all the commits in
393your tree's history, in order, ending with the initial commit, "Initial revision
394of "git", the information manager from hell". Congratulations! You've written
395your first revision walk. You can play with printing some additional fields
396from each commit if you're curious; have a look at the functions available in
397`commit.h`.
398
399=== Adding a Filter
400
401Next, let's try to filter the commits we see based on their author. This is
402equivalent to running `git log --author=<pattern>`. We can add a filter by
403modifying `rev_info.grep_filter`, which is a `struct grep_opt`.
404
405First some setup. Add `grep_config()` to `git_walken_config()`:
406
407----
408static int git_walken_config(const char *var, const char *value,
409 const struct config_context *ctx, void *cb)
410{
411 grep_config(var, value, ctx, cb);
412 return git_default_config(var, value, ctx, cb);
413}
414----
415
416Next, we can modify the `grep_filter`. This is done with convenience functions
417found in `grep.h`. For fun, we're filtering to only commits from folks using a
418`gmail.com` email address - a not-very-precise guess at who may be working on
419Git as a hobby. Since we're checking the author, which is a specific line in the
420header, we'll use the `append_header_grep_pattern()` helper. We can use
421the `enum grep_header_field` to indicate which part of the commit header we want
422to search.
423
424In `final_rev_info_setup()`, add your filter line:
425
426----
427static void final_rev_info_setup(int argc, const char **argv,
428 const char *prefix, struct rev_info *rev)
429{
430 ...
431
432 append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR,
433 "gmail");
434 compile_grep_patterns(&rev->grep_filter);
435
436 ...
437}
438----
439
440`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but
441it won't work unless we compile it with `compile_grep_patterns()`.
442
443NOTE: If you are using `setup_revisions()` (for example, if you are passing a
444`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need
445to call `compile_grep_patterns()` because `setup_revisions()` calls it for you.
446
447NOTE: We could add the same filter via the `append_grep_pattern()` helper if we
448wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and
449`enum grep_pat_token` for us.
450
451=== Changing the Order
452
453There are a few ways that we can change the order of the commits during a
454revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some
455typical orderings.
456
457`topo_order` is the same as `git log --topo-order`: we avoid showing a parent
458before all of its children have been shown, and we avoid mixing commits which
459are in different lines of history. (`git help log`'s section on `--topo-order`
460has a very nice diagram to illustrate this.)
461
462Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to
463`REV_SORT_BY_AUTHOR_DATE`. Add the following:
464
465----
466static void final_rev_info_setup(int argc, const char **argv,
467 const char *prefix, struct rev_info *rev)
468{
469 ...
470
471 rev->topo_order = 1;
472 rev->sort_order = REV_SORT_BY_COMMIT_DATE;
473
474 ...
475}
476----
477
478Let's output this into a file so we can easily diff it with the walk sorted by
479author date.
480
481----
482$ make
483$ ./bin-wrappers/git walken > commit-date.txt
484----
485
486Then, let's sort by author date and run it again.
487
488----
489static void final_rev_info_setup(int argc, const char **argv,
490 const char *prefix, struct rev_info *rev)
491{
492 ...
493
494 rev->topo_order = 1;
495 rev->sort_order = REV_SORT_BY_AUTHOR_DATE;
496
497 ...
498}
499----
500
501----
502$ make
503$ ./bin-wrappers/git walken > author-date.txt
504----
505
506Finally, compare the two. This is a little less helpful without object names or
507dates, but hopefully we get the idea.
508
509----
510$ diff -u commit-date.txt author-date.txt
511----
512
513This display indicates that commits can be reordered after they're written, for
514example with `git rebase`.
515
516Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag.
517Set that flag somewhere inside of `final_rev_info_setup()`:
518
519----
520static void final_rev_info_setup(int argc, const char **argv, const char *prefix,
521 struct rev_info *rev)
522{
523 ...
524
525 rev->reverse = 1;
526
527 ...
528}
529----
530
531Run your walk again and note the difference in order. (If you remove the grep
532pattern, you should see the last commit this call gives you as your current
533HEAD.)
534
535== Basic Object Walk
536
537So far we've been walking only commits. But Git has more types of objects than
538that! Let's see if we can walk _all_ objects, and find out some information
539about each one.
540
541We can base our work on an example. `git pack-objects` prepares all kinds of
542objects for packing into a bitmap or packfile. The work we are interested in
543resides in `builtin/pack-objects.c:get_object_list()`; examination of that
544function shows that the all-object walk is being performed by
545`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two
546functions reside in `list-objects.c`; examining the source shows that, despite
547the name, these functions traverse all kinds of objects. Let's have a look at
548the arguments to `traverse_commit_list()`.
549
550- `struct rev_info *revs`: This is the `rev_info` used for the walk. If
551 its `filter` member is not `NULL`, then `filter` contains information for
552 how to filter the object list.
553- `show_commit_fn show_commit`: A callback which will be used to handle each
554 individual commit object.
555- `show_object_fn show_object`: A callback which will be used to handle each
556 non-commit object (so each blob, tree, or tag).
557- `void *show_data`: A context buffer which is passed in turn to `show_commit`
558 and `show_object`.
559
560In addition, `traverse_commit_list_filtered()` has an additional parameter:
561
562- `struct oidset *omitted`: A linked-list of object IDs which the provided
563 filter caused to be omitted.
564
565It looks like these methods use callbacks we provide instead of needing us
566to call it repeatedly ourselves. Cool! Let's add the callbacks first.
567
568For the sake of this tutorial, we'll simply keep track of how many of each kind
569of object we find. At file scope in `builtin/walken.c` add the following
570tracking variables:
571
572----
573static int commit_count;
574static int tag_count;
575static int blob_count;
576static int tree_count;
577----
578
579Commits are handled by a different callback than other objects; let's do that
580one first:
581
582----
583static void walken_show_commit(struct commit *cmt, void *buf)
584{
585 commit_count++;
586}
587----
588
589The `cmt` argument is fairly self-explanatory. But it's worth mentioning that
590the `buf` argument is actually the context buffer that we can provide to the
591traversal calls - `show_data`, which we mentioned a moment ago.
592
593Since we have the `struct commit` object, we can look at all the same parts that
594we looked at in our earlier commit-only walk. For the sake of this tutorial,
595though, we'll just increment the commit counter and move on.
596
597The callback for non-commits is a little different, as we'll need to check
598which kind of object we're dealing with:
599
600----
601static void walken_show_object(struct object *obj, const char *str, void *buf)
602{
603 switch (obj->type) {
604 case OBJ_TREE:
605 tree_count++;
606 break;
607 case OBJ_BLOB:
608 blob_count++;
609 break;
610 case OBJ_TAG:
611 tag_count++;
612 break;
613 case OBJ_COMMIT:
614 BUG("unexpected commit object in walken_show_object\n");
615 default:
616 BUG("unexpected object type %s in walken_show_object\n",
617 type_name(obj->type));
618 }
619}
620----
621
622Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same
623context pointer that `walken_show_commit()` receives: the `show_data` argument
624to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally,
625`str` contains the name of the object, which ends up being something like
626`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag).
627
628To help assure us that we aren't double-counting commits, we'll include some
629complaining if a commit object is routed through our non-commit callback; we'll
630also complain if we see an invalid object type. Since those two cases should be
631unreachable, and would only change in the event of a semantic change to the Git
632codebase, we complain by using `BUG()` - which is a signal to a developer that
633the change they made caused unintended consequences, and the rest of the
634codebase needs to be updated to understand that change. `BUG()` is not intended
635to be seen by the public, so it is not localized.
636
637Our main object walk implementation is substantially different from our commit
638walk implementation, so let's make a new function to perform the object walk. We
639can perform setup which is applicable to all objects here, too, to keep separate
640from setup which is applicable to commit-only walks.
641
642We'll start by enabling all types of objects in the `struct rev_info`. We'll
643also turn on `tree_blobs_in_commit_order`, which means that we will walk a
644commit's tree and everything it points to immediately after we find each commit,
645as opposed to waiting for the end and walking through all trees after the commit
646history has been discovered. With the appropriate settings configured, we are
647ready to call `prepare_revision_walk()`.
648
649----
650static void walken_object_walk(struct rev_info *rev)
651{
652 rev->tree_objects = 1;
653 rev->blob_objects = 1;
654 rev->tag_objects = 1;
655 rev->tree_blobs_in_commit_order = 1;
656
657 if (prepare_revision_walk(rev))
658 die(_("revision walk setup failed"));
659
660 commit_count = 0;
661 tag_count = 0;
662 blob_count = 0;
663 tree_count = 0;
664----
665
666Let's start by calling just the unfiltered walk and reporting our counts.
667Complete your implementation of `walken_object_walk()`.
668We'll also need to include the `list-objects.h` header.
669
670----
671#include "list-objects.h"
672
673...
674
675 traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL);
676
677 printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count,
678 blob_count, tag_count, tree_count);
679}
680----
681
682NOTE: This output is intended to be machine-parsed. Therefore, we are not
683sending it to `trace_printf()`, and we are not localizing it - we need scripts
684to be able to count on the formatting to be exactly the way it is shown here.
685If we were intending this output to be read by humans, we would need to localize
686it with `_()`.
687
688Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing
689command line options is out of scope for this tutorial, so we'll just hardcode
690a branch we can change at compile time. Where you call `final_rev_info_setup()`
691and `walken_commit_walk()`, instead branch like so:
692
693----
694 if (1) {
695 add_head_to_pending(&rev);
696 walken_object_walk(&rev);
697 } else {
698 final_rev_info_setup(argc, argv, prefix, &rev);
699 walken_commit_walk(&rev);
700 }
701----
702
703NOTE: For simplicity, we've avoided all the filters and sorts we applied in
704`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you
705want, you can certainly use the filters we added before by moving
706`final_rev_info_setup()` out of the conditional and removing the call to
707`add_head_to_pending()`.
708
709Now we can try to run our command! It should take noticeably longer than the
710commit walk, but an examination of the output will give you an idea why. Your
711output should look similar to this example, but with different counts:
712
713----
714Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees.
715----
716
717This makes sense. We have more trees than commits because the Git project has
718lots of subdirectories which can change, plus at least one tree per commit. We
719have no tags because we started on a commit (`HEAD`) and while tags can point to
720commits, commits can't point to tags.
721
722NOTE: You will have different counts when you run this yourself! The number of
723objects grows along with the Git project.
724
725=== Adding a Filter
726
727There are a handful of filters that we can apply to the object walk laid out in
728`Documentation/rev-list-options.adoc`. These filters are typically useful for
729operations such as creating packfiles or performing a partial clone. They are
730defined in `list-objects-filter-options.h`. For the purposes of this tutorial we
731will use the "tree:1" filter, which causes the walk to omit all trees and blobs
732which are not directly referenced by commits reachable from the commit in
733`pending` when the walk begins. (`pending` is the list of objects which need to
734be traversed during a walk; you can imagine a breadth-first tree traversal to
735help understand. In our case, that means we omit trees and blobs not directly
736referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only
737`HEAD` in the `pending` list.)
738
739For now, we are not going to track the omitted objects, so we'll replace those
740parameters with `NULL`. For the sake of simplicity, we'll add a simple
741build-time branch to use our filter or not. Preface the line calling
742`traverse_commit_list()` with the following, which will remind us which kind of
743walk we've just performed:
744
745----
746 if (0) {
747 /* Unfiltered: */
748 trace_printf(_("Unfiltered object walk.\n"));
749 } else {
750 trace_printf(
751 _("Filtered object walk with filterspec 'tree:1'.\n"));
752
753 parse_list_objects_filter(&rev->filter, "tree:1");
754 }
755 traverse_commit_list(rev, walken_show_commit,
756 walken_show_object, NULL);
757----
758
759The `rev->filter` member is usually built directly from a command
760line argument, so the module provides an easy way to build one from a string.
761Even though we aren't taking user input right now, we can still build one with
762a hardcoded string using `parse_list_objects_filter()`.
763
764With the filter spec "tree:1", we are expecting to see _only_ the root tree for
765each commit; therefore, the tree object count should be less than or equal to
766the number of commits. (For an example of why that's true: `git commit --revert`
767points to the same tree object as its grandparent.)
768
769=== Counting Omitted Objects
770
771We also have the capability to enumerate all objects which were omitted by a
772filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this,
773change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is
774able to populate an `omitted` list. Asking for this list of filtered objects
775may cause performance degradations, however, because in this case, despite
776filtering objects, the possibly much larger set of all reachable objects must
777be processed in order to populate that list.
778
779First, add the `struct oidset` and related items we will use to iterate it:
780
781----
782#include "oidset.h"
783
784...
785
786static void walken_object_walk(
787 ...
788
789 struct oidset omitted;
790 struct oidset_iter oit;
791 struct object_id *oid = NULL;
792 int omitted_count = 0;
793 oidset_init(&omitted, 0);
794
795 ...
796----
797
798Replace the call to `traverse_commit_list()` with
799`traverse_commit_list_filtered()` and pass a pointer to the `omitted` oidset
800defined and initialized above:
801
802----
803 ...
804
805 traverse_commit_list_filtered(rev,
806 walken_show_commit, walken_show_object, NULL, &omitted);
807
808 ...
809----
810
811Then, after your traversal, the `oidset` traversal is pretty straightforward.
812Count all the objects within and modify the print statement:
813
814----
815 /* Count the omitted objects. */
816 oidset_iter_init(&omitted, &oit);
817
818 while ((oid = oidset_iter_next(&oit)))
819 omitted_count++;
820
821 printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n",
822 commit_count, blob_count, tag_count, tree_count, omitted_count);
823----
824
825By running your walk with and without the filter, you should find that the total
826object count in each case is identical. You can also time each invocation of
827the `walken` subcommand, with and without `omitted` being passed in, to confirm
828to yourself the runtime impact of tracking all omitted objects.
829
830=== Changing the Order
831
832Finally, let's demonstrate that you can also reorder walks of all objects, not
833just walks of commits. First, we'll make our handlers chattier - modify
834`walken_show_commit()` and `walken_show_object()` to print the object as they
835go:
836
837----
838#include "hex.h"
839
840...
841
842static void walken_show_commit(struct commit *cmt, void *buf)
843{
844 trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid));
845 commit_count++;
846}
847
848static void walken_show_object(struct object *obj, const char *str, void *buf)
849{
850 trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid));
851
852 ...
853}
854----
855
856NOTE: Since we will be examining this output directly as humans, we'll use
857`trace_printf()` here. Additionally, since this change introduces a significant
858number of printed lines, using `trace_printf()` will allow us to easily silence
859those lines without having to recompile.
860
861(Leave the counter increment logic in place.)
862
863With only that change, run again (but save yourself some scrollback):
864
865----
866$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10
867----
868
869Take a look at the top commit with `git show` and the object ID you printed; it
870should be the same as the output of `git show HEAD`.
871
872Next, let's change a setting on our `struct rev_info` within
873`walken_object_walk()`. Find where you're changing the other settings on `rev`,
874such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the
875`reverse` setting at the bottom:
876
877----
878 ...
879
880 rev->tree_objects = 1;
881 rev->blob_objects = 1;
882 rev->tag_objects = 1;
883 rev->tree_blobs_in_commit_order = 1;
884 rev->reverse = 1;
885
886 ...
887----
888
889Now, run again, but this time, let's grab the last handful of objects instead
890of the first handful:
891
892----
893$ make
894$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10
895----
896
897The last commit object given should have the same OID as the one we saw at the
898top before, and running `git show <oid>` with that OID should give you again
899the same results as `git show HEAD`. Furthermore, if you run and examine the
900first ten lines again (with `head` instead of `tail` like we did before applying
901the `reverse` setting), you should see that now the first commit printed is the
902initial commit, `e83c5163`.
903
904== Wrapping Up
905
906Let's review. In this tutorial, we:
907
908- Built a commit walk from the ground up
909- Enabled a grep filter for that commit walk
910- Changed the sort order of that filtered commit walk
911- Built an object walk (tags, commits, trees, and blobs) from the ground up
912- Learned how to add a filter-spec to an object walk
913- Changed the display order of the filtered object walk