Git fork
at reftables-rust 913 lines 30 kB view raw
1= My First Object Walk 2 3== What's an Object Walk? 4 5The object walk is a key concept in Git - this is the process that underpins 6operations like object transfer and fsck. Beginning from a given commit, the 7list of objects is found by walking parent relationships between commits (commit 8X based on commit W) and containment relationships between objects (tree Y is 9contained within commit X, and blob Z is located within tree Y, giving our 10working tree for commit X something like `y/z.txt`). 11 12A related concept is the revision walk, which is focused on commit objects and 13their parent relationships and does not delve into other object types. The 14revision walk is used for operations like `git log`. 15 16=== Related Reading 17 18- `Documentation/user-manual.adoc` under "Hacking Git" contains some coverage of 19 the revision walker in its various incarnations. 20- `revision.h` 21- https://eagain.net/articles/git-for-computer-scientists/[Git for Computer Scientists] 22 gives a good overview of the types of objects in Git and what your object 23 walk is really describing. 24 25== Setting Up 26 27Create a new branch from `master`. 28 29---- 30git checkout -b revwalk origin/master 31---- 32 33We'll put our fiddling into a new command. For fun, let's name it `git walken`. 34Open up a new file `builtin/walken.c` and set up the command handler: 35 36---- 37/* 38 * "git walken" 39 * 40 * Part of the "My First Object Walk" tutorial. 41 */ 42 43#include "builtin.h" 44#include "trace.h" 45 46int cmd_walken(int argc, const char **argv, const char *prefix, struct repository *repo) 47{ 48 trace_printf(_("cmd_walken incoming...\n")); 49 return 0; 50} 51---- 52 53NOTE: `trace_printf()`, defined in `trace.h`, differs from `printf()` in 54that it can be turned on or off at runtime. For the purposes of this 55tutorial, we will write `walken` as though it is intended for use as 56a "plumbing" command: that is, a command which is used primarily in 57scripts, rather than interactively by humans (a "porcelain" command). 58So we will send our debug output to `trace_printf()` instead. 59When running, enable trace output by setting the environment variable `GIT_TRACE`. 60 61Add usage text and `-h` handling, like all subcommands should consistently do 62(our test suite will notice and complain if you fail to do so). 63We'll need to include the `parse-options.h` header. 64 65---- 66#include "parse-options.h" 67 68... 69 70int cmd_walken(int argc, const char **argv, const char *prefix) 71{ 72 const char * const walken_usage[] = { 73 N_("git walken"), 74 NULL, 75 }; 76 struct option options[] = { 77 OPT_END() 78 }; 79 80 argc = parse_options(argc, argv, prefix, options, walken_usage, 0); 81 82 ... 83} 84---- 85 86Also add the relevant line in `builtin.h` near `cmd_version()`: 87 88---- 89int cmd_walken(int argc, const char **argv, const char *prefix, struct repository *repo); 90---- 91 92Include the command in `git.c` in `commands[]` near the entry for `version`, 93maintaining alphabetical ordering: 94 95---- 96{ "walken", cmd_walken, RUN_SETUP }, 97---- 98 99Add an entry for the new command in the both the Make and Meson build system, 100before the entry for `worktree`: 101 102- In the `Makefile`: 103---- 104... 105BUILTIN_OBJS += builtin/walken.o 106... 107---- 108 109- In the `meson.build` file: 110---- 111builtin_sources = [ 112 ... 113 'builtin/walken.c', 114 ... 115] 116---- 117 118Build and test out your command, without forgetting to ensure the `DEVELOPER` 119flag is set, and with `GIT_TRACE` enabled so the debug output can be seen: 120 121---- 122$ echo DEVELOPER=1 >>config.mak 123$ make 124$ GIT_TRACE=1 ./bin-wrappers/git walken 125---- 126 127NOTE: For a more exhaustive overview of the new command process, take a look at 128`Documentation/MyFirstContribution.adoc`. 129 130NOTE: A reference implementation can be found at 131https://github.com/nasamuffin/git/tree/revwalk. 132 133=== `struct rev_cmdline_info` 134 135The definition of `struct rev_cmdline_info` can be found in `revision.h`. 136 137This struct is contained within the `rev_info` struct and is used to reflect 138parameters provided by the user over the CLI. 139 140`nr` represents the number of `rev_cmdline_entry` present in the array. 141 142`alloc` is used by the `ALLOC_GROW` macro. Check `alloc.h` - this variable is 143used to track the allocated size of the list. 144 145Per entry, we find: 146 147`item` is the object provided upon which to base the object walk. Items in Git 148can be blobs, trees, commits, or tags. (See `Documentation/gittutorial-2.adoc`.) 149 150`name` is the object ID (OID) of the object - a hex string you may be familiar 151with from using Git to organize your source in the past. Check the tutorial 152mentioned above towards the top for a discussion of where the OID can come 153from. 154 155`whence` indicates some information about what to do with the parents of the 156specified object. We'll explore this flag more later on; take a look at 157`Documentation/revisions.adoc` to get an idea of what could set the `whence` 158value. 159 160`flags` are used to hint the beginning of the revision walk and are the first 161block under the `#include`s in `revision.h`. The most likely ones to be set in 162the `rev_cmdline_info` are `UNINTERESTING` and `BOTTOM`, but these same flags 163can be used during the walk, as well. 164 165=== `struct rev_info` 166 167This one is quite a bit longer, and many fields are only used during the walk 168by `revision.c` - not configuration options. Most of the configurable flags in 169`struct rev_info` have a mirror in `Documentation/rev-list-options.adoc`. It's a 170good idea to take some time and read through that document. 171 172== Basic Commit Walk 173 174First, let's see if we can replicate the output of `git log --oneline`. We'll 175refer back to the implementation frequently to discover norms when performing 176an object walk of our own. 177 178To do so, we'll first find all the commits, in order, which preceded the current 179commit. We'll extract the name and subject of the commit from each. 180 181Ideally, we will also be able to find out which ones are currently at the tip of 182various branches. 183 184=== Setting Up 185 186Preparing for your object walk has some distinct stages. 187 1881. Perform default setup for this mode, and others which may be invoked. 1892. Check configuration files for relevant settings. 1903. Set up the `rev_info` struct. 1914. Tweak the initialized `rev_info` to suit the current walk. 1925. Prepare the `rev_info` for the walk. 1936. Iterate over the objects, processing each one. 194 195==== Default Setups 196 197Before examining configuration files which may modify command behavior, set up 198default state for switches or options your command may have. If your command 199utilizes other Git components, ask them to set up their default states as well. 200For instance, `git log` takes advantage of `grep` and `diff` functionality, so 201its `init_log_defaults()` sets its own state (`decoration_style`) and asks 202`grep` and `diff` to initialize themselves by calling each of their 203initialization functions. 204 205==== Configuring From `.gitconfig` 206 207Next, we should have a look at any relevant configuration settings (i.e., 208settings readable and settable from `git config`). This is done by providing a 209callback to `repo_config()`; within that callback, you can also invoke methods 210from other components you may need that need to intercept these options. Your 211callback will be invoked once per each configuration value which Git knows about 212(global, local, worktree, etc.). 213 214Similarly to the default values, we don't have anything to do here yet 215ourselves; however, we should call `git_default_config()` if we aren't calling 216any other existing config callbacks. 217 218Add a new function to `builtin/walken.c`. 219We'll also need to include the `config.h` header: 220 221---- 222#include "config.h" 223 224... 225 226static int git_walken_config(const char *var, const char *value, 227 const struct config_context *ctx, void *cb) 228{ 229 /* 230 * For now, we don't have any custom configuration, so fall back to 231 * the default config. 232 */ 233 return git_default_config(var, value, ctx, cb); 234} 235---- 236 237Make sure to invoke `repo_config()` with it in your `cmd_walken()`: 238 239---- 240int cmd_walken(int argc, const char **argv, const char *prefix, struct repository *repo) 241{ 242 ... 243 244 repo_config(repo, git_walken_config, NULL); 245 246 ... 247} 248---- 249 250==== Setting Up `rev_info` 251 252Now that we've gathered external configuration and options, it's time to 253initialize the `rev_info` object which we will use to perform the walk. This is 254typically done by calling `repo_init_revisions()` with the repository you intend 255to target, as well as the `prefix` argument of `cmd_walken` and your `rev_info` 256struct. 257 258Add the `struct rev_info` and the `repo_init_revisions()` call. 259We'll also need to include the `revision.h` header: 260 261---- 262#include "revision.h" 263 264... 265 266int cmd_walken(int argc, const char **argv, const char *prefix, struct repository *repo) 267{ 268 /* This can go wherever you like in your declarations.*/ 269 struct rev_info rev; 270 ... 271 272 /* This should go after the repo_config() call. */ 273 repo_init_revisions(repo, &rev, prefix); 274 275 ... 276} 277---- 278 279==== Tweaking `rev_info` For the Walk 280 281We're getting close, but we're still not quite ready to go. Now that `rev` is 282initialized, we can modify it to fit our needs. This is usually done within a 283helper for clarity, so let's add one: 284 285---- 286static void final_rev_info_setup(struct rev_info *rev) 287{ 288 /* 289 * We want to mimic the appearance of `git log --oneline`, so let's 290 * force oneline format. 291 */ 292 get_commit_format("oneline", rev); 293 294 /* Start our object walk at HEAD. */ 295 add_head_to_pending(rev); 296} 297---- 298 299[NOTE] 300==== 301Instead of using the shorthand `add_head_to_pending()`, you could do 302something like this: 303 304---- 305 struct setup_revision_opt opt; 306 307 memset(&opt, 0, sizeof(opt)); 308 opt.def = "HEAD"; 309 opt.revarg_opt = REVARG_COMMITTISH; 310 setup_revisions(argc, argv, rev, &opt); 311---- 312 313Using a `setup_revision_opt` gives you finer control over your walk's starting 314point. 315==== 316 317Then let's invoke `final_rev_info_setup()` after the call to 318`repo_init_revisions()`: 319 320---- 321int cmd_walken(int argc, const char **argv, const char *prefix, struct repository *repo) 322{ 323 ... 324 325 final_rev_info_setup(&rev); 326 327 ... 328} 329---- 330 331Later, we may wish to add more arguments to `final_rev_info_setup()`. But for 332now, this is all we need. 333 334==== Preparing `rev_info` For the Walk 335 336Now that `rev` is all initialized and configured, we've got one more setup step 337before we get rolling. We can do this in a helper, which will both prepare the 338`rev_info` for the walk, and perform the walk itself. Let's start the helper 339with the call to `prepare_revision_walk()`, which can return an error without 340dying on its own: 341 342---- 343static void walken_commit_walk(struct rev_info *rev) 344{ 345 if (prepare_revision_walk(rev)) 346 die(_("revision walk setup failed")); 347} 348---- 349 350NOTE: `die()` prints to `stderr` and exits the program. Since it will print to 351`stderr` it's likely to be seen by a human, so we will localize it. 352 353==== Performing the Walk! 354 355Finally! We are ready to begin the walk itself. Now we can see that `rev_info` 356can also be used as an iterator; we move to the next item in the walk by using 357`get_revision()` repeatedly. Add the listed variable declarations at the top and 358the walk loop below the `prepare_revision_walk()` call within your 359`walken_commit_walk()`: 360 361---- 362#include "pretty.h" 363 364... 365 366static void walken_commit_walk(struct rev_info *rev) 367{ 368 struct commit *commit; 369 struct strbuf prettybuf = STRBUF_INIT; 370 371 ... 372 373 while ((commit = get_revision(rev))) { 374 strbuf_reset(&prettybuf); 375 pp_commit_easy(CMIT_FMT_ONELINE, commit, &prettybuf); 376 puts(prettybuf.buf); 377 } 378 strbuf_release(&prettybuf); 379} 380---- 381 382NOTE: `puts()` prints a `char*` to `stdout`. Since this is the part of the 383command we expect to be machine-parsed, we're sending it directly to stdout. 384 385Give it a shot. 386 387---- 388$ make 389$ ./bin-wrappers/git walken 390---- 391 392You should see all of the subject lines of all the commits in 393your tree's history, in order, ending with the initial commit, "Initial revision 394of "git", the information manager from hell". Congratulations! You've written 395your first revision walk. You can play with printing some additional fields 396from each commit if you're curious; have a look at the functions available in 397`commit.h`. 398 399=== Adding a Filter 400 401Next, let's try to filter the commits we see based on their author. This is 402equivalent to running `git log --author=<pattern>`. We can add a filter by 403modifying `rev_info.grep_filter`, which is a `struct grep_opt`. 404 405First some setup. Add `grep_config()` to `git_walken_config()`: 406 407---- 408static int git_walken_config(const char *var, const char *value, 409 const struct config_context *ctx, void *cb) 410{ 411 grep_config(var, value, ctx, cb); 412 return git_default_config(var, value, ctx, cb); 413} 414---- 415 416Next, we can modify the `grep_filter`. This is done with convenience functions 417found in `grep.h`. For fun, we're filtering to only commits from folks using a 418`gmail.com` email address - a not-very-precise guess at who may be working on 419Git as a hobby. Since we're checking the author, which is a specific line in the 420header, we'll use the `append_header_grep_pattern()` helper. We can use 421the `enum grep_header_field` to indicate which part of the commit header we want 422to search. 423 424In `final_rev_info_setup()`, add your filter line: 425 426---- 427static void final_rev_info_setup(int argc, const char **argv, 428 const char *prefix, struct rev_info *rev) 429{ 430 ... 431 432 append_header_grep_pattern(&rev->grep_filter, GREP_HEADER_AUTHOR, 433 "gmail"); 434 compile_grep_patterns(&rev->grep_filter); 435 436 ... 437} 438---- 439 440`append_header_grep_pattern()` adds your new "gmail" pattern to `rev_info`, but 441it won't work unless we compile it with `compile_grep_patterns()`. 442 443NOTE: If you are using `setup_revisions()` (for example, if you are passing a 444`setup_revision_opt` instead of using `add_head_to_pending()`), you don't need 445to call `compile_grep_patterns()` because `setup_revisions()` calls it for you. 446 447NOTE: We could add the same filter via the `append_grep_pattern()` helper if we 448wanted to, but `append_header_grep_pattern()` adds the `enum grep_context` and 449`enum grep_pat_token` for us. 450 451=== Changing the Order 452 453There are a few ways that we can change the order of the commits during a 454revision walk. Firstly, we can use the `enum rev_sort_order` to choose from some 455typical orderings. 456 457`topo_order` is the same as `git log --topo-order`: we avoid showing a parent 458before all of its children have been shown, and we avoid mixing commits which 459are in different lines of history. (`git help log`'s section on `--topo-order` 460has a very nice diagram to illustrate this.) 461 462Let's see what happens when we run with `REV_SORT_BY_COMMIT_DATE` as opposed to 463`REV_SORT_BY_AUTHOR_DATE`. Add the following: 464 465---- 466static void final_rev_info_setup(int argc, const char **argv, 467 const char *prefix, struct rev_info *rev) 468{ 469 ... 470 471 rev->topo_order = 1; 472 rev->sort_order = REV_SORT_BY_COMMIT_DATE; 473 474 ... 475} 476---- 477 478Let's output this into a file so we can easily diff it with the walk sorted by 479author date. 480 481---- 482$ make 483$ ./bin-wrappers/git walken > commit-date.txt 484---- 485 486Then, let's sort by author date and run it again. 487 488---- 489static void final_rev_info_setup(int argc, const char **argv, 490 const char *prefix, struct rev_info *rev) 491{ 492 ... 493 494 rev->topo_order = 1; 495 rev->sort_order = REV_SORT_BY_AUTHOR_DATE; 496 497 ... 498} 499---- 500 501---- 502$ make 503$ ./bin-wrappers/git walken > author-date.txt 504---- 505 506Finally, compare the two. This is a little less helpful without object names or 507dates, but hopefully we get the idea. 508 509---- 510$ diff -u commit-date.txt author-date.txt 511---- 512 513This display indicates that commits can be reordered after they're written, for 514example with `git rebase`. 515 516Let's try one more reordering of commits. `rev_info` exposes a `reverse` flag. 517Set that flag somewhere inside of `final_rev_info_setup()`: 518 519---- 520static void final_rev_info_setup(int argc, const char **argv, const char *prefix, 521 struct rev_info *rev) 522{ 523 ... 524 525 rev->reverse = 1; 526 527 ... 528} 529---- 530 531Run your walk again and note the difference in order. (If you remove the grep 532pattern, you should see the last commit this call gives you as your current 533HEAD.) 534 535== Basic Object Walk 536 537So far we've been walking only commits. But Git has more types of objects than 538that! Let's see if we can walk _all_ objects, and find out some information 539about each one. 540 541We can base our work on an example. `git pack-objects` prepares all kinds of 542objects for packing into a bitmap or packfile. The work we are interested in 543resides in `builtin/pack-objects.c:get_object_list()`; examination of that 544function shows that the all-object walk is being performed by 545`traverse_commit_list()` or `traverse_commit_list_filtered()`. Those two 546functions reside in `list-objects.c`; examining the source shows that, despite 547the name, these functions traverse all kinds of objects. Let's have a look at 548the arguments to `traverse_commit_list()`. 549 550- `struct rev_info *revs`: This is the `rev_info` used for the walk. If 551 its `filter` member is not `NULL`, then `filter` contains information for 552 how to filter the object list. 553- `show_commit_fn show_commit`: A callback which will be used to handle each 554 individual commit object. 555- `show_object_fn show_object`: A callback which will be used to handle each 556 non-commit object (so each blob, tree, or tag). 557- `void *show_data`: A context buffer which is passed in turn to `show_commit` 558 and `show_object`. 559 560In addition, `traverse_commit_list_filtered()` has an additional parameter: 561 562- `struct oidset *omitted`: A linked-list of object IDs which the provided 563 filter caused to be omitted. 564 565It looks like these methods use callbacks we provide instead of needing us 566to call it repeatedly ourselves. Cool! Let's add the callbacks first. 567 568For the sake of this tutorial, we'll simply keep track of how many of each kind 569of object we find. At file scope in `builtin/walken.c` add the following 570tracking variables: 571 572---- 573static int commit_count; 574static int tag_count; 575static int blob_count; 576static int tree_count; 577---- 578 579Commits are handled by a different callback than other objects; let's do that 580one first: 581 582---- 583static void walken_show_commit(struct commit *cmt, void *buf) 584{ 585 commit_count++; 586} 587---- 588 589The `cmt` argument is fairly self-explanatory. But it's worth mentioning that 590the `buf` argument is actually the context buffer that we can provide to the 591traversal calls - `show_data`, which we mentioned a moment ago. 592 593Since we have the `struct commit` object, we can look at all the same parts that 594we looked at in our earlier commit-only walk. For the sake of this tutorial, 595though, we'll just increment the commit counter and move on. 596 597The callback for non-commits is a little different, as we'll need to check 598which kind of object we're dealing with: 599 600---- 601static void walken_show_object(struct object *obj, const char *str, void *buf) 602{ 603 switch (obj->type) { 604 case OBJ_TREE: 605 tree_count++; 606 break; 607 case OBJ_BLOB: 608 blob_count++; 609 break; 610 case OBJ_TAG: 611 tag_count++; 612 break; 613 case OBJ_COMMIT: 614 BUG("unexpected commit object in walken_show_object\n"); 615 default: 616 BUG("unexpected object type %s in walken_show_object\n", 617 type_name(obj->type)); 618 } 619} 620---- 621 622Again, `obj` is fairly self-explanatory, and we can guess that `buf` is the same 623context pointer that `walken_show_commit()` receives: the `show_data` argument 624to `traverse_commit_list()` and `traverse_commit_list_filtered()`. Finally, 625`str` contains the name of the object, which ends up being something like 626`foo.txt` (blob), `bar/baz` (tree), or `v1.2.3` (tag). 627 628To help assure us that we aren't double-counting commits, we'll include some 629complaining if a commit object is routed through our non-commit callback; we'll 630also complain if we see an invalid object type. Since those two cases should be 631unreachable, and would only change in the event of a semantic change to the Git 632codebase, we complain by using `BUG()` - which is a signal to a developer that 633the change they made caused unintended consequences, and the rest of the 634codebase needs to be updated to understand that change. `BUG()` is not intended 635to be seen by the public, so it is not localized. 636 637Our main object walk implementation is substantially different from our commit 638walk implementation, so let's make a new function to perform the object walk. We 639can perform setup which is applicable to all objects here, too, to keep separate 640from setup which is applicable to commit-only walks. 641 642We'll start by enabling all types of objects in the `struct rev_info`. We'll 643also turn on `tree_blobs_in_commit_order`, which means that we will walk a 644commit's tree and everything it points to immediately after we find each commit, 645as opposed to waiting for the end and walking through all trees after the commit 646history has been discovered. With the appropriate settings configured, we are 647ready to call `prepare_revision_walk()`. 648 649---- 650static void walken_object_walk(struct rev_info *rev) 651{ 652 rev->tree_objects = 1; 653 rev->blob_objects = 1; 654 rev->tag_objects = 1; 655 rev->tree_blobs_in_commit_order = 1; 656 657 if (prepare_revision_walk(rev)) 658 die(_("revision walk setup failed")); 659 660 commit_count = 0; 661 tag_count = 0; 662 blob_count = 0; 663 tree_count = 0; 664---- 665 666Let's start by calling just the unfiltered walk and reporting our counts. 667Complete your implementation of `walken_object_walk()`. 668We'll also need to include the `list-objects.h` header. 669 670---- 671#include "list-objects.h" 672 673... 674 675 traverse_commit_list(rev, walken_show_commit, walken_show_object, NULL); 676 677 printf("commits %d\nblobs %d\ntags %d\ntrees %d\n", commit_count, 678 blob_count, tag_count, tree_count); 679} 680---- 681 682NOTE: This output is intended to be machine-parsed. Therefore, we are not 683sending it to `trace_printf()`, and we are not localizing it - we need scripts 684to be able to count on the formatting to be exactly the way it is shown here. 685If we were intending this output to be read by humans, we would need to localize 686it with `_()`. 687 688Finally, we'll ask `cmd_walken()` to use the object walk instead. Discussing 689command line options is out of scope for this tutorial, so we'll just hardcode 690a branch we can change at compile time. Where you call `final_rev_info_setup()` 691and `walken_commit_walk()`, instead branch like so: 692 693---- 694 if (1) { 695 add_head_to_pending(&rev); 696 walken_object_walk(&rev); 697 } else { 698 final_rev_info_setup(argc, argv, prefix, &rev); 699 walken_commit_walk(&rev); 700 } 701---- 702 703NOTE: For simplicity, we've avoided all the filters and sorts we applied in 704`final_rev_info_setup()` and simply added `HEAD` to our pending queue. If you 705want, you can certainly use the filters we added before by moving 706`final_rev_info_setup()` out of the conditional and removing the call to 707`add_head_to_pending()`. 708 709Now we can try to run our command! It should take noticeably longer than the 710commit walk, but an examination of the output will give you an idea why. Your 711output should look similar to this example, but with different counts: 712 713---- 714Object walk completed. Found 55733 commits, 100274 blobs, 0 tags, and 104210 trees. 715---- 716 717This makes sense. We have more trees than commits because the Git project has 718lots of subdirectories which can change, plus at least one tree per commit. We 719have no tags because we started on a commit (`HEAD`) and while tags can point to 720commits, commits can't point to tags. 721 722NOTE: You will have different counts when you run this yourself! The number of 723objects grows along with the Git project. 724 725=== Adding a Filter 726 727There are a handful of filters that we can apply to the object walk laid out in 728`Documentation/rev-list-options.adoc`. These filters are typically useful for 729operations such as creating packfiles or performing a partial clone. They are 730defined in `list-objects-filter-options.h`. For the purposes of this tutorial we 731will use the "tree:1" filter, which causes the walk to omit all trees and blobs 732which are not directly referenced by commits reachable from the commit in 733`pending` when the walk begins. (`pending` is the list of objects which need to 734be traversed during a walk; you can imagine a breadth-first tree traversal to 735help understand. In our case, that means we omit trees and blobs not directly 736referenced by `HEAD` or `HEAD`'s history, because we begin the walk with only 737`HEAD` in the `pending` list.) 738 739For now, we are not going to track the omitted objects, so we'll replace those 740parameters with `NULL`. For the sake of simplicity, we'll add a simple 741build-time branch to use our filter or not. Preface the line calling 742`traverse_commit_list()` with the following, which will remind us which kind of 743walk we've just performed: 744 745---- 746 if (0) { 747 /* Unfiltered: */ 748 trace_printf(_("Unfiltered object walk.\n")); 749 } else { 750 trace_printf( 751 _("Filtered object walk with filterspec 'tree:1'.\n")); 752 753 parse_list_objects_filter(&rev->filter, "tree:1"); 754 } 755 traverse_commit_list(rev, walken_show_commit, 756 walken_show_object, NULL); 757---- 758 759The `rev->filter` member is usually built directly from a command 760line argument, so the module provides an easy way to build one from a string. 761Even though we aren't taking user input right now, we can still build one with 762a hardcoded string using `parse_list_objects_filter()`. 763 764With the filter spec "tree:1", we are expecting to see _only_ the root tree for 765each commit; therefore, the tree object count should be less than or equal to 766the number of commits. (For an example of why that's true: `git commit --revert` 767points to the same tree object as its grandparent.) 768 769=== Counting Omitted Objects 770 771We also have the capability to enumerate all objects which were omitted by a 772filter, like with `git log --filter=<spec> --filter-print-omitted`. To do this, 773change `traverse_commit_list()` to `traverse_commit_list_filtered()`, which is 774able to populate an `omitted` list. Asking for this list of filtered objects 775may cause performance degradations, however, because in this case, despite 776filtering objects, the possibly much larger set of all reachable objects must 777be processed in order to populate that list. 778 779First, add the `struct oidset` and related items we will use to iterate it: 780 781---- 782#include "oidset.h" 783 784... 785 786static void walken_object_walk( 787 ... 788 789 struct oidset omitted; 790 struct oidset_iter oit; 791 struct object_id *oid = NULL; 792 int omitted_count = 0; 793 oidset_init(&omitted, 0); 794 795 ... 796---- 797 798Replace the call to `traverse_commit_list()` with 799`traverse_commit_list_filtered()` and pass a pointer to the `omitted` oidset 800defined and initialized above: 801 802---- 803 ... 804 805 traverse_commit_list_filtered(rev, 806 walken_show_commit, walken_show_object, NULL, &omitted); 807 808 ... 809---- 810 811Then, after your traversal, the `oidset` traversal is pretty straightforward. 812Count all the objects within and modify the print statement: 813 814---- 815 /* Count the omitted objects. */ 816 oidset_iter_init(&omitted, &oit); 817 818 while ((oid = oidset_iter_next(&oit))) 819 omitted_count++; 820 821 printf("commits %d\nblobs %d\ntags %d\ntrees %d\nomitted %d\n", 822 commit_count, blob_count, tag_count, tree_count, omitted_count); 823---- 824 825By running your walk with and without the filter, you should find that the total 826object count in each case is identical. You can also time each invocation of 827the `walken` subcommand, with and without `omitted` being passed in, to confirm 828to yourself the runtime impact of tracking all omitted objects. 829 830=== Changing the Order 831 832Finally, let's demonstrate that you can also reorder walks of all objects, not 833just walks of commits. First, we'll make our handlers chattier - modify 834`walken_show_commit()` and `walken_show_object()` to print the object as they 835go: 836 837---- 838#include "hex.h" 839 840... 841 842static void walken_show_commit(struct commit *cmt, void *buf) 843{ 844 trace_printf("commit: %s\n", oid_to_hex(&cmt->object.oid)); 845 commit_count++; 846} 847 848static void walken_show_object(struct object *obj, const char *str, void *buf) 849{ 850 trace_printf("%s: %s\n", type_name(obj->type), oid_to_hex(&obj->oid)); 851 852 ... 853} 854---- 855 856NOTE: Since we will be examining this output directly as humans, we'll use 857`trace_printf()` here. Additionally, since this change introduces a significant 858number of printed lines, using `trace_printf()` will allow us to easily silence 859those lines without having to recompile. 860 861(Leave the counter increment logic in place.) 862 863With only that change, run again (but save yourself some scrollback): 864 865---- 866$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | head -n 10 867---- 868 869Take a look at the top commit with `git show` and the object ID you printed; it 870should be the same as the output of `git show HEAD`. 871 872Next, let's change a setting on our `struct rev_info` within 873`walken_object_walk()`. Find where you're changing the other settings on `rev`, 874such as `rev->tree_objects` and `rev->tree_blobs_in_commit_order`, and add the 875`reverse` setting at the bottom: 876 877---- 878 ... 879 880 rev->tree_objects = 1; 881 rev->blob_objects = 1; 882 rev->tag_objects = 1; 883 rev->tree_blobs_in_commit_order = 1; 884 rev->reverse = 1; 885 886 ... 887---- 888 889Now, run again, but this time, let's grab the last handful of objects instead 890of the first handful: 891 892---- 893$ make 894$ GIT_TRACE=1 ./bin-wrappers/git walken 2>&1 | tail -n 10 895---- 896 897The last commit object given should have the same OID as the one we saw at the 898top before, and running `git show <oid>` with that OID should give you again 899the same results as `git show HEAD`. Furthermore, if you run and examine the 900first ten lines again (with `head` instead of `tail` like we did before applying 901the `reverse` setting), you should see that now the first commit printed is the 902initial commit, `e83c5163`. 903 904== Wrapping Up 905 906Let's review. In this tutorial, we: 907 908- Built a commit walk from the ground up 909- Enabled a grep filter for that commit walk 910- Changed the sort order of that filtered commit walk 911- Built an object walk (tags, commits, trees, and blobs) from the ground up 912- Learned how to add a filter-spec to an object walk 913- Changed the display order of the filtered object walk