# Git submodule storage

## Objective

Decide what approach(es) to Git submodule storage we should pursue.
The decision will be recorded in [./git-submodules.md](./git-submodules.md).

## Use cases to consider

The submodule storage format should support the workflows specified in the
[submodules roadmap](./git-submodules.md). It should be obvious how "Phase 1"
requirements will be supported, and we should have an idea of how
"Phases 2, 3, X" might be supported.

Notable use cases and workflows are listed below.

### Fetching submodule commits

Git's protocol is designed for communicating between copies of the same
repository. Notably, a Git fetch calculates the list of required objects by
performing reachability checks between the refs on the local and the remote
side. We should therefore expect fetching to work well only if the submodule
repository is stored as a local Git repository.

Rolling our own Git fetch is too complex to be worth the effort.

### "jj op restore" and operation log format

We want `jj op restore` to restore the submodule to an "expected" state. The
expected behavior may differ depending on whether `jj op restore` is run in the
superproject or in the submodule: in the superproject, it might be enough to
restore the submodule's working copy, but in the submodule, refs also need to be
restored.

Currently, the operation log only references objects and refs in the
superproject, so proposed approaches will likely need to extend this format. It
is also worth considering that submodules may be added, updated or removed in
superproject commits, so the list of submodules is likely to change over the
repository's lifetime.

### Nested submodules

Git submodules may contain submodules themselves, so our chosen storage schemes
should support that.
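
Supporting nesting means that any recursive operation has to walk a tree of
repositories rather than a flat list of submodules. The following is a minimal
Rust sketch under illustrative assumptions: `SubmoduleNode`, `walk`,
`nested_submodules` and `WalkError` are hypothetical names, not jj APIs, and a
submodule is identified by its remote URL. It collects nested submodule paths
depth-first and stops with an error when nesting exceeds `max_depth` or when a
URL repeats on the current ancestor path, which is one way to implement the kind
of depth limit raised in the next paragraph.

```rust
/// Hypothetical, simplified node in a superproject/submodule tree. In this
/// sketch a submodule is identified by its remote URL (an assumption).
#[derive(Debug)]
struct SubmoduleNode {
    path: String,
    url: String,
    children: Vec<SubmoduleNode>,
}

#[derive(Debug, PartialEq)]
enum WalkError {
    /// Nesting is deeper than the allowed maximum.
    TooDeep { path: String, depth: usize },
    /// This URL already appears among the node's ancestors (a cycle).
    Cycle { path: String, url: String },
}

/// Depth-first walk that records the path of every nested submodule.
fn walk(
    node: &SubmoduleNode,
    depth: usize,
    max_depth: usize,
    ancestors: &mut Vec<String>,
    out: &mut Vec<String>,
) -> Result<(), WalkError> {
    if depth > max_depth {
        return Err(WalkError::TooDeep { path: node.path.clone(), depth });
    }
    if ancestors.contains(&node.url) {
        return Err(WalkError::Cycle {
            path: node.path.clone(),
            url: node.url.clone(),
        });
    }
    if depth > 0 {
        // The root is the superproject itself, not a submodule.
        out.push(node.path.clone());
    }
    ancestors.push(node.url.clone());
    for child in &node.children {
        walk(child, depth + 1, max_depth, ancestors, out)?;
    }
    ancestors.pop();
    Ok(())
}

/// Returns the paths of all (transitively) nested submodules under `root`.
fn nested_submodules(root: &SubmoduleNode, max_depth: usize) -> Result<Vec<String>, WalkError> {
    let mut out = Vec::new();
    walk(root, 0, max_depth, &mut Vec::new(), &mut out)?;
    Ok(out)
}
```

For a superproject containing `lib`, which itself contains `lib/vendor`,
`nested_submodules(&root, 4)` returns both paths, while a `max_depth` of 1
yields `TooDeep` for `lib/vendor`.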

We should consider limiting the recursion depth to avoid nasty edge cases (e.g.
cyclical submodules) that might surprise users.

### Supporting future extensions

There are certain extensions we may want to make in the future, but we don't
have a timeline for them today. Proposed approaches should take these
extensions into account (i.e. the approach should be theoretically extensible),
but a full proposal for implementing them is not necessary.

These extensions are:

- Non-Git subrepos
- Colocated Git repos
- The superproject using a non-Git backend

## Proposed design

Git submodules will be stored as full jj repos. In the code, jj commands will
only interact with the submodule's repo as an entire unit, e.g. they cannot
query the submodule's commit backend directly. A well-abstracted submodule
should extend well to non-Git backends and non-Git subrepos.

The main challenge with this approach is that the submodule repo can be in a
state that is internally valid (when considering only the submodule's repo), but
invalid when considering the superproject-submodule system. This will be managed
by requiring all submodule interactions to go through the superproject so that
superproject-submodule coordination can occur. For example, unlike Git, jj will
not allow the user to work on the submodule's repo without going through the
superproject.

The notable workflows could be addressed as follows:

### Fetching submodule commits

The submodule would fetch using the equivalent of `jj git fetch`. It remains to
be decided how a "recursive" fetch should work, especially if a newly fetched
superproject commit references an unfetched submodule commit. A reasonable
approximation would be to fetch all branches in the submodule and then, if the
submodule commit is still missing, handle that case gracefully.
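
The fetch fallback described above can be sketched as follows. This is a
minimal Rust sketch under stated assumptions: `SubmoduleRepo`,
`ensure_submodule_commit`, `FetchOutcome` and `FakeRepo` are hypothetical names
invented here, not jj APIs, and `fetch_all_branches` merely stands in for the
equivalent of `jj git fetch`. In keeping with the design, the superproject only
talks to the submodule through a repo-level interface. The function checks
whether the commit recorded by the superproject is already present, fetches all
branches only if it is not, and reports a still-missing commit as a value rather
than an error, which is one possible reading of "handle that case gracefully".

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a commit ID recorded in a superproject commit.
type CommitId = String;

/// Hypothetical repo-level interface to a submodule stored as a full jj repo.
/// The superproject uses this, never the submodule's commit backend directly.
trait SubmoduleRepo {
    fn has_commit(&self, id: &CommitId) -> bool;
    /// Stands in for the equivalent of `jj git fetch` of all branches.
    fn fetch_all_branches(&mut self) -> Result<(), String>;
}

#[derive(Debug, PartialEq)]
enum FetchOutcome {
    /// The commit was already present locally; no fetch was needed.
    AlreadyPresent,
    /// Fetching all branches brought the commit in.
    FetchedAllBranches,
    /// Still missing after fetching; reported rather than treated as fatal.
    StillMissing(CommitId),
}

/// Best effort to make `wanted` (the submodule commit referenced by a newly
/// fetched superproject commit) available locally.
fn ensure_submodule_commit(
    repo: &mut dyn SubmoduleRepo,
    wanted: &CommitId,
) -> Result<FetchOutcome, String> {
    if repo.has_commit(wanted) {
        return Ok(FetchOutcome::AlreadyPresent);
    }
    repo.fetch_all_branches()?;
    if repo.has_commit(wanted) {
        Ok(FetchOutcome::FetchedAllBranches)
    } else {
        Ok(FetchOutcome::StillMissing(wanted.clone()))
    }
}

/// In-memory fake: the remote is modeled as the flat set of commits reachable
/// from its branches, so "fetching all branches" just copies that set.
struct FakeRepo {
    local: HashSet<CommitId>,
    remote_reachable: Vec<CommitId>,
}

impl SubmoduleRepo for FakeRepo {
    fn has_commit(&self, id: &CommitId) -> bool {
        self.local.contains(id)
    }
    fn fetch_all_branches(&mut self) -> Result<(), String> {
        self.local.extend(self.remote_reachable.iter().cloned());
        Ok(())
    }
}
```

Fetch failures such as network errors still propagate as `Err`; only the case
where everything was fetched and the commit is still absent becomes a value the
caller can act on, e.g. by warning and leaving that submodule untouched. Which
policy jj should actually adopt is left open above.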

### "jj op restore" and operation log format

As full repos, each submodule will have its own operation log. We will continue
to use the existing operation log format, in which each operation log tracks
only its own repo's commits. As commands are run in the superproject,
corresponding commands will be run in the submodule as necessary, e.g. checking
out a superproject commit will cause a submodule commit to also be checked out.

Since there is no association between a superproject operation and a submodule
operation, `jj op restore` in the superproject will not restore the submodule to
a previous operation. Instead, new submodule operation(s) will be created as
needed to match the restored superproject state. This is sufficient to preserve
the superproject-submodule relationship. It precludes a "recursive" restore
(e.g. restoring branches in both the superproject and its submodules), but it
seems unlikely that we will need one.

### Nested submodules

Since submodules are full repos, they can contain submodules themselves. Nesting
is unlikely to complicate any of the core features, since the top-level
superproject/submodule relationship is almost identical to the submodule/nested
submodule relationship.

### Extending to colocated Git repos

Git expects submodules to be in `.git/modules`, so it will not understand this
storage format. To support colocated Git repos, we will have to change Git to
allow a submodule's gitdir to be in an alternate location (e.g. we could add a
new `submodule.<name>.gitdir` config option). This is a simple change, so it
should be feasible.

## Alternatives considered

### Git repos in the main Git backend

Since the Git backend contains a Git repository, an 'obvious' default would be
to store submodules in the Git superproject the same way Git does, i.e. in
`.git/modules`.
Since Git submodules are full repositories that can themselves have submodules,
this storage scheme naturally extends to nested submodules.

Most of the work in storing and querying submodules would be well-isolated to
the Git backend, which gives us a lot of flexibility to make changes without
affecting the rest of jj. However, the operation log would need a significant
rework since it isn't designed to reference submodules, and handling edge cases
(e.g. a submodule being added or removed, nested submodules) would be tricky.

This approach is rejected because handling that operation log complexity isn't
worth it when very little of the work extends to non-Git backends.

### Store Git submodules as alternate Git backends

Teach jj to use multiple commit backends and store Git submodules as Git
backends. Since submodules are separate from the 'main' backend, a repository
can use whatever backend it wants as its 'main' one, while still keeping Git
submodules in 'alternate' Git backends.

This approach extends fairly well to non-Git submodules (which would be stored
in non-Git commit backends). However, it requires significantly reworking the
operation log to account for multiple commit backends. It is also not clear how
nested submodules would be supported, since there isn't an obvious way to
represent a nested submodule's relationship to its superproject.
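
To make the operation log concern concrete: today every commit the operation
log references belongs to the repo's single commit backend, whereas with
multiple backends each recorded commit would need to say which backend it lives
in. The Rust sketch below is purely illustrative; `BackendId`,
`QualifiedCommitId` and `MultiBackendView` are hypothetical types, not jj's
actual operation log structures.

```rust
use std::collections::BTreeMap;

/// Hypothetical identifier for one of several commit backends in a repo:
/// the 'main' backend, or an alternate backend holding a Git submodule.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
enum BackendId {
    Main,
    /// Keyed by submodule path, e.g. "vendor/lib".
    GitSubmodule(String),
}

/// With one backend a bare commit ID suffices; with several, every recorded
/// reference has to carry the backend it belongs to.
#[derive(Debug, Clone, PartialEq, Eq)]
struct QualifiedCommitId {
    backend: BackendId,
    hex: String,
}

/// Hypothetical operation-log view that records heads per backend.
#[derive(Debug, Default)]
struct MultiBackendView {
    heads: BTreeMap<BackendId, Vec<String>>,
}

impl MultiBackendView {
    /// Files a head under the backend it was created in.
    fn add_head(&mut self, id: QualifiedCommitId) {
        self.heads.entry(id.backend).or_default().push(id.hex);
    }
}
```

The flat `BackendId::GitSubmodule(path)` key also hints at the nesting problem:
it records which backend a commit belongs to, but nothing about which backend a
nested submodule's superproject lives in.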