# Concurrency

## Introduction

Concurrent editing is a key feature of DVCSs; that's why they're called
*Distributed* Version Control Systems. A DVCS that didn't let users edit files
and create commits on separate machines at the same time wouldn't be much
of a distributed VCS.

When conflicting changes are made in different clones, a DVCS has to deal
with them when you push or pull. For example, when using Mercurial, if the
remote has updated a bookmark called `main` (Mercurial's bookmarks are similar
to Git's branches) and you had updated the same bookmark locally but made it
point to a different target, Mercurial would add a bookmark called `main@origin`
to indicate the conflict. Git instead prevents the conflict by renaming pulled
branches to `origin/main` whether or not there was a conflict. However, most
DVCSs treat local concurrency quite differently, typically by using lock files
to prevent concurrent edits. Unlike those DVCSs, Jujutsu treats concurrent edits
the same whether they're made locally or remotely.

One problem with using lock files is that they don't work when the clone is in a
distributed file system. Most clones are of course not stored in distributed
file systems, but it is a *big* problem when they are (Mercurial repos stored
that way frequently get corrupted, for example).

Another problem with using lock files is the complexity of the implementation.
The simplest way of using lock files is to take coarse-grained locks early:
every command that may modify the repo takes a lock at the very beginning.
However, that means that operations that wouldn't actually conflict still have
to wait for each other. The user experience can be improved by using
finer-grained locks and/or taking the locks later, but at the cost of
complexity. For example, you need to verify that any assumptions you made
before locking are still valid after you take the lock.

To avoid depending on lock files, Jujutsu takes a different approach: it
accepts that concurrent changes can always happen and instead exposes any
conflicting changes to the user, much like other DVCSs do for conflicting
changes made remotely.

### Syncing with `rsync`, NFS, Dropbox, etc.

Jujutsu's lock-free concurrency means that it's possible to update copies of the
clone on different machines and then let `rsync` (or Dropbox, or NFS, etc.)
merge them. The working copy may not match what's supposed to be checked out,
but no changes to the repo (added commits, moved bookmarks, etc.) will be lost.
If conflicting changes were made, they will appear as conflicts. For example, if
a bookmark was moved to two different locations, it will appear in `jj log` at
both locations but with a "?" after the name, and `jj status` will also inform
the user about the conflict.

Note that, for now, there are known bugs in this area. Most notably, with the
Git backend, [repository corruption is possible because the backend is not
entirely lock-free](https://github.com/jj-vcs/jj/issues/2193). If you know
about the bug, it is relatively easy to recover from.

Moreover, such use of Jujutsu is currently not thoroughly tested, especially in
the context of [co-located repositories](../glossary.md#co-located-repos). While
the contents of commits should be safe, concurrent modification of a repository
from different computers might conceivably lose some bookmark pointers. Note
that, unlike in pure Git, losing a bookmark pointer does not lead to losing
commits.

## Operation log

The most important piece in the lock-free design is the "operation log". That is
what allows us to detect and merge divergent operations.

The operation log is similar to a commit DAG (such as in
[Git's object model](https://git-scm.com/book/en/v2/Git-Internals-Git-Objects)),
but each commit object is instead an "operation" and each tree object is instead
a "view". The view object contains the set of visible head commits, bookmarks,
tags, and the working-copy commit in each workspace. The operation object
contains a pointer to the view object (like how commit objects point to tree
objects), pointers to parent operation(s) (like how commit objects point to
parent commit(s)), and metadata about the operation. These types are defined
in `op_store.proto`. The operation log is normally linear. It becomes
non-linear if there are divergent operations.
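
To make the analogy with Git concrete, here is a minimal Rust sketch of the two object types and of how the heads of the operation log could be found. All names here (`View`, `Operation`, `op_heads`, and the string ID aliases) are hypothetical and only loosely follow the description above; the real types are defined in `op_store.proto`, and real IDs are content hashes rather than arbitrary strings.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Hypothetical ID aliases; in jj these would be content hashes.
pub type CommitId = String;
pub type ViewId = String;
pub type OperationId = String;

/// Roughly what a view records (compare a Git tree object).
#[derive(Clone, Debug, Default)]
pub struct View {
    pub head_ids: BTreeSet<CommitId>,
    pub bookmarks: BTreeMap<String, CommitId>,
    pub tags: BTreeMap<String, CommitId>,
    /// Working-copy commit per workspace name.
    pub wc_commit_ids: BTreeMap<String, CommitId>,
}

/// Roughly what an operation records (compare a Git commit object).
#[derive(Clone, Debug)]
pub struct Operation {
    /// Pointer to the view, like a commit pointing to its tree.
    pub view_id: ViewId,
    /// Usually one parent; more than one after merging divergent operations.
    pub parents: Vec<OperationId>,
    /// Stand-in for the operation metadata.
    pub description: String,
}

/// Returns the operations that no other operation names as a parent.
pub fn op_heads(ops: &BTreeMap<OperationId, Operation>) -> BTreeSet<OperationId> {
    // Collect every ID that appears as some operation's parent.
    let parents: BTreeSet<&OperationId> =
        ops.values().flat_map(|op| op.parents.iter()).collect();
    // A head is an operation that is nobody's parent.
    ops.keys()
        .filter(|id| !parents.contains(id))
        .cloned()
        .collect()
}
```

With a single chain of operations, `op_heads` returns one ID; adding a second operation with the same parent yields two heads, which is the non-linear case.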

When a command starts, it loads the repo at the latest operation. Because the
associated view object completely defines the repo state, the running command
will not see any changes made by other processes thereafter. When the command
completes, its operation is written with the start operation as parent. The
operation cannot fail to commit (except for disk failures and such). It is left
for the next command to notice if there were divergent operations. It will have
to be able to do that anyway since the concurrent operation could have arrived
via a distributed file system. This model, where each operation sees a
consistent view of the repo and is guaranteed to be able to commit its changes,
greatly simplifies the implementation of commands.
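
The following sketch illustrates why a command never sees concurrent changes and why committing cannot fail. It assumes a hypothetical in-memory, append-only store with integer IDs, and stores the view inline instead of by ID; none of these names are jj's API. A transaction copies the view of its start operation, and committing only appends a new operation whose parent is that start operation.

```rust
use std::collections::BTreeMap;

/// Hypothetical operation ID: an index into the in-memory store.
pub type OpId = usize;

/// Simplified view: only bookmarks, mapping names to commit IDs.
#[derive(Clone, Debug, Default, PartialEq, Eq)]
pub struct View {
    pub bookmarks: BTreeMap<String, String>,
}

#[derive(Clone, Debug)]
pub struct Operation {
    pub parents: Vec<OpId>,
    /// Stored inline for brevity; the real design points to a view object.
    pub view: View,
}

/// Append-only store: existing operations are never modified.
#[derive(Default)]
pub struct OpStore {
    ops: Vec<Operation>,
}

impl OpStore {
    /// Adding an operation only appends, so it cannot clash with other writers.
    pub fn add(&mut self, op: Operation) -> OpId {
        self.ops.push(op);
        self.ops.len() - 1
    }

    pub fn get(&self, id: OpId) -> &Operation {
        &self.ops[id]
    }
}

/// A running command: it works on a private copy of the view it loaded.
pub struct Transaction {
    base_op: OpId,
    view: View,
}

impl Transaction {
    /// Loads the repo at `base_op`; operations added later stay invisible.
    pub fn start(store: &OpStore, base_op: OpId) -> Transaction {
        Transaction { base_op, view: store.get(base_op).view.clone() }
    }

    pub fn view(&self) -> &View {
        &self.view
    }

    pub fn set_bookmark(&mut self, name: &str, commit_id: &str) {
        self.view.bookmarks.insert(name.to_owned(), commit_id.to_owned());
    }

    /// Records the new state with the start operation as its only parent.
    /// Divergence is not checked here; that is left to the next command.
    pub fn commit(self, store: &mut OpStore) -> OpId {
        store.add(Operation { parents: vec![self.base_op], view: self.view })
    }
}
```

If two transactions start from the same operation, both commits succeed and both new operations share that parent, which is exactly the divergence the next command has to detect.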

It is possible to load the repo at a particular operation with
`jj --at-operation=<operation ID> <command>`. If the command is mutating, that
will result in a fork in the operation log. That works exactly the same as if
any later operations had not existed when the command started. In other words,
running commands on a repo loaded at an earlier operation works the same way as
if the operations had been concurrent. This can be useful for simulating
divergent operations.

### Merging divergent operations

If Jujutsu tries to load the repo and finds multiple heads in the operation log,
it will do a 3-way merge of the view objects based on their common ancestor
(possibly several 3-way merges if there were more than two heads). Conflicts
are recorded in the resulting view object. For example, if bookmark `main` was
moved from commit A to commit B in one operation and moved to commit C in a
concurrent operation, then `main` will be recorded as "moved from A to B or C".
See the `RefTarget` definition in `op_store.proto`.
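
A minimal sketch of a 3-way merge of bookmark maps follows. The `RefTarget` enum here is a hypothetical simplification (a resolved target that may be absent, or a conflict listing what was removed and what was added), not the actual definition in `op_store.proto`. It also only accepts non-conflicted inputs, whereas a real merge must handle targets that are already conflicted.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical, simplified bookmark target.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum RefTarget {
    /// At most one commit; `None` means the bookmark is absent.
    Resolved(Option<String>),
    /// "Moved from one of `removes` to one of `adds`".
    Conflict {
        removes: Vec<Option<String>>,
        adds: Vec<Option<String>>,
    },
}

/// 3-way merge of one bookmark: `base` is the common ancestor's target.
pub fn merge_ref(base: Option<&str>, left: Option<&str>, right: Option<&str>) -> RefTarget {
    let owned = |target: Option<&str>| target.map(str::to_owned);
    if left == right || right == base {
        // Both sides agree, or only the left side changed.
        RefTarget::Resolved(owned(left))
    } else if left == base {
        // Only the right side changed.
        RefTarget::Resolved(owned(right))
    } else {
        // Both sides changed it differently: record the conflict.
        RefTarget::Conflict {
            removes: vec![owned(base)],
            adds: vec![owned(left), owned(right)],
        }
    }
}

/// Merges the bookmark maps of two divergent views against their ancestor.
pub fn merge_bookmarks(
    base: &BTreeMap<String, String>,
    left: &BTreeMap<String, String>,
    right: &BTreeMap<String, String>,
) -> BTreeMap<String, RefTarget> {
    let names: BTreeSet<&String> = base.keys().chain(left.keys()).chain(right.keys()).collect();
    let mut merged = BTreeMap::new();
    for name in names {
        let target = merge_ref(
            base.get(name).map(String::as_str),
            left.get(name).map(String::as_str),
            right.get(name).map(String::as_str),
        );
        // A bookmark that resolves to "absent" is simply left out.
        if target != RefTarget::Resolved(None) {
            merged.insert(name.clone(), target);
        }
    }
    merged
}
```

Merging `main` moved from A to B on one side and from A to C on the other yields `Conflict { removes: [A], adds: [B, C] }`, the "moved from A to B or C" state described above, while a bookmark changed on only one side resolves cleanly.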

Because we allow bookmarks (etc.) to be in a conflicted state rather than just
erroring out when there are multiple heads, the user can continue to use the
repo, including performing further operations on the repo. Of course, some
commands will fail when using a conflicted bookmark. For example,
`jj new main` when `main` is in a conflicted state will result in an error
telling you that `main` resolved to multiple revisions.
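
A command that needs a single revision can be thought of as resolving the bookmark first and failing only at that step, while the conflicted view itself stays valid. The sketch below uses a hypothetical `RefTarget` enum and `resolve_single` function; neither is jj's API, and the error wording is illustrative rather than jj's actual message.

```rust
/// Hypothetical, simplified bookmark target.
#[derive(Debug)]
pub enum RefTarget {
    /// At most one commit; `None` means the bookmark is absent.
    Resolved(Option<String>),
    /// Conflicted: moved from one of `removes` to one of `adds`.
    Conflict {
        removes: Vec<Option<String>>,
        adds: Vec<Option<String>>,
    },
}

/// Resolves a bookmark to exactly one commit ID, as a command such as
/// `jj new main` needs; a conflicted bookmark is an error here and only here.
pub fn resolve_single<'a>(name: &str, target: &'a RefTarget) -> Result<&'a str, String> {
    match target {
        RefTarget::Resolved(Some(id)) => Ok(id.as_str()),
        RefTarget::Resolved(None) => Err(format!("bookmark `{}` does not exist", name)),
        RefTarget::Conflict { .. } => Err(format!(
            "bookmark `{}` resolved to multiple revisions",
            name
        )),
    }
}
```

Commands that do not need a single target, such as listing bookmarks, never call a function like this, which is why the repo remains usable while the conflict exists.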
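
### Storage

The operation objects and view objects are stored in content-addressed storage
just like Git commits are. That makes them safe to write without locking: an
object's file name is derived from its contents, so an object is never modified
after it is written, and concurrent writers of the same object write identical
data.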
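
A minimal sketch of lock-free content-addressed writes, assuming a hypothetical store that keeps one file per object named by a hash of its bytes. It uses Rust's `DefaultHasher` only to stay dependency-free; that hash is neither cryptographic nor stable across Rust releases, so a real store would use a cryptographic hash. Writing to a temporary file and then renaming it (an atomic replacement on POSIX file systems) keeps readers from seeing a partially written object.

```rust
use std::collections::hash_map::DefaultHasher;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::Path;
use std::sync::atomic::{AtomicU64, Ordering};

static TEMP_COUNTER: AtomicU64 = AtomicU64::new(0);

/// Illustration only: derives an object ID from its bytes.
fn content_id(bytes: &[u8]) -> String {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Stores `bytes` under a name derived from them and returns that name.
pub fn write_object(dir: &Path, bytes: &[u8]) -> io::Result<String> {
    let id = content_id(bytes);
    let final_path = dir.join(&id);
    if final_path.exists() {
        // Same name implies same content, so there is nothing left to do.
        return Ok(id);
    }
    // Temporary name unique within this machine (process ID plus counter).
    let temp_path = dir.join(format!(
        ".tmp-{}-{}",
        std::process::id(),
        TEMP_COUNTER.fetch_add(1, Ordering::Relaxed)
    ));
    fs::write(&temp_path, bytes)?;
    // If a concurrent writer renamed its copy first, this replaces a file
    // with identical contents, so either outcome leaves the same object.
    fs::rename(&temp_path, &final_path)?;
    Ok(id)
}

pub fn read_object(dir: &Path, id: &str) -> io::Result<Vec<u8>> {
    fs::read(dir.join(id))
}
```

Writing the same bytes twice returns the same ID and leaves a single file, which is why no lock is needed around object writes.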

We also need a way of finding the current head of the operation log. We do that
by keeping the ID of each current head as a file in a directory. The ID is the
name of the file; it has no contents. When an operation completes, we add a file
pointing to the new operation and then remove the file pointing to the old
operation. Writing the new file is what makes the operation visible (if the old
file didn't get properly deleted, then future readers will take care of that).
This scheme ensures that transactions are atomic.
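
The head-tracking scheme can be sketched with plain files. The function names and directory layout below are hypothetical; the sketch only shows the order of the two steps (create the new head file, then delete the old one) and that a reader treats every file in the directory as a head. Deleting a file that is already gone is ignored, since a concurrent command that started from the same operation may have removed it first.

```rust
use std::collections::BTreeSet;
use std::fs;
use std::io;
use std::path::Path;

/// Publishes `new_id` as a head and retires `old_id`, its start operation.
pub fn update_op_heads(dir: &Path, old_id: &str, new_id: &str) -> io::Result<()> {
    // Creating the empty file is the step that makes the new operation visible.
    fs::write(dir.join(new_id), b"")?;
    // If we crash before this, readers see an extra head; it is an ancestor
    // of the new head, so a later reader can clean it up.
    match fs::remove_file(dir.join(old_id)) {
        Ok(()) => Ok(()),
        // A concurrent command that started from the same operation removed it.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(()),
        Err(e) => Err(e),
    }
}

/// Every file name in the directory is treated as a head operation ID.
pub fn read_op_heads(dir: &Path) -> io::Result<BTreeSet<String>> {
    let mut heads = BTreeSet::new();
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        // Operation IDs are plain ASCII names, so non-UTF-8 names are skipped.
        if let Some(name) = entry.file_name().to_str() {
            heads.insert(name.to_owned());
        }
    }
    Ok(heads)
}
```

If two commands start from the same operation, each adds its own head file and both try to remove the shared old one, so a later reader sees two heads and merges them as described in the section on merging divergent operations.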