gitformat-index(5)
==================

NAME
----
gitformat-index - Git index format

SYNOPSIS
--------
[verse]
$GIT_DIR/index

DESCRIPTION
-----------

Git index format

== The Git index file has the following format

  All binary numbers are in network byte order.
  In a repository using the traditional SHA-1, checksums and object IDs
  (object names) mentioned below are all computed using SHA-1.  Similarly,
  in SHA-256 repositories, these values are computed using SHA-256.
  Version 2 is described here unless stated otherwise.

   - A 12-byte header consisting of

     4-byte signature:
       The signature is { 'D', 'I', 'R', 'C' } (stands for "dircache")

     4-byte version number:
       The current supported versions are 2, 3 and 4.

     32-bit number of index entries.

   - A number of sorted index entries (see below).

   - Extensions

     Extensions are identified by signature. Optional extensions can
     be ignored if Git does not understand them.

     4-byte extension signature. If the first byte is 'A'..'Z' the
     extension is optional and can be ignored.

     32-bit size of the extension

     Extension data

   - Hash checksum over the content of the index file before this checksum.

== Index entry

  Index entries are sorted in ascending order on the name field,
  interpreted as a string of unsigned bytes (i.e. memcmp() order, no
  localization, no special casing of directory separator '/'). Entries
  with the same name are sorted by their stage field.

  An index entry typically represents a file. However, if sparse-checkout
  is enabled in cone mode (`core.sparseCheckoutCone` is enabled) and the
  `extensions.sparseIndex` extension is enabled, then the index may
  contain entries for directories outside of the sparse-checkout definition.
  These entries have mode `040000`, include the `SKIP_WORKTREE` bit, and
  the path ends in a directory separator.
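
  The 12-byte header, the optional-extension rule and the entry ordering
  described above can be sketched as follows. This is an illustrative
  Python sketch, not Git's implementation; the names `parse_index_header`,
  `extension_is_optional` and `entry_sort_key` are hypothetical helpers
  introduced only for this example.

```python
import struct

# Network byte order: 4-byte signature, 32-bit version, 32-bit entry count.
HEADER = struct.Struct(">4sII")


def parse_index_header(data: bytes) -> tuple[int, int]:
    """Return (version, entry_count) read from the 12-byte index header."""
    if len(data) < HEADER.size:
        raise ValueError("index file shorter than the 12-byte header")
    signature, version, entry_count = HEADER.unpack_from(data, 0)
    if signature != b"DIRC":
        raise ValueError(f"bad signature {signature!r}")
    if version not in (2, 3, 4):
        raise ValueError(f"unsupported index version {version}")
    return version, entry_count


def extension_is_optional(signature: bytes) -> bool:
    """An extension is optional if its first signature byte is 'A'..'Z'."""
    return ord("A") <= signature[0] <= ord("Z")


def entry_sort_key(name: bytes, stage: int) -> tuple[bytes, int]:
    """Sort by name as unsigned bytes (memcmp-like order), then by stage."""
    return (name, stage)
```

  Python compares `bytes` objects by unsigned byte value, so sorting on
  this key places `a.c` before `a/b` ('.' is 0x2E, '/' is 0x2F), with no
  special handling of the directory separator.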

  32-bit ctime seconds, the last time a file's metadata changed
    this is stat(2) data

  32-bit ctime nanosecond fractions
    this is stat(2) data

  32-bit mtime seconds, the last time a file's data changed
    this is stat(2) data

  32-bit mtime nanosecond fractions
    this is stat(2) data

  32-bit dev
    this is stat(2) data

  32-bit ino
    this is stat(2) data

  32-bit mode, split into (high to low bits)

    16-bit unused, must be zero

    4-bit object type
      valid values in binary are 1000 (regular file), 1010 (symbolic link)
      and 1110 (gitlink)

    3-bit unused, must be zero

    9-bit unix permission. Only 0755 and 0644 are valid for regular files.
    Symbolic links and gitlinks have value 0 in this field.

  32-bit uid
    this is stat(2) data

  32-bit gid
    this is stat(2) data

  32-bit file size
    This is the on-disk size from stat(2), truncated to 32-bit.

  Object name for the represented object

  A 16-bit 'flags' field split into (high to low bits)

    1-bit assume-valid flag

    1-bit extended flag (must be zero in version 2)

    2-bit stage (during merge)

    12-bit name length if the length is less than 0xFFF; otherwise 0xFFF
    is stored in this field.

  (Version 3 or later) A 16-bit field, only applicable if the
  "extended flag" above is 1, split into (high to low bits).

    1-bit reserved for future

    1-bit skip-worktree flag (used by sparse checkout)

    1-bit intent-to-add flag (used by "git add -N")

    13-bit unused, must be zero

  Entry path name (variable length) relative to top level directory
    (without leading slash). '/' is used as path separator. The special
    path components ".", ".." and ".git" (without quotes) are disallowed.
    Trailing slash is also disallowed.

    The exact encoding is undefined, but the '.'
    and '/' characters are encoded in 7-bit ASCII and the encoding cannot
    contain a NUL byte (iow, this is a UNIX pathname).

  (Version 4) In version 4, the entry path name is prefix-compressed
    relative to the path name for the previous entry (the very first
    entry is encoded as if the path name for the previous entry is an
    empty string).  At the beginning of an entry, an integer N in the
    variable width encoding (the same encoding as the offset is encoded
    for OFS_DELTA pack entries; see linkgit:gitformat-pack[5]) is stored, followed
    by a NUL-terminated string S.  Removing N bytes from the end of the
    path name for the previous entry, and replacing it with the string S
    yields the path name for this entry.

  1-8 nul bytes as necessary to pad the entry to a multiple of eight bytes
  while keeping the name NUL-terminated.

  (Version 4) In version 4, the padding after the pathname does not
  exist.

  Interpretation of index entries in split index mode is completely
  different. See below for details.

== Extensions

=== Cache tree

  Since the index does not record entries for directories, the cache
  entries cannot describe tree objects that already exist in the object
  database for regions of the index that are unchanged from an existing
  commit. The cache tree extension stores a recursive tree structure that
  describes the trees that already exist and completely match sections of
  the cache entries. This speeds up tree object generation from the index
  for a new commit by only computing the trees that are "new" to that
  commit. It also assists when comparing the index to another tree, such
  as `HEAD^{tree}`, since sections of the index can be skipped when a tree
  comparison demonstrates equality.

  The recursive tree structure uses nodes that store a number of cache
  entries, a list of subnodes, and an object ID (OID).
  The OID references the existing tree for that node, if it is known to
  exist. The subnodes correspond to subdirectories that themselves have
  cache tree nodes. The number of cache entries corresponds to the number
  of cache entries in the index that describe paths within that tree's
  directory.

  The extension tracks the full directory structure in the cache tree
  extension, but this is generally smaller than the full cache entry list.

  When a path is updated in index, Git invalidates all nodes of the
  recursive cache tree corresponding to the parent directories of that
  path. We store these tree nodes as being "invalid" by using "-1" as the
  number of cache entries. Invalid nodes still store a span of index
  entries, allowing Git to focus its efforts when reconstructing a full
  cache tree.

  The signature for this extension is { 'T', 'R', 'E', 'E' }.

  A series of entries fill the entire extension; each of which
  consists of:

  - NUL-terminated path component (relative to its parent directory);

  - ASCII decimal number of entries in the index that is covered by the
    tree this entry represents (entry_count);

  - A space (ASCII 32);

  - ASCII decimal number that represents the number of subtrees this
    tree has;

  - A newline (ASCII 10); and

  - Object name for the object that would result from writing this span
    of index as a tree.

  An entry can be in an invalidated state and is represented by having
  a negative number in the entry_count field. In this case, there is no
  object name and the next entry starts immediately after the newline.
  When writing an invalid entry, -1 should always be used as entry_count.

  The entries are written out in the top-down, depth-first order.
  The first entry represents the root level of the repository, followed by
  the first subtree--let's call this A--of the root level (with its name
  relative to the root level), followed by the first subtree of A (with
  its name relative to A), and so on. The specified number of subtrees
  indicates when the current level of the recursive stack is complete.

=== Resolve undo

  A conflict is represented in the index as a set of higher stage entries.
  When a conflict is resolved (e.g. with "git add path"), these higher
  stage entries will be removed and a stage-0 entry with proper resolution
  is added.

  When these higher stage entries are removed, they are saved in the
  resolve undo extension, so that conflicts can be recreated (e.g. with
  "git checkout -m"), in case users want to redo a conflict resolution
  from scratch.

  The signature for this extension is { 'R', 'E', 'U', 'C' }.

  A series of entries fill the entire extension; each of which
  consists of:

  - NUL-terminated pathname the entry describes (relative to the root of
    the repository, i.e. full pathname);

  - Three NUL-terminated ASCII octal numbers, entry mode of entries in
    stage 1 to 3 (a missing stage is represented by "0" in this field);
    and

  - At most three object names of the entry in stages from 1 to 3
    (nothing is written for a missing stage).

=== Split index

  In split index mode, the majority of index entries could be stored
  in a separate file. This extension records the changes to be made on
  top of that to produce the final index.

  The signature for this extension is { 'l', 'i', 'n', 'k' }.

  The extension consists of:

  - Hash of the shared index file. The shared index file path
    is $GIT_DIR/sharedindex.<hash>. If all bits are zero, the
    index does not require a shared index file.

  - An ewah-encoded delete bitmap, each bit represents an entry in the
    shared index. If a bit is set, its corresponding entry in the
    shared index will be removed from the final index.  Note, because
    a delete operation changes index entry positions, but we do need
    original positions in replace phase, it's best to just mark
    entries for removal, then do a mass deletion after replacement.

  - An ewah-encoded replace bitmap, each bit represents an entry in
    the shared index. If a bit is set, its corresponding entry in the
    shared index will be replaced with an entry in this index
    file. All replaced entries are stored in sorted order in this
    index. The first "1" bit in the replace bitmap corresponds to the
    first index entry, the second "1" bit to the second entry and so
    on. Replaced entries may have empty path names to save space.

  The remaining index entries after replaced ones will be added to the
  final index. These added entries are also sorted by entry name then
  stage.

== Untracked cache

  Untracked cache saves the untracked file list and necessary data to
  verify the cache. The signature for this extension is { 'U', 'N',
  'T', 'R' }.

  The extension starts with

  - A sequence of NUL-terminated strings, preceded by the size of the
    sequence in variable width encoding. Each string describes the
    environment where the cache can be used.

  - Stat data of $GIT_DIR/info/exclude. See "Index entry" section from
    ctime field until "file size".

  - Stat data of core.excludesFile

  - 32-bit dir_flags (see struct dir_struct)

  - Hash of $GIT_DIR/info/exclude. A null hash means the file
    does not exist.

  - Hash of core.excludesFile. A null hash means the file does
    not exist.

  - NUL-terminated string of per-dir exclude file name. This usually
    is ".gitignore".

  - The number of following directory blocks, variable width
    encoding. If this number is zero, the extension ends here with a
    following NUL.

  - A number of directory blocks in depth-first-search order, each
    consists of

    - The number of untracked entries, variable width encoding.

    - The number of sub-directory blocks, variable width encoding.

    - The directory name terminated by NUL.

    - A number of untracked file/dir names terminated by NUL.

The remaining data of each directory block is grouped by type:

  - An ewah bitmap, the n-th bit marks whether the n-th directory has
    valid untracked cache entries.

  - An ewah bitmap, the n-th bit records "check-only" bit of
    read_directory_recursive() for the n-th directory.

  - An ewah bitmap, the n-th bit indicates whether hash and stat data
    is valid for the n-th directory and exists in the next data.

  - An array of stat data. The n-th data corresponds with the n-th
    "one" bit in the previous ewah bitmap.

  - An array of hashes. The n-th hash corresponds with the n-th "one" bit
    in the previous ewah bitmap.

  - One NUL.

== File System Monitor cache

  The file system monitor cache tracks files for which the core.fsmonitor
  hook has told us about changes. The signature for this extension is
  { 'F', 'S', 'M', 'N' }.

  The extension starts with

  - 32-bit version number: the current supported versions are 1 and 2.

  - (Version 1)
    64-bit time: the extension data reflects all changes through the given
    time which is stored as the nanoseconds elapsed since midnight,
    January 1, 1970.

  - (Version 2)
    A null terminated string: an opaque token defined by the file system
    monitor application. The extension data reflects all changes relative
    to that token.

  - 32-bit bitmap size: the size of the CE_FSMONITOR_VALID bitmap.

  - An ewah bitmap, the n-th bit indicates whether the n-th index entry
    is not CE_FSMONITOR_VALID.

== End of Index Entry

  The End of Index Entry (EOIE) is used to locate the end of the variable
  length index entries and the beginning of the extensions. Code can take
  advantage of this to quickly locate the index extensions without having
  to parse through all of the index entries.

  Because it must be able to be loaded before the variable length cache
  entries and other index extensions, this extension must be written last.
  The signature for this extension is { 'E', 'O', 'I', 'E' }.

  The extension consists of:

  - 32-bit offset to the end of the index entries

  - Hash over the extension types and their sizes (but not
    their contents). E.g. if we have "TREE" extension that is N-bytes
    long, "REUC" extension that is M-bytes long, followed by "EOIE",
    then the hash would be:

    Hash("TREE" + <binary-representation-of-N> +
         "REUC" + <binary-representation-of-M>)

== Index Entry Offset Table

  The Index Entry Offset Table (IEOT) is used to help address the CPU
  cost of loading the index by enabling multi-threading the process of
  converting cache entries from the on-disk format to the in-memory format.
  The signature for this extension is { 'I', 'E', 'O', 'T' }.

  The extension consists of:

  - 32-bit version (currently 1)

  - A number of index offset entries each consisting of:

    - 32-bit offset from the beginning of the file to the first cache entry
      in this block of entries.

    - 32-bit count of cache entries in this block

== Sparse Directory Entries

  When using sparse-checkout in cone mode, some entire directories within
  the index can be summarized by pointing to a tree object instead of the
  entire expanded list of paths within that tree. An index containing such
  entries is a "sparse index".
  Index format versions 4 and less were not implemented with such entries
  in mind. Thus, for these versions, an index containing sparse directory
  entries will include this extension with signature { 's', 'd', 'i', 'r' }.
  Like the split-index extension, tools should avoid interacting with a
  sparse index unless they understand this extension.

GIT
---
Part of the linkgit:git[1] suite