gitformat-chunk(5)
==================

NAME
----
gitformat-chunk - Chunk-based file formats

SYNOPSIS
--------

Used by linkgit:gitformat-commit-graph[5] and the "MIDX" format (see
the pack format documentation in linkgit:gitformat-pack[5]).

DESCRIPTION
-----------

Some file formats in Git use a common concept of "chunks" to describe
sections of the file. This allows structured access to a large file by
scanning a small "table of contents" for the remaining data. This common
format is used by the `commit-graph` and `multi-pack-index` files. See
the `multi-pack-index` format in linkgit:gitformat-pack[5] and
the `commit-graph` format in linkgit:gitformat-commit-graph[5] for
how they use the chunks to describe structured data.

A chunk-based file format begins with some header information custom to
that format. That header should include enough information to identify
the file type, format version, and number of chunks in the file. From this
information, a reader can determine the start of the chunk-based region.

The chunk-based region starts with a table of contents describing where
each chunk starts and ends. This consists of (C+1) rows of 12 bytes each,
where C is the number of chunks. Consider the following table:

  | Chunk ID (4 bytes) | Chunk Offset (8 bytes) |
  |--------------------|------------------------|
  | ID[0]              | OFFSET[0]              |
  | ...                | ...                    |
  | ID[C-1]            | OFFSET[C-1]            |
  | 0x00000000         | OFFSET[C]              |

Each row consists of a 4-byte chunk identifier (ID) and an 8-byte offset.
Each integer is stored in network byte order (big-endian).

The chunk identifier `ID[i]` is a label for the data stored within this
file from `OFFSET[i]` (inclusive) to `OFFSET[i+1]` (exclusive). Thus, the
size of the `i`th chunk is equal to the difference between `OFFSET[i+1]`
and `OFFSET[i]`. This requires that the chunk data appears contiguously
in the same order as the table of contents.

The ID of the final entry in the table of contents must be four zero
bytes. This marks the end of the table of contents, and the entry's
offset gives the end of the chunk-based data.

Note: The chunk-based format expects that the file contains _at least_ a
trailing hash after `OFFSET[C]`.
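
The table layout described above can be sketched as a small standalone
lookup routine. This is an illustration only, not Git's implementation:
`find_chunk()`, its parameters, and the explicit `hash_len` argument are
hypothetical names chosen for the example, and it assumes that offsets are
measured from the start of the file.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOC_ROW_SIZE 12 /* 4-byte ID followed by an 8-byte offset */

/* Decode big-endian (network byte order) integers byte by byte. */
static uint32_t get_be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t get_be64(const unsigned char *p)
{
	return ((uint64_t)get_be32(p) << 32) | get_be32(p + 4);
}

/*
 * Hypothetical helper: look up chunk `id` in a table of contents that
 * starts at `toc_offset` and describes `nr_chunks` chunks. Returns 0 and
 * fills `*start` and `*size` on success, or -1 if the table is malformed
 * or the chunk is absent. The outputs are only meaningful on success.
 */
static int find_chunk(const unsigned char *file, size_t file_size,
		      size_t toc_offset, uint32_t nr_chunks,
		      size_t hash_len, uint32_t id,
		      uint64_t *start, uint64_t *size)
{
	size_t toc_size = ((size_t)nr_chunks + 1) * TOC_ROW_SIZE;
	const unsigned char *toc;
	uint32_t i;
	int ret = -1;

	assert(file && start && size);
	if (toc_offset > file_size || toc_size > file_size - toc_offset)
		return -1; /* the table does not fit in the file */
	toc = file + toc_offset;

	/* The row after the last chunk must carry the all-zero ID. */
	if (get_be32(toc + (size_t)nr_chunks * TOC_ROW_SIZE) != 0)
		return -1;

	for (i = 0; i < nr_chunks; i++) {
		const unsigned char *row = toc + (size_t)i * TOC_ROW_SIZE;
		uint64_t begin = get_be64(row + 4);
		uint64_t end = get_be64(row + TOC_ROW_SIZE + 4);

		/* Chunks are in order and leave room for the trailing hash. */
		if (end < begin || end > file_size ||
		    file_size - end < hash_len)
			return -1;
		if (ret && get_be32(row) == id) {
			*start = begin;
			*size = end - begin; /* next offset minus this offset */
			ret = 0;
		}
	}
	return ret;
}
```

A caller would pass the mapped file, the offset at which its
format-specific header ends, and the chunk count read from that header.
The sketch deliberately validates every row even after a match, so a
truncated or reordered table is reported rather than silently accepted.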

Functions for working with chunk-based file formats are declared in
`chunk-format.h`. Using these functions provides extra checks that assist
developers when creating new file formats.

Writing chunk-based file formats
--------------------------------

To write a chunk-based file format, create a `struct chunkfile` by
calling `init_chunkfile()` with a `struct hashfile` pointer. The
caller is responsible for opening the `hashfile` and writing header
information so the file format is identifiable before the chunk-based
format begins.

Then, call `add_chunk()` for each chunk that is intended for writing. This
populates the `chunkfile` with information about the order and size of
each chunk to write. Provide a `chunk_write_fn` function pointer to
perform the write of the chunk data upon request.

Call `write_chunkfile()` to write the table of contents to the `hashfile`
followed by each of the chunks. This will verify that each chunk wrote
the expected amount of data so the table of contents is correct.

Finally, call `free_chunkfile()` to clear the `struct chunkfile` data. The
caller is responsible for finalizing the `hashfile` by writing the trailing
hash and closing the file.
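
The register-then-write flow described above can be illustrated with a
self-contained toy model. It does not use `chunk-format.h`: every `toy_*`
name, the in-memory `struct toy_out` standing in for a `hashfile`, and the
fixed `MAX_CHUNKS` limit are assumptions made for this sketch, and it
reports errors by returning -1, which may differ from what the real API
does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CHUNKS 8
#define TOC_ROW_SIZE 12

/* Stand-in for a hashfile: a bounded in-memory output buffer. */
struct toy_out {
	unsigned char *buf;
	size_t len, cap;
};

typedef int (*toy_write_fn)(struct toy_out *out, void *data);

struct toy_chunkfile {
	struct toy_out *out;
	struct { uint32_t id; size_t size; toy_write_fn fn; } chunks[MAX_CHUNKS];
	size_t nr;
};

static int toy_write(struct toy_out *out, const void *p, size_t n)
{
	if (n > out->cap - out->len)
		return -1;
	memcpy(out->buf + out->len, p, n);
	out->len += n;
	return 0;
}

/* Write the low `bytes` bytes of `v` in network byte order. */
static int toy_write_be(struct toy_out *out, uint64_t v, int bytes)
{
	unsigned char tmp[8];
	int i;

	for (i = 0; i < bytes; i++)
		tmp[i] = (unsigned char)(v >> (8 * (bytes - 1 - i)));
	return toy_write(out, tmp, (size_t)bytes);
}

/* Record the ID, declared size, and writer callback for one chunk. */
static int toy_add_chunk(struct toy_chunkfile *cf, uint32_t id,
			 size_t size, toy_write_fn fn)
{
	assert(cf && fn);
	if (cf->nr == MAX_CHUNKS)
		return -1;
	cf->chunks[cf->nr].id = id;
	cf->chunks[cf->nr].size = size;
	cf->chunks[cf->nr].fn = fn;
	cf->nr++;
	return 0;
}

/* Write the table of contents, then each chunk, checking declared sizes. */
static int toy_write_chunkfile(struct toy_chunkfile *cf, void *data)
{
	uint64_t offset = cf->out->len + (cf->nr + 1) * TOC_ROW_SIZE;
	size_t i;

	for (i = 0; i < cf->nr; i++) {
		if (toy_write_be(cf->out, cf->chunks[i].id, 4) ||
		    toy_write_be(cf->out, offset, 8))
			return -1;
		offset += cf->chunks[i].size;
	}
	/* Terminating row: zero ID plus the end of the chunk data. */
	if (toy_write_be(cf->out, 0, 4) || toy_write_be(cf->out, offset, 8))
		return -1;

	for (i = 0; i < cf->nr; i++) {
		size_t before = cf->out->len;

		if (cf->chunks[i].fn(cf->out, data))
			return -1;
		if (cf->out->len - before != cf->chunks[i].size)
			return -1; /* chunk wrote an unexpected amount */
	}
	return 0;
}

/* Example chunk writer: emits the NUL-terminated string passed as `data`. */
static int write_string_chunk(struct toy_out *out, void *data)
{
	const char *s = data;

	return toy_write(out, s, strlen(s));
}
```

Declaring sizes up front is what allows the whole table of contents to be
written before any chunk data; the size check after each callback then
catches a writer that disagrees with its declared size, which would
otherwise leave the offsets in the table pointing at the wrong bytes.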

Reading chunk-based file formats
--------------------------------

To read a chunk-based file format, the file must be opened as a
memory-mapped region. The chunk-format API expects that the entire file
is mapped as a contiguous memory region.

Initialize a `struct chunkfile` pointer with `init_chunkfile(NULL)`.

After reading the header information from the beginning of the file,
including the chunk count, call `read_table_of_contents()` to populate
the `struct chunkfile` with the list of chunks, their offsets, and their
sizes.

Extract the data for each chunk using `pair_chunk()` or `read_chunk()`:

* `pair_chunk()` sets a given pointer to the location inside the
  memory-mapped file corresponding to that chunk's offset. If the chunk
  does not exist, then the pointer is not modified.

* `read_chunk()` takes a `chunk_read_fn` function pointer and calls it
  with the appropriate initial pointer and size information. The function
  is not called if the chunk does not exist. Use this function to read
  chunks if you need to perform immediate parsing or if you need to
  execute logic based on the size of the chunk.
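
The difference between the two access styles can be sketched with a toy
model that works on an already-parsed table. None of this is the real
API: the `toy_*` names, the `struct toy_entry` layout, and the -1 return
value for a missing chunk are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parsed table-of-contents entry. */
struct toy_entry {
	uint32_t id;
	const unsigned char *start; /* points into the mapped file */
	size_t size;
};

struct toy_toc {
	const struct toy_entry *entries;
	size_t nr;
};

typedef int (*toy_read_fn)(const unsigned char *start, size_t size,
			   void *data);

static const struct toy_entry *toy_find(const struct toy_toc *toc,
					uint32_t id)
{
	size_t i;

	for (i = 0; i < toc->nr; i++)
		if (toc->entries[i].id == id)
			return &toc->entries[i];
	return NULL;
}

/* Pairing: point `*p` at the chunk, leaving it untouched if absent. */
static int toy_pair_chunk(const struct toy_toc *toc, uint32_t id,
			  const unsigned char **p)
{
	const struct toy_entry *e = toy_find(toc, id);

	assert(p);
	if (!e)
		return -1;
	*p = e->start;
	return 0;
}

/* Reading: call `fn` with the start and size, only if the chunk exists. */
static int toy_read_chunk(const struct toy_toc *toc, uint32_t id,
			  toy_read_fn fn, void *data)
{
	const struct toy_entry *e = toy_find(toc, id);

	if (!e)
		return -1;
	return fn(e->start, e->size, data);
}

/* Example callback: reject a chunk whose size is not a multiple of 4. */
static int toy_count_u32(const unsigned char *start, size_t size, void *data)
{
	(void)start;
	if (size % 4)
		return 1;
	*(size_t *)data = size / 4;
	return 0;
}
```

The pairing style suits chunks that are consumed lazily through a stored
pointer, while the callback style gives the reader a natural place to
reject a chunk whose size does not match what the header implies.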

After calling these functions, call `free_chunkfile()` to clear the
`struct chunkfile` data. This will not close the memory-mapped region.
Callers are expected to own that data for as long as the pointers into
the region are needed.

Examples
--------

These file formats use the chunk-format API, and can be used as examples
for future formats:

* *commit-graph:* see `write_commit_graph_file()` and `parse_commit_graph()`
  in `commit-graph.c` for how the chunk-format API is used to write and
  parse the commit-graph file format documented in
  linkgit:gitformat-commit-graph[5].

* *multi-pack-index:* see `write_midx_internal()` and `load_multi_pack_index()`
  in `midx.c` for how the chunk-format API is used to write and
  parse the multi-pack-index file format documented in
  the multi-pack-index file format section of linkgit:gitformat-pack[5].

GIT
---
Part of the linkgit:git[1] suite