Git fork

index-pack: allow revisiting REF_DELTA chains

As detailed in the previous changes to t5309-pack-delta-cycles.sh, the
logic within 'git index-pack' to analyze an incoming thin packfile with
REF_DELTAs is suspect. The algorithm is overly cautious around delta
cycles, and that leads in fact to failing even when there is no cycle.

This change adjusts the algorithm to no longer fail in these cases. In
fact, these cycle cases will no longer fail but more importantly the
valid cases will no longer fail, either. The resulting packfile from the
--fix-thin operation will not have cycles either since REF_DELTAs are
forbidden from the on-disk format and OFS_DELTAs are impossible to write
as a cycle.

The crux of the matter is how the algorithm works when the REF_DELTAs
point to base objects that exist in the local repository. When reading
the thin packfile, the object IDs for the delta objects are unknown so
we do not have the delta chain structure automatically. Instead, we need
to start somewhere by selecting a delta whose base is inside our current
object database.

Consider the case where the packfile has two REF_DELTA objects, A and B,
and the delta chain looks like "A depends on B" and "B depends on C" for
some third object C, where C is already in the current repository. The
algorithm _should_ start with all objects that depend on C, finding B,
and then moving on to all objects depending on B, finding A.

However, if the repository also already has object B, then the delta
chain can be analyzed in a different order. The deltas with base B can
be analyzed first, finding A, and then the deltas with base C are
analyzed, finding B. The algorithm currently continues to look for
objects that depend on B, finding A again. This fails due to A's
'real_type' member already being overwritten from OBJ_REF_DELTA to the
correct object type.

This scenario is possible in a typical 'git fetch' where the client does
not advertise B as a 'have' but requests A as a 'want' (and C is noticed
as a common object based on other 'have's). The reason this isn't
typically seen is that most Git servers use OFS_DELTAs to represent
deltas within a packfile. However, if a server uses only REF_DELTAs,
then this kind of issue can occur. There is nothing in the explicit
packfile format that states this use of inter-pack REF_DELTA is
incorrect, only that REF_DELTAs should not be used in the on-disk
representation to avoid cycles.

This die() was introduced in ab791dd138 (index-pack: fix race condition
with duplicate bases, 2014-08-29). Several refactors have adjusted the
error message and the surrounding logic, but this issue has existed for
a longer time as that was only a conversion from an assert().

The tests in t5309 originated in 3b910d0c5e (add tests for indexing
packs with delta cycles, 2013-08-23) and b2ef3d9ebb (test index-pack on
packs with recoverable delta cycles, 2013-08-23). These changes make
note that the current behavior of handling "resolvable" cycles is mostly
a documentation-only test, not that this behavior is the best way for
Git to handle the situation.

The fix here is somewhat complicated due to the amount of state being
adjusted by the loop within threaded_second_pass(). Instead of trying to
resume the start of the loop while adjusting the necessary context, I
chose to scan the REF_DELTAs depending on the current 'parent' and skip
any that have already been processed. This necessarily leaves us in a
state where 'child' and 'child_obj' could be left as NULL and that must
be handled later. There is also some careful handling around skipping
REF_DELTAs when there are also OFS_DELTAs depending on that parent.
There may be value in extending 'test-tool pack-deltas' to allow writing
OFS_DELTAs in order to exercise this logic across the delta types.

Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Junio C Hamano <gitster@pobox.com>

authored by

Derrick Stolee and committed by
Junio C Hamano
98f8854c fd7fd7af

+41 -29
+32 -26
builtin/index-pack.c
··· 1109 1109 set_thread_data(data); 1110 1110 for (;;) { 1111 1111 struct base_data *parent = NULL; 1112 - struct object_entry *child_obj; 1113 - struct base_data *child; 1112 + struct object_entry *child_obj = NULL; 1113 + struct base_data *child = NULL; 1114 1114 1115 1115 counter_lock(); 1116 1116 display_progress(progress, nr_resolved_deltas); ··· 1137 1137 parent = list_first_entry(&work_head, struct base_data, 1138 1138 list); 1139 1139 1140 - if (parent->ref_first <= parent->ref_last) { 1140 + while (parent->ref_first <= parent->ref_last) { 1141 1141 int offset = ref_deltas[parent->ref_first++].obj_no; 1142 1142 child_obj = objects + offset; 1143 - if (child_obj->real_type != OBJ_REF_DELTA) 1144 - die("REF_DELTA at offset %"PRIuMAX" already resolved (duplicate base %s?)", 1145 - (uintmax_t) child_obj->idx.offset, 1146 - oid_to_hex(&parent->obj->idx.oid)); 1143 + if (child_obj->real_type != OBJ_REF_DELTA) { 1144 + child_obj = NULL; 1145 + continue; 1146 + } 1147 1147 child_obj->real_type = parent->obj->real_type; 1148 - } else { 1148 + break; 1149 + } 1150 + 1151 + if (!child_obj && parent->ofs_first <= parent->ofs_last) { 1149 1152 child_obj = objects + 1150 1153 ofs_deltas[parent->ofs_first++].obj_no; 1151 1154 assert(child_obj->real_type == OBJ_OFS_DELTA); ··· 1178 1181 } 1179 1182 work_unlock(); 1180 1183 1181 - if (parent) { 1182 - child = resolve_delta(child_obj, parent); 1183 - if (!child->children_remaining) 1184 - FREE_AND_NULL(child->data); 1185 - } else { 1186 - child = make_base(child_obj, NULL); 1187 - if (child->children_remaining) { 1188 - /* 1189 - * Since this child has its own delta children, 1190 - * we will need this data in the future. 1191 - * Inflate now so that future iterations will 1192 - * have access to this object's data while 1193 - * outside the work mutex. 1194 - */ 1195 - child->data = get_data_from_pack(child_obj); 1196 - child->size = child_obj->size; 1184 + if (child_obj) { 1185 + if (parent) { 1186 + child = resolve_delta(child_obj, parent); 1187 + if (!child->children_remaining) 1188 + FREE_AND_NULL(child->data); 1189 + } else{ 1190 + child = make_base(child_obj, NULL); 1191 + if (child->children_remaining) { 1192 + /* 1193 + * Since this child has its own delta children, 1194 + * we will need this data in the future. 1195 + * Inflate now so that future iterations will 1196 + * have access to this object's data while 1197 + * outside the work mutex. 1198 + */ 1199 + child->data = get_data_from_pack(child_obj); 1200 + child->size = child_obj->size; 1201 + } 1197 1202 } 1198 1203 } 1199 1204 1200 1205 work_lock(); 1201 1206 if (parent) 1202 1207 parent->retain_data--; 1203 - if (child->data) { 1208 + 1209 + if (child && child->data) { 1204 1210 /* 1205 1211 * This child has its own children, so add it to 1206 1212 * work_head. ··· 1209 1215 base_cache_used += child->size; 1210 1216 prune_base_data(NULL); 1211 1217 free_base_data(child); 1212 - } else { 1218 + } else if (child) { 1213 1219 /* 1214 1220 * This child does not have its own children. It may be 1215 1221 * the last descendant of its ancestors; free those
+9 -3
t/t5309-pack-delta-cycles.sh
··· 60 60 test_expect_success 'failover to an object in another pack' ' 61 61 clear_packs && 62 62 git index-pack --stdin <ab.pack && 63 - test_must_fail git index-pack --stdin --fix-thin <cycle.pack 63 + 64 + # This cycle does not fail since the existence of A & B in 65 + # the repo allows us to resolve the cycle. 66 + git index-pack --stdin --fix-thin <cycle.pack 64 67 ' 65 68 66 69 test_expect_success 'failover to a duplicate object in the same pack' ' ··· 72 75 pack_obj $A 73 76 } >recoverable.pack && 74 77 pack_trailer recoverable.pack && 75 - test_must_fail git index-pack --fix-thin --stdin <recoverable.pack 78 + 79 + # This cycle does not fail since the existence of a full copy 80 + # of A in the pack allows us to resolve the cycle. 81 + git index-pack --fix-thin --stdin <recoverable.pack 76 82 ' 77 83 78 - test_expect_failure 'index-pack works with thin pack A->B->C with B on disk' ' 84 + test_expect_success 'index-pack works with thin pack A->B->C with B on disk' ' 79 85 git init server && 80 86 ( 81 87 cd server &&