My working unpac repository
at opam/upstream/seq 1366 lines 46 kB view raw
1(**************************************************************************) 2(* *) 3(* OCaml *) 4(* *) 5(* Gabriel Scherer, projet Partout, INRIA Paris-Saclay *) 6(* *) 7(* Copyright 2022 Institut National de Recherche en Informatique et *) 8(* en Automatique. *) 9(* *) 10(* All rights reserved. This file is distributed under the terms of *) 11(* the GNU Lesser General Public License version 2.1, with the *) 12(* special exception on linking described in the file LICENSE. *) 13(* *) 14(**************************************************************************) 15 16(* {2 The type ['a t]} 17 18 A dynamic array is represented using a backing array [arr] and 19 a [length]. It behaves as an array of size [length] -- the indices 20 from [0] to [length - 1] included contain user-provided values and 21 can be [get] and [set] -- but the length may also change in the 22 future by adding or removing elements at the end. 23 24 We use the following concepts; 25 - capacity: the length of the backing array: 26 [Array.length arr] 27 - live space: the portion of the backing array with 28 indices from [0] to [length - 1] included. 29 - empty space: the portion of the backing array 30 from [length] to the end of the backing array. 31 32 {2 Dummies} 33 34 We should not keep a user-provided value in the empty space, as 35 this could extend its lifetime and may result in memory leaks of 36 arbitrary size. Functions that remove elements from the dynamic 37 array, such as [pop_last] or [truncate], must really erase the 38 element from the backing array. 39 40 To do so, we use an unsafe/magical [dummy] in the empty array. This 41 dummy is *not* type-safe, it is not a valid value of type ['a], so 42 we must be very careful never to return it to the user. After 43 accessing any element of the array, we must check that it is not 44 the dummy. In particular, this dummy must be distinct from any 45 other value the user could provide -- we ensure this by using 46 a dynamically-allocated mutable reference as our dummy. 47 48 {2 Invariants and valid states} 49 50 We enforce the invariant that [length >= 0] at all times. 51 we rely on this invariant for optimization. 52 53 The following conditions define what we call a "valid" dynarray: 54 - valid length: [length <= Array.length arr] 55 - no missing element in the live space: 56 forall i, [0 <= i < length] implies [arr.(i) != dummy] 57 - no element in the empty space: 58 forall i, [length <= i < Array.length arr] implies [arr.(i) == dummy] 59 60 Unfortunately, we cannot easily enforce validity as an invariant in 61 presence of concurrent updates. We can thus observe dynarrays in 62 "invalid states". Our implementation may raise exceptions or return 63 incorrect results on observing invalid states, but of course it 64 must preserve memory safety. 65 66 {3 Dummies and flat float arrays} 67 68 OCaml performs a dynamic optimization of the representation of 69 float arrays, which is incompatible with our use of a dummy 70 value: if we initialize an array with user-provided elements, 71 it may get an optimized into a "flat float array", and 72 writing our non-float dummy into it would crash. 73 74 To avoid interactions between unsafe dummies and flat float arrays, 75 we ensure that the arrays that we use are never initialized with 76 floating point values. In that case we will always get a non-flat 77 array, and storing float values inside those is safe 78 (if less efficient). We call this the 'no-flat-float' invariant. 79 80 {3 Marshalling dummies} 81 82 There is a risk of interaction between dummies and 83 marshalling. If we use a global dynamically-allocated dummy 84 for the whole module, we are not robust to a user marshalling 85 a dynarray and unmarshalling it inside another program with 86 a different global dummy. 87 88 The trick is to store the dummy that we use in the dynarray 89 metadata record. Marshalling the dynarray will then preserve the 90 physical equality between this dummy field and dummy elements in 91 the array, as expected. 92 93 This reasoning assumes that marshalling does not use the 94 [No_sharing] flag. To ensure that users do not marshal dummies 95 with [No_sharing], we use a recursive/cyclic dummy that would make 96 such marshalling loop forever. (This is not nice, but better than 97 segfaulting later for obscure reasons.) 98*) 99 100(** The [Dummy] module encapsulates the low-level magic we use 101 for dummies, providing a strongly-typed API that: 102 - makes it explicit where dummies are used 103 - makes it hard to mistakenly mix data using distinct dummies, 104 which would be unsound *) 105module Dummy : sig 106 107 (** {4 Dummies} *) 108 109 type 'stamp dummy 110 (** The type of dummies is parametrized by a ['stamp] variable, 111 so that two dummies with different stamps cannot be confused 112 together. *) 113 114 type fresh_dummy = Fresh : 'stamp dummy -> fresh_dummy 115 val fresh : unit -> fresh_dummy 116 (** The type of [fresh] enforces a fresh/unknown/opaque stamp for 117 the returned dummy, distinct from all previous stamps. *) 118 119 120 (** {4 Values or dummies} *) 121 122 type ('a, 'stamp) with_dummy 123 (** a value of type [('a, 'stamp) with_dummy] is either a proper 124 value of type ['a] or a dummy with stamp ['stamp]. *) 125 126 val of_val : 'a -> ('a, 'stamp) with_dummy 127 val of_dummy : 'stamp dummy -> ('a, 'stamp) with_dummy 128 129 val is_dummy : ('a, 'stamp) with_dummy -> 'stamp dummy -> bool 130 val unsafe_get : ('a, 'stamp) with_dummy -> 'a 131 (** [unsafe_get v] can only be called safely if [is_dummy v dummy] 132 is [false]. 133 134 We could instead provide 135 [val find : ('a, 'stamp) with_dummy -> ('a, 'stamp dummy) result] 136 but this would involve intermediary allocations. 137 138 {[match find x with 139 | None -> ... 140 | Some v -> ...]} 141 can instead be written 142 {[if Dummy.is_dummy x 143 then ... 144 else let v = Dummy.unsafe_get x in ...]} 145 *) 146 147 (** {4 Arrays of values or dummies} *) 148 module Array : sig 149 val make : 150 int -> 'a -> dummy:'stamp dummy -> 151 ('a, 'stamp) with_dummy array 152 153 val init : 154 int -> (int -> 'a) -> dummy:'stamp dummy -> 155 ('a, 'stamp) with_dummy array 156 157 val copy_from_array : 158 'a array -> dummy:'stamp dummy -> ('a, 'stamp) with_dummy array 159 160 val unsafe_nocopy_from_array : 161 'a array -> dummy:'stamp dummy -> ('a, 'stamp) with_dummy array 162 (** [unsafe_nocopy] assumes that the input array was created 163 locally and will not be used anymore (in the spirit of 164 [Bytes.unsafe_to_string]), and avoids a copy of the input 165 array when possible. *) 166 167 exception Dummy_found of int 168 169 val unsafe_nocopy_to_array : 170 ('a, 'stamp) with_dummy array -> dummy:'stamp dummy -> 'a array 171 (** Assumes, without checking, that the input array was created locally and 172 will not be used anymore. Performs no copy except when the elements are 173 floats. Raises [Dummy_found i] if there is a dummy at any index [i]. *) 174 175 val blit_array : 176 'a array -> int -> 177 ('a, 'stamp) with_dummy array -> int -> 178 len:int -> 179 unit 180 181 val blit : 182 ('a, 'stamp1) with_dummy array -> 'stamp1 dummy -> int -> 183 ('a, 'stamp2) with_dummy array -> 'stamp2 dummy -> int -> 184 len:int -> 185 unit 186 (** Raises [Dummy_found i] if there is a dummy at any index [i] in 187 the source region. *) 188 189 val prefix : 190 ('a, 'stamp) with_dummy array -> 191 int -> 192 ('a, 'stamp) with_dummy array 193 194 val extend : 195 ('a, 'stamp) with_dummy array -> 196 length:int -> 197 dummy:'stamp dummy -> 198 new_capacity:int -> 199 ('a, 'stamp) with_dummy array 200 end 201end = struct 202 (* We want to use a cyclic value so that No_sharing marshalling 203 fails loudly, but we want also comparison of dynarrays to work 204 as expected, and not loop forever. 205 206 Our approach is to use an object value that contains a cycle. 207 Objects are compared by their unique id, so comparison is not 208 structural and will not loop on the cycle, but marshalled 209 by content, so marshalling without sharing will fail on the cycle. 210 211 (It is a bit tricky to build an object that does not contain 212 functional values where marshalling fails, see [fresh ()] below 213 for how we do it.) *) 214 type 'stamp dummy = < > 215 type fresh_dummy = Fresh : 'stamp dummy -> fresh_dummy 216 217 let fresh () = 218 (* dummies and marshalling: we intentionally 219 use a cyclic value here. *) 220 let r = ref None in 221 ignore 222 (* hack: this primitive is required by the object expression below, 223 ensure that 'make depend' notices it. *) 224 CamlinternalOO.create_object_opt; 225 let dummy = object 226 val x = r 227 end in 228 r := Some dummy; 229 Fresh dummy 230 231 (* Use an abstract type to prevent the compiler from assuming anything about 232 the representation of [with_dummy] values. 233 234 Representation: We explicitly use "%opaque" primitives when converting 235 to/from [with_dummy] types and/or arrays of [with_dummy] types, because 236 using "transparent" identity (e.g. `Obj.magic`) might break assumptions 237 that the compiler makes about value representations (for instance, a value 238 of type [(int, 'stamp) with_dummy array] could contain blocks, while a 239 value of type [int array] certainly does not). 240 241 While it would be possible to use transparent identity in {b some} places, 242 it would require careful reasoning to make sure it is safe to do so 243 (especially in a forward-compatible way) and it is not clear the benefit 244 is worth the effort. *) 245 type ('a, 'stamp) with_dummy 246 247 external of_val : 'a -> ('a, 'stamp) with_dummy = "%opaque" 248 249 external of_dummy : 'stamp dummy -> ('a, 'stamp) with_dummy = "%opaque" 250 251 let is_dummy v dummy = 252 v == of_dummy dummy 253 254 (* Safety: the argument must not be the ['stamp dummy]. *) 255 external unsafe_get : ('a, 'stamp) with_dummy -> 'a = "%opaque" 256 257 module Array = struct 258 let make n x ~dummy = 259 if Obj.(tag (repr x) <> double_tag) then 260 Array.make n (of_val x) 261 else begin 262 let arr = Array.make n (of_dummy dummy) in 263 Array.fill arr 0 n (of_val x); 264 arr 265 end 266 267 (* Safety: must not be called on float arrays. *) 268 external unsafe_nocopy_from_non_float_array : 269 'a array -> ('a, 'stamp) with_dummy array 270 = "%opaque" 271 272 let copy_from_array a ~dummy = 273 if Obj.(tag (repr a) <> double_array_tag) then 274 unsafe_nocopy_from_non_float_array (Array.copy a) 275 else begin 276 let n = Array.length a in 277 let arr = Array.make n (of_dummy dummy) in 278 for i = 0 to n - 1 do 279 Array.unsafe_set arr i 280 (of_val (Array.unsafe_get a i)); 281 done; 282 arr 283 end 284 285 let unsafe_nocopy_from_array a ~dummy = 286 if Obj.(tag (repr a) <> double_array_tag) then 287 unsafe_nocopy_from_non_float_array a 288 else copy_from_array a ~dummy 289 290 exception Dummy_found of int 291 292 (* Safety: the argument must not contain any dummies, and must not contain 293 floats. *) 294 external unsafe_nocopy_to_non_float_array : 295 ('a, 'stamp) with_dummy array -> 'a array 296 = "%opaque" 297 298 let unsafe_nocopy_to_array a ~dummy = 299 let n = Array.length a in 300 if n = 0 || Obj.(tag (repr a.(0)) <> double_tag) then begin 301 for i = 0 to n - 1 do 302 if is_dummy (Array.unsafe_get a i) dummy then raise (Dummy_found i) 303 done; 304 unsafe_nocopy_to_non_float_array a 305 end else begin 306 let a' = Array.make n (unsafe_get a.(0)) in 307 for i = 1 to n - 1 do 308 let v = Array.unsafe_get a i in 309 if is_dummy v dummy then raise (Dummy_found i); 310 Array.unsafe_set a' i (unsafe_get v) 311 done; 312 a' 313 end 314 315 let init n f ~dummy = 316 let arr = Array.make n (of_dummy dummy) in 317 for i = 0 to n - 1 do 318 Array.unsafe_set arr i (of_val (f i)) 319 done; 320 arr 321 322 let blit_array src src_pos dst dst_pos ~len = 323 if Obj.(tag (repr src) <> double_array_tag) then 324 Array.blit 325 (unsafe_nocopy_from_non_float_array src) 326 src_pos dst 327 dst_pos len 328 else begin 329 for i = 0 to len - 1 do 330 dst.(dst_pos + i) <- of_val src.(src_pos + i) 331 done; 332 end 333 334 (* Safety: both arrays must have the same dummy, i.e. the ['stamp1 dummy] 335 and the ['stamp2 dummy] must be physically equal. *) 336 external unsafe_cast_stamp_array : 337 ('a, 'stamp1) with_dummy array -> ('a, 'stamp2) with_dummy array 338 = "%opaque" 339 340 let blit src src_dummy src_pos dst dst_dummy dst_pos ~len = 341 if src_dummy == dst_dummy then 342 Array.blit (unsafe_cast_stamp_array src) src_pos dst dst_pos len 343 else begin 344 if len < 0 345 || src_pos < 0 346 || src_pos + len < 0 (* overflow check *) 347 || src_pos + len > Array.length src 348 || dst_pos < 0 349 || dst_pos + len < 0 (* overflow check *) 350 || dst_pos + len > Array.length dst 351 then begin 352 (* We assume that the caller has already checked this and 353 will raise a proper error. The check here is only for 354 memory safety, it should not be reached and it is okay if 355 the error is uninformative. *) 356 assert false; 357 end; 358 (* We failed the check [src_dummy == dst_dummy] above, so we 359 know that in fact [src != dst] -- two dynarrays with 360 distinct dummies cannot share the same backing arrays. 361 362 We use [Obj.repr] for the comparison since [src] and [dst] have 363 different dummies. *) 364 assert (Obj.repr src != Obj.repr dst); 365 (* In particular, the source and destination arrays cannot 366 overlap, so we can always copy in ascending order without 367 risking overwriting an element needed later. 368 369 We also must check for dummies (invalid state) in the source 370 array: having two different dummies in the same array would be 371 memory unsafe. *) 372 for i = 0 to len - 1 do 373 let v = Array.unsafe_get src (src_pos + i) in 374 (* The combination of [of_val] and [unsafe_get] below allows to change 375 the stamp mark, which is only safe on a non-dummy value. *) 376 if is_dummy v src_dummy then 377 raise (Dummy_found (src_pos + i)); 378 Array.unsafe_set dst (dst_pos + i) 379 (of_val (unsafe_get v)); 380 done 381 end 382 383 let prefix arr n = 384 (* Note: the safety of the [Array.sub] call below, with respect to 385 our 'no-flat-float' invariant, relies on the fact that 386 [Array.sub] checks the tag of the input array, not whether the 387 elements themselves are float. 388 389 To avoid relying on this undocumented property we could use 390 [Array.make length dummy] and then set values in a loop, but this 391 would result in [caml_modify] rather than [caml_initialize]. *) 392 Array.sub arr 0 n 393 394 let extend arr ~length ~dummy ~new_capacity = 395 (* 'no-flat-float' invariant: we initialise the array with our 396 non-float dummy to get a non-flat array. *) 397 let new_arr = Array.make new_capacity (of_dummy dummy) in 398 Array.blit arr 0 new_arr 0 length; 399 new_arr 400 end 401end 402 403type 'a t = Pack : ('a, 'stamp) t_ -> 'a t [@@unboxed] 404and ('a, 'stamp) t_ = { 405 mutable length : int; 406 mutable arr : ('a, 'stamp) Dummy.with_dummy array; 407 dummy : 'stamp Dummy.dummy; 408} 409 410let global_dummy = Dummy.fresh () 411(* We need to ensure that dummies are never exposed to the user as 412 values of type ['a]. Including the dummy in the dynarray metadata 413 is necessary for marshalling to behave correctly, but there is no 414 obligation to create a fresh dummy for each new dynarray, we can 415 use a global dummy. 416 417 On the other hand, unmarshalling may precisely return a dynarray 418 with another dummy: we cannot assume that all dynarrays use this 419 global dummy. The existential hiding of the dummy ['stamp] 420 parameter helps us to avoid this assumption. *) 421 422module Error = struct 423 let[@inline never] index_out_of_bounds fname ~i ~length = 424 if length = 0 then 425 Printf.ksprintf invalid_arg 426 "Dynarray.%s: index %d out of bounds (empty dynarray)" 427 fname i 428 else 429 Printf.ksprintf invalid_arg 430 "Dynarray.%s: index %d out of bounds (0..%d)" 431 fname i (length - 1) 432 433 let[@inline never] negative_length_requested fname n = 434 Printf.ksprintf invalid_arg 435 "Dynarray.%s: negative length %d requested" 436 fname n 437 438 let[@inline never] negative_capacity_requested fname n = 439 Printf.ksprintf invalid_arg 440 "Dynarray.%s: negative capacity %d requested" 441 fname n 442 443 let[@inline never] requested_length_out_of_bounds fname requested_length = 444 Printf.ksprintf invalid_arg 445 "Dynarray.%s: cannot grow to requested length %d (max_array_length is %d)" 446 fname requested_length Sys.max_array_length 447 448 (* When observing an invalid state ([missing_element], 449 [invalid_length]), we do not give the name of the calling function 450 in the error message, as the error is related to invalid operations 451 performed earlier, and not to the callsite of the function 452 itself. *) 453 454 let invalid_state_description = 455 "Invalid dynarray (unsynchronized concurrent length change)" 456 457 let[@inline never] missing_element ~i ~length = 458 Printf.ksprintf invalid_arg 459 "%s: missing element at position %d < length %d" 460 invalid_state_description 461 i length 462 463 let[@inline never] invalid_length ~length ~capacity = 464 Printf.ksprintf invalid_arg 465 "%s: length %d > capacity %d" 466 invalid_state_description 467 length capacity 468 469 let[@inline never] length_change_during_iteration fname ~expected ~observed = 470 Printf.ksprintf invalid_arg 471 "Dynarray.%s: a length change from %d to %d occurred during iteration" 472 fname expected observed 473 474 (* When an [Empty] element is observed unexpectedly at index [i], 475 it may be either an out-of-bounds access or an invalid-state situation 476 depending on whether [i <= length]. *) 477 let[@inline never] unexpected_empty_element fname ~i ~length = 478 if i < length then 479 missing_element ~i ~length 480 else 481 index_out_of_bounds fname ~i ~length 482 483 let[@inline never] empty_dynarray fname = 484 Printf.ksprintf invalid_arg 485 "Dynarray.%s: empty array" fname 486 487 let[@inline never] different_lengths f ~length1 ~length2 = 488 Printf.ksprintf invalid_arg 489 "Dynarray.%s: array length mismatch: %d <> %d" 490 f length1 length2 491end 492 493(* Detecting iterator invalidation. 494 495 See {!iter} below for a detailed usage example. 496*) 497let check_same_length fname (Pack a) ~length = 498 let length_a = a.length in 499 if length <> length_a then 500 Error.length_change_during_iteration fname 501 ~expected:length ~observed:length_a 502 503(** Careful unsafe access. *) 504 505(* Postcondition on non-exceptional return: 506 [length <= Array.length arr] *) 507let[@inline always] check_valid_length length arr = 508 let capacity = Array.length arr in 509 if length > capacity then 510 Error.invalid_length ~length ~capacity 511 512(* Precondition: [0 <= i < length <= Array.length arr] 513 514 This precondition is typically guaranteed by knowing 515 [0 <= i < length] and calling [check_valid_length length arr].*) 516let[@inline always] unsafe_get arr ~dummy ~i ~length = 517 let v = Array.unsafe_get arr i in 518 if Dummy.is_dummy v dummy 519 then Error.missing_element ~i ~length 520 else Dummy.unsafe_get v 521 522(** {1:dynarrays Dynamic arrays} *) 523 524let create () = 525 let Dummy.Fresh dummy = global_dummy in 526 Pack { 527 length = 0; 528 arr = [| |]; 529 dummy = dummy; 530 } 531 532let make n x = 533 if n < 0 then Error.negative_length_requested "make" n; 534 let Dummy.Fresh dummy = global_dummy in 535 let arr = Dummy.Array.make n x ~dummy in 536 Pack { 537 length = n; 538 arr; 539 dummy; 540 } 541 542let init (type a) n (f : int -> a) : a t = 543 if n < 0 then Error.negative_length_requested "init" n; 544 let Dummy.Fresh dummy = global_dummy in 545 let arr = Dummy.Array.init n f ~dummy in 546 Pack { 547 length = n; 548 arr; 549 dummy; 550 } 551 552let get (type a) (Pack a : a t) i = 553 (* This implementation will propagate an [Invalid_argument] exception 554 from array lookup if the index is out of the backing array, 555 instead of using our own [Error.index_out_of_bounds]. This is 556 allowed by our specification, and more efficient -- no need to 557 check that [length a <= capacity a] in the fast path. *) 558 let v = a.arr.(i) in 559 if Dummy.is_dummy v a.dummy 560 then Error.unexpected_empty_element "get" ~i ~length:a.length 561 else Dummy.unsafe_get v 562 563let set (Pack a) i x = 564 let {arr; length; _} = a in 565 if i >= length then Error.index_out_of_bounds "set" ~i ~length 566 else arr.(i) <- Dummy.of_val x 567 568let length (Pack a) = a.length 569 570let is_empty (Pack a) = (a.length = 0) 571 572let copy (type a) (Pack {length; arr; dummy} : a t) : a t = 573 check_valid_length length arr; 574 (* use [length] as the new capacity to make 575 this an O(length) operation. *) 576 let arr = Dummy.Array.prefix arr length in 577 Pack { length; arr; dummy } 578 579let get_last (Pack a) = 580 let {arr; length; dummy} = a in 581 check_valid_length length arr; 582 (* We know [length <= capacity a]. *) 583 if length = 0 then Error.empty_dynarray "get_last"; 584 (* We know [length > 0]. *) 585 unsafe_get arr ~dummy ~i:(length - 1) ~length 586 587let find_last (Pack a) = 588 let {arr; length; dummy} = a in 589 check_valid_length length arr; 590 (* We know [length <= capacity a]. *) 591 if length = 0 then None 592 else 593 (* We know [length > 0]. *) 594 Some (unsafe_get arr ~dummy ~i:(length - 1) ~length) 595 596(** {1:removing Removing elements} *) 597 598let pop_last (Pack a) = 599 let {arr; length; dummy} = a in 600 check_valid_length length arr; 601 (* We know [length <= capacity a]. *) 602 if length = 0 then raise Not_found; 603 let last = length - 1 in 604 (* We know [length > 0] so [last >= 0]. *) 605 let v = unsafe_get arr ~dummy ~i:last ~length in 606 Array.unsafe_set arr last (Dummy.of_dummy dummy); 607 a.length <- last; 608 v 609 610let pop_last_opt a = 611 match pop_last a with 612 | exception Not_found -> None 613 | x -> Some x 614 615let remove_last (Pack a) = 616 let last = a.length - 1 in 617 if last >= 0 then begin 618 a.length <- last; 619 a.arr.(last) <- Dummy.of_dummy a.dummy; 620 end 621 622let truncate (Pack a) n = 623 if n < 0 then Error.negative_length_requested "truncate" n; 624 let {arr; length; dummy} = a in 625 if length <= n then () 626 else begin 627 a.length <- n; 628 Array.fill arr n (length - n) (Dummy.of_dummy dummy) 629 end 630 631let clear a = truncate a 0 632 633 634(** {1:capacity Backing array and capacity} *) 635 636let capacity (Pack a) = Array.length a.arr 637 638let next_capacity n = 639 let n' = 640 (* For large values of n, we use 1.5 as our growth factor. 641 642 For smaller values of n, we grow more aggressively to avoid 643 reallocating too much when accumulating elements into an empty 644 array. 645 646 The constants "512 words" and "8 words" below are taken from 647 https://github.com/facebook/folly/blob/ 648 c06c0f41d91daf1a6a5f3fc1cd465302ac260459/folly/FBVector.h#L1128-L1157 649 *) 650 if n <= 512 then n * 2 651 else n + n / 2 652 in 653 (* jump directly from 0 to 8 *) 654 min (max 8 n') Sys.max_array_length 655 656let ensure_capacity (Pack a) capacity_request = 657 let arr = a.arr in 658 let cur_capacity = Array.length arr in 659 if capacity_request < 0 then 660 Error.negative_capacity_requested "ensure_capacity" capacity_request 661 else if cur_capacity >= capacity_request then 662 (* This is the fast path, the code up to here must do as little as 663 possible. (This is why we don't use [let {arr; length} = a] as 664 usual, the length is not needed in the fast path.)*) 665 () 666 else begin 667 if capacity_request > Sys.max_array_length then 668 Error.requested_length_out_of_bounds "ensure_capacity" capacity_request; 669 let new_capacity = 670 (* We use either the next exponential-growth strategy, 671 or the requested strategy, whichever is bigger. 672 673 Compared to only using the exponential-growth strategy, this 674 lets us use less memory by avoiding any overshoot whenever 675 the capacity request is noticeably larger than the current 676 capacity. 677 678 Compared to only using the requested capacity, this avoids 679 losing the amortized guarantee: we allocated "exponentially 680 or more", so the amortization holds. In particular, notice 681 that repeated calls to [ensure_capacity a (length a + 1)] 682 will have amortized-linear rather than quadratic complexity. 683 *) 684 max (next_capacity cur_capacity) capacity_request in 685 assert (new_capacity > 0); 686 let new_arr = 687 Dummy.Array.extend arr ~length:a.length ~dummy:a.dummy ~new_capacity in 688 a.arr <- new_arr; 689 (* postcondition: *) 690 assert (0 <= capacity_request); 691 assert (capacity_request <= Array.length new_arr); 692 end 693 694let ensure_extra_capacity a extra_capacity_request = 695 ensure_capacity a (length a + extra_capacity_request) 696 697let fit_capacity (Pack a) = 698 if Array.length a.arr = a.length 699 then () 700 else a.arr <- Dummy.Array.prefix a.arr a.length 701 702let set_capacity (Pack a) n = 703 if n < 0 then 704 Error.negative_capacity_requested "set_capacity" n; 705 let arr = a.arr in 706 let cur_capacity = Array.length arr in 707 if n < cur_capacity then begin 708 a.length <- min a.length n; 709 a.arr <- Dummy.Array.prefix arr n; 710 end 711 else if n > cur_capacity then begin 712 a.arr <- 713 Dummy.Array.extend arr ~length:a.length ~dummy:a.dummy ~new_capacity:n; 714 end 715 716let reset (Pack a) = 717 a.length <- 0; 718 a.arr <- [||] 719 720(** {1:adding Adding elements} *) 721 722(* We chose an implementation of [add_last a x] that behaves correctly 723 in presence of asynchronous / re-entrant code execution around 724 allocations and poll points: if another thread or a callback gets 725 executed on allocation, we add the element at the new end of the 726 dynamic array. 727 728 (We do not give the same guarantees in presence of concurrent 729 parallel updates, which are much more expensive to protect 730 against.) 731*) 732 733(* [add_last_if_room a v] only writes the value if there is room, and 734 returns [false] otherwise. *) 735let[@inline] add_last_if_room (Pack a) v = 736 let {arr; length; _} = a in 737 (* we know [0 <= length] *) 738 if length >= Array.length arr then false 739 else begin 740 (* we know [0 <= length < Array.length arr] *) 741 a.length <- length + 1; 742 Array.unsafe_set arr length (Dummy.of_val v); 743 true 744 end 745 746let add_last a x = 747 if add_last_if_room a x then () 748 else begin 749 (* slow path *) 750 let rec grow_and_add a x = 751 ensure_extra_capacity a 1; 752 if not (add_last_if_room a x) 753 then grow_and_add a x 754 in grow_and_add a x 755 end 756 757let rec append_list a li = 758 match li with 759 | [] -> () 760 | x :: xs -> add_last a x; append_list a xs 761 762let append_iter a iter b = 763 iter (fun x -> add_last a x) b 764 765let append_seq a seq = 766 Seq.iter (fun x -> add_last a x) seq 767 768(* blitting *) 769 770let blit_assume_room 771 (Pack src) src_pos src_length 772 (Pack dst) dst_pos dst_length 773 blit_length 774= 775 (* The caller of [blit_assume_room] typically calls 776 [ensure_capacity] right before. This could run asynchronous 777 code. We want to fail reliably on any asynchronous length change, 778 as it may invalidate the source and target ranges provided by the 779 user. So we double-check that the lengths have not changed. *) 780 let src_arr = src.arr in 781 let dst_arr = dst.arr in 782 check_same_length "blit" (Pack src) ~length:src_length; 783 check_same_length "blit" (Pack dst) ~length:dst_length; 784 if dst_pos + blit_length > dst_length then begin 785 dst.length <- dst_pos + blit_length; 786 end; 787 try 788 (* note: [src] and [dst] may be equal when self-blitting, so 789 [src.length] may have been mutated here. *) 790 Dummy.Array.blit 791 src_arr src.dummy src_pos 792 dst_arr dst.dummy dst_pos 793 ~len:blit_length 794 with Dummy.Array.Dummy_found i -> 795 Error.missing_element ~i ~length:src_length 796 797let blit ~src ~src_pos ~dst ~dst_pos ~len = 798 let src_length = length src in 799 let dst_length = length dst in 800 if len < 0 then 801 Printf.ksprintf invalid_arg 802 "Dynarray.blit: invalid blit length (%d)" 803 len; 804 if src_pos < 0 || src_pos + len > src_length then 805 Printf.ksprintf invalid_arg 806 "Dynarray.blit: invalid source region (%d..%d) \ 807 in source dynarray of length %d" 808 src_pos (src_pos + len) src_length; 809 if dst_pos < 0 || dst_pos > dst_length then 810 Printf.ksprintf invalid_arg 811 "Dynarray.blit: invalid target region (%d..%d) \ 812 in target dynarray of length %d" 813 dst_pos (dst_pos + len) dst_length; 814 ensure_capacity dst (dst_pos + len); 815 blit_assume_room 816 src src_pos src_length 817 dst dst_pos dst_length 818 len 819 820(* append_array: same [..._if_room] and loop logic as [add_last]. *) 821 822let append_array_if_room (Pack a) b = 823 let {arr; length = length_a; _} = a in 824 let length_b = Array.length b in 825 if length_a + length_b > Array.length arr then false 826 else begin 827 (* Note: we intentionally update the length *before* filling the 828 elements. This "reserve before fill" approach provides better 829 behavior than "fill then notify" in presence of reentrant 830 modifications (which may occur on [blit] below): 831 832 - If some code asynchronously adds new elements after this 833 length update, they will go after the space we just reserved, 834 and in particular no addition will be lost. If instead we 835 updated the length after the loop, any asynchronous addition 836 during the loop could be erased or erase one of our additions, 837 silently, without warning the user. 838 839 - If some code asynchronously iterates on the dynarray, or 840 removes elements, or otherwise tries to access the 841 reserved-but-not-yet-filled space, it will get a clean "missing 842 element" error. This is worse than with the fill-then-notify 843 approach where the new elements would only become visible 844 (to iterators, for removal, etc.) altogether at the end of 845 loop. 846 847 To summarise, "reserve before fill" is better on add-add races, 848 and "fill then notify" is better on add-remove or add-iterate 849 races. But the key difference is the failure mode: 850 reserve-before fails on add-remove or add-iterate races with 851 a clean error, while notify-after fails on add-add races with 852 silently disappearing data. *) 853 a.length <- length_a + length_b; 854 Dummy.Array.blit_array b 0 arr length_a ~len:length_b; 855 true 856 end 857 858let append_array a b = 859 if append_array_if_room a b then () 860 else begin 861 (* slow path *) 862 let rec grow_and_append a b = 863 ensure_extra_capacity a (Array.length b); 864 if not (append_array_if_room a b) 865 then grow_and_append a b 866 in grow_and_append a b end 867 868(* append: same [..._if_room] and loop logic as [add_last]. *) 869 870(* It is a programming error to mutate the length of [b] during a call 871 to [append a b]. To detect this mistake we keep track of the length 872 of [b] throughout the computation and check it that does not 873 change. 874*) 875let append_if_room (Pack a) b ~length_b = 876 let {arr = arr_a; length = length_a; _} = a in 877 if length_a + length_b > Array.length arr_a then false 878 else begin 879 (* blit [0..length_b-1] 880 into [length_a..length_a+length_b-1]. *) 881 blit_assume_room 882 b 0 length_b 883 (Pack a) length_a length_a 884 length_b; 885 check_same_length "append" b ~length:length_b; 886 true 887 end 888 889let append a b = 890 let length_b = length b in 891 if append_if_room a b ~length_b then () 892 else begin 893 (* slow path *) 894 let rec grow_and_append a b ~length_b = 895 ensure_extra_capacity a length_b; 896 (* Eliding the [check_same_length] call below would be wrong in 897 the case where [a] and [b] are aliases of each other, we 898 would get into an infinite loop instead of failing. 899 900 We could push the call to [append_if_room] itself, but we 901 prefer to keep it in the slow path. *) 902 check_same_length "append" b ~length:length_b; 903 if not (append_if_room a b ~length_b) 904 then grow_and_append a b ~length_b 905 in grow_and_append a b ~length_b 906 end 907 908 909 910(** {1:iteration Iteration} *) 911 912(* The implementation choice that we made for iterators is the one 913 that maximizes efficiency by avoiding repeated bound checking: we 914 check the length of the dynamic array once at the beginning, and 915 then only operate on that portion of the dynarray, ignoring 916 elements added in the meantime. 917 918 The specification states that it is a programming error to mutate 919 the length of the array during iteration. We check for this and 920 raise an error on size change. 921 Note that we may still miss some transient state changes that cancel 922 each other and leave the length unchanged at the next check. 923*) 924 925let iter_ fname f a = 926 let Pack {arr; length; dummy} = a in 927 (* [check_valid_length length arr] is used for memory safety, it 928 guarantees that the backing array has capacity at least [length], 929 allowing unsafe array access. 930 931 [check_same_length] is used for correctness, it lets the function 932 fail more often if we discover the programming error of mutating 933 the length during iteration. 934 935 We could, naively, call [check_same_length] at each iteration of 936 the loop (before or after, or both). However, notice that this is 937 not necessary to detect the removal of elements from [a]: if 938 elements have been removed by the time the [for] loop reaches 939 them, then [unsafe_get] will itself fail with an [Invalid_argument] 940 exception. We only need to detect the addition of new elements to 941 [a] during iteration, and for this it is enough to call 942 [check_same_length] once at the end. 943 944 Calling [check_same_length] more often could catch more 945 programming errors, but the only errors that we miss with this 946 optimization are those that keep the array size constant -- 947 additions and deletions that cancel each other. We consider this 948 an acceptable tradeoff. 949 *) 950 check_valid_length length arr; 951 for i = 0 to length - 1 do 952 f (unsafe_get arr ~dummy ~i ~length); 953 done; 954 check_same_length fname a ~length 955 956let iter f a = 957 iter_ "iter" f a 958 959let iteri f a = 960 let Pack {arr; length; dummy} = a in 961 check_valid_length length arr; 962 for i = 0 to length - 1 do 963 f i (unsafe_get arr ~dummy ~i ~length); 964 done; 965 check_same_length "iteri" a ~length 966 967let rev_iter f a = 968 let Pack {arr; length; dummy} = a in 969 check_valid_length length arr; 970 for i = length - 1 downto 0 do 971 f (unsafe_get arr ~dummy ~i ~length); 972 done; 973 check_same_length "rev_iter" a ~length 974 975let rev_iteri f a = 976 let Pack {arr; length; dummy} = a in 977 check_valid_length length arr; 978 for i = length - 1 downto 0 do 979 f i (unsafe_get arr ~dummy ~i ~length); 980 done; 981 check_same_length "rev_iteri" a ~length 982 983let map f a = 984 let Pack {arr = arr_in; length; dummy} = a in 985 check_valid_length length arr_in; 986 let arr_out = Array.make length (Dummy.of_dummy dummy) in 987 for i = 0 to length - 1 do 988 Array.unsafe_set arr_out i 989 (Dummy.of_val (f (unsafe_get arr_in ~dummy ~i ~length))) 990 done; 991 let res = Pack { 992 length; 993 arr = arr_out; 994 dummy; 995 } in 996 check_same_length "map" a ~length; 997 res 998 999let mapi f a = 1000 let Pack {arr = arr_in; length; dummy} = a in 1001 check_valid_length length arr_in; 1002 let arr_out = Array.make length (Dummy.of_dummy dummy) in 1003 for i = 0 to length - 1 do 1004 Array.unsafe_set arr_out i 1005 (Dummy.of_val (f i (unsafe_get arr_in ~dummy ~i ~length))) 1006 done; 1007 let res = Pack { 1008 length; 1009 arr = arr_out; 1010 dummy; 1011 } in 1012 check_same_length "mapi" a ~length; 1013 res 1014 1015let fold_left f acc a = 1016 let Pack {arr; length; dummy} = a in 1017 check_valid_length length arr; 1018 let r = ref acc in 1019 for i = 0 to length - 1 do 1020 let v = unsafe_get arr ~dummy ~i ~length in 1021 r := f !r v; 1022 done; 1023 check_same_length "fold_left" a ~length; 1024 !r 1025 1026let fold_right f a acc = 1027 let Pack {arr; length; dummy} = a in 1028 check_valid_length length arr; 1029 let r = ref acc in 1030 for i = length - 1 downto 0 do 1031 let v = unsafe_get arr ~dummy ~i ~length in 1032 r := f v !r; 1033 done; 1034 check_same_length "fold_right" a ~length; 1035 !r 1036 1037let exists p a = 1038 let Pack {arr; length; dummy} = a in 1039 check_valid_length length arr; 1040 let rec loop p arr dummy i length = 1041 if i = length then false 1042 else 1043 p (unsafe_get arr ~dummy ~i ~length) 1044 || loop p arr dummy (i + 1) length 1045 in 1046 let res = loop p arr dummy 0 length in 1047 check_same_length "exists" a ~length; 1048 res 1049 1050let for_all p a = 1051 let Pack {arr; length; dummy} = a in 1052 check_valid_length length arr; 1053 let rec loop p arr dummy i length = 1054 if i = length then true 1055 else 1056 p (unsafe_get arr ~dummy ~i ~length) 1057 && loop p arr dummy (i + 1) length 1058 in 1059 let res = loop p arr dummy 0 length in 1060 check_same_length "for_all" a ~length; 1061 res 1062 1063let exists2 p a1 a2 = 1064 let Pack {arr = arr1; length = length1; dummy = dummy1} = a1 in 1065 let Pack {arr = arr2; length = length2; dummy = dummy2} = a2 in 1066 check_valid_length length1 arr1; 1067 check_valid_length length2 arr2; 1068 if length1 <> length2 then 1069 Error.different_lengths "exists2" ~length1 ~length2; 1070 let rec loop p arr1 dummy1 arr2 dummy2 i length = 1071 if i = length then false 1072 else 1073 p (unsafe_get arr1 ~dummy:dummy1 ~i ~length) 1074 (unsafe_get arr2 ~dummy:dummy2 ~i ~length) 1075 || loop p arr1 dummy1 arr2 dummy2 (i + 1) length 1076 in 1077 let res = loop p arr1 dummy1 arr2 dummy2 0 length1 in 1078 check_same_length "exists2" a1 ~length:length1; 1079 check_same_length "exists2" a2 ~length:length2; 1080 res 1081 1082let for_all2 p a1 a2 = 1083 let Pack {arr = arr1; length = length1; dummy = dummy1} = a1 in 1084 let Pack {arr = arr2; length = length2; dummy = dummy2} = a2 in 1085 check_valid_length length1 arr1; 1086 check_valid_length length2 arr2; 1087 if length1 <> length2 then 1088 Error.different_lengths "for_all2" ~length1 ~length2; 1089 let rec loop p arr1 dummy1 arr2 dummy2 i length = 1090 if i = length then true 1091 else 1092 p (unsafe_get arr1 ~dummy:dummy1 ~i ~length) 1093 (unsafe_get arr2 ~dummy:dummy2 ~i ~length) 1094 && loop p arr1 dummy1 arr2 dummy2 (i + 1) length 1095 in 1096 let res = loop p arr1 dummy1 arr2 dummy2 0 length1 in 1097 check_same_length "for_all2" a1 ~length:length1; 1098 check_same_length "for_all2" a2 ~length:length2; 1099 res 1100 1101let filter f a = 1102 let b = create () in 1103 iter_ "filter" (fun x -> if f x then add_last b x) a; 1104 b 1105 1106let filter_map f a = 1107 let b = create () in 1108 iter_ "filter_map" (fun x -> 1109 match f x with 1110 | None -> () 1111 | Some y -> add_last b y 1112 ) a; 1113 b 1114 1115let mem x a = 1116 let Pack {arr; length; dummy} = a in 1117 check_valid_length length arr; 1118 let rec loop i = 1119 if i = length then false 1120 else if Stdlib.compare (unsafe_get arr ~dummy ~i ~length) x = 0 then 1121 true 1122 else loop (succ i) 1123 in 1124 let res = loop 0 in 1125 check_same_length "mem" a ~length; 1126 res 1127 1128let memq x a = 1129 let Pack {arr; length; dummy} = a in 1130 check_valid_length length arr; 1131 let rec loop i = 1132 if i = length then false 1133 else if (unsafe_get arr ~dummy ~i ~length) == x then 1134 true 1135 else loop (succ i) 1136 in 1137 let res = loop 0 in 1138 check_same_length "memq" a ~length; 1139 res 1140 1141let find_opt p a = 1142 let Pack {arr; length; dummy} = a in 1143 check_valid_length length arr; 1144 let rec loop i = 1145 if i = length then None 1146 else 1147 let x = unsafe_get arr ~dummy ~i ~length in 1148 if p x then Some x 1149 else loop (succ i) 1150 in 1151 let res = loop 0 in 1152 check_same_length "find_opt" a ~length; 1153 res 1154 1155let find_index p a = 1156 let Pack {arr; length; dummy} = a in 1157 check_valid_length length arr; 1158 let rec loop i = 1159 if i = length then None 1160 else 1161 let x = unsafe_get arr ~dummy ~i ~length in 1162 if p x then Some i 1163 else loop (succ i) 1164 in 1165 let res = loop 0 in 1166 check_same_length "find_index" a ~length; 1167 res 1168 1169let find_map p a = 1170 let Pack {arr; length; dummy} = a in 1171 check_valid_length length arr; 1172 let rec loop i = 1173 if i = length then None 1174 else 1175 match p (unsafe_get arr ~dummy ~i ~length) with 1176 | None -> loop (succ i) 1177 | Some _ as r -> r 1178 in 1179 let res = loop 0 in 1180 check_same_length "find_map" a ~length; 1181 res 1182 1183let find_mapi p a = 1184 let Pack {arr; length; dummy} = a in 1185 check_valid_length length arr; 1186 let rec loop i = 1187 if i = length then None 1188 else 1189 match p i (unsafe_get arr ~dummy ~i ~length) with 1190 | None -> loop (succ i) 1191 | Some _ as r -> r 1192 in 1193 let res = loop 0 in 1194 check_same_length "find_mapi" a ~length; 1195 res 1196 1197let equal eq a1 a2 = 1198 let Pack {arr = arr1; length = length; dummy = dum1} = a1 in 1199 let Pack {arr = arr2; length = len2; dummy = dum2} = a2 in 1200 if length <> len2 then false 1201 else begin 1202 check_valid_length length arr1; 1203 check_valid_length length arr2; 1204 let rec loop i = 1205 if i = length then true 1206 else 1207 eq 1208 (unsafe_get arr1 ~dummy:dum1 ~i ~length) 1209 (unsafe_get arr2 ~dummy:dum2 ~i ~length) 1210 && loop (i + 1) 1211 in 1212 let r = loop 0 in 1213 check_same_length "equal" a1 ~length; 1214 check_same_length "equal" a2 ~length; 1215 r 1216 end 1217 1218let compare cmp a1 a2 = 1219 let Pack {arr = arr1; length = length; dummy = dum1} = a1 in 1220 let Pack {arr = arr2; length = len2; dummy = dum2} = a2 in 1221 if length <> len2 then length - len2 1222 else begin 1223 check_valid_length length arr1; 1224 check_valid_length length arr2; 1225 let rec loop i = 1226 if i = length then 0 1227 else 1228 let c = 1229 cmp 1230 (unsafe_get arr1 ~dummy:dum1 ~i ~length) 1231 (unsafe_get arr2 ~dummy:dum2 ~i ~length) 1232 in 1233 if c <> 0 then c 1234 else loop (i + 1) 1235 in 1236 let r = loop 0 in 1237 check_same_length "compare" a1 ~length; 1238 check_same_length "compare" a2 ~length; 1239 r 1240 end 1241 1242(** {1:conversions Conversions to other data structures} *) 1243 1244(* The eager [to_*] conversion functions behave similarly to iterators 1245 in presence of updates during computation. The [*_reentrant] 1246 functions obey their more permissive specification, which tolerates 1247 any concurrent update. *) 1248 1249let of_array a = 1250 let length = Array.length a in 1251 let Dummy.Fresh dummy = global_dummy in 1252 let arr = Dummy.Array.copy_from_array a ~dummy in 1253 Pack { 1254 length; 1255 arr; 1256 dummy; 1257 } 1258 1259let to_array a = 1260 let Pack {arr; length; dummy} = a in 1261 check_valid_length length arr; 1262 let res = Array.init length (fun i -> 1263 unsafe_get arr ~dummy ~i ~length 1264 ) in 1265 check_same_length "to_array" a ~length; 1266 res 1267 1268let of_list li = 1269 let a = Array.of_list li in 1270 let length = Array.length a in 1271 let Dummy.Fresh dummy = global_dummy in 1272 let arr = Dummy.Array.unsafe_nocopy_from_array a ~dummy in 1273 Pack { 1274 length; 1275 arr; 1276 dummy; 1277 } 1278 1279let to_list a = 1280 let Pack {arr; length; dummy} = a in 1281 check_valid_length length arr; 1282 let l = ref [] in 1283 for i = length - 1 downto 0 do 1284 l := unsafe_get arr ~dummy ~i ~length :: !l 1285 done; 1286 check_same_length "to_list" a ~length; 1287 !l 1288 1289let of_seq seq = 1290 let init = create() in 1291 append_seq init seq; 1292 init 1293 1294let to_seq a = 1295 let Pack {arr; length; dummy} = a in 1296 check_valid_length length arr; 1297 let rec aux i = fun () -> 1298 check_same_length "to_seq" a ~length; 1299 if i >= length then Seq.Nil 1300 else begin 1301 let v = unsafe_get arr ~dummy ~i ~length in 1302 Seq.Cons (v, aux (i + 1)) 1303 end 1304 in 1305 aux 0 1306 1307let to_seq_reentrant a = 1308 let rec aux i = fun () -> 1309 if i >= length a then Seq.Nil 1310 else begin 1311 let v = get a i in 1312 Seq.Cons (v, aux (i + 1)) 1313 end 1314 in 1315 aux 0 1316 1317let to_seq_rev a = 1318 let Pack {arr; length; dummy} = a in 1319 check_valid_length length arr; 1320 let rec aux i = fun () -> 1321 check_same_length "to_seq_rev" a ~length; 1322 if i < 0 then Seq.Nil 1323 else begin 1324 let v = unsafe_get arr ~dummy ~i ~length in 1325 Seq.Cons (v, aux (i - 1)) 1326 end 1327 in 1328 aux (length - 1) 1329 1330let to_seq_rev_reentrant a = 1331 let rec aux i = fun () -> 1332 if i < 0 then Seq.Nil 1333 else if i >= length a then 1334 (* If some elements have been removed in the meantime, we skip 1335 those elements and continue with the new end of the array. *) 1336 aux (length a - 1) () 1337 else begin 1338 let v = get a i in 1339 Seq.Cons (v, aux (i - 1)) 1340 end 1341 in 1342 aux (length a - 1) 1343 1344external unsafe_iarray_of_array : 'a array -> 'a iarray = "%opaque" 1345 1346let unsafe_to_iarray ~capacity (f : 'a t -> unit) = 1347 let a = create () in 1348 set_capacity a capacity; 1349 f a; 1350 let Pack {arr; length; dummy} = a in 1351 reset a; 1352 (* At this point further updates to [a] (from this domain) will not mutate 1353 [arr]. *) 1354 let capacity = Array.length arr in 1355 check_valid_length length arr; 1356 let values : ('a, _) Dummy.with_dummy array = 1357 if length = capacity then 1358 arr 1359 else (* length < capacity: make a copy *) 1360 Dummy.Array.prefix arr length 1361 in 1362 let values : 'a array = 1363 try Dummy.Array.unsafe_nocopy_to_array values ~dummy 1364 with Dummy.Array.Dummy_found i -> Error.missing_element ~i ~length 1365 in 1366 unsafe_iarray_of_array values