A batteries included HTTP/1.1 client in OCaml
at main 479 lines 18 kB view raw view rendered
1# HTTP/2 Connection Pooling Integration Plan 2 3## Problem Statement 4 5The current architecture has two separate connection management systems that conflict: 6 71. **Conpool**: Manages TCP/TLS connections with one-connection-per-request semantics 82. **H2_adapter**: Has its own hashtable cache for HTTP/2 client state 9 10This creates several issues: 11 12### Issue 1: Duplicate Connection Management 13 14``` 15Current Flow: 16 17Requests.make_request 18 19 ├── Conpool.connection_with_info ──────┐ 20 │ (gets TCP/TLS connection) │ 21 │ │ 22 │ ┌─────────────────────────────────┐ │ 23 │ │ If ALPN = "h2": │ │ 24 │ │ H2_adapter.request │◄┘ 25 │ │ │ │ 26 │ │ ├── Own hashtable cache │ ← Duplicates Conpool! 27 │ │ └── get_or_create_client │ 28 │ └─────────────────────────────────┘ 29 30 └── When switch closes → Connection returned to Conpool 31 BUT H2_adapter cache still references it! 32``` 33 34### Issue 2: No True Multiplexing 35 36HTTP/2's key advantage is stream multiplexing - multiple concurrent requests on one connection. Currently: 37 38- Each `Requests.get/post/...` call gets a fresh connection from Conpool 39- Even though HTTP/2 could handle multiple streams on one connection 40- We pay full connection setup cost per request 41 42### Issue 3: Race Conditions 43 44- `H2_adapter` uses `Mutex.t` (Unix/pthreads mutex) 45- This blocks the entire OS thread in Eio, breaking cooperative scheduling 46- Should use `Eio.Mutex.t` for proper Eio integration 47 48### Issue 4: Connection Lifecycle Mismatch 49 50- Conpool expects: get connection → use → release back to pool 51- HTTP/2 expects: establish connection → keep open → multiplex many requests → eventually close 52- These models are fundamentally different 53 54--- 55 56## Proposed Solution: Protocol-Aware Connection Abstraction 57 58### Design Goals 59 601. **Unified API**: Users don't need to know if they're using HTTP/1.1 or HTTP/2 612. **True HTTP/2 Multiplexing**: Multiple concurrent requests share one connection 623. **Eio-Native**: Use Eio concurrency primitives throughout 634. **Backward Compatible**: Existing Requests API unchanged 645. **Efficient Resource Use**: Minimize connection count for HTTP/2 65 66### Architecture Overview 67 68``` 69 Requests Session 70 71 72 ┌────────────────────┐ 73 │ Protocol Router │ 74 │ (chooses handler) │ 75 └─────────┬──────────┘ 76 77 ┌───────────────┴───────────────┐ 78 │ │ 79 ▼ ▼ 80 ┌─────────────────────┐ ┌─────────────────────┐ 81 │ HTTP/1.1 Handler │ │ HTTP/2 Handler │ 82 │ │ │ │ 83 │ Uses Conpool as-is │ │ H2_connection_pool │ 84 │ (1 conn = 1 req) │ │ (1 conn = N reqs) │ 85 └─────────────────────┘ └─────────────────────┘ 86 │ │ 87 ▼ ▼ 88 ┌──────────┐ ┌────────────────┐ 89 │ Conpool │ │ H2_connection │ 90 │ (TCP/TLS)│ │ (multiplexed) │ 91 └──────────┘ └────────────────┘ 92``` 93 94### Key Components 95 96#### 1. `H2_connection_pool` - HTTP/2 Connection Manager 97 98A new module that manages HTTP/2 connections with proper multiplexing: 99 100```ocaml 101(** HTTP/2 Connection Pool. 102 103 Unlike Conpool which manages one-request-per-connection for HTTP/1.1, 104 this module manages long-lived HTTP/2 connections with stream multiplexing. 105 106 Each endpoint (host:port) has at most one HTTP/2 connection with multiple 107 streams. When MAX_CONCURRENT_STREAMS is reached, requests queue until 108 a stream slot becomes available. *) 109 110type t 111 112val create : sw:Eio.Switch.t -> clock:_ Eio.Time.clock -> unit -> t 113(** Create a new HTTP/2 connection pool. *) 114 115type endpoint = { 116 host : string; 117 port : int; 118} 119 120val request : 121 t -> 122 endpoint:endpoint -> 123 establish:(unit -> Eio.Flow.two_way_ty Eio.Resource.t) -> 124 meth:Method.t -> 125 uri:Uri.t -> 126 headers:Headers.t -> 127 body:Body.t -> 128 Response.t 129(** Make an HTTP/2 request, multiplexing on existing connection if available. 130 131 @param establish Function to create new TCP/TLS connection if needed. 132 This is called at most once per endpoint. 133 @raise Error.H2_protocol_error on protocol errors *) 134``` 135 136#### 2. `H2_multiplexed_connection` - Per-Endpoint Connection State 137 138```ocaml 139(** A single multiplexed HTTP/2 connection to an endpoint. *) 140type t = { 141 flow : Eio.Flow.two_way_ty Eio.Resource.t; 142 client : H2_client.t; 143 mutex : Eio.Mutex.t; (* Eio mutex for stream allocation *) 144 mutable active_streams : int; 145 max_concurrent_streams : int; (* From SETTINGS *) 146 stream_available : Eio.Condition.t; (* Signal when stream frees *) 147 mutable closed : bool; 148} 149 150val acquire_stream : t -> unit 151(** Block until a stream slot is available, then reserve it. *) 152 153val release_stream : t -> unit 154(** Release a stream slot, signaling waiters. *) 155 156val request : t -> H2_protocol.request -> H2_protocol.response 157(** Make a request on this connection. *) 158``` 159 160#### 3. Protocol Router Integration 161 162Modify `Requests.make_request` to route based on cached protocol knowledge: 163 164```ocaml 165(* In requests.ml *) 166 167type protocol_hint = 168 | Unknown (* First request to this endpoint *) 169 | Definitely_h1 (* Server doesn't support H2 *) 170 | Definitely_h2 (* ALPN negotiated H2 *) 171 172(* Protocol hint cache per endpoint *) 173let protocol_hints : (string, protocol_hint) Hashtbl.t = Hashtbl.create 64 174 175let make_request_internal ... = 176 let endpoint_key = Printf.sprintf "%s:%d" host port in 177 178 match Hashtbl.find_opt protocol_hints endpoint_key with 179 | Some Definitely_h2 -> 180 (* Use HTTP/2 pool directly - no need for ALPN *) 181 H2_connection_pool.request h2_pool ~endpoint ... 182 183 | Some Definitely_h1 -> 184 (* Use HTTP/1.1 via Conpool *) 185 Conpool.with_connection http_pool endpoint (fun flow -> 186 Http_client.make_request ...) 187 188 | None | Some Unknown -> 189 (* First request - use Conpool to discover protocol via ALPN *) 190 let conn_info = Conpool.connection_with_info ~sw pool endpoint in 191 match conn_info.tls_epoch with 192 | Some { alpn_protocol = Some "h2"; _ } -> 193 Hashtbl.replace protocol_hints endpoint_key Definitely_h2; 194 (* Hand off connection to H2 pool, make request *) 195 H2_connection_pool.adopt_connection h2_pool ~endpoint conn_info.flow; 196 H2_connection_pool.request h2_pool ~endpoint ... 197 | _ -> 198 Hashtbl.replace protocol_hints endpoint_key Definitely_h1; 199 Http_client.make_request ... conn_info.flow 200``` 201 202--- 203 204## Implementation Phases 205 206### Phase 1: Fix Immediate Issues (Low Risk) 207 208**Goal**: Fix race conditions and improve current implementation without architectural changes. 209 2101. **Replace Unix.Mutex with Eio.Mutex in H2_adapter** 211 - File: `lib/h2/h2_adapter.ml` 212 - Change `Mutex.t` to `Eio.Mutex.t` 213 - Update `with_mutex` to use `Eio.Mutex.use_rw` 214 2152. **Add connection validity checks** 216 - Before reusing cached H2_client, verify the underlying flow is still open 217 - Handle GOAWAY frames properly 218 2193. **Add tests for concurrent HTTP/2 requests** 220 - Test multiple fibers making requests to same endpoint 221 - Verify no race conditions 222 223**Estimated scope**: ~50 lines changed, low risk 224 225### Phase 2: H2_connection_pool Module (Medium Risk) 226 227**Goal**: Create proper HTTP/2 connection pooling with multiplexing. 228 2291. **Create `lib/h2/h2_connection_pool.ml[i]`** 230 - Connection state per endpoint 231 - Stream slot management with Eio.Condition 232 - Automatic connection establishment on first request 233 - Connection health monitoring 234 2352. **Create `lib/h2/h2_multiplexed_connection.ml[i]`** 236 - Wrap H2_client with stream counting 237 - MAX_CONCURRENT_STREAMS enforcement 238 - Proper cleanup on connection close 239 2403. **Add comprehensive tests** 241 - Multiplexing: N concurrent requests on 1 connection 242 - Stream exhaustion: >MAX_CONCURRENT_STREAMS requests 243 - Connection failure: mid-request disconnection 244 - GOAWAY handling 245 246**Estimated scope**: ~400 lines new code, 2 new modules 247 248### Phase 3: Protocol Router Integration (Medium Risk) 249 250**Goal**: Integrate H2_connection_pool with Requests session. 251 2521. **Add H2_connection_pool to Requests.t** 253 ```ocaml 254 type t = T : { 255 ... 256 h2_pool : H2_connection_pool.t; (* NEW *) 257 protocol_hints : (string, protocol_hint) Hashtbl.t; (* NEW *) 258 } -> t 259 ``` 260 2612. **Modify connection routing in make_request** 262 - Check protocol hints before connecting 263 - Update hints based on ALPN results 264 - Route HTTP/2 to H2_connection_pool 265 2663. **Handle protocol downgrade** 267 - If server sends HTTP_1_1_REQUIRED error, update hint 268 - Retry request via HTTP/1.1 269 2704. **Update statistics tracking** 271 - Add HTTP/2 connection/stream stats 272 - Expose via `Requests.stats` 273 274**Estimated scope**: ~200 lines changed in requests.ml 275 276### Phase 4: Connection Handoff (Higher Risk) 277 278**Goal**: Seamlessly transfer connections from Conpool to H2_pool. 279 2801. **Add connection "adoption" to H2_connection_pool** 281 - Accept an already-established TLS flow 282 - Perform H2 handshake 283 - Add to pool for future reuse 284 2852. **Prevent Conpool from reclaiming H2 connections** 286 - When connection is handed to H2_pool, don't return it to Conpool 287 - This requires careful lifetime management 288 2893. **Handle edge cases** 290 - Connection fails during adoption 291 - Server rejects H2 after ALPN (rare but possible) 292 - TLS session resumption 293 294**Estimated scope**: ~150 lines, careful lifetime management needed 295 296### Phase 5: Optimizations (Lower Priority) 297 2981. **Preemptive connection establishment** 299 - For known H2 endpoints, establish connection before first request 300 - Reduces latency for subsequent requests 301 3022. **Connection warming** 303 - Maintain minimum connections to frequently-used endpoints 304 - Background PING to keep connections alive 305 3063. **Load balancing across connections** 307 - For very high throughput, allow multiple H2 connections per endpoint 308 - Distribute streams across connections 309 310--- 311 312## Data Structures 313 314### H2_connection_pool State 315 316``` 317┌─────────────────────────────────────────────────────────────┐ 318│ H2_connection_pool.t │ 319├─────────────────────────────────────────────────────────────┤ 320│ connections: (endpoint, H2_multiplexed_connection.t) Hashtbl│ 321│ mutex: Eio.Mutex.t │ 322│ sw: Eio.Switch.t │ 323│ clock: Eio.Time.clock │ 324└─────────────────────────────────────────────────────────────┘ 325326 │ per endpoint 327328┌─────────────────────────────────────────────────────────────┐ 329│ H2_multiplexed_connection.t │ 330├─────────────────────────────────────────────────────────────┤ 331│ flow: Eio.Flow.two_way │ 332│ client: H2_client.t │ 333│ hpack_encoder: H2_hpack.Encoder.t │ 334│ hpack_decoder: H2_hpack.Decoder.t │ 335│ active_streams: int (mutable) │ 336│ max_concurrent_streams: int (from SETTINGS) │ 337│ stream_available: Eio.Condition.t │ 338│ closed: bool (mutable) │ 339│ reader_fiber: unit Eio.Fiber.t (reads frames) │ 340│ last_stream_id: int32 (for GOAWAY) │ 341└─────────────────────────────────────────────────────────────┘ 342``` 343 344### Request Flow with Multiplexing 345 346``` 347Fiber A: GET /users ───┐ 348349Fiber B: GET /posts ───┼───► H2_connection_pool.request 350 │ │ 351Fiber C: GET /comments ───┘ │ 352353 ┌─────────────────────┐ 354 │ acquire_stream() │ 355 │ (blocks if at max) │ 356 └─────────┬───────────┘ 357358 ┌───────────────────┼───────────────────┐ 359 │ │ │ 360 ▼ ▼ ▼ 361 Stream 1 Stream 3 Stream 5 362 (Fiber A) (Fiber B) (Fiber C) 363 │ │ │ 364 └───────────────────┼───────────────────┘ 365366367 ┌─────────────────────┐ 368 │ Single TCP/TLS conn │ 369 │ to example.com:443 │ 370 └─────────────────────┘ 371``` 372 373--- 374 375## API Changes 376 377### New Types in Requests 378 379```ocaml 380(** HTTP/2 connection statistics *) 381type h2_stats = { 382 connections : int; (** Active HTTP/2 connections *) 383 total_streams : int; (** Total streams opened *) 384 active_streams : int; (** Currently active streams *) 385 max_concurrent : int; (** Max concurrent streams (from SETTINGS) *) 386} 387 388(** Extended session statistics *) 389type stats = { 390 (* existing fields *) 391 requests_made : int; 392 total_time : float; 393 retries_count : int; 394 (* new HTTP/2 fields *) 395 h2_stats : h2_stats option; (** HTTP/2 statistics, if any H2 connections *) 396} 397``` 398 399### Backward Compatibility 400 401- All existing `Requests.*` functions unchanged 402- Same API for HTTP/1.1 and HTTP/2 (protocol transparent) 403- New `h2_stats` field is optional, existing code ignores it 404 405--- 406 407## Testing Strategy 408 409### Unit Tests 410 4111. **H2_connection_pool tests** 412 - Create pool, make request, verify stream counting 413 - MAX_CONCURRENT_STREAMS enforcement 414 - Connection reuse verification 415 4162. **Stream slot management** 417 - Concurrent requests within limit 418 - Blocking when at limit 419 - Proper cleanup on request completion 420 421### Integration Tests 422 4231. **Protocol selection** 424 - ALPN negotiation (mock TLS) 425 - Protocol hint caching 426 - Fallback to HTTP/1.1 427 4282. **Concurrent requests** 429 - N fibers making requests to same H2 endpoint 430 - Verify single connection used 431 - Verify all requests complete 432 4333. **Error handling** 434 - Connection drops mid-stream 435 - GOAWAY during request 436 - Server-side RST_STREAM 437 438### Stress Tests 439 4401. **High concurrency** 441 - 100 concurrent requests to same endpoint 442 - Verify multiplexing works 443 - Measure latency vs HTTP/1.1 444 4452. **Connection churn** 446 - Repeated connect/disconnect cycles 447 - No resource leaks 448 449--- 450 451## Risks and Mitigations 452 453| Risk | Likelihood | Impact | Mitigation | 454|------|------------|--------|------------| 455| Stream deadlock | Medium | High | Timeout on stream acquisition, tests | 456| Memory leak | Medium | Medium | Careful resource cleanup, Eio.Switch | 457| Race conditions | Medium | High | Eio.Mutex throughout, no Unix.Mutex | 458| Performance regression | Low | Medium | Benchmark before/after | 459| Compatibility issues | Low | Medium | Extensive testing with real servers | 460 461--- 462 463## Success Criteria 464 4651. **Correctness**: All existing tests pass 4662. **Multiplexing**: Concurrent H2 requests share single connection 4673. **Performance**: H2 requests to same host faster than H1.1 4684. **Resource efficiency**: Connection count reduced for H2 hosts 4695. **No regressions**: HTTP/1.1 behavior unchanged 470 471--- 472 473## References 474 475- RFC 9113 Section 5.1.2 (Stream Concurrency) 476- RFC 9113 Section 6.5.2 (SETTINGS_MAX_CONCURRENT_STREAMS) 477- RFC 9113 Section 6.8 (GOAWAY) 478- Eio documentation (structured concurrency, Mutex, Condition) 479- Current H2_client implementation