A batteries included HTTP/1.1 client in OCaml
at main 391 lines 15 kB view raw view rendered
1# HTTP RFC Specification Compliance TODO 2 3This document tracks RFC compliance issues identified in the ocaml-requests library. 4Generated from comprehensive analysis against RFC 9110, 9111, 9112, 7235, 7617, 7616, 6750, 6265, 3986, 2818, and 8446. 5 6## Current Compliance Summary 7 8| RFC | Area | Compliance | Notes | 9|-----|------|------------|-------| 10| RFC 9110 | HTTP Semantics | 95%+ | Excellent - all methods, status codes, headers | 11| RFC 9112 | HTTP/1.1 Syntax | 90%+ | Excellent - security validation complete | 12| RFC 9111 | HTTP Caching | 85%+ | Good - full age calc, heuristic freshness, Vary | 13| RFC 7617/6750/7616 | Authentication | 90%+ | Excellent - userhash, auth-int, bearer form | 14| RFC 6265 | Cookies | 70-80% | Good - delegated to Cookeio | 15| RFC 3986 | URI | 95%+ | Excellent - inlined with full parsing | 16 17--- 18 19## Section 1: URI Library Inlining ✓ COMPLETE 20 21**Goal:** Inline the third_party/uri library into requests, replacing Angstrom-based parsing with Eio.Buf_read combinators for consistency with the HTTP parsing stack. 22 23**Current Status:** IMPLEMENTED in `lib/uri.ml` and `lib/uri.mli` 24 25The URI library has been fully inlined with string-based parsing (no Angstrom dependency): 26- [x] All URI parsers implemented (scheme, authority, path, query, fragment) 27- [x] IPv4 and IPv6 address parsing 28- [x] Percent encoding/decoding with component-specific character sets 29- [x] Path normalization and dot segment removal 30- [x] Reference resolution per RFC 3986 Section 5.2 31- [x] Scheme-specific normalization (HTTP/HTTPS defaults) 32- [x] Host case normalization (lowercase) 33 34--- 35 36## Section 2: P0 - Security Critical ✓ COMPLETE 37 38### 2.1 Bare CR Validation (RFC 9112) ✓ 39 40**RFC Reference:** RFC 9112 Section 2.2 41 42> "A recipient that receives whitespace between the start-line and the first header field MUST either reject the message as invalid or..." 43> "bare CR must be rejected" 44 45**Current Status:** IMPLEMENTED in `lib/http_read.ml` 46 47The `validate_no_bare_cr` function validates all relevant areas: 48- [x] Add bare CR validation in `status_line` parsing 49- [x] Add bare CR validation in `header_line` parsing 50- [x] Add bare CR validation in chunked extension parsing 51 52### 2.2 Chunk Size Overflow Protection ✓ 53 54**RFC Reference:** RFC 9112 Section 7.1 55 56**Current Status:** IMPLEMENTED in `lib/http_read.ml` 57 58The `chunk_size` function limits hex digits to 16 (`max_chunk_size_hex_digits`): 59- [x] Add length check before parsing chunk size 60- [x] Protection in both `chunk_size` and `Chunked_body_source.read_chunk_size` 61 62### 2.3 Request Smuggling Prevention ✓ 63 64**RFC Reference:** RFC 9112 Section 6.3 65 66> "If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length." 67 68**Current Status:** IMPLEMENTED in `lib/http_read.ml` 69 70- [x] Transfer-Encoding takes precedence over Content-Length 71- [x] Add explicit logging/warning when both present 72- [ ] Consider rejecting requests with conflicting headers in strict mode 73 74--- 75 76## Section 3: P1 - High Priority ✓ COMPLETE 77 78### 3.1 Content-Length Validation ✓ 79 80**RFC Reference:** RFC 9110 Section 8.6 81 82> "Any Content-Length field value greater than or equal to zero is valid." 83 84**Current Status:** IMPLEMENTED in `lib/http_read.ml` 85 86- [x] Reject negative Content-Length values explicitly (in `parse_content_length`) 87- [x] Validate Content-Length matches actual body length for responses (raises `Content_length_mismatch`) 88- [x] Add `content_length_mismatch` error type (in `lib/error.ml`) 89 90### 3.2 Age Header Calculation ✓ 91 92**RFC Reference:** RFC 9111 Section 4.2.3 93 94> "age_value = delta-seconds" 95> "The Age header field conveys the sender's estimate of the time since the response was generated" 96 97**Current Status:** IMPLEMENTED in `lib/cache_control.ml` 98 99Full RFC 9111 Section 4.2.3 calculation with `age_inputs` type and `calculate_age` function: 100- [x] Add full RFC 9111 Section 4.2.3 age calculation 101- [x] Track `request_time` and `response_time` in cache entries 102- [x] Add `is_fresh` function using calculated age vs max-age 103 104### 3.3 Heuristic Freshness ✓ 105 106**RFC Reference:** RFC 9111 Section 4.2.2 107 108> "A cache MAY calculate a heuristic expiration time" 109> "a typical setting of this value might be 10% of the time since the response's Last-Modified field value" 110 111**Current Status:** IMPLEMENTED in `lib/cache_control.ml` 112 113- [x] Add `heuristic_freshness` function 114- [x] Use 10% of (now - Last-Modified) as default (`default_heuristic_fraction`) 115- [ ] Add Warning 113 "Heuristic expiration" for stale responses 116- [x] Add configurable `max_heuristic_age` parameter (`default_max_heuristic_age`) 117 118### 3.4 Digest Auth Enhancements ✓ 119 120**RFC Reference:** RFC 7616 Section 3.4 121 122**Current Status:** IMPLEMENTED in `lib/auth.ml` 123 124- [x] Add `userhash` parameter support (in `digest_challenge` and `build_digest_header`) 125- [x] Add SHA-256 support (`hash_string` function) 126- [x] Add `auth-int` qop (in `compute_digest_response` with body hash) 127- [ ] Add `nextnonce` handling for pipelining 128- [x] Add `stale=true` handling (`digest_is_stale` function) 129 130### 3.5 Bearer Token Form Parameter ✓ 131 132**RFC Reference:** RFC 6750 Section 2.2 133 134> "Clients MAY use the form-encoded body parameter access_token" 135 136**Current Status:** IMPLEMENTED in `lib/auth.ml` 137 138- [x] Add `Bearer_form of { token : string }` variant to auth type 139- [x] Serialize as `access_token=TOKEN` via `get_bearer_form_body` 140- [x] `is_bearer_form` predicate and `bearer_form` constructor 141 142--- 143 144## Section 4: P2 - Medium Priority (Mostly Complete) 145 146### 4.1 Warning Header (Deprecated but Present) 147 148**RFC Reference:** RFC 9111 Section 5.5 (obsoleted) 149 150**Note:** Warning header is obsolete in HTTP but may still be received. 151 152- [ ] Parse Warning header values if present in responses 153- [ ] Generate Warning 110 "Response is Stale" when serving stale cached content 154- [ ] Generate Warning 112 "Disconnected operation" when offline 155 156### 4.2 Vary Header Support ✓ 157 158**RFC Reference:** RFC 9111 Section 4.1 159 160> "A cache MUST use the Vary header field to select the representation" 161 162**Current Status:** IMPLEMENTED in `lib/cache.ml` 163 164- [x] Parse Vary header from responses (`parse_vary` function) 165- [x] Match Vary headers for cache lookup (`vary_matches` function) 166- [x] Store request headers needed for Vary matching (`vary_headers` in entry type) 167 168### 4.3 Connection Header Parsing ✓ 169 170**RFC Reference:** RFC 9110 Section 7.6.1 171 172> "the connection option 'close' signals that the sender is going to close the connection after the current request/response" 173 174**Current Status:** IMPLEMENTED in `lib/headers.ml` 175 176- [x] Parse full comma-separated Connection header values (`parse_connection_header`) 177- [x] Remove hop-by-hop headers listed in Connection (`remove_hop_by_hop`) 178- [x] Handle `Connection: keep-alive` for HTTP/1.0 (`connection_keep_alive`) 179- [x] Handle `Connection: close` (`connection_close`) 180 181### 4.4 Transfer-Encoding Validation ✓ 182 183**RFC Reference:** RFC 9112 Section 6.1 184 185> "A server MUST NOT apply a transfer coding to a response to a HEAD request" 186 187**Current Status:** IMPLEMENTED in `lib/http_read.ml` 188 189- [x] Log warning for Transfer-Encoding in response to HEAD (`validate_no_transfer_encoding`) 190- [x] Log warning for Transfer-Encoding in 1xx, 204, 304 responses 191- [ ] Add test cases for invalid Transfer-Encoding responses 192 193### 4.5 Host Header Validation ✓ 194 195**RFC Reference:** RFC 9110 Section 7.2 196 197> "A client MUST send a Host header field in all HTTP/1.1 request messages" 198 199**Current Status:** IMPLEMENTED in `lib/http_write.ml` 200 201- [x] Automatically add Host header from URI if not present 202- [x] Verify Host header matches URI authority (logs warning if mismatch) 203- [x] Handle Host header for CONNECT requests (uses authority-form host:port) 204 205--- 206 207## Section 5: P3 - Low Priority / Nice to Have 208 209### 5.1 Trailer Headers ✓ 210 211**RFC Reference:** RFC 9110 Section 6.5 212 213> "Trailer allows the sender to include additional fields at the end of a chunked message" 214 215**Current Status:** IMPLEMENTED in `lib/http_read.ml` 216 217- [x] Parse Trailer header (`parse_trailers` function, lines 315-348) 218- [x] Collect trailer fields after final chunk 219- [x] Validate trailers don't include forbidden fields (`forbidden_trailer_headers` list) 220- [x] Log warnings for forbidden headers and skip them 221 222### 5.2 TE Header ✓ 223 224**RFC Reference:** RFC 9110 Section 10.1.4 225 226> "The TE header field describes what transfer codings... the client is willing to accept" 227 228**Current Status:** IMPLEMENTED in `lib/headers.ml` and `lib/header_name.ml` 229 230- [x] TE header type support (`Te` variant in header_name.ml) 231- [x] Send `TE: trailers` when trailers are supported (`Headers.te_trailers` function) 232- [x] Generic `Headers.te` function for other TE values 233- [ ] Parse TE header from incoming requests (server-side, not needed for client) 234 235### 5.3 Expect Continue Timeout ✓ 236 237**RFC Reference:** RFC 9110 Section 10.1.1 238 239> "A client that will wait for a 100 (Continue) response before sending the request content SHOULD use a reasonable timeout" 240 241**Current Status:** IMPLEMENTED in `lib/expect_continue.ml` and `lib/timeout.ml` 242 243- [x] Add configurable timeout for 100 Continue wait (`Timeout.t.expect_100_continue`) 244- [x] Default to reasonable timeout (1.0 second) 245- [x] Timeout implementation using `Eio.Time.with_timeout_exn` in `http_client.ml` 246- [x] On timeout, sends body anyway per RFC 9110 recommendation 247 248### 5.4 Method Properties Enforcement ✓ 249 250**RFC Reference:** RFC 9110 Section 9 251 252**Current Status:** IMPLEMENTED across multiple modules 253 254- [x] Method properties defined (`is_safe`, `is_idempotent`, `is_cacheable` in `method.ml`) 255- [x] Cache only stores GET/HEAD responses (`is_cacheable_method` in `cache.ml`) 256- [x] Retry only retries idempotent methods (GET, HEAD, PUT, DELETE, OPTIONS, TRACE in `retry.ml`) 257- [x] Debug logging when method prevents caching or retry 258- [x] Configurable `strict_method_semantics` option in `Retry.config` (raises error on violation) 259 260### 5.5 URI Normalization for Comparison 261 262**RFC Reference:** RFC 3986 Section 6.2 263 264- [ ] Add `Uri.equivalent` function for comparison after normalization 265- [ ] Case-insensitive scheme and host 266- [ ] Normalize empty path to "/" for http/https 267- [ ] Remove default port numbers 268 269### 5.6 Internationalized Resource Identifiers (IRI) 270 271**RFC Reference:** RFC 3987 272 273- [ ] Add `Uri.of_iri` for IRI to URI conversion 274- [ ] Handle UTF-8 encoding in path and query 275- [ ] Percent-encode non-ASCII characters 276 277--- 278 279## Section 6: Cookie Compliance (Delegated to Cookeio) 280 281The library delegates cookie handling to the Cookeio library. These items should be verified in that library: 282 283- [ ] Verify Cookeio handles `SameSite` attribute per RFC 6265bis 284- [ ] Verify `__Host-` and `__Secure-` cookie prefixes 285- [ ] Verify Public Suffix List usage for domain matching 286- [ ] Verify cookie path matching rules 287 288--- 289 290## Section 7: Implementation Order 291 292### Phase 1: Security Fixes (P0) ✓ COMPLETE 2931. ✓ Bare CR validation 2942. ✓ Chunk size overflow protection 2953. ✓ Request smuggling logging 296 297### Phase 2: URI Library Inlining ✓ COMPLETE 2981. ✓ Inlined URI library with string-based parsing 2992. ✓ Pct module (percent encoding) 3003. ✓ Path module (normalization) 3014. ✓ Reference resolution and canonicalization 302 303### Phase 3: Core RFC 9111 Compliance ✓ COMPLETE 3041. ✓ Age calculation per Section 4.2.3 3052. ✓ Heuristic freshness per Section 4.2.2 3063. ✓ Vary header support 307 308### Phase 4: Authentication Enhancements ✓ COMPLETE 3091. ✓ Digest auth userhash 3102. ✓ Digest auth auth-int qop 3113. ✓ Bearer form parameter 312 313### Phase 5: Edge Cases and Polish ✓ COMPLETE 3141. ✓ Transfer-Encoding validation 3152. ✓ Connection header parsing 3163. ✓ Trailer header support 3174. ✓ Method property enforcement 3185. ✓ Host header validation 3196. ✓ TE header support 3207. ✓ Expect 100-continue timeout 321 322--- 323 324## Previously Completed Fixes 325 326| Priority | Issue | RFC | Status | 327|----------|-------|-----|--------| 328| P0 | Bare CR validation | RFC 9112 Section 2.2 | FIXED | 329| P0 | Chunk size overflow protection | RFC 9112 Section 7.1 | FIXED | 330| P0 | Request smuggling prevention | RFC 9112 Section 6.3 | FIXED | 331| P1 | Content-Length negative validation | RFC 9110 Section 8.6 | FIXED | 332| P1 | Full age calculation | RFC 9111 Section 4.2.3 | FIXED | 333| P1 | Heuristic freshness | RFC 9111 Section 4.2.2 | FIXED | 334| P1 | Digest auth userhash | RFC 7616 Section 3.4 | FIXED | 335| P1 | Digest auth auth-int qop | RFC 7616 Section 3.4 | FIXED | 336| P1 | Bearer token form parameter | RFC 6750 Section 2.2 | FIXED | 337| P2 | Vary header support | RFC 9111 Section 4.1 | FIXED | 338| P2 | Connection header parsing | RFC 9110 Section 7.6.1 | FIXED | 339| P2 | Transfer-Encoding validation | RFC 9112 Section 6.1 | FIXED | 340| Major | URI library inlining | RFC 3986 | FIXED | 341| P2 | Host header validation | RFC 9110 Section 7.2 | FIXED | 342| P3 | Trailer headers | RFC 9110 Section 6.5 | FIXED | 343| P3 | TE header support | RFC 9110 Section 10.1.4 | FIXED | 344| P3 | Expect 100-continue timeout | RFC 9110 Section 10.1.1 | FIXED | 345| P3 | Method properties enforcement | RFC 9110 Section 9 | FIXED | 346| P2 | CONNECT authority-form | RFC 9112 Section 3.2.3 | FIXED | 347| P3 | strict_method_semantics option | RFC 9110 Section 9.2.2 | FIXED | 348| High | 303 redirect method change | RFC 9110 Section 15.4.4 | FIXED | 349| High | obs-fold header handling | RFC 9112 Section 5.2 | FIXED | 350| High | Basic auth username validation | RFC 7617 Section 2 | FIXED | 351| Medium | Close-delimited body reading | RFC 9112 Section 6.3 | FIXED | 352| Medium | Retry-After HTTP-date format | RFC 9110 Section 10.2.3 | FIXED | 353| Medium | 407 proxy auth auto-retry | RFC 7235 Section 3.2 | FIXED | 354| Medium | 417 Expectation Failed retry | RFC 9110 Section 10.1.1 | FIXED | 355| Low | Asterisk-form OPTIONS | RFC 9112 Section 3.2.4 | FIXED | 356| Low | Accept-Language header builder | RFC 9110 Section 12.5.4 | FIXED | 357 358--- 359 360## Section 8: Feature Roadmap (Non-RFC) 361 362These are feature enhancements not tied to specific RFC compliance: 363 364### 8.1 Protocol Extensions 365- [ ] HTTP/2 support (RFC 9113 - spec present in spec/) 366- [ ] Unix domain socket support 367 368### 8.2 Security Enhancements 369- [ ] Certificate/public key pinning 370 371### 8.3 API Improvements 372- [ ] Request/response middleware system 373- [ ] Progress callbacks for uploads/downloads 374- [ ] Request cancellation 375 376### 8.4 Testing 377- [ ] Expand unit test coverage for individual modules 378- [ ] Add more edge case tests for HTTP date parsing 379- [ ] Add test cases for invalid Transfer-Encoding responses 380 381### 8.5 Documentation 382- [ ] Add troubleshooting guide to README 383 384--- 385 386## Notes 387 388- The library intentionally does not implement cache storage (RFC 9111) as it provides utilities for applications to build their own caching layer 389- SOCKS5 proxy support is declared but not implemented - this is a feature gap, not a compliance issue 390- SHA-512-256 for Digest auth is not implemented due to complexity of the special initialization vectors required 391- HTTP/2 and HTTP/3 are out of scope for this library (HTTP/1.1 only)