A batteries included HTTP/1.1 client in OCaml
1# HTTP RFC Specification Compliance TODO
2
3This document tracks RFC compliance issues identified in the ocaml-requests library.
4Generated from comprehensive analysis against RFC 9110, 9111, 9112, 7235, 7617, 7616, 6750, 6265, 3986, 2818, and 8446.
5
6## Current Compliance Summary
7
8| RFC | Area | Compliance | Notes |
9|-----|------|------------|-------|
10| RFC 9110 | HTTP Semantics | 95%+ | Excellent - all methods, status codes, headers |
11| RFC 9112 | HTTP/1.1 Syntax | 90%+ | Excellent - security validation complete |
12| RFC 9111 | HTTP Caching | 85%+ | Good - full age calc, heuristic freshness, Vary |
13| RFC 7617/6750/7616 | Authentication | 90%+ | Excellent - userhash, auth-int, bearer form |
14| RFC 6265 | Cookies | 70-80% | Good - delegated to Cookeio |
15| RFC 3986 | URI | 95%+ | Excellent - inlined with full parsing |
16
17---
18
19## Section 1: URI Library Inlining ✓ COMPLETE
20
21**Goal:** Inline the third_party/uri library into requests, replacing Angstrom-based parsing with Eio.Buf_read combinators for consistency with the HTTP parsing stack.
22
23**Current Status:** IMPLEMENTED in `lib/uri.ml` and `lib/uri.mli`
24
25The URI library has been fully inlined with string-based parsing (no Angstrom dependency):
26- [x] All URI parsers implemented (scheme, authority, path, query, fragment)
27- [x] IPv4 and IPv6 address parsing
28- [x] Percent encoding/decoding with component-specific character sets
29- [x] Path normalization and dot segment removal
30- [x] Reference resolution per RFC 3986 Section 5.2
31- [x] Scheme-specific normalization (HTTP/HTTPS defaults)
32- [x] Host case normalization (lowercase)
33
34---
35
36## Section 2: P0 - Security Critical ✓ COMPLETE
37
38### 2.1 Bare CR Validation (RFC 9112) ✓
39
40**RFC Reference:** RFC 9112 Section 2.2
41
42> "A recipient that receives whitespace between the start-line and the first header field MUST either reject the message as invalid or..."
43> "bare CR must be rejected"
44
45**Current Status:** IMPLEMENTED in `lib/http_read.ml`
46
47The `validate_no_bare_cr` function validates all relevant areas:
48- [x] Add bare CR validation in `status_line` parsing
49- [x] Add bare CR validation in `header_line` parsing
50- [x] Add bare CR validation in chunked extension parsing
51
52### 2.2 Chunk Size Overflow Protection ✓
53
54**RFC Reference:** RFC 9112 Section 7.1
55
56**Current Status:** IMPLEMENTED in `lib/http_read.ml`
57
58The `chunk_size` function limits hex digits to 16 (`max_chunk_size_hex_digits`):
59- [x] Add length check before parsing chunk size
60- [x] Protection in both `chunk_size` and `Chunked_body_source.read_chunk_size`
61
62### 2.3 Request Smuggling Prevention ✓
63
64**RFC Reference:** RFC 9112 Section 6.3
65
66> "If a message is received with both a Transfer-Encoding and a Content-Length header field, the Transfer-Encoding overrides the Content-Length."
67
68**Current Status:** IMPLEMENTED in `lib/http_read.ml`
69
70- [x] Transfer-Encoding takes precedence over Content-Length
71- [x] Add explicit logging/warning when both present
72- [ ] Consider rejecting requests with conflicting headers in strict mode
73
74---
75
76## Section 3: P1 - High Priority ✓ COMPLETE
77
78### 3.1 Content-Length Validation ✓
79
80**RFC Reference:** RFC 9110 Section 8.6
81
82> "Any Content-Length field value greater than or equal to zero is valid."
83
84**Current Status:** IMPLEMENTED in `lib/http_read.ml`
85
86- [x] Reject negative Content-Length values explicitly (in `parse_content_length`)
87- [x] Validate Content-Length matches actual body length for responses (raises `Content_length_mismatch`)
88- [x] Add `content_length_mismatch` error type (in `lib/error.ml`)
89
90### 3.2 Age Header Calculation ✓
91
92**RFC Reference:** RFC 9111 Section 4.2.3
93
94> "age_value = delta-seconds"
95> "The Age header field conveys the sender's estimate of the time since the response was generated"
96
97**Current Status:** IMPLEMENTED in `lib/cache_control.ml`
98
99Full RFC 9111 Section 4.2.3 calculation with `age_inputs` type and `calculate_age` function:
100- [x] Add full RFC 9111 Section 4.2.3 age calculation
101- [x] Track `request_time` and `response_time` in cache entries
102- [x] Add `is_fresh` function using calculated age vs max-age
103
104### 3.3 Heuristic Freshness ✓
105
106**RFC Reference:** RFC 9111 Section 4.2.2
107
108> "A cache MAY calculate a heuristic expiration time"
109> "a typical setting of this value might be 10% of the time since the response's Last-Modified field value"
110
111**Current Status:** IMPLEMENTED in `lib/cache_control.ml`
112
113- [x] Add `heuristic_freshness` function
114- [x] Use 10% of (now - Last-Modified) as default (`default_heuristic_fraction`)
115- [ ] Add Warning 113 "Heuristic expiration" for stale responses
116- [x] Add configurable `max_heuristic_age` parameter (`default_max_heuristic_age`)
117
118### 3.4 Digest Auth Enhancements ✓
119
120**RFC Reference:** RFC 7616 Section 3.4
121
122**Current Status:** IMPLEMENTED in `lib/auth.ml`
123
124- [x] Add `userhash` parameter support (in `digest_challenge` and `build_digest_header`)
125- [x] Add SHA-256 support (`hash_string` function)
126- [x] Add `auth-int` qop (in `compute_digest_response` with body hash)
127- [ ] Add `nextnonce` handling for pipelining
128- [x] Add `stale=true` handling (`digest_is_stale` function)
129
130### 3.5 Bearer Token Form Parameter ✓
131
132**RFC Reference:** RFC 6750 Section 2.2
133
134> "Clients MAY use the form-encoded body parameter access_token"
135
136**Current Status:** IMPLEMENTED in `lib/auth.ml`
137
138- [x] Add `Bearer_form of { token : string }` variant to auth type
139- [x] Serialize as `access_token=TOKEN` via `get_bearer_form_body`
140- [x] `is_bearer_form` predicate and `bearer_form` constructor
141
142---
143
144## Section 4: P2 - Medium Priority (Mostly Complete)
145
146### 4.1 Warning Header (Deprecated but Present)
147
148**RFC Reference:** RFC 9111 Section 5.5 (obsoleted)
149
150**Note:** Warning header is obsolete in HTTP but may still be received.
151
152- [ ] Parse Warning header values if present in responses
153- [ ] Generate Warning 110 "Response is Stale" when serving stale cached content
154- [ ] Generate Warning 112 "Disconnected operation" when offline
155
156### 4.2 Vary Header Support ✓
157
158**RFC Reference:** RFC 9111 Section 4.1
159
160> "A cache MUST use the Vary header field to select the representation"
161
162**Current Status:** IMPLEMENTED in `lib/cache.ml`
163
164- [x] Parse Vary header from responses (`parse_vary` function)
165- [x] Match Vary headers for cache lookup (`vary_matches` function)
166- [x] Store request headers needed for Vary matching (`vary_headers` in entry type)
167
168### 4.3 Connection Header Parsing ✓
169
170**RFC Reference:** RFC 9110 Section 7.6.1
171
172> "the connection option 'close' signals that the sender is going to close the connection after the current request/response"
173
174**Current Status:** IMPLEMENTED in `lib/headers.ml`
175
176- [x] Parse full comma-separated Connection header values (`parse_connection_header`)
177- [x] Remove hop-by-hop headers listed in Connection (`remove_hop_by_hop`)
178- [x] Handle `Connection: keep-alive` for HTTP/1.0 (`connection_keep_alive`)
179- [x] Handle `Connection: close` (`connection_close`)
180
181### 4.4 Transfer-Encoding Validation ✓
182
183**RFC Reference:** RFC 9112 Section 6.1
184
185> "A server MUST NOT apply a transfer coding to a response to a HEAD request"
186
187**Current Status:** IMPLEMENTED in `lib/http_read.ml`
188
189- [x] Log warning for Transfer-Encoding in response to HEAD (`validate_no_transfer_encoding`)
190- [x] Log warning for Transfer-Encoding in 1xx, 204, 304 responses
191- [ ] Add test cases for invalid Transfer-Encoding responses
192
193### 4.5 Host Header Validation ✓
194
195**RFC Reference:** RFC 9110 Section 7.2
196
197> "A client MUST send a Host header field in all HTTP/1.1 request messages"
198
199**Current Status:** IMPLEMENTED in `lib/http_write.ml`
200
201- [x] Automatically add Host header from URI if not present
202- [x] Verify Host header matches URI authority (logs warning if mismatch)
203- [x] Handle Host header for CONNECT requests (uses authority-form host:port)
204
205---
206
207## Section 5: P3 - Low Priority / Nice to Have
208
209### 5.1 Trailer Headers ✓
210
211**RFC Reference:** RFC 9110 Section 6.5
212
213> "Trailer allows the sender to include additional fields at the end of a chunked message"
214
215**Current Status:** IMPLEMENTED in `lib/http_read.ml`
216
217- [x] Parse Trailer header (`parse_trailers` function, lines 315-348)
218- [x] Collect trailer fields after final chunk
219- [x] Validate trailers don't include forbidden fields (`forbidden_trailer_headers` list)
220- [x] Log warnings for forbidden headers and skip them
221
222### 5.2 TE Header ✓
223
224**RFC Reference:** RFC 9110 Section 10.1.4
225
226> "The TE header field describes what transfer codings... the client is willing to accept"
227
228**Current Status:** IMPLEMENTED in `lib/headers.ml` and `lib/header_name.ml`
229
230- [x] TE header type support (`Te` variant in header_name.ml)
231- [x] Send `TE: trailers` when trailers are supported (`Headers.te_trailers` function)
232- [x] Generic `Headers.te` function for other TE values
233- [ ] Parse TE header from incoming requests (server-side, not needed for client)
234
235### 5.3 Expect Continue Timeout ✓
236
237**RFC Reference:** RFC 9110 Section 10.1.1
238
239> "A client that will wait for a 100 (Continue) response before sending the request content SHOULD use a reasonable timeout"
240
241**Current Status:** IMPLEMENTED in `lib/expect_continue.ml` and `lib/timeout.ml`
242
243- [x] Add configurable timeout for 100 Continue wait (`Timeout.t.expect_100_continue`)
244- [x] Default to reasonable timeout (1.0 second)
245- [x] Timeout implementation using `Eio.Time.with_timeout_exn` in `http_client.ml`
246- [x] On timeout, sends body anyway per RFC 9110 recommendation
247
248### 5.4 Method Properties Enforcement ✓
249
250**RFC Reference:** RFC 9110 Section 9
251
252**Current Status:** IMPLEMENTED across multiple modules
253
254- [x] Method properties defined (`is_safe`, `is_idempotent`, `is_cacheable` in `method.ml`)
255- [x] Cache only stores GET/HEAD responses (`is_cacheable_method` in `cache.ml`)
256- [x] Retry only retries idempotent methods (GET, HEAD, PUT, DELETE, OPTIONS, TRACE in `retry.ml`)
257- [x] Debug logging when method prevents caching or retry
258- [x] Configurable `strict_method_semantics` option in `Retry.config` (raises error on violation)
259
260### 5.5 URI Normalization for Comparison
261
262**RFC Reference:** RFC 3986 Section 6.2
263
264- [ ] Add `Uri.equivalent` function for comparison after normalization
265- [ ] Case-insensitive scheme and host
266- [ ] Normalize empty path to "/" for http/https
267- [ ] Remove default port numbers
268
269### 5.6 Internationalized Resource Identifiers (IRI)
270
271**RFC Reference:** RFC 3987
272
273- [ ] Add `Uri.of_iri` for IRI to URI conversion
274- [ ] Handle UTF-8 encoding in path and query
275- [ ] Percent-encode non-ASCII characters
276
277---
278
279## Section 6: Cookie Compliance (Delegated to Cookeio)
280
281The library delegates cookie handling to the Cookeio library. These items should be verified in that library:
282
283- [ ] Verify Cookeio handles `SameSite` attribute per RFC 6265bis
284- [ ] Verify `__Host-` and `__Secure-` cookie prefixes
285- [ ] Verify Public Suffix List usage for domain matching
286- [ ] Verify cookie path matching rules
287
288---
289
290## Section 7: Implementation Order
291
292### Phase 1: Security Fixes (P0) ✓ COMPLETE
2931. ✓ Bare CR validation
2942. ✓ Chunk size overflow protection
2953. ✓ Request smuggling logging
296
297### Phase 2: URI Library Inlining ✓ COMPLETE
2981. ✓ Inlined URI library with string-based parsing
2992. ✓ Pct module (percent encoding)
3003. ✓ Path module (normalization)
3014. ✓ Reference resolution and canonicalization
302
303### Phase 3: Core RFC 9111 Compliance ✓ COMPLETE
3041. ✓ Age calculation per Section 4.2.3
3052. ✓ Heuristic freshness per Section 4.2.2
3063. ✓ Vary header support
307
308### Phase 4: Authentication Enhancements ✓ COMPLETE
3091. ✓ Digest auth userhash
3102. ✓ Digest auth auth-int qop
3113. ✓ Bearer form parameter
312
313### Phase 5: Edge Cases and Polish ✓ COMPLETE
3141. ✓ Transfer-Encoding validation
3152. ✓ Connection header parsing
3163. ✓ Trailer header support
3174. ✓ Method property enforcement
3185. ✓ Host header validation
3196. ✓ TE header support
3207. ✓ Expect 100-continue timeout
321
322---
323
324## Previously Completed Fixes
325
326| Priority | Issue | RFC | Status |
327|----------|-------|-----|--------|
328| P0 | Bare CR validation | RFC 9112 Section 2.2 | FIXED |
329| P0 | Chunk size overflow protection | RFC 9112 Section 7.1 | FIXED |
330| P0 | Request smuggling prevention | RFC 9112 Section 6.3 | FIXED |
331| P1 | Content-Length negative validation | RFC 9110 Section 8.6 | FIXED |
332| P1 | Full age calculation | RFC 9111 Section 4.2.3 | FIXED |
333| P1 | Heuristic freshness | RFC 9111 Section 4.2.2 | FIXED |
334| P1 | Digest auth userhash | RFC 7616 Section 3.4 | FIXED |
335| P1 | Digest auth auth-int qop | RFC 7616 Section 3.4 | FIXED |
336| P1 | Bearer token form parameter | RFC 6750 Section 2.2 | FIXED |
337| P2 | Vary header support | RFC 9111 Section 4.1 | FIXED |
338| P2 | Connection header parsing | RFC 9110 Section 7.6.1 | FIXED |
339| P2 | Transfer-Encoding validation | RFC 9112 Section 6.1 | FIXED |
340| Major | URI library inlining | RFC 3986 | FIXED |
341| P2 | Host header validation | RFC 9110 Section 7.2 | FIXED |
342| P3 | Trailer headers | RFC 9110 Section 6.5 | FIXED |
343| P3 | TE header support | RFC 9110 Section 10.1.4 | FIXED |
344| P3 | Expect 100-continue timeout | RFC 9110 Section 10.1.1 | FIXED |
345| P3 | Method properties enforcement | RFC 9110 Section 9 | FIXED |
346| P2 | CONNECT authority-form | RFC 9112 Section 3.2.3 | FIXED |
347| P3 | strict_method_semantics option | RFC 9110 Section 9.2.2 | FIXED |
348| High | 303 redirect method change | RFC 9110 Section 15.4.4 | FIXED |
349| High | obs-fold header handling | RFC 9112 Section 5.2 | FIXED |
350| High | Basic auth username validation | RFC 7617 Section 2 | FIXED |
351| Medium | Close-delimited body reading | RFC 9112 Section 6.3 | FIXED |
352| Medium | Retry-After HTTP-date format | RFC 9110 Section 10.2.3 | FIXED |
353| Medium | 407 proxy auth auto-retry | RFC 7235 Section 3.2 | FIXED |
354| Medium | 417 Expectation Failed retry | RFC 9110 Section 10.1.1 | FIXED |
355| Low | Asterisk-form OPTIONS | RFC 9112 Section 3.2.4 | FIXED |
356| Low | Accept-Language header builder | RFC 9110 Section 12.5.4 | FIXED |
357
358---
359
360## Section 8: Feature Roadmap (Non-RFC)
361
362These are feature enhancements not tied to specific RFC compliance:
363
364### 8.1 Protocol Extensions
365- [ ] HTTP/2 support (RFC 9113 - spec present in spec/)
366- [ ] Unix domain socket support
367
368### 8.2 Security Enhancements
369- [ ] Certificate/public key pinning
370
371### 8.3 API Improvements
372- [ ] Request/response middleware system
373- [ ] Progress callbacks for uploads/downloads
374- [ ] Request cancellation
375
376### 8.4 Testing
377- [ ] Expand unit test coverage for individual modules
378- [ ] Add more edge case tests for HTTP date parsing
379- [ ] Add test cases for invalid Transfer-Encoding responses
380
381### 8.5 Documentation
382- [ ] Add troubleshooting guide to README
383
384---
385
386## Notes
387
388- The library intentionally does not implement cache storage (RFC 9111) as it provides utilities for applications to build their own caching layer
389- SOCKS5 proxy support is declared but not implemented - this is a feature gap, not a compliance issue
390- SHA-512-256 for Digest auth is not implemented due to complexity of the special initialization vectors required
391- HTTP/2 and HTTP/3 are out of scope for this library (HTTP/1.1 only)