RFC 5256: IMAP SORT and THREAD Extensions - Implementation Plan#
This document provides a detailed analysis of RFC 5256 against the ocaml-imap implementation and outlines the tasks needed for full compliance.
RFC 5256 Summary#
RFC 5256 defines two IMAP extensions:
- SORT Extension: Server-side sorting of search results by various criteria
- THREAD Extension: Server-side threading of messages using algorithms like ORDEREDSUBJECT and REFERENCES
Key RFC Requirements#
SORT Command (Section 3, BASE.6.4.SORT)#
- Arguments: sort program, charset specification, searching criteria
- Returns:
* SORTresponse with message sequence numbers (or UIDs for UID SORT) - Required charsets: US-ASCII, UTF-8 (MUST implement)
- Sort criteria: ARRIVAL, CC, DATE, FROM, REVERSE, SIZE, SUBJECT, TO
- Capability:
SORT
THREAD Command (Section 3, BASE.6.4.THREAD)#
- Arguments: threading algorithm, charset specification, searching criteria
- Returns:
* THREADresponse with nested parenthesized structure - Required algorithms:
ORDEREDSUBJECT: Simple subject-based threading ("poor man's threading")REFERENCES: Full References/In-Reply-To based threading (JWZ algorithm)
- Capabilities:
THREAD=ORDEREDSUBJECT,THREAD=REFERENCES
Base Subject Extraction (Section 2.1)#
A critical algorithm for both SORT by SUBJECT and threading:
- Convert RFC 2047 encoded-words to UTF-8
- Normalize whitespace (tabs, continuations, multiple spaces -> single space)
- Remove trailing
(fwd)and whitespace (subj-trailer) - Remove leading
Re:,Fw:,Fwd:with optional[blob](subj-leader) - Remove leading
[blob]if non-empty base remains - Handle
[fwd: ... ]wrapper pattern - Result is the "base subject" for sorting/threading
Sent Date (Section 2.2)#
- Extracted from Date: header, normalized to UTC
- Invalid timezone treated as UTC
- Invalid time treated as 00:00:00
- Missing/unparseable date uses INTERNALDATE as fallback
- Tie-breaker: message sequence number
Current Implementation Status#
SORT Extension - Partially Implemented#
| Component | Status | Location |
|---|---|---|
| Sort criteria types | Done | lib/imap/sort.ml, lib/imap/sort.mli |
| Sort keys (ARRIVAL, CC, DATE, FROM, SIZE, SUBJECT, TO) | Done | lib/imap/sort.ml |
| REVERSE modifier | Done | lib/imap/sort.ml |
| SORT command (client) | Done | lib/imap/command.ml |
| UID SORT command | Done | lib/imap/command.ml |
| SORT command serialization | Done | lib/imap/write.ml |
| SORT response parsing | Done | lib/imap/read.ml |
| Client API (sort, uid_sort) | Done | lib/imap/client.ml |
| Base subject extraction | Missing | - |
| Sent date normalization | Missing | - |
| Server-side SORT handler | Missing | lib/imapd/ |
| I18NLEVEL=1 collation | Missing | - |
THREAD Extension - Not Implemented#
| Component | Status | Location |
|---|---|---|
| Thread algorithm types | Missing | - |
| THREAD command type | Missing | - |
| Thread response type | Missing | - |
| ORDEREDSUBJECT algorithm | Missing | - |
| REFERENCES algorithm | Missing | - |
| Thread response serialization | Missing | - |
| Thread response parsing | Missing | - |
| Client API | Missing | - |
| Server-side handler | Missing | - |
Supporting Infrastructure#
| Component | Status | Notes |
|---|---|---|
| Message-ID extraction | Partial | Envelope has message_id field |
| References header access | Missing | Not in Envelope type |
| In-Reply-To access | Partial | Envelope has in_reply_to field |
| RFC 2047 decoding | Missing | Needed for base subject |
| Unicode case-folding | Missing | Required for i;unicode-casemap |
Implementation Tasks#
Priority 1: Core THREAD Infrastructure#
Task 1.1: Create Thread Module Types#
File: lib/imap/thread.ml (new)
(** {0 RFC 5256 THREAD Extension}
Message threading algorithms as specified in
{{:https://datatracker.ietf.org/doc/html/rfc5256}RFC 5256 Section 3}.
{1 Threading Algorithms}
{2 ORDEREDSUBJECT}
The ORDEREDSUBJECT algorithm ({{:https://datatracker.ietf.org/doc/html/rfc5256#section-3}RFC 5256 BASE.6.4.THREAD})
groups messages by base subject and sorts by sent date. Also known as
"poor man's threading".
{2 REFERENCES}
The REFERENCES algorithm uses the References and In-Reply-To headers
to build parent/child relationships. Based on the JWZ threading algorithm
({{:http://www.jwz.org/doc/threading.html}jwz.org/doc/threading.html}).
*)
(** Threading algorithm *)
type algorithm =
| Orderedsubject
(** {{:https://datatracker.ietf.org/doc/html/rfc5256#section-3}RFC 5256}
ORDEREDSUBJECT - group by base subject, sort by sent date *)
| References
(** {{:https://datatracker.ietf.org/doc/html/rfc5256#section-3}RFC 5256}
REFERENCES - full parent/child threading via References header *)
| Extension of string
(** Future algorithm extensions *)
(** A thread node in the result tree *)
type 'a node =
| Message of 'a * 'a node list (** Message with children *)
| Dummy of 'a node list (** Placeholder for missing parent *)
(** Thread result: list of root-level threads *)
type 'a t = 'a node list
val pp_algorithm : Format.formatter -> algorithm -> unit
val pp : (Format.formatter -> 'a -> unit) -> Format.formatter -> 'a t -> unit
File: lib/imap/thread.mli (new)
(** {0 RFC 5256 THREAD Extension}
@see <https://datatracker.ietf.org/doc/html/rfc5256> RFC 5256 *)
type algorithm =
| Orderedsubject
| References
| Extension of string
type 'a node =
| Message of 'a * 'a node list
| Dummy of 'a node list
type 'a t = 'a node list
val pp_algorithm : Format.formatter -> algorithm -> unit
val pp : (Format.formatter -> 'a -> unit) -> Format.formatter -> 'a t -> unit
Task 1.2: Add THREAD to Command Types#
File: lib/imap/command.ml
Add to type t:
| Thread of { algorithm : Thread.algorithm; charset : string; search : Search.t }
Add to type uid_command:
| Uid_thread of { algorithm : Thread.algorithm; charset : string; search : Search.t }
Task 1.3: Add THREAD Response Type#
File: lib/imap/response.ml
Add to type t:
| Thread of int64 Thread.t (** THREAD response with message numbers/UIDs *)
Task 1.4: Implement THREAD Serialization#
File: lib/imap/write.ml
let thread_algorithm w = function
| Thread.Orderedsubject -> W.string w "ORDEREDSUBJECT"
| Thread.References -> W.string w "REFERENCES"
| Thread.Extension s -> W.string w s
let write_thread w charset algorithm search =
W.string w "THREAD ";
thread_algorithm w algorithm;
sp w;
astring w charset;
sp w;
search_key w search
Task 1.5: Implement THREAD Response Parsing#
File: lib/imap/read.ml
(** Parse thread-list per RFC 5256 Section 5 formal syntax:
thread-list = "(" (thread-members / thread-nested) ")"
thread-members = nz-number *(SP nz-number) [SP thread-nested]
thread-nested = 2*thread-list *)
let rec thread_list r =
R.char '(' r;
let rec parse_members acc =
match R.peek_char r with
| Some ')' -> R.char ')' r; (List.rev acc, [])
| Some ' ' ->
sp r;
(match R.peek_char r with
| Some '(' -> (List.rev acc, thread_nested r)
| Some c when c >= '0' && c <= '9' ->
parse_members (number64 r :: acc)
| _ -> (List.rev acc, []))
| Some c when c >= '0' && c <= '9' ->
parse_members (number64 r :: acc)
| _ -> (List.rev acc, [])
in
parse_members []
and thread_nested r =
(* 2*thread-list *)
let rec loop acc =
match R.peek_char r with
| Some '(' -> loop (thread_list r :: acc)
| Some ' ' -> sp r; loop acc
| _ -> List.rev acc
in
loop []
Priority 2: Base Subject Extraction#
Task 2.1: Create Subject Module#
File: lib/imap/subject.ml (new)
(** {0 Base Subject Extraction}
Implements the base subject algorithm from
{{:https://datatracker.ietf.org/doc/html/rfc5256#section-2.1}RFC 5256 Section 2.1}.
The base subject is used for:
- SORT by SUBJECT criterion
- ORDEREDSUBJECT threading algorithm
- REFERENCES threading subject grouping (step 5)
{1 Algorithm Steps}
Per RFC 5256 Section 2.1:
{ol
{- Convert RFC 2047 encoded-words to UTF-8, normalize whitespace}
{- Remove trailing subj-trailer (e.g., "(fwd)")}
{- Remove leading subj-leader (e.g., "Re:", "Fw: [list]")}
{- Remove leading subj-blob if non-empty base remains}
{- Repeat steps 3-4}
{- Unwrap [fwd: ... ] wrapper}
{- Result is the base subject}}
*)
(** Extract base subject from a subject line.
@param subject The original subject string (may contain RFC 2047 encoding)
@return The normalized base subject for sorting/threading
Example:
{[
base_subject "Re: [ocaml] Fwd: Meeting notes"
(* Returns: "Meeting notes" *)
]} *)
val base_subject : string -> string
(** Check if a subject indicates a reply or forward.
Per RFC 5256, a message is a reply/forward if base_subject extraction
removed any of:
- A subj-refwd (re:, fw:, fwd:)
- A "(fwd)" trailer
- A [fwd: ... ] wrapper
This is used in REFERENCES threading step 5.C.iv. *)
val is_reply_or_forward : string -> bool
Implementation notes:
- ABNF from RFC 5256 Section 5:
subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"subj-blob = "[" *BLOBCHAR "]" *WSPsubj-trailer = "(fwd)" / WSPsubj-fwd-hdr = "[fwd:"subj-fwd-trl = "]"
Task 2.2: Add RFC 2047 Decoding Support#
File: lib/imap/mime.ml (new)
(** {0 MIME Utilities}
RFC 2047 encoded-word decoding for internationalized headers.
@see <https://datatracker.ietf.org/doc/html/rfc2047> RFC 2047 *)
(** Decode RFC 2047 encoded-words in a header value.
Handles patterns like:
- =?UTF-8?B?base64data?=
- =?ISO-8859-1?Q?quoted=20printable?=
@param value Header value potentially containing encoded-words
@return Decoded UTF-8 string *)
val decode_header : string -> string
Priority 3: Sent Date Handling#
Task 3.1: Create Date Normalization Module#
File: lib/imap/date.ml (new)
(** {0 IMAP Date Handling}
Date parsing and normalization as specified in
{{:https://datatracker.ietf.org/doc/html/rfc5256#section-2.2}RFC 5256 Section 2.2}.
{1 Sent Date}
The "sent date" is the Date: header value normalized to UTC:
- Valid date with timezone: normalize to UTC
- Invalid timezone: treat as UTC
- Invalid time: treat as 00:00:00
- Missing/unparseable: use INTERNALDATE
{1 Comparison}
When comparing sent dates that match exactly, the message sequence
number is used as a tie-breaker (per RFC 5256 Section 2.2). *)
(** Parsed date with optional timezone offset *)
type t
(** Parse a Date: header value.
@param date_header The raw Date: header content
@return Parsed date or None if unparseable *)
val of_header : string -> t option
(** Parse an INTERNALDATE string (IMAP format).
@param internaldate The INTERNALDATE value from server
@return Parsed date *)
val of_internaldate : string -> t
(** Get sent date for a message.
Per RFC 5256: uses Date: header if valid, falls back to INTERNALDATE.
@param date_header Optional Date: header value
@param internaldate INTERNALDATE value
@return Normalized sent date *)
val sent_date : date_header:string option -> internaldate:string -> t
(** Compare two dates.
@return negative if a < b, 0 if equal, positive if a > b *)
val compare : t -> t -> int
Priority 4: Unicode Collation#
Task 4.1: Implement i;unicode-casemap#
File: lib/imap/collation.ml (new)
(** {0 String Collation}
Collation algorithms for IMAP SORT/THREAD as specified in
{{:https://datatracker.ietf.org/doc/html/rfc5256#section-7}RFC 5256 Section 7}.
Servers MUST implement i;unicode-casemap collation per
{{:https://datatracker.ietf.org/doc/html/rfc5051}RFC 5051}.
{1 i;unicode-casemap}
Simple Unicode case-insensitive comparison:
- Apply Unicode Simple Case Folding
- Compare resulting strings as binary
This is required for I18NLEVEL=1 compliance. *)
(** Case-insensitive Unicode comparison.
@return negative if a < b, 0 if equal, positive if a > b *)
val compare_unicode_casemap : string -> string -> int
(** Case-fold a string for comparison.
Applies Unicode Simple Case Folding. *)
val casefold : string -> string
Priority 5: Server-Side Implementation#
Task 5.1: Add SORT/THREAD to Server Protocol#
File: lib/imapd/protocol.ml
Add Sort and Thread to the command type:
| Sort of { charset : string; criteria : sort_criterion list; search : search_key }
| Thread of { algorithm : thread_algorithm; charset : string; search : search_key }
Task 5.2: Implement Server SORT Handler#
File: lib/imapd/client.ml
(** Handle SORT command per RFC 5256. *)
let handle_sort state ~charset ~criteria ~search ~uid =
(* 1. Search for matching messages *)
let matches = search_messages state.storage state.mailbox search in
(* 2. Fetch sort keys for each message *)
let keyed = List.map (fun msg ->
let keys = List.map (extract_sort_key msg) criteria in
(msg, keys)
) matches in
(* 3. Sort by criteria *)
let sorted = List.sort (compare_by_criteria criteria) keyed in
(* 4. Return sequence numbers or UIDs *)
let ids = List.map (fun (msg, _) ->
if uid then msg.uid else Int64.of_int msg.seq
) sorted in
(* 5. Send SORT response *)
send_response (Sort_response ids)
Task 5.3: Implement ORDEREDSUBJECT Algorithm#
File: lib/imapd/thread.ml (new)
(** ORDEREDSUBJECT threading per RFC 5256.
Algorithm:
1. Sort messages by (base_subject, sent_date)
2. Group by base_subject
3. First message of each group is root
4. Subsequent messages are children of root
5. Sort groups by sent_date of first message *)
let orderedsubject messages =
(* ... *)
Task 5.4: Implement REFERENCES Algorithm#
File: lib/imapd/thread.ml
(** REFERENCES threading per RFC 5256.
This is the JWZ threading algorithm with 6 main steps:
1. Build References relationships
2. Gather parentless messages under root
3. Prune dummy messages
4. Sort root children by sent date
5. Group by base subject
6. Sort all sibling sets by sent date *)
let references messages =
(* Step 1: Build id_table and parent/child links *)
let id_table = Hashtbl.create (List.length messages) in
(* ... 6 steps per RFC 5256 ... *)
Priority 6: Client API#
Task 6.1: Add Thread Methods to Client#
File: lib/imap/client.ml
val thread : t -> algorithm:Thread.algorithm -> charset:string -> Search.t -> int64 Thread.t
(** [thread client ~algorithm ~charset search] threads matching messages.
Requires THREAD=algorithm capability. *)
val uid_thread : t -> algorithm:Thread.algorithm -> charset:string -> Search.t -> int64 Thread.t
(** [uid_thread client ~algorithm ~charset search] like {!thread} but returns UIDs. *)
Priority 7: Capability Advertisement#
Task 7.1: Update Server Capabilities#
File: lib/imapd/client.ml
Add to capability list:
let capabilities = [
(* ... existing ... *)
"SORT";
"THREAD=ORDEREDSUBJECT";
"THREAD=REFERENCES";
]
OCamldoc Citation Templates#
For thread.ml/thread.mli#
(** {0 RFC 5256 THREAD Extension}
Message threading algorithms as specified in
{{:https://datatracker.ietf.org/doc/html/rfc5256}RFC 5256}.
@see <https://datatracker.ietf.org/doc/html/rfc5256#section-3> THREAD Command
@see <https://datatracker.ietf.org/doc/html/rfc5256#section-5> Formal Syntax *)
For subject.ml/subject.mli#
(** {0 Base Subject Extraction}
Implements {{:https://datatracker.ietf.org/doc/html/rfc5256#section-2.1}RFC 5256 Section 2.1}.
@see <https://datatracker.ietf.org/doc/html/rfc5256#section-2.1> Base Subject Algorithm
@see <https://datatracker.ietf.org/doc/html/rfc5256#section-5> ABNF for subj-* rules *)
For sort.ml/sort.mli (enhancement)#
(** {0 SORT Criteria}
Sort keys for the SORT command as specified in
{{:https://datatracker.ietf.org/doc/html/rfc5256}RFC 5256}.
@see <https://datatracker.ietf.org/doc/html/rfc5256#section-3> SORT Command
@see <https://datatracker.ietf.org/doc/html/rfc5256#section-2.1> Base Subject (for SUBJECT key)
@see <https://datatracker.ietf.org/doc/html/rfc5256#section-2.2> Sent Date (for DATE key) *)
For date.ml/date.mli#
(** {0 IMAP Date Handling}
Date parsing for SORT/THREAD as specified in
{{:https://datatracker.ietf.org/doc/html/rfc5256#section-2.2}RFC 5256 Section 2.2}.
@see <https://datatracker.ietf.org/doc/html/rfc5256#section-2.2> Sent Date definition
@see <https://datatracker.ietf.org/doc/html/rfc2822#section-3.3> Date-Time format *)
For collation.ml/collation.mli#
(** {0 String Collation}
Implements i;unicode-casemap collation per
{{:https://datatracker.ietf.org/doc/html/rfc5051}RFC 5051}.
Required for {{:https://datatracker.ietf.org/doc/html/rfc5256#section-7}RFC 5256 Section 7}
(Internationalization Considerations).
@see <https://datatracker.ietf.org/doc/html/rfc5051> i;unicode-casemap
@see <https://datatracker.ietf.org/doc/html/rfc5255> IMAP I18N (I18NLEVEL=1) *)
Priority Summary#
| Priority | Task | Effort | Dependencies |
|---|---|---|---|
| P1 | Thread module types | Low | None |
| P1 | THREAD command/response types | Low | Thread types |
| P1 | THREAD serialization/parsing | Medium | Command types |
| P2 | Base subject extraction | Medium | None |
| P2 | RFC 2047 decoding | Medium | None |
| P3 | Sent date normalization | Medium | None |
| P4 | Unicode collation | Medium | None |
| P5 | Server SORT handler | High | P2, P3, P4 |
| P5 | Server ORDEREDSUBJECT | Medium | P2, P3 |
| P5 | Server REFERENCES | High | P2, P3, P5.1 |
| P6 | Client thread API | Low | P1 |
| P7 | Capability advertisement | Low | P5 |
Testing Requirements#
Unit Tests#
(* test/test_subject.ml *)
let test_base_subject () =
(* RFC 5256 examples *)
assert (Subject.base_subject "Re: test" = "test");
assert (Subject.base_subject "Re: Re: test" = "test");
assert (Subject.base_subject "[PATCH] Re: [ocaml] test" = "test");
assert (Subject.base_subject "Fwd: [Fwd: test]" = "test");
assert (Subject.base_subject "[fwd: wrapped]" = "wrapped");
assert (Subject.base_subject " spaced " = "spaced")
(* test/test_thread.ml *)
let test_orderedsubject () =
(* Test grouping by base subject *)
(* Test sent date ordering within groups *)
(* Test group ordering by first message date *)
let test_references () =
(* Test parent/child from References header *)
(* Test In-Reply-To fallback *)
(* Test duplicate Message-ID handling *)
(* Test loop prevention *)
(* Test dummy message pruning *)
(* Test subject grouping (step 5) *)
Integration Tests#
(* test/integration/test_sort_thread.ml *)
let test_sort_command () =
(* SORT (SUBJECT) UTF-8 ALL *)
(* SORT (DATE REVERSE SUBJECT) UTF-8 SINCE 1-Jan-2020 *)
(* UID SORT (FROM) UTF-8 ALL *)
let test_thread_command () =
(* THREAD ORDEREDSUBJECT UTF-8 ALL *)
(* THREAD REFERENCES UTF-8 SINCE 1-Jan-2020 *)
(* UID THREAD REFERENCES UTF-8 ALL *)
References#
- {{:https://datatracker.ietf.org/doc/html/rfc5256}RFC 5256} - IMAP SORT and THREAD Extensions
- {{:https://datatracker.ietf.org/doc/html/rfc5051}RFC 5051} - i;unicode-casemap Collation
- {{:https://datatracker.ietf.org/doc/html/rfc5255}RFC 5255} - IMAP Internationalization
- {{:https://datatracker.ietf.org/doc/html/rfc2047}RFC 2047} - MIME Encoded-Words
- {{:https://datatracker.ietf.org/doc/html/rfc2822}RFC 2822} - Internet Message Format
- {{:http://www.jwz.org/doc/threading.html}JWZ Threading} - Original REFERENCES algorithm