IMAP in OCaml

RFC 5256: IMAP SORT and THREAD Extensions - Implementation Plan#

This document provides a detailed analysis of RFC 5256 against the ocaml-imap implementation and outlines the tasks needed for full compliance.

RFC 5256 Summary#

RFC 5256 defines two IMAP extensions:

  1. SORT Extension: Server-side sorting of search results by various criteria
  2. THREAD Extension: Server-side threading of messages using algorithms like ORDEREDSUBJECT and REFERENCES

Key RFC Requirements#

SORT Command (Section 3, BASE.6.4.SORT)#

  • Arguments: sort program, charset specification, searching criteria
  • Returns: * SORT response with message sequence numbers (or UIDs for UID SORT)
  • Required charsets: US-ASCII, UTF-8 (MUST implement)
  • Sort criteria: ARRIVAL, CC, DATE, FROM, REVERSE, SIZE, SUBJECT, TO
  • Capability: SORT

THREAD Command (Section 3, BASE.6.4.THREAD)#

  • Arguments: threading algorithm, charset specification, searching criteria
  • Returns: * THREAD response with nested parenthesized structure
  • Required algorithms:
    • ORDEREDSUBJECT: Simple subject-based threading ("poor man's threading")
    • REFERENCES: Full References/In-Reply-To based threading (JWZ algorithm)
  • Capabilities: THREAD=ORDEREDSUBJECT, THREAD=REFERENCES

Base Subject Extraction (Section 2.1)#

A critical algorithm for both SORT by SUBJECT and threading:

  1. Convert RFC 2047 encoded-words to UTF-8
  2. Normalize whitespace (tabs, continuations, multiple spaces -> single space)
  3. Remove trailing (fwd) and whitespace (subj-trailer)
  4. Remove leading Re:, Fw:, Fwd: with optional [blob] (subj-leader)
  5. Remove leading [blob] if non-empty base remains
  6. Handle [fwd: ... ] wrapper pattern
  7. Result is the "base subject" for sorting/threading

Sent Date (Section 2.2)#

  • Extracted from Date: header, normalized to UTC
  • Invalid timezone treated as UTC
  • Invalid time treated as 00:00:00
  • Missing/unparseable date uses INTERNALDATE as fallback
  • Tie-breaker: message sequence number

Current Implementation Status#

SORT Extension - Partially Implemented#

Component Status Location
Sort criteria types Done lib/imap/sort.ml, lib/imap/sort.mli
Sort keys (ARRIVAL, CC, DATE, FROM, SIZE, SUBJECT, TO) Done lib/imap/sort.ml
REVERSE modifier Done lib/imap/sort.ml
SORT command (client) Done lib/imap/command.ml
UID SORT command Done lib/imap/command.ml
SORT command serialization Done lib/imap/write.ml
SORT response parsing Done lib/imap/read.ml
Client API (sort, uid_sort) Done lib/imap/client.ml
Base subject extraction Missing -
Sent date normalization Missing -
Server-side SORT handler Missing lib/imapd/
I18NLEVEL=1 collation Missing -

THREAD Extension - Not Implemented#

Component Status Location
Thread algorithm types Missing -
THREAD command type Missing -
Thread response type Missing -
ORDEREDSUBJECT algorithm Missing -
REFERENCES algorithm Missing -
Thread response serialization Missing -
Thread response parsing Missing -
Client API Missing -
Server-side handler Missing -

Supporting Infrastructure#

Component Status Notes
Message-ID extraction Partial Envelope has message_id field
References header access Missing Not in Envelope type
In-Reply-To access Partial Envelope has in_reply_to field
RFC 2047 decoding Missing Needed for base subject
Unicode case-folding Missing Required for i;unicode-casemap

Implementation Tasks#

Priority 1: Core THREAD Infrastructure#

Task 1.1: Create Thread Module Types#

File: lib/imap/thread.ml (new)

(** {0 RFC 5256 THREAD Extension}

    Message threading algorithms as specified in
    {{:https://datatracker.ietf.org/doc/html/rfc5256}RFC 5256 Section 3}.

    {1 Threading Algorithms}

    {2 ORDEREDSUBJECT}

    The ORDEREDSUBJECT algorithm ({{:https://datatracker.ietf.org/doc/html/rfc5256#section-3}RFC 5256 BASE.6.4.THREAD})
    groups messages by base subject and sorts by sent date. Also known as
    "poor man's threading".

    {2 REFERENCES}

    The REFERENCES algorithm uses the References and In-Reply-To headers
    to build parent/child relationships. Based on the JWZ threading algorithm
    ({{:http://www.jwz.org/doc/threading.html}jwz.org/doc/threading.html}).
*)

(** Threading algorithm *)
type algorithm =
  | Orderedsubject
      (** {{:https://datatracker.ietf.org/doc/html/rfc5256#section-3}RFC 5256}
          ORDEREDSUBJECT - group by base subject, sort by sent date *)
  | References
      (** {{:https://datatracker.ietf.org/doc/html/rfc5256#section-3}RFC 5256}
          REFERENCES - full parent/child threading via References header *)
  | Extension of string
      (** Future algorithm extensions *)

(** A thread node in the result tree *)
type 'a node =
  | Message of 'a * 'a node list  (** Message with children *)
  | Dummy of 'a node list         (** Placeholder for missing parent *)

(** Thread result: list of root-level threads *)
type 'a t = 'a node list

val pp_algorithm : Format.formatter -> algorithm -> unit
val pp : (Format.formatter -> 'a -> unit) -> Format.formatter -> 'a t -> unit

File: lib/imap/thread.mli (new)

(** {0 RFC 5256 THREAD Extension}

    @see <https://datatracker.ietf.org/doc/html/rfc5256> RFC 5256 *)

type algorithm =
  | Orderedsubject
  | References
  | Extension of string

type 'a node =
  | Message of 'a * 'a node list
  | Dummy of 'a node list

type 'a t = 'a node list

val pp_algorithm : Format.formatter -> algorithm -> unit
val pp : (Format.formatter -> 'a -> unit) -> Format.formatter -> 'a t -> unit

Task 1.2: Add THREAD to Command Types#

File: lib/imap/command.ml

Add to type t:

| Thread of { algorithm : Thread.algorithm; charset : string; search : Search.t }

Add to type uid_command:

| Uid_thread of { algorithm : Thread.algorithm; charset : string; search : Search.t }

Task 1.3: Add THREAD Response Type#

File: lib/imap/response.ml

Add to type t:

| Thread of int64 Thread.t  (** THREAD response with message numbers/UIDs *)

Task 1.4: Implement THREAD Serialization#

File: lib/imap/write.ml

let thread_algorithm w = function
  | Thread.Orderedsubject -> W.string w "ORDEREDSUBJECT"
  | Thread.References -> W.string w "REFERENCES"
  | Thread.Extension s -> W.string w s

let write_thread w charset algorithm search =
  W.string w "THREAD ";
  thread_algorithm w algorithm;
  sp w;
  astring w charset;
  sp w;
  search_key w search

Task 1.5: Implement THREAD Response Parsing#

File: lib/imap/read.ml

(** Parse thread-list per RFC 5256 Section 5 formal syntax:
    thread-list = "(" (thread-members / thread-nested) ")"
    thread-members = nz-number *(SP nz-number) [SP thread-nested]
    thread-nested = 2*thread-list *)
let rec thread_list r =
  R.char '(' r;
  let rec parse_members acc =
    match R.peek_char r with
    | Some ')' -> R.char ')' r; (List.rev acc, [])
    | Some ' ' ->
        sp r;
        (match R.peek_char r with
         | Some '(' -> (List.rev acc, thread_nested r)
         | Some c when c >= '0' && c <= '9' ->
             parse_members (number64 r :: acc)
         | _ -> (List.rev acc, []))
    | Some c when c >= '0' && c <= '9' ->
        parse_members (number64 r :: acc)
    | _ -> (List.rev acc, [])
  in
  parse_members []

and thread_nested r =
  (* 2*thread-list *)
  let rec loop acc =
    match R.peek_char r with
    | Some '(' -> loop (thread_list r :: acc)
    | Some ' ' -> sp r; loop acc
    | _ -> List.rev acc
  in
  loop []

Priority 2: Base Subject Extraction#

Task 2.1: Create Subject Module#

File: lib/imap/subject.ml (new)

(** {0 Base Subject Extraction}

    Implements the base subject algorithm from
    {{:https://datatracker.ietf.org/doc/html/rfc5256#section-2.1}RFC 5256 Section 2.1}.

    The base subject is used for:
    - SORT by SUBJECT criterion
    - ORDEREDSUBJECT threading algorithm
    - REFERENCES threading subject grouping (step 5)

    {1 Algorithm Steps}

    Per RFC 5256 Section 2.1:

    {ol
    {- Convert RFC 2047 encoded-words to UTF-8, normalize whitespace}
    {- Remove trailing subj-trailer (e.g., "(fwd)")}
    {- Remove leading subj-leader (e.g., "Re:", "Fw: [list]")}
    {- Remove leading subj-blob if non-empty base remains}
    {- Repeat steps 3-4}
    {- Unwrap [fwd: ... ] wrapper}
    {- Result is the base subject}}
*)

(** Extract base subject from a subject line.

    @param subject The original subject string (may contain RFC 2047 encoding)
    @return The normalized base subject for sorting/threading

    Example:
    {[
      base_subject "Re: [ocaml] Fwd: Meeting notes"
      (* Returns: "Meeting notes" *)
    ]} *)
val base_subject : string -> string

(** Check if a subject indicates a reply or forward.

    Per RFC 5256, a message is a reply/forward if base_subject extraction
    removed any of:
    - A subj-refwd (re:, fw:, fwd:)
    - A "(fwd)" trailer
    - A [fwd: ... ] wrapper

    This is used in REFERENCES threading step 5.C.iv. *)
val is_reply_or_forward : string -> bool

Implementation notes:

  • ABNF from RFC 5256 Section 5:
    • subj-refwd = ("re" / ("fw" ["d"])) *WSP [subj-blob] ":"
    • subj-blob = "[" *BLOBCHAR "]" *WSP
    • subj-trailer = "(fwd)" / WSP
    • subj-fwd-hdr = "[fwd:"
    • subj-fwd-trl = "]"

Task 2.2: Add RFC 2047 Decoding Support#

File: lib/imap/mime.ml (new)

(** {0 MIME Utilities}

    RFC 2047 encoded-word decoding for internationalized headers.

    @see <https://datatracker.ietf.org/doc/html/rfc2047> RFC 2047 *)

(** Decode RFC 2047 encoded-words in a header value.

    Handles patterns like:
    - =?UTF-8?B?base64data?=
    - =?ISO-8859-1?Q?quoted=20printable?=

    @param value Header value potentially containing encoded-words
    @return Decoded UTF-8 string *)
val decode_header : string -> string

Priority 3: Sent Date Handling#

Task 3.1: Create Date Normalization Module#

File: lib/imap/date.ml (new)

(** {0 IMAP Date Handling}

    Date parsing and normalization as specified in
    {{:https://datatracker.ietf.org/doc/html/rfc5256#section-2.2}RFC 5256 Section 2.2}.

    {1 Sent Date}

    The "sent date" is the Date: header value normalized to UTC:
    - Valid date with timezone: normalize to UTC
    - Invalid timezone: treat as UTC
    - Invalid time: treat as 00:00:00
    - Missing/unparseable: use INTERNALDATE

    {1 Comparison}

    When comparing sent dates that match exactly, the message sequence
    number is used as a tie-breaker (per RFC 5256 Section 2.2). *)

(** Parsed date with optional timezone offset *)
type t

(** Parse a Date: header value.

    @param date_header The raw Date: header content
    @return Parsed date or None if unparseable *)
val of_header : string -> t option

(** Parse an INTERNALDATE string (IMAP format).

    @param internaldate The INTERNALDATE value from server
    @return Parsed date *)
val of_internaldate : string -> t

(** Get sent date for a message.

    Per RFC 5256: uses Date: header if valid, falls back to INTERNALDATE.

    @param date_header Optional Date: header value
    @param internaldate INTERNALDATE value
    @return Normalized sent date *)
val sent_date : date_header:string option -> internaldate:string -> t

(** Compare two dates.

    @return negative if a < b, 0 if equal, positive if a > b *)
val compare : t -> t -> int

Priority 4: Unicode Collation#

Task 4.1: Implement i;unicode-casemap#

File: lib/imap/collation.ml (new)

(** {0 String Collation}

    Collation algorithms for IMAP SORT/THREAD as specified in
    {{:https://datatracker.ietf.org/doc/html/rfc5256#section-7}RFC 5256 Section 7}.

    Servers MUST implement i;unicode-casemap collation per
    {{:https://datatracker.ietf.org/doc/html/rfc5051}RFC 5051}.

    {1 i;unicode-casemap}

    Simple Unicode case-insensitive comparison:
    - Apply Unicode Simple Case Folding
    - Compare resulting strings as binary

    This is required for I18NLEVEL=1 compliance. *)

(** Case-insensitive Unicode comparison.

    @return negative if a < b, 0 if equal, positive if a > b *)
val compare_unicode_casemap : string -> string -> int

(** Case-fold a string for comparison.

    Applies Unicode Simple Case Folding. *)
val casefold : string -> string

Priority 5: Server-Side Implementation#

Task 5.1: Add SORT/THREAD to Server Protocol#

File: lib/imapd/protocol.ml

Add Sort and Thread to the command type:

| Sort of { charset : string; criteria : sort_criterion list; search : search_key }
| Thread of { algorithm : thread_algorithm; charset : string; search : search_key }

Task 5.2: Implement Server SORT Handler#

File: lib/imapd/client.ml

(** Handle SORT command per RFC 5256. *)
let handle_sort state ~charset ~criteria ~search ~uid =
  (* 1. Search for matching messages *)
  let matches = search_messages state.storage state.mailbox search in

  (* 2. Fetch sort keys for each message *)
  let keyed = List.map (fun msg ->
    let keys = List.map (extract_sort_key msg) criteria in
    (msg, keys)
  ) matches in

  (* 3. Sort by criteria *)
  let sorted = List.sort (compare_by_criteria criteria) keyed in

  (* 4. Return sequence numbers or UIDs *)
  let ids = List.map (fun (msg, _) ->
    if uid then msg.uid else Int64.of_int msg.seq
  ) sorted in

  (* 5. Send SORT response *)
  send_response (Sort_response ids)

Task 5.3: Implement ORDEREDSUBJECT Algorithm#

File: lib/imapd/thread.ml (new)

(** ORDEREDSUBJECT threading per RFC 5256.

    Algorithm:
    1. Sort messages by (base_subject, sent_date)
    2. Group by base_subject
    3. First message of each group is root
    4. Subsequent messages are children of root
    5. Sort groups by sent_date of first message *)
let orderedsubject messages =
  (* ... *)

Task 5.4: Implement REFERENCES Algorithm#

File: lib/imapd/thread.ml

(** REFERENCES threading per RFC 5256.

    This is the JWZ threading algorithm with 6 main steps:

    1. Build References relationships
    2. Gather parentless messages under root
    3. Prune dummy messages
    4. Sort root children by sent date
    5. Group by base subject
    6. Sort all sibling sets by sent date *)
let references messages =
  (* Step 1: Build id_table and parent/child links *)
  let id_table = Hashtbl.create (List.length messages) in
  (* ... 6 steps per RFC 5256 ... *)

Priority 6: Client API#

Task 6.1: Add Thread Methods to Client#

File: lib/imap/client.ml

val thread : t -> algorithm:Thread.algorithm -> charset:string -> Search.t -> int64 Thread.t
(** [thread client ~algorithm ~charset search] threads matching messages.
    Requires THREAD=algorithm capability. *)

val uid_thread : t -> algorithm:Thread.algorithm -> charset:string -> Search.t -> int64 Thread.t
(** [uid_thread client ~algorithm ~charset search] like {!thread} but returns UIDs. *)

Priority 7: Capability Advertisement#

Task 7.1: Update Server Capabilities#

File: lib/imapd/client.ml

Add to capability list:

let capabilities = [
  (* ... existing ... *)
  "SORT";
  "THREAD=ORDEREDSUBJECT";
  "THREAD=REFERENCES";
]

OCamldoc Citation Templates#

For thread.ml/thread.mli#

(** {0 RFC 5256 THREAD Extension}

    Message threading algorithms as specified in
    {{:https://datatracker.ietf.org/doc/html/rfc5256}RFC 5256}.

    @see <https://datatracker.ietf.org/doc/html/rfc5256#section-3> THREAD Command
    @see <https://datatracker.ietf.org/doc/html/rfc5256#section-5> Formal Syntax *)

For subject.ml/subject.mli#

(** {0 Base Subject Extraction}

    Implements {{:https://datatracker.ietf.org/doc/html/rfc5256#section-2.1}RFC 5256 Section 2.1}.

    @see <https://datatracker.ietf.org/doc/html/rfc5256#section-2.1> Base Subject Algorithm
    @see <https://datatracker.ietf.org/doc/html/rfc5256#section-5> ABNF for subj-* rules *)

For sort.ml/sort.mli (enhancement)#

(** {0 SORT Criteria}

    Sort keys for the SORT command as specified in
    {{:https://datatracker.ietf.org/doc/html/rfc5256}RFC 5256}.

    @see <https://datatracker.ietf.org/doc/html/rfc5256#section-3> SORT Command
    @see <https://datatracker.ietf.org/doc/html/rfc5256#section-2.1> Base Subject (for SUBJECT key)
    @see <https://datatracker.ietf.org/doc/html/rfc5256#section-2.2> Sent Date (for DATE key) *)

For date.ml/date.mli#

(** {0 IMAP Date Handling}

    Date parsing for SORT/THREAD as specified in
    {{:https://datatracker.ietf.org/doc/html/rfc5256#section-2.2}RFC 5256 Section 2.2}.

    @see <https://datatracker.ietf.org/doc/html/rfc5256#section-2.2> Sent Date definition
    @see <https://datatracker.ietf.org/doc/html/rfc2822#section-3.3> Date-Time format *)

For collation.ml/collation.mli#

(** {0 String Collation}

    Implements i;unicode-casemap collation per
    {{:https://datatracker.ietf.org/doc/html/rfc5051}RFC 5051}.

    Required for {{:https://datatracker.ietf.org/doc/html/rfc5256#section-7}RFC 5256 Section 7}
    (Internationalization Considerations).

    @see <https://datatracker.ietf.org/doc/html/rfc5051> i;unicode-casemap
    @see <https://datatracker.ietf.org/doc/html/rfc5255> IMAP I18N (I18NLEVEL=1) *)

Priority Summary#

Priority Task Effort Dependencies
P1 Thread module types Low None
P1 THREAD command/response types Low Thread types
P1 THREAD serialization/parsing Medium Command types
P2 Base subject extraction Medium None
P2 RFC 2047 decoding Medium None
P3 Sent date normalization Medium None
P4 Unicode collation Medium None
P5 Server SORT handler High P2, P3, P4
P5 Server ORDEREDSUBJECT Medium P2, P3
P5 Server REFERENCES High P2, P3, P5.1
P6 Client thread API Low P1
P7 Capability advertisement Low P5

Testing Requirements#

Unit Tests#

(* test/test_subject.ml *)
let test_base_subject () =
  (* RFC 5256 examples *)
  assert (Subject.base_subject "Re: test" = "test");
  assert (Subject.base_subject "Re: Re: test" = "test");
  assert (Subject.base_subject "[PATCH] Re: [ocaml] test" = "test");
  assert (Subject.base_subject "Fwd: [Fwd: test]" = "test");
  assert (Subject.base_subject "[fwd: wrapped]" = "wrapped");
  assert (Subject.base_subject "  spaced  " = "spaced")

(* test/test_thread.ml *)
let test_orderedsubject () =
  (* Test grouping by base subject *)
  (* Test sent date ordering within groups *)
  (* Test group ordering by first message date *)

let test_references () =
  (* Test parent/child from References header *)
  (* Test In-Reply-To fallback *)
  (* Test duplicate Message-ID handling *)
  (* Test loop prevention *)
  (* Test dummy message pruning *)
  (* Test subject grouping (step 5) *)

Integration Tests#

(* test/integration/test_sort_thread.ml *)
let test_sort_command () =
  (* SORT (SUBJECT) UTF-8 ALL *)
  (* SORT (DATE REVERSE SUBJECT) UTF-8 SINCE 1-Jan-2020 *)
  (* UID SORT (FROM) UTF-8 ALL *)

let test_thread_command () =
  (* THREAD ORDEREDSUBJECT UTF-8 ALL *)
  (* THREAD REFERENCES UTF-8 SINCE 1-Jan-2020 *)
  (* UID THREAD REFERENCES UTF-8 ALL *)

References#

  • {{:https://datatracker.ietf.org/doc/html/rfc5256}RFC 5256} - IMAP SORT and THREAD Extensions
  • {{:https://datatracker.ietf.org/doc/html/rfc5051}RFC 5051} - i;unicode-casemap Collation
  • {{:https://datatracker.ietf.org/doc/html/rfc5255}RFC 5255} - IMAP Internationalization
  • {{:https://datatracker.ietf.org/doc/html/rfc2047}RFC 2047} - MIME Encoded-Words
  • {{:https://datatracker.ietf.org/doc/html/rfc2822}RFC 2822} - Internet Message Format
  • {{:http://www.jwz.org/doc/threading.html}JWZ Threading} - Original REFERENCES algorithm