SWIM Protocol Library - Usage Guide

This library provides a production-ready implementation of the SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) protocol in OCaml 5. It handles cluster membership, failure detection, and messaging.

Key Features

  • Membership: Automatic discovery and failure detection.
  • Gossip: Efficient state propagation (Alive/Suspect/Dead).
  • Messaging:
    • Broadcast: Gossip-based, eventually consistent delivery of cluster-wide updates.
    • Direct Send: High-throughput point-to-point UDP messaging.
  • Security: AES-256-GCM encryption.
  • Zero-Copy: Optimized buffer management for high performance.

Getting Started

1. Define Configuration

Start with default_config and customize as needed.

open Swim.Types

let config = {
  default_config with
  bind_port = 7946;
  node_name = Some "node-1";
  secret_key = "0123456789abcdef0123456789abcdef"; (* placeholder; must be exactly 32 bytes for AES-256 *)
  encryption_enabled = true;
}
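
Because the key must be exactly 32 bytes, it can help to check its length before building the configuration. The following is a minimal sketch using only the standard library; check_secret_key is a hypothetical helper, not part of this library, and it assumes the key is passed as a raw OCaml string, where String.length counts bytes.

```ocaml
(* Hypothetical helper, not part of the library: reject keys whose
   byte length is not the 32 bytes that AES-256 requires. *)
let aes256_key_length = 32

let check_secret_key (key : string) : (string, string) result =
  let n = String.length key in
  if n = aes256_key_length then Ok key
  else
    Error
      (Printf.sprintf "secret_key must be %d bytes, got %d"
         aes256_key_length n)

let () =
  match check_secret_key "0123456789abcdef0123456789abcdef" with
  | Ok _ -> print_endline "key length ok"
  | Error msg -> prerr_endline msg
```

Running such a check at startup gives a precise error message instead of a generic failure later on.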

2. Create and Start a Cluster Node

Use Cluster.create within an Eio switch.

module Cluster = Swim.Cluster

let () =
  Eio_main.run @@ fun env ->
  Eio.Switch.run @@ fun sw ->
  
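  (* Wrap the Eio environment and switch; assumes the record type is in
     scope (e.g. via [open Swim.Types]) and [config] comes from step 1 *)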
  let env_wrap = { stdenv = env; sw } in
  
  match Cluster.create ~sw ~env:env_wrap ~config with
  | Error `Invalid_key -> failwith "Invalid secret key"
  | Ok cluster ->
      (* Start background daemons (protocol loop, UDP receiver, TCP listener) *)
      Cluster.start cluster;
      
      Printf.printf "Node started!\n%!";
      
      (* Keep running *)
      Eio.Fiber.await_cancel ()

3. Joining a Cluster

To join an existing cluster, you need the address of at least one seed node.

let seed_nodes = ["192.168.1.10:7946"] in
match Cluster.join cluster ~seed_nodes with
| Ok () -> Printf.printf "Joined cluster successfully\n"
| Error `No_seeds_reachable -> Printf.printf "Failed to join cluster\n"
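
Seed addresses are plain "host:port" strings, so a typo only shows up when the join fails. The sketch below validates the format beforehand with the standard library alone; parse_seed is a hypothetical helper, and the sketch assumes IPv4 addresses or hostnames (no bracketed IPv6 literals).

```ocaml
(* Hypothetical helper, not part of the library: split "host:port"
   and check that the port is a decimal number in 1..65535. *)
let parse_seed (s : string) : (string * int, string) result =
  match String.rindex_opt s ':' with
  | None -> Error (Printf.sprintf "%S: expected host:port" s)
  | Some i ->
      let host = String.sub s 0 i in
      let port_str = String.sub s (i + 1) (String.length s - i - 1) in
      let digits_only =
        String.for_all (fun c -> c >= '0' && c <= '9') port_str
      in
      (match int_of_string_opt port_str with
       | Some port
         when digits_only && host <> "" && port > 0 && port < 65536 ->
           Ok (host, port)
       | _ -> Error (Printf.sprintf "%S: invalid host or port" s))

let () =
  List.iter
    (fun s ->
      match parse_seed s with
      | Ok (host, port) -> Printf.printf "seed %s port %d\n" host port
      | Error msg -> prerr_endline msg)
    [ "192.168.1.10:7946"; "192.168.1.11" ]
```

Filtering the seed list through such a check before calling Cluster.join separates configuration mistakes from genuinely unreachable seeds.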

Messaging

Broadcast (Gossip)

Use broadcast to send data to all nodes. This uses the gossip protocol (piggybacking on membership messages). It is bandwidth-efficient, but an update needs several gossip rounds to reach every node, so it has higher latency than a direct send and is only eventually consistent.

Best for: Configuration updates, low-frequency state sync.

Cluster.broadcast cluster 
  ~topic:"config-update" 
  ~payload:"{\"version\": 2}"

Direct Send (Point-to-Point)

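Use send to deliver a message directly to a specific node over UDP. This path is high-throughput and low-latency, but UDP itself guarantees neither delivery nor ordering, so add acknowledgements or retries at the application level when a message must arrive.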

Best for: RPC, high-volume data transfer, direct coordination.

(* Send by Node ID *)
let target_node_id = node_id_of_string "node-2" in
Cluster.send cluster 
  ~target:target_node_id 
  ~topic:"ping" 
  ~payload:"pong"

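(* Send by Address (if Node ID unknown) *)
(* of_raw takes the raw address bytes; each \ddd escape is one decimal
   byte, so this string encodes 192.168.1.10 *)
let addr = `Udp (Eio.Net.Ipaddr.of_raw "\192\168\001\010", 7946) in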
Cluster.send_to_addr cluster 
  ~addr 
  ~topic:"alert" 
  ~payload:"alert-data"
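
Writing the raw address bytes by hand is error-prone. As a rough sketch, the conversion from dotted-quad notation to the 4-byte string expected by Eio.Net.Ipaddr.of_raw could look like this; raw_of_dotted is a hypothetical helper and handles IPv4 only.

```ocaml
(* Hypothetical helper, not part of the library: turn "192.168.1.10"
   into the 4-byte string "\192\168\001\010". Returns None on bad input.
   Note: int_of_string_opt also accepts forms such as "0x10"; tighten
   the parsing if the input is untrusted. *)
let raw_of_dotted (s : string) : string option =
  match String.split_on_char '.' s |> List.map int_of_string_opt with
  | [ Some a; Some b; Some c; Some d ]
    when List.for_all (fun n -> n >= 0 && n <= 255) [ a; b; c; d ] ->
      let octets = [| a; b; c; d |] in
      Some (String.init 4 (fun i -> Char.chr octets.(i)))
  | _ -> None

let () =
  match raw_of_dotted "192.168.1.10" with
  | Some raw -> Printf.printf "encoded %d bytes\n" (String.length raw)
  | None -> prerr_endline "not a dotted-quad IPv4 address"
```

The resulting string can then be passed to Eio.Net.Ipaddr.of_raw in place of the hand-written escape sequence.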

Handling Messages

Register a callback to handle incoming messages (both broadcast and direct).

Cluster.on_message cluster (fun sender topic payload ->
  Printf.printf "Received '%s' from %s: %s\n" 
    topic 
    (node_id_to_string sender.id) 
    payload
)
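
Since a single callback receives every topic, applications with more than a few topics may want to route by topic name. The following is one possible sketch of such a router using a standard Hashtbl; the names make_router, register and dispatch are hypothetical, and the sender type is left polymorphic because the library's node type is not shown in this guide.

```ocaml
(* Hypothetical topic router, not part of the library. A handler gets
   the sender and the payload; the topic selects which handler runs. *)
type 'sender handler = 'sender -> string -> unit

let make_router () : (string, 'sender handler) Hashtbl.t =
  Hashtbl.create 16

let register router ~topic handler = Hashtbl.replace router topic handler

(* Returns true if a handler was registered for the topic. *)
let dispatch router sender topic payload =
  match Hashtbl.find_opt router topic with
  | Some handler ->
      handler sender payload;
      true
  | None -> false

let () =
  let router = make_router () in
  let seen = ref [] in
  register router ~topic:"config-update" (fun sender payload ->
      seen := (sender, payload) :: !seen);
  ignore (dispatch router "node-2" "config-update" "{\"version\": 2}");
  ignore (dispatch router "node-2" "unknown-topic" "ignored");
  assert (!seen = [ ("node-2", "{\"version\": 2}") ])
```

Assuming the callback shape shown above, the router would be wired in by calling dispatch router sender topic payload from inside the function passed to Cluster.on_message.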

Membership Events

Listen for node lifecycle events.

Eio.Fiber.fork ~sw (fun () ->
  let stream = Cluster.events cluster in
  while true do
    match Eio.Stream.take stream with
    | Join node -> Printf.printf "Node joined: %s\n" (node_id_to_string node.id)
    | Leave node -> Printf.printf "Node left: %s\n" (node_id_to_string node.id)
    | Suspect_event node -> Printf.printf "Node suspected: %s\n" (node_id_to_string node.id)
    | Alive_event node -> Printf.printf "Node alive again: %s\n" (node_id_to_string node.id)
    | Update _ -> ()
  done
)
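
Instead of only logging events, an application often folds them into a local view of the cluster. The sketch below illustrates that idea with a stand-in event type that mirrors the constructor names used above but carries a plain string id; the real event and node types come from the library and will differ. Keeping suspected nodes in the view rather than removing them is a design choice of this sketch.

```ocaml
module Ids = Map.Make (String)

(* Stand-in event type for illustration only; the library's events
   carry node records rather than plain string ids. *)
type event =
  | Join of string
  | Leave of string
  | Suspect_event of string
  | Alive_event of string
  | Update of string

type status = Alive | Suspect

(* Fold one event into the local view. Suspected nodes stay in the
   view (marked Suspect) until they leave or are seen alive again. *)
let apply (view : status Ids.t) (ev : event) : status Ids.t =
  match ev with
  | Join id | Alive_event id -> Ids.add id Alive view
  | Suspect_event id -> Ids.add id Suspect view
  | Leave id -> Ids.remove id view
  | Update _ -> view

let () =
  let view =
    List.fold_left apply Ids.empty
      [ Join "node-1"; Join "node-2"; Suspect_event "node-2"; Leave "node-1" ]
  in
  Ids.iter
    (fun id st ->
      Printf.printf "%s: %s\n" id
        (match st with Alive -> "alive" | Suspect -> "suspect"))
    view
(* prints "node-2: suspect" *)
```

In the event loop above, the same fold would be applied to each value taken from the stream, with the view held in a ref or passed through a recursive loop.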

Configuration Options

| Field | Default | Description |
|---|---|---|
| bind_addr | "0.0.0.0" | Interface to bind UDP/TCP listeners. |
| bind_port | 7946 | Port for SWIM protocol. |
| protocol_interval | 1.0 | Seconds between probe rounds. Lower values give faster failure detection at the cost of more bandwidth. |
| probe_timeout | 0.5 | Seconds to wait for an Ack. |
| indirect_checks | 3 | Number of peers to ask for indirect probes. |
| udp_buffer_size | 1400 | Max UDP packet size in bytes (kept below the typical 1500-byte Ethernet MTU). |
| secret_key | (zeros) | 32-byte key for AES-256-GCM. |
| max_gossip_queue_depth | 5000 | Max items in the broadcast queue before the oldest are dropped (bounds memory use). |
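
The timing and size fields interact: in SWIM, a direct probe and any indirect probes happen within one protocol period, so a probe_timeout that is not shorter than protocol_interval leaves no room for the indirect phase. As a hedged sketch, a sanity check over these fields might look as follows. The timing record is a stand-in that reuses the field names from the table; whether the library performs similar validation itself is not stated here.

```ocaml
(* Stand-in record reusing field names from the table above; the real
   configuration record has more fields. *)
type timing = {
  protocol_interval : float; (* seconds *)
  probe_timeout : float; (* seconds *)
  indirect_checks : int;
  udp_buffer_size : int; (* bytes *)
}

let defaults =
  { protocol_interval = 1.0; probe_timeout = 0.5;
    indirect_checks = 3; udp_buffer_size = 1400 }

(* Returns the list of violated rules; an empty list means the values
   look consistent. 65507 is the largest UDP payload over IPv4. *)
let check (t : timing) : string list =
  List.filter_map
    (fun (ok, msg) -> if ok then None else Some msg)
    [ (t.protocol_interval > 0.0, "protocol_interval must be positive");
      (t.probe_timeout > 0.0, "probe_timeout must be positive");
      (t.probe_timeout < t.protocol_interval,
       "probe_timeout should be shorter than protocol_interval");
      (t.indirect_checks >= 0, "indirect_checks must not be negative");
      (t.udp_buffer_size > 0 && t.udp_buffer_size <= 65507,
       "udp_buffer_size must fit in a UDP datagram") ]

let () =
  List.iter prerr_endline (check { defaults with probe_timeout = 2.0 })
```

With the default values the check returns an empty list; raising probe_timeout above protocol_interval triggers the third rule.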

Performance Tips

  1. Buffer Pool: The library uses zero-copy buffer pools. Ensure send_buffer_count and recv_buffer_count are sufficient for your load (default 16).
  2. Gossip Limit: If broadcasting aggressively, max_gossip_queue_depth protects memory but may drop messages. Use Direct Send for high volume.
  3. Eio: Run within an Eio domain/switch. The library is designed for OCaml 5 multicore.
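
To make the drop-oldest behaviour behind max_gossip_queue_depth concrete, here is a minimal sketch of a bounded queue built on the standard Queue module. It illustrates the policy described in tip 2 and is not the library's actual implementation; the bounded type and its functions are hypothetical.

```ocaml
(* Hypothetical bounded queue: once max_depth items are queued, pushing
   a new item discards the oldest one and counts the drop. *)
type 'a bounded = {
  items : 'a Queue.t;
  max_depth : int;
  mutable dropped : int;
}

let create max_depth =
  if max_depth <= 0 then invalid_arg "max_depth must be positive";
  { items = Queue.create (); max_depth; dropped = 0 }

let push q x =
  if Queue.length q.items >= q.max_depth then begin
    ignore (Queue.pop q.items);
    q.dropped <- q.dropped + 1
  end;
  Queue.push x q.items

let () =
  let q = create 2 in
  List.iter (push q) [ "a"; "b"; "c" ];
  (* "a" was the oldest entry, so it is the one that was dropped. *)
  assert (List.of_seq (Queue.to_seq q.items) = [ "b"; "c" ]);
  assert (q.dropped = 1)
```

A drop counter like the one above is a cheap way to notice that broadcasts are being produced faster than gossip can carry them.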