SWIM Protocol Library - Usage Guide
This library provides a production-ready implementation of the SWIM (Scalable Weakly-consistent Infection-style Process Group Membership) protocol in OCaml 5. It handles cluster membership, failure detection, and messaging.
Key Features
- Membership: Automatic discovery and failure detection.
- Gossip: Efficient state propagation (Alive/Suspect/Dead).
- Messaging:
  - Broadcast: Eventual consistency (gossip-based) for cluster-wide updates.
  - Direct Send: High-throughput point-to-point UDP messaging.
- Security: AES-256-GCM encryption.
- Zero-Copy: Optimized buffer management for high performance.
Getting Started
1. Define Configuration
Start with default_config and customize as needed.
open Swim.Types

let config = {
  default_config with
  bind_port = 7946;
  node_name = Some "node-1";
  (* Placeholder key: exactly 32 bytes, as AES-256 requires. Replace with your own secret. *)
  secret_key = "0123456789abcdef0123456789abcdef";
  encryption_enabled = true;
}
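AES-256 needs exactly 32 bytes of key material, so it can help to check the key length before building the config. The following is a minimal sketch using only the standard library; `check_secret_key` is a hypothetical helper and not part of the library, which reports its own `Invalid_key` error from Cluster.create.

```ocaml
(* Hypothetical helper, not part of the Swim library: verify that a key
   is exactly 32 bytes long, as AES-256 requires. *)
let aes256_key_length = 32

let check_secret_key (key : string) : (string, string) result =
  let len = String.length key in
  if len = aes256_key_length then Ok key
  else
    Error
      (Printf.sprintf "secret_key must be %d bytes, got %d"
         aes256_key_length len)

(* Example: fail early with a readable message instead of at node start. *)
let () =
  match check_secret_key "0123456789abcdef0123456789abcdef" with
  | Ok _ -> print_endline "key length ok"
  | Error msg -> prerr_endline msg
```

Note that String.length counts bytes, not characters, so a key containing multi-byte UTF-8 characters is measured by its encoded size.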
2. Create and Start a Cluster Node
Use Cluster.create within an Eio switch.
module Cluster = Swim.Cluster

let () =
  Eio_main.run @@ fun env ->
  Eio.Switch.run @@ fun sw ->
  (* Create environment wrapper *)
  let env_wrap = { stdenv = env; sw } in
  match Cluster.create ~sw ~env:env_wrap ~config with
  | Error `Invalid_key -> failwith "Invalid secret key"
  | Ok cluster ->
    (* Start background daemons (protocol loop, UDP receiver, TCP listener) *)
    Cluster.start cluster;
    Printf.printf "Node started!\n%!";
    (* Keep running *)
    Eio.Fiber.await_cancel ()
3. Joining a Cluster
To join an existing cluster, you need the address of at least one seed node.
let seed_nodes = ["192.168.1.10:7946"] in
match Cluster.join cluster ~seed_nodes with
| Ok () -> Printf.printf "Joined cluster successfully\n"
| Error `No_seeds_reachable -> Printf.printf "Failed to join cluster\n"
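Seed addresses are plain "host:port" strings, so a malformed entry is easy to miss until the join fails. As a rough sketch, the seeds could be checked up front with a helper like the one below. `parse_seed` and `valid_seeds` are hypothetical names, the helper covers IPv4 addresses and hostnames only, and the library may well do its own parsing internally.

```ocaml
(* Hypothetical helper: split "host:port" at the last ':' and check that
   the port parses as an integer in the range 1..65535. *)
let parse_seed (s : string) : (string * int) option =
  match String.rindex_opt s ':' with
  | None -> None
  | Some i ->
    let host = String.sub s 0 i in
    let port = String.sub s (i + 1) (String.length s - i - 1) in
    (match int_of_string_opt port with
     | Some p when host <> "" && p > 0 && p <= 65535 -> Some (host, p)
     | _ -> None)

(* Keep only the seed strings that parse. *)
let valid_seeds (seeds : string list) : string list =
  List.filter (fun s -> parse_seed s <> None) seeds
```

Filtering the list before calling Cluster.join keeps a typo in one seed from being confused with an unreachable node.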
Messaging
Broadcast (Gossip)
Use broadcast to send data to all nodes. This uses the gossip protocol (piggybacking on membership messages). It is bandwidth-efficient but has higher latency and is eventually consistent.
Best for: Configuration updates, low-frequency state sync.
Cluster.broadcast cluster
  ~topic:"config-update"
  ~payload:"{\"version\": 2}"
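The latency cost comes from how gossip spreads: each update rides along on a limited number of outgoing protocol messages, and SWIM-style implementations size that number in proportion to the logarithm of the cluster size. HashiCorp's memberlist, for example, uses a multiplier times ceil(log10(n + 1)). The sketch below reproduces that heuristic with integer arithmetic. Whether this library uses the same formula is an assumption, and `retransmit_limit` and `mult` are illustrative names.

```ocaml
(* ceil(log10(n + 1)) computed with integers: for n >= 1 this equals the
   number of decimal digits of n, and it is 0 for n <= 0. *)
let log10_ceil (n : int) : int =
  let rec digits acc n = if n <= 0 then acc else digits (acc + 1) (n / 10) in
  digits 0 n

(* Illustrative only: how many times one update would be piggybacked
   before being retired. [mult] is a hypothetical tuning knob. *)
let retransmit_limit ~(mult : int) (n : int) : int =
  mult * log10_ceil n
```

With a multiplier of 4, a 100-node cluster would carry each update on 12 messages under this heuristic, which is why broadcast traffic stays small but takes several protocol rounds to reach everyone.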
Direct Send (Point-to-Point)
Use send to deliver a message directly to a specific node over UDP. This path is high-throughput and low-latency.
Best for: RPC, high-volume data transfer, direct coordination.
(* Send by Node ID *)
let target_node_id = node_id_of_string "node-2" in
Cluster.send cluster
  ~target:target_node_id
  ~topic:"ping"
  ~payload:"pong"

(* Send by Address (if Node ID unknown) *)
let addr = `Udp (Eio.Net.Ipaddr.of_raw "\192\168\001\010", 7946) in
Cluster.send_to_addr cluster
  ~addr
  ~topic:"alert"
  ~payload:"alert-data"
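Eio.Net.Ipaddr.of_raw takes the address as raw bytes (4 for IPv4), which is why the example above spells 192.168.1.10 with decimal escapes. Writing those escapes by hand is error-prone, so a small converter may be useful. This is a sketch using only the standard library; `raw_of_ipv4` is a hypothetical helper, not a function from Eio or from this library.

```ocaml
(* Hypothetical helper: turn "192.168.1.10" into the 4-byte raw string
   that Eio.Net.Ipaddr.of_raw expects for an IPv4 address. *)
let raw_of_ipv4 (s : string) : string option =
  match String.split_on_char '.' s with
  | [ _; _; _; _ ] as parts ->
    (* Each part must be an integer in 0..255. *)
    let octet p =
      match int_of_string_opt p with
      | Some n when n >= 0 && n <= 255 -> Some (Char.chr n)
      | _ -> None
    in
    let bytes = List.filter_map octet parts in
    if List.length bytes = 4 then Some (String.of_seq (List.to_seq bytes))
    else None
  | _ -> None
```

The `Some raw` result can be passed to Eio.Net.Ipaddr.of_raw in place of the hand-written literal, and `None` signals an address that should be rejected before sending.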
Handling Messages
Register a callback to handle incoming messages (both broadcast and direct).
Cluster.on_message cluster (fun sender topic payload ->
  Printf.printf "Received '%s' from %s: %s\n"
    topic
    (node_id_to_string sender.id)
    payload
)
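Because a single callback receives every topic, applications with more than a couple of topics usually want to route by topic name. One possible shape is a small handler table, sketched here with the standard library only. `create_router`, `on_topic` and `dispatch` are hypothetical names, and the sender type is left polymorphic so the sketch does not depend on the library's node record.

```ocaml
(* Hypothetical per-topic router: maps a topic name to a handler that
   receives the sender and the payload. *)
type 'sender router = (string, 'sender -> string -> unit) Hashtbl.t

let create_router () : 'sender router = Hashtbl.create 16

(* Register (or replace) the handler for one topic. *)
let on_topic (r : 'sender router) (topic : string) handler : unit =
  Hashtbl.replace r topic handler

(* Same argument order as the callback above: sender, topic, payload.
   Returns false when no handler is registered for the topic. *)
let dispatch (r : 'sender router) sender (topic : string) (payload : string)
  : bool =
  match Hashtbl.find_opt r topic with
  | Some handler -> handler sender payload; true
  | None -> false
```

The router would then be wired in with something like `Cluster.on_message cluster (fun sender topic payload -> ignore (dispatch router sender topic payload))`, with the boolean result available for logging unknown topics.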
Membership Events
Listen for node lifecycle events.
Eio.Fiber.fork ~sw (fun () ->
  let stream = Cluster.events cluster in
  while true do
    match Eio.Stream.take stream with
    | Join node -> Printf.printf "Node joined: %s\n" (node_id_to_string node.id)
    | Leave node -> Printf.printf "Node left: %s\n" (node_id_to_string node.id)
    | Suspect_event node -> Printf.printf "Node suspected: %s\n" (node_id_to_string node.id)
    | Alive_event node -> Printf.printf "Node alive again: %s\n" (node_id_to_string node.id)
    | Update _ -> ()
  done
)
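Beyond logging, the event stream is typically folded into a local view of which nodes are currently usable. The sketch below shows that fold in isolation. The `event` type here is a stand-in that mirrors the constructor names above but carries plain node names instead of the library's node records, and it omits `Update`.

```ocaml
module Names = Set.Make (String)

(* Stand-in for the library's event type, reduced to node names so the
   sketch is self-contained. The real constructors carry node records. *)
type event =
  | Join of string
  | Leave of string
  | Suspect_event of string
  | Alive_event of string

(* Fold one event into the set of members currently believed reachable.
   Suspected nodes are removed here; that is a policy choice. *)
let apply (live : Names.t) (ev : event) : Names.t =
  match ev with
  | Join n | Alive_event n -> Names.add n live
  | Leave n | Suspect_event n -> Names.remove n live
```

Treating a suspected node as unavailable is the conservative option. An application that prefers to keep routing to suspects until they are confirmed dead would handle `Suspect_event` as a no-op instead.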
Configuration Options
| Field | Default | Description |
|---|---|---|
| bind_addr | "0.0.0.0" | Interface to bind UDP/TCP listeners. |
| bind_port | 7946 | Port for SWIM protocol. |
| protocol_interval | 1.0 | Seconds between probe rounds. Lower = faster failure detection, higher bandwidth. |
| probe_timeout | 0.5 | Seconds to wait for Ack. |
| indirect_checks | 3 | Number of peers to ask for indirect probes. |
| udp_buffer_size | 1400 | Max UDP packet size (MTU). |
| secret_key | (zeros) | 32-byte key for AES-256-GCM. |
| max_gossip_queue_depth | 5000 | Max items in broadcast queue before dropping oldest (prevents leaks). |
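Several of these fields constrain each other: a probe has to time out well within one protocol round, and a packet larger than the path MTU will be fragmented. The sketch below encodes a few such rules of thumb as a sanity check. The `timing` record is a local stand-in for a handful of fields from the table, and the rules are general SWIM and UDP guidance rather than checks this library is known to perform.

```ocaml
(* Local stand-in for a few fields from the table; the real record has
   more fields. *)
type timing = {
  protocol_interval : float;  (* seconds between probe rounds *)
  probe_timeout : float;      (* seconds to wait for an Ack *)
  indirect_checks : int;      (* peers asked for indirect probes *)
  udp_buffer_size : int;      (* max UDP packet size in bytes *)
}

let defaults =
  { protocol_interval = 1.0; probe_timeout = 0.5;
    indirect_checks = 3; udp_buffer_size = 1400 }

(* Collect human-readable problems; an empty list means none were found.
   1472 = 1500-byte Ethernet MTU - 20 (IPv4 header) - 8 (UDP header). *)
let check (t : timing) : string list =
  List.filter_map
    (fun (bad, msg) -> if bad then Some msg else None)
    [ (t.probe_timeout <= 0.0, "probe_timeout must be positive");
      (t.probe_timeout >= t.protocol_interval,
       "probe_timeout should be shorter than protocol_interval");
      (t.indirect_checks < 0, "indirect_checks must not be negative");
      (t.udp_buffer_size > 1472,
       "udp_buffer_size exceeds the UDP payload of a 1500-byte MTU") ]
```

The defaults from the table pass all four checks, which leaves half of each one-second round for indirect probes after a direct probe times out.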
Performance Tips
- Buffer Pool: The library uses zero-copy buffer pools. Ensure send_buffer_count and recv_buffer_count are sufficient for your load (default 16).
- Gossip Limit: If broadcasting aggressively, max_gossip_queue_depth protects memory but may drop messages. Use Direct Send for high volume.
- Eio: Run within an Eio domain/switch. The library is designed for OCaml 5 multicore.
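To make the drop-oldest behaviour of the gossip limit concrete, here is a minimal sketch of a bounded FIFO built on the standard Queue module. It illustrates the policy described for max_gossip_queue_depth and is not the library's actual queue; `push_bounded` is a hypothetical name.

```ocaml
(* Illustrative bounded FIFO: when the queue already holds [max_depth]
   items, the oldest one is discarded to make room for the new one.
   Returns the dropped item, if any. Assumes max_depth >= 1. *)
let push_bounded ~(max_depth : int) (q : 'a Queue.t) (x : 'a) : 'a option =
  let dropped =
    if Queue.length q >= max_depth then Queue.take_opt q else None
  in
  Queue.add x q;
  dropped
```

Memory stays bounded under this policy, but a burst of broadcasts larger than the limit silently loses the earliest ones, which is the reason to prefer Direct Send for high-volume traffic.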