A fork of mtelver's day10 project

Testing Plan for day10

Date: 2026-02-03
Status: Proposed
Author: Brainstorming session

Overview

This document describes the comprehensive testing strategy for day10, covering correctness, reliability, and compatibility. The strategy uses a tiered approach: fast tests on every commit (~2 minutes) and a full suite nightly/on-demand (~30-60 minutes).

Design Principles

  1. Real-world testing - Container-based tests that exercise the actual build/docs pipeline
  2. Fast feedback - Most regressions caught in under 2 minutes
  3. Comprehensive coverage - Nightly runs verify edge cases and real packages
  4. Controlled fixtures - Purpose-built mini repositories for specific scenarios
  5. Fault tolerance verification - Explicit testing of failure modes

Test Architecture

Two-Tier Strategy

| Tier | Purpose                | Runtime    | Trigger            |
|------|------------------------|------------|--------------------|
| Fast | Catch most regressions | ~2 min     | Every commit       |
| Full | Comprehensive coverage | ~30-60 min | Nightly, on-demand |

Test Types

tests/
├── unit/                    # Pure OCaml logic tests
│   ├── solver_test.ml
│   ├── atomic_swap_test.ml
│   └── notifier_test.ml
│
├── integration/
│   └── mini_repo/           # Mini opam-repo fixtures
│       ├── simple_build/
│       ├── dep_chain/
│       └── doc_failure/
│
└── full/
    ├── real_snapshots/      # Real opam-repo snapshots
    ├── fault_injection/     # Infrastructure failure tests
    └── large_scale/         # 50+ package tests

Custom Test Harness

Rather than alcotest or cram tests, we use a custom harness that:

  • Spins up real containers with day10
  • Runs against controlled opam-repository fixtures
  • Verifies outputs match expectations
  • Provides clear failure diagnostics

(* test_harness.mli *)
type test_result =
  | Pass
  | Fail of string
  | Skip of string

type test = {
  name : string;
  run : unit -> test_result;
}

val run_tests : test list -> unit
(** Runs tests, prints results, exits with appropriate code *)
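
The interface above leaves open what happens when a test raises instead of returning a result. The following is a minimal sketch of one possible implementation, not the project's actual harness: `run_one` and `count_failures` are hypothetical helper names, and the choice to report an escaped exception as `Fail` is an assumption.

```ocaml
type test_result =
  | Pass
  | Fail of string
  | Skip of string

type test = {
  name : string;
  run : unit -> test_result;
}

(* Run one test, turning an escaped exception (for example a failed
   [assert]) into [Fail] so a single test cannot abort the whole run. *)
let run_one (t : test) : test_result =
  try t.run () with exn -> Fail ("exception: " ^ Printexc.to_string exn)

(* Run every test in order, print one line per test, and return the
   number of failures. Skipped tests are not counted as failures. *)
let count_failures (tests : test list) : int =
  List.fold_left
    (fun failures t ->
      match run_one t with
      | Pass ->
          Printf.printf "PASS  %s\n" t.name;
          failures
      | Skip reason ->
          Printf.printf "SKIP  %s (%s)\n" t.name reason;
          failures
      | Fail reason ->
          Printf.printf "FAIL  %s: %s\n" t.name reason;
          failures + 1)
    0 tests

(* Matches the signature in test_harness.mli: exit 0 only if nothing failed. *)
let run_tests (tests : test list) : unit =
  exit (if count_failures tests = 0 then 0 else 1)
```

Keeping the counting separate from the call to `exit` lets the harness itself be checked from a unit test.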

Mini Opam-Repository Fixtures

Purpose-built test packages for specific scenarios, stored in tests/fixtures/mini-repos/.

Fixture Structure

tests/fixtures/mini-repos/
├── simple-success/
│   ├── opam-repository/
│   │   └── packages/
│   │       └── test-pkg/
│   │           └── test-pkg.1.0.0/
│   │               └── opam
│   ├── packages.txt
│   └── expected.json
│
├── dependency-chain/
│   └── ...  (A → B → C)
│
├── doc-failure/
│   └── ...  (build succeeds, docs fail)
│
├── build-failure/
│   └── ...  (build fails)
│
├── partial-universe/
│   └── ...  (some deps fail, others succeed)
│
└── dependency-update/
    ├── opam-repository-v1/   # pkg-a.1.0 depends on pkg-b.1.0
    ├── opam-repository-v2/   # pkg-b.2.0 added, pkg-a.1.0 now resolves to it
    ├── packages.txt
    └── expected.json

Example Fixture: simple-success

# packages/test-pkg/test-pkg.1.0.0/opam
opam-version: "2.0"
build: ["dune" "build" "-p" name]
depends: ["ocaml" "dune"]

(* src/lib.ml *)
let hello () = "Hello from test-pkg"

// expected.json
{
  "packages": [
    {
      "name": "test-pkg",
      "version": "1.0.0",
      "build": "success",
      "docs": "success",
      "files": ["index.html", "Test_pkg/index.html"]
    }
  ]
}

Example Fixture: dependency-update

This fixture tests the core "fresh solving" principle - that day10 picks up dependency changes rather than caching solutions forever.

Structure:

dependency-update/
├── opam-repository-v1/
│   └── packages/
│       ├── pkg-a/pkg-a.1.0.0/opam    # depends: ["pkg-b" {>= "1.0"}]
│       └── pkg-b/pkg-b.1.0.0/opam
├── opam-repository-v2/
│   └── packages/
│       ├── pkg-a/pkg-a.1.0.0/opam    # same as v1
│       └── pkg-b/
│           ├── pkg-b.1.0.0/opam
│           └── pkg-b.2.0.0/opam      # new version added
├── packages.txt                       # pkg-a.1.0.0
└── expected.json

Test sequence:

(* String.is_substring is the Base/Core function; (/) is assumed to join
   path segments *)
let test_dependency_update () =
  let output_dir = Temp.create () in

  (* Run 1: Against v1 repository *)
  Day10.batch
    ~opam_repository:(fixture_dir / "opam-repository-v1")
    ~packages:["pkg-a.1.0.0"]
    ~output_dir;

  (* Verify pkg-a docs link to pkg-b.1.0.0 *)
  let pkg_a_index = read_file (output_dir / "p/pkg-a/1.0.0/index.html") in
  assert (String.is_substring pkg_a_index ~substring:"pkg-b/1.0.0");

  (* Run 2: Against v2 repository (pkg-b.2.0.0 now available) *)
  Day10.batch
    ~opam_repository:(fixture_dir / "opam-repository-v2")
    ~packages:["pkg-a.1.0.0"]
    ~output_dir;

  (* Verify pkg-a docs now link to pkg-b.2.0.0 *)
  let pkg_a_index = read_file (output_dir / "p/pkg-a/1.0.0/index.html") in
  if String.is_substring pkg_a_index ~substring:"pkg-b/2.0.0" then Pass
  else Fail "pkg-a docs still reference old pkg-b version"

This is a key differentiator from ocaml-docs-ci, which would cache the original solution and never pick up pkg-b.2.0.0.

Scenario Coverage

| Fixture           | Tests                                            |
|-------------------|--------------------------------------------------|
| simple-success    | Basic build and doc generation works             |
| dependency-chain  | A→B→C builds in correct order                    |
| doc-failure       | Build succeeds, docs fail, old docs preserved    |
| build-failure     | Build fails, dependents skipped                  |
| partial-universe  | Some packages fail, others succeed independently |
| atomic-swap       | Verify .new/.old directory handling              |
| recovery          | Interrupted swap recovery on restart             |
| dependency-update | Re-solve picks up new dependency versions        |

Fault Injection

Testing infrastructure failure handling via container resource limits.

Fault Types

type fault =
  | OOM_limit of int          (* bytes - container memory limit *)
  | Disk_limit of int         (* bytes - tmpfs size *)
  | Timeout of int            (* seconds - build timeout *)
  | Build_script_fail         (* inject failing build script *)
  | Odoc_fail                 (* inject failing odoc *)

val with_fault : fault -> (unit -> 'a) -> 'a

Implementation Strategy

OOM and disk limits: apply the container's memory limit (enforced through cgroups) and mount a size-capped tmpfs as the build directory:

let with_resource_limits ~memory_mb ~disk_mb f =
  let container_args = [
    (* The container runtime already supports a --memory limit *)
    "--memory"; Printf.sprintf "%dM" memory_mb;
    (* A size-capped tmpfs at /build stands in for a full disk *)
    "--mount"; Printf.sprintf "type=tmpfs,destination=/build,tmpfs-size=%dM" disk_mb;
  ] in
  run_container ~extra_args:container_args f
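
The `fault` type carries limits in bytes, while `with_resource_limits` takes megabytes. A minimal sketch of how the two could be bridged is shown here; `mb_of_bytes` and `container_args_of_fault` are hypothetical names, and rounding up to whole megabytes is an assumption. The flags are the same ones used above.

```ocaml
type fault =
  | OOM_limit of int          (* bytes - container memory limit *)
  | Disk_limit of int         (* bytes - tmpfs size *)
  | Timeout of int            (* seconds - build timeout *)
  | Build_script_fail
  | Odoc_fail

(* Round a byte count up to whole megabytes, never below 1 MB. *)
let mb_of_bytes bytes = max 1 ((bytes + (1024 * 1024) - 1) / (1024 * 1024))

(* Extra container arguments for the resource-limit faults. The other
   faults are not expressed as container flags, so they add nothing. *)
let container_args_of_fault = function
  | OOM_limit bytes ->
      [ "--memory"; Printf.sprintf "%dM" (mb_of_bytes bytes) ]
  | Disk_limit bytes ->
      [ "--mount";
        Printf.sprintf "type=tmpfs,destination=/build,tmpfs-size=%dM"
          (mb_of_bytes bytes) ]
  | Timeout _ | Build_script_fail | Odoc_fail -> []
```

Because the function only builds an argument list, it can be unit-tested in the fast tier without starting a container.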

Build/odoc failures: Inject wrapper scripts:

#!/bin/bash
# Fake odoc that always fails
echo "Injected odoc failure" >&2
exit 1

Fault Test Examples

let test_oom_handling () =
  with_fault (OOM_limit (50 * 1024 * 1024)) (fun () ->
    let result = Day10.build ~package:"memory-hog.1.0.0" in
    match result with
    | Error (`Resource_exhausted _) -> Pass
    | _ -> Fail "Expected OOM to be detected"
  )

let test_disk_full_handling () =
  with_fault (Disk_limit (10 * 1024 * 1024)) (fun () ->
    let result = Day10.build ~package:"large-output.1.0.0" in
    match result with
    | Error (`Resource_exhausted _) -> Pass
    | _ -> Fail "Expected disk full to be detected"
  )

let test_timeout_handling () =
  with_fault (Timeout 5) (fun () ->
    let result = Day10.build ~package:"slow-build.1.0.0" in
    match result with
    | Error (`Timeout _) -> Pass
    | _ -> Fail "Expected timeout to be detected"
  )
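
These tests expect `Resource_exhausted` and `Timeout` errors, but the plan does not say how day10 tells them apart. One possible approach, sketched here purely as an assumption, is to classify the exit status and stderr of the build process. The `classify` function, the `failure` type and the `Build_failed` tag are hypothetical. The sketch assumes the timeout is enforced with coreutils `timeout` (which exits with 124) and that an OOM kill typically surfaces as exit code 137 (128 + SIGKILL).

```ocaml
type failure =
  [ `Resource_exhausted of string
  | `Timeout of int
  | `Build_failed of int ]

(* True when [sub] occurs somewhere in [s]. *)
let contains ~sub s =
  let n = String.length sub and m = String.length s in
  let rec go i = i + n <= m && (String.sub s i n = sub || go (i + 1)) in
  go 0

(* Map a finished build to a failure class; [None] means success. *)
let classify ~exit_code ~stderr ~timeout_s : failure option =
  if exit_code = 0 then None
  else if exit_code = 124 then Some (`Timeout timeout_s)
  else if exit_code = 137 then
    Some (`Resource_exhausted "killed by SIGKILL, likely out of memory")
  else if contains ~sub:"No space left on device" stderr then
    Some (`Resource_exhausted "disk full")
  else Some (`Build_failed exit_code)
```

Exit codes alone can be ambiguous (a build script may swallow a child's status), so a real implementation would probably also consult the container runtime's own OOM reporting.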

Notification Testing

Abstraction layer for testable Zulip integration.

Notifier Interface

(* notifier.mli *)
type message = {
  stream : string;
  topic : string;
  content : string;
}

type t = {
  send : message -> unit;
}

val zulip : Zulip.Client.t -> t
(** Production notifier using ocaml-zulip *)

val mock : unit -> t * message list ref
(** Returns notifier and ref to collect sent messages *)

val null : t
(** Silent notifier for tests that don't care about notifications *)

Mock Implementation

let mock () =
  let messages = ref [] in
  let send msg = messages := msg :: !messages in
  ({ send }, messages)
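
One detail of this mock is easy to miss: each message is prepended, so the ref holds the newest message first. The sketch below repeats the types so it stands alone and adds a hypothetical `sent_in_order` helper for tests that care about ordering. The stream and topic values are made up for illustration.

```ocaml
type message = {
  stream : string;
  topic : string;
  content : string;
}

type t = {
  send : message -> unit;
}

let mock () =
  let messages = ref [] in
  let send msg = messages := msg :: !messages in
  ({ send }, messages)

(* Messages in the order they were sent (the ref holds newest first). *)
let sent_in_order messages = List.rev !messages

(* Send two messages and return their contents in sending order. *)
let demo () =
  let notifier, messages = mock () in
  notifier.send { stream = "ci"; topic = "day10"; content = "first" };
  notifier.send { stream = "ci"; topic = "day10"; content = "second" };
  List.map (fun m -> m.content) (sent_in_order messages)
```

Tests that match on `[msg]`, like the ones in this plan, are unaffected; the ordering only matters once a run sends more than one notification.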

Test Usage

let test_failure_notification () =
  let notifier, messages = Notifier.mock () in

  (* Run day10 with a package that will fail *)
  Day10.batch
    ~notifier
    ~packages:["will-fail.1.0.0"]
    ~opam_repository:fixtures_dir;

  (* Verify notification was sent *)
  match !messages with
  | [msg] ->
      assert (String.is_substring msg.content ~substring:"build failures");
      assert (String.is_substring msg.content ~substring:"will-fail.1.0.0");
      Pass
  | [] -> Fail "Expected failure notification"
  | _ -> Fail "Expected exactly one notification"

Notification Format Verification

let test_notification_format () =
  let notifier, messages = Notifier.mock () in

  Day10.batch
    ~notifier
    ~packages:["pkg-a.1.0.0"; "pkg-b.1.0.0"; "pkg-c.1.0.0"]
    ~opam_repository:mixed_results_fixture;

  match !messages with
  | [msg] ->
      (* Verify expected format *)
      assert (String.is_prefix msg.content ~prefix:"📦 day10 run completed");
      assert (String.is_substring msg.content ~substring:"packages built");
      assert (String.is_substring msg.content ~substring:"docs generated");
      Pass
  | _ -> Fail "Expected one summary notification"
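
For reference, here is a minimal sketch of a summary formatter that would satisfy the assertions in both notification tests. The `outcome` type, the `summary` function and the exact wording beyond the asserted substrings are assumptions, not the format day10 actually emits.

```ocaml
type outcome =
  | Built_with_docs
  | Built_docs_failed
  | Build_failed

(* Build a single summary message from per-package outcomes. The
   failure line is added only when at least one build failed. *)
let summary (results : (string * outcome) list) : string =
  let count pred = List.length (List.filter (fun (_, o) -> pred o) results) in
  let built = count (fun o -> o <> Build_failed) in
  let documented = count (fun o -> o = Built_with_docs) in
  let failed =
    List.filter_map
      (fun (pkg, o) -> if o = Build_failed then Some pkg else None)
      results
  in
  let header =
    Printf.sprintf "📦 day10 run completed: %d packages built, %d docs generated"
      built documented
  in
  match failed with
  | [] -> header
  | pkgs ->
      Printf.sprintf "%s\n%d build failures: %s" header (List.length pkgs)
        (String.concat ", " pkgs)
```

Keeping the formatter pure means the message format can be tested directly, with the mock notifier reserved for checking that a message is sent at all.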

Output Validation

Structure verification for generated documentation.

Validation Types

type expected_file = {
  path : string;        (* relative path from package doc root *)
  required : bool;      (* false = optional *)
}

type expected_output = {
  package : string;
  version : string;
  status : [ `Success | `Build_fail | `Doc_fail | `Doc_skipped ];
  files : expected_file list;  (* only checked if status = `Success *)
}

Verification Function

let verify_output ~html_dir expected =
  match expected.status with
  | `Success ->
      let base = html_dir / "p" / expected.package / expected.version in
      List.iter (fun file ->
        let path = base / file.path in
        if not (Sys.file_exists path) then
          if file.required then
            failf "Missing required file: %s" path
          else
            Log.warn "Missing optional file: %s" path
      ) expected.files

  | `Build_fail | `Doc_fail | `Doc_skipped ->
      (* Verify old docs still exist if they existed before *)
      ()
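
As written, `verify_output` stops at the first missing required file. A variant that collects every missing file first can give clearer diagnostics. This is a minimal sketch under that assumption; `missing_files` is a hypothetical helper, and the injectable `exists` predicate is there only so the logic can be tested without touching the filesystem.

```ocaml
type expected_file = {
  path : string;        (* relative path from package doc root *)
  required : bool;      (* false = optional *)
}

(* Return the expected files that are absent under [base], split into
   (missing required paths, missing optional paths). *)
let missing_files ?(exists = Sys.file_exists) ~base files =
  let absent =
    List.filter (fun f -> not (exists (Filename.concat base f.path))) files
  in
  let required, optional = List.partition (fun f -> f.required) absent in
  (List.map (fun f -> f.path) required, List.map (fun f -> f.path) optional)
```

A caller could fail when the first list is non-empty and log a warning for each entry in the second, which keeps the required/optional behaviour of the original function.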

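Standard File Expectations

(* has_bin is accepted but does not add any expected files yet *)
let standard_doc_files ~package ~has_lib ~has_bin:_ =
  (* Module directories use underscores: test-pkg -> Test_pkg *)
  let module_dir =
    String.capitalize_ascii (String.map (function '-' -> '_' | c -> c) package)
  in
  let files = [
    { path = "index.html"; required = true };
  ] in
  if has_lib then
    { path = module_dir ^ "/index.html"; required = true } :: files
  else files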

Graceful Degradation Verification

let test_graceful_degradation () =
  (* Setup: create initial successful docs *)
  let output_dir = Temp.create () in
  Day10.batch
    ~packages:["good-pkg.1.0.0"]
    ~output_dir
    ~opam_repository:success_fixture;

  let original_mtime =
    (Unix.stat (output_dir / "p/good-pkg/1.0.0/index.html")).st_mtime in

  (* Now run with a fixture where this package's docs fail *)
  Day10.batch
    ~packages:["good-pkg.1.0.0"]
    ~output_dir
    ~opam_repository:doc_failure_fixture;

  (* Verify original docs preserved *)
  let new_mtime =
    (Unix.stat (output_dir / "p/good-pkg/1.0.0/index.html")).st_mtime in

  if Float.(original_mtime = new_mtime) then Pass
  else Fail "Docs were modified despite failure"

Real Repository Snapshot Tests

Compatibility testing against real-world packages.

Snapshot Storage

tests/fixtures/real-snapshots/
├── README.md              # Documents snapshot selection rationale
├── 2026-01-15/            # Baseline snapshot
│   ├── opam-repository/   # git submodule or tarball
│   ├── packages.txt       # 50 representative packages
│   └── expected.json      # Expected outcomes
└── 2026-02-01/            # Post-update snapshot
    └── ...

Package Selection Criteria

Each snapshot's packages.txt includes:

# Core packages (must always work)
dune.3.17.0
ocamlfind.1.9.6
cmdliner.1.3.0

# Common dependencies (high fan-in)
fmt.0.9.0
logs.0.7.0
astring.0.8.5

# Complex build scenarios
js_of_ocaml.5.8.2       # ppx + js output
cohttp-eio.6.0.0        # many dependencies
irmin.3.9.0             # large package

# Documentation-heavy
odoc.2.4.3              # self-documenting
lwt.5.7.0               # extensive docs

# Known edge cases
ocaml-variants.5.2.0+ox # compiler variant
conf-pkg-config.3       # conf package
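
Since packages.txt mixes package names with comments and blank lines, whatever reads it has to filter those out before passing the list to day10. A minimal sketch of such a reader follows. `strip_comment`, `parse_packages` and `split_name_version` are hypothetical helpers, and the sketch assumes that `#` never appears inside a package name or version and that package names contain no dot.

```ocaml
(* Drop a trailing "# ..." comment and surrounding whitespace. *)
let strip_comment line =
  let line =
    match String.index_opt line '#' with
    | Some i -> String.sub line 0 i
    | None -> line
  in
  String.trim line

(* One package per line; blank and comment-only lines are ignored. *)
let parse_packages contents =
  String.split_on_char '\n' contents
  |> List.map strip_comment
  |> List.filter (fun line -> line <> "")

(* Split "name.version" at the first dot, e.g. for building the
   p/<name>/<version> output path. Returns None if there is no dot. *)
let split_name_version pkg =
  match String.index_opt pkg '.' with
  | Some i ->
      Some (String.sub pkg 0 i, String.sub pkg (i + 1) (String.length pkg - i - 1))
  | None -> None
```

For example, the line `js_of_ocaml.5.8.2       # ppx + js output` yields the entry `js_of_ocaml.5.8.2`, which splits into `js_of_ocaml` and `5.8.2`.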

Snapshot Test Runner

let test_real_snapshot ~snapshot_dir =
  let packages = read_lines (snapshot_dir / "packages.txt") in
  let expected = Expected.load (snapshot_dir / "expected.json") in

  let result = Day10.batch
    ~opam_repository:(snapshot_dir / "opam-repository")
    ~packages
    ~output_dir:(Temp.create ())
  in

  List.iter2 (fun pkg exp ->
    match exp.status, Result.find pkg result with
    | `Success, `Success -> ()
    | `Build_fail, `Build_fail _ -> ()
    | `Doc_fail, `Doc_fail _ -> ()
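    | expected, actual ->
        (* Expected and actual statuses are different types (only the actual
           one carries failure details), so each needs its own printer. *)
        failf "%s: expected %s but got %s" pkg
          (show_expected_status expected) (show_actual_status actual)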
  ) packages expected.packages
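
The runner above pairs packages with expectations by position, and `List.iter2` raises `Invalid_argument` if the two lists differ in length. A sketch of an alternative that looks results up by package name and collects every mismatch is shown below. The type and function names are hypothetical, and the string payloads on the actual statuses are an assumption about what day10 reports.

```ocaml
type expected_status = [ `Success | `Build_fail | `Doc_fail | `Doc_skipped ]

type actual_status =
  [ `Success | `Build_fail of string | `Doc_fail of string | `Doc_skipped ]

let show_expected : expected_status -> string = function
  | `Success -> "success"
  | `Build_fail -> "build failure"
  | `Doc_fail -> "doc failure"
  | `Doc_skipped -> "docs skipped"

let show_actual : actual_status -> string = function
  | `Success -> "success"
  | `Build_fail msg -> "build failure (" ^ msg ^ ")"
  | `Doc_fail msg -> "doc failure (" ^ msg ^ ")"
  | `Doc_skipped -> "docs skipped"

(* True when the actual status has the same kind as the expected one. *)
let agrees (e : expected_status) (a : actual_status) =
  match (e, a) with
  | `Success, `Success
  | `Build_fail, `Build_fail _
  | `Doc_fail, `Doc_fail _
  | `Doc_skipped, `Doc_skipped -> true
  | _ -> false

(* One diagnostic line per expected package whose result is missing
   or of the wrong kind; an empty list means the snapshot passed. *)
let mismatches ~(expected : (string * expected_status) list)
    ~(actual : (string * actual_status) list) : string list =
  List.filter_map
    (fun (pkg, e) ->
      match List.assoc_opt pkg actual with
      | None -> Some (pkg ^ ": no result reported")
      | Some a when agrees e a -> None
      | Some a ->
          Some
            (Printf.sprintf "%s: expected %s but got %s" pkg (show_expected e)
               (show_actual a)))
    expected
```

Reporting all mismatches at once is useful for a 50-package snapshot, where a single upstream change can flip several outcomes in the same nightly run.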

Snapshot Maintenance

Update snapshots:

  • Quarterly (routine refresh)
  • On major OCaml release (5.3, etc.)
  • On major odoc release
  • On significant opam-repository restructuring

Each update requires:

  1. Create new snapshot directory
  2. Run full test suite
  3. Update expected.json with verified outcomes
  4. Document changes in README.md

Test Execution Strategy

Tier 1: Fast Tests (Every Commit)

# .github/workflows/test.yml
fast-tests:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: opam install . --deps-only
    - run: dune build
    - run: dune runtest              # Unit tests
    - run: ./tests/run-mini-repo-tests.sh
  timeout-minutes: 5

Tier 2: Full Suite (Nightly + On-Demand)

full-tests:
  runs-on: ubuntu-latest
  if: |
    github.event_name == 'schedule' ||
    github.event_name == 'workflow_dispatch' ||
    contains(github.event.head_commit.message, '[full-tests]')
  steps:
    - uses: actions/checkout@v4
    - run: opam install . --deps-only
    - run: dune build
    - run: ./tests/run-full-suite.sh
  timeout-minutes: 90

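Test Runner Scripts

#!/bin/bash
# tests/run-mini-repo-tests.sh

# Stop on errors, unset variables, and failures inside pipelines
set -euo pipefail

FIXTURES_DIR="tests/fixtures/mini-repos"
WORK_DIR=$(mktemp -d)
# Single quotes defer expansion until the trap fires
trap 'rm -rf "$WORK_DIR"' EXIT

for fixture in "$FIXTURES_DIR"/*/; do
  fixture=${fixture%/}   # drop the trailing slash left by the glob
  name=$(basename "$fixture")

  # Multi-step fixtures such as dependency-update have no single
  # opam-repository directory; they are assumed to be run by their own
  # OCaml tests, so skip them here
  [ -d "$fixture/opam-repository" ] || continue

  echo "=== Testing: $name ==="

  ./_build/install/default/bin/day10 batch \
    --opam-repository "$fixture/opam-repository" \
    --output-dir "$WORK_DIR/$name/output" \
    --cache-dir "$WORK_DIR/$name/cache" \
    "$fixture/packages.txt"

  ./tests/verify-output.exe "$fixture/expected.json" "$WORK_DIR/$name/output"
done

echo "All mini-repo tests passed!"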

Triggering Full Tests

Three methods:

  1. Nightly cron - Automatic at 2 AM UTC
  2. Commit message - Include [full-tests] in commit message
  3. Manual dispatch - GitHub Actions workflow_dispatch button

Summary

| Component          | Purpose                                  | Tier |
|--------------------|------------------------------------------|------|
| Unit tests         | Pure logic (solver, swap, notifications) | Fast |
| Mini-repo fixtures | Integration with controlled packages     | Fast |
| Notification mocks | Verify Zulip integration                 | Fast |
| Output validation  | Verify HTML structure                    | Fast |
| Fault injection    | OOM, disk, timeout handling              | Full |
| Real snapshots     | Compatibility with real packages         | Full |

Key Design Decisions

  1. Custom test harness over alcotest/cram - better control for container-based testing
  2. Mini repos + real snapshots - fast iteration plus real-world confidence
  3. Tiered execution - 2-minute fast tests, 30-60 minute full suite
  4. Container-based fault injection - realistic resource limit testing
  5. Abstracted notifications - clean testing without mock HTTP servers
  6. Structure-only output validation - verify files exist without content diffing

Implementation Priority

  1. Test harness infrastructure - Custom runner, fixture loading
  2. Mini-repo fixtures - Start with simple-success, dependency-chain
  3. Output validation - File existence checks
  4. Notification mocks - Abstract notifier interface
  5. Fault injection - Container resource limits
  6. Real snapshots - Create first baseline snapshot
