Testing Plan for day10#
Date: 2026-02-03
Status: Proposed
Author: Brainstorming session
Overview#
This document describes the comprehensive testing strategy for day10, covering correctness, reliability, and compatibility. The strategy uses a tiered approach: fast tests on every commit (~2 minutes) and a full suite nightly/on-demand (~30-60 minutes).
Design Principles#
- Real-world testing - Container-based tests that exercise the actual build/docs pipeline
- Fast feedback - Most regressions caught in under 2 minutes
- Comprehensive coverage - Nightly runs verify edge cases and real packages
- Controlled fixtures - Purpose-built mini repositories for specific scenarios
- Fault tolerance verification - Explicit testing of failure modes
Test Architecture#
Two-Tier Strategy#
| Tier | Purpose | Runtime | Trigger |
|---|---|---|---|
| Fast | Catch most regressions | ~2 min | Every commit |
| Full | Comprehensive coverage | ~30-60 min | Nightly, on-demand |
Test Types#
tests/
├── unit/                     # Pure OCaml logic tests
│   ├── solver_test.ml
│   ├── atomic_swap_test.ml
│   └── notifier_test.ml
│
├── integration/
│   └── mini_repo/            # Mini opam-repo fixtures
│       ├── simple_build/
│       ├── dep_chain/
│       └── doc_failure/
│
└── full/
    ├── real_snapshots/       # Real opam-repo snapshots
    ├── fault_injection/      # Infrastructure failure tests
    └── large_scale/          # 50+ package tests
Custom Test Harness#
Rather than alcotest or cram tests, we use a custom harness that:
- Spins up real containers with day10
- Runs against controlled opam-repository fixtures
- Verifies outputs match expectations
- Provides clear failure diagnostics
(* test_harness.mli *)
type test_result =
  | Pass
  | Fail of string
  | Skip of string

type test = {
  name : string;
  run : unit -> test_result;
}

val run_tests : test list -> unit
(** Runs tests, prints results, exits with appropriate code *)
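The interface above leaves the behaviour of `run_tests` open. The following is a minimal sketch of one way to implement it, assuming that an exception escaping a test (for example a failed `assert`) should count as a failure and that any failure yields exit code 1. The `summary` record and the `run_one` and `run_all` helpers are illustrative names, not part of the documented interface.

```ocaml
type test_result =
  | Pass
  | Fail of string
  | Skip of string

type test = {
  name : string;
  run : unit -> test_result;
}

(* Hypothetical tally type; not part of test_harness.mli. *)
type summary = { passed : int; failed : int; skipped : int }

(* Run one test, converting an escaping exception (for example a failed
   [assert]) into [Fail] so that one broken test cannot abort the run. *)
let run_one t =
  try t.run () with exn -> Fail ("exception: " ^ Printexc.to_string exn)

(* Run every test in order, print one line per test, and tally the results. *)
let run_all tests =
  List.fold_left
    (fun acc t ->
      match run_one t with
      | Pass ->
          Printf.printf "PASS %s\n" t.name;
          { acc with passed = acc.passed + 1 }
      | Fail msg ->
          Printf.printf "FAIL %s: %s\n" t.name msg;
          { acc with failed = acc.failed + 1 }
      | Skip reason ->
          Printf.printf "SKIP %s: %s\n" t.name reason;
          { acc with skipped = acc.skipped + 1 })
    { passed = 0; failed = 0; skipped = 0 }
    tests

(* Fits [val run_tests : test list -> unit]: exits 0 only if nothing failed. *)
let run_tests tests =
  let s = run_all tests in
  Printf.printf "%d passed, %d failed, %d skipped\n" s.passed s.failed s.skipped;
  exit (if s.failed = 0 then 0 else 1)
```

Keeping the tally in a separate `run_all` makes the harness itself testable without terminating the process.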
Mini Opam-Repository Fixtures#
Purpose-built test packages for specific scenarios, stored in tests/fixtures/mini-repos/.
Fixture Structure#
tests/fixtures/mini-repos/
├── simple-success/
│   ├── opam-repository/
│   │   └── packages/
│   │       └── test-pkg/
│   │           └── test-pkg.1.0.0/
│   │               └── opam
│   ├── packages.txt
│   └── expected.json
│
├── dependency-chain/
│   └── ... (A → B → C)
│
├── doc-failure/
│   └── ... (build succeeds, docs fail)
│
├── build-failure/
│   └── ... (build fails)
│
├── partial-universe/
│   └── ... (some deps fail, others succeed)
│
└── dependency-update/
    ├── opam-repository-v1/   # pkg-a.1.0.0 resolves to pkg-b.1.0.0
    ├── opam-repository-v2/   # pkg-b.2.0.0 added, pkg-a.1.0.0 now resolves to it
    ├── packages.txt
    └── expected.json
Example Fixture: simple-success#
# packages/test-pkg/test-pkg.1.0.0/opam
opam-version: "2.0"
build: ["dune" "build" "-p" name]
depends: ["ocaml" "dune"]

(* src/lib.ml *)
let hello () = "Hello from test-pkg"

// expected.json
{
  "packages": [
    {
      "name": "test-pkg",
      "version": "1.0.0",
      "build": "success",
      "docs": "success",
      "files": ["index.html", "Test_pkg/index.html"]
    }
  ]
}
Example Fixture: dependency-update#
This fixture tests the core "fresh solving" principle - that day10 picks up dependency changes rather than caching solutions forever.
Structure:
dependency-update/
├── opam-repository-v1/
│   └── packages/
│       ├── pkg-a/pkg-a.1.0.0/opam    # depends: ["pkg-b" {>= "1.0"}]
│       └── pkg-b/pkg-b.1.0.0/opam
├── opam-repository-v2/
│   └── packages/
│       ├── pkg-a/pkg-a.1.0.0/opam    # same as v1
│       └── pkg-b/
│           ├── pkg-b.1.0.0/opam
│           └── pkg-b.2.0.0/opam      # new version added
├── packages.txt                      # pkg-a.1.0.0
└── expected.json
Test sequence:
let test_dependency_update () =
  let output_dir = Temp.create () in
  (* Run 1: Against v1 repository *)
  Day10.batch
    ~opam_repository:(fixture_dir / "opam-repository-v1")
    ~packages:["pkg-a.1.0.0"]
    ~output_dir;
  (* Verify pkg-a docs link to pkg-b.1.0.0; report a Fail rather than raising *)
  let pkg_a_index = read_file (output_dir / "p/pkg-a/1.0.0/index.html") in
  if not (String.is_substring pkg_a_index ~substring:"pkg-b/1.0.0") then
    Fail "pkg-a docs do not reference pkg-b.1.0.0 after the v1 run"
  else begin
    (* Run 2: Against v2 repository (pkg-b.2.0.0 now available) *)
    Day10.batch
      ~opam_repository:(fixture_dir / "opam-repository-v2")
      ~packages:["pkg-a.1.0.0"]
      ~output_dir;
    (* Verify pkg-a docs now link to pkg-b.2.0.0 *)
    let pkg_a_index = read_file (output_dir / "p/pkg-a/1.0.0/index.html") in
    if String.is_substring pkg_a_index ~substring:"pkg-b/2.0.0" then Pass
    else Fail "pkg-a docs still reference old pkg-b version"
  end
This is a key differentiator from ocaml-docs-ci, which would cache the original solution and never pick up pkg-b.2.0.0.
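The test sequence above uses a few helpers that this document does not define: the `/` path operator, `read_file`, and a substring check whose call shape matches the Base/Core `String.is_substring`. The following is a minimal standard-library sketch of what they could look like. The definitions are assumptions made for illustration, not the project's actual helpers.

```ocaml
(* Path concatenation operator, as in [fixture_dir / "opam-repository-v1"].
   Note that this shadows integer division in the scope where it is defined. *)
let ( / ) = Filename.concat

(* Read a whole file into a string, closing the channel even on error. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in_noerr ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Stdlib-only stand-in for a Base-style [String.is_substring]:
   true if [substring] occurs anywhere in [s]. *)
let is_substring s ~substring =
  let n = String.length s and m = String.length substring in
  let rec at i = i + m <= n && (String.sub s i m = substring || at (i + 1)) in
  at 0
```

The substring scan is quadratic in the worst case, which is acceptable for checking a handful of generated HTML pages.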
Scenario Coverage#
| Fixture | Tests |
|---|---|
| simple-success | Basic build and doc generation works |
| dependency-chain | A→B→C builds in correct order |
| doc-failure | Build succeeds, docs fail, old docs preserved |
| build-failure | Build fails, dependents skipped |
| partial-universe | Some packages fail, others succeed independently |
| atomic-swap | Verify .new/.old directory handling |
| recovery | Interrupted swap recovery on restart |
| dependency-update | Re-solve picks up new dependency versions |
Fault Injection#
Testing infrastructure failure handling via container resource limits.
Fault Types#
type fault =
  | OOM_limit of int     (* bytes - container memory limit *)
  | Disk_limit of int    (* bytes - tmpfs size *)
  | Timeout of int       (* seconds - build timeout *)
  | Build_script_fail    (* inject failing build script *)
  | Odoc_fail            (* inject failing odoc *)

val with_fault : fault -> (unit -> 'a) -> 'a
Implementation Strategy#
OOM and Disk limits: Use container cgroup limits directly:
let with_resource_limits ~memory_mb ~disk_mb f =
  (* Container already supports --memory flag *)
  (* Use tmpfs with size limit for disk *)
  let container_args = [
    "--memory"; Printf.sprintf "%dM" memory_mb;
    "--mount"; Printf.sprintf "type=tmpfs,destination=/build,tmpfs-size=%dM" disk_mb;
  ] in
  run_container ~extra_args:container_args f
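The `fault` type expresses limits in bytes, while `with_resource_limits` takes megabytes, so `with_fault` needs a conversion between the two. The sketch below shows one possible mapping from a fault to the extra container arguments used above. The function names `mib_of_bytes` and `container_args_of_fault` are hypothetical, and rounding up to whole MiB is an assumption.

```ocaml
type fault =
  | OOM_limit of int   (* bytes *)
  | Disk_limit of int  (* bytes *)
  | Timeout of int     (* seconds *)
  | Build_script_fail
  | Odoc_fail

(* Convert bytes to whole MiB, rounding up so that a small limit is never
   silently turned into 0M. *)
let mib_of_bytes bytes = (bytes + (1024 * 1024) - 1) / (1024 * 1024)

(* Extra container arguments for a fault. Only the two resource limits map to
   container flags; the remaining faults (timeout, injected scripts) are
   assumed to be handled elsewhere and contribute no arguments here. *)
let container_args_of_fault = function
  | OOM_limit bytes ->
      [ "--memory"; Printf.sprintf "%dM" (mib_of_bytes bytes) ]
  | Disk_limit bytes ->
      [ "--mount";
        Printf.sprintf "type=tmpfs,destination=/build,tmpfs-size=%dM"
          (mib_of_bytes bytes) ]
  | Timeout _ | Build_script_fail | Odoc_fail -> []
```

With this mapping, `OOM_limit (50 * 1024 * 1024)` from the examples below would translate to `--memory 50M`.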
Build/odoc failures: Inject wrapper scripts:
#!/bin/bash
# Fake odoc that always fails
echo "Injected odoc failure" >&2
exit 1
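A test can generate such a wrapper on the fly instead of keeping it in the repository. The sketch below writes a failing script into a directory and computes a `PATH` value in which that directory comes first. Both function names are hypothetical, and how the directory is made visible inside the container (for example through a mount) is left open here.

```ocaml
(* Write an executable wrapper named [name] (for example "odoc") into [dir].
   The script prints [message] to stderr and exits with status 1. The file is
   created with mode 0o755, subject to the process umask. *)
let write_failing_wrapper ~dir ~name ~message =
  let path = Filename.concat dir name in
  let oc = open_out_gen [ Open_wronly; Open_creat; Open_trunc ] 0o755 path in
  Printf.fprintf oc "#!/bin/bash\necho %s >&2\nexit 1\n"
    (Filename.quote message);
  close_out oc;
  path

(* PATH value in which wrappers in [dir] shadow the real tools. *)
let path_with_wrappers ~dir ~path = dir ^ ":" ^ path
```

Because the wrapper shadows the real tool by name, the code under test needs no fault-specific branches.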
Fault Test Examples#
let test_oom_handling () =
  with_fault (OOM_limit (50 * 1024 * 1024)) (fun () ->
    let result = Day10.build ~package:"memory-hog.1.0.0" in
    match result with
    | Error (`Resource_exhausted _) -> Pass
    | _ -> Fail "Expected OOM to be detected"
  )

let test_disk_full_handling () =
  with_fault (Disk_limit (10 * 1024 * 1024)) (fun () ->
    let result = Day10.build ~package:"large-output.1.0.0" in
    match result with
    | Error (`Resource_exhausted _) -> Pass
    | _ -> Fail "Expected disk full to be detected"
  )

let test_timeout_handling () =
  with_fault (Timeout 5) (fun () ->
    let result = Day10.build ~package:"slow-build.1.0.0" in
    match result with
    | Error (`Timeout _) -> Pass
    | _ -> Fail "Expected timeout to be detected"
  )
Notification Testing#
Abstraction layer for testable Zulip integration.
Notifier Interface#
(* notifier.mli *)
type message = {
  stream : string;
  topic : string;
  content : string;
}

type t = {
  send : message -> unit;
}

val zulip : Zulip.Client.t -> t
(** Production notifier using ocaml-zulip *)

val mock : unit -> t * message list ref
(** Returns notifier and ref to collect sent messages *)

val null : t
(** Silent notifier for tests that don't care about notifications *)
Mock Implementation#
let mock () =
  let messages = ref [] in
  let send msg = messages := msg :: !messages in
  ({ send }, messages)
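Because `mock` prepends each message, the collected list is newest-first. That does not matter for tests that expect exactly one message, but tests that check ordering need to reverse it. The sketch below repeats the mock, adds a possible `null` implementation, and adds a hypothetical `sent_in_order` helper that is not part of the interface above.

```ocaml
type message = { stream : string; topic : string; content : string }
type t = { send : message -> unit }

(* Same mock as above: each sent message is pushed onto the front. *)
let mock () =
  let messages = ref [] in
  let send msg = messages := msg :: !messages in
  ({ send }, messages)

(* One possible [null] notifier: discard every message. *)
let null = { send = (fun _ -> ()) }

(* Hypothetical helper: messages in the order they were sent (oldest first). *)
let sent_in_order messages = List.rev !messages
```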
Test Usage#
let test_failure_notification () =
  let notifier, messages = Notifier.mock () in
  (* Run day10 with a package that will fail *)
  Day10.batch
    ~notifier
    ~packages:["will-fail.1.0.0"]
    ~opam_repository:fixtures_dir;
  (* Verify notification was sent *)
  match !messages with
  | [msg] ->
    assert (String.is_substring msg.content ~substring:"build failures");
    assert (String.is_substring msg.content ~substring:"will-fail.1.0.0");
    Pass
  | [] -> Fail "Expected failure notification"
  | _ -> Fail "Expected exactly one notification"
Notification Format Verification#
let test_notification_format () =
  let notifier, messages = Notifier.mock () in
  Day10.batch
    ~notifier
    ~packages:["pkg-a.1.0.0"; "pkg-b.1.0.0"; "pkg-c.1.0.0"]
    ~opam_repository:mixed_results_fixture;
  match !messages with
  | [msg] ->
    (* Verify expected format *)
    assert (String.is_prefix msg.content ~prefix:"📦 day10 run completed");
    assert (String.is_substring msg.content ~substring:"packages built");
    assert (String.is_substring msg.content ~substring:"docs generated");
    Pass
  | _ -> Fail "Expected one summary notification"
Output Validation#
Structure verification for generated documentation.
Validation Types#
type expected_file = {
  path : string;      (* relative path from package doc root *)
  required : bool;    (* false = optional *)
}

type expected_output = {
  package : string;
  version : string;
  status : [ `Success | `Build_fail | `Doc_fail | `Doc_skipped ];
  files : expected_file list;  (* only checked if status = `Success *)
}
Verification Function#
let verify_output ~html_dir expected =
  match expected.status with
  | `Success ->
    let base = html_dir / "p" / expected.package / expected.version in
    List.iter (fun file ->
      let path = base / file.path in
      if not (Sys.file_exists path) then
        if file.required then
          failf "Missing required file: %s" path
        else
          Log.warn "Missing optional file: %s" path
    ) expected.files
  | `Build_fail | `Doc_fail | `Doc_skipped ->
    (* Verify old docs still exist if they existed before *)
    ()
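`verify_output` stops at the first missing required file. For diagnostics it can be more useful to report every missing file at once. The sketch below is one possible variation using only the standard library; the function name `missing_files` is hypothetical.

```ocaml
type expected_file = { path : string; required : bool }

(* Split the expected files that are absent under [base] into
   (missing required, missing optional), preserving the input order. *)
let missing_files ~base files =
  let absent f = not (Sys.file_exists (Filename.concat base f.path)) in
  let missing = List.filter absent files in
  List.partition (fun f -> f.required) missing
```

A caller could fail when the first list is non-empty and log the second list as warnings, which keeps the same pass and fail semantics as the function above.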
Standard File Expectations#
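let standard_doc_files ~package ~has_lib ~has_bin:_ =
  (* Assumes the library's top-level module is named after the package,
     with '-' mapped to '_' (test-pkg -> Test_pkg, as in expected.json).
     has_bin is accepted but not used yet. *)
  let module_dir =
    String.capitalize_ascii (String.map (function '-' -> '_' | c -> c) package)
  in
  let files = [
    { path = "index.html"; required = true };
  ] in
  if has_lib then
    { path = module_dir ^ "/index.html"; required = true } :: files
  else files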
Graceful Degradation Verification#
let test_graceful_degradation () =
  (* Setup: create initial successful docs *)
  let output_dir = Temp.create () in
  Day10.batch
    ~packages:["good-pkg.1.0.0"]
    ~output_dir
    ~opam_repository:success_fixture;
  let original_mtime =
    (Unix.stat (output_dir / "p/good-pkg/1.0.0/index.html")).st_mtime in
  (* Now run with a fixture where this package's docs fail *)
  Day10.batch
    ~packages:["good-pkg.1.0.0"]
    ~output_dir
    ~opam_repository:doc_failure_fixture;
  (* Verify original docs preserved *)
  let new_mtime =
    (Unix.stat (output_dir / "p/good-pkg/1.0.0/index.html")).st_mtime in
  if Float.(original_mtime = new_mtime) then Pass
  else Fail "Docs were modified despite failure"
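The test above compares modification times. On filesystems with coarse timestamp granularity, a quick rewrite may leave the mtime unchanged, so a content fingerprint can serve as an additional check. The sketch below uses the standard library's `Digest` module. It is a swapped-in technique for illustration rather than the approach described above, and the function names are hypothetical.

```ocaml
(* Content fingerprint of a file (MD5 via the standard library's Digest). *)
let fingerprint path = Digest.to_hex (Digest.file path)

(* [unchanged ~before path] is true when the file's current content has the
   fingerprint that was recorded earlier. *)
let unchanged ~before path = String.equal before (fingerprint path)
```

Unlike the mtime check, this treats a rewrite with byte-identical content as "unchanged", so which check fits depends on whether the test is meant to detect any write or only a change of content.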
Real Repository Snapshot Tests#
Compatibility testing against real-world packages.
Snapshot Storage#
tests/fixtures/real-snapshots/
├── README.md              # Documents snapshot selection rationale
├── 2026-01-15/            # Baseline snapshot
│   ├── opam-repository/   # git submodule or tarball
│   ├── packages.txt       # 50 representative packages
│   └── expected.json      # Expected outcomes
└── 2026-02-01/            # Post-update snapshot
    └── ...
Package Selection Criteria#
Each snapshot's packages.txt includes:
# Core packages (must always work)
dune.3.17.0
ocamlfind.1.9.6
cmdliner.1.3.0
# Common dependencies (high fan-in)
fmt.0.9.0
logs.0.7.0
astring.0.8.5
# Complex build scenarios
js_of_ocaml.5.8.2 # ppx + js output
cohttp-eio.6.0.0 # many dependencies
irmin.3.9.0 # large package
# Documentation-heavy
odoc.2.4.3 # self-documenting
lwt.5.7.0 # extensive docs
# Known edge cases
ocaml-variants.5.2.0+ox # compiler variant
conf-pkg-config.3 # conf package
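If the section comments and trailing annotations shown above are literally present in packages.txt, whatever reads the file has to strip them before handing package names to the solver. The sketch below shows one way to do that with the standard library. The function names are hypothetical, and splitting at the first dot relies on package names not containing a dot, which holds for every entry in this list.

```ocaml
(* Drop a trailing "# comment" and surrounding whitespace from one line. *)
let strip_comment line =
  let code =
    match String.index_opt line '#' with
    | Some i -> String.sub line 0 i
    | None -> line
  in
  String.trim code

(* Keep only the non-empty package entries, in file order. *)
let parse_packages lines =
  List.filter (fun l -> l <> "") (List.map strip_comment lines)

(* Split "name.version" at the first dot; [None] if there is no version. *)
let split_name_version pkg =
  match String.index_opt pkg '.' with
  | Some i ->
      Some (String.sub pkg 0 i, String.sub pkg (i + 1) (String.length pkg - i - 1))
  | None -> None
```

For example, `split_name_version "ocaml-variants.5.2.0+ox"` gives `Some ("ocaml-variants", "5.2.0+ox")`, so the variant suffix stays part of the version.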
Snapshot Test Runner#
let test_real_snapshot ~snapshot_dir =
  let packages = read_lines (snapshot_dir / "packages.txt") in
  let expected = Expected.load (snapshot_dir / "expected.json") in
  let result =
    Day10.batch
      ~opam_repository:(snapshot_dir / "opam-repository")
      ~packages
      ~output_dir:(Temp.create ())
  in
  (* List.iter2 raises Invalid_argument if the lists differ in length, so
     expected.json must hold one entry per package, in packages.txt order. *)
  List.iter2 (fun pkg exp ->
    match exp.status, Result.find pkg result with
    | `Success, `Success -> ()
    | `Build_fail, `Build_fail _ -> ()
    | `Doc_fail, `Doc_fail _ -> ()
    | expected, actual ->
      failf "%s: expected %s but got %s" pkg
        (show_status expected) (show_status actual)
  ) packages expected.packages
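The runner above pairs packages with expectations by position and stops at the first mismatch. A variation that looks up results by package name and collects every mismatch is sketched below. It assumes outcomes have already been reduced to a payload-free status, and the name `mismatches` and the string forms returned by `show_status` are illustrative.

```ocaml
type status = [ `Success | `Build_fail | `Doc_fail | `Doc_skipped ]

let show_status : status -> string = function
  | `Success -> "success"
  | `Build_fail -> "build_fail"
  | `Doc_fail -> "doc_fail"
  | `Doc_skipped -> "doc_skipped"

(* Compare expected and actual outcomes by package name and return one
   message per mismatch; an empty list means the snapshot matched. *)
let mismatches ~expected ~actual =
  List.filter_map
    (fun (pkg, exp) ->
      match List.assoc_opt pkg actual with
      | None -> Some (Printf.sprintf "%s: no result reported" pkg)
      | Some act when act = exp -> None
      | Some act ->
          Some
            (Printf.sprintf "%s: expected %s but got %s" pkg (show_status exp)
               (show_status act)))
    expected
```

Looking up by name makes the check independent of the order of entries in expected.json, and a full list of mismatches is easier to triage after a nightly run than a single failure.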
Snapshot Maintenance#
Update snapshots:
- Quarterly (routine refresh)
- On major OCaml release (5.3, etc.)
- On major odoc release
- On significant opam-repository restructuring
Each update requires:
- Create new snapshot directory
- Run full test suite
- Update expected.json with verified outcomes
- Document changes in README.md
Test Execution Strategy#
Tier 1: Fast Tests (Every Commit)#
# .github/workflows/test.yml
fast-tests:
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: opam install . --deps-only
    - run: dune build
    - run: dune runtest # Unit tests
    - run: ./tests/run-mini-repo-tests.sh
      timeout-minutes: 5
Tier 2: Full Suite (Nightly + On-Demand)#
full-tests:
  runs-on: ubuntu-latest
  # workflow_dispatch is included so that manual runs are not skipped
  if: |
    github.event_name == 'schedule' ||
    github.event_name == 'workflow_dispatch' ||
    contains(github.event.head_commit.message, '[full-tests]')
  steps:
    - uses: actions/checkout@v4
    - run: opam install . --deps-only
    - run: dune build
    - run: ./tests/run-full-suite.sh
      timeout-minutes: 90
Test Runner Scripts#
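#!/bin/bash
# tests/run-mini-repo-tests.sh
set -euo pipefail

FIXTURES_DIR="tests/fixtures/mini-repos"
WORK_DIR=$(mktemp -d)
# Single quotes defer expansion until the trap fires; inner quotes guard spaces
trap 'rm -rf "$WORK_DIR"' EXIT

for fixture in "$FIXTURES_DIR"/*/; do
  fixture=${fixture%/}
  name=$(basename "$fixture")
  # Fixtures without a single opam-repository/ (e.g. dependency-update, which
  # has -v1 and -v2) need a dedicated test such as test_dependency_update
  [ -d "$fixture/opam-repository" ] || continue
  echo "=== Testing: $name ==="
  ./_build/install/default/bin/day10 batch \
    --opam-repository "$fixture/opam-repository" \
    --output-dir "$WORK_DIR/$name/output" \
    --cache-dir "$WORK_DIR/$name/cache" \
    "$fixture/packages.txt"
  ./tests/verify-output.exe "$fixture/expected.json" "$WORK_DIR/$name/output"
done

echo "All mini-repo tests passed!"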
Triggering Full Tests#
Three methods:
- Nightly cron - Automatic at 2 AM UTC
- Commit message - Include [full-tests] in commit message
- Manual dispatch - GitHub Actions workflow_dispatch button
Summary#
| Component | Purpose | Tier |
|---|---|---|
| Unit tests | Pure logic (solver, swap, notifications) | Fast |
| Mini-repo fixtures | Integration with controlled packages | Fast |
| Notification mocks | Verify Zulip integration | Fast |
| Output validation | Verify HTML structure | Fast |
| Fault injection | OOM, disk, timeout handling | Full |
| Real snapshots | Compatibility with real packages | Full |
Key Design Decisions#
- Custom test harness over alcotest/cram - better control for container-based testing
- Mini repos + real snapshots - fast iteration plus real-world confidence
- Tiered execution - 2-minute fast tests, 30-60 minute full suite
- Container-based fault injection - realistic resource limit testing
- Abstracted notifications - clean testing without mock HTTP servers
- Structure-only output validation - verify files exist without content diffing
Implementation Priority#
- Test harness infrastructure - Custom runner, fixture loading
- Mini-repo fixtures - Start with simple-success, dependency-chain
- Output validation - File existence checks
- Notification mocks - Abstract notifier interface
- Fault injection - Container resource limits
- Real snapshots - Create first baseline snapshot