Kieran's opinionated (and probably slightly dumb) nix config

feat: add template workflow

dunkirk.sh 19cd4823 86560cce

+218 -8
+144
.github/workflows/deploy-service.yml
```yaml
name: Deploy Service (reusable)

on:
  workflow_call:
    inputs:
      service:
        required: true
        type: string
        description: "Service name (matches atelier.services.<name>)"
      host:
        required: false
        type: string
        default: terebithia
        description: "Tailscale hostname to deploy to"
      health_url:
        required: false
        type: string
        description: "URL to check after deploy (omit to skip health check)"
      branch:
        required: false
        type: string
        default: main
      db_path:
        required: false
        type: string
        description: "SQLite DB path for pre-deploy snapshot (e.g. /var/lib/cachet/data/cachet.db)"
    secrets:
      TS_OAUTH_CLIENT_ID:
        required: true
      TS_OAUTH_SECRET:
        required: true

jobs:
  deploy:
    runs-on: ubuntu-latest
    environment:
      name: production
      url: ${{ inputs.health_url }}

    concurrency:
      group: deploy-${{ inputs.service }}
      cancel-in-progress: false

    steps:
      - name: Setup Tailscale
        uses: tailscale/github-action@v3
        with:
          oauth-client-id: ${{ secrets.TS_OAUTH_CLIENT_ID }}
          oauth-secret: ${{ secrets.TS_OAUTH_SECRET }}
          tags: tag:deploy
          use-cache: "true"

      - name: Configure SSH
        run: |
          mkdir -p ~/.ssh
          echo "StrictHostKeyChecking accept-new" >> ~/.ssh/config

      - name: Deploy
        run: |
          ssh ${{ inputs.service }}@${{ inputs.host }} << 'EOF'
          set -e
          cd ~/app

          git fetch --all
          git rev-parse HEAD > /tmp/${{ inputs.service }}-prev-commit

          # snapshot SQLite DB before any changes
          DB_PATH="${{ inputs.db_path }}"
          if [ -n "$DB_PATH" ] && [ -f "$DB_PATH" ]; then
            echo ":: snapshotting $DB_PATH"
            cp "$DB_PATH" "$DB_PATH.pre-deploy"
          fi

          git reset --hard origin/${{ inputs.branch }}
          bun install --frozen-lockfile
          sudo /run/current-system/sw/bin/systemctl restart ${{ inputs.service }}.service
          EOF

      - name: Health check
        if: inputs.health_url != ''
        run: |
          for i in $(seq 1 12); do
            echo ":: attempt $i/12"
            HTTP_CODE=$(curl -sf -o /dev/null -w "%{http_code}" "${{ inputs.health_url }}" 2>/dev/null || echo "000")

            if [ "$HTTP_CODE" = "200" ]; then
              echo ":: ${{ inputs.service }} is healthy"
              exit 0
            fi

            echo ":: HTTP $HTTP_CODE — retrying in 5s"
            [ $i -lt 12 ] && sleep 5
          done
          echo ":: health check failed after 60s"
          exit 1

      - name: Check systemd status
        if: inputs.health_url == ''
        run: |
          for i in $(seq 1 6); do
            echo ":: attempt $i/6"
            STATUS=$(ssh ${{ inputs.service }}@${{ inputs.host }} \
              "systemctl is-active ${{ inputs.service }}.service" 2>/dev/null || echo "unknown")

            if [ "$STATUS" = "active" ]; then
              echo ":: ${{ inputs.service }} is active"
              exit 0
            fi

            echo ":: status: $STATUS — retrying in 5s"
            [ $i -lt 6 ] && sleep 5
          done
          echo ":: service not active after 30s"
          exit 1

      - name: Rollback on failure
        if: failure()
        run: |
          ssh ${{ inputs.service }}@${{ inputs.host }} << 'EOF'
          set -e
          cd ~/app

          PREV=$(cat /tmp/${{ inputs.service }}-prev-commit 2>/dev/null || echo "")
          if [ -z "$PREV" ]; then
            echo ":: no previous commit recorded, cannot rollback"
            exit 1
          fi

          echo ":: rolling back to $PREV"

          # restore DB snapshot if one exists
          DB_PATH="${{ inputs.db_path }}"
          if [ -n "$DB_PATH" ] && [ -f "$DB_PATH.pre-deploy" ]; then
            echo ":: restoring DB snapshot"
            sudo /run/current-system/sw/bin/systemctl stop ${{ inputs.service }}.service || true
            cp "$DB_PATH.pre-deploy" "$DB_PATH"
          fi

          git reset --hard "$PREV"
          bun install --frozen-lockfile
          sudo /run/current-system/sw/bin/systemctl restart ${{ inputs.service }}.service

          echo ":: rolled back ${{ inputs.service }} to $PREV"
          EOF
```
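The Health check and Check systemd status steps share one bounded-retry pattern: up to N attempts, 5 s apart, with no sleep after the final attempt. Below is a minimal shell sketch that factors this pattern into a reusable function. `retry` and `http_ok` are hypothetical names and are not part of the workflow. `http_ok` leaves out curl's `-f` and relies on `-w '%{http_code}'` alone, which prints `000` when no response arrives.

```shell
#!/bin/sh
# Sketch of the bounded-retry loop used by the Health check and
# Check systemd status steps. Function names are hypothetical.

# retry ATTEMPTS DELAY CMD [ARGS...]
# Runs CMD until it exits 0, at most ATTEMPTS times, sleeping DELAY
# seconds between failures (never after the final attempt).
retry() {
  local attempts=$1 delay=$2 i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    echo ":: attempt $i/$attempts" >&2
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  echo ":: giving up after $attempts attempts" >&2
  return 1
}

# http_ok URL: succeeds only when the URL answers with HTTP 200.
# curl prints the status code via -w; it prints 000 when no response
# is received (connection refused, DNS failure, timeout).
http_ok() {
  local code
  code=$(curl -s -o /dev/null -w '%{http_code}' "$1") || true
  [ "$code" = "200" ]
}
```

With these helpers, the two steps correspond roughly to `retry 12 5 http_ok https://cachet.dunkirk.sh/health` and `retry 6 5 ssh cachet@terebithia systemctl is-active --quiet cachet.service`. This works because `ssh` exits with the remote command's status.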
+69 -1
README.md
````diff
@@ -246,9 +246,77 @@
 atuin sync
 ```
 
+## Deployment
+
+Two deploy paths: **infrastructure** (NixOS config changes in this repo) and **application code** (per-service repos).
+
+### Infrastructure
+
+Pushing to `main` here triggers `.github/workflows/deploy.yaml`, which runs `deploy-rs` over Tailscale to rebuild NixOS on the target machine.
+
+```sh
+# manual deploy
+nix run 'github:serokell/deploy-rs' -- --remote-build --ssh-user kierank .
+```
+
+### Application code
+
+Each service repo has a minimal workflow calling the reusable `.github/workflows/deploy-service.yml`. On push to `main`:
+
+1. Connects to Tailscale (`tag:deploy`)
+2. SSHes as the **service user** (e.g., `cachet@terebithia`) via Tailscale SSH
+3. Snapshots the SQLite DB (if `db_path` is provided)
+4. `git fetch` + `git reset --hard origin/<branch>` + `bun install --frozen-lockfile` + `sudo systemctl restart`
+5. Health check (HTTP URL or systemd status fallback)
+6. Auto-rollback on failure (restores DB snapshot + reverts to previous commit)
+
+Per-app workflow — copy and change the `with:` values:
+
+```yaml
+name: Deploy
+on:
+  push:
+    branches: [main]
+  workflow_dispatch:
+jobs:
+  deploy:
+    uses: taciturnaxolotl/dots/.github/workflows/deploy-service.yml@main
+    with:
+      service: cachet
+      health_url: https://cachet.dunkirk.sh/health
+      db_path: /var/lib/cachet/data/cachet.db
+    secrets:
+      TS_OAUTH_CLIENT_ID: ${{ secrets.TS_OAUTH_CLIENT_ID }}
+      TS_OAUTH_SECRET: ${{ secrets.TS_OAUTH_SECRET }}
+```
+
+Omit `health_url` to fall back to `systemctl is-active`. Omit `db_path` for stateless services.
+
+### mkService
+
+`modules/lib/mkService.nix` standardizes service modules. A call to `mkService { ... }` provides:
+
+- Systemd service with initial git clone (subsequent deploys via GitHub Actions)
+- Caddy reverse proxy with TLS via Cloudflare DNS and optional rate limiting
+- Data declarations (`sqlite`, `postgres`, `files`) that feed into automatic backups
+- Dedicated system user with sudo for restart/stop/start/status (enables per-user Tailscale ACLs)
+- Port conflict detection, security hardening, agenix secrets
+
+Adding a new service: create a module in `modules/nixos/services/`, enable it in `machines/terebithia/default.nix`, and add a deploy workflow to the app repo. See `modules/nixos/services/cachet.nix` for a minimal example.
+
+### Secrets (agenix)
+
+Secrets are encrypted in `secrets/*.age` and declared in `secrets/secrets.nix`. Referenced as `config.age.secrets.<name>.path` — decrypted at activation time to `/run/agenix/`.
+
+```sh
+cd secrets && agenix -e myapp.age # create/edit a secret
+```
+
 ## Backups
 
-Services are automatically backed up nightly using restic to Backblaze B2. The `atelier-backup` CLI provides an interactive TUI for managing backups:
+Services are automatically backed up nightly using restic to Backblaze B2. Backup targets are auto-discovered from `data.sqlite`/`data.postgres`/`data.files` declarations in mkService modules.
+
+The `atelier-backup` CLI provides an interactive TUI for managing backups:
 
 ```bash
 sudo atelier-backup # Interactive menu
````
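In the deploy steps listed above, step 3 (snapshot before deploying) and step 6 (restore on failure) reduce to copying the DB aside and copying it back. The sketch below illustrates this with hypothetical `snapshot_db` and `restore_db` helpers that mirror the workflow's `db_path` handling. It leaves out the service stop that the workflow's rollback performs before restoring.

Like the workflow, the sketch uses a plain `cp`. If a database runs in SQLite's WAL mode, recently committed transactions may still sit in the `-wal` file, and copying the main file alone would not capture them.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the workflow's db_path handling.
# An empty path means a stateless service: both helpers do nothing.

# snapshot_db PATH: copy the DB to PATH.pre-deploy if it exists.
snapshot_db() {
  db=$1
  if [ -n "$db" ] && [ -f "$db" ]; then
    echo ":: snapshotting $db"
    cp "$db" "$db.pre-deploy"
  fi
}

# restore_db PATH: copy PATH.pre-deploy back over PATH if a snapshot exists.
# (The workflow stops the service first so the file is not in use.)
restore_db() {
  db=$1
  if [ -n "$db" ] && [ -f "$db.pre-deploy" ]; then
    echo ":: restoring DB snapshot"
    cp "$db.pre-deploy" "$db"
  fi
}
```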
+5 -7
modules/lib/mkService.nix
```diff
@@ -197,16 +197,14 @@
 
     users.groups.${name} = {};
 
-    # Allow service user to restart their own service
+    # Allow service user to manage their own service (for CI/CD deploys)
     security.sudo.extraRules = [
       {
         users = [ name ];
-        commands = [
-          {
-            command = "/run/current-system/sw/bin/systemctl restart ${name}.service";
-            options = [ "NOPASSWD" ];
-          }
-        ];
+        commands = map (cmd: {
+          command = "/run/current-system/sw/bin/systemctl ${cmd} ${name}.service";
+          options = [ "NOPASSWD" ];
+        }) [ "restart" "stop" "start" "status" ];
       }
     ];
 
```
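The new `map` produces one NOPASSWD rule for each verb. The shell loop below reproduces the command strings it generates. `sudo_commands` is a hypothetical helper, and `cachet` is the example service name from the README.

sudo compares a rule's arguments literally, so the unit must be named exactly as it appears in the rule (`cachet.service`, not `cachet`). This matches how the workflow spells out `${{ inputs.service }}.service`.

```shell
#!/bin/sh
# Print the command each generated sudo rule allows for service NAME,
# mirroring: map (cmd: ...) [ "restart" "stop" "start" "status" ]
sudo_commands() {
  name=$1
  for cmd in restart stop start status; do
    echo "/run/current-system/sw/bin/systemctl $cmd $name.service"
  done
}

sudo_commands cachet
```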