declarative relay deployment on hetzner relay.waow.tech

restructure: indigo and zlay as peer just modules

move from flat deploy/ and infra/ directories with zlay-prefixed recipes
to symmetric indigo/ and zlay/ peer directories, each with its own
justfile, deploy configs, and terraform. shared configs (cluster-issuer,
postgres-values, grafana-ingress) move to shared/deploy/.

recipes are now invoked as `just indigo <recipe>` / `just zlay <recipe>`
using just's mod feature (1.36.0+). no helm values or terraform resources
changed — purely a repo organization change.

key fix: use source_directory() instead of justfile_directory() in module
justfiles, since justfile_directory() returns the root justfile's directory
even when a recipe runs inside a module.
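the wiring this sets up, sketched from the diff (two files shown in one
block for brevity; comments mine):

```just
# justfile (root) — each relay directory becomes a `just` module
mod indigo
mod zlay

# indigo/justfile — resolve paths against this file, not the root:
# justfile_directory() would still point at the repo root here
export KUBECONFIG := source_directory() / "kubeconfig.yaml"
```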

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

+562 -520
+7 -8
.gitignore
··· 1 1 # terraform 2 - infra/**/.terraform/ 3 - infra/**/.terraform.lock.hcl 4 - infra/**/terraform.tfstate 5 - infra/**/terraform.tfstate.backup 6 - infra/**/*.tfvars 7 - !infra/terraform.tfvars.example 2 + **/infra/**/.terraform/ 3 + **/infra/**/.terraform.lock.hcl 4 + **/infra/**/terraform.tfstate 5 + **/infra/**/terraform.tfstate.backup 6 + **/infra/**/*.tfvars 7 + !**/infra/terraform.tfvars.example 8 8 9 9 # kubeconfig (fetched from server) 10 - kubeconfig.yaml 11 - zlay-kubeconfig.yaml 10 + **/kubeconfig.yaml 12 11 13 12 # secrets 14 13 *.secret
+15 -4
README.md
··· 59 59 60 60 ``` 61 61 . 62 + ├── indigo/ # Go relay (indigo) — justfile, deploy configs, terraform 63 + ├── zlay/ # zig relay (zlay) — justfile, deploy configs, terraform 64 + ├── shared/deploy/ # helm values shared by both deployments 65 + ├── scripts/ # uv scripts — firehose, jetstream, backfill 62 66 ├── docs/ # architecture, deployment guide, backfill 63 - ├── scripts/ # uv scripts — firehose, jetstream, backfill 64 - ├── justfile # all commands: deploy, status, logs, backfill, etc. 65 - ├── infra/ # terraform — hetzner server + k3s 66 - └── deploy/ # helm values + k8s manifests 67 + └── justfile # root — `just indigo <recipe>` / `just zlay <recipe>` 68 + ``` 69 + 70 + each relay is a `just` module with symmetric recipes: 71 + 72 + ```bash 73 + just indigo deploy # deploy Go relay 74 + just indigo status # check Go relay pods 75 + just zlay deploy # deploy zig relay 76 + just zlay status # check zig relay pods 77 + just --list # see all available recipes 67 78 ``` 68 79 69 80 ## why
deploy/cluster-issuer.yaml shared/deploy/cluster-issuer.yaml
deploy/collectiondir-servicemonitor.yaml indigo/deploy/collectiondir-servicemonitor.yaml
deploy/collectiondir-values.yaml indigo/deploy/collectiondir-values.yaml
deploy/grafana-ingress.yaml shared/deploy/grafana-ingress.yaml
deploy/ingress.yaml indigo/deploy/ingress.yaml
deploy/jetstream-ingress.yaml indigo/deploy/jetstream-ingress.yaml
deploy/jetstream-servicemonitor.yaml indigo/deploy/jetstream-servicemonitor.yaml
deploy/jetstream-values.yaml indigo/deploy/jetstream-values.yaml
deploy/monitoring-values.yaml indigo/deploy/monitoring-values.yaml
deploy/postgres-values.yaml shared/deploy/postgres-values.yaml
deploy/reconnect-cronjob.yaml indigo/deploy/reconnect-cronjob.yaml
deploy/relay-dashboard.json indigo/deploy/relay-dashboard.json
deploy/relay-servicemonitor.yaml indigo/deploy/relay-servicemonitor.yaml
deploy/relay-values.yaml indigo/deploy/relay-values.yaml
deploy/zlay-dashboard.json zlay/deploy/zlay-dashboard.json
deploy/zlay-ingress.yaml zlay/deploy/zlay-ingress.yaml
deploy/zlay-monitoring-values.yaml zlay/deploy/zlay-monitoring-values.yaml
deploy/zlay-reconnect-cronjob.yaml zlay/deploy/zlay-reconnect-cronjob.yaml
deploy/zlay-servicemonitor.yaml zlay/deploy/zlay-servicemonitor.yaml
deploy/zlay-values.yaml zlay/deploy/zlay-values.yaml
+9 -9
docs/architecture.md
··· 45 45 46 46 relays try to reconnect to PDS hosts when connections drop, but eventually give up after repeated failures (exponential backoff). PDS hosts re-announce themselves to bluesky's relay when they come back online, but not to third-party relays like ours. this causes a natural decay in connected host count over time. 47 47 48 - fix: a k8s CronJob (`deploy/reconnect-cronjob.yaml`) runs every 4 hours, fetching the [community PDS list](https://github.com/mary-ext/atproto-scraping) and sending `requestCrawl` for each host. this can also be run manually via `just reconnect`. 48 + fix: a k8s CronJob (`indigo/deploy/reconnect-cronjob.yaml`) runs every 4 hours, fetching the [community PDS list](https://github.com/mary-ext/atproto-scraping) and sending `requestCrawl` for each host. this can also be run manually via `just indigo reconnect`. 49 49 50 50 ## steady-state specs (indigo relay) 51 51 ··· 80 80 81 81 ### deployment 82 82 83 - separate Hetzner cpx41 in Hillsboro OR (`hil`), independent k3s cluster. all `zlay-*` justfile recipes use `ZLAY_KUBECONFIG`. terraform in `infra/zlay/`. 83 + separate Hetzner cpx41 in Hillsboro OR (`hil`), independent k3s cluster. terraform in `zlay/infra/`. 84 84 85 85 ```bash 86 - just zlay-init # terraform init 87 - just zlay-infra # create server 88 - just zlay-kubeconfig # pull kubeconfig 89 - just zlay-deploy # full deploy (cert-manager, postgres, relay, monitoring) 90 - just zlay-publish # build and push image 91 - just zlay-status # check pods + health 92 - just zlay-logs # tail logs 86 + just zlay init # terraform init 87 + just zlay infra # create server 88 + just zlay kubeconfig # pull kubeconfig 89 + just zlay deploy # full deploy (cert-manager, postgres, relay, monitoring) 90 + just zlay publish-remote # build and push image 91 + just zlay status # check pods + health 92 + just zlay logs # tail logs 93 93 ``` 94 94 95 95 ### collection index backfill
+3 -3
docs/backfill.md
··· 20 20 21 21 ## running the backfill 22 22 23 - the `just backfill` recipe handles port-forwarding and host list extraction: 23 + the `just indigo backfill` recipe handles port-forwarding and host list extraction: 24 24 25 25 ```bash 26 26 # backfill all connected hosts (extracts list from relay automatically) 27 - just backfill 27 + just indigo backfill 28 28 29 29 # backfill from a specific host list with custom batch size 30 - just backfill --hosts /tmp/bsky-shards.txt --batch-size 10 30 + just indigo backfill --hosts /tmp/bsky-shards.txt --batch-size 10 31 31 ``` 32 32 33 33 or run the script directly (requires a port-forward to the collectiondir):
+18 -18
docs/deploying.md
··· 25 25 ```bash 26 26 source .env 27 27 28 - just init # terraform init 29 - just infra # creates a CPX41 in Ashburn (~$30/mo) with k3s via cloud-init 30 - just kubeconfig # waits for k3s, pulls kubeconfig (~2 min) 31 - just deploy # installs cert-manager, postgresql, relay, jetstream, monitoring 28 + just indigo init # terraform init 29 + just indigo infra # creates a CPX41 in Ashburn (~$30/mo) with k3s via cloud-init 30 + just indigo kubeconfig # waits for k3s, pulls kubeconfig (~2 min) 31 + just indigo deploy # installs cert-manager, postgresql, relay, jetstream, monitoring 32 32 ``` 33 33 34 - point a DNS A record at the server IP (`just server-ip`) before running deploy, so the Let's Encrypt HTTP-01 challenge succeeds. 34 + point a DNS A record at the server IP (`just indigo server-ip`) before running deploy, so the Let's Encrypt HTTP-01 challenge succeeds. 35 35 36 36 after deploy, seed the relay with the network's PDS hosts: 37 37 38 38 ```bash 39 - just bootstrap # pulls hosts from upstream + restarts relay so slurper picks them up 39 + just indigo bootstrap # pulls hosts from upstream + restarts relay so slurper picks them up 40 40 ``` 41 41 42 42 ## available commands 43 43 44 44 ```bash 45 - just status # nodes, pods, health check 46 - just logs # tail relay logs 47 - just health # curl the public health endpoint 48 - just reconnect # re-announce all known PDS hosts to the relay 49 - just backfill # backfill collectiondir with full network data 50 - just firehose # consume the firehose (passes args through) 51 - just jetstream # consume the jetstream (passes args through) 52 - just ssh # ssh into the server 53 - just destroy # tear down everything 45 + just indigo status # nodes, pods, health check 46 + just indigo logs # tail relay logs 47 + just indigo health # curl the public health endpoint 48 + just indigo reconnect # re-announce all known PDS hosts to the relay 49 + just indigo backfill # backfill collectiondir with full network data 50 + just indigo firehose # consume the firehose (passes args through) 51 + just indigo jetstream # consume the jetstream (passes args through) 52 + just indigo ssh # ssh into the server 53 + just indigo destroy # tear down everything 54 54 ``` 55 55 56 56 ## maintenance 57 57 58 - a k8s CronJob (`deploy/reconnect-cronjob.yaml`) runs every 4 hours to re-announce PDS hosts to the relay — see [architecture](architecture.md#pds-connection-maintenance) for why this is needed. `just reconnect` runs the same logic manually. 58 + a k8s CronJob (`indigo/deploy/reconnect-cronjob.yaml`) runs every 4 hours to re-announce PDS hosts to the relay — see [architecture](architecture.md#pds-connection-maintenance) for why this is needed. `just indigo reconnect` runs the same logic manually. 59 59 60 60 ## targeted deployments 61 61 62 - `just deploy` deploys everything. for targeted updates: 62 + `just indigo deploy` deploys everything. for targeted updates: 63 63 64 - - `just deploy-monitoring` — only the monitoring stack (prometheus, grafana, dashboards, ServiceMonitors). useful for dashboard changes or prometheus config tweaks without touching the relay. 64 + - `just indigo deploy-monitoring` — only the monitoring stack (prometheus, grafana, dashboards, ServiceMonitors). useful for dashboard changes or prometheus config tweaks without touching the relay.
+301
indigo/justfile
··· 1 + # indigo (Go) relay deployment 2 + # required env vars: HCLOUD_TOKEN, RELAY_DOMAIN, RELAY_ADMIN_PASSWORD, POSTGRES_PASSWORD, LETSENCRYPT_EMAIL 3 + # optional env vars: GRAFANA_DOMAIN (default: relay-metrics.waow.tech), GRAFANA_ADMIN_PASSWORD, JETSTREAM_DOMAIN (default: jetstream.waow.tech) 4 + 5 + export KUBECONFIG := source_directory() / "kubeconfig.yaml" 6 + 7 + # --- infrastructure --- 8 + 9 + # initialize terraform 10 + init: 11 + terraform -chdir=infra init 12 + 13 + # create the hetzner server with k3s 14 + infra: 15 + terraform -chdir=infra apply -var="hcloud_token=$HCLOUD_TOKEN" 16 + 17 + # destroy all infrastructure 18 + destroy: 19 + terraform -chdir=infra destroy -var="hcloud_token=$HCLOUD_TOKEN" 20 + 21 + # get the server IP from terraform 22 + server-ip: 23 + @terraform -chdir=infra output -raw server_ip 24 + 25 + # ssh into the server 26 + ssh: 27 + ssh root@$(just server-ip) 28 + 29 + # --- cluster access --- 30 + 31 + # fetch kubeconfig from the server (run after cloud-init finishes, ~2 min) 32 + kubeconfig: 33 + #!/usr/bin/env bash 34 + set -euo pipefail 35 + IP=$(just server-ip) 36 + echo "fetching kubeconfig from $IP..." 37 + 38 + # wait for k3s to be ready 39 + until ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new root@$IP test -f /run/k3s-ready 2>/dev/null; do 40 + echo " waiting for k3s..." 41 + sleep 5 42 + done 43 + 44 + scp root@$IP:/etc/rancher/k3s/k3s.yaml kubeconfig.yaml 45 + # replace localhost with public IP 46 + if [[ "$(uname)" == "Darwin" ]]; then 47 + sed -i '' "s|127.0.0.1|$IP|g" kubeconfig.yaml 48 + else 49 + sed -i "s|127.0.0.1|$IP|g" kubeconfig.yaml 50 + fi 51 + chmod 600 kubeconfig.yaml 52 + echo "kubeconfig written to kubeconfig.yaml" 53 + kubectl get nodes 54 + 55 + # --- deployment --- 56 + 57 + # deploy everything to the cluster 58 + deploy: 59 + #!/usr/bin/env bash 60 + set -euo pipefail 61 + 62 + helm repo add bjw-s https://bjw-s-labs.github.io/helm-charts 63 + helm repo add bitnami https://charts.bitnami.com/bitnami 64 + helm repo add jetstack https://charts.jetstack.io 65 + helm repo add prometheus-community https://prometheus-community.github.io/helm-charts 66 + helm repo update 67 + 68 + : "${RELAY_DOMAIN:?set RELAY_DOMAIN}" 69 + : "${RELAY_ADMIN_PASSWORD:?set RELAY_ADMIN_PASSWORD}" 70 + : "${POSTGRES_PASSWORD:?set POSTGRES_PASSWORD}" 71 + : "${LETSENCRYPT_EMAIL:?set LETSENCRYPT_EMAIL}" 72 + 73 + echo "==> creating namespace" 74 + kubectl create namespace relay --dry-run=client -o yaml | kubectl apply -f - 75 + 76 + echo "==> installing cert-manager" 77 + helm upgrade --install cert-manager jetstack/cert-manager \ 78 + --namespace cert-manager --create-namespace \ 79 + --set crds.enabled=true \ 80 + --wait 81 + 82 + echo "==> applying cluster issuer" 83 + sed "s|you@example.com|$LETSENCRYPT_EMAIL|g" ../shared/deploy/cluster-issuer.yaml \ 84 + | kubectl apply -f - 85 + 86 + echo "==> installing postgresql" 87 + helm upgrade --install relay-db bitnami/postgresql \ 88 + --namespace relay \ 89 + --values ../shared/deploy/postgres-values.yaml \ 90 + --set auth.password="$POSTGRES_PASSWORD" \ 91 + --wait 92 + 93 + echo "==> creating relay secret" 94 + kubectl create secret generic relay-secret \ 95 + --namespace relay \ 96 + --from-literal=DATABASE_URL="postgres://relay:${POSTGRES_PASSWORD}@relay-db-postgresql.relay.svc.cluster.local:5432/relay" \ 97 + --from-literal=RELAY_ADMIN_PASSWORD="$RELAY_ADMIN_PASSWORD" \ 98 + --dry-run=client -o yaml | kubectl apply -f - 99 + 100 + echo "==> installing relay" 101 + helm upgrade --install relay bjw-s/app-template \ 102 + --namespace relay \ 103 + --values deploy/relay-values.yaml \ 104 + --wait --timeout 5m 105 + 106 + echo "==> applying ingress" 107 + sed "s|RELAY_DOMAIN_PLACEHOLDER|$RELAY_DOMAIN|g" deploy/ingress.yaml \ 108 + | kubectl apply -f - 109 + 110 + GRAFANA_DOMAIN="${GRAFANA_DOMAIN:-relay-metrics.waow.tech}" 111 + 112 + echo "==> installing monitoring stack" 113 + kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f - 114 + kubectl create configmap relay-dashboard \ 115 + --namespace monitoring \ 116 + --from-file=relay-dashboard.json=deploy/relay-dashboard.json \ 117 + --dry-run=client -o yaml | kubectl apply -f - 118 + helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \ 119 + --namespace monitoring \ 120 + --values deploy/monitoring-values.yaml \ 121 + --set grafana.adminPassword="${GRAFANA_ADMIN_PASSWORD:-prom-operator}" \ 122 + --wait --timeout 5m 123 + kubectl apply -f deploy/relay-servicemonitor.yaml 124 + 125 + echo "==> applying grafana ingress" 126 + sed "s|GRAFANA_DOMAIN_PLACEHOLDER|$GRAFANA_DOMAIN|g" ../shared/deploy/grafana-ingress.yaml \ 127 + | kubectl apply -f - 128 + 129 + echo "==> creating collectiondir secret" 130 + kubectl create secret generic collectiondir-secret \ 131 + --namespace relay \ 132 + --from-literal=COLLECTIONS_ADMIN_TOKEN="${COLLECTIONDIR_ADMIN_TOKEN:-}" \ 133 + --dry-run=client -o yaml | kubectl apply -f - 134 + 135 + echo "==> installing collectiondir" 136 + helm upgrade --install collectiondir bjw-s/app-template \ 137 + --namespace relay \ 138 + --values deploy/collectiondir-values.yaml \ 139 + --wait --timeout 5m 140 + kubectl apply -f deploy/collectiondir-servicemonitor.yaml 141 + 142 + echo "==> installing reconnect cronjob" 143 + kubectl apply -f deploy/reconnect-cronjob.yaml 144 + 145 + echo "==> installing jetstream" 146 + JETSTREAM_DOMAIN="${JETSTREAM_DOMAIN:-jetstream.waow.tech}" 147 + helm upgrade --install jetstream bjw-s/app-template \ 148 + --namespace relay \ 149 + --values deploy/jetstream-values.yaml \ 150 + --wait --timeout 5m 151 + 152 + echo "==> applying jetstream ingress" 153 + sed "s|JETSTREAM_DOMAIN_PLACEHOLDER|$JETSTREAM_DOMAIN|g" deploy/jetstream-ingress.yaml \ 154 + | kubectl apply -f - 155 + kubectl apply -f deploy/jetstream-servicemonitor.yaml 156 + 157 + echo "" 158 + echo "done. point DNS:" 159 + echo " $RELAY_DOMAIN -> $(just server-ip)" 160 + echo " $GRAFANA_DOMAIN -> $(just server-ip)" 161 + echo " $JETSTREAM_DOMAIN -> $(just server-ip)" 162 + echo "then check:" 163 + echo " curl https://$RELAY_DOMAIN/xrpc/_health" 164 + echo " curl https://$GRAFANA_DOMAIN" 165 + echo " curl https://$JETSTREAM_DOMAIN" 166 + 167 + # deploy only the monitoring stack (grafana + prometheus) 168 + deploy-monitoring: 169 + #!/usr/bin/env bash 170 + set -euo pipefail 171 + 172 + helm repo add prometheus-community https://prometheus-community.github.io/helm-charts 173 + helm repo update 174 + 175 + GRAFANA_DOMAIN="${GRAFANA_DOMAIN:-relay-metrics.waow.tech}" 176 + 177 + echo "==> installing monitoring stack" 178 + kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f - 179 + kubectl create configmap relay-dashboard \ 180 + --namespace monitoring \ 181 + --from-file=relay-dashboard.json=deploy/relay-dashboard.json \ 182 + --dry-run=client -o yaml | kubectl apply -f - 183 + helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \ 184 + --namespace monitoring \ 185 + --values deploy/monitoring-values.yaml \ 186 + --set grafana.adminPassword="${GRAFANA_ADMIN_PASSWORD:-prom-operator}" \ 187 + --wait --timeout 5m 188 + kubectl apply -f deploy/relay-servicemonitor.yaml 189 + 190 + echo "==> applying grafana ingress" 191 + sed "s|GRAFANA_DOMAIN_PLACEHOLDER|$GRAFANA_DOMAIN|g" ../shared/deploy/grafana-ingress.yaml \ 192 + | kubectl apply -f - 193 + 194 + echo "done." 195 + 196 + # seed the relay with hosts from the network (includes restart so slurper picks them up) 197 + bootstrap: 198 + kubectl exec -n relay deploy/relay -- /relay pull-hosts --relay-host https://relay1.us-west.bsky.network 199 + kubectl rollout restart deploy/relay -n relay 200 + kubectl rollout status deploy/relay -n relay --timeout=2m 201 + 202 + # sync PDS host list from upstream (run periodically to discover new hosts) 203 + sync-hosts: 204 + kubectl exec -n relay deploy/relay -- /relay pull-hosts --relay-host https://relay1.us-west.bsky.network 205 + 206 + # --- status --- 207 + 208 + # check the state of everything 209 + status: 210 + @echo "==> nodes" 211 + @kubectl get nodes 212 + @echo "" 213 + @echo "==> pods" 214 + @kubectl get pods -n relay 215 + @echo "" 216 + @echo "==> relay health (in-cluster)" 217 + @kubectl exec -n relay deploy/relay -- curl -sf localhost:2470/xrpc/_health 2>/dev/null || echo "(relay not ready yet)" 218 + 219 + # tail relay logs 220 + logs: 221 + kubectl logs -n relay deploy/relay -f 222 + 223 + # check relay health via public endpoint 224 + health: 225 + #!/usr/bin/env bash 226 + : "${RELAY_DOMAIN:?set RELAY_DOMAIN}" 227 + curl -sf "https://$RELAY_DOMAIN/xrpc/_health" | jq . 228 + 229 + # get the grafana admin password from the cluster 230 + grafana-password: 231 + @kubectl get secret -n monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 -d && echo 232 + 233 + # --- images --- 234 + 235 + # build and push collectiondir image from indigo source 236 + collectiondir-publish: 237 + #!/usr/bin/env bash 238 + set -euo pipefail 239 + TMPDIR=$(mktemp -d) 240 + trap "rm -rf $TMPDIR" EXIT 241 + git clone --depth 1 https://github.com/bluesky-social/indigo "$TMPDIR" 242 + docker build --platform linux/amd64 \ 243 + -f "$TMPDIR/cmd/collectiondir/Dockerfile" \ 244 + -t atcr.io/zzstoatzz.io/collectiondir:latest "$TMPDIR" 245 + ATCR_AUTO_AUTH=1 docker push atcr.io/zzstoatzz.io/collectiondir:latest 246 + 247 + # --- scripts --- 248 + 249 + # reconnect relay to all known PDS hosts (run periodically, e.g. every 4 hours) 250 + reconnect *args: 251 + #!/usr/bin/env bash 252 + set -euo pipefail 253 + : "${RELAY_ADMIN_PASSWORD:?set RELAY_ADMIN_PASSWORD}" 254 + ../scripts/reconnect --password "$RELAY_ADMIN_PASSWORD" {{ args }} 255 + 256 + # consume the firehose (default: 10s of bsky posts) 257 + firehose *args: 258 + ../scripts/firehose {{ args }} 259 + 260 + # consume the jetstream (default: 10s of all events) 261 + jetstream *args: 262 + ../scripts/jetstream {{ args }} 263 + 264 + # backfill collectiondir with full network PDS hosts 265 + # pass --hosts <file> to use a specific host list, otherwise extracts from relay 266 + backfill *args: 267 + #!/usr/bin/env bash 268 + set -euo pipefail 269 + : "${COLLECTIONDIR_ADMIN_TOKEN:?set COLLECTIONDIR_ADMIN_TOKEN}" 270 + 271 + PIDS=() 272 + cleanup() { kill "${PIDS[@]}" 2>/dev/null; } 273 + trap cleanup EXIT 274 + 275 + # port-forward to collectiondir 276 + kubectl port-forward -n relay svc/collectiondir 2510:2510 >/dev/null 2>&1 & 277 + PIDS+=($!) 278 + 279 + EXTRA_ARGS=({{ args }}) 280 + 281 + # if --hosts not provided, extract from relay 282 + if ! printf '%s\n' "${EXTRA_ARGS[@]}" | grep -q '^--hosts$'; then 283 + : "${RELAY_ADMIN_PASSWORD:?set RELAY_ADMIN_PASSWORD}" 284 + kubectl port-forward -n relay svc/relay 12470:2470 >/dev/null 2>&1 & 285 + PIDS+=($!) 286 + sleep 3 287 + 288 + echo "==> extracting connected PDS host list from relay" 289 + curl -sf -u "admin:$RELAY_ADMIN_PASSWORD" http://localhost:12470/admin/pds/list \ 290 + | jq -r '.[] | select(.HasActiveConnection) | .Host' > /tmp/relay-hosts.txt 291 + TOTAL=$(curl -sf -u "admin:$RELAY_ADMIN_PASSWORD" http://localhost:12470/admin/pds/list | jq 'length') 292 + echo " $(wc -l < /tmp/relay-hosts.txt | tr -d ' ') connected hosts (of $TOTAL total)" 293 + echo 294 + EXTRA_ARGS+=(--hosts /tmp/relay-hosts.txt) 295 + else 296 + sleep 3 297 + fi 298 + 299 + ../scripts/backfill \ 300 + --token "$COLLECTIONDIR_ADMIN_TOKEN" \ 301 + "${EXTRA_ARGS[@]}"
infra/main.tf indigo/infra/main.tf
infra/outputs.tf indigo/infra/outputs.tf
infra/variables.tf indigo/infra/variables.tf
infra/versions.tf indigo/infra/versions.tf
infra/zlay/main.tf zlay/infra/main.tf
infra/zlay/outputs.tf zlay/infra/outputs.tf
infra/zlay/variables.tf zlay/infra/variables.tf
infra/zlay/versions.tf zlay/infra/versions.tf
+4 -478
justfile
··· 1 1 # ATProto relay deployment 2 - # required env vars: HCLOUD_TOKEN, RELAY_DOMAIN, RELAY_ADMIN_PASSWORD, POSTGRES_PASSWORD, LETSENCRYPT_EMAIL 3 - # optional env vars: GRAFANA_DOMAIN (default: relay-metrics.waow.tech), GRAFANA_ADMIN_PASSWORD, JETSTREAM_DOMAIN (default: jetstream.waow.tech) 4 - # zlay env vars: ZLAY_DOMAIN, ZLAY_ADMIN_PASSWORD, ZLAY_POSTGRES_PASSWORD, LETSENCRYPT_EMAIL 2 + # usage: just indigo <recipe> | just zlay <recipe> 5 3 6 - set dotenv-load 4 + set dotenv-load := true 7 5 8 - export KUBECONFIG := justfile_directory() / "kubeconfig.yaml" 6 + mod indigo 7 + mod zlay 9 8 10 9 # show available recipes 11 10 default: 12 11 @just --list 13 12 14 - # --- infrastructure --- 15 - 16 - # initialize terraform 17 - init: 18 - terraform -chdir=infra init 19 - 20 - # create the hetzner server with k3s 21 - infra: 22 - terraform -chdir=infra apply -var="hcloud_token=$HCLOUD_TOKEN" 23 - 24 - # destroy all infrastructure 25 - destroy: 26 - terraform -chdir=infra destroy -var="hcloud_token=$HCLOUD_TOKEN" 27 - 28 - # get the server IP from terraform 29 - server-ip: 30 - @terraform -chdir=infra output -raw server_ip 31 - 32 - # ssh into the server 33 - ssh: 34 - ssh root@$(just server-ip) 35 - 36 - # --- cluster access --- 37 - 38 - # fetch kubeconfig from the server (run after cloud-init finishes, ~2 min) 39 - kubeconfig: 40 - #!/usr/bin/env bash 41 - set -euo pipefail 42 - IP=$(just server-ip) 43 - echo "fetching kubeconfig from $IP..." 44 - 45 - # wait for k3s to be ready 46 - until ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new root@$IP test -f /run/k3s-ready 2>/dev/null; do 47 - echo " waiting for k3s..." 
48 - sleep 5 49 - done 50 - 51 - scp root@$IP:/etc/rancher/k3s/k3s.yaml kubeconfig.yaml 52 - # replace localhost with public IP 53 - if [[ "$(uname)" == "Darwin" ]]; then 54 - sed -i '' "s|127.0.0.1|$IP|g" kubeconfig.yaml 55 - else 56 - sed -i "s|127.0.0.1|$IP|g" kubeconfig.yaml 57 - fi 58 - chmod 600 kubeconfig.yaml 59 - echo "kubeconfig written to kubeconfig.yaml" 60 - kubectl get nodes 61 - 62 - # --- deployment --- 63 - 64 13 # add required helm repos 65 14 helm-repos: 66 15 helm repo add bjw-s https://bjw-s-labs.github.io/helm-charts ··· 68 17 helm repo add jetstack https://charts.jetstack.io 69 18 helm repo add prometheus-community https://prometheus-community.github.io/helm-charts 70 19 helm repo update 71 - 72 - # deploy everything to the cluster 73 - deploy: helm-repos 74 - #!/usr/bin/env bash 75 - set -euo pipefail 76 - 77 - : "${RELAY_DOMAIN:?set RELAY_DOMAIN}" 78 - : "${RELAY_ADMIN_PASSWORD:?set RELAY_ADMIN_PASSWORD}" 79 - : "${POSTGRES_PASSWORD:?set POSTGRES_PASSWORD}" 80 - : "${LETSENCRYPT_EMAIL:?set LETSENCRYPT_EMAIL}" 81 - 82 - echo "==> creating namespace" 83 - kubectl create namespace relay --dry-run=client -o yaml | kubectl apply -f - 84 - 85 - echo "==> installing cert-manager" 86 - helm upgrade --install cert-manager jetstack/cert-manager \ 87 - --namespace cert-manager --create-namespace \ 88 - --set crds.enabled=true \ 89 - --wait 90 - 91 - echo "==> applying cluster issuer" 92 - sed "s|you@example.com|$LETSENCRYPT_EMAIL|g" deploy/cluster-issuer.yaml \ 93 - | kubectl apply -f - 94 - 95 - echo "==> installing postgresql" 96 - helm upgrade --install relay-db bitnami/postgresql \ 97 - --namespace relay \ 98 - --values deploy/postgres-values.yaml \ 99 - --set auth.password="$POSTGRES_PASSWORD" \ 100 - --wait 101 - 102 - echo "==> creating relay secret" 103 - kubectl create secret generic relay-secret \ 104 - --namespace relay \ 105 - 
--from-literal=DATABASE_URL="postgres://relay:${POSTGRES_PASSWORD}@relay-db-postgresql.relay.svc.cluster.local:5432/relay" \ 106 - --from-literal=RELAY_ADMIN_PASSWORD="$RELAY_ADMIN_PASSWORD" \ 107 - --dry-run=client -o yaml | kubectl apply -f - 108 - 109 - echo "==> installing relay" 110 - helm upgrade --install relay bjw-s/app-template \ 111 - --namespace relay \ 112 - --values deploy/relay-values.yaml \ 113 - --wait --timeout 5m 114 - 115 - echo "==> applying ingress" 116 - sed "s|RELAY_DOMAIN_PLACEHOLDER|$RELAY_DOMAIN|g" deploy/ingress.yaml \ 117 - | kubectl apply -f - 118 - 119 - GRAFANA_DOMAIN="${GRAFANA_DOMAIN:-relay-metrics.waow.tech}" 120 - 121 - echo "==> installing monitoring stack" 122 - kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f - 123 - kubectl create configmap relay-dashboard \ 124 - --namespace monitoring \ 125 - --from-file=relay-dashboard.json=deploy/relay-dashboard.json \ 126 - --dry-run=client -o yaml | kubectl apply -f - 127 - helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \ 128 - --namespace monitoring \ 129 - --values deploy/monitoring-values.yaml \ 130 - --set grafana.adminPassword="${GRAFANA_ADMIN_PASSWORD:-prom-operator}" \ 131 - --wait --timeout 5m 132 - kubectl apply -f deploy/relay-servicemonitor.yaml 133 - 134 - echo "==> applying grafana ingress" 135 - sed "s|GRAFANA_DOMAIN_PLACEHOLDER|$GRAFANA_DOMAIN|g" deploy/grafana-ingress.yaml \ 136 - | kubectl apply -f - 137 - 138 - echo "==> creating collectiondir secret" 139 - kubectl create secret generic collectiondir-secret \ 140 - --namespace relay \ 141 - --from-literal=COLLECTIONS_ADMIN_TOKEN="${COLLECTIONDIR_ADMIN_TOKEN:-}" \ 142 - --dry-run=client -o yaml | kubectl apply -f - 143 - 144 - echo "==> installing collectiondir" 145 - helm upgrade --install collectiondir bjw-s/app-template \ 146 - --namespace relay \ 147 - --values deploy/collectiondir-values.yaml \ 148 - --wait --timeout 5m 149 - kubectl apply -f 
deploy/collectiondir-servicemonitor.yaml 150 - 151 - echo "==> installing reconnect cronjob" 152 - kubectl apply -f deploy/reconnect-cronjob.yaml 153 - 154 - echo "==> installing jetstream" 155 - JETSTREAM_DOMAIN="${JETSTREAM_DOMAIN:-jetstream.waow.tech}" 156 - helm upgrade --install jetstream bjw-s/app-template \ 157 - --namespace relay \ 158 - --values deploy/jetstream-values.yaml \ 159 - --wait --timeout 5m 160 - 161 - echo "==> applying jetstream ingress" 162 - sed "s|JETSTREAM_DOMAIN_PLACEHOLDER|$JETSTREAM_DOMAIN|g" deploy/jetstream-ingress.yaml \ 163 - | kubectl apply -f - 164 - kubectl apply -f deploy/jetstream-servicemonitor.yaml 165 - 166 - echo "" 167 - echo "done. point DNS:" 168 - echo " $RELAY_DOMAIN -> $(just server-ip)" 169 - echo " $GRAFANA_DOMAIN -> $(just server-ip)" 170 - echo " $JETSTREAM_DOMAIN -> $(just server-ip)" 171 - echo "then check:" 172 - echo " curl https://$RELAY_DOMAIN/xrpc/_health" 173 - echo " curl https://$GRAFANA_DOMAIN" 174 - echo " curl https://$JETSTREAM_DOMAIN" 175 - 176 - # deploy only the monitoring stack (grafana + prometheus) 177 - deploy-monitoring: helm-repos 178 - #!/usr/bin/env bash 179 - set -euo pipefail 180 - 181 - GRAFANA_DOMAIN="${GRAFANA_DOMAIN:-relay-metrics.waow.tech}" 182 - 183 - echo "==> installing monitoring stack" 184 - kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f - 185 - kubectl create configmap relay-dashboard \ 186 - --namespace monitoring \ 187 - --from-file=relay-dashboard.json=deploy/relay-dashboard.json \ 188 - --dry-run=client -o yaml | kubectl apply -f - 189 - helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \ 190 - --namespace monitoring \ 191 - --values deploy/monitoring-values.yaml \ 192 - --set grafana.adminPassword="${GRAFANA_ADMIN_PASSWORD:-prom-operator}" \ 193 - --wait --timeout 5m 194 - kubectl apply -f deploy/relay-servicemonitor.yaml 195 - 196 - echo "==> applying grafana ingress" 197 - sed 
"s|GRAFANA_DOMAIN_PLACEHOLDER|$GRAFANA_DOMAIN|g" deploy/grafana-ingress.yaml \ 198 - | kubectl apply -f - 199 - 200 - echo "done." 201 - 202 - # seed the relay with hosts from the network (includes restart so slurper picks them up) 203 - bootstrap: 204 - kubectl exec -n relay deploy/relay -- /relay pull-hosts --relay-host https://relay1.us-west.bsky.network 205 - kubectl rollout restart deploy/relay -n relay 206 - kubectl rollout status deploy/relay -n relay --timeout=2m 207 - 208 - # sync PDS host list from upstream (run periodically to discover new hosts) 209 - sync-hosts: 210 - kubectl exec -n relay deploy/relay -- /relay pull-hosts --relay-host https://relay1.us-west.bsky.network 211 - 212 - # --- status --- 213 - 214 - # check the state of everything 215 - status: 216 - @echo "==> nodes" 217 - @kubectl get nodes 218 - @echo "" 219 - @echo "==> pods" 220 - @kubectl get pods -n relay 221 - @echo "" 222 - @echo "==> relay health (in-cluster)" 223 - @kubectl exec -n relay deploy/relay -- curl -sf localhost:2470/xrpc/_health 2>/dev/null || echo "(relay not ready yet)" 224 - 225 - # tail relay logs 226 - logs: 227 - kubectl logs -n relay deploy/relay -f 228 - 229 - # check relay health via public endpoint 230 - health: 231 - #!/usr/bin/env bash 232 - : "${RELAY_DOMAIN:?set RELAY_DOMAIN}" 233 - curl -sf "https://$RELAY_DOMAIN/xrpc/_health" | jq . 
234 - 
235 - # get the grafana admin password from the cluster 
236 - grafana-password: 
237 - @kubectl get secret -n monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 -d && echo 
238 - 
239 - # --- images --- 
240 - 
241 - # build and push collectiondir image from indigo source 
242 - collectiondir-publish: 
243 - #!/usr/bin/env bash 
244 - set -euo pipefail 
245 - TMPDIR=$(mktemp -d) 
246 - trap "rm -rf $TMPDIR" EXIT 
247 - git clone --depth 1 https://github.com/bluesky-social/indigo "$TMPDIR" 
248 - docker build --platform linux/amd64 \ 
249 - -f "$TMPDIR/cmd/collectiondir/Dockerfile" \ 
250 - -t atcr.io/zzstoatzz.io/collectiondir:latest "$TMPDIR" 
251 - ATCR_AUTO_AUTH=1 docker push atcr.io/zzstoatzz.io/collectiondir:latest 
252 - 
253 - # --- scripts --- 
254 - 
255 - # reconnect relay to all known PDS hosts (run periodically, e.g. every 4 hours) 
256 - reconnect *args: 
257 - #!/usr/bin/env bash 
258 - set -euo pipefail 
259 - : "${RELAY_ADMIN_PASSWORD:?set RELAY_ADMIN_PASSWORD}" 
260 - ./scripts/reconnect --password "$RELAY_ADMIN_PASSWORD" {{ args }} 
261 - 
262 - # consume the firehose (default: 10s of bsky posts) 
263 - firehose *args: 
264 - ./scripts/firehose {{ args }} 
265 - 
266 - # consume the jetstream (default: 10s of all events) 
267 - jetstream *args: 
268 - ./scripts/jetstream {{ args }} 
269 - 
270 - # backfill collectiondir with full network PDS hosts 
271 - # pass --hosts <file> to use a specific host list, otherwise extracts from relay 
272 - backfill *args: 
273 - #!/usr/bin/env bash 
274 - set -euo pipefail 
275 - : "${COLLECTIONDIR_ADMIN_TOKEN:?set COLLECTIONDIR_ADMIN_TOKEN}" 
276 - 
277 - PIDS=() 
278 - cleanup() { kill "${PIDS[@]}" 2>/dev/null; } 
279 - trap cleanup EXIT 
280 - 
281 - # port-forward to collectiondir 
282 - kubectl port-forward -n relay svc/collectiondir 2510:2510 >/dev/null 2>&1 & 
283 - PIDS+=($!) 
284 - 
285 - EXTRA_ARGS=({{ args }}) 
286 - 
287 - # if --hosts not provided, extract from relay 
288 - if ! printf '%s\n' "${EXTRA_ARGS[@]}" | grep -q '^--hosts$'; then 
289 - : "${RELAY_ADMIN_PASSWORD:?set RELAY_ADMIN_PASSWORD}" 
290 - kubectl port-forward -n relay svc/relay 12470:2470 >/dev/null 2>&1 & 
291 - PIDS+=($!) 
292 - sleep 3 
293 - 
294 - echo "==> extracting connected PDS host list from relay" 
295 - curl -sf -u "admin:$RELAY_ADMIN_PASSWORD" http://localhost:12470/admin/pds/list \ 
296 - | jq -r '.[] | select(.HasActiveConnection) | .Host' > /tmp/relay-hosts.txt 
297 - TOTAL=$(curl -sf -u "admin:$RELAY_ADMIN_PASSWORD" http://localhost:12470/admin/pds/list | jq 'length') 
298 - echo " $(wc -l < /tmp/relay-hosts.txt | tr -d ' ') connected hosts (of $TOTAL total)" 
299 - echo 
300 - EXTRA_ARGS+=(--hosts /tmp/relay-hosts.txt) 
301 - else 
302 - sleep 3 
303 - fi 
304 - 
305 - ./scripts/backfill \ 
306 - --token "$COLLECTIONDIR_ADMIN_TOKEN" \ 
307 - "${EXTRA_ARGS[@]}" 
308 - 
309 - # === zlay (zig relay) === 
310 - 
311 - export ZLAY_KUBECONFIG := justfile_directory() / "zlay-kubeconfig.yaml" 
312 - 
313 - # initialize zlay terraform 
314 - zlay-init: 
315 - terraform -chdir=infra/zlay init 
316 - 
317 - # create the zlay hetzner server with k3s 
318 - zlay-infra: 
319 - terraform -chdir=infra/zlay apply -var="hcloud_token=$HCLOUD_TOKEN" 
320 - 
321 - # destroy zlay infrastructure 
322 - zlay-destroy: 
323 - terraform -chdir=infra/zlay destroy -var="hcloud_token=$HCLOUD_TOKEN" 
324 - 
325 - # get the zlay server IP 
326 - zlay-server-ip: 
327 - @terraform -chdir=infra/zlay output -raw server_ip 
328 - 
329 - # ssh into the zlay server 
330 - zlay-ssh: 
331 - ssh root@$(just zlay-server-ip) 
332 - 
333 - # fetch zlay kubeconfig 
334 - zlay-kubeconfig: 
335 - #!/usr/bin/env bash 
336 - set -euo pipefail 
337 - IP=$(just zlay-server-ip) 
338 - echo "fetching kubeconfig from $IP..." 
339 - until ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new root@$IP test -f /run/k3s-ready 2>/dev/null; do 
340 - echo " waiting for k3s..."
341 - sleep 5 342 - done 343 - scp root@$IP:/etc/rancher/k3s/k3s.yaml zlay-kubeconfig.yaml 344 - if [[ "$(uname)" == "Darwin" ]]; then 345 - sed -i '' "s|127.0.0.1|$IP|g" zlay-kubeconfig.yaml 346 - else 347 - sed -i "s|127.0.0.1|$IP|g" zlay-kubeconfig.yaml 348 - fi 349 - chmod 600 zlay-kubeconfig.yaml 350 - echo "kubeconfig written to zlay-kubeconfig.yaml" 351 - KUBECONFIG=zlay-kubeconfig.yaml kubectl get nodes 352 - 353 - # build and push zlay image via docker (slow on mac — prefer zlay-publish-remote) 354 - zlay-publish-docker: 355 - #!/usr/bin/env bash 356 - set -euo pipefail 357 - TMPDIR=$(mktemp -d) 358 - trap "rm -rf $TMPDIR" EXIT 359 - git clone --depth 1 https://tangled.org/zzstoatzz.io/zlay "$TMPDIR" 360 - cd "$TMPDIR" 361 - TAG=$(git rev-parse --short HEAD) 362 - IMAGE="atcr.io/zzstoatzz.io/zlay:${TAG}" 363 - docker build --platform linux/amd64 -t "${IMAGE}" . 364 - ATCR_AUTO_AUTH=1 docker push "${IMAGE}" 365 - echo "==> pushed ${IMAGE}" 366 - 367 - # build zlay on the server and import into k3s containerd (fast — native x86_64 build) 368 - # usage: just zlay-publish-remote (debug build) 369 - # just zlay-publish-remote ReleaseSafe (optimized — needs 8 MiB stacks) 370 - zlay-publish-remote optimize="": 371 - #!/usr/bin/env bash 372 - set -euo pipefail 373 - ssh root@$(just zlay-server-ip) <<'DEPLOY' 374 - set -euo pipefail 375 - cd /opt/zlay 376 - git pull --ff-only 377 - 378 - TAG=$(git rev-parse --short HEAD) 379 - IMAGE="atcr.io/zzstoatzz.io/zlay:{{ if optimize != "" { optimize + "-" } else { "debug-" } }}${TAG}" 380 - 381 - echo "==> building binary (${TAG}{{ if optimize != "" { ", " + optimize } else { ", debug" } }})" 382 - zig build {{ if optimize != "" { "-Doptimize=" + optimize + " " } else { "" } }}-Dtarget=x86_64-linux-gnu 383 - 384 - echo "==> building container image (${IMAGE})" 385 - buildah bud -t "${IMAGE}" -f Dockerfile.runtime . 
386 - 
387 - echo "==> importing into k3s containerd" 
388 - buildah push "${IMAGE}" docker-archive:/tmp/zlay.tar:"${IMAGE}" 
389 - ctr -n k8s.io images import /tmp/zlay.tar 
390 - rm -f /tmp/zlay.tar 
391 - 
392 - echo "==> updating deployment image" 
393 - kubectl set image deployment/zlay -n zlay main="${IMAGE}" 
394 - kubectl rollout status deployment/zlay -n zlay --timeout=120s 
395 - 
396 - echo "==> deployed ${IMAGE}" 
397 - DEPLOY 
398 - 
399 - # deploy zlay to its k3s cluster 
400 - zlay-deploy: helm-repos 
401 - #!/usr/bin/env bash 
402 - set -euo pipefail 
403 - export KUBECONFIG="$ZLAY_KUBECONFIG" 
404 - 
405 - : "${ZLAY_DOMAIN:?set ZLAY_DOMAIN}" 
406 - : "${ZLAY_POSTGRES_PASSWORD:?set ZLAY_POSTGRES_PASSWORD}" 
407 - : "${LETSENCRYPT_EMAIL:?set LETSENCRYPT_EMAIL}" 
408 - ZLAY_ADMIN_PASSWORD="${ZLAY_ADMIN_PASSWORD:-}" 
409 - 
410 - echo "==> creating namespace" 
411 - kubectl create namespace zlay --dry-run=client -o yaml | kubectl apply -f - 
412 - 
413 - echo "==> installing cert-manager" 
414 - helm upgrade --install cert-manager jetstack/cert-manager \ 
415 - --namespace cert-manager --create-namespace \ 
416 - --set crds.enabled=true \ 
417 - --wait 
418 - 
419 - echo "==> applying cluster issuer" 
420 - sed "s|you@example.com|$LETSENCRYPT_EMAIL|g" deploy/cluster-issuer.yaml \ 
421 - | kubectl apply -f - 
422 - 
423 - echo "==> installing postgresql" 
424 - helm upgrade --install zlay-db bitnami/postgresql \ 
425 - --namespace zlay \ 
426 - --values deploy/postgres-values.yaml \ 
427 - --set auth.password="$ZLAY_POSTGRES_PASSWORD" \ 
428 - --wait 
429 - 
430 - echo "==> creating zlay secret" 
431 - kubectl create secret generic zlay-secret \ 
432 - --namespace zlay \ 
433 - --from-literal=DATABASE_URL="postgres://relay:${ZLAY_POSTGRES_PASSWORD}@zlay-db-postgresql.zlay.svc.cluster.local:5432/relay" \ 
434 - --from-literal=RELAY_ADMIN_PASSWORD="$ZLAY_ADMIN_PASSWORD" \ 
435 - --dry-run=client -o yaml | kubectl apply -f - 
436 - 
437 - echo "==> installing zlay" 
438 - helm upgrade --install zlay bjw-s/app-template \ 
439 - --namespace zlay \ 
440 - --values deploy/zlay-values.yaml \ 
441 - --wait --timeout 5m 
442 - 
443 - echo "==> applying ingress" 
444 - sed "s|ZLAY_DOMAIN_PLACEHOLDER|$ZLAY_DOMAIN|g" deploy/zlay-ingress.yaml \ 
445 - | kubectl apply -f - 
446 - 
447 - echo "==> installing monitoring" 
448 - ZLAY_METRICS_DOMAIN="${ZLAY_METRICS_DOMAIN:-zlay-metrics.waow.tech}" 
449 - kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f - 
450 - kubectl create configmap zlay-dashboard \ 
451 - --namespace monitoring \ 
452 - --from-file=zlay-dashboard.json=deploy/zlay-dashboard.json \ 
453 - --dry-run=client -o yaml | kubectl apply -f - 
454 - helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \ 
455 - --namespace monitoring \ 
456 - --values deploy/zlay-monitoring-values.yaml \ 
457 - --set grafana.adminPassword="${GRAFANA_ADMIN_PASSWORD:-prom-operator}" \ 
458 - --wait --timeout 5m 
459 - kubectl apply -f deploy/zlay-servicemonitor.yaml 
460 - 
461 - echo "==> applying grafana ingress" 
462 - sed "s|GRAFANA_DOMAIN_PLACEHOLDER|$ZLAY_METRICS_DOMAIN|g" deploy/grafana-ingress.yaml \ 
463 - | kubectl apply -f - 
464 - 
465 - echo "" 
466 - echo "done. point DNS:" 
467 - echo " $ZLAY_DOMAIN -> $(just zlay-server-ip)" 
468 - echo " $ZLAY_METRICS_DOMAIN -> $(just zlay-server-ip)" 
469 - echo "then check:" 
470 - echo " curl https://$ZLAY_DOMAIN/_health" 
471 - 
472 - # check zlay status 
473 - zlay-status: 
474 - #!/usr/bin/env bash 
475 - export KUBECONFIG="$ZLAY_KUBECONFIG" 
476 - echo "==> nodes" 
477 - kubectl get nodes 
478 - echo "" 
479 - echo "==> pods" 
480 - kubectl get pods -n zlay 
481 - echo "" 
482 - echo "==> health" 
483 - curl -sf "https://$ZLAY_DOMAIN/_health" | jq . || echo "(zlay not ready)" 
484 - 
485 - # tail zlay logs 
486 - zlay-logs: 
487 - KUBECONFIG="$ZLAY_KUBECONFIG" kubectl logs -n zlay deploy/zlay -f 
488 - 
489 - # check zlay health via public endpoint 
490 - zlay-health: 
491 - #!/usr/bin/env bash 
492 - : "${ZLAY_DOMAIN:?set ZLAY_DOMAIN}" 
493 - curl -sf "https://$ZLAY_DOMAIN/_health" | jq .
+205
zlay/justfile
··· 1 + # zlay (zig) relay deployment 2 + # required env vars: HCLOUD_TOKEN, ZLAY_DOMAIN, ZLAY_POSTGRES_PASSWORD, LETSENCRYPT_EMAIL 3 + # optional env vars: ZLAY_ADMIN_PASSWORD, ZLAY_METRICS_DOMAIN (default: zlay-metrics.waow.tech), GRAFANA_ADMIN_PASSWORD 4 + 5 + export KUBECONFIG := source_directory() / "kubeconfig.yaml" 6 + 7 + # --- infrastructure --- 8 + 9 + # initialize terraform 10 + init: 11 + terraform -chdir=infra init 12 + 13 + # create the hetzner server with k3s 14 + infra: 15 + terraform -chdir=infra apply -var="hcloud_token=$HCLOUD_TOKEN" 16 + 17 + # destroy all infrastructure 18 + destroy: 19 + terraform -chdir=infra destroy -var="hcloud_token=$HCLOUD_TOKEN" 20 + 21 + # get the server IP from terraform 22 + server-ip: 23 + @terraform -chdir=infra output -raw server_ip 24 + 25 + # ssh into the server 26 + ssh: 27 + ssh root@$(just server-ip) 28 + 29 + # --- cluster access --- 30 + 31 + # fetch kubeconfig from the server (run after cloud-init finishes, ~2 min) 32 + kubeconfig: 33 + #!/usr/bin/env bash 34 + set -euo pipefail 35 + IP=$(just server-ip) 36 + echo "fetching kubeconfig from $IP..." 37 + until ssh -o ConnectTimeout=5 -o StrictHostKeyChecking=accept-new root@$IP test -f /run/k3s-ready 2>/dev/null; do 38 + echo " waiting for k3s..." 
39 + sleep 5 
40 + done 
41 + scp root@$IP:/etc/rancher/k3s/k3s.yaml kubeconfig.yaml 
42 + if [[ "$(uname)" == "Darwin" ]]; then 
43 + sed -i '' "s|127.0.0.1|$IP|g" kubeconfig.yaml 
44 + else 
45 + sed -i "s|127.0.0.1|$IP|g" kubeconfig.yaml 
46 + fi 
47 + chmod 600 kubeconfig.yaml 
48 + echo "kubeconfig written to kubeconfig.yaml" 
49 + kubectl get nodes 
50 + 
51 + # --- deployment --- 
52 + 
53 + # deploy zlay to its k3s cluster 
54 + deploy: 
55 + #!/usr/bin/env bash 
56 + set -euo pipefail 
57 + 
58 + helm repo add bjw-s https://bjw-s-labs.github.io/helm-charts 
59 + helm repo add bitnami https://charts.bitnami.com/bitnami 
60 + helm repo add jetstack https://charts.jetstack.io 
61 + helm repo add prometheus-community https://prometheus-community.github.io/helm-charts 
62 + helm repo update 
63 + 
64 + : "${ZLAY_DOMAIN:?set ZLAY_DOMAIN}" 
65 + : "${ZLAY_POSTGRES_PASSWORD:?set ZLAY_POSTGRES_PASSWORD}" 
66 + : "${LETSENCRYPT_EMAIL:?set LETSENCRYPT_EMAIL}" 
67 + ZLAY_ADMIN_PASSWORD="${ZLAY_ADMIN_PASSWORD:-}" 
68 + 
69 + echo "==> creating namespace" 
70 + kubectl create namespace zlay --dry-run=client -o yaml | kubectl apply -f - 
71 + 
72 + echo "==> installing cert-manager" 
73 + helm upgrade --install cert-manager jetstack/cert-manager \ 
74 + --namespace cert-manager --create-namespace \ 
75 + --set crds.enabled=true \ 
76 + --wait 
77 + 
78 + echo "==> applying cluster issuer" 
79 + sed "s|you@example.com|$LETSENCRYPT_EMAIL|g" ../shared/deploy/cluster-issuer.yaml \ 
80 + | kubectl apply -f - 
81 + 
82 + echo "==> installing postgresql" 
83 + helm upgrade --install zlay-db bitnami/postgresql \ 
84 + --namespace zlay \ 
85 + --values ../shared/deploy/postgres-values.yaml \ 
86 + --set auth.password="$ZLAY_POSTGRES_PASSWORD" \ 
87 + --wait 
88 + 
89 + echo "==> creating zlay secret" 
90 + kubectl create secret generic zlay-secret \ 
91 + --namespace zlay \ 
92 + --from-literal=DATABASE_URL="postgres://relay:${ZLAY_POSTGRES_PASSWORD}@zlay-db-postgresql.zlay.svc.cluster.local:5432/relay" \ 
93 + --from-literal=RELAY_ADMIN_PASSWORD="$ZLAY_ADMIN_PASSWORD" \ 
94 + --dry-run=client -o yaml | kubectl apply -f - 
95 + 
96 + echo "==> installing zlay" 
97 + helm upgrade --install zlay bjw-s/app-template \ 
98 + --namespace zlay \ 
99 + --values deploy/zlay-values.yaml \ 
100 + --wait --timeout 5m 
101 + 
102 + echo "==> applying ingress" 
103 + sed "s|ZLAY_DOMAIN_PLACEHOLDER|$ZLAY_DOMAIN|g" deploy/zlay-ingress.yaml \ 
104 + | kubectl apply -f - 
105 + 
106 + echo "==> installing monitoring" 
107 + ZLAY_METRICS_DOMAIN="${ZLAY_METRICS_DOMAIN:-zlay-metrics.waow.tech}" 
108 + kubectl create namespace monitoring --dry-run=client -o yaml | kubectl apply -f - 
109 + kubectl create configmap zlay-dashboard \ 
110 + --namespace monitoring \ 
111 + --from-file=zlay-dashboard.json=deploy/zlay-dashboard.json \ 
112 + --dry-run=client -o yaml | kubectl apply -f - 
113 + helm upgrade --install kube-prometheus-stack prometheus-community/kube-prometheus-stack \ 
114 + --namespace monitoring \ 
115 + --values deploy/zlay-monitoring-values.yaml \ 
116 + --set grafana.adminPassword="${GRAFANA_ADMIN_PASSWORD:-prom-operator}" \ 
117 + --wait --timeout 5m 
118 + kubectl apply -f deploy/zlay-servicemonitor.yaml 
119 + 
120 + echo "==> applying grafana ingress" 
121 + sed "s|GRAFANA_DOMAIN_PLACEHOLDER|$ZLAY_METRICS_DOMAIN|g" ../shared/deploy/grafana-ingress.yaml \ 
122 + | kubectl apply -f - 
123 + 
124 + echo "" 
125 + echo "done. point DNS:" 
126 + echo " $ZLAY_DOMAIN -> $(just server-ip)" 
127 + echo " $ZLAY_METRICS_DOMAIN -> $(just server-ip)" 
128 + echo "then check:" 
129 + echo " curl https://$ZLAY_DOMAIN/_health" 
130 + 
131 + # --- images --- 
132 + 
133 + # build and push zlay image via docker (slow on mac — prefer publish-remote) 
134 + publish-docker: 
135 + #!/usr/bin/env bash 
136 + set -euo pipefail 
137 + TMPDIR=$(mktemp -d) 
138 + trap "rm -rf $TMPDIR" EXIT 
139 + git clone --depth 1 https://tangled.org/zzstoatzz.io/zlay "$TMPDIR" 
140 + cd "$TMPDIR" 
141 + TAG=$(git rev-parse --short HEAD) 
142 + IMAGE="atcr.io/zzstoatzz.io/zlay:${TAG}" 
143 + docker build --platform linux/amd64 -t "${IMAGE}" . 
144 + ATCR_AUTO_AUTH=1 docker push "${IMAGE}" 
145 + echo "==> pushed ${IMAGE}" 
146 + 
147 + # build zlay on the server and import into k3s containerd (fast — native x86_64 build) 
148 + # usage: just zlay publish-remote (debug build) 
149 + # just zlay publish-remote ReleaseSafe (optimized — needs 8 MiB stacks) 
150 + publish-remote optimize="": 
151 + #!/usr/bin/env bash 
152 + set -euo pipefail 
153 + ssh root@$(just server-ip) <<'DEPLOY' 
154 + set -euo pipefail 
155 + cd /opt/zlay 
156 + git pull --ff-only 
157 + 
158 + TAG=$(git rev-parse --short HEAD) 
159 + IMAGE="atcr.io/zzstoatzz.io/zlay:{{ if optimize != "" { optimize + "-" } else { "debug-" } }}${TAG}" 
160 + 
161 + echo "==> building binary (${TAG}{{ if optimize != "" { ", " + optimize } else { ", debug" } }})" 
162 + zig build {{ if optimize != "" { "-Doptimize=" + optimize + " " } else { "" } }}-Dtarget=x86_64-linux-gnu 
163 + 
164 + echo "==> building container image (${IMAGE})" 
165 + buildah bud -t "${IMAGE}" -f Dockerfile.runtime .
166 + 167 + echo "==> importing into k3s containerd" 168 + buildah push "${IMAGE}" docker-archive:/tmp/zlay.tar:"${IMAGE}" 169 + ctr -n k8s.io images import /tmp/zlay.tar 170 + rm -f /tmp/zlay.tar 171 + 172 + echo "==> updating deployment image" 173 + kubectl set image deployment/zlay -n zlay main="${IMAGE}" 174 + kubectl rollout status deployment/zlay -n zlay --timeout=120s 175 + 176 + echo "==> deployed ${IMAGE}" 177 + DEPLOY 178 + 179 + # --- status --- 180 + 181 + # check zlay status 182 + status: 183 + #!/usr/bin/env bash 184 + echo "==> nodes" 185 + kubectl get nodes 186 + echo "" 187 + echo "==> pods" 188 + kubectl get pods -n zlay 189 + echo "" 190 + echo "==> health" 191 + curl -sf "https://$ZLAY_DOMAIN/_health" | jq . || echo "(zlay not ready)" 192 + 193 + # tail zlay logs 194 + logs: 195 + kubectl logs -n zlay deploy/zlay -f 196 + 197 + # check zlay health via public endpoint 198 + health: 199 + #!/usr/bin/env bash 200 + : "${ZLAY_DOMAIN:?set ZLAY_DOMAIN}" 201 + curl -sf "https://$ZLAY_DOMAIN/_health" | jq . 202 + 203 + # get the grafana admin password from the cluster 204 + grafana-password: 205 + @kubectl get secret -n monitoring kube-prometheus-stack-grafana -o jsonpath="{.data.admin-password}" | base64 -d && echo
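The recipes above assume the root justfile exposes each peer directory via just's `mod` feature (1.36.0+, per the commit message). The root justfile itself is not part of this diff; a minimal sketch of the wiring it implies:

```just
# root justfile (sketch, not shown in this diff): declare each peer
# directory as a module so its justfile is loaded on demand.
mod indigo  # loads indigo/justfile  -> `just indigo deploy`, `just indigo status`, ...
mod zlay    # loads zlay/justfile    -> `just zlay deploy`, `just zlay status`, ...
```

Module recipes run with the module's source directory as their working directory, which is why relative paths like `infra` and `../shared/deploy/` in `zlay/justfile` resolve correctly, and why `source_directory()` (not `justfile_directory()`, which resolves to the root) is used for `KUBECONFIG`.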