# SBOM Scanning

ATCR supports optional Software Bill of Materials (SBOM) generation for container images stored in holds. This feature enables automated security scanning and vulnerability analysis while maintaining the decentralized architecture.

## Overview

When enabled, holds automatically generate SBOMs for uploaded container images in the background. The scanning process:

- **Async execution**: Scanning happens after upload completes (non-blocking)
- **ORAS artifacts**: SBOMs stored as OCI Registry as Storage (ORAS) artifacts
- **ATProto integration**: Scan results stored as `io.atcr.manifest` records in hold's embedded PDS
- **Tool agnostic**: Results accessible via XRPC, ATProto queries, and direct blob URLs
- **Opt-in**: Disabled by default, enabled per-hold via configuration

### Default Scanner: Syft

ATCR uses [Anchore Syft](https://github.com/anchore/syft) for SBOM generation:

- Industry-standard SBOM generator
- Supports SPDX and CycloneDX formats
- Comprehensive package detection (OS packages, language libraries, etc.)
- Actively maintained

Future enhancements may include [Grype](https://github.com/anchore/grype) for vulnerability scanning and [Trivy](https://github.com/aquasecurity/trivy) for comprehensive security analysis.
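
As a rough illustration of how a hold might invoke Syft, the sketch below builds a Syft command line for an image filesystem that has already been extracted to disk. It assumes the `syft` binary is on `PATH`. The function names `syftArgs` and `generateSBOM` and the `/tmp/rootfs` path are hypothetical and are not taken from ATCR's code.

```go
package main

import (
	"fmt"
	"os/exec"
)

// syftArgs builds the argument list for a Syft run against an extracted
// image filesystem. The "dir:" source scheme and the "-o" output flag are
// part of the Syft CLI; the function itself is a hypothetical helper.
func syftArgs(rootfs, format string) ([]string, error) {
	switch format {
	case "spdx-json", "cyclonedx-json":
		return []string{"dir:" + rootfs, "-o", format}, nil
	default:
		return nil, fmt.Errorf("unsupported SBOM format %q", format)
	}
}

// generateSBOM runs Syft and returns the SBOM document written to stdout.
// It requires the syft binary to be installed.
func generateSBOM(rootfs, format string) ([]byte, error) {
	args, err := syftArgs(rootfs, format)
	if err != nil {
		return nil, err
	}
	return exec.Command("syft", args...).Output()
}

func main() {
	// Only print the command here; call generateSBOM to actually run Syft.
	args, err := syftArgs("/tmp/rootfs", "spdx-json")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("syft", args)
}
```

Validating the format before spawning the process keeps a misconfigured hold from producing output in an unexpected format.
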
## Trust Model

### Same Trust as Docker Hub

SBOM scanning follows the same trust model as Docker Hub or other centralized registries:

**Docker Hub model:**
- Docker Hub scans your image on their infrastructure
- Results stored in their database
- You trust Docker Hub's scanner version and scan integrity

**ATCR hold model:**
- Hold scans image on their infrastructure
- Results stored in hold's embedded PDS
- You trust hold operator's scanner version and scan integrity

The security comes from **reproducibility** and **transparency**, not storage location:
- Anyone can re-scan the same digest and verify results
- Multiple holds scanning the same image provide independent verification
- Scanner version and scan timestamp are recorded in ATProto records

### Why Hold's PDS?

Scan results are stored in the **hold's embedded PDS** rather than the user's PDS:

**Advantages:**
1. **No OAuth expiry issues**: Hold owns its PDS, no service tokens needed
2. **Hold-scoped metadata**: Scanner version, scan time, hold configuration
3. **Multiple perspectives**: Different holds can scan the same image independently
4. **Simpler auth**: Hold writes directly to its own PDS
5. **Keeps user PDS lean**: Potentially large SBOM data doesn't bloat user's repo

**Security properties:**
- Same trust level as trusting hold to serve correct blobs
- DID signatures prove which hold generated the SBOM
- Reproducible scans enable independent verification
- Multiple holds scanning same digest → compare results for tampering detection

## ORAS Manifest Format

SBOMs are stored as ORAS artifacts that reference their subject image using the OCI referrers specification.
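
To make the referrer relationship concrete, here is a minimal sketch of building such an artifact manifest. It assumes the OCI image manifest layout with a `subject` descriptor, an `artifactType`, and the empty `{}` config blob. The type and function names (`Descriptor`, `ArtifactManifest`, `newSBOMManifest`) are hypothetical, and the ATCR-specific record fields are left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Descriptor mirrors an OCI content descriptor.
type Descriptor struct {
	MediaType string `json:"mediaType"`
	Digest    string `json:"digest"`
	Size      int64  `json:"size"`
}

// ArtifactManifest is the subset of an OCI image manifest used for referrers.
type ArtifactManifest struct {
	SchemaVersion int          `json:"schemaVersion"`
	MediaType     string       `json:"mediaType"`
	ArtifactType  string       `json:"artifactType"`
	Config        Descriptor   `json:"config"`
	Layers        []Descriptor `json:"layers"`
	Subject       *Descriptor  `json:"subject,omitempty"`
}

// describe computes the sha256 digest and size of a blob.
func describe(mediaType string, content []byte) Descriptor {
	sum := sha256.Sum256(content)
	return Descriptor{
		MediaType: mediaType,
		Digest:    "sha256:" + hex.EncodeToString(sum[:]),
		Size:      int64(len(content)),
	}
}

// newSBOMManifest links an SBOM blob to the image manifest it describes.
func newSBOMManifest(subject Descriptor, sbom []byte, artifactType string) ArtifactManifest {
	return ArtifactManifest{
		SchemaVersion: 2,
		MediaType:     "application/vnd.oci.image.manifest.v1+json",
		ArtifactType:  artifactType,
		// Artifacts carry no runtime config, so the config is the empty JSON object.
		Config:  describe("application/vnd.oci.empty.v1+json", []byte("{}")),
		Layers:  []Descriptor{describe(artifactType, sbom)},
		Subject: &subject,
	}
}

func main() {
	// Placeholder bytes stand in for a real image manifest and SBOM document.
	image := describe("application/vnd.oci.image.manifest.v1+json", []byte(`{"schemaVersion":2}`))
	m := newSBOMManifest(image, []byte(`{"spdxVersion":"SPDX-2.3"}`), "application/spdx+json")
	out, err := json.MarshalIndent(m, "", "  ")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(out))
}
```

The `subject` field is what lets a registry's referrers lookup return this manifest when queried with the image digest.
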
### Example Manifest Record

```json
{
  "$type": "io.atcr.manifest",
  "repository": "alice/myapp",
  "digest": "sha256:4a5e...",
  "holdDid": "did:web:hold01.atcr.io",
  "holdEndpoint": "https://hold01.atcr.io",
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.manifest.v1+json",
  "artifactType": "application/spdx+json",
  "subject": {
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "digest": "sha256:abc123...",
    "size": 1234
  },
  "config": {
    "mediaType": "application/vnd.oci.empty.v1+json",
    "digest": "sha256:44136f...",
    "size": 2
  },
  "layers": [
    {
      "mediaType": "application/spdx+json",
      "digest": "sha256:def456...",
      "size": 5678,
      "annotations": {
        "org.opencontainers.image.title": "sbom.spdx.json"
      }
    }
  ],
  "manifestBlob": {
    "$type": "blob",
    "ref": { "$link": "bafyrei..." },
    "mimeType": "application/vnd.oci.image.manifest.v1+json",
    "size": 789
  },
  "ownerDid": "did:plc:alice123",
  "scannedAt": "2025-10-20T12:34:56.789Z",
  "scannerVersion": "syft-v1.0.0",
  "createdAt": "2025-10-20T12:34:56.789Z"
}
```

### Key Fields

- `artifactType`: Distinguishes SBOM artifact from regular image manifest
  - `application/spdx+json` for SPDX format
  - `application/vnd.cyclonedx+json` for CycloneDX format
- `subject`: Reference to the original image manifest
- `ownerDid`: DID of the image owner (for multi-tenant holds)
- `scannedAt`: ISO 8601 timestamp of when scan completed
- `scannerVersion`: Tool version for reproducibility tracking

### SBOM Blob

The actual SBOM document is stored as a blob in the hold's storage backend and referenced in the manifest's `layers` array. The blob contains the full SPDX or CycloneDX JSON document.

## Configuration

SBOM scanning is configured via environment variables on the hold service.
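
A hold could read the SBOM variables documented in this section along the following lines. This is only a sketch: the `SBOMConfig` type and `loadSBOMConfig` function are hypothetical names, and the strictness of the validation is an assumption rather than ATCR's actual behavior. The defaults mirror the documented ones (disabled, 2 workers, `spdx-json`).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// SBOMConfig holds the scanning options described in this section.
type SBOMConfig struct {
	Enabled bool
	Workers int
	Format  string
}

// loadSBOMConfig reads the HOLD_SBOM_* variables through lookup, which has
// the same signature as os.LookupEnv so it can be swapped out in tests.
func loadSBOMConfig(lookup func(string) (string, bool)) (SBOMConfig, error) {
	cfg := SBOMConfig{Enabled: false, Workers: 2, Format: "spdx-json"}

	if v, ok := lookup("HOLD_SBOM_ENABLED"); ok {
		enabled, err := strconv.ParseBool(v)
		if err != nil {
			return cfg, fmt.Errorf("HOLD_SBOM_ENABLED: %w", err)
		}
		cfg.Enabled = enabled
	}
	if v, ok := lookup("HOLD_SBOM_WORKERS"); ok {
		n, err := strconv.Atoi(v)
		if err != nil || n < 1 {
			return cfg, fmt.Errorf("HOLD_SBOM_WORKERS must be a positive integer, got %q", v)
		}
		cfg.Workers = n
	}
	if v, ok := lookup("HOLD_SBOM_FORMAT"); ok {
		if v != "spdx-json" && v != "cyclonedx-json" {
			return cfg, fmt.Errorf("HOLD_SBOM_FORMAT: unsupported format %q", v)
		}
		cfg.Format = v
	}
	return cfg, nil
}

func main() {
	cfg, err := loadSBOMConfig(os.LookupEnv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%+v\n", cfg)
}
```

Failing at startup on an invalid value is one possible design choice; it surfaces configuration mistakes before any scan job is queued.
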
### Environment Variables

```bash
# Enable SBOM scanning (opt-in)
HOLD_SBOM_ENABLED=true

# Number of concurrent scan workers (default: 2)
# Higher values = faster scanning, more CPU/memory usage
HOLD_SBOM_WORKERS=4

# SBOM output format (default: spdx-json)
# Options: spdx-json, cyclonedx-json
HOLD_SBOM_FORMAT=spdx-json

# Future: Enable vulnerability scanning with Grype
# HOLD_VULN_ENABLED=true
```

### Example Configuration

```bash
# .env.hold
HOLD_PUBLIC_URL=https://hold01.atcr.io
STORAGE_DRIVER=s3
S3_BUCKET=my-hold-blobs
HOLD_OWNER=did:plc:xyz123
HOLD_DATABASE_PATH=/var/lib/atcr/hold.db

# Enable SBOM scanning
HOLD_SBOM_ENABLED=true
HOLD_SBOM_WORKERS=2
HOLD_SBOM_FORMAT=spdx-json
```

## Scanning Workflow

### 1. Upload Completes

When a container image is successfully pushed to a hold:

```
1. Client: docker push atcr.io/alice/myapp:latest
2. AppView routes blobs to hold service
3. Hold receives multipart upload via XRPC
4. Hold completes upload and stores blobs
5. Hold checks: HOLD_SBOM_ENABLED=true?
6. If yes: enqueue scan job (non-blocking)
7. Upload completes immediately
```

### 2. Background Scanning

Scan workers process jobs from the queue:

```
1. Worker pulls job from queue
2. Extracts image layers from storage
3. Runs Syft on extracted filesystem
4. Generates SBOM in configured format
5. Uploads SBOM blob to storage
6. Creates ORAS manifest record in hold's PDS
7. Job complete
```

### 3. Result Storage

SBOM results are stored in two places:

1. **SBOM blob**: Full JSON document in hold's blob storage
2. **ORAS manifest**: Metadata record in hold's embedded PDS
   - Collection: `io.atcr.manifest`
   - Record key: SBOM manifest digest
   - Contains reference to subject image

## Accessing SBOMs

Multiple methods for discovering and retrieving SBOM data.

### 1. XRPC Query Endpoint

Query for SBOMs by image digest:

```bash
# Get SBOM for a specific image
curl "https://hold01.atcr.io/xrpc/io.atcr.hold.getSBOM?\
digest=sha256:abc123&\
ownerDid=did:plc:alice123&\
repository=alice/myapp"

# Response: ORAS manifest JSON
{
  "manifest": {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.manifest.v1+json",
    "artifactType": "application/spdx+json",
    "subject": { "digest": "sha256:abc123...", ... },
    "layers": [ { "digest": "sha256:def456...", ... } ]
  },
  "scannedAt": "2025-10-20T12:34:56.789Z",
  "scannerVersion": "syft-v1.0.0"
}
```

### 2. ATProto Repository Queries

Use standard ATProto XRPC to list all SBOMs:

```bash
# List all SBOM manifests in hold's PDS
curl "https://hold01.atcr.io/xrpc/com.atproto.repo.listRecords?\
repo=did:web:hold01.atcr.io&\
collection=io.atcr.manifest"

# Filter by artifactType (requires AppView indexing)
# Returns all SBOM artifacts
```

### 3. Direct SBOM Blob Download

Download the full SBOM JSON file:

```bash
# Get SBOM blob CID from manifest layers[0].digest
SBOM_DIGEST="sha256:def456..."

# Request presigned download URL
curl "https://hold01.atcr.io/xrpc/com.atproto.sync.getBlob?\
did=did:web:hold01.atcr.io&\
cid=$SBOM_DIGEST"

# Response: presigned S3 URL or direct blob
{
  "url": "https://s3.amazonaws.com/bucket/blob?signature=...",
  "expiresAt": "2025-10-20T12:49:56Z"
}

# Download SBOM JSON (set URL to the "url" field of the response above)
curl "$URL" > sbom.spdx.json
```

### 4. ORAS CLI Integration

Use the ORAS CLI to discover and pull SBOMs:

```bash
# Discover referrers (SBOMs) for an image
oras discover atcr.io/alice/myapp:latest

# Output shows SBOM artifacts:
# digest: sha256:abc123...
# referrers:
#   - artifactType: application/spdx+json
#     digest: sha256:4a5e...

# Pull SBOM artifact
oras pull atcr.io/alice/myapp@sha256:4a5e...

# Downloads sbom.spdx.json to current directory
```

### 5. AppView Web UI (Future)

Future enhancement: AppView web interface will display SBOM information on repository pages:

- Link to SBOM JSON download
- Vulnerability count (if Grype enabled)
- Scanner version and scan timestamp
- Comparison across multiple holds

## Tool Integration

### SPDX/CycloneDX Tools

Any tool that understands SPDX or CycloneDX formats can consume the SBOMs:

**Example tools:**
- [OSV Scanner](https://github.com/google/osv-scanner) - Vulnerability scanning
- [Grype](https://github.com/anchore/grype) - Vulnerability scanning
- [Dependency-Track](https://dependencytrack.org/) - Software composition analysis
- [SBOM Quality Score](https://github.com/eBay/sbom-scorecard) - SBOM completeness

**Usage:**

```bash
# Download SBOM
curl "https://hold01.atcr.io/xrpc/io.atcr.hold.getSBOM?..." | \
  jq -r '.manifest.layers[0].digest' | \
  # ... fetch blob ... > sbom.spdx.json

# Scan with OSV
osv-scanner --sbom sbom.spdx.json

# Scan with Grype
grype sbom:./sbom.spdx.json
```

### OCI Registry API

ORAS manifests are fully OCI-compliant and discoverable via standard registry APIs:

```bash
# Discover referrers for an image
curl -H "Accept: application/vnd.oci.image.index.v1+json" \
  "https://atcr.io/v2/alice/myapp/referrers/sha256:abc123"

# Returns referrers index with SBOM manifests
{
  "schemaVersion": 2,
  "mediaType": "application/vnd.oci.image.index.v1+json",
  "manifests": [
    {
      "mediaType": "application/vnd.oci.image.manifest.v1+json",
      "digest": "sha256:4a5e...",
      "artifactType": "application/spdx+json"
    }
  ]
}
```

### Programmatic Access

Use the ATProto SDK to query SBOMs:

```go
import "github.com/bluesky-social/indigo/atproto"

// List all SBOMs for a hold
records, err := client.RepoListRecords(ctx,
    "did:web:hold01.atcr.io",
    "io.atcr.manifest",
    100, // limit
    "",  // cursor
)

// Filter for SBOM artifacts
for _, record := range records.Records {
    manifest := record.Value.(ManifestRecord)
    if manifest.ArtifactType == "application/spdx+json" {
        // Process SBOM manifest
    }
}
```

## Future Enhancements

### Vulnerability Scanning (Grype)

Add vulnerability scanning to SBOM generation:

```bash
# Configuration
HOLD_VULN_ENABLED=true
HOLD_VULN_DB_UPDATE_INTERVAL=24h

# Extended manifest with vulnerability count
{
  "artifactType": "application/spdx+json",
  "annotations": {
    "io.atcr.vuln.critical": "2",
    "io.atcr.vuln.high": "15",
    "io.atcr.vuln.medium": "42",
    "io.atcr.vuln.low": "8",
    "io.atcr.vuln.scannedWith": "grype-v0.74.0",
    "io.atcr.vuln.dbVersion": "2025-10-20"
  }
}
```

### Multi-Scanner Support (Trivy)

Support multiple scanner backends:

```bash
HOLD_SBOM_SCANNER=trivy  # syft (default), trivy, grype
HOLD_TRIVY_SCAN_TYPE=os,library,config,secret
```

### Multi-Hold Verification

Compare SBOMs from different holds for the same image:

```bash
# Alice pushes to hold1 and hold2
docker push atcr.io/alice/myapp:latest

# Both holds scan independently
# Compare results:
atcr-cli compare-sboms \
  --image atcr.io/alice/myapp:latest \
  --holds hold1.atcr.io,hold2.atcr.io

# Output: Package count differences, version mismatches, etc.
```

### Signature Verification (Cosign)

Sign SBOMs with Sigstore Cosign:

```bash
HOLD_SBOM_SIGN=true
HOLD_COSIGN_KEY_PATH=/var/lib/atcr/cosign.key

# SBOM artifacts get signed
# Verification:
cosign verify --key cosign.pub atcr.io/alice/myapp@sha256:4a5e...
```

## Security Considerations

### Reproducibility

SBOMs should be reproducible for the same image digest:

**Best practices:**
- Pin scanner versions in production holds
- Record scanner version in manifest annotations
- Document vulnerability database versions
- Re-scan periodically to catch new CVEs

**Validation:**

```bash
# Compare SBOMs from different holds
diff <(curl hold1/sbom.json | jq -S) \
     <(curl hold2/sbom.json | jq -S)

# Differences indicate:
# - Different scanner versions
# - Different scan times (new CVEs discovered)
# - Potential tampering (investigate)
```

### Multiple Hold Verification

Running multiple holds provides defense in depth:

1. User pushes to hold1 (uses hold1 by default)
2. User also pushes to hold2 (backup/verification)
3. Both holds scan independently
4. Compare SBOM results:
   - Similar results = confidence in accuracy
   - Divergent results = investigate discrepancy

### Transparency

Hold operators should publish scanning policies:

- Scanner version and update schedule
- Vulnerability database update frequency
- SBOM format and schema version
- Data retention policies

### Trust Anchors

Users can verify scanner integrity:

1. **Scanner version**: Check `scannerVersion` field matches expected version
2. **DID signature**: ATProto record signed by hold's DID
3. **Timestamp**: Check `scannedAt` for stale scans
4. **Reproducibility**: Re-scan locally and compare results

## Example Workflows

### Enable Scanning on Your Hold

```bash
# 1. Configure hold with SBOM enabled
cat > .env.hold <