ATCR (atcr.io) is a container registry that uses the AT Protocol for manifest storage and S3 for blob storage.

ATCR Troubleshooting Guide#

This document provides troubleshooting guidance for common ATCR deployment and operational issues.

OAuth Authentication Failures#

JWT Timestamp Validation Errors#

Symptom:

error: invalid_client
error_description: Validation of "client_assertion" failed: "iat" claim timestamp check failed (it should be in the past)

Root Cause: The AppView server's system clock is ahead of the PDS server's clock. When the AppView generates a JWT for OAuth client authentication (confidential client mode), the "iat" (issued at) claim appears to be in the future from the PDS's perspective.

Diagnosis:

  1. Check AppView system time:

    date -u
    timedatectl status
    
  2. Check if NTP is active and synchronized:

    timedatectl show-timesync --all
    
  3. Compare AppView time with PDS time (if accessible):

    # On AppView
    date +%s
    
    # On PDS (or via HTTP headers)
    curl -I https://your-pds.example.com | grep -i date
    
  4. Check AppView logs for clock information (logged at startup):

    docker logs atcr-appview 2>&1 | grep "Configured confidential OAuth client"

Example log output:

level=INFO msg="Configured confidential OAuth client"
  key_id=did:key:z...
  system_time_unix=1763389815
  system_time_rfc3339=2025-11-17T14:30:15Z
  timezone=UTC

Solution:

  1. Enable NTP synchronization (recommended):

    On most Linux systems using systemd:

    # Enable and start systemd-timesyncd
    sudo timedatectl set-ntp true
    
    # Verify NTP is active
    timedatectl status
    

    Expected output:

    System clock synchronized: yes
    NTP service: active
    
  2. Alternative: Use chrony (if systemd-timesyncd is not available):

    # Install chrony
    sudo apt-get install chrony  # Debian/Ubuntu
    sudo yum install chrony       # RHEL/CentOS
    
    # Enable and start chronyd
    sudo systemctl enable chronyd
    sudo systemctl start chronyd
    
    # Check sync status
    chronyc tracking
    
  3. Force immediate sync:

    # systemd-timesyncd
    sudo systemctl restart systemd-timesyncd
    
    # Or with chrony
    sudo chronyc makestep
    
  4. In Docker/Kubernetes environments:

    The container inherits the host's system clock, so fix NTP on the host machine:

    # On Docker host
    sudo timedatectl set-ntp true
    
    # Restart AppView container to pick up correct time
    docker restart atcr-appview
    
  5. Verify clock skew is resolved:

    # Should show clock offset < 1 second
    timedatectl timesync-status
    

Acceptable Clock Skew:

  • Most OAuth implementations tolerate ±30-60 seconds of clock skew
  • DPoP proof validation is typically stricter (±10 seconds)
  • Aim for < 1 second skew for reliable operation

Prevention:

  • Configure NTP synchronization in your infrastructure-as-code (Terraform, Ansible, etc.)
  • Monitor clock skew in production (e.g., Prometheus node_exporter includes clock metrics)
  • Use managed container platforms (ECS, GKE, AKS) that handle NTP automatically

DPoP Nonce Mismatch Errors#

Symptom:

error: use_dpop_nonce
error_description: DPoP "nonce" mismatch

Repeated multiple times, potentially followed by:

error: server_error
error_description: Server error

Root Cause: DPoP (Demonstrating Proof-of-Possession) requires a server-provided nonce for replay protection. These errors typically occur when:

  1. Multiple concurrent requests create a DPoP nonce race condition
  2. Clock skew causes DPoP proof timestamps to fail validation
  3. PDS session state becomes corrupted after repeated failures

Diagnosis:

  1. Check if errors occur during concurrent operations:

    # During docker push with multiple layers
    docker logs atcr-appview 2>&1 | grep "use_dpop_nonce" | wc -l
    
  2. Check for clock skew (see section above):

    timedatectl status
    
  3. Look for session lock acquisition in logs:

    docker logs atcr-appview 2>&1 | grep "Acquired session lock"

Solution:

  1. If caused by clock skew: Fix NTP synchronization (see section above)

  2. If caused by session corruption:

    # The AppView will automatically delete corrupted sessions
    # User just needs to re-authenticate
    docker login atcr.io
    
  3. If persistent despite clock sync:

    • Check PDS health and logs (may be a PDS-side issue)
    • Verify network connectivity between AppView and PDS
    • Check if PDS supports latest OAuth/DPoP specifications

What ATCR does automatically:

  • Per-DID locking prevents concurrent DPoP nonce races
  • Indigo library automatically retries with fresh nonces
  • Sessions are auto-deleted after repeated failures
  • Service token cache prevents excessive PDS requests

Prevention:

  • Ensure reliable NTP synchronization
  • Use a stable, well-maintained PDS implementation
  • Monitor AppView error rates for DPoP-related issues

OAuth Session Not Found#

Symptom:

error: failed to get OAuth session: no session found for DID

Root Cause:

  • User has never authenticated via OAuth
  • OAuth session was deleted due to corruption or expiry
  • Database migration cleared sessions

Solution:

  1. User re-authenticates via OAuth flow:

    docker login atcr.io
    # Or for web UI: visit https://atcr.io/login
    
  2. If using app passwords (legacy), verify token is cached:

    # Check if app-password token exists
    docker logout atcr.io
    docker login atcr.io -u your.handle -p your-app-password
    

AppView Deployment Issues#

Client Metadata URL Not Accessible#

Symptom:

error: unauthorized_client
error_description: Client metadata endpoint returned 404

Root Cause: PDS cannot fetch OAuth client metadata from {ATCR_BASE_URL}/client-metadata.json

Diagnosis:

  1. Verify client metadata endpoint is accessible:

    curl https://your-atcr-instance.com/client-metadata.json
    
  2. Check AppView logs for startup errors:

    docker logs atcr-appview 2>&1 | grep "client-metadata"
    
  3. Verify ATCR_BASE_URL is set correctly:

    echo $ATCR_BASE_URL
    

Solution:

  1. Ensure ATCR_BASE_URL matches your public URL:

    export ATCR_BASE_URL=https://atcr.example.com
    
  2. Verify reverse proxy (nginx, Caddy, etc.) routes /.well-known/* and /client-metadata.json:

    location / {
        proxy_pass http://localhost:5000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
    
  3. Check firewall rules allow inbound HTTPS:

    sudo ufw status
    sudo iptables -L -n | grep 443
    

Hold Service Issues#

Blob Storage Connectivity#

Symptom:

error: failed to upload blob: connection refused

Diagnosis:

  1. Check hold service logs:

    docker logs atcr-hold 2>&1 | grep -i error
    
  2. Verify S3 credentials are correct:

    # Test S3 access
    aws s3 ls s3://your-bucket --endpoint-url=$S3_ENDPOINT
    
  3. Check hold configuration:

    env | grep -E "(S3_|AWS_|STORAGE_)"
    

Solution:

  1. Verify environment variables in hold service:

    export AWS_ACCESS_KEY_ID=your-key
    export AWS_SECRET_ACCESS_KEY=your-secret
    export S3_BUCKET=your-bucket
    export S3_ENDPOINT=https://s3.us-west-2.amazonaws.com
    
  2. Test S3 connectivity from hold container:

    docker exec atcr-hold curl -v $S3_ENDPOINT
    
  3. Check S3 bucket permissions (requires PutObject, GetObject, DeleteObject)


Performance Issues#

High Database Lock Contention#

Symptom: Slow Docker push/pull operations, high CPU usage on AppView

Diagnosis:

  1. Check SQLite database size:

    ls -lh /var/lib/atcr/ui.db
    
  2. Look for long-running queries:

    docker logs atcr-appview 2>&1 | grep "database is locked"
    

Solution:

  1. For production, migrate to PostgreSQL (recommended):

    export ATCR_UI_DATABASE_TYPE=postgres
    export ATCR_UI_DATABASE_URL=postgresql://user:pass@localhost/atcr
    
  2. Or increase the SQLite busy timeout and serialize writes:

    // In Go, limit SQLite to a single connection to avoid writer contention:
    db.SetMaxOpenConns(1)
    // and raise the busy timeout via your SQLite driver's mechanism,
    // e.g. a DSN parameter like ?_busy_timeout=5000 (driver-specific)
    
  3. Vacuum the database to reclaim space:

    sqlite3 /var/lib/atcr/ui.db "VACUUM;"
    

Logging and Debugging#

Enable Debug Logging#

Set log level to debug for detailed troubleshooting:

export ATCR_LOG_LEVEL=debug
docker restart atcr-appview

Useful Log Queries#

OAuth token exchange errors:

docker logs atcr-appview 2>&1 | grep "OAuth callback failed"

Service token request failures:

docker logs atcr-appview 2>&1 | grep "OAuth authentication failed during service token request"

Clock diagnostics:

docker logs atcr-appview 2>&1 | grep "system_time"

DPoP nonce issues:

docker logs atcr-appview 2>&1 | grep -E "(use_dpop_nonce|DPoP)"

Health Checks#

AppView health:

curl http://localhost:5000/v2/
# Should return: {"errors":[{"code":"UNAUTHORIZED",...}]}
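Note that the 401/UNAUTHORIZED response is the healthy case here: the Docker Registry HTTP API challenges unauthenticated probes of the `/v2/` base endpoint. A liveness check should therefore accept it, as in this generic sketch (not ATCR's own probe):

```go
package main

import (
	"fmt"
	"net/http"
)

// registryUp reports whether a /v2/ probe indicates a live registry.
// An unauthenticated request legitimately gets 401 from a healthy
// registry, so both 200 and 401 count as "up".
func registryUp(status int) bool {
	return status == http.StatusOK || status == http.StatusUnauthorized
}

func main() {
	fmt.Println(registryUp(401)) // true: auth challenge, service is up
	fmt.Println(registryUp(502)) // false: bad gateway, service likely down
}
```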

Hold service health:

curl http://localhost:8080/.well-known/did.json
# Should return DID document

Getting Help#

If issues persist after following this guide:

  1. Check GitHub Issues: https://github.com/ericvolp12/atcr/issues
  2. Collect logs: Include output from docker logs for AppView and Hold services
  3. Include diagnostics:
    • timedatectl status output
    • AppView version: docker exec atcr-appview cat /VERSION (if available)
    • PDS version and implementation (Bluesky PDS, other)
  4. File an issue with reproducible steps

Common Error Reference#

| Error Code                       | Component  | Common Cause                      | Fix                               |
|----------------------------------|------------|-----------------------------------|-----------------------------------|
| `invalid_client` (iat timestamp) | OAuth      | Clock skew                        | Enable NTP sync                   |
| `use_dpop_nonce`                 | OAuth/DPoP | Concurrent requests or clock skew | Fix NTP, wait for auto-retry      |
| `server_error` (500)             | PDS        | PDS internal error                | Check PDS logs                    |
| `invalid_grant`                  | OAuth      | Expired auth code                 | Retry OAuth flow                  |
| `unauthorized_client`            | OAuth      | Client metadata unreachable       | Check ATCR_BASE_URL and firewall  |
| `RecordNotFound`                 | ATProto    | Manifest doesn't exist            | Verify repository name            |
| Connection refused               | Hold/S3    | Network/credentials               | Check S3 config and connectivity  |